Update 21:00 – IGP shares are back. Welcome to igp-data!
Update 19:30 – the D-PHYS shares are back. IGP will take a little more time.
In order to guarantee sustained performance and availability of our storage system, we need to schedule a few storage maintenance windows. The first one will take place on Wednesday, 12.12.2018 at 16:00 and affect all D-PHYS and IGP group shares, but not IPA or galaxy (technically: windata/macdata, but not astrogate or ipa-data). The relevant shares will be offline for at least 3 hours.
For emergency cases, there will be read-only access to last night’s backup as described here.
Please note that these migrations will bring some overall changes to the D-PHYS storage setup:
- the SMBv1 protocol will be disabled on all file servers. It has a long history of security issues and we’ve migrated all clients to newer versions, so this should not affect anyone. However, there’s a small chance that we didn’t catch all connections, so please contact us if you experience any issues after the migration.
- all SMB protocol versions will be restricted to ETH-internal access. This step has been long overdue and since most ISPs block the necessary ports anyway, it shouldn’t affect too many users. What it means however: in the future, file server access from outside ETH requires VPN.
- IGP/D-BAUG will get their own front-end server igp-data. If you’re with IGP and have already switched your file server mounts from windata to igp-data, you’re good and don’t have to do anything. If you haven’t, you should do so before Dec 12 in order to get a seamless migration experience.
We’ll update this post as the migration progresses and as soon as the systems are back.
On Tuesday, October 2, starting at 07:00, we will migrate our groupware instance to another server. For about 1 hour you won’t have access to your calendar. If you’re one of the few people who also sync their email via groupware, mail will be offline too (you can always use webmail). After the migration your clients should just reconnect and resume syncing. If you notice any issues after we’re done, please get in touch.
Update Wed 07:45: migration completed, please let us know if you experience any problems.
After a long (11 years) phase of stability in the D-PHYS network, we are preparing a pretty extensive network reorganization for 2018. This is mainly driven by ever-increasing information security requirements mandated by ETH. The D-PHYS network has traditionally been very open and we will try to keep it that way, but we need to implement some modifications. The basic premise is to partition our current /21 network (2048 IP addresses) into smaller groups that better represent the types of machines in those networks. This will then allow us to tailor each group’s firewall rules to the services needed by those machines. The roadmap looks like this:
- Rearrange hosts in current /21 net to align with future VLAN boundaries
- Partition the /21 net into smaller VLANs
- Migrate individual subnets from our DHCP server to that of ID. This will also allow us to assign IPv6 addresses
- Migrate the subnets into different virtual private zones (VPZ)
- Assign and fine tune firewall settings on the different VPZ
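To illustrate what partitioning a /21 means in practice, here is a minimal sketch using Python's standard `ipaddress` module. The range `10.0.0.0/21` is only a placeholder, not the real D-PHYS network, and splitting into /24 subnets is an assumed example; the actual VLAN sizes have not been announced here.

```python
import ipaddress

# Placeholder /21 range for illustration only (not the real D-PHYS network).
net = ipaddress.ip_network("10.0.0.0/21")
print(net, "contains", net.num_addresses, "addresses")

# Split the /21 into equally sized /24 subnets. The prefix length 24 is an
# assumption for this example; real VLANs may well have different sizes.
subnets = list(net.subnets(new_prefix=24))
for subnet in subnets:
    print(subnet, "->", subnet.num_addresses, "addresses")
```

A /21 leaves 11 host bits, hence 2^11 = 2048 addresses; each /24 carved out of it holds 256 of them.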
As usual, we’ll try to implement these steps as smoothly as possible. However, a migration on this scale will not go entirely without issues. Step 1 will entail an IP address change for quite a number of hosts. We’ll make sure that our dyndns host names (foobar.dhcp.phys.ethz.ch) will be in sync with the new addresses, but this only works for properly configured DHCP hosts. Here’s how you can help: if you have any hosts in the 220.127.116.11/21 D-PHYS network that are statically configured (non-DHCP), please get in touch with us ASAP. The same is true if you’re using hard-coded IP addresses from that range instead of host names. We’ll need to deal with those hosts individually.
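If you are unsure whether your scripts or configuration files contain hard-coded addresses from the affected range, a quick scan along the following lines may help. This is a rough sketch: the network `10.0.0.0/21` is a placeholder you would replace with the actual D-PHYS range, and the function name `find_hardcoded_addresses` is made up for this example.

```python
import ipaddress
import re

# Placeholder range; substitute the actual D-PHYS /21 network here.
NETWORK = ipaddress.ip_network("10.0.0.0/21")

# Loose pattern for dotted-quad candidates; validation happens below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_hardcoded_addresses(text, network=NETWORK):
    """Return all IPv4 literals in `text` that fall inside `network`."""
    hits = []
    for candidate in IPV4_RE.findall(text):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # looked like an address but is not a valid one
        if addr in network:
            hits.append(candidate)
    return hits


sample = (
    "server = 10.0.3.17\n"
    "backup = 192.168.1.5\n"
    "host = foobar.dhcp.phys.ethz.ch\n"
)
print(find_hardcoded_addresses(sample))
```

Entries that use host names instead of addresses are not reported, which is exactly the configuration that will survive the renumbering.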
In the course of 2018 we’ll keep you updated on project progress and announce specific dates when we implement changes.
Update: since Informatikdienste are currently drafting an even more comprehensive Hönggerberg network reorganization that will deeply impact our plans as well, this project is currently on hold until we know more. Stay tuned.
Until now, owners of our group shares always had to contact us in order to have members added to or removed from the underlying LDAP group. One of the benefits of the recent LDAP migration is that we can now offer a web interface for LDAP group member management.
If you’re the owner of a group share and would like to be able to perform user management yourself, please get in touch with me. You can also use this interface to edit your group report settings.
Update 07:25 The migration is complete and our mail server is back online. Please let us know if you notice anything peculiar. This concludes our multi-step migration to the new mail server hardware.
In order to finalize the upgrade of the D-PHYS mail server, we have scheduled a maintenance downtime on
Tuesday, March 27, between 06:30 and 08:00 in the morning
During that time it will not be possible to send or receive emails. In particular, incoming external emails will not be lost, but held on the sender’s side and will be delivered after the migration. Outgoing mail will be kept in your mail client until the connection is restored.
We will update this posting once the mail server is back online.
New location for mail filtering rules, forwarding and vacation auto-replies
After the migration, all mail-related settings will be consolidated into the Roundcube Webmail interface:
- spam filtering rules (whitelist, blacklist)
- forwarding of your emails to a different account
- setting a vacation or out-of-office auto-reply message
- defining rules to automatically file incoming mails into specific folders
This will make configuring your email settings easier and also give you more options than before (for example, the out-of-office auto-reply can now be configured to automatically terminate at the end of your absence).
Please refer to our readme for details on how to customize these settings in the future. Feel free to contact us if you have any questions.
The current settings of all active users have been converted and imported.
In technical terms, we are migrating from procmail to sieve. In particular, the hidden text file ~/.procmailrc in the user’s home folder will be ignored after the migration.
As already described in this past posting, we have recreated our LDAP server infrastructure and will now retire the old server. For the last 4 weeks we’ve been sniffing for LDAP queries that still use the old server and we’ve addressed each of those requests individually. Since we can’t guarantee to detect each and every single network packet, now is your last chance to migrate to the new servers in case you haven’t done so already. The old server will go offline on
Friday, March 16
Please let us know if you have any questions.
Update 07:25 Migration finished, welcome on the new mail server!
We have scheduled a maintenance downtime for the D-PHYS mail server on
Wednesday, January 24, between 07:00 and 08:00 in the morning
During this period, sending and receiving new emails will have interruptions, thereby delaying incoming and outgoing mails. In particular, incoming external emails will not be lost, but held on the sender’s side and will be delivered after the migration. Outgoing mail will be kept in your mail client until the connection is restored. The IMAP server will not be affected, so all email clients should have continuous access to the existing mailboxes.
This maintenance window will be used to migrate the first part of our mail server infrastructure to the latest version of the operating system and new hardware with fast SSD storage.
New location for SpamAssassin user preferences
We have redesigned how our mail server parses the user’s spam filtering configuration. Currently one has to edit the hidden text file ~/.spamassassin/user_prefs in the home folder. Starting next Wednesday, the spam filtering rules can be edited more conveniently through the settings in the Webmail interface. This will allow users to easily
- accept mail from a given sender and never mark it as spam (whitelist)
- reject mail from a given sender and always mark it as spam (blacklist)
- set the threshold score required for any message to be considered as spam
The existing user preferences have been parsed and all of the above settings have been imported into the new setup. The contents of ~/.spamassassin/ will be ignored after the migration. Please contact us if you have questions regarding your advanced SpamAssassin rules.
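If you would like to check which of your old settings fall into these three categories, the following sketch extracts them from the text of a user_prefs file. It is not the import tool we used, just an illustration: it only looks at the standard SpamAssassin directives `whitelist_from`, `blacklist_from` and `required_score` and ignores everything else, and the function name `parse_user_prefs` is made up for this example.

```python
def parse_user_prefs(text):
    """Collect whitelist, blacklist and threshold from user_prefs content."""
    prefs = {"whitelist": [], "blacklist": [], "required_score": None}
    for raw_line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        directive, *args = line.split()
        if directive == "whitelist_from":
            prefs["whitelist"].extend(args)
        elif directive == "blacklist_from":
            prefs["blacklist"].extend(args)
        elif directive == "required_score" and args:
            prefs["required_score"] = float(args[0])
        # Any other directive counts as an "advanced" rule and is skipped.
    return prefs


# Made-up example content, not taken from a real account.
sample_prefs = """
# personal spam settings
required_score 4.5
whitelist_from alice@example.org *@example.com
blacklist_from spammer@example.net
"""
print(parse_user_prefs(sample_prefs))
```

Anything the sketch skips is what we mean by advanced rules above.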
This post is meant to give you a short overview of what has been accomplished in D-PHYS IT by ISG this year. We’ve been hard at work to further improve and extend our services for you, our customers. Some highlights of 2017:
- Account expiry: in early 2017 we finished assessing all ~7600 D-PHYS accounts and blocked the expired ones. We also tied all D-PHYS accounts to their nethz counterparts wherever possible. This allows us to make use of ETH’s employment information from now on. While we were at it:
- New LDAP servers: Since implementing account expiration meant touching most aspects of our identity management infrastructure anyway, we decided to completely overhaul our LDAP user database. We reworked the LDAP schema (the original one dating back to the early 90s) and set up a 3-way replicating OpenLDAP cluster.
- Windows Server Cluster: Several mission critical Windows Server instances have been moved to a newly created Windows Cluster. This complements last year’s Linux cluster.
- Storage: in 2017 the disk space occupied by data and backup grew from 1.3 PiB to 1.6 PiB, making this a very slow year as far as storage growth is concerned.
- Server room migration: in August we had to move most of D-PHYS’s servers three rack rows down in the HIT D 13 server room. We now have a solid foundation for our servers for the next years.
- Outages: apart from the above-mentioned migration, some short-term network interruptions and the unfortunate file server issues of late, our systems have been very stable in 2017.
- Web server upgrade: in January we upgraded the operating system on the D-PHYS web server. We also used the occasion to clean up a lot of legacy cruft.
- OS upgrades: 2017 brought new OS versions for almost every system: the Windows 10 rollout picked up steam, High Sierra arrived on the Macs and Ubuntu 16.04 on the remaining Linux workstations.
- eXile: we migrated the configuration management from Puppet to Ansible and then re-installed all eXile gateways in a fully automated way with the latest Debian release.
- UCC: we laid the technical groundwork and performed implementation tests for the upcoming UCC rollout which will replace the existing ETH telephony system with an all-IP based solution.
- IT security: we participate in and support the ETH-wide IT security initiative.
I would like to take this opportunity to thank my whole team for their hard and dedicated work all year long.
Happy Holidays and see you in 2018!
Update 20.12.: the strange intermittent permission problems some of you experienced could be traced back to a kernel regression. We’re now back to using an older kernel.
Update 13.12.: we’re cautiously optimistic that the problems have been fixed. Since Monday the file server has survived everything we threw at it. The culprit seems to be an Infiniband switch that sporadically disconnected under heavy load. We’re now also turning on some performance improvements again, so you should see a speed increase when browsing files.
Update 06:45: group shares are back. Please let us know if you encounter any problems.
As some of you might have noticed, we’ve had some service quality issues with our group share server in the last few months. While not all interruptions are under our control (Informatikdienste lately have been very busy upgrading the ETH network, causing various network disruptions), we do have a problem with the group share server: it runs fine for weeks on end until it suddenly doesn’t. To this day we have not been able to pinpoint the underlying problem, despite having changed a lot of parameters, both software and hardware. Our next step will be replacing the kernel on the disk backends and switching some hardware – for that we need a scheduled downtime on
Monday, December 11, starting at 06:00
during which the group shares will be unavailable for about 90 minutes. This affects all D-PHYS and IGP shares except the Astro and newly migrated IPA ones. We will post an update when the system is back.
We do apologize for the inconvenience these service issues might have caused you. Please bear with us while we’re trying to locate and eliminate the root cause. We’re monitoring the situation 24/7 and try to react as quickly as possible whenever a problem occurs. But wait! You can help! There seems to be a correlation between crash probability and large-scale small-file I/O. This means you should, whenever possible, avoid reading or writing a lot of small files and instead bundle your data into fewer, larger files. This also increases performance!
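One simple way to bundle many small files is to pack a whole directory into a single compressed tar archive before copying it to the group share. The sketch below uses only Python's standard library; the function names and the directory and file names are made up for this example, and the demo runs in a temporary directory rather than on a real share.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_directory(src_dir, archive_path):
    """Pack src_dir (with all its files) into one gzip-compressed tar file."""
    src = Path(src_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the directory name.
        tar.add(str(src), arcname=src.name)
    return Path(archive_path)


def list_bundle(archive_path):
    """Return the names of all regular files stored in the archive."""
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return [member.name for member in tar.getmembers() if member.isfile()]


# Demo: 100 small files become a single archive file.
with tempfile.TemporaryDirectory() as work:
    src = Path(work) / "run_001"
    src.mkdir()
    for i in range(100):
        (src / f"sample_{i:03d}.txt").write_text(f"measurement {i}\n")
    archive = bundle_directory(src, Path(work) / "run_001.tar.gz")
    names = list_bundle(archive)
    print(len(names), "files stored in", archive.name)
```

Write the archive to a location outside the directory being packed, as in the demo, so that it does not end up including itself.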
Executive summary: you only need to read this if you run a service or tool that uses our LDAP server.
A surprisingly large number of people at D-PHYS run services or use tools that connect to our LDAP server to obtain user information. If you are among those, this post is meant to inform you that our LDAP infrastructure is about to change and you need to take action in order to keep your service up and running. You can read about the details and technical background here. The situation right now is:
- The new servers are running and sync with the current master.
- We have started migrating services from the old server to the new ones.
- The old server will be turned off in 2018.
- You can now start to migrate your service / tool to the new LDAP infrastructure.
- In early 2018 we will start searching for clients that still use the old server and address them individually.
So if you’re affected, please change your LDAP connection according to the documentation or get in touch if you have any questions.