Update 07:25: migration finished, welcome to the new mail server!
We have scheduled a maintenance downtime for the D-PHYS mail server on
Wednesday, January 24, between 07:00 and 08:00 in the morning.
During this period, sending and receiving new emails will be interrupted, which delays incoming and outgoing mail. Incoming external emails will not be lost: they are held on the sender’s side and delivered after the migration. Outgoing mail will be kept in your mail client until the connection is restored. The IMAP server will not be affected, so all email clients should have continuous access to the existing mailboxes.
This maintenance window will be used to migrate the first part of our mail server infrastructure to the latest version of the operating system and new hardware with fast SSD storage.
New location for SpamAssassin user preferences
We redesigned how our mail server parses the user’s configuration for spam filtering. Currently one has to edit the hidden text file
~/.spamassassin/user_prefs in the home folder. Starting next Wednesday, the spam filtering rules can be edited more conveniently through the settings in the Webmail interface. This will allow users to easily
- accept mail from a given sender and never mark it as spam (whitelist)
- reject mail from a given sender and always mark it as spam (blacklist)
- set the threshold score required for any message to be considered as spam
The existing user preferences have been parsed and all of the above settings have been imported into the new setup. The contents of
~/.spamassassin/ will be ignored after the migration. Please contact us if you have questions regarding your advanced SpamAssassin rules.
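For those who want to check what their old file contained, the following is a minimal sketch of how the three settings listed above could be read from a user_prefs file. It is not the import tool ISG used. It assumes the standard SpamAssassin directives whitelist_from, blacklist_from and required_score; the names SpamPrefs and parse_user_prefs are made up for this illustration, and all other (advanced) rules are simply skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SpamPrefs:
    """Hypothetical container for the three settings named in the post."""
    whitelist: List[str] = field(default_factory=list)
    blacklist: List[str] = field(default_factory=list)
    required_score: Optional[float] = None


def parse_user_prefs(text: str) -> SpamPrefs:
    """Extract whitelist, blacklist and threshold from user_prefs content.

    Simplification (assumption): everything after '#' is treated as a
    comment, and every directive other than the three below is ignored.
    """
    prefs = SpamPrefs()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        key, args = parts[0].lower(), parts[1:]
        if key == "whitelist_from":
            prefs.whitelist.extend(args)
        elif key == "blacklist_from":
            prefs.blacklist.extend(args)
        elif key == "required_score" and args:
            prefs.required_score = float(args[0])
    return prefs
```

Calling parse_user_prefs(open(path).read()) on a copy of your old file gives a quick overview of the entries you can compare against the Webmail settings.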
This post is meant to give you a short overview of what has been accomplished in D-PHYS IT by ISG this year. We’ve been hard at work to further improve and extend our services for you, our customers. Some highlights of 2017:
- Account expiry: in early 2017 we finished assessing all ~7600 D-PHYS accounts and blocked the expired ones. We also tied all D-PHYS accounts to their nethz counterparts wherever possible. This allows us to make use of ETH’s employment information from now on. While we were at it:
- New LDAP servers: Since implementing account expiration meant touching most aspects of our identity management infrastructure anyway, we decided to completely overhaul our LDAP user database. We reworked the LDAP schema (the original one dating back to the early 90s) and set up a 3-way replicating OpenLDAP cluster.
- Windows Server Cluster: Several mission critical Windows Server instances have been moved to a newly created Windows Cluster. This complements last year’s Linux cluster.
- Storage: in 2017 the disk space occupied by data and backup grew from 1.3 PiB to 1.6 PiB, making this a very slow year as far as storage growth is concerned.
- Server room migration: in August we had to move most of D-PHYS’s servers three rack rows down in the HIT D 13 server room. We now have a solid foundation for our servers for the next years.
- Outages: apart from the above-mentioned migration, some short-term network interruptions and the unfortunate file server issues of late, our systems have been very stable in 2017.
- Web server upgrade: in January we upgraded the operating system on the D-PHYS web server. We also used the occasion to clean up a lot of legacy cruft.
- OS upgrades: 2017 brought new OS versions for almost every system: the Windows 10 rollout picked up steam, High Sierra arrived on the Macs and Ubuntu 16.04 on the remaining Linux workstations.
- eXile: we migrated the configuration management from Puppet to Ansible and then re-installed all eXile gateways in a fully automated way with the latest Debian release.
- UCC: we laid the technical groundwork and performed implementation tests for the upcoming UCC rollout which will replace the existing ETH telephony system with an all-IP based solution.
- IT security: we participate in and support the ETH-wide IT security initiative.
I would like to take this opportunity to thank my whole team for their hard and dedicated work all year long.
Happy Holidays and see you in 2018!
Update 20.12.: the strange intermittent permission problems some of you experienced could be traced back to a kernel regression. We’re now back to using an older kernel.
Update 13.12.: we’re cautiously optimistic that the problems have been fixed. Since Monday the file server has survived everything we threw at it. The culprit seems to be an Infiniband switch that sporadically disconnected under heavy load. We’re now also turning on some performance improvements again, so you should see a speed increase when browsing files.
Update 06:45: group shares are back. Please let us know if you encounter any problems.
As some of you might have noticed, we’ve had some service quality issues with our group share server in the last few months. While not all interruptions are under our control (Informatikdienste lately have been very busy upgrading the ETH network, causing various network disruptions), we do have a problem with the group share server: it runs fine for weeks on end until it suddenly doesn’t. To this day we have not been able to pinpoint the underlying problem, despite having changed a lot of parameters, both software and hardware. Our next step will be replacing the kernel on the disk backends and switching some hardware. For that we need a scheduled downtime on
Monday, December 11, starting at 06:00
during which the group shares will be unavailable for about 90 minutes. This affects all D-PHYS and IGP shares except the Astro and newly migrated IPA ones. We will post an update when the system is back.
We do apologize for the inconvenience these service issues might have caused you. Please bear with us while we’re trying to locate and eliminate the root cause. We’re monitoring the situation 24/7 and try to react as quickly as possible whenever a problem occurs. But wait! You can help! There seems to be a correlation between crash probability and large-scale small-file I/O. This means you should, whenever possible, avoid reading or writing a lot of small files and instead bundle your data into fewer and larger files. This also increases performance!
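As an illustration of the bundling advice, here is a minimal sketch that packs a directory full of small files into a single compressed tar archive using Python’s standard tarfile module. The function name bundle_directory and the paths in the usage note are placeholders, not part of any ISG tooling; adapt them to your own data layout.

```python
import tarfile
from pathlib import Path


def bundle_directory(src_dir: str, archive_path: str) -> int:
    """Pack all regular files below src_dir into one .tar.gz archive.

    Returns the number of files added. Write the archive to a location
    outside src_dir, otherwise it would try to include itself.
    """
    src = Path(src_dir).resolve()
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so the archive
                # unpacks into a single top-level folder.
                tar.add(path, arcname=str(path.relative_to(src.parent)))
                count += 1
    return count
```

For example, bundle_directory("run_042", "run_042.tar.gz") would turn thousands of small result files into one file on the group share, which can later be unpacked with tar -xzf where needed.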
executive summary: you only need to read this if you run a service or tool that uses our LDAP server
A surprisingly large number of people at D-PHYS run services or use tools that connect to our LDAP server to obtain user information. If you are among those, this post is meant to inform you that our LDAP infrastructure is about to change and you need to take action in order to keep your service up and running. You can read about the details and technical background here. The situation right now is:
- The new servers are running and sync with the current master.
- We have started migrating services from the old server to the new ones.
- The old server will be turned off in 2018.
- You can now start to migrate your service / tool to the new LDAP infrastructure.
- In early 2018 we will start searching for clients that still use the old server and address them individually.
So if you’re affected, please change your LDAP connection according to the documentation or get in touch if you have any questions.
ISG sits on a pile of old hardware that for various reasons cannot be used in our setup any more, but that various people have expressed interest in and that still might be useful for certain scenarios (e.g. lab use or tinkering at home). We will therefore host a grab-your-used-piece-of-hardware session with mostly TFT monitors (15″ – 19″), a few Mac Pros (2010) and printers, free of charge for ETH-internal use, prices for private use according to the rules: Wed Oct 18 in HPT H floor, between 11:00 and 13:00.
As usual, some rules apply:
- this goes to all D-PHYS members
- no registration necessary. Just come by and take whatever is left.
- all items come as they are. We do not have any details or specifications
- there’s no warranty or service whatsoever. All devices have successfully been turned on, but that’s it
- if your item doesn’t turn on, you can bring it back within 5 days and get a full refund (if it wasn’t free in the first place)
- no OS, no software, no manual, no keyboard, often no cables. You get one piece of hardware. All HDs are blank
- all proceeds go to the D-PHYS funds, not ISG
- bring cash
Update Thursday 01:45: we hit some unexpected problems with the non-Astro group shares. Everything is back now, please let us know if you experience any problems.
Some months ago, we were informed by Informatikdienste that we would have to migrate our two water cooled racks in the HIT server room due to upcoming remodeling. This move will take place on
Wednesday, August 23, starting at 16:00
and last for several hours. During this time, all our IT services will be unavailable, including login, e-mail, storage and ISG-hosted websites. Incoming e-mail will be held back and delivered afterwards. We will do our best to have login and e-mail back up within the first two hours, but group drives will take a bit longer due to the sheer amount of hardware we have to move.
We apologize for any inconvenience. Unfortunately, this migration cannot be performed on a weekend as we might have to interact with our colleagues at Informatikdienste, but it will ensure secure and enduring operation of our servers in the future.
some impressions from the migration – thanks to the whole team!
As announced previously, about a year ago ISG was tasked by the department board to devise a workflow to expire D-PHYS accounts which so far had a life expectancy of ∞. In summer we started blocking accounts that were virtually unused, which almost by definition went very smoothly. Now we will start addressing accounts of users who are still using our services but no longer have an affiliation with D-PHYS. They will receive an email informing them of a one-month grace period before the account gets blocked. This posting is meant to serve as a reminder to everybody that this process is underway and questions may arise.
The project is explained in more detail in our readme.
On Thursday, January 19, starting at 08:00, we will upgrade the operating system of the main D-PHYS web server. All websites hosted on zwoelfi*.ethz.ch will be down for several hours and will gradually come back as we progress. This does not affect the department website, the institutes and many of the group websites. However, our groupware, the wikis and many special interest sites will be inaccessible. Note that if you’re using the ActiveSync connector via groupware to sync email to your cell phone or Outlook, this won’t work either. Please use webmail in the meantime; groupware will be one of the first services we bring back.
Update 17:30 – due to an inordinate amount of user files on the web server the upgrade took a bit longer than anticipated. Now almost all websites should be back online, please let us know if you encounter any problems.
In the last few weeks we discovered some attempted attacks on the Windows Remote Desktop feature from sources outside of ETH.
In order to protect both your machines and our network, we decided to block RDP access from ETH-external networks. If you still need access from outside the ETH network (e.g. from home) you have to first open a VPN connection to ETH and then start the Remote Desktop client.
More information about installing the VPN client is available here.
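To check whether your machine’s Remote Desktop service is reachable from where you currently are, a plain TCP connection test is usually enough. The following is a minimal sketch using only Python’s standard library; it assumes RDP listens on its default TCP port 3389, and the function name is_tcp_port_open as well as the host name in the usage note are placeholders.

```python
import socket


def is_tcp_port_open(host: str, port: int = 3389, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    3389 is the default RDP port; a blocked or filtered port shows up
    as a refused connection or a timeout, both reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

From an ETH-external network without VPN, is_tcp_port_open("my-office-pc.example.org") should now return False; after opening the VPN connection, the same call should return True if Remote Desktop is enabled on the machine.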
This post is meant to give you a short overview of what has been accomplished in D-PHYS IT by ISG this year. We’ve been hard at work to further improve and extend our services for you, our customers. Some highlights of 2016:
- New team member: Sven Mäder joined ISG this year to replace Axel in our Linux server team.
- Account expiry: you might have heard that D-PHYS decided to phase out old accounts in the future. We spent the last year laying the technical groundwork for a smooth and painless implementation of this policy change. One first visible result is our new account portal.
- Printing: in summer we integrated student printing into the pia printing system which means that we now have a comprehensive printing solution for the whole department. The D-PHYS print server will be shut down in early 2017.
- Storage: in 2016 the disk space occupied by data and backup grew from 929 TiB to 1.3 PiB, again increasing the yearly growth rate. We are now using 60-disk toploader chassis to maximize storage space-per-volume.
- Outages: we scheduled two maintenance windows, on April 14 and September 5, in order to perform hardware and system upgrades. Together with a network upgrade by Informatikdienste on September 15, these were the only noteworthy downtimes in 2016.
- Docking network: in fall 2016 we migrated most of the department’s network sockets to the 802.1X-enabled docking network. While there is little immediate benefit for most of us, this is a prerequisite for future network projects like the upcoming Unified Collaboration & Communication (UCC) project.
- Wifi: in early 2016 we developed and installed a portable wifi probe that eventually led to the discovery of one of the underlying problems causing ETH’s wifi woes. Since then, wifi has been much more stable.
- OS upgrades: 2016 brought new OS versions for almost every system: the Windows 10 rollout picked up steam, Sierra arrived on the Macs and Ubuntu 16.04 on the Linux workstations.
- Cluster: we built and deployed a new high-availability cluster setup for our virtual servers this year.
- Core services: a lot of infrastructure work has happened in the background to ensure smooth operation and seamless growth of our services in the future. Examples are: new Active Directory servers for our Windows users, migrating our web server certificates to Let’s Encrypt, a facelift for most of our websites to match the AEM design and an upgrade of our iPXE boot screen.
- IT security: we participate in and support the ETH-wide IT security initiative and also worked hard to make the mandated n.ethz password change as humane as possible.
I would like to take this opportunity to thank my whole team for their hard and dedicated work all year long.
Happy Holidays and see you in 2017!