The Informatikdienste are upgrading the routers in our HIT/D/13 server room. This will cause a network downtime of about one hour on Thursday morning, 15th September, between 05:30 and 07:30. Please note that various services will not be available during that time.
Posts Tagged ‘maintenance’
In order to perform some core service upgrades, we schedule a server maintenance window on
Monday, September 5, starting at 17:00 and lasting for approximately 3 hours.
Most D-PHYS IT services will be affected by that downtime, including logins, file servers and e-mail services.
E-mails coming in during the downtime will be held on the sender’s side and will arrive at D-PHYS with a delay. Sending e-mails won’t be possible during the window.
We’ll update this posting as soon as things are back to normal.
Update Monday 18:30 We managed to complete the migration ahead of time, everything should be back to normal. If you still encounter any problems, please let us know.
On Thursday, July 21, starting at 07:00, we will upgrade our groupware solution. The update is expected to take approximately one hour. As this upgrade will be a major one (version 14.3 to 16.1), you might notice a few changes. In particular:
- if you didn’t log out before we start the upgrade, your browser might be in an inconsistent state and the new version might refuse to work. Just log out manually in this case.
- if you’re using the ActiveSync/eSync/Exchange protocol for calendar synchronization, you will have to recreate your sync account on your (mobile) device. Just delete your existing account on the device (your groupware data will still be on our server) and add it again (see documentation). We’re sorry for the inconvenience, but the synchronization code was rewritten to be faster and more stable and is not compatible with existing accounts. We have tried to identify all current ActiveSync users and will inform them via email.
- the web interface will get a new look (see screenshot)
We have thoroughly tested the new version internally. If you find any problems after the upgrade, please get in touch. Thanks.
Update Thursday 08:00: Upgrade complete. Please let us know if you find any issues in the new version.
In order to perform system-level maintenance work, we schedule a maintenance downtime of the D-PHYS mail server today (Thursday, 2nd of July 2015) starting at 17:00. We expect the downtime to take about one hour.
During the downtime all mail services (sending mail, receiving mail, accessing mailboxes, webmail, etc.) will be unavailable. Mails sent to D-PHYS users during the downtime will be held back on the sending side and will be delivered after the downtime.
We will post an update as soon as mail services are back.
Update 19:30: Took a little bit longer than expected, but everything is back to normal now.
UPDATE 23:00 – maintenance finished, queued mails have been delivered.
As a probable aftermath of last week’s power outage we are experiencing some issues with the file system on our home directory server which can only be repaired offline. We therefore schedule a maintenance window
Today, Monday Feb 2, 2015, starting at 22:00
The duration of the downtime cannot be estimated but should not exceed two hours. During this time you will not be able to access your home folder or receive new D-PHYS email. All incoming mail will be queued for later processing.
Thank you for your understanding.
We have scheduled a software maintenance of the D-PHYS mail server for tomorrow, Wednesday, the 18th of June 2014, starting in the late afternoon around 5pm. A downtime of all D-PHYS mail services during the evening will be part of the maintenance. The downtime is expected to take approximately 15 to 30 minutes.
During the downtime, sending and receiving e-mails will not be possible and the web mail service will not be available. Mails arriving during the downtime will be delayed.
Additionally there will be a downtime of our “BackupPC” backup service for laptops and lab PCs due to server relocation on Thursday (19th of June 2014) starting around 9am.
On Thursday, the 9th of January 2014, starting in the late afternoon, we will run multiple software updates on the D-PHYS mail server. We do expect multiple downtimes throughout the evening, partially of single mail services, partially of the whole mail server.
This will likely also delay the delivery of incoming mails up to several hours.
Update, 22:30: Everything back to normal.
UPDATE Thu 12.09. 07:30 If you’re trying to connect to a SMB share from an unmanaged Windows machine, you have to use “ad\USERNAME” instead of just “USERNAME” from now on.
UPDATE 21:15 apart from the IGP group shares (which will be back in a few hours) all systems are back to normal. Please let us know if you experience any problems.
In order to upgrade the operating system on several core infrastructure servers of the Department, we schedule a general maintenance downtime on
Wednesday September 11, starting at 17:00, lasting for several hours.
Most services will be affected and unavailable during that time, as they require an authentication with your D-PHYS account (email, file server, print server, managed workstations). Note that, even though you will not be able to check your emails or send new ones, all incoming mails will be received and safely delivered to your inbox afterwards.
Please make sure to save all open documents before 17:00 on that day.
Since we will also change the way file server mounts are authenticated, users who haven’t updated their passwords in a very long time might not be able to mount their home directories or group shares after the migration. If you run into this problem on Thursday morning, please first change your password. If the issue persists, contact us.
We will post an update when things are back to normal.
Tuesday, January 29, starting at 07:15 am
there will be a short downtime of our LDAP server since we have to perform some maintenance work. You will not be able to log in or use our file services during this downtime. The expected duration is less than 15 minutes, however, so most of you won’t even notice.
Note: this is a purely anecdotal posting about our struggles with some performance bottlenecks in the last few months. If you’re not interested in such background information, just skip.
You might have noticed that since about January 2012 using our file and mail servers hasn’t been as smooth as usual. This posting will give you some background information concerning the challenges we encountered and why it took so long to fix them. Let’s begin with the file server.
Way back in the day (i.e. 5 years ago), when the total file server data volume at D-PHYS was about 10 TB, we used individual file servers to store this data. When one server was full, we got a bigger one, copied all the data and life was good for another year or two. Today, the file server data volume (home and group shares) is above 150 TB and growing fast, and this strategy no longer works: individual servers don’t scale and copying this amount of data alone takes weeks. That’s why in 2009 we started migrating the ‘many individual servers’ setup to a SAN architecture in which the file servers are just huge hard drives (iSCSI over Infiniband, for the technically inclined) connected to a frontend server that manages space allocation and the file system. The same is true for the backup infrastructure, where the data volume is even bigger.
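To put "takes weeks" into perspective, here is a rough back-of-the-envelope estimate. The sustained throughput of 100 MB/s is purely an assumption for illustration (this posting does not state actual transfer rates), and `copy_time_days` is a hypothetical helper name.

```python
def copy_time_days(volume_tb: float, throughput_mb_per_s: float) -> float:
    """Estimate the days needed to copy `volume_tb` terabytes at a sustained rate."""
    bytes_total = volume_tb * 1e12            # decimal terabytes -> bytes
    bytes_per_s = throughput_mb_per_s * 1e6   # decimal megabytes/s -> bytes/s
    seconds = bytes_total / bytes_per_s
    return seconds / 86_400                   # 86,400 seconds per day


# Assumed throughput: 100 MB/s sustained (hypothetical, not a measured value).
for volume_tb in (10, 150):
    days = copy_time_days(volume_tb, 100)
    print(f"{volume_tb:>4} TB at 100 MB/s: about {days:.1f} days")
```

Under that assumption, 10 TB copies in a bit over a day, while 150 TB needs roughly two and a half weeks of uninterrupted transfer.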
This new setup had to be developed, tested and put in place as seamlessly and unobtrusively as possible while ensuring data access at all times (apart from single hour-long migrations). The SAN architecture was implemented for Astro in December 2010 and has been running beautifully ever since. In 2011 we laid the groundwork to adopt this system for the rest of D-PHYS’s home and group shares and after a long and thorough testing period the rollout happened on January 5, 2012. Unfortunately, that’s when things got ugly.
At first, we noticed some exotic file access problems on 32-bit workstations. It took us some time to understand that the underlying issue was an incompatibility with the new filesystem, which uses 64-bit addresses for the data blocks. As a consequence, we had to replace the filesystem of the home shares. Independently, we ran into serious I/O issues with the installed operating system, so we had to upgrade the kernel of the frontend server and move the home directories onto a dedicated server. In parallel, we had to incorporate some huge chunks of group data while always making sure that nightly backups were available. All this necessitated a few more migrations until we finally achieved a stable system on March 28.
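This posting does not spell out the exact failure mode on the 32-bit workstations, but the general class of problem can be sketched. The example below assumes, purely for illustration, that a 32-bit client keeps a block address in an unsigned 32-bit field; the function name and the sample addresses are hypothetical.

```python
MAX_U32 = 2**32 - 1  # largest value an unsigned 32-bit field can hold


def as_seen_by_32bit_client(address: int) -> int:
    """Keep only the low 32 bits, as a naive 32-bit field would."""
    return address & MAX_U32


small = 123_456            # fits comfortably in 32 bits
large = 2**32 + 123_456    # needs more than 32 bits (hypothetical address)

for address in (small, large):
    seen = as_seen_by_32bit_client(address)
    status = "ok" if seen == address else "truncated"
    print(f"{address} -> {seen} ({status})")
```

The small address survives unchanged, while the large one silently loses its upper bits and ends up pointing somewhere else, which is the kind of mismatch that can surface as hard-to-explain file access errors.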
The upshot: what we had hoped to be a fast and easy migration turned out to cause a lot of problems and take much longer than anticipated, but now we have a stable and solid setup that will scale up to hundreds or even thousands of TB of data.
See live volume management and usage graphs for our file servers.
As for the mail server, its problems were partly related to the file server issues and partly just coincidental in timing. The IMAP server needs access to the home directories and hence also suffered when their performance was impaired. But even after we had solved the file server issues, we still saw individual load peaks on the IMAP server that prevented our users from working with their email. Again, we put a lot of time and effort into finding the reason. As of April 13, we’re back to good performance and have arrived at the following set of conclusions:
- a covertly faulty hard disk in the mail server RAID seems to have impaired performance
- CPU load of the individual virtual machines on the mail server was not distributed across the available CPU cores in an optimal way
General mail server load:
- while incoming mail volume doesn’t increase much, outgoing mails have grown 50% in the last year alone
- more and more sophisticated spam requires more thorough virus and spam scanning, increasing the load on the mail server
- our users have amassed 1.1 TB of mail storage (up from 400 GB in January 2010), which needs to be accessed and organized
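The storage figures above imply a steep growth rate. As a rough sketch, assuming about 27 months between January 2010 and this posting in April 2012, and assuming steady compound growth (both are assumptions), the annual rate can be estimated as follows; `annual_growth_rate` is a hypothetical helper name.

```python
def annual_growth_rate(start: float, end: float, months: float) -> float:
    """Compound annual growth rate, as a fraction, over a span of `months` months."""
    return (end / start) ** (12 / months) - 1


# 400 GB in January 2010 -> 1.1 TB (1100 GB) now; the 27 months are an assumption.
rate = annual_growth_rate(400, 1100, 27)
print(f"Mail storage grew by roughly {rate:.0%} per year")
```

Under these assumptions the mail store has been growing by somewhat more than half its size every year.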
We’d like to thank you for your patience during the last 4 months and apologize for any inconvenience you might have had to endure. In all likelihood the systems will be a lot more stable in the future, but of course we’re constantly working to ensure the D-PHYS IT infrastructure is able to keep up with the fast-growing demand for disk space (the data volume has tripled in the last year alone). We’ve learned a lot and we’ll put it to good use.