Posts Tagged ‘mail server’

Hardware failure on mail server

Thursday, August 27th, 2020

Update 09:45: the mail server could be recovered. It'll take some time for all queued incoming mail to be delivered.

Early this morning, a RAID controller in the D-PHYS mail server died. We're trying to get replacement parts later today. In the meantime, there's no access to email.
We apologize for the inconvenience.

Mail server maintenance on Tue, March 27

Friday, March 23rd, 2018

Update 07:25 The migration is complete and our mail server is back online. Please let us know if you notice anything peculiar. This concludes our multi-step migration to the new mail server hardware

---

In order to finalize the upgrade of the D-PHYS mail server, we schedule a maintenance downtime on

Tuesday, March 27, between 06:30 and 08:00 in the morning

During that time it will not be possible to send or receive emails. In particular, incoming external emails will not be lost, but held on the sender’s side and will be delivered after the migration. Outgoing mail will be kept in your mail client until the connection is restored.

We will update this posting once the mail server is back online.

New location for mail filtering rules, forwarding and vacation auto-replies

After the migration, all mail-related settings will be consolidated into the Roundcube Webmail interface:

  • spam filtering rules (whitelist, blacklist)
  • forwarding of your emails to a different account
  • setting a vacation or out-of-office auto-reply message
  • defining rules to automatically file incoming mails into specific folders

This will make configuring your email settings easier and also give you more options than before (for example, the out-of-office auto-reply can now be configured to automatically terminate at the end of your absence).

Please refer to our readme for details on how to customize these settings in the future. Feel free to contact us if you have any questions.

The current settings of all active users have been converted and imported.

In technical terms we are migrating from procmail to sieve. In particular the hidden text file ~/.procmailrc in the user's home folder will be ignored after the migration.

Mail server maintenance on Wed, Jan 24

Friday, January 19th, 2018

Update 07:25 Migration finished, welcome on the new mail server!

We schedule a maintenance downtime for the D-PHYS mail server on

Wednesday, January 24, between 07:00 and 08:00 in the morning

During this period, sending and receiving new emails will have interruptions, thereby delaying incoming and outgoing mails. In particular, incoming external emails will not be lost, but held on the sender's side and will be delivered after the migration. Outgoing mail will be kept in your mail client until the connection is restored. The IMAP server will not be affected, so all email clients should have continuous access to the existing mailboxes.

This maintenance window will be used to migrate the first part of our mail server infrastructure to the latest version of the operating system and new hardware with fast SSD storage.

New location for SpamAssassin user preferences

We re-designed how our mail server is parsing the user's configuration for the spam filtering. Currently one has to edit the hidden text file ~/.spamassassin/user_prefs in the home folder. Starting from next Wednesday the spam filtering rules can be edited more conveniently through the settings in the Webmail interface. This will allow users to easily

  • accept mail from a given sender and never mark it as spam (whitelist)
  • reject mail from a given sender and always mark it as spam (blacklist)
  • set the threshold score required for any message to be considered as spam

The existing user preferences have been parsed and all of the above settings have been imported into the new setup. The contents of ~/.spamassassin/ will be ignored after the migration. Please contact us if you have questions regarding your advanced SpamAssassin rules.

Hardware Maintenance Downtime of Mail Server Today 5pm

Thursday, July 2nd, 2015

In order to perform system-level maintenance work, we schedule a maintenance downtime of the D-PHYS mail server today (Thursday, 2nd of July 2015) starting at 17:00. We expect the downtime to take about one hour.

During the downtime all mail services (sending mail, receiving mail, accessing mailboxes, webmail, etc.) will be unavailable. Mails sent to D-PHYS users during the downtime will be held back on the sending side and will be delivered after the downtime.

We will post an update as soon as mail services are back.

Update 19:30: Took a little bit longer than expected, but everything is back to normal now.

Some incoming mails lost between Jan 9, 6pm and Jan 13, 11am

Tuesday, January 14th, 2014

On Monday morning we found out that large incoming mails (1 MBytes or larger) were dropped without leaving any error messages in our log files. These mails were lost between Thursday (Jan 9) evening 18:27 and Monday (Jan 13) morning 11:06. Some indicators (i.e. spam filter rules for this case) lead us to estimate the number of about 560 broken local deliveries to about 300 unique recipients.

If you expected e-mails with attachments close to 1 MB or larger within this time frame there is a high likelihood that they got lost. The only information we still have about these mails are sender, recipient and arrival date and time. If you were one of these recipients, please contact the sender to send it again.

You can check on this web page if mails you should have received were lost. You'll have to log in with your D-PHYS account and will see sender (or mailing list) of and time when the lost mail arrived. Additionally we'll inform all affected recipients individually, too.

The problem occured after one of the software updates on Thursday which brought stricter code checking, and is solved since Monday morning 11:06.

The issue was caused by a long standing and subtle programming error in the check which prevents bigger mails from being inspected closely by the main spam filter for performance reasons. It was only triggered upon local mail delivery, so mails sent from D-PHYS to outside D-PHYS were not affected. E-mails to D-PHYS mailing lists (or other mailing lists) with archive should be available in the according mailing list archives.

We're truly sorry for any inconvenience this may have caused and have already taken measures so that similar issues won't result in mail loss from now on.

Update: it happens to the best of us: Gmail for iOS bug might cause data loss

Short Mail, License and Condor Server Downtime on Thursday Evening

Wednesday, May 15th, 2013

We will perform a maintenance reboot of several infrastructure servers on Thursday, 16th of May 2013, starting around 5pm. This will cause a downtime of all e-mail related services (receiving + sending mail, accessing webmail and non-cached mails) as well as the license server for IDL, the control server for Condor grid computing, and most hosted virtual machines.

We expect the downtimes to be short and sequential.

Update 17:50: Everything went fine, we're back to normal.

The Art of Scaling

Thursday, April 19th, 2012

Note: this is a purely anecdotal posting about our struggles with some performance bottlenecks in the last few months. If you're not interested in such background information, just skip.

You might have noticed that since about January 2012 using our file and mail servers hasn't been as smooth as usual. This posting will give you some background information concerning the challenges we encountered and why it took so long to fix them. Let's begin with the file server.

Way back in the days (i.e. 5 years ago), when the total file server data volume at D-PHYS was about 10 TB, we used individual file server to store this data. When one server was full, we got a bigger one, copied all the data and life was good for another year or two. Today, the file server data volume (home and group shares) is above 150 TB and growing fast and this strategy doesn't work any longer: individual servers don't scale and copying this amount of data alone takes weeks. That's why in 2009 we started migrating the 'many individual servers' setup to a SAN architecture in which the file servers are just huge hard drives (iSCSI over Infiniband, for the technically inclined) connected to a frontend server that manages space allocation and the file system. The same is true for the backup infrastructure, where the data volume is even bigger.

This new setup had to be developed, tested and put in place as seamlessly and unobtrusively as possible while ensuring data access at all times (apart from single hour-long migrations). The SAN architecture was implemented for Astro in December 2010 and has been running beautifully ever since. In 2011 we laid the groundwork to adopt this system for the rest of D-PHYS's home and group shares and after a long and thorough testing period the rollout happened on January 5, 2012. Unfortunately, that's when things got ugly.

At first, we noticed some exotic file access problems on 32bit workstations. It took us some time to understand that the underlying issue was an incompatibility with the new filesystem using 64-bit addresses for the data blocks. As a consequence we had to replace the filesystem of the home shares. Independently we ran into serious I/O issues with the installed operating system, so we had to upgrade the kernel of the frontend server and move the home directories onto a dedicated server. In parallel, we had to incorporate some huge chunks of group data while always making sure that nightly backups were available. All this necessitated a few more migrations until we finally achieved a stable system on March 28.

The upshot: what we had hoped to be a fast and easy migration turned out to cause a lot of problems and take much longer than anticipated, but now we have a stable and solid setup that will scale up to hundreds or even thousands of TB of data.
See live volume management and usage graphs for our file servers.

As for the mail server, matters are to some extent related and partly just coincidental in time. The IMAP server does need access to the home directories and hence also suffered when their performance was impaired. But even after having solved the file server issues, we still saw single load peaks on the IMAP server that prevented our users from working with their email. Again, we put a lot of time and effort into finding the reason. As of April 13, we're back to good performance and arrive at the following set of conclusions:

Particular issues:

  • a covertly faulty harddisk in the mail server RAID seems to have impaired performance
  • CPU load of the individual virtual machines on the mail server was not distributed across the available CPU cores in an optimal way

General mail server load:

  • while incoming mail volume doesn't increase much, outgoing mails have grown 50% in the last year alone
  • more and more sophisticated spam requires more thorough virus and spam scanning, increasing the load on the mail server
  • our users have amassed 1.1 TB of mail storage (up from 400 GB in January 2010), which need to be accessed and organized

Bottom line:

We'd like to thank you for your patience during the last 4 months and apologize for any inconvenience you might have had to endure. In all likelihood the systems will be a lot more stable in the future, but of course we're constantly working to ensure the D-PHYS IT infrastructure is able to keep up with the fast growing demand of disk space (the data volume has tripled in the last year alone). We've learned a lot and we'll put it to good use.