A brief analysis of the ongoing anuhosting.net outage

While we wait and watch the painfully slow process of restoring our anuhosting.net shared hosting and reseller server, let me write a brief analysis of what happened and what’s going on.

Last year, we spent almost £20,000 updating our server and network hardware in our Amsterdam 1 datacenter, to accommodate continued growth in shared hosting and reseller hosting services. We put in 10Gbps Ethernet, new 16-core/256GB RAM servers and pure SSD RAID storage arrays that are over 10 times faster than spinning disks.

We planned (and still plan) to keep investing in our physical infrastructure this year, including adding high-speed, replicated, redundant network-attached storage arrays and new backup servers.

The server that failed this morning was just 8 months old. At this point we can only assume that we received 2 SSDs from a bad batch and that they failed at almost exactly the same time, leaving no time for the RAID to rebuild.

The storage for our anuhosting.net server was on the RAID array that failed, and was backed up daily to a separate backup server. Those incremental backups run over 1Gbps Ethernet, and run at night in the background.

The problem we are facing today is that the sheer volume of data needed to restore the server is taking many hours to copy from the backup server’s spinning SATA disks over 1Gbps Ethernet. Our (woefully inadequate) plan for a disaster like this was to take a fresh copy of all the data, put it on a new server and fire it up. We do this sort of thing regularly for routine jobs like cloning servers or testing, but what we failed to account for in this case is how long it would take to copy all the data.
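To put the copy time in perspective, here is a rough back-of-the-envelope sketch in Python. The data sizes and the efficiency factor are hypothetical placeholders, not measurements from our servers; only the 1Gbps link speed comes from the setup described above.

```python
def transfer_hours(data_tb: float, link_gbps: float = 1.0, efficiency: float = 1.0) -> float:
    """Estimate the hours needed to copy data_tb terabytes over a network link.

    data_tb    -- data volume in decimal terabytes (hypothetical input)
    link_gbps  -- raw link speed in gigabits per second
    efficiency -- fraction of the raw link speed actually achieved (0 < efficiency <= 1);
                  an assumed fudge factor for disk and protocol overhead
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    bits_to_copy = data_tb * 1e12 * 8              # terabytes -> bits
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_copy / bits_per_second / 3600   # seconds -> hours


if __name__ == "__main__":
    # Hypothetical volumes, for illustration only.
    for size_tb in (1, 2, 5):
        ideal = transfer_hours(size_tb)
        degraded = transfer_hours(size_tb, efficiency=0.5)
        print(f"{size_tb} TB: {ideal:.1f} h at full line rate, {degraded:.1f} h at half rate")
```

Even at a perfect 1Gbps line rate, every terabyte takes a little over two hours, and spinning disks that are busy serving other requests will rarely sustain that rate.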

Plan B was to spin up a new virtual machine and connect it directly to the data stored on the backup server, instead of copying all the data. The theory was that we could at least get sites back online, even if they ran somewhat slowly. Around midday today, we made the call to implement Plan B. Again, though, we failed to account for the I/O bandwidth required to run anuhosting.net, and the server crashed due to insufficient I/O almost as soon as we booted it up.

So we came up with Plan C: continue copying the data from the backup server in the background, and start minimal services on anuhosting.net so we could at least restore some functionality.

That’s where we are now: DNS and email are running, while the restore process continues in the background (albeit at a slower pace, due to the increased I/O load).

Once the restore process completes, we will temporarily shut down DNS and email services again while we synchronise the latest changed data to the new server, and boot up from local storage. At this point we will be able to start up the Apache/PHP/MySQL servers again, as well as DNS and email.

We don’t know exactly how long this will take; our best guess is about 4 hours.

We know where we went wrong, and we know what needs to be done to fix our infrastructure going forward. All we can ask is for continued patience and understanding from our customers while we keep working to restore service. Be assured we are working as fast as we possibly can.

Scheduled replacement of SpamTitan server

As part of our ongoing commitment to replace end of life hardware with ever faster and more reliable equipment, the time has come to replace our SpamTitan server.

After processing over 135.5 million incoming emails, filtering out 80.4% and passing 19.6% as clean, it’s time to retire our SpamTitan hardware appliance and replace it with a newer, faster virtualised SpamTitan.

The upgrade is scheduled for 2016-02-19 at 17:00 GMT.

We expect the interruption to incoming email processing to last no more than 10 minutes, during which time all existing relay settings, domain and user profiles will be transferred to the new server. However, existing quarantined mail will not be transferred, so if you require access to previously quarantined mail after 17:00 GMT on the 19th, you’ll need to log in to https://oldspamtitan.anu.net/ instead of https://spamtitan.anu.net/

If you have any questions please do not hesitate to contact us.

Happy holidays

We would like to take this opportunity to wish all our customers a Merry Christmas (if Christmas is something you celebrate) and Happy New Year (for users of the Gregorian calendar).

We’ll be manning our support desk as usual over the holiday period, and keeping an extra watchful eye on our monitoring systems to make sure operations continue to run smoothly.