[Resolved] ams2-cloudmin.anu.net crash

The Cloudmin VM management system for our Amsterdam 2 datacenter crashed last night (11th October). The crash was caused by a bug in the VM status collection system, which consumed excessive resources until the server ran out of memory.

We have resolved this issue by restarting the Cloudmin server. We have also applied an unofficial patch for this known bug, which will stay in place until the next maintenance release of Cloudmin fixes it.

Services affected: DNS resolution for the ams2-cloudmin.anu.net zone, Web GUI management of VMs in the Amsterdam 2 datacenter, and API/customer portal management of VMs in the Amsterdam 2 datacenter.

[Resolved] Partial outage on Xen cloud in Amsterdam 1 datacenter

A power distribution unit failure at our Amsterdam 1 datacenter took out about a quarter of our Xen hosts there this morning. Engineers are en route to replace the failed unit.

In the meantime we have rebooted all affected virtual servers using spare capacity on the remaining Xen hosts. No data has been lost, as our redundant centralised storage servers were not affected.

Our customer portal, the shared.anu.net Lasso/PHP hosting server and a handful of customer VMs were briefly affected by the outage, but all have now been restored.

SpamTitan TLS negotiation issue

It came to our attention this morning that email from Gmail accounts was not getting through to our Hosted Email customers. After running every test we could think of, we discovered that Gmail’s servers were failing TLS negotiation with SpamTitan and were not falling back to unencrypted SMTP. We have temporarily disabled TLS support on our SpamTitan server and mail is now coming through again. Any messages sent from Gmail since we restored SpamTitan yesterday morning will be delivered to your inboxes shortly.

This issue may also have affected some other email providers, though Gmail is the only major provider we are aware of that was behaving this way.
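
For anyone curious what "falling back to unencrypted SMTP" means in practice, here is a minimal Python sketch of a sending client that tries STARTTLS and reconnects in plaintext if the handshake fails. It is an illustration only, not how Gmail or SpamTitan are implemented; the function name open_smtp_session and the smtp_factory parameter are hypothetical names made up for this example.

```python
import smtplib
import ssl


def open_smtp_session(host, port=25, timeout=30, smtp_factory=smtplib.SMTP):
    """Open an SMTP session, preferring TLS but falling back to plaintext.

    Returns a (client, used_tls) tuple. smtp_factory is a hypothetical
    hook so the connection class can be swapped out for testing.
    """
    client = smtp_factory(host, port, timeout=timeout)
    client.ehlo()

    # If the server does not advertise STARTTLS, stay in plaintext.
    if not client.has_extn("starttls"):
        return client, False

    try:
        # Assumption: the default context verifies the server certificate.
        # A real mail server may use a more lenient policy.
        client.starttls(context=ssl.create_default_context())
        client.ehlo()  # extensions must be re-read after the TLS upgrade
        return client, True
    except (ssl.SSLError, smtplib.SMTPException, OSError):
        # After a failed handshake the connection is unusable, so drop it
        # and start a fresh plaintext session instead of giving up.
        try:
            client.close()
        except OSError:
            pass
        client = smtp_factory(host, port, timeout=timeout)
        client.ehlo()
        return client, False
```

A sender that skips the except branch simply gives up when the handshake fails, which matches the symptom described above: the mail is not delivered until TLS is fixed or disabled on the receiving side.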