April, 2011


22 Apr 2011

Amazon EC2 Outage (2011/04/21)

A day to remember!

It was an interesting day. At around 8:40 (GMT) we started getting notifications from our monitoring tools that our database cluster was failing over, one node at a time.

Normally this alone wouldn’t be a problem. We host on Amazon EC2, so new nodes should automatically fire up and join the cluster, keeping it alive.

But this was no “normal” day. New nodes didn’t fire up, and the cluster did go down.

About Amazon EC2

It’s safe to say that Amazon EC2 is an amazing implementation of cloud computing. The mechanisms it has in place to allow companies like ours to scale and stay up are the best I have seen in my 12 years of working in and running internet companies.

Amazon allows us to create “instances”: virtual machines (VMs) running our own software stack. We can create as many of them as we need and switch them on and off again at will.

Amazon also has a storage solution called EBS (Elastic Block Store), which allows us to provision virtual disks that can be attached to and detached from instances at will.

For the last four years this system has given us ~99.9% uptime and has allowed us to scale to meet our customers’ demands.
