AWS after 3 years

Posted: 24/10/2011 in System Administration
Tags: , , ,

Hi All,

Following the idea in a previous report about using AWS, this is a report after 3 years.

Differences? No, my idea is not changed: a cloud solution is better than a physical server.

The main event/problem happen during these last three years was the August 2011 power outage. During the blackout, a single zone was affected with a real downtown around 48 hours. The recovery of the zone was completed in more than a week.

At first time, I thought that it was a very long time, but after some reflections, I understood that this kind of problems reach the limit of our technology. I remember some years ago when I worked on a physical server in a farm in Miami. During a night in the building, a switch started to burn with all consequences you can imagine. The farm was down for about 24 hours.

You can follow all best practice you know but there’s always something unpredictable who breaks your eggs in the basket. So what is important? Have a plan B to recover everything.

An example on a physical server you usually use a backup where your usually store the application data, only in few situations you backup all files of a server. So restore a server means to reinstall the OS, all user applications and reconfigure every service. This may takes a lot of time.

On AWS, this is very fast: usually you have static copy of your server called AMI, who is different from the running copy of your server called Instance. They are stored in two different stores located in two different zones, one in the store of AWS (S3) and the other on the physical server. Creating an Instance from S3 takes less than 10 minutes, a big advantage respect to reinstall a server. Creating an AMI is easy too using an Instance, but in this case the time is a little bit longer. What is really important is that you have all instruments to do that, and you have not to buy a specific software or a special DAT resource.

Backing to AWS August disaster, a zone was affected, putting down some all services but not completely: in my case I lost the raid data disk but not the instance, others lost their instances. All restore actions takes less than two hours, mainly spent moving files from snapshots to volumes and restore the data.fs, plone/zope store.

At the end, AWS confirms my ideas, the infrastructure is very powerful, and it requires less administration time compare a physical server.

Cheers.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s