Summary of today, 12th June 2011:
Today was all about storage. There are a few options in an EC2 instance, and it is important to understand the consequences of each. Elastic Block Store (EBS) has been very straightforward, but I needed to learn more about instance storage, commonly called ‘ephemeral storage’.
Ephemeral storage, I believe, is disk-based storage on the host (that’s good enough for me anyway). This storage does not persist across instance stops or terminations, but it does persist across reboots (neat). Most instance types are allocated storage capacity ranging from 1.6 GB to 1.7 TB or so, but the default Amazon AMIs map no drives to that storage, and I don’t think drives can be added to a running instance. Ephemeral storage is attractive because it is free: it is included in the price of the server, which makes it a good option given the right application. Not only is the storage free, so is the transfer/IO. For EBS, Amazon charges a nominal fee for IO requests ($0.10 per 1 million, but that will add up quickly for an enterprise), as well as for storage based on size used.
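To get a feel for how the IO charge adds up, here is a rough back-of-the-envelope sketch in Python. It uses only the $0.10 per 1 million requests figure quoted above; the workload (1,000 requests per second, sustained for 30 days) is a made-up assumption for illustration, and the storage charge is left out.

```python
from decimal import Decimal

# Price quoted in this post: $0.10 per 1 million EBS IO requests.
PRICE_PER_MILLION_IO = Decimal("0.10")


def ebs_io_cost(requests_per_second, days):
    """Estimated IO charge in dollars for a sustained request rate."""
    total_requests = requests_per_second * 60 * 60 * 24 * days
    return Decimal(total_requests) / Decimal(1_000_000) * PRICE_PER_MILLION_IO


# Hypothetical workload: 1,000 requests/second sustained for 30 days.
monthly = ebs_io_cost(1000, 30)
print(f"${monthly:.2f}")  # prints $259.20
```

The same workload on ephemeral storage would incur no IO charge at all, which is the attraction.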
Some use cases covered…
1) Add ephemeral storage to my instance. Start an instance from an AMI that has only root (C:\), with ephemeral drives 0 and 1 mapped. You can’t control the drive letter, as far as I know. Also of note: the device names and options differ depending on the server class (micro, small, large, etc.). For example, the t1.micro doesn’t even offer ephemeral storage, while the m1.xlarge (extra large) offers 4 ephemeral drives.
2) Start a new instance, but use existing data. Suppose a server dies but its data is okay; I am looking at ways of returning to service. So for this use case, I wanted to use an existing volume of data (one that was previously attached to a running instance).
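For reference, the block device mappings for this use case look roughly like the sketch below. I did this with the Amazon SDK for .NET; the Python here is only an illustration of the request shape (the `DeviceName` and `VirtualName` fields of the EC2 RunInstances `BlockDeviceMappings` parameter). The helper name and the device names `xvdb` and `xvdc` are assumptions; as noted above, valid device names depend on the image and server class.

```python
def ephemeral_mappings(device_names):
    """Pair each device name with the next instance-store volume
    (ephemeral0, ephemeral1, ...) in RunInstances mapping form."""
    return [
        {"DeviceName": device, "VirtualName": f"ephemeral{index}"}
        for index, device in enumerate(device_names)
    ]


# Assumed device names; check what your AMI and instance type accept.
mappings = ephemeral_mappings(["xvdb", "xvdc"])
for mapping in mappings:
    print(mapping)
```

The resulting list is what gets handed to the launch call as its block device mappings.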
First variation was to use a snapshot. I was able to launch a new instance and map the block devices directly to snapshots using the Amazon SDK. Each snapshot is of a single EBS volume. By default, starting a new instance creates new EBS volumes with the same configuration as the image. This worked flawlessly and was very straightforward.
Second variation was to use an existing volume. With a few API calls, one can attach existing volumes to a stopped instance. If the volume is already attached to another instance, it must be detached first or an error is returned.
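A minimal sketch of the snapshot variation, again in Python purely to show the shape of the mapping (I used the .NET SDK). The `Ebs.SnapshotId` field is part of the EC2 RunInstances block device mapping; the helper name, the device name, and the snapshot ID are placeholders, not values from my account.

```python
def snapshot_mapping(device_name, snapshot_id):
    """Block device mapping that creates a new EBS volume from a snapshot
    and exposes it to the instance at device_name."""
    return {
        "DeviceName": device_name,
        "Ebs": {"SnapshotId": snapshot_id},
    }


# Placeholder device name and snapshot ID, for illustration only.
data_drive = snapshot_mapping("xvdf", "snap-EXAMPLE")
print(data_drive)
```

Each mapping covers one snapshot, so a server with several data volumes needs one entry per snapshot.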
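The detach-then-attach sequence can be sketched as below. This is an illustration under assumptions, not the code I ran: it is written against an EC2 client object with boto3-style method names (`describe_volumes`, `detach_volume`, `attach_volume`, and the `volume_available` waiter), and the function name `move_volume` is made up.

```python
def move_volume(ec2, volume_id, instance_id, device):
    """Attach volume_id to instance_id, detaching it first if it is
    currently attached to a different instance."""
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    attachments = volume.get("Attachments", [])

    # Nothing to do if the volume is already on the target instance.
    if any(a["InstanceId"] == instance_id for a in attachments):
        return "already-attached"

    # Attaching a volume that is still in use elsewhere returns an error,
    # so detach it and wait until it is available again.
    if attachments:
        ec2.detach_volume(VolumeId=volume_id)
        ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return "attached"
```

With boto3 installed and credentials configured, the `ec2` argument would be something like `boto3.client("ec2")`; the volume and the instance must be in the same availability zone.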
3) Experience how ephemeral storage works with server reboots, stops and starts, and new images. I launched an instance via the Amazon SDK, giving it 2 ephemeral drives via Block Device Mappings (this added 2 drives of instance storage to an instance made from an image that had zero ephemeral drives mapped). They worked great, by the way. Then I stopped and started the instance. Result: the data on the drives (G, H) was gone, but the drive mapping persisted across the stop. The documentation says the data will survive a reboot, however. I then created an image from this instance to find out first hand what happens to ephemeral storage. After a few minutes, I was able to launch an instance from this new image and check whether the drives were there. It was the same result: the data is gone, but the drive mapping persists across image creation. This was for an instance of the same type that was running when the image was created. Curious, I launched a micro, which doesn’t offer ephemeral storage. As suspected, the ‘G’ and ‘H’ drives were not there. I suspect it is a similar story for an image made from an xxl type with 4 drives mapped when it is launched as a Large type, or any other type with only 2 drives available.
4) Determine approach for managing this infrastructure. While the AWS console is a welcome feature, it does not support all of the AWS features. A console alone is not appropriate for managing an infrastructure; there are too many tasks that need to be automated, some requiring dynamic configuration. A powerful infrastructure like Amazon AWS needs a powerful set of command line tools. I am looking at several options here, and there are many routes one could go, such as:
- boto (one-man show, Python, read about it in a book; not interested, as there are tools closer to home)
- Amazon EC2 command line tools. These look rich and full-featured, and they are produced by Amazon, but they require a Java VM and a touch of configuration on Windows. Still attractive, however.
- PowerShell to script classes in the Amazon SDK for .NET. I did a quick sample that lists the buckets for my S3 account, and it worked the first time, which is always a good sign.
All this is leading up to the big question of managing an enterprise-class SQL Server in the cloud. Backup and recovery forthcoming… Some helpful refs on that topic: http://www.brentozar.com/archive/2008/10/running-sql-server-2005-on-amazon-ec2/
http://friism.com/ec2-sql-server-backup-strategies-and-tactics