The Amazon folks went public today with the next EC2 feature: persistent storage. The official information is in Jeff Barr's blog entry and in Matt's forum post. Calling persistent storage a feature is an understatement: it revolutionizes EC2 and enables usage patterns that any big-iron SAN user would die for.
What does this persistent storage look like? We've been testing it for a while and are thoroughly impressed. The Amazon folks are clearly still fine-tuning a lot of the details, but the basic idea is that you can create storage volumes in the cloud next to the server instances you launch there. Think of having a really big SAN in the cloud in which you can create volumes of up to 1TB each with a single API call, or with a simple click in the RightScale UI (we support storage volumes on our site, coupled with some neat automation and an array of prepackaged solutions). You can mount one or more volumes on an instance and they appear just like the other local drives, so you can format them as you like, set up striping, and do other useful things.
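To make the 1TB-per-volume limit and the striping idea concrete, here is a small planning sketch. It is only an illustration, not RightScale's or Amazon's tooling: the function name `plan_striped_volumes` is hypothetical, and it assumes that "1TB" means 1024 GB and that a stripe set wants equally sized members.

```python
import math

# Assumption: the "up to 1TB" per-volume limit is treated as 1024 GB.
MAX_VOLUME_GB = 1024


def plan_striped_volumes(total_gb, max_volume_gb=MAX_VOLUME_GB):
    """Split a capacity target into equally sized volumes for striping.

    Returns a list of volume sizes in GB. Sizes are rounded up, so the
    total may slightly exceed the request.
    """
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    # Fewest volumes that can hold the data without exceeding the limit.
    count = math.ceil(total_gb / max_volume_gb)
    # Equal members keep the stripe set balanced.
    size = math.ceil(total_gb / count)
    return [size] * count
```

A request that fits in one volume yields a single-element list, while anything larger is spread over the smallest number of equal volumes that stay under the limit.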
The feature that really makes the storage volumes sizzle is the ability to snapshot them to S3 and then create new volumes from the snapshots. The snapshots are great for durability. Once a snapshot is taken, it is stored in S3 with all the reliability attributes of S3, such as redundant storage in multiple availability zones. This essentially solves the whole backup issue with one simple API call or click in the RightScale UI. You can also easily restore a snapshot by creating a fresh volume from it. Restoring is useful beyond recovering from a backup: you can restore onto another instance, where you then have a clone of the data and can do whatever you want with it.
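The snapshot-then-restore cycle described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the way RightScale drives the API: the helper names are hypothetical, and the `ec2` argument is assumed to be an object shaped like the EC2 client in the boto3 Python SDK (for example `boto3.client("ec2")`), whose `create_snapshot`, `create_volume`, and `get_waiter` calls are used here.

```python
def snapshot_volume(ec2, volume_id, description=""):
    """Start a snapshot of a volume and return the snapshot ID."""
    snapshot = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    return snapshot["SnapshotId"]


def clone_from_snapshot(ec2, snapshot_id, zone):
    """Create a fresh volume from a snapshot in the given availability zone."""
    # Block until the snapshot has finished before restoring from it.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    # Without an explicit Size, the new volume takes the snapshot's size.
    volume = ec2.create_volume(AvailabilityZone=zone, SnapshotId=snapshot_id)
    return volume["VolumeId"]
```

Passing the client in, rather than creating it inside the helpers, keeps the sketch easy to exercise against a stub before pointing it at a real account.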
The Cool Stuff
There are so many great uses for the storage volumes that it's impossible to write them all up in a single blog post, and we obviously haven't thought of them all. The first usage scenario we looked into is running a database. Until today, the only setup we recommended for a mission-critical database was two instances with real-time database replication and frequent backups to S3. We've installed our Manager for MySQL replicated setup for many customers and it works well. We use MySQL replication for redundancy, plus frequent backups to S3 from the slave (every 10 minutes or so) to guard against the unlikely event that both instances, located in different availability zones, fail simultaneously.
With storage volumes, the Manager for MySQL setup works even better. Instead of having to tar up the database files and upload them to S3, we can just take a snapshot. To initialize a slave, we simply create a volume for it from the latest snapshot of the master and start replication; no rsync of the data is needed. It's nice to see how all the automation we've built stays in place with the new Amazon capabilities and saves just as many headaches as before. It just gets turbocharged by the storage volumes.
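Picking "the last snapshot of the master" is the one step here that needs a little logic. The sketch below is a guess at how that selection could be done, not the Manager for MySQL implementation: `latest_completed_snapshot` is a hypothetical helper, and it assumes snapshot records shaped like the entries boto3's `describe_snapshots` returns (dicts with `SnapshotId`, `VolumeId`, `State`, and `StartTime` keys).

```python
def latest_completed_snapshot(snapshots, volume_id):
    """Return the ID of the newest completed snapshot of a volume, or None.

    `snapshots` is an iterable of dicts with "SnapshotId", "VolumeId",
    "State", and "StartTime" keys (assumed record shape).
    """
    candidates = [
        s for s in snapshots
        # Skip other volumes and snapshots that are pending or failed.
        if s["VolumeId"] == volume_id and s["State"] == "completed"
    ]
    if not candidates:
        return None
    # The most recent start time wins.
    return max(candidates, key=lambda s: s["StartTime"])["SnapshotId"]
```

The resulting snapshot ID is what a new slave's volume would be created from before replication is started.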
In addition, the storage volumes enable slightly lower-end database offerings. Since the storage volumes are more durable than local instance storage, much of the risk of losing everything when an instance dies goes away. It is now possible to run a single instance with the database data living on a storage volume and to take frequent snapshots to get backups onto S3. Should the instance die, it is simple to launch a fresh one using the same storage volume. Typically it would take only a few minutes for the new instance to come up and pick up where the old one left off. Of course, this setup has more downtime than the redundant database setup, and one has to set everything up carefully to minimize the time it takes to mount the volume and to ensure a successful database recovery.
Just as the storage volumes enable the reliable use of single-instance databases, they also enable single-tenant appliances in EC2. It is now possible to host the data for a single-tenant virtual appliance on a storage volume and mount it on an instance. Decoupling the data from the instance means that you can start a customer on a small instance and, if they outgrow it, migrate them almost seamlessly to a large and later an x-large instance, all using the same storage volume. Beyond an x-large, a couple of interesting options can increase performance further, such as striping multiple storage volumes. EC2 brings virtual appliances to the next level.
The S3 snapshots enable some different and intriguing usage scenarios. Suppose you're doing some DNA matching against a genome data set on 1,000 instances. In addition to firing up 1,000 instances on a whim, you can, also on a whim, clone a nicely prepared snapshot of the data set 1,000 times to create 1,000 volumes, one for each instance, so they can all independently crawl over the data set. This type of massive (essentially read-only) cloning really opens up new possibilities in running such large computations in a cost-effective manner.
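The fan-out described above could be sketched as a simple loop. As before, this is an illustration rather than a tested recipe: `clone_snapshot_to_instances` is a hypothetical helper, the `ec2` argument is assumed to behave like the boto3 EC2 client (`create_volume`, `attach_volume`, and the `volume_available` waiter), and the device name `/dev/sdf` is an arbitrary choice. A real run at this scale would also have to deal with API rate limits and retries, which the sketch ignores.

```python
def clone_snapshot_to_instances(ec2, snapshot_id, instance_zones,
                                device="/dev/sdf"):
    """Give every instance its own volume cloned from one snapshot.

    `instance_zones` maps instance ID -> availability zone; a volume has
    to live in the same zone as the instance it attaches to.
    Returns a dict mapping instance ID -> new volume ID.
    """
    volumes = {}
    # Kick off all the clones first so they are created concurrently.
    for instance_id, zone in instance_zones.items():
        volume = ec2.create_volume(AvailabilityZone=zone,
                                   SnapshotId=snapshot_id)
        volumes[instance_id] = volume["VolumeId"]

    waiter = ec2.get_waiter("volume_available")
    for instance_id, volume_id in volumes.items():
        # A volume can only be attached once it is available.
        waiter.wait(VolumeIds=[volume_id])
        ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id,
                          Device=device)
    return volumes
```

Each instance then mounts its own copy and can crawl the data set without contending with the others.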
Summing It Up
I'll stop here, but clearly the cloud has just squared in size. Two years ago, when I started on EC2, there were only small instances available and the sentiment was that in order to get the horizontal scalability and pricing of the cloud you had to accept inferior features. Since then we've gotten multiple instance sizes and, more recently, remappable IP addresses and availability zones. That already indicated that computing in the cloud would soon surpass computing in traditional colos or in your own data center, not just in scale and price but also in feature set. With the addition of the storage volumes and all the cool snapshot features, it's now a fait accompli: cloud adopters will have much more computing horsepower and flexibility at their fingertips than those who are still racking their own machines. It's going to be like agile software development: if you want to survive as an Internet/web service, you will have to compute in the cloud, or your competitors will leave you in the dust by being able to deploy faster, better, and cheaper.
Update: Werner Vogels, Amazon's CTO, also blogs about the storage volumes on all-things-distributed with a little more background perspective. The Amazon folks are getting pretty coordinated, with news appearing at the same time on their blogs and the forums. Maybe I missed it, but I don't think they even issue press releases for this stuff.