RightScale Blog

Cloud Management Blog

Amazon EC2 Reaches Next Level with Persistent Storage Volumes

The Amazon folks have gone public today with the next EC2 feature: persistent storage. The official information is found in Jeff Barr's blog entry and in Matt's forum post. Calling persistent storage a feature is an understatement - it revolutionizes EC2 and enables usage patterns that any big-iron SAN user would die for.

The Basics

What does this persistent storage look like? We've been testing it for a while and are thoroughly impressed. The Amazon folks are clearly still fine-tuning a lot of the details, but basically you can create storage volumes in the cloud next to the server instances you launch in the cloud. Think of having a really big SAN in the cloud in which you can create volumes of up to 1TB each with a single API call, or with a simple click in the RightScale UI (we support storage volumes on our site coupled with some neat automation and an array of prepackaged solutions). You can mount one or multiple volumes on an instance and they appear just like the other local drives, so you can format them as you like, set up striping, and do other useful things.

The feature that really makes the storage volumes sizzle is the ability to snapshot them to S3 and then create new volumes from the snapshots. The snapshots are great for durability. Once a snapshot is taken it is stored in S3 with all the reliability attributes of S3, such as redundant storage in multiple availability zones. This essentially solves the whole backup issue with one simple API call or click in the RightScale UI. You can also easily restore a snapshot by creating a fresh volume from it. This feature is useful beyond just restoring a backup; you may restore to another instance where you now have a clone of the data and can do whatever you want to it.
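To make the snapshot-and-restore cycle concrete, here is a minimal Python sketch. It is an illustration only and assumes a present-day boto3-style EC2 client, which is not what this post was written against: `ec2` is any object exposing `create_snapshot`, `get_waiter`, and `create_volume` with boto3's keyword arguments, and the function name and description string are made up for the example.

```python
def snapshot_and_restore(ec2, volume_id, zone):
    """Snapshot a volume, then create a fresh volume from that snapshot.

    Assumption: `ec2` behaves like a boto3 EC2 client, for example
    boto3.client("ec2"). The helper name is hypothetical.
    """
    # One API call starts the snapshot; the data ends up in S3.
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description="backup of " + volume_id,
    )
    snapshot_id = snapshot["SnapshotId"]

    # Block until the snapshot is complete before restoring from it.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # Restoring is simply creating a new volume from the snapshot.
    volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
    return snapshot_id, volume["VolumeId"]
```

With a real client this would be called as something like `snapshot_and_restore(boto3.client("ec2"), "vol-...", "us-east-1a")`, where the volume ID and zone are placeholders.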

The Cool Stuff

There are so many great uses for the storage volumes that it's impossible to write them all up in a single blog post, and we obviously haven't thought of them all. The first usage scenario we looked into is running a database. Until today, the only setup we recommended for a mission-critical database was two instances with real-time database replication and frequent backups to S3. We've installed our Manager for MySQL replicated setup for many customers and it works well. We use MySQL replication for redundancy, plus frequent backups (roughly every 10 minutes) from the slave to S3 to guard against the unlikely event of simultaneous failure of both instances located in different availability zones.

With storage volumes, the Manager for MySQL setup works even better. Instead of having to tar up the database files and upload them to S3, we can just take a snapshot. And in order to initialize a slave we simply create a volume for it from the last snapshot of the master and launch the replication: no more rsync of the data is necessary. It's nice to see how all the automation we've built stays in place with the new Amazon capabilities and saves just as many headaches as before. It just gets turbocharged by the storage volumes.
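The slave-initialization step might look roughly like the following Python sketch. This is not the Manager for MySQL automation itself; it assumes a present-day boto3-style EC2 client passed in as `ec2`, and the function name is hypothetical. It looks up the newest completed snapshot of the master's volume and creates the slave's volume from it.

```python
def create_slave_volume(ec2, master_volume_id, slave_zone):
    """Create a volume for a new slave from the master's newest snapshot.

    Assumption: `ec2` behaves like a boto3 EC2 client. Pagination of
    describe_snapshots is ignored to keep the sketch short.
    """
    response = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[
            {"Name": "volume-id", "Values": [master_volume_id]},
            {"Name": "status", "Values": ["completed"]},
        ],
    )
    snapshots = response["Snapshots"]
    if not snapshots:
        raise LookupError("no completed snapshot of " + master_volume_id)

    # StartTime is a datetime in boto3 responses, so max() picks the newest.
    latest = max(snapshots, key=lambda s: s["StartTime"])

    # The slave's data volume starts out as a copy of the master's data.
    volume = ec2.create_volume(
        SnapshotId=latest["SnapshotId"],
        AvailabilityZone=slave_zone,
    )
    return latest["SnapshotId"], volume["VolumeId"]
```

After the volume is attached and mounted on the slave, replication would be started from the position recorded at snapshot time; that part depends on the database setup and is left out here.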

In addition, the storage volumes enable slightly lower-end database offerings. Since the storage volumes are more durable than local instance storage, much of the risk of losing everything when an instance dies goes away. It is now possible to run a single instance with the database data living on a storage volume and to take frequent snapshots to get backups onto S3. Should the instance die, it is simple to launch a fresh one using the same storage volume. Typically it would take only a few minutes for the new instance to come up and pick up where the old one left off. Of course this setup has more downtime than the redundant database setup, and one has to be careful in setting everything up to minimize the time it takes to mount the volume and to ensure a successful database recovery.
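The recovery step, moving the surviving volume onto a freshly launched instance, could be sketched as below. This is an assumption-laden illustration: `ec2` is taken to be a present-day boto3-style EC2 client, and the function name and the default device `/dev/sdf` are made up for the example. The zone check reflects the fact that a volume can only be attached to an instance in the same availability zone.

```python
def reattach_volume(ec2, volume_id, new_instance_id, device="/dev/sdf"):
    """Attach an existing volume to a replacement instance.

    Assumption: `ec2` behaves like a boto3 EC2 client. The device name
    is a placeholder; pick whatever the instance's OS expects.
    """
    volume = ec2.describe_volumes(VolumeIds=[volume_id])["Volumes"][0]
    reservations = ec2.describe_instances(InstanceIds=[new_instance_id])
    instance = reservations["Reservations"][0]["Instances"][0]

    # Volumes can only be attached within their own availability zone.
    volume_zone = volume["AvailabilityZone"]
    instance_zone = instance["Placement"]["AvailabilityZone"]
    if volume_zone != instance_zone:
        raise ValueError(
            "volume is in %s but instance is in %s" % (volume_zone, instance_zone)
        )

    # If the dead instance still holds the volume, release it first.
    # A forced detach is only reasonable when the old instance is gone.
    if volume["State"] == "in-use":
        ec2.detach_volume(VolumeId=volume_id, Force=True)
        ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(
        VolumeId=volume_id, InstanceId=new_instance_id, Device=device
    )
    return device
```

Mounting the filesystem and running database recovery happen on the instance itself and are not covered by this sketch.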

Just as the storage volumes enable the reliable use of single-instance databases, they also enable single-tenant appliances in EC2. It is now possible to host the data for a single-tenant virtual appliance on a storage volume and mount it on an instance. Decoupling the data from the instance means that you can start a customer on a small instance and if they outgrow it, you can migrate them almost seamlessly to a large and later an x-large instance, all using the same storage volume. Beyond an x-large a couple of interesting options are possible to increase performance further, such as striping multiple storage volumes. EC2 brings virtual appliances to the next level.

The S3 snapshots enable some different and intriguing usage scenarios. Suppose you're doing some DNA matching against a genome data set on 1,000 instances. In addition to firing up 1,000 instances on a whim, you can, also on a whim, clone a nicely prepared snapshot of the data set 1,000 times to create 1,000 volumes, one for each instance, so they can all independently crawl over the data set. This type of massive (essentially read-only) cloning really opens up new possibilities in running such large computations in a cost-effective manner.
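A fan-out of this kind could be sketched in Python as follows. The sketch assumes a present-day boto3-style EC2 client passed in as `ec2`; the function name, the `instance_zones` mapping, and the default device are hypothetical. A real run with 1,000 instances would also have to deal with API rate limits and account quotas, which this example ignores.

```python
def clone_for_instances(ec2, snapshot_id, instance_zones, device="/dev/sdf"):
    """Create one volume per instance from a single snapshot and attach it.

    instance_zones maps instance ID to that instance's availability zone.
    Assumption: `ec2` behaves like a boto3 EC2 client.
    """
    # Request every volume first so they are all provisioned concurrently.
    volume_ids = {}
    for instance_id, zone in instance_zones.items():
        volume = ec2.create_volume(SnapshotId=snapshot_id, AvailabilityZone=zone)
        volume_ids[instance_id] = volume["VolumeId"]

    # Then attach each volume as soon as it becomes available.
    waiter = ec2.get_waiter("volume_available")
    for instance_id, volume_id in volume_ids.items():
        waiter.wait(VolumeIds=[volume_id])
        ec2.attach_volume(
            VolumeId=volume_id, InstanceId=instance_id, Device=device
        )
    return volume_ids
```

Each instance ends up with its own independent copy of the data set, so the workers never contend for a single volume.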

Summing It Up

I'll stop here, but clearly the cloud has just squared in size. Two years ago, when I started on EC2, there were only small instances available and the sentiment was that in order to get the horizontal scalability and pricing of the cloud you had to accept inferior features. In the meantime we've gotten multiple instance sizes plus recently remappable IP addresses and availability zones. That already indicated that computing in the cloud would soon surpass computing in traditional colos or in your own data center not just in scale and price, but also in feature set. With the addition of the storage volumes with all the cool snapshot features, it's now a fait accompli: cloud adopters will have much more computing horsepower and flexibility at their fingertips than those who are still racking their own machines. It's going to be like agile software development: if you want to survive as an Internet/web service, you will have to compute in the cloud, or your competitors will leave you in the dust by being able to deploy faster, better, and cheaper.

Update: Werner Vogels, Amazon's CTO, also blogs about the storage volumes in all-things-distributed with a little more background perspective. The Amazon folks are getting pretty coordinated with news appearing at the same time on their blogs and the forums. Maybe I missed it, but I don't think they even press release this stuff.

Comments

[...] Frontier. Amazon will be launching mountable volume storage to EC2 soon enough. rightscale has a nice post of why this is [...]
[...] Amazon has updated its EC2 service; backup to S3 is now available, Amazon takes EC2 to the next level with persistent storage volumes « RightScale Blog [...]
Rod, thanks for leaving me a Joyent advertising comment! ;-) I like what you offer and think that your service is excellent for many users. Also, if there's a way to work together we'd love to talk! Regarding the specifics of your comments, the fundamental difference is on-demand. With AWS I can acquire or drop resources on a whim. At first, being able to just launch a bunch of servers or create 10 volumes from a snapshot "just like that" is a wow-experience and feels like a luxury, but after using this stuff for almost two years I just can't go back. If you're familiar with software development, when version control first became widely used it was a revelation to be able to tell a developer who was working on a special feature to "just branch the project and commit your changes to the branch". Well, now we have gotten used to "just clone the staging system and test your changes there", where the staging system is really a multi-server set-up and cloning that takes 5 minutes of clicking and editing a few web forms. We do this so routinely now that I can't go back. Some scientific computing people we're talking to are drooling over being able to create tens to hundreds of clones of a terabyte volume "just like that" because it allows them to get their job done quicker and better. Looking at Amazon's storage volumes just feature by feature without considering the on-demand scaling is missing the big picture.
At Joyent, we already give you access to "real" storage. Each accelerator comes with a dedicated static IP address and a dedicated drive space. Our new accelerators make sure that you have access to all that drive space locally. You had mentioned that Amazon was not yet disclosing the speed of this new offering. It is likely that developers will experience degraded I/O speeds as multiple users try to run high traffic DBs on the same storage volume. It's a basic issue of physics. You can only run so much through a single storage device. At Joyent, we have solved this problem by associating large local drives with each Joyent Accelerator. The other question that has not been covered yet is how much this will cost. Amazon charges extra for bandwidth, extra for static IPs, extra for S3 backup, extra for S3 requests and now, it looks like they are going to charge extra for persistent storage. BTW, @SearchAllDeals.com, at Joyent, you can mount shared drives across multiple instances. There is no question that RightScale has a tremendous product. But you guys should consider giving your users alternatives. Joyent provides just such an alternative.
Thanks for the excellent report. We've been using EC2 for a year and this was great news.
[...] chance of losing ~10 minutes of data were the price you had to pay for hosting databases on ec2. But now that’s gone. Once again Amazon focuses on a simple tool that can be used in a variety of ways. The only [...]
[...] thus making it a shared drive. What it all means is that AWS/EC2 has gone up a few notches in terms of reliability. This reliability will go a long way towards the company offering service-level agreements to [...]
[...] April 14: Amazon just added another new feature, in which persistent storage “volumes” can be added to EC2 implementations. Thanks to [...]
SearchAllDeals: thanks for the kind words! Werner's blog states "As to be expected with a volume abstraction only one instance can have the volume mounted at any given time." I don't know that I would agree with the "as to be expected" piece, I would have expected to be able to mount a volume on multiple instances such that I could use a cluster filesystem like GFS to access it, or at least to mount it read-only on multiple instances. Hopefully that'll be high on Amazon's list for V2...
Great post! I've migrated out of 'degraded instances' one too many times to understand the importance of persistent storage, and decoupling data from instance. Do you know if multiple instances can mount the same volume? That would solve another set of problems and enable more cool stuff to happen. Btw, keep up the good work at Rightscale!
[...] Thorsten von Eicken at RightScale, who has been testing the service, talks about the implications of this feature and says his company is making tools to make it easier to use these services. [...]
fiidgets: thanks for the note. We actually have a lot of customers who *do* have sysadmin experience. They really understand how much time and headache RightScale is saving them, so they tend to be our most vocal supporters. mmc: ahhh, there's always something more to look forward to... I don't know that I would couple "become stable" with "non-beta". When S3 removed the beta tag we didn't really see any difference, did we? It's not like Amazon is currently saying "oops, we goofed, but hey, it's still beta"! They are dead serious, beta or not. And "stable"? Have you purchased colo or hosting at a larger scale (say several racks plus associated bandwidth)? So far Amazon is as stable as anything I've seen. I'm totally with you on datacenters in Europe, though.
Actually, for a true "fait accompli" we need all these nice features to become stable (non-beta), and we in Europe need a local data center to host our EC2 images for lower latency. Besides, it would be nice if Amazon improved security (and no, I am not going to list all the security problems. Contact me privately if you want to know the problems).
[...] since the initial launch is static IP numbers. Early tester (and reseller) Thorsten von Eicken is enthusiastic: The feature that really makes the storage volumes sizzle is the ability to snapshot them to S3 and [...]
Widget developers need to give EC2 a serious look. Those without sysadmin experience will appreciate the service RightScale provides.
Amazon Blows Away Objections... Amazon must have been burning more midnight oil than usual lately. Within the last two weeks, they've announced three new features that basically eliminate any remaining objections to their AWS computing platform. Elastic IP Addresses Elastic IP ad...
This is a major feature for everybody, but especially for those running database-dependent applications like me :) Static IP and now persistent storage - this is huge!
[...] Read RightScale's experience with EC2 and persistent storage: "it’s now a fait accompli: the cloud adopters will have much more computing horsepower and flexibility at their fingertips than those who are still racking their own machines." [...]
Jason: thanks! We're using one of the standard wordpress layouts with a few custom tweaks. Love wordpress!
Good layout and design. I like your blog. I just added your RSS feed to my Google News Reader. Jason Rakowski
It would make much sense to publish your test data regarding speed, latency etc. Just my 20 cents..
Ron: I'd love to, but the Amazon folks prefer to keep control over which details are made public at this stage. Also, until it's all released, performance may still change. All I can say is that I'd love to use it in production as it is today.
[...] And this brings us back to one of my favorite topics, Amazon, who announced a persistent storage feature for EC2. Before I blabber along on how cool that is, just read this from RightScale's Thorsten von Eicken. [...]
[...] mitigating any problems. If it really bothers you then hang on a few months for Amazon's new persistent storage volumes, which are probably exactly what you are looking [...]
Great article, I'm looking forward to this feature. I have linked back to this article from my blog...
[...] now offers essential features not previously available - automatic scaling , Manager for MySQL and Persistent Storage. The Amazon Web Services platform empowered the entrepreneur, but advanced technical skills were [...]
1) Congratulations to the entire RightScale team! 2) I was able to create a volume and a snapshot successfully, but I was not able to attach the volume to my existing server (image). I clicked on the 'Add server for boot attachment' link and got a 'No servers available' message. Thanks for your help.
Posted by TJH (not verified) | August 22, 2008 | 01:29 PM
TJH: you can only attach the volume to instances in the same availability zone. We may not have displayed things well enough to make this obvious, sorry for that.
[...] Dominion Solutions partner RightScale is ready to roll with support for the new volumes. I talked with Thorsten who told me, "We [...]
