People often ask us what the biggest innovation is that RightScale offers beyond other forms of server management. The real answer is the integrated approach that puts together all the parts needed to architect multi-server deployments, launch them, monitor and manage them, and then cycle back to re-architecting, re-launching, etc. But if there is one aspect of the overall integration that is special to RightScale, it's the methodology of dynamic server configuration using ServerTemplates and the ability to publish these ServerTemplates and meter their usage.
We recently took a look at how our customers and ISV partners use ServerTemplates and just published some of our findings in a press release. Among the interesting facts that surfaced, our customers have developed a total of 42,500 RightScale ServerTemplates for Cloud Deployments as of 2011, doubling the number from the previous year. Of those, 42% of the customer-developed ServerTemplates were built from RightScale ServerTemplates and 58% were created from scratch or from partner templates. I was myself intrigued by the sheer number of ServerTemplates and looked a bit into what our users are doing. Of course a good number of the derived ServerTemplates come from customers making improvements to the ones we've published, and many of them are good fuel for our roadmap. But the vast majority of customer-created ServerTemplates are variations that are specific to the customer's environment or have very customer-specific software. Looking at these numbers is really a good reminder that while most organizations use standard software, and most sysadmins try to adhere to standard ways for installing and operating this software, the reality for the leading companies is that customization is a vital necessity and RightScale supports just that.
The ServerTemplate methodology is also uniquely suited to supporting multiple cloud implementations. This ability to leverage our and our partners' ServerTemplate investment has led to the just announced partnership with Tata Communications where we'll support their cloud offering with our multi-cloud ServerTemplates. The Tata clouds are based on Cloud.com's system, which means that customers will be able to deploy servers based on the same RightScale ServerTemplate in Amazon EC2, Tata's InstaCompute cloud, as well as the customer's in-house private Cloud.com cloud (if they have one). What really excites us in enabling this level of portability is that it leads to more business for everyone! Most of our larger EC2 customers have plans for other clouds, whether public clouds by other providers or private clouds, and in every single case the result will be more EC2 consumption because the overall goal isn't to move EC2 servers back in-house but to enable more parts of the business to leverage cloud computing!
Dynamic Configuration
The dynamic configuration of servers through ServerTemplates is key to all this. The way we see a server coming together within RightScale is:
- The cloud provides a virtualization container, the "hardware," with requested resources for processing, memory, disk, and network
- The cloud then boots a machine image which sets up the operating system and the RightLink agent
- RightScale runs the set of boot scripts identified in the server's ServerTemplate, which perform several types of tasks:
  - install the full complement of software that the server needs to operate,
  - add additional hardware resources to the server, for example, disk volumes restored from snapshot, assignable IP addresses, etc., and
  - configure all the software and bring the server into an operational state
- RightScale starts collecting monitoring data and generates alerts according to the alert specifications of the ServerTemplate
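The sequence above can be sketched in a few lines of Python. This is only an illustrative model, not RightScale's actual API: the `Server` and `ServerTemplate` classes, the `boot` function, and the three script names are all hypothetical, and each "script" merely records that it ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Server:
    """Toy stand-in for a freshly booted instance; `log` records what ran."""
    name: str
    log: List[str] = field(default_factory=list)
    operational: bool = False


# A boot script is just a callable that acts on the server.
BootScript = Callable[[Server], None]


@dataclass
class ServerTemplate:
    """Hypothetical model: ordered boot scripts plus alert specifications."""
    boot_scripts: List[BootScript]
    alert_specs: List[str]


def install_software(server: Server) -> None:
    server.log.append("install software")


def attach_resources(server: Server) -> None:
    server.log.append("attach volumes and IP addresses")


def configure_software(server: Server) -> None:
    server.log.append("configure software")


def boot(server: Server, template: ServerTemplate) -> None:
    """Run the template's boot scripts in order, then arm monitoring."""
    server.log.append("image booted: OS + agent")
    for script in template.boot_scripts:
        script(server)
    server.operational = True
    for spec in template.alert_specs:
        server.log.append(f"alert armed: {spec}")


template = ServerTemplate(
    boot_scripts=[install_software, attach_resources, configure_software],
    alert_specs=["cpu > 90% for 5 min"],
)
server = Server("app-1")
boot(server, template)
print("\n".join(server.log))
```

The point of the sketch is that the image contributes only the first log entry; everything else is decided by the template at boot time, so the same image can serve many different server roles.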
Where we've innovated in the face of the new opportunities offered by cloud computing is in moving the boundary between what's cast into the machine image vs. what's done dynamically at boot time. Traditionally in the virtualization world there is little difference between an image and a server. An image is just a snapshot of a server, or perhaps its embodiment when the server is not running and has no hardware resources allocated to it. While this methodology works in the cloud as well, it doesn't allow the benefits of the cloud to be leveraged.
How to Fully Leverage the Cloud
To fully leverage cloud architecture the role of images has to change. An image has to be the basis for launching and relaunching many servers. This is how auto-scaling happens, this is how failing servers get replaced quickly, this is how IT becomes agile and can spin up or rev servers. The key insight needed is that fewer things should be cast in the image and more needs to be customized on demand to each server's role and configuration. The truth is that to leverage the cloud we have to recover the operating system as an abstraction layer!
What has happened is that the operating system, the application, and often also the application data are mushed together in machine images. Remember that one of the main roles of an operating system is to abstract the specifics of the hardware and present a portable machine abstraction to applications. We need to preserve this in the cloud so we can replicate and move servers and server configurations!
For example, as a server configuration moves from development to production the hardware (really the virtualization container) can change drastically. It may move from an internal dev cloud to an external production cloud or simultaneously to multiple internal and hosted production clouds around the world. Each time the operating system needs to be able to change while the stack on top of it remains the same. Some of the typical OS changes include: 32-bit vs. 64-bit platforms, uniprocessor vs. multiprocessor, slight changes in OS revision/configuration, changes in disk layout, hypervisor changes, etc.
The best way to recover the abstraction layer is to dedicate the machine image to contain only a cloud-optimized version of the operating system. This is exactly what we do with RightImages. These are base OS installs that we've been producing for years that provide effective abstraction across multiple clouds and virtualization containers. They consist of an optimized operating system install, the RightLink agent, and a few other very common supporting software packages. RightImages serve as the foundation upon which ServerTemplates dynamically assemble the rest of the software stack at boot time. This methodology lets us reliably stand up the same server configuration across a large number of clouds and machine types.
Users often get hung up, from a performance perspective, on what happens at boot time versus what is already present on the image. In most cases the time spent installing software at boot is a non-issue, and in the cases where it does matter, that doesn't really break the model. Many of our customers create their own images that simply have more software pre-installed. Some good examples are pre-installing the version of Java a customer uses across many servers, or pre-installing SharePoint on Windows as it takes a long time to install. The advantage is that booting is faster, but the disadvantage is that a larger portion of the stack becomes more difficult to manage. It helps to remember that you're effectively adding this software to the operating system abstraction, so to speak.
Keeping the application software separate from the operating system adds great runtime and maintenance flexibility. For example, you can test something out on a running server by adding a new script, making sure it works, committing the ServerTemplate, then rolling forward all running servers to that revision of the ServerTemplate to have it applied immediately. With bundled images, you would have to push the script to all machines manually and re-snapshot an image to use for future instances. Or, if you’re an ISV publisher of a solution on RightScale, and you make a simple bug fix to a script on a ServerTemplate, anyone using that solution can pull in the bug fix onto their existing servers where they may have added additional scripts and applications. When ISVs deliver their software through pre-bundled images, customers have to get the new image with any fixes, then reinstall and re-configure any applications and take a new snapshot - very tedious! In short, ServerTemplates are flexible like playlists on an iPod, where virtual machine images are like burning and shipping a CD.
A related trend that we've found ourselves pursuing is to be more procedural and less declarative. At first blush a declarative template that states "this server needs X, Y, and Z" may seem better than one that cobbles things together in a script, but we've had relatively little success with that beyond simple demo examples. Disk volumes are a case in point. It may seem natural to specify in the ServerTemplate that it needs a 1TB volume, or perhaps even a volume restored from a specific snapshot (this is what the EC2 API supports). But in reality a server will need to restore data from a snapshot backup, generally the most recent consistent backup of the role the server is about to take on. This ends up involving listing recent snapshots, filtering for those that pertain to the role and are tagged as consistent (in many cases several volumes have to be backed up and are then tagged as consistent if all the snapshots complete correctly within the allotted timeout), and then selecting the most recent one. After attaching the volume, it needs to be mounted and appropriate recovery procedures often have to be run. All this ends up being procedural, and the best methodologies we've seen are to use configuration languages that don't repeat work that has already been performed and are idempotent, such as Puppet and Chef.
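To make the snapshot selection concrete, here is a minimal Python sketch of the "list, filter, pick the most recent" step. It runs against in-memory data rather than a real cloud API, and the `Snapshot` class, the `role:` and `consistent:true` tag strings, and the `latest_consistent_snapshot` function are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record; tags are plain strings."""
    snapshot_id: str
    created_at: datetime
    tags: frozenset = field(default_factory=frozenset)


def latest_consistent_snapshot(snapshots, role):
    """Return the newest snapshot for `role` that is tagged consistent, or None."""
    wanted = {f"role:{role}", "consistent:true"}
    # Keep only snapshots that carry every wanted tag.
    candidates = [s for s in snapshots if wanted <= s.tags]
    return max(candidates, key=lambda s: s.created_at, default=None)


snapshots = [
    Snapshot("snap-001", datetime(2011, 3, 1, 2, 0),
             frozenset({"role:db-master", "consistent:true"})),
    # Newer, but the backup never got its consistency tag, so it is skipped.
    Snapshot("snap-002", datetime(2011, 3, 2, 2, 0),
             frozenset({"role:db-master"})),
    # Consistent, but belongs to a different role.
    Snapshot("snap-003", datetime(2011, 3, 2, 2, 0),
             frozenset({"role:app", "consistent:true"})),
]

chosen = latest_consistent_snapshot(snapshots, "db-master")
print(chosen.snapshot_id if chosen else "no consistent snapshot found")
```

Note that the newest snapshot for the role is not the one selected. That decision depends on state that only exists at launch time, which is why a fixed snapshot ID in a declarative template falls short here.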
Thoughts on AWS' CloudFormation
In this context, it’s interesting that Amazon decided to use a declarative approach in the new CloudFormation (CF) templates. The CF templates are mostly at a different level of abstraction from ServerTemplates in that they focus on putting together multiple servers and their associated resources. We've been using our macro feature for that purpose for some time and have published a number of "getting started" macros to set up demo deployments as well as partner macros that set up clusters with software from multiple vendors. We've again found that past the simple examples, we really want to have procedural control over the setup of a server cluster, if only to be able to let the user input some parameters, like the number of app servers, and adjust how everything unfolds accordingly, or handle the cases where some resources already exist and should be used as-is.
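As a rough illustration of that kind of procedural control, the Python sketch below takes the number of app servers as a parameter and skips resources that already exist. It is not a RightScale macro or a CloudFormation template: the `plan_cluster` function and the resource names are hypothetical.

```python
def plan_cluster(app_server_count, existing):
    """Return the resources that still need to be created, in setup order."""
    wanted = ["load-balancer", "db-master"]
    wanted += [f"app-{i}" for i in range(1, app_server_count + 1)]
    # Resources that already exist are used as-is and left out of the plan.
    return [name for name in wanted if name not in existing]


print(plan_cluster(3, existing={"db-master", "app-1"}))
# ['load-balancer', 'app-2', 'app-3']
```

A purely declarative description would have to enumerate the app servers up front; here the same few lines cover any cluster size and any partially built deployment.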
Another area where we've taken a broader approach: CloudFormation really only targets setup (of course that's the most juicy part where money starts flowing if you’re the underlying cloud provider ;-) ). Servers also need actions performed on them during operation and often their roles and complexity are no different from initial setup. For example, when you run a RightScale operational script to re-mount the latest production snapshot on a staging database, that's really no different at the core from what happened when the server originally launched. Or, if you choose to relaunch the staging server to get the latest snapshot, then the operational script to reconnect all the app servers to the new database is the same as the app server to database connection when all the servers were originally launched. A dynamic runtime configuration system is needed really at all three stages of a server's lifespan: launch, runtime, and decommissioning.
What I hope becomes clear through these examples is the value of the integrated approach that RightScale offers. We see tremendous leverage in integrating all the pieces that developers and sysadmins need to architect, deploy, manage, track, audit, and recycle servers in one platform. We are leading the cloud industry in that respect, and we also still have a lot of cool new features under development in the pipeline!