One of the distinguishing features of RightScale is that from day one we've focused on automating the configuration management of servers. The reason is simple: you can't get the benefits of the cloud if you have to spend significant manual effort configuring each launched server. The cloud itself solves only part of the problem: calling EC2's RunInstances API is just the beginning of a rather complex process that brings the server into full operation. After all, most launched servers are meant to go into production, not to sit idle waiting for someone to log in.
After giving a webinar on ServerTemplates, I thought it would be good to step back and write up why the core of RightScale does what it does. It's on the long side, so you may want to skim the sections you're already familiar with.
Why Machine Images Don't Work
The standard method of operation in the virtualization world is to work with machine images. In practice this means:
- launch a server from an image that is "close" to what you want;
- log in, install the software you need, and make any configuration changes;
- create an image from that server ("bundling" in EC2 parlance).
Later, when you need an instance of that server, you launch the image and hopefully it comes up ready to go. While this process may sound simple, it's anything but, and it takes considerable time. I went through the bundling treadmill back in 2006 and concluded it wasn't productive. The reasons are simple:
Images are too monolithic. Everything on a server is bundled into one image file, which makes a collection of images difficult to manage. Change the version of one software package you commonly use and you have to recreate every image that includes that package. This quickly gets out of hand.
Images are opaque. From the outside it's hard to tell what's in an image. Even if you fire it up, it's not convenient to poke around to figure out what's installed and how it's configured. Try determining the difference between two images: not a pleasant task.
Images are too big. They are unwieldy to work with. Take two images of different versions of your app server: typically more than 90% of the bits are identical (often more than 99%), yet finding the interesting bits that differ is like finding a needle in a haystack. This is ridiculous, and it is a large part of why images are hard to work with.
Images are too static. An image alone can't fully configure each server. When you launch the tenth app server, it needs to know it's number 10, not number 9. When you launch a test app server, it needs to know it's in a test deployment (e.g., don't send alert emails to the ops team). Yet you want the same image to be used in test as in production, because otherwise, what are you really testing? So you need a dynamic configuration mechanism to "personalize" each server at boot time.
The bottom line is that my experience with images has been very frustrating. I know that tools exist to help manage them, but I'm not convinced this is a productive avenue. This is why, late in 2006, I set out to build what we now call ServerTemplates, and I can't imagine going back.
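The personalization step can be sketched in a few lines of Python. This is a minimal illustration, not RightScale code. The LaunchInputs fields (server_index, environment, ops_email) and the hostname scheme are hypothetical. The same function runs on every server booted from the same image, and only the inputs supplied at launch differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LaunchInputs:
    """Hypothetical per-server values supplied at launch, not baked into the image."""
    server_index: int   # e.g. 10 for the tenth app server
    environment: str    # "production" or "test"
    ops_email: str      # alert recipient in production


@dataclass(frozen=True)
class AppServerConfig:
    hostname: str
    alert_email: Optional[str]  # None means "send no alerts"


def personalize(inputs: LaunchInputs) -> AppServerConfig:
    """Derive server-specific settings at boot; the image itself is identical everywhere."""
    hostname = f"app{inputs.server_index:02d}.{inputs.environment}"
    # Test servers must not page the ops team.
    alert_email = inputs.ops_email if inputs.environment == "production" else None
    return AppServerConfig(hostname=hostname, alert_email=alert_email)


if __name__ == "__main__":
    print(personalize(LaunchInputs(10, "production", "ops@example.com")))
    print(personalize(LaunchInputs(10, "test", "ops@example.com")))
```

This design keeps test and production on the same bits. The only thing that changes between them is data passed in at launch.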
The ServerTemplate Concept
The idea behind ServerTemplates is to boot any server from a small set of very generic images and configure the server dynamically at boot time. We noticed that Linux package managers are very fast. Running "yum install httpd" or "apt-get install apache2" takes just a few seconds, so there is little value in baking such software into an image. There are special cases where every second of boot time counts, but they are rare, and in those cases it's still possible to create a more specialized image for that purpose.
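Here is a minimal Python sketch of such a boot-time install step, assuming the base image ships a Python interpreter. The PACKAGE_NAMES table and the function names are hypothetical. The one real-world detail the sketch encodes is that Red Hat-family repositories name the Apache package httpd, while Debian and Ubuntu name it apache2.

```python
import shutil
import subprocess
from typing import Dict, List, Optional

# Hypothetical mapping from a logical package name to distro-specific names.
PACKAGE_NAMES: Dict[str, Dict[str, str]] = {
    "apache": {"yum": "httpd", "apt-get": "apache2"},
}


def detect_package_manager() -> Optional[str]:
    """Return the first supported package manager found on PATH, if any."""
    for tool in ("apt-get", "yum"):
        if shutil.which(tool):
            return tool
    return None


def install_command(manager: str, logical_names: List[str]) -> List[str]:
    """Build a non-interactive install command for the given package manager."""
    packages = [PACKAGE_NAMES[name][manager] for name in logical_names]
    return [manager, "install", "-y", *packages]


def install_at_boot(logical_names: List[str], dry_run: bool = True) -> List[str]:
    """Install the packages (only when dry_run is False) and return the command used."""
    manager = detect_package_manager()
    if manager is None:
        raise RuntimeError("no supported package manager found")
    cmd = install_command(manager, logical_names)
    if not dry_run:
        # Needs root; raises CalledProcessError if the install fails.
        subprocess.run(cmd, check=True)
    return cmd
```

On Debian-family systems, a real boot script would normally run apt-get update before installing, because a freshly booted image may have stale package lists. The dry_run default keeps the sketch from installing anything unless it is explicitly asked to.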
Simplifying a bit, a ServerTemplate is composed of the following pieces:
- a group of settings that define the type of server, such as i386 vs. x64
- a reference to a base image that is to be booted
- a list of scripts and Chef recipes that are to be run at boot time to install and configure all the software
The illustration on the right shows the layers typically found in a ServerTemplate. They run from the bare virtual machine at the bottom, through the OS and various layers of software packages, to the application at the very top. On Linux we prefer to boot bare-bones images that contain just the OS. On Windows it is often necessary to pre-install some of the larger software packages on the image and use the dynamic ServerTemplate functionality primarily to configure those apps.
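The three pieces of a ServerTemplate, plus the bottom-up layering, can be modeled as a small data structure. This Python sketch is illustrative only. The field names, the integer layer numbers, and the placeholder image reference are assumptions, not RightScale's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class BootStep:
    """One script or Chef recipe run at boot; 'layer' orders the software stack."""
    name: str
    layer: int  # lower layers (OS tweaks, base packages) run before the app


@dataclass
class ServerTemplate:
    # Hypothetical fields mirroring the three pieces of a ServerTemplate.
    name: str
    architecture: str   # server-type setting, e.g. "i386" or "x64"
    base_image: str     # reference to a generic, bare-bones image
    boot_steps: List[BootStep] = field(default_factory=list)

    def boot_order(self) -> List[str]:
        """Step names in bottom-up layer order; sorted() is stable, so ties keep list order."""
        return [s.name for s in sorted(self.boot_steps, key=lambda s: s.layer)]


if __name__ == "__main__":
    app = ServerTemplate(
        name="App Server",
        architecture="x64",
        base_image="generic-linux-base",  # placeholder, not a real image ID
        boot_steps=[
            BootStep("deploy_application", layer=3),
            BootStep("configure_syslog", layer=1),
            BootStep("install_apache", layer=2),
        ],
    )
    print(app.boot_order())
```

Because the image reference is just one field, the same generic image can back many templates. Everything server-specific lives in the boot steps.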
What happens at boot time is the following:
- When launching a server, RightScale passes the server its identity in the launch call using a crypto token that uniquely identifies the server to RightScale. This is important because at any one time many servers may be booting from the same image.
- When it comes up, the server contacts RightScale with its token to obtain instructions on what to download and run.
- RightScale also sends the server a set of variables that can be configured on the website. This is the way dynamic information is fed into the config to specify things such as the server's name, test vs. production, names/IP addresses of other servers it needs to contact, and so forth.
- The server then typically downloads packages from a distribution mirror and runs a set of scripts to install and configure everything.
- Throughout the process, the server sends audit entries to RightScale so that progress can be monitored on the website and a persistent audit record is kept.
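The boot sequence can be simulated in-process to show how the pieces fit together. This is a toy model under stated assumptions. The source says only that a "crypto token" identifies the server, so the HMAC-SHA256 token here is a stand-in. ControlPlane, launch, get_instructions, record, and boot are hypothetical names, not RightScale's API, and network calls are replaced by method calls.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Dict, List


class ControlPlane:
    """Toy stand-in for the management side (hypothetical, not RightScale's API)."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # secret never leaves the control plane
        self._plans: Dict[str, dict] = {}
        self.audit: Dict[str, List[str]] = {}

    def _token_for(self, server_id: str) -> str:
        mac = hmac.new(self._key, server_id.encode(), hashlib.sha256).hexdigest()
        return f"{server_id}:{mac}"  # assumes server IDs contain no ':'

    def launch(self, server_id: str, scripts: List[str], inputs: Dict[str, str]) -> str:
        """Register a server; the returned token is passed along in the launch call."""
        self._plans[server_id] = {"scripts": scripts, "inputs": inputs}
        self.audit[server_id] = []
        return self._token_for(server_id)

    def _authenticate(self, token: str) -> str:
        server_id = token.partition(":")[0]
        if server_id not in self._plans or not hmac.compare_digest(
            token.encode(), self._token_for(server_id).encode()
        ):
            raise PermissionError("unknown or forged token")
        return server_id

    def get_instructions(self, token: str) -> dict:
        """What to run and with which inputs (name, test vs. production, peers, ...)."""
        return self._plans[self._authenticate(token)]

    def record(self, token: str, entry: str) -> None:
        """Append an audit entry for live monitoring and the persistent record."""
        self.audit[self._authenticate(token)].append(entry)


def boot(plane: ControlPlane, token: str,
         library: Dict[str, Callable[[Dict[str, str]], None]]) -> None:
    """Steps a booting server performs, in order."""
    plan = plane.get_instructions(token)
    for name in plan["scripts"]:
        plane.record(token, f"start {name}")
        library[name](plan["inputs"])  # install/configure using the dynamic inputs
        plane.record(token, f"done {name}")
    plane.record(token, "operational")


if __name__ == "__main__":
    plane = ControlPlane()
    token = plane.launch("app-10", ["install_apache"], {"ENV": "test"})
    boot(plane, token, {"install_apache": lambda inputs: None})
    print(plane.audit["app-10"])
```

The per-server token is what lets many servers boot from the same image without being confused with one another. A server that presents a token it was not issued gets nothing.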
ServerTemplates provide a very modular, building-block approach to managing server configurations. In practice many different constituencies contribute to a ServerTemplate:
- Vendors, OS distributions, and RightScale provide the lower layers in the form of standard software packages.
- The sysadmin or operations team often provides standard configurations for fleet-wide software, such as logging, intrusion detection, user account management, and network configuration.
- Developers provide the higher layers, such as the app server installation and the application code itself.
The modular approach makes it easy to integrate all the pieces and, especially, to manage the update process.
The modularity of ServerTemplates also enables flexible software development and test practices. In our case we use the same building blocks to create a large variety of ServerTemplates that are appropriate for the different stages from development to production:
- In production we use many specialized servers, so we end up with many ServerTemplates, for load balancers, app servers, API servers, and so on.
- For staging we start to aggregate functions so we run fewer servers. This saves money and also simplifies updates a bit. To achieve this, we combine scripts that live in separate ServerTemplates in production into fewer ServerTemplates.
- For test setups we consolidate further, so we can run a number of test systems without having to launch and manage too many servers.
- Finally, developers often use an "all-in-one" ServerTemplate for their development and testing. This ServerTemplate combines all the building blocks in a single ServerTemplate.
The beauty here is that the other stages of development reuse the exact same RightScript and Chef cookbook building blocks that we use in production. This reduces setup time and avoids a common problem: developers testing configurations that bear little resemblance to what will go into production.
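The staging and all-in-one consolidation can be sketched as merging production templates, each represented as an ordered list of building-block names. This Python sketch is an illustration under assumptions. The template and block names are hypothetical, and a real ServerTemplate carries more than a list of names.

```python
from typing import Dict, List, Set

# Hypothetical building blocks (RightScripts / Chef recipes); names are illustrative.
PRODUCTION: Dict[str, List[str]] = {
    "load_balancer": ["base_os", "logging", "lb_config"],
    "app_server": ["base_os", "logging", "app_code"],
    "api_server": ["base_os", "logging", "api_code"],
    "database": ["base_os", "logging", "db_setup"],
}


def merge(templates: Dict[str, List[str]],
          groups: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Combine templates into fewer ones, keeping run order and dropping duplicate blocks."""
    merged: Dict[str, List[str]] = {}
    for new_name, members in groups.items():
        blocks: List[str] = []
        for member in members:
            for block in templates[member]:
                if block not in blocks:
                    blocks.append(block)
        merged[new_name] = blocks
    return merged


def blocks_used(stage: Dict[str, List[str]]) -> Set[str]:
    """The set of building blocks a stage draws on."""
    return {block for blocks in stage.values() for block in blocks}


STAGING = merge(PRODUCTION, {
    "front_end": ["load_balancer", "app_server", "api_server"],
    "database": ["database"],
})
ALL_IN_ONE = merge(PRODUCTION, {"all_in_one": list(PRODUCTION)})

if __name__ == "__main__":
    print(STAGING)
    print(ALL_IN_ONE)
    # Every stage is assembled from exactly the same building blocks as production.
    print(blocks_used(STAGING) == blocks_used(ALL_IN_ONE) == blocks_used(PRODUCTION))
```

Because the stages only regroup blocks and never fork them, a fix to one block reaches development, test, staging, and production alike.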
Making ServerTemplates Reliable
In IT management there is often a tension between flexibility and reliability: if everything can change at any moment, it's hard to lock down a reliable and reproducible configuration. We discovered this early on and invested significant engineering resources in a reliability harness around ServerTemplates to solve the problem. Our solution has several aspects:
- ServerTemplates are version controlled, so you can commit a version and come back to it at any later point. If you want to relaunch a server with last year's version of a ServerTemplate, you can. Or perhaps you just want to see a diff of what has changed since then.
- We mirror the Linux distribution repositories so that a booting server retrieves packages from a local, redundant set of fast servers.
- We also keep a daily snapshot of the Linux distribution mirrors so that when you relaunch a server with last year's version of a ServerTemplate, it can retrieve the software packages as they were at that point in time. This is under user control: you can freeze the repos to any day of your choice.
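Versioning, diffing, and repo freezing fit together roughly as follows. This Python sketch is built on stated assumptions. Revision, TemplateHistory, snapshot_for, and repo_urls are hypothetical names. The snapshot URL layout and the mirror hostnames are invented for illustration, and the mirror list order is treated as the failover order.

```python
import bisect
import difflib
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class Revision:
    """A committed, immutable ServerTemplate version (hypothetical structure)."""
    number: int
    scripts: Tuple[str, ...]
    repo_freeze: date  # packages are served as they were on this day


class TemplateHistory:
    def __init__(self) -> None:
        self._revisions: List[Revision] = []

    def commit(self, scripts: List[str], repo_freeze: date) -> Revision:
        rev = Revision(len(self._revisions) + 1, tuple(scripts), repo_freeze)
        self._revisions.append(rev)
        return rev

    def get(self, number: int) -> Revision:
        if not 1 <= number <= len(self._revisions):
            raise KeyError(f"no revision {number}")
        return self._revisions[number - 1]

    def diff(self, old: int, new: int) -> List[str]:
        """Unified diff of the boot-script lists of two revisions."""
        return list(difflib.unified_diff(
            list(self.get(old).scripts), list(self.get(new).scripts),
            fromfile=f"rev{old}", tofile=f"rev{new}", lineterm=""))


def snapshot_for(freeze: date, snapshots: List[date]) -> date:
    """Latest snapshot taken on or before the freeze date; snapshots must be sorted."""
    i = bisect.bisect_right(snapshots, freeze)
    if i == 0:
        raise LookupError(f"no snapshot on or before {freeze}")
    return snapshots[i - 1]


def repo_urls(freeze: date, snapshots: List[date], mirrors: List[str]) -> List[str]:
    """Snapshot repo URLs in failover order; the URL layout is an assumption."""
    day = snapshot_for(freeze, snapshots)
    return [f"{mirror}/snapshots/{day:%Y%m%d}/" for mirror in mirrors]


if __name__ == "__main__":
    history = TemplateHistory()
    history.commit(["base_os", "install_apache"], date(2010, 1, 5))
    history.commit(["base_os", "install_apache", "deploy_app"], date(2011, 3, 1))
    print("\n".join(history.diff(1, 2)))
    snapshots = [date(2010, 1, 4), date(2010, 1, 6), date(2011, 3, 1)]
    mirrors = ["http://mirror-a.example", "http://mirror-b.example"]
    # Relaunching revision 1 pulls packages from the snapshot matching its freeze date.
    print(repo_urls(history.get(1).repo_freeze, snapshots, mirrors))
```

Storing the freeze date inside each committed revision is what makes an old relaunch reproducible. The scripts and the package versions they install are pinned together.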
RightScale reliably launches thousands of servers using ServerTemplates each day. Keeping this machinery reliable as both RightScale and the underlying clouds we manage evolve at a breakneck pace is a top priority for us and involves a significant amount of engineering. One of the tricks we use to stay ahead of the curve is to use ServerTemplates ourselves: we leverage RightScale to manage RightScale. The benefits in terms of automation, control, and reliability have been incredible, and at this point we cannot imagine going back to a pure machine image model of operation.
ServerTemplates are a leap forward over images. They enable servers to be launched and dynamically configured at run time, with no tweaking of machine images. This is the underpinning of truly automated infrastructure. They also support launching reliable infrastructure: you know what you're launching at all times, and it is completely repeatable. Finally, they are built on reusable components, which saves untold time in creating new or similar configurations or in expanding from test and dev to production.