Migrating to Docker: The Second (Harder) Half

The RightScale engineering team is moving the entire RightScale cloud management platform comprising 52 services and 1,028 cloud instances to Docker. This article is the fifth in a series chronicling “Project Sherpa” and is written by Ryan Williamson, senior software engineer, and Tim Miller, vice president of engineering.

Weeks 3, 4, and 5 of Project Sherpa have been a blur. Exhaustion is setting in, with some of our lead “sherpas” going way above and beyond by working some long hours.

Our plan going into the project was to focus on a cookie-cutter implementation, which we knew would be the only way to stay sane as people were ramping up on these new technologies and implementing them in their given service or application. About a week in, we started running into situations where every team was learning, liking what they saw, and then lobbying for “super great changes” to the core templates and methods of deployment. These were oftentimes in direct opposition to other teams’ ideas, leaving Mark Dotson, our head Ops sherpa, longing for a T.G.I. Friday’s-style button that said, “Great idea! Let’s talk about implementing <insert great idea here> after we are done with this, but for now let’s do this tested method even though it isn’t as great!”

Our first “done” milestone for each service was defined as a 1:1 replacement of the “traditional deploy” object with Dockerized versions — including the automation needed for that service — deployed in integration. After that, Mark would take that work and get it ready for staging and production, including making sure that all of the services worked together. To add some structure to the project, we split our services into three groups based on how long they would take to Dockerize and which services needed to logically get deployed together. We created a set of target dates for Group 1 apps to go to staging first, then Group 2, and lastly Group 3.

It was working pretty well as each group of services came “due,” but we started seeing situations where teams were getting blocked: the master branches we use to build the integration environments were changing quickly as code supporting the Dockerized versions of services was merged in. This meant that if a team was not continually merging master into its feature branches, parts of its integration environments would start breaking.

For example, if we merged changes to our widely used tagservice into master, everyone needed to update tagservice; otherwise seemingly unrelated services would suddenly start failing, and a team might not realize that a dependent app was the thing that was broken. This made it harder for folks to collectively have a stable configuration while trying to get to “done,” which slowed everyone down. We got there, but it became a common pain point. At this point, our head Ops sherpa Mark, putting in 15 hours a day, added a button to his shirt that said, “Merge master into your feature branches . . . like all the time!”

Now at week 5, we’re running a little behind schedule. Group 1 services went into production 10 days behind schedule, and Group 2 services are just heading into production. Group 3 has only 3 apps in it and is expected to be in production next week. Where we lost most of the time was in the integration phase: the period after we are dev complete, when the containers need to move to staging and the nightly regressions must all pass before the containers are put into production. Part of the delay was caused by some of the services taking longer than we expected to get to “dev done,” but the major factor was that the manual process to wire new containers into our staging environment took quite a bit longer than we had anticipated.

Staging at this point consists of both containerized services and traditionally deployed services, and making it all work together was a very complex and time-consuming task. For quality reasons we wanted one Ops person to do that, since our existing automated tools won’t work in this “halfway done” environment. This did create a bottleneck, but it has helped to have one person who “really” knows how it is wired together. Ultimately, I think our “burn the boats” approach was the right one, even though our estimates were off by a couple of weeks.

Redefining the Day-to-Day for Developers

For our development teams, Project Sherpa has brought with it two major changes to the day-to-day activities of developers: our branching scheme in GitHub and how we develop an application locally.

Previously we had master, staging, and production branches in GitHub to mirror the code that was running on our various systems. We are now doing away with our traditional staging and production branches and defining them as pointers in time to commits on the master branch, which drives our workflow. As code hits master, our automation builds an image tagged “latest,” which will be promoted through the stack until it ships to production. This improvement helps us move even further in our pursuit of continuous delivery.

One of our goals for Project Sherpa was to increase velocity and iteration in development and test by leveraging Docker for local development on developer laptops. We needed to provide two mechanisms for a developer to interact locally with each of our microservice offerings:

  1. Provide a mechanism for any developer to check out code from an application’s repository and stand up the application and all its dependencies in containers.
  2. Provide a mechanism for a developer to use the developer’s local source code to build and set up an application (either locally or within a container) while leveraging existing containers for any dependencies.

The first method is useful when a developer is writing code that interacts with a service but is not the domain expert for that supporting service; the faster you can configure and bootstrap that service, the more time you can spend doing the thing you actually want to do. We went with docker-compose.yml and the standard Docker Compose tool for this use case. A simple “docker-compose up” and you are off to the races with an application that has installed itself, created the schema for and seeded its own database, and is configured to talk to any other microservices it may have as dependencies.
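For illustration, a minimal docker-compose.yml for this use case might look something like the sketch below. The service names, images, ports, and environment variables are hypothetical stand-ins, not our actual configuration.

    # Hypothetical sketch only -- names, images, and settings are illustrative
    version: "2"
    services:
      myservice:
        build: .                                         # build the app from the repo's Dockerfile
        ports:
          - "8080:8080"
        environment:
          DATABASE_URL: mysql2://root@db/myservice
          TAGSERVICE_URL: http://tagservice:9090
        depends_on:
          - db
          - tagservice
      db:
        image: mysql:5.6                                 # backing database; the app creates and seeds its own schema
        environment:
          MYSQL_ALLOW_EMPTY_PASSWORD: "yes"
      tagservice:
        image: registry.example.com/tagservice:latest    # prebuilt image for a dependent microservice

A single “docker-compose up” builds the local image, pulls the dependency images, and wires everything together on a private network.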

The second use case is for a developer who is actively working on an application within his domain expertise: he can run all of the needed dependencies in containers and still iterate quickly on the code he is writing himself, without having to figure out all the intricacies of setting up his development environment. For this we leverage an open source RubyGem written by Tony Spataro, an architect here at RightScale.
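The gem does the wiring for us, but the underlying idea can be sketched with a plain Docker Compose override file: the service being worked on builds from and mounts the developer’s local checkout, while every dependency keeps using its prebuilt image. The names, paths, and command below are hypothetical, not what the gem actually generates.

    # docker-compose.override.yml -- hypothetical sketch of the local-source use case
    version: "2"
    services:
      myservice:
        build: .
        volumes:
          - .:/app                                       # mount the local checkout into the container
        command: bundle exec rackup -o 0.0.0.0 -p 8080   # run the app from the mounted code

With an override along these lines, a code edit on the laptop shows up on the next restart of that one container, while the dependency containers stay untouched.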

So Far So Good

Our all-in move to Docker has already delivered some of the results we expected, including drastically sped-up iteration times for our developers and our QA team. In the past, when testing a bugfix, our QA team would spin up an integration environment and run their test suite to reproduce an issue. Then they would pull code onto a single service (or potentially multiple services) to get the system into a state where it was running the code containing the fix, and re-run their suite to verify that the bug was resolved and that no new regressions were introduced. This could take upward of an hour or two. In our new world, our QA team was able to run the services they needed locally, run their suite as a baseline, pull the Docker image containing the fix, re-run the suite locally, and complete the task in less than 20 minutes. That’s a 3-6x improvement for each bug fix. Not too shabby.
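In Compose terms, verifying a fix like this amounts to pointing the affected service at the image that contains the fix and re-running the suite; the image name and tag below are hypothetical.

    # Hypothetical override pinning one service to the image built for the fix
    version: "2"
    services:
      tagservice:
        image: registry.example.com/tagservice:bugfix-1234

A “docker-compose pull tagservice” followed by “docker-compose up -d tagservice” swaps in the fixed container, and the suite can be re-run against it immediately.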

We’re also already seeing some of the cost savings that we expected, along with increased server density, but more on that in our next and final blog, where we’ll share all of the stats from Project Sherpa.

We’re now reaching the final ascent phase where we can see the summit and, just like when climbing a real mountain, it’s time for us to dig deep and power through to the top.

To learn more about how we are using Docker, watch our on-demand webinar, Overcoming 5 Common Docker Challenges: How We Did It at RightScale, which covers:

  • Tips for securely producing high-quality, easy-to-manage Docker images
  • Options for multi-host networking between laptops and the cloud
  • Orchestrating Docker in production: service discovery, configuration, and auditability
  • Increasing host density with Docker
  • Dynamic monitoring and alerting for Docker containers

Watch the on-demand webinar.

This article was originally published on FierceDevOps.

Read More Articles on Migrating to Docker:
Migrating to Docker: Getting the Flywheel Turning
Migrating to Docker: Barely Controlled Chaos
Migrating to Docker: Why We’re Going All in at RightScale
Migrating to Docker: Halfway There
Migrating to Docker: The Final Results