The first day of the RightScale Compute 2013 conference was packed with information, but one day wasn’t enough time for us to share all the information we brought about cloud strategies and futures. Two days wasn’t enough either, but that’s all we booked the conference center for. Here’s a rundown of Friday’s agenda for Day Two.
Friday began with a customer spotlight (watch the video) that featured three prominent RightScale customers. Joe Emison, CTO of BuildFax, began the session by talking about setting up a hybrid cloud environment and optimizing storage and networking while using both CloudStack and AWS. BuildFax uses five ephemeral virtual machines for processing incoming data sets, and kills them when each set is done. Its original cloud deployment was on AWS, but the bill was higher than the company could afford. BuildFax wanted the high availability and disaster recovery benefits of a hybrid cloud architecture, and wanted to use some commodity hardware it already had on hand. The company knew its hardware would fail from time to time, so it needed to be able to fail back to AWS as needed.
BuildFax implemented a hybrid cloud using AWS, CloudStack, and AWS Storage Gateway. Data gets stored in S3 and the company pulls EBS volumes off as needed, and RightScale ServerTemplates™ with Chef are used to manage it all. The company was able to cut its EC2 bill in half, and Emison thinks it can cut it more by running more variable loads in the private cloud. It has a better business continuity plan because it is not solely reliant on AWS US East, and if the company’s own office building burns down, it can be back up in 20 minutes. “AWS is essentially a warm backup that only costs us $200 a month.”
Key quote: “Multi-cloud is about people cost-savings as well, because you don’t need people at 4 a.m. to get your systems back up if you go down.”
Next, Roy Ellis, principal QA engineer at Progress Software, talked about automating deployments for the Progress Arcade SaaS platform using the RightScale API and tags. Progress created its own vending machine to “help our application partners to SaaS.” He said Progress has 2,000 customers, of which about 150 use cloud. The company maintains 250 accounts on RightScale that it uses for customers’ deployments. It uses RightScale Tags to filter owners of projects and create “doors.” Tags share IP addresses among deployments. They use RightScripts™ and customize them.
Key quote: “Shame on AWS for going down, but shame on you for not having a backup.”
Finally, Chris Henry, head of technology at Behance, an Adobe company, talked about how his company optimized its infrastructure by melding together physical hardware and cloud infrastructure. Behance is a portfolio site and social network for creative professionals. It gets more than 51 million page views per month, mostly from creative professionals uploading images. Yet only two people run its infrastructure, thanks to the company’s use of Rackspace as a hosting provider. Behance was the first RackConnect customer, and uses Rackspace public and private clouds with RackConnect to connect them, which yields very low latency conditions between the two. It has spun up HA servers for image hosting in a self-healing cluster that is always up. It uses MongoDB to run a Facebook-style activity feed - “you can scale it right out of the box and set up shards and write to them at the same time” - and found that as writes were happening performance was not degraded. RightScale helped Behance to iterate the ServerTemplates they use to power their site.
Key quote: “You have to engineer everything you write for failure.”
RightScale CEO Michael Crandell and RightScale CTO Thorsten von Eicken then took the microphone to preview some developments to expect from RightScale this coming year, including:
- Enhancement of the RightScale API
- New development around workflow automation
- Improving existing cost forecasting tools
- New services offerings: “we are very hands-on with customers who need onboarding and strategy consulting”
- Building out the multi-cloud landscape and giving customers more choice
After a break we moved on to the day’s breakout sessions.
Deployment Checkup: How to Regularly Tune Your Cloud Environment
In this session (watch the video), RightScale Senior Services Architect Brian Adler shared how RightScale helps customers perform deployment checks, and offered tips for people who want to do their own deployment checks to save money and improve performance. “‘If it ain’t broke, don’t fix it’ can be a bad decision,” he said.
Deployment checks should focus on ways to improve cost optimization, server utilization, high availability and disaster recovery (HA/DR), security, and best practices. To optimize costs, look for unused and unneeded resources. For instance, if you have EBS volumes you haven’t had attached in a long time, you probably don’t need them. Be realistic about how long you really need to hold onto EBS snapshots. Watch out for elastic IPs - you pay for inactive EIPs on AWS. Increase storage capacity only when you need it. Don’t over-buy or over-allocate, and use Recipe to allocate as you need it. Compress or eliminate cross-region and cross-cloud bandwidth.
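The hunt for unused resources can be sketched in a few lines of Python. This is only an illustration working on plain dictionaries, not a call into any cloud API: the field names (`attached_to`, `last_attached`, `instance`), the 30-day threshold, and the sample data are all assumptions made up for the example.

```python
from datetime import datetime, timedelta


def find_idle_resources(volumes, elastic_ips, now, max_detached_days=30):
    """Return (volume ids, IP addresses) that look unused and worth reviewing."""
    cutoff = now - timedelta(days=max_detached_days)
    # A volume is a candidate if it is detached and was last attached before the cutoff.
    idle_volumes = [
        v["id"] for v in volumes
        if v["attached_to"] is None and v["last_attached"] < cutoff
    ]
    # An elastic IP with no instance behind it still costs money.
    idle_ips = [ip["address"] for ip in elastic_ips if ip["instance"] is None]
    return idle_volumes, idle_ips


# Made-up inventory; the date is arbitrary.
now = datetime(2013, 6, 1)
volumes = [
    {"id": "vol-a", "attached_to": "i-1", "last_attached": datetime(2013, 5, 31)},
    {"id": "vol-b", "attached_to": None, "last_attached": datetime(2012, 11, 1)},
    {"id": "vol-c", "attached_to": None, "last_attached": datetime(2013, 5, 20)},
]
elastic_ips = [
    {"address": "203.0.113.10", "instance": "i-1"},
    {"address": "203.0.113.11", "instance": None},
]

idle_volumes, idle_ips = find_idle_resources(volumes, elastic_ips, now)
print(idle_volumes, idle_ips)
```

In practice you would feed a report like this from your cloud provider's inventory and review the candidates by hand before deleting anything.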
You may find the greatest savings by tuning your server utilization, Adler said. Utilize Reserved Instances - PlanForCloud.com can help you figure this out. And choose the right size instance for the task at hand. Find the right fit for memory - you want to be running at 70 to 80 percent of memory consumed. Use monitoring and alerts to find small problems before they grow. Look for trends, not spikes, and look for under-utilization. Act conservatively and react early.
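One way to read the "trends, not spikes" advice is to judge an instance by its average memory use over a period instead of by any single sample. The sketch below assumes memory readings arrive as percentages and uses the 70 to 80 percent band mentioned above; the function name and labels are invented for illustration.

```python
from statistics import mean


def classify_memory_use(samples, low=70.0, high=80.0):
    """Classify average memory use (in percent) against a target band."""
    average = mean(samples)
    if average < low:
        return "under-utilized"
    if average > high:
        return "over-utilized"
    return "right-sized"


# A single spike to 95 percent does not hide the fact that this server is mostly idle.
print(classify_memory_use([40, 42, 95, 41, 43]))
print(classify_memory_use([72, 75, 78, 74]))
print(classify_memory_use([85, 90, 88]))
```

A plain average is the simplest possible trend measure; a real monitoring setup would more likely use a moving window or percentiles.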
Also consider HA/DR. Is your provider set up to absorb regions dying? Avoid single points of failure, because if you have one, you aren’t HA. Spread the load by using multiple load balancers. Replicate data across availability zones, distribute servers in each tier across multiple AZs, and back up across regions and clouds for DR and failover. Set up monitoring and alerts ahead of time.
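A quick way to spot one kind of single point of failure is to check whether every tier has servers in at least two availability zones. The following sketch assumes a simple inventory of dictionaries with hypothetical `tier` and `zone` fields; the zone names are just sample values.

```python
from collections import defaultdict


def single_zone_tiers(servers):
    """Return the tiers whose servers all sit in one availability zone."""
    zones_by_tier = defaultdict(set)
    for server in servers:
        zones_by_tier[server["tier"]].add(server["zone"])
    return sorted(tier for tier, zones in zones_by_tier.items() if len(zones) < 2)


# Made-up deployment: the database tier lives in a single zone.
servers = [
    {"name": "lb-1", "tier": "load-balancer", "zone": "us-east-1a"},
    {"name": "lb-2", "tier": "load-balancer", "zone": "us-east-1b"},
    {"name": "app-1", "tier": "app", "zone": "us-east-1a"},
    {"name": "app-2", "tier": "app", "zone": "us-east-1b"},
    {"name": "db-1", "tier": "database", "zone": "us-east-1a"},
    {"name": "db-2", "tier": "database", "zone": "us-east-1a"},
]

print(single_zone_tiers(servers))
```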
As for security, you should use security groups if they’re provided by your cloud. Make sure you know why ports are open in your firewall and who is using them.
On the topic of best practices, have things you need automated and ready to go, and when necessary trigger them manually. We don’t recommend image bundling, Adler said, unless - and this is rare - you need to do a manual install of software and the resulting boot time would be too slow to respond to a dynamic event. Also:
- Use a ServerTemplate with a base RightImage™, and configure your server at boot time.
- Use EBS-backed images to speed boot times if you must, but remember you pay for storage for these each month.
- Use deployments as application containers.
- Set up naming conventions that make sense for your usage and stick with them.
- Commit your ServerTemplates - don’t use them from HEAD.
- Use CREDS for all sensitive inputs.
- Use auto-scaling in combination with right-sized instances for each task at hand.
- Automate all operations - allow no manual changes.
In closing, best practices promote operational efficiencies, and you should design in HA/DR from the start.
Understanding Virtual Networking in the Cloud
Josep Blanquer, RightScale’s chief architect, began this session by saying, “Networking is messy, even in the cloud.” Different cloud providers pick different designs based on what they think you want to manage, and use different naming conventions and API semantics between clouds. Cloud software can be so heavily customized during installation that even for the same cloud you can have different implementations across zones without knowing it.
So, instead of grooming an army of experts on cloud networking, let others do that for you: “Maintain control without having to be bogged down with non-business details.” Blanquer then talked about differences in how AWS, Google Compute Engine, and CloudStack handle networking, which you can best see by watching the video.
RightScale helps you manage cloud networking by abstracting network resources such as:
- IP address bindings
- Security groups
- Network ACLs
- Routing tables
From a RightScale perspective a cloud has multiple networks. A network defines an isolation perimeter and has a CIDR block, and subnets further segment networks into CIDR subblocks. Networks contain security groups, routing tables, and network ACLs, but “notice a conspicuous absence of IP addresses”; we use IP address bindings, which are a combination of instance + IP address + port, and enable multi-cloud traffic through abstraction. Using RightScale you can see the entire network through a single pane of glass. Coming soon we will be creating synthetic resources to fully abstract the networking process so you don’t have to think “security groups” or “firewall” or whatever - you will be able to specify all server configuration in a consistent way.
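To make the network, subnet, and binding vocabulary concrete, here is a small sketch using Python's standard `ipaddress` module. It is not RightScale's data model: the `IpAddressBinding` class, the `subnet_for` helper, and the 10.0.0.0/16 address range are assumptions chosen for illustration.

```python
import ipaddress
from dataclasses import dataclass

# A network is an isolation perimeter with a CIDR block...
network = ipaddress.ip_network("10.0.0.0/16")
# ...and subnets carve that block into smaller CIDR sub-blocks (four /18s here).
subnets = list(network.subnets(new_prefix=18))


@dataclass(frozen=True)
class IpAddressBinding:
    """Hypothetical binding: instance + IP address + port."""
    instance: str
    ip_address: ipaddress.IPv4Address
    port: int


def subnet_for(binding, subnets):
    """Return the subnet containing the binding's address, or None."""
    for subnet in subnets:
        if binding.ip_address in subnet:
            return subnet
    return None


binding = IpAddressBinding("web-1", ipaddress.ip_address("10.0.70.5"), 443)
print([str(s) for s in subnets])
print(subnet_for(binding, subnets))
```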
Delivering SaaS Using IaaS
Senior Product Manager Shivan Bindal presented an overview of the RightScale feature set and described its functionality at a high level (watch the video).
Our customers vary in experience and cloud maturity, he said, so we help them adopt cloud in a way that makes sense for them. User permissioning and accounts enable organizations to have organized environments. Many customers use development sandboxes and staging and production environments.
Eating our own dog food helps: Every employee at RightScale can log in to RightScale and access the cloud to do the development work we need them to do in order to ship product. Developers have access to test accounts where they can create deployments and servers as needed. QA has control of staging accounts where they do rolling code updates from developers and run automated regression and manual functional tests. Operations staff have access to a separate environment with production accounts that run RightScale on RightScale.
But what about when you have a customer who wants to take a legacy environment and launch it in the cloud? That’s the time for “SaaS-ification” - instantiating deployments with single-tenant application stacks per customer. Not economical? Cost efficiency is irrelevant if you price your app appropriately, but such a process requires ingenuity for operational management, taking into consideration the vast number of potential customers and potential application stack upgrades.
Of course complexity exists in cloud - but where do you want it to exist? It should be in your app, not in the management of it. Don’t push it up into the stack.
Once you get up and running in production, then you can optimize production deployments. One RightScale customer chose a unique way to do this. It used AWS DirectConnect to have low latency between an AWS public cloud and a private cloud data center that was geographically close to an AWS region, and it had the same app running in two separate cloud environments.
Bindal introduced RightScale customer Chris Szymansky, director of engineering at The Resumator, who recommended organizations moving to the cloud “iterate on your architecture, don’t perfect it.” He also suggested:
- For disaster recovery, have a passive region ready to go and replicate to it as needed.
- Set triggers for scaling.
- Know when to shed old technology.
- On that note, use MySQL when you need it, but also use tools such as Redis and Solr when it makes more sense.
PHP, Mobile, and RightScale: The Right Way to Do Mobile
Kent Mitchell, senior director of product management at Zend Technologies, led a session on PHP and mobile. He said that PHP runs 40 percent of the cloud, that 72 percent of PHP developers use cloud-based services and APIs, and that 66 percent of PHP developers expect to work on mobile apps in 2013.
PHP excels at interfacing with other systems, and mobile apps need an agile approach since they are updated all the time. You need a system to allow you to do this, and that is where cloud and RightScale come in.
Mitchell demoed Zend’s suite of mobile solutions, as you can see on the video. Zend uses the Apache Cordova PhoneGap emulator for mobile development. Zend Studio builds in the ability to deploy right to RightScale, and single billing through RightScale is available for true utility-based consumption - all of the tools necessary to build a mobile app and deploy it on a scalable platform.
Operations Playbook: Monitoring and Automation
Our Chris Deutsch and Raphael Simon bragged a little about our RightScale Operations staff - 7 people managing more than 700 cloud servers across 5 continents. One factor that helps them do their job is that RightScale runs on RightScale.
A demo, which you can see in the video, shows how we do all monitoring using collectd, for which we’ve written custom plugins, including ones to get information from Cassandra databases. How do we get these custom plugins into our deployment? By using a RightScript to attach the plugin. Pro tip: Have a boot script that auto-tags your instances. It makes automation much easier.
RightScale makes heavy use of automation APIs to deploy upgrades and manage servers, among many other things. We created a tool called Chimp - a Ruby gem for running commands on servers managed by the RightScale platform - and we’ve released it as open source software. We use Chimp to do rolling upgrades of application servers, since they can’t all go down together. Chimp takes options for controlling concurrency so that we can upgrade in waves of a set number, and it has a “dry run” mode that does everything in sequence except actually run the script.
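The idea of upgrading in waves with a dry-run option can be sketched independently of any tool. The Python below is not Chimp and does not use its interface; the function names, the `run_script` callback, and the server names are all invented for the example, and servers within a wave are handled one after another here for simplicity, where a real tool would run them concurrently.

```python
def plan_waves(servers, concurrency):
    """Split the server list into consecutive waves of at most `concurrency` servers."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return [servers[i:i + concurrency] for i in range(0, len(servers), concurrency)]


def rolling_upgrade(servers, run_script, concurrency=2, dry_run=False):
    """Upgrade servers wave by wave; in dry-run mode only report what would happen."""
    waves = plan_waves(servers, concurrency)
    for number, wave in enumerate(waves, start=1):
        for server in wave:
            if dry_run:
                print(f"[dry run] wave {number}: would upgrade {server}")
            else:
                run_script(server)
    return waves


upgraded = []  # stands in for the real upgrade action
app_servers = ["app-1", "app-2", "app-3", "app-4", "app-5"]
waves = rolling_upgrade(app_servers, upgraded.append, concurrency=2, dry_run=True)
print(waves)
```

Because each wave finishes before the next one starts, at most `concurrency` servers are out of rotation at any moment.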
HA and Fault Tolerance: AWS + RightScale
Amazon Web Services Solutions Architect Miles Ward talked about architecting in high availability and fault tolerance using AWS and RightScale. He noted that cloud can fail from faults in facilities, hardware, networking, code, and people, and defined fault tolerance as the ability of a system to continue operating properly, though perhaps at a degraded level.
Ward said fault-tolerance is not binary - there are degrees of risk mitigation. By using the cloud, you have the advantage of no up-front capital expense, relatively low cost, and a self-service infrastructure that’s easy to scale up and down, all of which lead to improved agility and time to market.
Many of AWS’ services are fault-tolerant - you can see them enumerated if you watch the video.
Ward talked about a recovery time objective (RTO) - a time period in which service must be restored to meet business continuity planning objectives - and a recovery point objective (RPO) - an acceptable data loss as a result of recovering from a disaster or catastrophic event. The goal is to figure out the best RTO/RPO ratio, and in that decision, cost is a huge factor.
Application owners are ultimately responsible for availability and recoverability. They must balance the cost and complexity of HA efforts against the risks they are willing to bear.
Best practices for HA that Ward suggested include avoiding single points of failure, using two availability zones, replicating data across AZs and backing up and replicating across regions for failover and disaster recovery, and setting up monitoring and alerts for problem resolution and failover operations. He advocated designing for failure: Use DNS to support multiple load balancers that send traffic to multiple app servers that use a replicated master/slave database setup that is backed up by S3 spread across two AZs. Use Elastic Network Interfaces (ENI) in a Virtual Private Cloud (VPC). Consider distributed NoSQL databases with the same distribution considerations described above.
To mitigate risks, assess each application and define RTO and RPO. Design for failure, starting with the application architecture. When you implement, consider best practices and factor in cost, complexity, and risk. Document your processes and automation (operations). Test frequently.
Performance: Key Elements to Consider in the Cloud
Craig Irwin is vice president for Channel and Alliances at Apica, a company that has been a RightScale partner for the last two years. He talked about how to proactively identify bottlenecks, improve performance, and optimize the cloud environment (watch the video). Part of improving performance is preparing for the unexpected and knowing how to respond, because when things go wrong, a snowball effect makes more things go wrong.
Irwin offered a number of tips:
- Have “minimalistic start and landing pages,” because small is fast.
- Make extensive use of front-end cache systems. He suggests the open source Varnish caching application.
- Implement a scaling and queuing system. Redirect excess traffic using load balancers. And for those users who wind up waiting anyway, create informative “waiting” pages.
- Test your solution for peak loads before launch.
Just what should you test?
- Is the site stable?
- When does it crash?
- Can my application scale?
To test your capacity against load, create a test environment, and use it. Then load and check your test findings. Identify the back-end calls - you’re liable to find that database calls don’t kill your application, but lack of caching does. Check the delivery of static content, too.
Have performance and uptime targets. Establish a baseline and a response time average, but don’t optimize the average, work with the exceptions. Remove the 10 worst transactions every month.
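Working with the exceptions instead of the average might look like the sketch below. It assumes you already have a response time per transaction in milliseconds; the transaction names and numbers are made up, and the example lists the two worst offenders only to keep the sample data short.

```python
from statistics import mean


def worst_transactions(response_times_ms, count=10):
    """Return the `count` slowest (transaction, milliseconds) pairs, slowest first."""
    ranked = sorted(response_times_ms.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


# Made-up measurements in milliseconds.
response_times_ms = {"home": 200, "search": 1900, "checkout": 4500, "login": 400}

baseline = mean(response_times_ms.values())  # the average you track but do not optimize
worst = worst_transactions(response_times_ms, count=2)
print(baseline, worst)
```

The average gives you the baseline; the ranked list tells you where to spend this month's tuning effort.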
Outage Proof Your Applications
This session was jointly led by RightScale’s Brian Adler and Sanket Naik, vice president of cloud operations at Coupa, which provides a suite of cloud-based financial applications. Naik said that Superstorm Sandy helped drive Coupa’s plans for HA/DR. Because it needed “zero data loss,” Coupa went with a warm DR 99.99 percent uptime plan, which translates to no more than about 4.32 minutes down in a 30-day month.
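As a rough check on what an uptime target allows, the arithmetic is simple enough to script. The sketch below assumes a 30-day month; the exact figure shifts slightly with the month length you pick, and the function name is just for illustration.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime permitted over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime allows {allowed_downtime_minutes(target):.2f} minutes per 30 days")
```

For a 30-day month there are 43,200 minutes, so a 99.99 percent target leaves 0.01 percent of that, or about 4.32 minutes.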
Some of the material in this session covered the same ground as that of the HA and Fault Tolerance session outlined earlier. I’m going to forgo repeating that information and suggest you watch the video.
When it comes to disaster recovery, make sure you know who will do what to get yourself back up and running. Don’t have people thinking “someone else” is doing a necessary task. Develop expertise in-house or get outside help.
There are several levels of disaster recovery, each with increasing costs. Multi-region cold DR, the cheapest, is not recommended for organizations that require rapid recovery. Warm DR is the generally recommended solution because it combines minimal costs with fairly rapid recovery, but because it employs the public Internet, security and latency may be considerations. Hot DR is warm DR with all the backup servers kept running. As you might imagine, this comes at a high cost, but it allows for rapid recovery. The highest tier is multi-cloud HA, a live/live scenario in which traffic goes in all directions at a massive cost. If you can afford this, we’d like to talk to you about sponsoring RightScale Compute 2014.
HIPAA in the Public Cloud: The Rules Have Been Set
“HIPAA compliance in public cloud is achievable - don’t let anyone tell you otherwise,” said RightScale’s Phil Cox in this session. However, you must pay attention to privacy, security, and breach notification.
For security, you need to maintain reasonable and appropriate administrative, technical, and physical safeguards on electronic protected health information (ePHI). You should therefore perform a risk analysis and implement governance practices, defined staff roles, access management, training and awareness, and a program review.
Cox explained the Omnibus Rule change and discussed what qualifies a company as a business associate vs. a conduit (watch the video). Business associates are also subject to the Omnibus Rule. “This is the kicker,” Cox said. “A service like Dropbox that stores persistent data becomes a business associate in the eyes of Health and Human Services, so they are required to be HIPAA-compliant even if they never intend to view that sensitive information or only do so on a random or infrequent basis.”
Business associates need to sign a Business Associate Agreement (BAA). Not all public cloud providers will:
- Azure will
- Datapipe may on a case-by-case basis
- AWS has made no public statement
- Google Compute Engine, Rackspace, and SoftLayer currently will not
And RightScale? Cox said, “I believe in the security of RightScale enough that I personally would sign a BAA” if it were a customer requirement.
DevOps Stories: Getting to Agile
Uri Budnik, RightScale's cloud evangelist, and Arindam Mukherjee, senior manager of DevOps at Blackhawk Network, a provider of prepaid and financial payments products, led this session on using DevOps and agile development coupled with the cloud’s ability to provide infrastructure (watch the video).
Mukherjee said Blackhawk employs 700 people, of whom 300 are engineers and 40 are IT people. What do his developers - and all developers - want, he asked? The freedom to work anywhere with the tools they choose. Therefore, the first step to set up DevOps is to create a self-service portal for developers. By using RightScale ServerTemplates, Blackhawk has taken the process of deploying environments from two months down to 20 minutes. The company practices continuous integration - check in code, kick off a build, and the build script will fire up servers if they are not up. That kicks off a smoke test, then creates a deployment and deploys the built app to that environment. They run unit tests, then shut down the deployment.
Of course with that kind of freedom to create environments you now have to track costs and plan ahead. PlanForCloud.com can help with that. For DevOps, how you bring the organization together is critical.
At Blackhawk, the IT staff joined the dev team’s scrum. Lessons learned:
- Take ownership of applications.
- Embed ops people into the development process.
- Enable developers to self-provision environments.
- The DevOps philosophy and RightScale ServerTemplates can simplify application lifecycle management.
- Create a dashboard for production operation tasks.
- Surface cost information to people who manage budgets.
- Think about how to architect for the cloud, where adding more infrastructure is no longer a bottleneck.
Marketing at Scale: Delivering Marketing Campaigns in Record Time
RightScale Vice President of Marketing Kim Weins began her presentation by citing Gartner’s prediction that CMO spending will pass CIO spending on technology within five years. Because marketing departments need to roll out mobile apps, websites and other initiatives quickly to hit campaign launch dates, they need the ability to move on a dime. In a sense, marketing is like a rogue IT department - with its large budgets, it will hire someone to build what it needs.
Because marketing departments want technology that’s scalable, fast to market, and cost effective, the cloud and RightScale are a natural fit for their use. With the cloud and RightScale, you get instant provisioning - push-button access to “cloud in a click” - and developers are able to self-provision. You can auto-scale both up and down - for instance, you can set minimum and maximum numbers of servers and pre-provision ahead of known usage spikes. You can design just the environment you need, determine how it scales, and predict your costs over time.
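The minimum and maximum bounds on auto-scaling amount to clamping the number of servers you would otherwise ask for. Here is a minimal sketch of that idea, not RightScale's scaling logic: the function name, the load units, and the per-server capacity figure are assumptions for the example.

```python
import math


def desired_server_count(current_load, capacity_per_server, minimum, maximum):
    """Servers needed for the load, kept within the configured minimum and maximum."""
    needed = math.ceil(current_load / capacity_per_server)
    return max(minimum, min(maximum, needed))


# Quiet period: the minimum keeps two servers running.
print(desired_server_count(50, 100, minimum=2, maximum=10))
# Busy period: scale to what the load needs.
print(desired_server_count(450, 100, minimum=2, maximum=10))
# Runaway load: the maximum caps the bill.
print(desired_server_count(5000, 100, minimum=2, maximum=10))
```

Pre-provisioning ahead of a known spike then comes down to raising `minimum` before the campaign launches and lowering it again afterward.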
RightScale is the technology behind successful digital campaigns from well-known brands, and Weins presented details of hugely scalable marketing campaigns by Lady Gaga and Mars (the candy maker, not John Carter’s vacation spot), as well as ongoing projects at Mattel, InterContinental Hotels, and Sony Music, which you can see in the video.
RightScale can be particularly helpful on the services side. We have white-glove campaign launch services - ask us (well, pay us) and we’ll do it all for you, your marketing team, or your agency. We’ll set up a dev/test/production environment for you and support you for a defined period leading up to and through your campaign.
As was the case yesterday, I don’t have recaps of every session, but you can watch the videos for Using the Cloud for Mobile, Social, and Games, Connecting the Clouds, IT-as-a-Service: Back to the Future, and Uncovering New Opportunities with the HP Public Cloud. Friday at 3:30, RightScale Compute 2013 wrapped up, and our customers, partners, and staff headed home. If you attended the event, we’d like to know what you thought of it - what could we do better next year? And if you couldn’t make it, tell us in the blog comments section here how you liked our coverage of the conference via the blog and by video.