RightScale Blog

Cloud Management Blog
RightScale 2014 State of the Cloud Report
Cloud Management Blog

Redundant MySQL Setup for Amazon EC2

In order to deploy websites and services onto Amazon EC2 everyone needs the same components, so we're building them. One of the most requested and most critical pieces is a good database setup, and MySQL is clearly the highest in demand. Not that a good PostgreSQL or Oracle setup wouldn't be of interest or wouldn't be equally possible, just that more people are asking us (and paying us) for MySQL. 

What we've built is a MySQL master/slave setup with backup to Amazon S3. The setup consists of one MySQL master server which is the primary database server used by the application. We assume it runs on its own EC2 instance but it could probably share the instance with the application. We install LVM (Linux volume manager) on the /mnt partition and place the database files there. We use LVM snapshots to back up the database to S3; this means that we get a consistent backup of the database files with only a sub-second hiccup to the database.

MySQL Master and Slave Setup

Snapshots for backup are actually more complicated than that. We have to acquire a read-lock on all tables, and this could block things if there is a long running query ahead of us. So there's a timeout and retry loop that needs to balance off locking up the database and getting the backup done.

Using the snapshot backup we set up a slave instance which then starts replicating in real time from the master. This means that all changes to the master are propagated with milliseconds of delay to the slave, so should the master instance fail, there's an up-to-date backup. On a master failure we promote the slave to master and set up a fresh slave. Note that in most databases the slave lags extremely little behind the master. The main situation where the slave starts lagging is when there is a lot of write activity going on in the master. Under heavy write load the slave is slower at applying the replication to its copy than the master on the same hardware because the slave uses only a single thread to apply all changes while the master has one thread per client connection, so it can overlap network communication, CPU processing, and disk I/O using multiple threads.

Periodic backups are taken off both the slave and master instances. There is little penalty for acquiring a read lock on the slave and performing the snapshot and subsequent backup, so it can be done every few minutes without any real impact (unless the slave has trouble keeping up as described above, in which case it's probably time to move to multiple slaves). We also take infrequent backups on the master, say once a day, in order to guard against any problems introduced by replication.

While the MySQL replication is well proven and used by many large sites in heavy production, there are failure scenarios. First of all, the application should use innodb tables exclusively because myisam tables are not transactional and have a number of scenarios where replication fails. Even with innodb tables failures are possible. For example, it is possible to write non-deterministic queries in SQL, and since MySQL uses logical replication the slave re-executes the query, and it may end up using a different execution order than the master, resulting in different data in the database. Ouch. One example is a create table with an auto-index key using a select from an existing table. The insertion order and hence the keys in the new table depend on the order in which the select is executed, and if it's executed in a different order in the slave from the master you will end up with an unusable slave DB! (Been there, done that, it still hurts.) Thus: do back up your master every now and then to be able to recover from such problems. (If you're paranoid, fire up an instance every few hours, load up a backup, and run a few consistency checks - it'll cost you less than a buck a day to ensure the database backup is good. That's cheap insurance.)

All the goodies described above are controlled through the RightScale web interface. Want a new slave? Just press the "set up slave" button! Want a backup? Press "backup" on the master or on the slave. The list of functions we have now includes:

  • launch database instance
  • restore from S3 backup and configure as master
  • configure as slave, using DB transfer from master for initial state
  • promote slave to master
  • backup to S3
  • daily backups to S3 from master
  • 10-minute backups to S3 from slave

We obviously still have a lot of work ahead of us to improve the flexibility of the setup. One thing to note is that you are in control of what is executed on the database servers, so they are not opaque virtual appliances. If you need to tweak our database install, slave setup, backup, or other code, it's all available in scripts that you can modify. (Of course the more you modify the less we can help when things go wrong.) Also, currently all these functions are "automated" in the sense that you make a decision, push the right button, and things happen. We are adding monitoring and we will add triggers that will cause master-slave failovers automatically.

If you are interested in using our MySQL master/slave set-up, please contact us at sales@rightscale.com. This stuff is not available with the free RightScale accounts.


Archived Comments

Pete Any plans on making this available as a feature you can pay for without signing up for one of those expensive packages?

Thorsten Pete, thanks for the interest. Obviously “expensive” is in the eye of the beholder. After having spent the days putting all this together, I would rate the price as really low. Together with the help you get to put it all into operation, it’s probably too low to make a profit. Maybe a bit down the line when we have it fully automated and cookie cutter we can reduce th price. When I started using EC2 I realized that it’s really easy to blow any savings away in sysadmin time. You end up launching and installing servers at an insane rate, because you can, and because it brings sooo many side-benefits. You can do things you didn’t even conceive of before. But someone or something has to install and manage all these servers. If it’s “someone” then that’s really expensive really fast. Hence “something” = RightScale. I hope that when you sum it up at the end of the month the sysadmin you didn’t need to hire thanks to RightScale saves you much more money than the check you write us. YMMV of course.

dennis I think the prices are great for established businesses, but I think that the smaller startups (who are doing the sysadmin stuff on their own time) would love to be able to launch some of the database stuff as part of the free account…

Paul Nice use of LVM for the snapshots, just be mindful that long running queries can cause the read lock to take ages to be in the place and have the database in a consistent state. Pete Zaitsev mentioned in one of his cookbook presentations that using FLUSH TABLE is a way to speed the process up. I am also reviewing the mysql-table-checksum from the MySQLtoolkit as a way to make sure the master and slave remain completely in sync. Interesting set of tools as well. Have Fun Paul

Thorsten Paul, thanks for the pointers. Yes, we understand that long running queries can delay the read lock. And if you’re not careful, your read lock request can block a pile of other queries behind it, so it’s not good to just sit there and wait for the read lock to go through. We use a relatively short timeout on the read lock and try again, and again.

Comments

[...] Redundant MySQL set-up for Amazon EC2 What we’ve built is a mysql master/slave set-up with backup to Amazon S3. The set-up consists of one mysql master server which is the primary database server used by the application. We assume it runs on its own EC2 instance but it could probably share the instance with the application. We install LVM (linux volume manager) on the /mnt partition and place the database files there. We use LVM snapshots to back up the database to S3, this means that we get a consistent backup of the database files with only a sub-second hiccup to the database. [...]
Hi there. There is good insight given in the article. Question I have has to to with rebooting. At any moment you can loose an instance which is fine. But Amazon, for maintenance may reboot the machine without you knowing exactly when this will happen. You may also want to reboot for server updates (though this would be scheduled). Let's say Amazon admins reboot or for some reason it reboots itself, I assume the slave detects the loss of master and will take over and potential initiate launch of a new slave. Can you explain what you are doing for a reboot situation when you don't want folks on the other end of your data seeing the app is dead. Obviously the original master will be coming back up also.
Posted by David (not verified)   Ι   June 14, 2008   Ι   06:40 AM
[...] Redundant MySQL set-up for Amazon EC2 « RightScale Blog What we’ve built is a mysql master/slave set-up with backup to Amazon S3. The set-up consists of one mysql master server which is the primary database server used by the application. We assume it runs on its own EC2 instance but it could probably share (tags: mysql replication s3 rightscale) [...]
David, good question. First of all, reboots are actually very rare. I suspect they have more to do with transient hardware failures (like double memory ECC errors) than AWS doing. We currently don't automatically promote the slave to master as this is a very delicate operation. Gotta ensure the master is really not servicing requests first, etc.
Hi all, Is it possible using rightscale's interface to implement, at the same time, a redundant MySQL Set-up and a replication of many slaves containing only a subset of the tables present in the master? Architecture: 1 master, 1 or + slave(s) replicated with all tables for redundant purpose 1 or + slave(s) replicated with a subset of the tables for performance purpose As you can guess, the application will be a 90%read-10%write. I think it may be possible using 3 different AMI and add/start/stop. Regards, Marcello
Posted by Marcello (not verified)   Ι   August 14, 2008   Ι   02:23 AM
It is definitely possible, but not something we provide out of the box. All the scripts for the replication are open and modifiable by our customers, so you'd have a good starting point. Some of the details are that we do an LVM snapshot followed by a copy of the files to initialize the slave. Unless you use distinct data files all InnoDB tables end up in one file and would get copied to all slaves. If that's acceptable, all you'd have to do is modify the set-up of each slave to only keep replicating the appropriate tables. If you want to avoid the initial replication of all the data then more work is required. Hope this helps.
[...] a persistent filesystem, developers on Amazon's cloud computing platform have had to come up with sophisticated solutions involving backing data up manually from EC2 to S3 to get the desired [...]
Nitin: with EBS you still want a redundant set-up, it's by far the fastest way to switch to a fresh instance if the current one has a problem. Also, if something bad happens within the availability zone, you may not be able to get to your volume... Better have a slave running in a 2nd zone!
Is that still necessary with persistent block storage?
Posted by Nitin (not verified)   Ι   October 27, 2008   Ι   10:52 AM
[...] MySQL master-slave setup for EC2 – but “We currently don’t automatically promote the slave to master as this is a very delicate operation. Gotta ensure the master is really not servicing requests first, etc.” [...]

Post a comment