TvE 2100

At 2100 feet above Santa Barbara

How to Enable Logfile Collection and Browsing

Logfile collection and browsing will soon be available for EC2 instances through the RightScale Dashboard. This is a placeholder for information on how to enable it for your instances. Stay tuned…

Launch Your Rails App in Minutes on Amazon EC2 Using RightScale

I previously showed off our Rails-with-Mephisto server template, so now it’s time for the Rails All-In-One server template, which makes it very easy to get your own Rails application running on Amazon EC2. We’ve made this template public, so any RightScale user can use it for free!

Let me show how you can configure and deploy your own Rails app in just minutes by using our publicly available Rails All-In-One server template.

Before we begin…

Before jumping into configuring the server template to launch your app, there are two important choices that we need to make: “How will we deploy application code to the server?” and “How will we initialize the database?”

Well, there are multiple ways in which we can proceed on either question, and the good news is that all are possible using RightScripts and we support several options with the templates we provide. Here’s what our Rails-all-in-one template supports:

  • Application code:
      • Download and install your Rails application code from a tarball in S3.
      • Check out your Rails application from an external svn repository.
      • Boot with capistrano already set up, and remotely deploy your app with ‘cap deploy’.

  • DB initialization:
      • Download and apply a mysqldump file from S3.
      • Run a rake task to initialize and/or populate the DB.
      • Boot with an empty DB.

Let’s start with the option of downloading the code and a mysqldump from S3, because these methods are supported out-of-the-box with our template. The example below focuses specifically on this option and I’ll follow-up shortly with a post describing the other options.

Step 1- Prepare your data

The only thing that you need to do in this step is to tar up your Rails application at its base directory, take a mysqldump of the database, and store these two files somewhere accessible in S3. For example, to tar up my Rails application on my development machine I could do something like:

# cd /home/rails/myapp
# tar czf /tmp/myapp.tgz .

To generate a file with the initial DB contents, I would first populate or bootstrap the DB with the data I want to start with, and then copy the resulting file to S3. For example, to generate a dump of the “myapp_production” database I could do something like:

# suffix=`date "+%Y%m%d%H%M%S"`
# mysqldump --single-transaction -u root myapp_production | gzip -c > /tmp/myapp_prod_dump-$suffix.gz
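Before uploading, it can be worth a quick check that both files are readable. Here is a minimal sketch, assuming standard tar and gzip on the development machine; the helper name `verify_artifacts` is made up for illustration and is not part of the template:

```shell
# Hypothetical helper (not part of the RightScale template): check that the
# code tarball and the gzipped dump are readable before uploading them to S3.
verify_artifacts() {
  local tarball="$1" dump="$2"
  # 'tar tzf' lists the archive contents without extracting anything.
  tar tzf "$tarball" > /dev/null || { echo "bad tarball: $tarball" >&2; return 1; }
  # 'gzip -t' tests the integrity of the compressed file.
  gzip -t "$dump" || { echo "bad dump: $dump" >&2; return 1; }
  echo "ok"
}
```

With the files from the example, this would be called as `verify_artifacts /tmp/myapp.tgz /tmp/myapp_prod_dump-$suffix.gz`.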

Having generated the code tarball and the mysqldump files, we just need to upload them to S3. There are a million and one ways to do that, so everyone can use their tool of choice. For convenience, the RightScale site contains an integrated S3 browser and file manager so that no external tools are necessary to perform these tasks. After logging into RightScale, go to “Manage > Storage”, select the bucket in which to put the files (or create one if you don’t have any) and use the form at the bottom of the page to upload the 2 files.

For example, if I wanted to store the files I generated (i.e., “myapp.tgz” and “myapp_prod_dump-20070913224229.gz” ) into the ‘bucket_for_myapp’ bucket, the page showing the listing of the bucket would look (approximately) like:

S3 browser view

Step 2- Configure your template

Now that we have the two files ready in S3, it’s time to configure the template for our Rails application. This is done in “Design > Server Templates”, where we locate the “Rails all-in-one” template in the “Premium/Public” section, and click on the “edit” icon next to it. The edit page looks like this:

server template before edit

As you can see, there are a few variables that you can use to customize the server to your taste. However, we’ve pre-set most of them to match common setups. The net effect is that you can probably get by with filling out only 6 text boxes.

Before I continue with the example of my Rails application called “myapp”, let’s address some security pre-requisites. When you try this with your own RightScale account, you need to make sure that you have an “ssh key” and select an appropriate “security group”. Since you are configuring a web server you must allow traffic to port 80 in the selected security group. If you have never done that before, please refer to the “quick troubleshooting” in our blog entry for reference. Once you’ve selected a valid ssh key and security group, you can proceed to fill out the main template configuration.

Back to the example! There are only 4 things that I need to do:

  1. Fill in APPLICATION with the name of my Rails app: “myapp”
  2. Fill in APPLICATION_CODE_BUCKET and APPLICATION_CODE_PACKAGE with the location in S3 where I have stored the code tarball. That’s “bucket_for_myapp” and “myapp.tgz” respectively
  3. Fill in the DB_SCHEMA_NAME with the name of the mysql database my application needs to use: “myapp_production”
  4. Fill in the DB_MYSQLDUMP_BUCKET and DB_MYSQLDUMP_PREFIX with the location in S3 where I have stored the mysqldump file: “bucket_for_myapp” and “myapp_prod_dump-20070913224229.gz”

Here we are ready to hit the save button:

server template ready to save

Step 3 - wait, there is no step 3!

That’s right, since this is a simple example I just need to click on the launch button, sit back and let the RightScale infrastructure take care of the rest:

server launch

But what happens if my cool application needs some special gems to be installed in order to run? No problem, it’s as simple as editing the template again and filling in the OPT_GEMS_LIST with a space-separated list of the gems needed. (Currently the RightScript that installs these custom gems will not handle gem installations that require manual intervention, for example, gems in which one must select the appropriate architecture to install.)

Can’t wait for the server to boot, even though it takes only a couple of minutes. Less than the time to get on the phone with a traditional hosting company sales person and place an order… Soon the server will become “operational” in the taskbar on the left of the screen. Clicking on the instance name brings up the instance information page from which the “dns name” link (something like ec2-67-00-0-00.z-1.compute-1.amazonaws.com) leads straight to the web site on the instance. Tada! my application is live!

An interesting perk of using this template is that the database gets automatically backed up to S3 every night. The backup is tagged with a timestamp and stored in the bucket configured in the template (i.e., the DB_MYSQLDUMP_BUCKET and DB_MYSQLDUMP_PREFIX variables). Also, the next time the template is launched, the initial DB contents will be automatically restored from the latest of the mysqldump backups that match the configured prefix (i.e., the file that matches the prefix and has the highest numeric suffix).
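To make the “highest suffix wins” rule concrete, here is a rough sketch of how such a selection could be done over a bucket listing. It is not the template’s actual restore script: the function name `latest_dump` is hypothetical, and it assumes object names of the form `<prefix>-<digits>.gz` and a prefix without regular-expression special characters.

```shell
# Hypothetical helper: read object names (one per line) on stdin and print the
# one that matches "<prefix>-<digits>.gz" and has the highest numeric suffix.
latest_dump() {
  local prefix="$1"
  # Emit "<digits> <name>" for matching names, sort numerically on the digits,
  # keep the last (largest) entry and print only the name.
  sed -n "s/^\(${prefix}-\([0-9][0-9]*\)\.gz\)\$/\2 \1/p" \
    | sort -n | tail -n 1 | cut -d' ' -f2
}
```

Fed a listing containing several “myapp_prod_dump-…” files plus unrelated objects, it prints only the newest matching dump and ignores everything else.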

As you see, deploying your Rails application doesn’t take much at all when using the RightScale infrastructure and the new Rails all-in-one template. Once you’ve done it once, it would probably take less time to configure a brand new application than to read this blog entry. In a follow-up article I’ll explore some of the details of the provided server templates and RightScripts and will show how to take advantage of them to fully customize your Rails application, however non-standard and unique it is.

If you’re interested in learning more about RightScale, please contact us at sales@rightscale.com. The Rails all-in-one server template is available with RightScale’s free accounts, but more complex set-ups are reserved for the premium accounts.

How to Enable Monitoring for EC2 Instances

Monitoring will soon be available for EC2 instances through the RightScale Dashboard. This is a placeholder for information on how to enable it for your instances. Stay tuned…

Setting Up Site on EC2 With RightScale

The key to a successful site setup on Amazon EC2 is scalability and redundancy. RightScale makes this easy by providing server templates and multi-server deployments. To get started, let’s take the simplest case: a single server set-up. We have a free “Rails all-in-one” server template that is excellent not just to play around with, but also to use as a development server, a staging server, or even as a production server for small sites that don’t need more horsepower or much redundancy.

All-in-one site

Our Rails all-in-one is described in more detail elsewhere, but you see on the right what’s involved: it runs Apache as a reverse proxy in front of 4 Mongrel/Rails processes all backed by a simple MySQL installation. Last, but not least, we set-up cron jobs that run a mysqldump every 10 minutes to Amazon S3 so you have your data safe in case the instance dies unexpectedly. Apache in the front can be set-up to serve up static and cached pages, it can do HTTP and/or HTTPS, it can canonicalize the hostname (e.g. redirect http://mysite.com to http://www.mysite.com), and it can serve up a maintenance page while you’re updating your app. Oh, of course Apache load balances across the 4 Mongrels too!

Redundant site

Ready for more? You’re almost ready to launch for real, you expect some traffic soon and don’t want to be reliant on a single server anymore. Time to upgrade to a fully redundant site architecture using 4 servers!

Redundant site

The set-up almost all our customers use consists of two front-end servers and two back-end database servers giving us full redundancy. We use this ourselves for the RightScale site itself! Let’s walk through the set-up from beginning to end.

It all starts when a user types http://www.mysite.com into the browser. The browser does a DNS lookup and gets two IP addresses, which are the public IPs of the two front-end instances. The browser picks one and tries to connect. If that fails, it rather quickly tries the other, which gives you the fault tolerance you need in case one of the instances dies or has other problems. Also, having multiple IP addresses for your site is the only form of fail-over that browsers support, see this page for additional details.

The first thing the request from your browser hits is Apache, which has the same roles as in the all-in-one server: dealing with SSL, canonicalizing the hostname, serving up static files, putting up a maintenance page, and anything else you might want a full-fledged web server for. For requests destined to your application, Apache acts as a reverse proxy and forwards the request to HAproxy on the same machine.

HAproxy is a very nice piece of software that proxies and load balances requests to back-end servers. We use it for HTTP here, but it can also do plain TCP load balancing, for example for mail servers. We chose HAproxy because it has good support for health checks and the ability to redirect requests to alternate servers if a back-end fails mid-way. HAproxy is set-up to send a check request to each back-end process (Mongrel/Rails in our example) to ensure that it’s running properly. It then only forwards requests to servers that respond. While Apache can also load balance across multiple back-end servers using mod_proxy_balancer, it does not include health checks. This means that when a server goes down, Apache has to send live customer requests to it every few seconds to see whether it has come back up. So while any Mongrel process is down on any server, your customers are going to be impacted because some of their requests are being sent into a black hole. Not nice…
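The health check idea is easy to illustrate outside of HAproxy. The sketch below is not how HAproxy implements it; it only shows the principle of probing each back-end separately and keeping the ones that answer, so that no live customer request is used as the probe. The function names and the use of curl are assumptions made for this example.

```shell
# Print only the back-ends for which the probe command succeeds.
# The probe is passed in by name so it can be swapped out.
healthy_backends() {
  local probe="$1"; shift
  local backend
  for backend in "$@"; do
    if "$probe" "$backend"; then
      echo "$backend"
    fi
  done
}

# Example probe: a plain HTTP request with a short timeout (assumes curl).
http_probe() {
  curl --silent --fail --max-time 2 --output /dev/null "http://$1/"
}
```

A call such as `healthy_backends http_probe 127.0.0.1:8000 127.0.0.1:8001` would print only the Mongrel ports that currently respond.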

HAproxy forwards the request to one of the Mongrel/Rails processes on either of the two servers. Load balancing across both servers is nice because it means that you can shut the Mongrels on one server down to update the code without impacting customers at all.

Everything on the front-end servers is open source software except for your application. So we need a way for you to get your app code onto the instance at boot time, and a way for you to update the code. Note that for major upgrades we always recommend launching fresh instances so you can keep the old ones around for a day, just in case you want to switch back. (Hey, that’s really cheap insurance at only $2.40 per day per server!) We provide two different RightScripts to do minor code updates: one pulls the code from a tarball located on S3, the other does an svn export from your subversion repository. We recommend the S3 route for production use because otherwise starting new servers depends on the availability of your svn repository, and often the svn export is the slowest portion of the entire instance boot process. But sometimes the svn route is just so much more convenient, especially if you’re playing with a test set-up where you change the code frequently. In addition, for Rails, we set-up the app code directory structure the same way capistrano does, so you can point your capistrano config file at your instance and do a “cap update”. Again, something we don’t recommend for production servers but really handy for test and dev boxes.

Behind the front-end servers we place two replicated MySQL instances managed through our Manager for MySQL with backups to Amazon S3. We use frequent backups from the slave server where the load of the backup itself doesn’t affect production and daily backups from the master as added security.

Scalable redundant site

For a fully redundant and scalable site we recommend an architecture that is a natural extension from the 4-server set-up using more of the same components. We basically add a number of Mongrel/Rails application servers and hook them into the load balancing rotation on the two front-end servers. This array of app servers can now be expanded and contracted as warranted by the load on the web site: expand to handle surges in traffic when your PR and marketing lands a success, contract at night when the load on your site goes down and you’d rather hold on to your $$. The wonderful thing is that with this set-up you are paying for the average cost of your hosting needs, not for a once-a-month peak!

Scalable site

If you look closely, we’re running the app server on the two front-end load balancing instances. We find that the load balancing takes very few resources and that there’s room for some application cycles. Using HAproxy it’s easy to have less traffic go to the local app servers than to remote dedicated instances. The reason we keep the app on the front-end instances (as opposed to switching to pure load balancing instances) is that this way there are always two app servers available even if the array is scaled back to zero servers. Or put differently, when your site is under minimal load at 4am it scales down to 4 instances as opposed to 6. If the load balancing or serving of static files becomes a significant load, it is of course possible to switch off the app serving on the front-end or, alternatively, to add 2 additional front-end load balancing instances.

The way we currently handle the changes to the load balancer config when servers come online is to automatically edit the config file using operational RightScripts and do a seamless restart of HAproxy, which ensures that no connections are dropped in the change.
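For the curious, here is roughly what such an edit-and-restart could look like. This is a sketch and not our actual RightScript: the function names are made up, it assumes the back-end section is the last section in the config file (so a new line can simply be appended), and it relies on HAproxy’s `-sf` option, which starts a new process and asks the listed old PIDs to finish their open connections before exiting.

```shell
# Append a "server" line for a new app server to an HAproxy config file,
# unless a line for that server name is already present.
add_backend() {
  local cfg="$1" name="$2" addr="$3"
  if grep -q "^[[:space:]]*server[[:space:]][[:space:]]*${name}[[:space:]]" "$cfg"; then
    return 0
  fi
  printf '    server %s %s check\n' "$name" "$addr" >> "$cfg"
}

# Start a new HAproxy in daemon mode and let the old process(es) drain:
# -p writes the new PID file, -sf soft-stops the PIDs read from the old one.
reload_haproxy() {
  local cfg="$1" pidfile="$2"
  haproxy -D -f "$cfg" -p "$pidfile" -sf $(cat "$pidfile")
}
```

Keeping `add_backend` idempotent means the script can be re-run safely if a server announces itself twice.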

If you are interested in using our site setups please don’t hesitate to try out the free Rails all-in-one server template and please contact us for more at sales@rightscale.com. The multi-server set-ups are not available in pre-packaged form with the free RightScale accounts.

The 10-minute EC2 Server

The new Rails All-In-One server template we just made public makes it really easy to get your own Rails application running on Amazon EC2. And it’s all free to boot!

The server template is a collection of RightScripts that install Apache, mod_proxy_balancer, Mongrel, MySQL, and a backup cron script all on one EC2 instance. All you need to do is to specify where your code is located and launch the whole thing! This all-in-one server template is excellent for a number of purposes:

  • kick the tires of Amazon EC2: see your own app running and play around
  • launch a simple site: many small sites don’t need more, the traffic load isn’t high, and if the site is down a couple of hours every few months because of some problem with the instance then that’s not the end of the world
  • do some development: if you need an extra dev server then this is yours cheap, whether for a few hours or for days
  • try something out: want to turn your app upside-down and see how it holds up? Don’t mess up your own server or laptop, instead launch-wreck-discard an EC2 instance as many times as you please

Make no mistake: this server template is neither a black box nor a toy! All the configurations are available for you to inspect and modify. You can clone the template and replace the pieces you want to design differently. You can also add functionality or even split the server in two and run the database on a separate instance for better performance. This is not a canned set-up like most hosting shops provide: instead, it’s a starting point that you can customize to your needs and wants. If you’re set to grow and are looking into multiple servers, check out the site architectures we recommend.

We’re readying a short how-to that takes you through launching your own Rails app using the server template step-by-step. In the meantime, you can also easily launch a demo server with Mephisto: it’s the same template but we have adapted it to get Mephisto onto it.

OK, so how do you get started? Here are the steps:

Log into RightScale

Log into RightScale, or, if you don’t have a RightScale account yet:

  • Sign up for a free account.
  • Go to your email inbox, look for the validation email, and click on the link to get a couple of hours of EC2 time.
  • Alternatively, if you have an EC2 account, enter your credentials into RightScale (Settings > My Account > Credentials) and thereby get more features enabled.
  • Create an SSH key, if you don’t have one: Design > EC2 > SSH key.

Launch a server running Mephisto for you

Mephisto is a blog engine written in Rails and the server you are about to launch has a generic Rails set-up plus the Mephisto app so you can see something in action with the fewest steps.

  1. Swing over to the Server Templates using the Design menu at the left and locate the Mephisto All-In-One v1 demo server template:
      • Before being able to launch it, you need to specify an SSH key (so you can SSH into the server) and a security group (corresponds to ingress firewall settings): click on the edit icon on the right.
      • Select an SSH key and a security group, ignore all the other settings, and hit “save” at the bottom (yes, we’re making this easier soon).
      • Now hit the “Launch” button at the top of the page and you see a page with all the settings you can change for this server template. Ignore most of them for now and put something (root@localhost will do) into the ADMIN_EMAIL field, which is empty. Then hit “Launch” again at the bottom.
  2. Watch the instance that will run a demo app appear as “launched” in your Recent Tasks pane on the left: it will take 2-3 minutes to start booting, and then another 6-8 to install and configure all the software.
  3. Sit back, relax and watch the server go through its boot process until it shows “operational” in the Recent Tasks pane. This may take 6-8 minutes. We’re working on reducing the boot time: most of it is actually taken by the gem install commands!
  4. Once your server is operational, you are ready to use your brand new server. Go to Manage > Active Servers and click on the server’s DNS name (ec2-67-000-0-00.z-1.compute-1.amazonaws.com or similar), that should bring you straight to your own Mephisto instance!
      • Quick troubleshooting: if everything looks ready but all connections to your server simply time out, make sure you have ports 22 (SSH) and 80 (HTTP) open in your security group setting: Design > EC2 > Security Groups, add IPs: “tcp 0.0.0.0/0 ports 22..22” and “tcp 0.0.0.0/0 ports 80..80”.
  5. Start using the app. For example, edit the url by appending “/admin” and log in using the default Mephisto user/password (i.e., “admin” and “test”), hit “create new article”, type “Hello World!”, save, go back to the root URL of your server and voila! your first article in Mephisto on your server on EC2. Woot!
  6. Remember that you have just launched servers that are inexpensive, but do cost money. You can check your active instances at Manage > Servers > Active Servers, and terminate any which you’ve finished testing.

So this canned Mephisto all-in-one template is a good example to get your feet wet in bringing up a complete Rails app on EC2. But unless you actually need Mephisto, this doesn’t get you that much. If you are a Rails developer you will want to bring up your own app…so stay tuned for our coming step-by-step guide to launching your own Rails “All-In-One” server using RightScale.

Setting Up MySQL With the RightScale MySQL Manager – Part 1

Setting up a redundant MySQL master/slave database using RightScale has become rather easy using our recently introduced Manager for MySQL. Let’s start by setting up the master. For this we define a “MySQL master” server template. A server template is pretty much what the name implies: it’s a template for a fully configured server that can be launched with one click of a button. Each server template is based on an AMI (Amazon Machine Image) and then adds a number of boot and operational RightScripts. (For more details see the rationale for RightScripts.) A boot RightScript is a script (bash, perl, ruby, …) augmented with input parameters and file attachments. It runs during the launching of an instance and typically configures a software component. An operational RightScript is similar to a boot script but can be run from the RightScale web dashboard anytime after the instance becomes operational. The example below will make this easy to understand.

Below is a screen shot of the freshly created “Demo MySQL master” server template with two RightScripts added. The first one switches the /mnt 160GB partition to LVM (Logical Volume Manager) so we can take snapshot backups. The second one installs MySQL onto the server, ready to act as a master node.

MySQL server template with two RightScripts

Let’s take a look at the mysql install RightScript below to see what’s going on. First of all, most of it is a bash script. It’s augmented by a number of yum packages that will be installed before the script runs and a file attachment at the bottom that contains the my.cnf config file.

MySQL set-up RightScript details

The first two lines of the script are worth mentioning: they pull in all the EC2 meta-data and user-data that is passed into the instance at launch time. These are available from EC2 and the RightScale boot-up scripts fetch them, parse them, and store them into convenient bash, perl, and ruby include files that can be easily pulled into scripts. (We don’t actually use them in this particular script, we’re just in the habit of including this stuff everywhere.)

The script itself moves the MySQL data files onto the LVM volume previously created and sets things up ready for replication.

The next step is to complete the set-up by adding more RightScripts from our library to enable monitoring of the MySQL replication, to install S3 and SSH credentials (there are more secure ways to do this than embedding them into RightScripts, but this is the easiest for this demo), to add the Ruby MySQL gem and the RightScale MySQL Manager tools. The resulting config looks like this:

all boot scripts

Next come the operational scripts. The cool thing about these scripts is that they can be invoked from the RightScale web site. Here are the four operational RightScripts we need from our library:

all operational scripts

The four scripts do the following:

  • back up the database to S3 using an LVM snapshot followed by pushing the data files to S3
  • restore the database from S3
  • initialize the server as a slave DB, which is useful after a fail-over to convert the master into a slave
  • promote from slave to master, which is useful after a fail-over

Once the server template is launched the operational RightScripts are all available as buttons on the server’s page. This currently looks like this:

buttons for operational RightScripts

These scripts can take parameters. For example, the DB restore script needs an S3 path prefix, so if we hit that button we get a page to enter the missing values, including a drop-down box with a series of values that RightScale already knows about.

restore parameters

As soon as the script is launched it shows up in the recent tasks box as queued, and later it will complete:

restore task pending

restore task completed

And very conveniently, by clicking onto the task we get an audit entry showing the log file of the script’s execution so we can verify that all is well, or troubleshoot if something went awry.

restore log file

We now have a MySQL master server up and running and we loaded the initial database data from S3. Now on to setting up a slave server in the next part of this blog series…

Amazon EC2 Changes How MySQL Is Used

Amazon EC2 will change the way MySQL is used: it suddenly opens a whole slew of new possibilities. What’s really exciting is that it can also simplify the management of MySQL which enables powerful automation as provided in the RightScale Manager for MySQL. The factors that enable this are:

  • it takes <10 minutes to fire up a fresh MySQL instance, and there’s a virtually limitless supply
  • there is virtually no cost to keeping old database instances around while setting up fresh ones
  • there is virtually no cost to firing up temporary database instances for special tasks

Let’s take an example: you have 1 master and 2 slaves hanging off it. Your master fails. You determine that slave 1 is at a more advanced position than slave 2, so you promote it to master. How do you proceed to get back to 1 master + 2 slaves? Without some risky magic you can’t roll slave 2 forward to the same position as slave 1 and start replicating from slave 1, so slave 2 is more or less useless. On EC2 you can discard the master and slave 2 and fire up two new slaves that you set-up fresh to replicate from the new master. In addition you can keep slave 2 around for a few hours until everything is stable again just in case a problem develops with slave 1 or you discover that you need to roll back a few transactions because something caused a problem.
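Deciding which slave is “at a more advanced position” comes down to comparing how far each one has executed the master’s binary log. A minimal sketch, assuming you have already collected one line per slave of the form “name binlog-file position” (for example from the Relay_Master_Log_File and Exec_Master_Log_Pos fields of SHOW SLAVE STATUS); the function name is made up for this example:

```shell
# Read "name binlog_file position" lines on stdin and print the name of the
# slave that has executed furthest into the master's binary log.
most_advanced_slave() {
  # Binlog file names are zero-padded, so plain text order works for field 2;
  # field 3 (the position within the file) must be compared as a number.
  LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | cut -d' ' -f1
}
```

The slave it prints is the promotion candidate; the others are the ones you would replace with freshly launched instances.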

This is very different from a physical set-up where you would have promoted slave 1 to master, “fixed-up” the original master machine and made it a slave of the new master, and where you would have wiped the data from slave 2 to set it up fresh. Both of these actions sound simple, but in real life they often end up being very stressful. Ideally you don’t want to touch the old master so you can do in-depth troubleshooting, find out what went wrong, and fix the problem. But you also badly need the old master machine to be back in the cluster, making for a difficult choice. Wiping data from slave 2 is also not an easy decision: what if there is a problem with slave 1, or what if you really need to run some of your read-only applications in degraded mode off the now-frozen slave 2 until you have enough machines in the cluster to take the full load? Again, on EC2 all this becomes easier: you fire up fresh instances and simply keep the old machines around until you’re confident that the new ones are ready to take over and you’ve gotten all the information you wanted off the old ones.

If you are interested in using our Manager for MySQL, please contact us at sales@rightscale.com. This stuff is not available with the free RightScale accounts.

Maintenance

We’re sorry for the inconvenience, but RightScale is currently being upgraded. The upgrade started 8/23/2007 around 00:20am PST and is expected to take 20 minutes. We appreciate your patience!

Surviving a MySQL Master DB Crash

Nothing is more heart-arresting than to find out that your database machine has died. Site down. Data gone. Life s…

That’s what happened to one of our customers yesterday morning, right when they were featured on some prominent sites. The Amazon EC2 instance hosting their master DB died. Fortunately they had tested the master-slave set-up using our Manager for MySQL, so everything was set-up to recover quickly. They IM’d me so I could help should things go wrong. We waited a couple of minutes to see whether the machine was just rebooting, but to no avail. So we hit the “promote to master” on the slave instance, and here’s the log of what happened:

[2007-08-21 16:24:45] [ServerActionsWorker] : Executing: 'Executing action: DB promote to master'
[2007-08-21 16:24:46] [ServerActionsWorker] : Using MasterDB DNS ID: 2577432 .
[2007-08-21 16:24:46] [ServerActionsWorker] : Using SlaveDB DNS ID: 2577433 .
[2007-08-21 16:24:54] [ServerActionsWorker] : No slave argument given...assuming localhost
Using C interface for mysql, client version 5.0.22
Server doesn't appear to be logging binary logs, configuring and restarting server with binary logging
Locking slave (and enabling writes)
[2007-08-21 16:28:04] [ServerActionsWorker] : Process 7927 has the lock. terminating others.
Written read_only changes to new master conf file
Stopping master (if alive), noting position, making RO, stopping and unconfiguring replication
Previously connected master db-p-master.company.com not reachable...
...Warning: assuming old master is dead and that the current contents of the Slave is the latest and best we can get.
Promoting slave...
Waiting until it catches up (if alive), stopping and unconfiguring replication, 
 unlocking tables and setting up replication privileges
Retrieved new master info...File: mysql-bin.000001 position: 98
Stopping slave and misconfiguring master
granting rep rights...
done with rights...
Unlocking tables
Demoting old master...
Changing Master DB DNS...
OK. Result: DNSID 2577432 set to this instance IP: 10.255.47.70
Mission accomplished.
[2007-08-21 16:28:04] [ServerActionsWorker] : Server action successully completed

Woot! The slave promoted to master just fine. At that point we had to bounce the Mongrel servers because, as far as we can tell, ActiveRecord just doesn’t switch to the new DNS entry for the DB in any reasonable amount of time. After verifying that the site was back up and fixing an ancillary server that wasn’t pointing to the proper database DNS entry, we launched a fresh slave with another button press.

MySql after failure

Phew, all this within about a half hour, including initial reaction and troubleshooting time and follow-up cleanup work. Everything we put in place with Manager for MySQL worked like a charm!

Redundant MySQL Set-up for Amazon EC2

In order to deploy web sites/services onto Amazon EC2 everyone needs the same components, and so we’re building them! One of the most requested and most critical pieces is a good database set-up, and mysql is clearly the highest in demand. Not that a good postgresql or oracle set-up wouldn’t be of interest or wouldn’t be equally possible, just that more people are asking us (and paying us) for mysql…

What we’ve built is a mysql master/slave set-up with backup to Amazon S3. The set-up consists of one mysql master server which is the primary database server used by the application. We assume it runs on its own EC2 instance but it could probably share the instance with the application. We install LVM (Logical Volume Manager) on the /mnt partition and place the database files there. We use LVM snapshots to back up the database to S3, which means that we get a consistent backup of the database files with only a sub-second hiccup to the database.

MySQL master and slave

Well, the snapshots for backup are actually quite a bit more complicated than that. We have to acquire a read-lock on all tables and this could block things if there is a long running query ahead of us. So there’s a timeout and retry loop which needs to balance locking up the database against getting the backup done.
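To make the timeout-and-retry idea concrete, here is a minimal sketch in Python. It is not our actual backup script: the three callables and the `LockTimeout` exception are hypothetical hooks, and the timeout, attempt count, and pause are made-up defaults. The point is only the shape of the loop: give up on the lock quickly, back off so queued clients can proceed, and try again later.

```python
import time


class LockTimeout(Exception):
    """Raised by the (hypothetical) lock hook when the lock is not granted in time."""


def snapshot_with_retry(acquire_read_lock, take_snapshot, release_lock,
                        lock_timeout=5.0, max_attempts=5, pause=10.0,
                        sleep=time.sleep):
    """Take a snapshot under a global read lock, retrying if the lock is slow.

    The callables are assumptions supplied by the caller:
      acquire_read_lock(timeout) -- e.g. issue FLUSH TABLES WITH READ LOCK,
                                    abandon the request and raise LockTimeout
                                    if it is not granted within `timeout` seconds
      take_snapshot()            -- e.g. create the LVM snapshot
      release_lock()             -- e.g. issue UNLOCK TABLES
    Returns the 1-based number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            acquire_read_lock(lock_timeout)
        except LockTimeout:
            # A long-running query is ahead of us. Back off so that clients
            # queued behind our lock request are not blocked any longer.
            if attempt < max_attempts:
                sleep(pause)
            continue
        try:
            take_snapshot()
        finally:
            # Always release the lock, even if the snapshot itself failed.
            release_lock()
        return attempt
    raise RuntimeError("no snapshot after %d attempts" % max_attempts)
```

A short `lock_timeout` favors the application (the database is never held up for long), while more attempts or a longer timeout favor getting the backup done; that is the trade-off the loop has to balance.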

Using the snapshot backup we set-up a slave instance which then starts replicating in real-time from the master. This means that all changes to the master are propagated with milliseconds of delay to the slave, so should the master instance fail, there’s an up-to-date backup. On a master failure we promote the slave to master and set-up a fresh slave. Note that in most databases the slave lags extremely little behind the master. The main situation where the slave starts lagging is when there is a lot of write activity going on in the master. Under heavy write load the slave is slower at applying the replication to its copy than the master on the same hardware because the slave uses only a single thread to apply all changes while the master has one thread per client connection, so it can overlap network communication, cpu processing, and disk I/O using multiple threads, which the slave can’t.
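One way to keep an eye on slave lag is to look at the output of SHOW SLAVE STATUS on the slave. The sketch below is an illustration, not part of our tooling: it assumes you have already fetched one row of that output into a Python dict, and the 30-second threshold is an arbitrary example value.

```python
def slave_is_lagging(slave_status, max_lag_seconds=30):
    """Return True if the slave is too far behind or not replicating at all.

    slave_status is a dict holding one row of SHOW SLAVE STATUS output,
    keyed by column name. max_lag_seconds is an assumed threshold.
    """
    # Both replication threads must be running for the slave to keep up.
    if slave_status.get("Slave_IO_Running") != "Yes":
        return True
    if slave_status.get("Slave_SQL_Running") != "Yes":
        return True
    lag = slave_status.get("Seconds_Behind_Master")
    if lag is None:
        # MySQL reports NULL here when it cannot compute the lag.
        return True
    return int(lag) > max_lag_seconds
```

For example, `slave_is_lagging({"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 120})` returns `True`, which would be a hint that the single replication thread is not keeping up with the write load.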

Periodic backups are taken off both the slave and master instances. There is very little penalty for acquiring a read lock on the slave and performing the snapshot and subsequent back-up, so it can be done every few minutes without any real impact (unless the slave has trouble keeping up as described above, in which case it’s probably time to move to multiple slaves). We also take infrequent backups on the master, say once a day, in order to guard against any problems introduced by replication.

While the mysql replication is well proven and used by many large sites in heavy production, there are failure scenarios. First of all, the application should use InnoDB tables exclusively because MyISAM tables are not transactional and have a number of scenarios where replication fails. Even with InnoDB tables, failures are possible. For example, it is possible to write non-deterministic queries in SQL and since mysql uses logical replication the slave re-executes the query, and it may end up using a different execution order than the master, resulting in different data in the database. Ouch. One example is a create table with an auto-increment key using a select from an existing table. The insertion order and hence the keys in the new table depend on the order in which the select is executed, and if it’s executed in a different order in the slave from the master you will end up with an unusable slave DB! (Been there, done that, it still hurts.) Thus: do back-up your master every now and then to be able to recover from such problems. (If you’re paranoid, fire-up an instance every few hours, load up a back-up, and run a few consistency checks. It’ll cost you less than a buck a day to ensure the DB backup is good; that’s cheap insurance.)
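The key-assignment problem is easy to see in a toy simulation. The Python below does not talk to MySQL at all; it just hands out automatically assigned keys in arrival order, the way a table with an auto-generated key would, and then compares the result of two different scan orders. The function names and sample rows are made up for illustration.

```python
def copy_with_auto_keys(rows):
    """Simulate copying rows into a new table with automatically assigned keys.

    Keys are handed out 1, 2, 3, ... in the order the rows arrive.
    """
    return {key: row for key, row in enumerate(rows, start=1)}


def diverged_keys(master, slave):
    """A simple consistency check: keys whose rows differ between the copies."""
    return sorted(k for k in master.keys() | slave.keys()
                  if master.get(k) != slave.get(k))


source = ["alice", "bob", "carol"]

# The master happens to scan the source table in one order...
master = copy_with_auto_keys(source)
# ...and the slave, re-executing the same statement, scans it in another.
slave = copy_with_auto_keys(list(reversed(source)))

print(diverged_keys(master, slave))  # prints [1, 3]
```

Both copies contain the same three rows, yet keys 1 and 3 point at different rows, so any other table referencing those keys is now wrong on the slave. A check along the lines of `diverged_keys`, run against a restored backup, is the kind of cheap consistency test mentioned above.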

The best of all is that all the goodies described above are controlled through the RightScale web interface. Want a new slave? Just press the “set-up slave” button! Want a back-up? Just press “backup” on the master or on the slave. The functions we have now are:

  • launch database instance
  • restore from S3 backup and configure as master
  • configure as slave, using DB transfer from master for initial state
  • promote slave to master
  • backup to S3
  • daily backups to S3 from master
  • 10-minute backups to S3 from slave

We obviously still have a lot of work ahead of us to improve the flexibility of the set-up. One thing to note is that you are in control of what is executed on the database servers, so they are not opaque virtual appliances. If you need to tweak our database install, slave set-up, backup, or other code, it’s all available in scripts that you can modify. (Of course the more you modify the less we can help when things go wrong.) Also, currently all these functions are “automated” in the sense that you make a decision, push the right button, and things happen. We are adding monitoring and we will add triggers that will cause master-slave failovers automatically.

If you are interested in using our mysql master/slave set-up, please contact us at sales@rightscale.com. This stuff is not available with the free RightScale accounts.