How to run MySQL migrations without any downtime

Jean Rougé
Tripping Engineering
4 min readApr 3, 2017

--

Here at Tripping.com, one of our main repositories is a Ruby on Rails application, with a MySQL database on the backend. Like many other startups, we initially chose MySQL as the database because it is the standard choice to make when building fast on a Rails application.

Where MySQL falls short

However, MySQL does come with a number of drawbacks.

Most migrations require downtime

The most severe limitation of MySQL is its inability to run migrations without locking tables. This means that whenever you run a migration on a given table, your application is unable to read from or write to that table. This in turn means that if you need to e.g. add a column to one of your biggest, most-used tables, you will need to take your application down while the migration runs. This can last for minutes… or hours.

Single point of failure

The second area where MySQL falls short is the lack of an out-of-the-box high availability solution. The standard way to set up MySQL is to have a master with one or more slaves - when the master fails, a manual task needs to be taken where a slave is promoted to be the new master and all other slaves are given the new master as their source to ensure that all data is in sync again. Here at Tripping.com, we strive to make our infrastructure resilient enough to be able to automatically recover from any two servers going down before needing manual intervention. In particular, having our MySQL master as a single point of failure is not acceptable to us.

We fixed these two issues by using MariaDB Galera Cluster together with Hashicorp’s Consul.

Galera is nothing new… it already existed under the Romans!

MariaDB Galera Cluster

MariaDB is a drop-in replacement for MySQL. It can be used as a cluster, and allows us to get rid of the master-slave paradigm. In a Galera cluster, all nodes are equivalent- no master, and no slaves. You can write to, and read from, any nodes, and they always keep in sync with each other using row replication.

This works in theory, but in practice, one has to be careful to not write conflicting data to two nodes at the same time, since that can lead to sync issues. We have found that a good way to avoid issues is to ensure that all application nodes using a given database would use the same MariaDB node to write to at any time.

Hashicorp’s Consul

Consul is a robust, distributed, and easy-to-use service discovery tool. We like to think of it as bestowing your servers with a hive mind: Consul allows your boxes to monitor each other, and each other’s services, in real time.

Here is how we typically make Consul monitor that each of our MariaDB nodes are up and running, and also in sync:

As soon as any one of them gets out of sync, the Consul client on each of our application nodes is immediately aware of it, and can make sure that application node will talk to other, healthy DB nodes instead.

This alone is enough to get rid of our second issue, and to make sure the database can automatically recover from any node failing!

How to use both enable zero-downtime migrations for our Rails application

Galera supports Rolling Schema Upgrade (RSU) schema changes. They work by desynchronizing a database node, running the schema change on only that node, and then having it rejoin the cluster and resynchronize. This occurs for each node, one at a time.

Even better, that procedure can be fully automated. Consul gives us the list of all database nodes, and then iterating over that list, we take each DB node “down” one at a time (by putting it in maintenance mode), run the migration on it, and wait for it to be fully synced again to process the next database node.

This can be all neatly encapsulated in an ActiveSupport::Concern:

Where Consul::GaleraService in the snippet above is just a light wrapper around the excellent Diplomat gem.

Now, running a RSU migration becomes very simple. For example, if we want to change a column’s nullable setting, we can do the following:

Thanks for reading, we hope this helped you, and look forward to sharing more about engineering at Tripping.com soon!

Further reading

--

--