GitHub's Online Schema-migration Tool for MySQL
Shlomi Noach 30f9212ab1 more doc
2016-07-16 07:14:25 -06:00

gh-ost

GitHub's online schema migration for MySQL

gh-ost is a triggerless online schema migration solution for MySQL. It is testable and provides pausability, dynamic control/reconfiguration, auditing, and many operational perks.

gh-ost produces a light workload on the master throughout the migration, decoupled from the existing workload on the migrated table.

It has been designed based on years of experience with existing solutions, and changes the paradigm of table migrations.


How?

All existing online-schema-change tools operate in a similar manner: they create a ghost table in the likeness of your original table and migrate that table while it is empty. They then slowly and incrementally copy data from your original table to the ghost table, while propagating ongoing changes (any INSERT, DELETE, UPDATE applied to your table) to the ghost table. Finally, at the right time, they replace your original table with the ghost table.
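
The incremental copy step can be pictured with a small sketch. This is an illustration only, not how gh-ost or any other tool actually builds its queries: it assumes a hypothetical integer key named id, a table named mytable, a ghost table named _mytable_gho, and an alter that leaves the column list unchanged. It prints one copy statement per chunk; propagating ongoing changes and the final swap are not shown.

```shell
#!/bin/sh
# Print one copy statement per chunk of an integer key range.
# Assumptions for illustration: key column `id`, source `mytable`,
# ghost table `_mytable_gho`, identical column lists on both tables.
emit_chunk_copies() {
    min_id=$1; max_id=$2; chunk_size=$3
    start=$min_id
    while [ "$start" -le "$max_id" ]; do
        end=$((start + chunk_size - 1))
        # Clamp the last chunk to the end of the key range.
        if [ "$end" -gt "$max_id" ]; then end=$max_id; fi
        # INSERT IGNORE leaves rows that already exist in the ghost table alone.
        printf 'INSERT IGNORE INTO _mytable_gho SELECT * FROM mytable WHERE id BETWEEN %d AND %d;\n' "$start" "$end"
        start=$((end + 1))
    done
}

# Example: keys 1..6000 copied in chunks of 2500 rows.
emit_chunk_copies 1 6000 2500
```

With these arguments the sketch emits three statements, covering ids 1 to 2500, 2501 to 5000, and 5001 to 6000.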

gh-ost uses the same pattern. However, it differs from all existing tools by not using triggers. We have found triggers to be the source of many limitations and risks.

Instead, gh-ost uses the binary log stream to capture table changes, and asynchronously applies them onto the ghost table. gh-ost takes upon itself some tasks that other tools leave for the database to perform. As a result, gh-ost has greater control over the migration process: it can truly suspend the migration, and it can truly decouple the migration's write load from the master's workload.

In addition, it offers many operational perks that make it safer, more trustworthy, and fun to use.

gh-ost general flow

WORK IN PROGRESS

In the meantime, please refer to the docs for more information. No, really, go to the docs.

Usage

Where to execute

The recommended way of executing gh-ost is to have it connect to a replica, as opposed to having it connect to the master. gh-ost will crawl its way up the replication chain to figure out who the master is.
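
To make the idea of crawling up the replication chain concrete, here is a rough sketch using the mysql client. It is not gh-ost's implementation. It assumes credentials come from your client configuration, that every server in the chain is reachable from where you run it, and it caps the walk at an arbitrary 10 hops so that circular topologies do not loop forever. The function names are hypothetical.

```shell
#!/bin/sh
# Read `SHOW SLAVE STATUS\G` output on stdin and print "host:port" of the
# upstream server. Prints nothing if the server is not replicating.
parse_master() {
    awk -F': ' '
        /^ *Master_Host:/ { host = $2 }
        /^ *Master_Port:/ { port = $2 }
        END { if (host != "") print host ":" port }
    '
}

# Starting at the given host and port, follow Master_Host/Master_Port
# upwards until a server reports no upstream. That server is the master.
find_master() {
    host=$1; port=$2; hops=0
    while [ "$hops" -lt 10 ]; do
        upstream=$(mysql -h "$host" -P "$port" -e 'SHOW SLAVE STATUS\G' | parse_master)
        if [ -z "$upstream" ]; then break; fi
        host=${upstream%:*}; port=${upstream##*:}
        hops=$((hops + 1))
    done
    echo "$host:$port"
}

# Usage (hypothetical host name):
#   find_master replica1.example.com 3306
```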

Connecting to a replica lets gh-ost set up a self-throttling mechanism and query information_schema tables more freely, among other benefits. It is also what makes gh-ost work even if your master is configured with statement based replication, as gh-ost is able to reconfigure the replica to write its binary logs in row based replication. See Migrating with Statement Based Replication.

The replica must have binary logs enabled and be configured with log_slave_updates.
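
A quick way to check these two settings before a migration might look like the sketch below. The function name check_replica_settings is hypothetical, and the query shown in the usage comment assumes the mysql client picks up credentials from your client configuration. The function reads one tab-separated line holding the values of log_bin and log_slave_updates (1 means enabled).

```shell
#!/bin/sh
# Read one tab-separated line "log_bin<TAB>log_slave_updates" on stdin.
# Print "ok" and return 0 when both are 1; otherwise explain and return 1.
check_replica_settings() {
    awk -F'\t' '
        { seen = 1
          if ($1 != 1) { print "binary logging is off (log_bin)"; bad = 1 }
          if ($2 != 1) { print "log_slave_updates is off"; bad = 1 }
        }
        END { if (!seen) { print "no input"; bad = 1 }
              if (!bad) print "ok"
              exit (bad ? 1 : 0) }
    '
}

# Usage (hypothetical host name):
#   mysql -h replica1.example.com -N -B \
#     -e 'SELECT @@global.log_bin, @@global.log_slave_updates' \
#     | check_replica_settings
```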

It is still OK to connect gh-ost directly to the master; you will need to confirm this by providing --allow-on-master. The master must then use row based replication.

gh-ost itself may be executed from anywhere. It connects via TCP and does not have to be executed from a MySQL box. However, do note that it generates a lot of traffic, as it connects as a replica and pulls binary log data.

Testing on replica

Newcomer? We think you would enjoy building trust with this tool. You can ask gh-ost to simulate a migration on a replica -- this will not affect data on master and will not actually do a complete migration. It will operate on a replica, and end up with two tables: the original (untouched), and the migrated. You will have your chance to compare the two and verify the tool works to your satisfaction.

gh-ost --conf=.my.cnf --database=mydb --table=mytable --verbose --alter="engine=innodb" --execute --initially-drop-ghost-table --initially-drop-old-table --max-load=Threads_running=30 --switch-to-rbr --chunk-size=2500 --exact-rowcount --test-on-replica --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --throttle-flag-file=/tmp/ghost.throttle.flag

Please read more on testing on replica

Migrating a master table

gh-ost --conf=.my.cnf --database=mydb --table=mytable --verbose --alter="engine=innodb" --initially-drop-ghost-table --initially-drop-old-table --max-load=Threads_running=30 --switch-to-rbr --chunk-size=2500 --exact-rowcount --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --throttle-flag-file=/tmp/ghost.throttle.flag [--execute]

Note: in order to migrate a table on the master you don't need to connect to the master. gh-ost is happy with (and prefers) a connection to a replica; it then figures out the identity of the master and makes the connection itself.
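
The two flag files passed in the commands above are how an operator steers a running migration from the shell. As we understand the flags, gh-ost throttles while the throttle flag file exists, and postpones the final table swap while the postpone flag file exists. The helper names below are hypothetical conveniences, not part of gh-ost, and the paths simply mirror the ones used in the commands above.

```shell
#!/bin/sh
# Paths must match the values given to --throttle-flag-file and
# --postpone-cut-over-flag-file when gh-ost was started.
THROTTLE_FLAG=${THROTTLE_FLAG:-/tmp/ghost.throttle.flag}
POSTPONE_FLAG=${POSTPONE_FLAG:-/tmp/ghost.postpone.flag}

# Hypothetical helpers: create or remove the flag files.
ghost_throttle()       { touch "$THROTTLE_FLAG"; }   # ask the migration to pause
ghost_resume()         { rm -f "$THROTTLE_FLAG"; }   # let it continue
ghost_hold_cut_over()  { touch "$POSTPONE_FLAG"; }   # keep syncing, do not swap yet
ghost_allow_cut_over() { rm -f "$POSTPONE_FLAG"; }   # permit the final swap
```

A typical pattern would be to call ghost_hold_cut_over before starting a long migration, then ghost_allow_cut_over at a quiet time once the row copy is done.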

What's in a name?

Originally this was named gh-osc: GitHub Online Schema Change, in the vein of Facebook online schema change and pt-online-schema-change.

But then a rare genetic mutation happened, and the s transformed into t. And that sent us down the path of trying to figure out a new acronym. Right now, gh-ost (pronounced: Ghost) stands for:

  • GitHub Online Schema Transmogrifier/Translator/Transformer/Transfigurator

Pronounced: ghost

Authors

gh-ost is designed, authored, reviewed and tested by the database infrastructure team at GitHub: