From af74e8c6cdb872230f9dead4f49ee5ebe9f54316 Mon Sep 17 00:00:00 2001
From: Shlomi Noach
Date: Sun, 25 Dec 2016 11:46:14 +0200
Subject: [PATCH] Resurrection documentation

---
 README.md                 |  9 ++++-----
 RELEASE_VERSION           |  2 +-
 doc/command-line-flags.md |  8 ++++++++
 doc/resurrect.md          | 42 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 55 insertions(+), 6 deletions(-)
 create mode 100644 doc/resurrect.md

diff --git a/README.md b/README.md
index 451b5b2..b4106fe 100644
--- a/README.md
+++ b/README.md
@@ -30,6 +30,7 @@ In addition, it offers many [operational perks](doc/perks.md) that make it safer
 - Auditing: you may query `gh-ost` for status. `gh-ost` listens on unix socket or TCP.
 - Control over cut-over phase: `gh-ost` can be instructed to postpone what is probably the most critical step: the swap of tables, until such time that you're comfortably available. No need to worry about ETA being outside office hours.
 - External [hooks](doc/hooks.md) can couple `gh-ost` with your particular environment.
+- [Resurrection](doc/resurrect.md) can resume a failed migration, proceeding from the last known good position.
 
 Please refer to the [docs](doc) for more information. No, really, read the [docs](doc).
 
@@ -76,19 +77,17 @@ But then a rare genetic mutation happened, and the `c` transformed into `t`. And
 
 ## Community
 
-`gh-ost` is released at a stable state, but with mileage to go. We are [open to pull requests](https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md). Please first discuss your intentions via [Issues](https://github.com/github/gh-ost/issues).
+`gh-ost` is released at a stable state, and still with mileage to go. We are [open to pull requests](https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md). Please first discuss your intentions via [Issues](https://github.com/github/gh-ost/issues).
 
 We develop `gh-ost` at GitHub and for the community. We may have different priorities than others. From time to time we may suggest a contribution that is not on our immediate roadmap but which may appeal to others.
 
 ## Download/binaries/source
 
-`gh-ost` is now GA and stable.
-
-`gh-ost` is available in binary format for Linux and Mac OS/X
+`gh-ost` is GA and stable, available in binary format for Linux and Mac OS/X
 
 [Download latest release here](https://github.com/github/gh-ost/releases/latest)
 
-`gh-ost` is a Go project; it is built with Go 1.5 with "experimental vendor". Soon to migrate to Go 1.6. See and use [build file](https://github.com/github/gh-ost/blob/master/build.sh) for compiling it on your own.
+`gh-ost` is a Go project; it is built with Go 1.7. See and use [build file](https://github.com/github/gh-ost/blob/master/build.sh) for compiling it on your own.
 
 Generally speaking, `master` branch is stable, but only [releases](https://github.com/github/gh-ost/releases) are to be used in production.
 
diff --git a/RELEASE_VERSION b/RELEASE_VERSION
index 15245f3..9084fa2 100644
--- a/RELEASE_VERSION
+++ b/RELEASE_VERSION
@@ -1 +1 @@
-1.0.32
+1.1.0
diff --git a/doc/command-line-flags.md b/doc/command-line-flags.md
index 1707685..8b10030 100644
--- a/doc/command-line-flags.md
+++ b/doc/command-line-flags.md
@@ -111,6 +111,14 @@ See also: [Sub-second replication lag throttling](subsecond-lag.md)
 
 Typically `gh-ost` is used to migrate tables on a master. If you wish to only perform the migration in full on a replica, connect `gh-ost` to said replica and pass `--migrate-on-replica`. `gh-ost` will briefly connect to the master but other issue no changes on the master. Migration will be fully executed on the replica, while making sure to maintain a small replication lag.
 
+### resurrect
+
+It is possible to resurrect/resume a failed migration. Such a migration would be a valid execution which bailed out partway through the migration process. A migration would bail out on meeting with `--critical-load`, or perhaps a user `kill -9`'d it.
+
+Use `--resurrect` with the exact same other flags (same `--database`, `--table`, `--alter`) to resume a failed migration.
+
+Read more on [resurrection docs](resurrect.md)
+
 ### skip-foreign-key-checks
 
 By default `gh-ost` verifies no foreign keys exist on the migrated table. On servers with large number of tables this check can take a long time. If you're absolutely certain no foreign keys exist (table does not referenece other table nor is referenced by other tables) and wish to save the check time, provide with `--skip-foreign-key-checks`.
diff --git a/doc/resurrect.md b/doc/resurrect.md
new file mode 100644
index 0000000..d66d707
--- /dev/null
+++ b/doc/resurrect.md
@@ -0,0 +1,42 @@
+# Resurrection
+
+`gh-ost` supports resurrection of a failed migration, continuing the migration from the last known good position, potentially saving hours of clock-time.
+
+A migration may fail as follows:
+
+- On meeting with `--critical-load`
+- On successively meeting with a specific error (e.g. recurring locks)
+- Being `kill -9`'d by a user
+- MySQL crash
+- Server crash
+- Robots taking over the world and other reasons.
+
+### --resurrect
+
+One may resurrect such a migration by running the exact same command, adding the `--resurrect` flag.
+
+The terms for resurrection are:
+
+- Exact same database/table/alter
+- Previous migration ran for at least one minute
+- Previous migration began looking at row-copy and event handling (by `1` minute of execution you may expect this to be the case)
+
+### How does it work?
+
+`gh-ost` dumps its migration status (context) once per minute, onto the _changelog table_. The changelog table is used for internal bookkeeping, and manages heartbeat and internal message passing.
+
+When `--resurrect` is provided, `gh-ost` attempts to find such a status dump in the changelog table. Most interestingly, this status includes:
+
+- Last handled binlog event coordinates (any event up to that point has been applied to _ghost_ table)
+- Last copied chunk range
+- Other useful information
+
+Resurrection reconnects the streamer at the last handled binlog coordinates, and has row-copy skip ahead to proceed from the last copied chunk range.
+
+Note that it is not important to resume from the _exact same_ coordinates and chunk as last applied; the context dump only runs once per minute, so resurrection may re-apply a minute's worth of binary logs and re-iterate a minute's worth of copied chunks.
+
+Row-based replication has the property of being idempotent for DML events. There is no damage in reapplying contiguous binlog events starting at some point in the past.
+
+Chunk-reiteration likewise poses no integrity concern and there is no harm in re-copying the same range of rows.
+
+The only concern is to never skip binlog events, and never skip a row range. By virtue of only dumping events and ranges that have been applied, and by virtue of only processing binlog events and chunks moving forward, `gh-ost` keeps integrity intact.
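The idempotency argument made in `doc/resurrect.md` can be illustrated with a small, self-contained Go sketch. This is not `gh-ost` code: `RowEvent`, `Table`, `Apply`, `ReplayFrom` and `CopyRange` are hypothetical names invented for the illustration, binlog coordinates are reduced to an integer position, and the row copy is assumed to behave like an insert that ignores rows already present. Under those assumptions, resuming from a checkpoint that lags behind what was actually applied yields the same table as an uninterrupted run.

```go
package main

import (
	"fmt"
	"reflect"
)

// RowEvent is a hypothetical, simplified row-based binlog event. It carries
// the full row image, so applying it does not depend on the prior row state.
type RowEvent struct {
	Pos    int    // stand-in for binlog coordinates
	ID     int    // primary key of the affected row
	Value  string // full row image (unused for deletes)
	Delete bool
}

// Table is a toy table keyed by primary key.
type Table map[int]string

// Apply writes the event's row image onto the table. Applying the same
// event twice leaves the table in the same state as applying it once.
func Apply(t Table, e RowEvent) {
	if e.Delete {
		delete(t, e.ID)
		return
	}
	t[e.ID] = e.Value
}

// ReplayFrom applies, in order, every event at or after position `from`.
func ReplayFrom(t Table, events []RowEvent, from int) {
	for _, e := range events {
		if e.Pos >= from {
			Apply(t, e)
		}
	}
}

// CopyRange copies rows with lo <= id <= hi from src to dst, skipping rows
// that dst already holds (an assumption of this sketch).
func CopyRange(src, dst Table, lo, hi int) {
	for id, v := range src {
		if id < lo || id > hi {
			continue
		}
		if _, exists := dst[id]; !exists {
			dst[id] = v
		}
	}
}

func main() {
	events := []RowEvent{
		{Pos: 0, ID: 1, Value: "a"},
		{Pos: 1, ID: 2, Value: "b"},
		{Pos: 2, ID: 1, Value: "a2"},
		{Pos: 3, ID: 2, Delete: true},
		{Pos: 4, ID: 3, Value: "c"},
	}

	// Reference: an uninterrupted run applies every event exactly once.
	want := Table{}
	ReplayFrom(want, events, 0)

	// Interrupted run: positions 0..3 were applied before the failure, but
	// the last dumped checkpoint only recorded position 1.
	got := Table{}
	for _, e := range events[:4] {
		Apply(got, e)
	}
	// Resuming at the checkpoint re-applies 1..3, then continues with 4.
	ReplayFrom(got, events, 1)
	fmt.Println("replay converges:", reflect.DeepEqual(got, want))

	// Re-copying an already copied chunk changes nothing.
	src := Table{1: "x", 2: "y", 3: "z"}
	once, twice := Table{}, Table{}
	CopyRange(src, once, 1, 2)
	CopyRange(src, twice, 1, 2)
	CopyRange(src, twice, 1, 2)
	fmt.Println("re-copy converges:", reflect.DeepEqual(once, twice))
}
```

Both checks print `true`. The sketch also shows why skipping is the real danger: calling `ReplayFrom(got, events, 5)` after the failure would drop the event at position 4 and leave row 3 missing, whereas starting too early costs only redundant work.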