Merge pull request #43 from github/documentation

Beginning documentation
This commit is contained in:
Shlomi Noach 2016-05-25 12:36:46 +02:00
commit c63df257db
10 changed files with 306 additions and 1 deletions

26
.github/CONTRIBUTING.md vendored Normal file
View File

@ -0,0 +1,26 @@
## Contributing
Hi there! We're thrilled that you'd like to contribute to this project. Your help is essential for keeping it great.
This project adheres to the [Open Code of Conduct](http://todogroup.org/opencodeofconduct/#gh-ost/opensource@github.com). By participating, you are expected to uphold this code.
## Submitting a pull request
0. [Fork](https://github.com/github/gh-ost/fork) and clone the repository
0. Create a new branch: `git checkout -b my-branch-name`
0. Make your change, add tests, and make sure the tests still pass
0. Push to your fork and [submit a pull request](https://github.com/github/gh-ost/compare)
0. Pat your self on the back and wait for your pull request to be reviewed and merged.
Here are a few things you can do that will increase the likelihood of your pull request being accepted:
- Follow the [style guide](https://golang.org/doc/effective_go.html#formatting).
- Write tests.
- Keep your change as focused as possible. If there are multiple changes you would like to make that are not dependent upon each other, consider submitting them as separate pull requests.
- Write a [good commit message](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).
## Resources
- [Contributing to Open Source on GitHub](https://guides.github.com/activities/contributing-to-open-source/)
- [Using Pull Requests](https://help.github.com/articles/using-pull-requests/)
- [GitHub Help](https://help.github.com)

0
.github/ISSUE_TEMPLATE.md vendored Normal file
View File

0
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

View File

@ -1,2 +1,30 @@
# gh-ost # gh-ost
GitHub's Online Schema Change for MySQL
#### GitHub's online schema migration for MySQL
`gh-ost` allows for online schema migrations in MySQL which are:
- Triggerless
- Testable
- Pausable
- Operations-friendly
## How?
WORK IN PROGRESS
Please meanwhile refer to the [docs](doc) for more information.
## What's in a name?
Originally this was named `gh-osc`: GitHub Online Schema Change, in the likes of [Facebook online schema change](https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932/) and [pt-online-schema-change](https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html).
But then a rare genetic mutation happened, and the `s` transformed into `t`. And that sent us down the path of trying to figure out a new acronym. Right now, `gh-ost` (pronounce: _Ghost_), stands for:
- GitHub Online Schema Translator/Transformer/Transfigurator
## Authors
`gh-ost` is designed, authored, reviewed and tested by the database infrastructure team at GitHub:
- [@jonahberquist](https://github.com/jonahberquist)
- [@ggunson](https://github.com/ggunson)
- [@tomkrouper](https://github.com/tomkrouper)
- [@shlomi-noach](https://github.com/shlomi-noach)

16
doc/command-line-flags.md Normal file
View File

@ -0,0 +1,16 @@
# Command line flags
A more in-depth discussion of various `gh-ost` command line flags: implementation, implication, use cases.
##### exact-rowcount
A `gh-ost` execution need to copy whatever rows you have in your existing table onto the ghost table. This can, and often be, a large number. Exactly what that number is?
`gh-ost` initially estimates the number of rows in your table by issuing an `explain select * from your_table`. This will use statistics on your table and return with a rough estimate. How rough? It might go as low as half or as high as double the actual number of rows in your table. This is the same method as used in [`pt-online-schema-change`](https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html).
`gh-ost` also supports the `--exact-rowcount` flag. When this flag is given, two things happen:
- An initial, authoritative `select count(*) from your_table`.
This query may take a long time to complete, but is performed before we begin the massive operations.
- A continuous update to the estimate as we make progress applying events.
We heuristically update the number of rows based on the queries we process from the binlogs.
While the ongoing estimated number of rows is still heuristic, it's almost exact, such that the reported [ETA](understanding-output.md) or percentage progress is typically accurate to the second throughout a multiple-hour operation.

24
doc/migrating-with-sbr.md Normal file
View File

@ -0,0 +1,24 @@
# Migrating with Statement Based Replication
Even though `gh-ost` relies on Row Based Replication (RBR), it does not mean you can't keep your Statement Based Replication (SBR).
`gh-ost` is happy to, and actually prefers and suggests so, connect to a replica. On this replica, it is happy to:
- issue the heavyweight `INFORMATION_SCHEMA` queries that make a table structure analysis
- issue a `select count(*) from mydb.mytable`, should `--exact-rowcount` be provided
- connect itself as a fake replica to get the binary log stream
All of the above can be executed on the master, but we're more comfortable that they execute on a replica.
Please note the third item: `gh-ost` connects as a fake replica and pulls the binary logs. This is how `gh-ost` finds the table's changelog: it looks up entries in the binary log.
The magic is that your master can still produce SRB, but if you have a replica with `log-slave-updates`, you can also configure it to have `binlog_format='ROW'`. Such a replica accepts SBR statements from its master, and produces RBR statements onto its binary logs.
`gh-ost` is happy to modify the `binlog_format` on the replica for you:
- If you supply `--switch-to-rbr`, `gh-ost` will convert the binlog format for you, and restart replication to make sure this takes effect.
- If your replica is an intermediate master, i.e. further serves as a master to other replicas, `gh-ost` will not convert the `binlog_format`.
- At any case, `gh-ost` **will not** convert back to `STATEMENT` (SBR). This is because you may be running multiple migrations concurrently. Being able to run concurrent migrations is one of the design goals of this tool. It's your own responsibility to switch back to SBR once all pending migrations are complete.
### Summary
- If you're already using RBR, all is well for you
- If not, convert one of your replicas to `binlog_format='ROW'`, or let `gh-ost` do this for you.

14
doc/swapping-tables.md Normal file
View File

@ -0,0 +1,14 @@
# Swapping the tables
The table-swap is the final major step of the migration: it's the moment where your original table is pushed aside, and the ghost table (the one we secretly altered and operated on throughout the process) takes its place.
MySQL poses some limitations on how the table swap can take place. While it supports an atomic swap, it does not allow for a swap under controlled lock.
The [facebook OSC](https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932/) tool documents this nicely. Look for **"Cut-over phase"**.
`gh-ost` supports various types of table-swap / cut-over options:
- `--quick-and-bumpy-swap-tables` - this method is similar to the one taken by the facebook OSC. It's non-blocking but also non-atomic. The original table is first renames and pushed aside, then the ghost table is renamed to take its place. In between the two renames there's a brief period of time where your table just does not exist, and queries will fail.
- Voluntary lock based solution (default at this time): as depicted in [Solving the Facebook-OSC non-atomic table swap problem](http://code.openark.org/blog/mysql/solving-the-facebook-osc-non-atomic-table-swap-problem), this solution uses voluntary MySQL locks, and makes for a blocking swap, where your queries do not fail, but block until operation is complete. This effect is desired. There is danger in this solution, since connection failure of the two sessions involved in creating the lock, would result in a premature swap of the tables, hence with potentially corrupted data.
- We are working at this time on a blocking, safe, atomic solution, using wait conditions and via User Defined Functions which will need to be dynamically loaded onto your MySQL server.
- With [`--test-on-replica`](testing-on-replica.md) there is no table swap.

59
doc/testing-on-replica.md Normal file
View File

@ -0,0 +1,59 @@
# Testing on replica
`gh-ost`'s design allows for trusted and reliable tests of the migration without compromising production data integrity.
Test on replica if you:
- Are unsure of `gh-ost`, have not gained confidence into its workings
- Just want to experiment with a real migration without affecting production (maybe measure migration time?)
- Wish to observe data change impact
## What testing on replica means
TL;DR `gh-ost` will make all changes on a replica and leave both original and ghost tables for you to compare.
## Issuing a test drive
Apply `--test-on-replica --host=<a.replica>`.
- `gh-ost` would connect to the indicated server
- Will verify this is indeed a replica and not a master
- Will perform _everything_ on this replica. Other then checking who the master is, it will otherwise not touch it.
- All `INFORMATION_SCHEMA` and `SELECT` queries run on the replica
- Ghost table is created on the replica
- Rows are copied onto the ghost table on the replica
- Binlog events are read from the replica and applied to ghost table on the replica
- So... everything
`gh-ost` will sync the ghost table with the original table.
- When it is satisfied, it will issue a `STOP SLAVE IO_THREAD`, effectively stopping replication
- Will finalize last few statements
- Will terminate. No table swap takes place. No table is dropped.
You are now left with the original table **and** the ghost table. They _should_ be identical.
You now have the time to verify the tool works correctly. You may checksum the entire table data if you like.
- e.g.
`mysql -e 'select * from mydb.mytable order by id' | md5sum`
`mysql -e 'select * from mydb._mytable_gst order by id' | md5sum`
- or of course only select the shared columns before/after the migration
- We use the trivial `engine=innodb` for `alter` when testing. This way the resulting ghost table is identical in structure to the original table (including indexes) and we expect data to be completely identical. We use `md5sum` on the entire dataset to confirm the test result.
### Cleanup
It's your job to:
- Drop the ghost table (at your leisure, you should be aware that a `DROP` can be a lengthy operation)
- Start replication back (via `START SLAVE`)
### Examples
Simple:
```shell
$ gh-osc --host=myhost.com --conf=/etc/gh-ost.cnf --database=test --table=sample_table --alter="engine=innodb" --chunk-size=2000 --max-load=Threads_connected=20 --initially-drop-ghost-table --initially-drop-old-table --test-on-replica --verbose --execute
```
Elaborate:
```shell
$ gh-osc --host=myhost.com --conf=/etc/gh-ost.cnf --database=test --table=sample_table --alter="engine=innodb" --chunk-size=2000 --max-load=Threads_connected=20 --switch-to-rbr --initially-drop-ghost-table --initially-drop-old-table --test-on-replica --postpone-swap-tables-flag-file=/tmp/ghost-postpone.flag --exact-rowcount --allow-nullable-unique-key --verbose --execute
```
- Count exact number of rows (makes ETA estimation very good). This goes at the expense of paying the time for issuing a `SELECT COUNT(*)` on your table. We use this lovingly.
- Automatically switch to `RBR` if replica is configured as `SBR`. See also: [migrating with SBR](migrating-with-sbr.md)
- allow iterating on a `UNIQUE KEY` that has `NULL`able columns (at your own risk)

View File

138
doc/understanding-output.md Normal file
View File

@ -0,0 +1,138 @@
# Understading gh-ost output
`gh-ost` attempts to be verbose to the point where you really know what it's doing, without completely spamming you.
You can control output levels:
- `--verbose`: common use. Useful output, not tons of it
- `--debug`: everything. Tons of output.
Initial output lines may look like this:
```
2016-05-19 17:57:04 INFO starting gh-ost 0.7.14
2016-05-19 17:57:04 INFO Migrating `mydb`.`mytable`
2016-05-19 17:57:04 INFO connection validated on 127.0.0.1:3306
2016-05-19 17:57:04 INFO User has ALL privileges
2016-05-19 17:57:04 INFO binary logs validated on 127.0.0.1:3306
2016-05-19 17:57:04 INFO Restarting replication on 127.0.0.1:3306 to make sure binlog settings apply to replication thread
2016-05-19 17:57:04 INFO Table found. Engine=InnoDB
2016-05-19 17:57:05 INFO As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while
2016-05-19 17:57:11 INFO Exact number of rows via COUNT: 4466810
2016-05-19 17:57:11 INFO --test-on-replica given. Will not execute on master the.master:3306 but rather on replica 127.0.0.1:3306 itself
2016-05-19 17:57:11 INFO Master found to be 127.0.0.1:3306
2016-05-19 17:57:11 INFO connection validated on 127.0.0.1:3306
2016-05-19 17:57:11 INFO Registering replica at 127.0.0.1:3306
2016-05-19 17:57:11 INFO Connecting binlog streamer at mysql-bin.002587:348694066
2016-05-19 17:57:11 INFO connection validated on 127.0.0.1:3306
2016-05-19 17:57:11 INFO rotate to next log name: mysql-bin.002587
2016-05-19 17:57:11 INFO connection validated on 127.0.0.1:3306
2016-05-19 17:57:11 INFO Droppping table `mydb`.`_mytable_gst`
2016-05-19 17:57:11 INFO Table dropped
2016-05-19 17:57:11 INFO Droppping table `mydb`.`_mytable_old`
2016-05-19 17:57:11 INFO Table dropped
2016-05-19 17:57:11 INFO Creating ghost table `mydb`.`_mytable_gst`
2016-05-19 17:57:11 INFO Ghost table created
2016-05-19 17:57:11 INFO Altering ghost table `mydb`.`_mytable_gst`
2016-05-19 17:57:11 INFO Ghost table altered
2016-05-19 17:57:11 INFO Droppping table `mydb`.`_mytable_osc`
2016-05-19 17:57:11 INFO Table dropped
2016-05-19 17:57:11 INFO Creating changelog table `mydb`.`_mytable_osc`
2016-05-19 17:57:11 INFO Changelog table created
2016-05-19 17:57:11 INFO Chosen shared unique key is PRIMARY
2016-05-19 17:57:11 INFO Shared columns are id,name,ref,col4,col5,col6
```
Those are relatively self explanatory. Mostly they indicate that all goes well.
You will be mostly interested in following up on the migration and understanding whether it goes well. Once migration actually begins, you will see output as follows:
```
Copy: 0/4466810 0.0%; Applied: 0; Backlog: 0/100; Elapsed: 0s(copy), 6s(total); streamer: mysql-bin.002587:348727198; ETA: N/A
Copy: 0/4466810 0.0%; Applied: 0; Backlog: 100/100; Elapsed: 1s(copy), 7s(total); streamer: mysql-bin.002587:349815124; ETA: throttled, replica-lag=83.000000s
Copy: 0/4466810 0.0%; Applied: 0; Backlog: 100/100; Elapsed: 2s(copy), 8s(total); streamer: mysql-bin.002587:349815124; ETA: throttled, replica-lag=79.000000s
Copy: 0/4466810 0.0%; Applied: 0; Backlog: 100/100; Elapsed: 3s(copy), 9s(total); streamer: mysql-bin.002587:349815124; ETA: throttled, replica-lag=74.000000s
Copy: 0/4466810 0.0%; Applied: 0; Backlog: 100/100; Elapsed: 4s(copy), 10s(total); streamer: mysql-bin.002587:349815124; ETA: throttled, replica-lag=69.000000s
Copy: 0/4466810 0.0%; Applied: 0; Backlog: 100/100; Elapsed: 5s(copy), 11s(total); streamer: mysql-bin.002587:349815124; ETA: throttled, replica-lag=65.000000s
...
```
In the above we're mostly interested to see that `ETA: throttled, replica-lag=65.000000s`.
- Migration is throttled, i.e. `gh-ost` finds that the server is too busy, or replication is too far behind, and so it ceases (or does not start) data copy operation.
- It also provides a reason for the throttling. In out case it seems replication is too far behind. `gh-ost` awaits until replication lag is smaller than `--max-lag-millis`.
However another thing catches the eye: `Backlog: 0/100` transitions into `Backlog: 100/100`
- `Backlog` is the binlog events queue. A queue of events read from the binary log which are relevant for the migration. The queue gets emptied as events are applied onto the ghost table. Typically we want to see that queue empty or almost empty. However, due to the fact we're not throttled it makes perfect sense that the queue is full: htrottling means we do not apply events onto the ghost table, hence we do not purge the queue.
```
...
Copy: 0/4466810 0.0%; Applied: 0; Backlog: 100/100; Elapsed: 16s(copy), 22s(total); streamer: mysql-bin.002587:349815124; ETA: throttled, replica-lag=8.000000s
Copy: 0/4466810 0.0%; Applied: 0; Backlog: 100/100; Elapsed: 17s(copy), 23s(total); streamer: mysql-bin.002587:349815124; ETA: throttled, replica-lag=2.000000s
Copy: 0/4466885 0.0%; Applied: 1492; Backlog: 100/100; Elapsed: 18s(copy), 24s(total); streamer: mysql-bin.002587:358722182; ETA: N/A
Copy: 0/4466942 0.0%; Applied: 2966; Backlog: 100/100; Elapsed: 19s(copy), 25s(total); streamer: mysql-bin.002587:367190999; ETA: N/A
Copy: 0/4466993 0.0%; Applied: 4462; Backlog: 1/100; Elapsed: 20s(copy), 26s(total); streamer: mysql-bin.002587:376732190; ETA: N/A
Copy: 12500/4466994 0.3%; Applied: 4496; Backlog: 2/100; Elapsed: 21s(copy), 27s(total); streamer: mysql-bin.002587:381475469; ETA: N/A
Copy: 25000/4466997 0.6%; Applied: 4535; Backlog: 6/100; Elapsed: 22s(copy), 28s(total); streamer: mysql-bin.002587:386747649; ETA: N/A
Copy: 40000/4467001 0.9%; Applied: 4582; Backlog: 3/100; Elapsed: 23s(copy), 29s(total); streamer: mysql-bin.002587:393017028; ETA: N/A
```
In the above, `gh-ost` found replication to be caught up and began operation. We note:
- `Backlog` goes down to `1` or `2` or otherwise smaller numbers. This means we are good with processing the binlog events and applying them onto the ghost table.
- `Applied` is the incrementing number of events we have applied from the binary log onto the ghost table, since the migration began.
- `Copy`: at the beginning the tool estimated `4466810` rows already existing in the table. Initially `0` of them are copied, hence `0/4466810`. But as `gh-ost` makes progress, this number grows:
- `12500/4466994 0.3%`
- `25000/4466997 0.6%`
- `40000/4467001 0.9%`
- You can also observe that the number of rows changes. This is implied by the flag `--exact-rowcount`, where we try and keep an updated amount of rows were are going to process throughout the migration, even as new rows are added and old rows deleted. This is not an exact number, but turns out to be a pretty good estimate.
- `Elapsed: 23s(copy), 29s(total)`: `total` stands for total time from executing of `gh-ost`. `copy` stands for the time elapsed since `gh-ost` finished making preparations and was good to go with copy.
- `streamer: mysql-bin.002587:393017028` tells us which binary log entry is `gh-ost` processing at this time.
- `ETA`: Estimated Time of Arrival, is still `N/A` since `gh-ost` has not collected enough data to make an estimate.
Some time later, we will have:
```
Copy: 50000/4467001 1.1%; Applied: 4620; Backlog: 6/100; Elapsed: 24s(copy), 30s(total); streamer: mysql-bin.002587:396414283; ETA: 35m20s
Copy: 62500/4467002 1.4%; Applied: 4671; Backlog: 3/100; Elapsed: 25s(copy), 31s(total); streamer: mysql-bin.002587:402582372; ETA: 29m21s
Copy: 75000/4467003 1.7%; Applied: 4703; Backlog: 3/100; Elapsed: 26s(copy), 32s(total); streamer: mysql-bin.002587:407864888; ETA: 25m22s
Copy: 87500/4467004 2.0%; Applied: 4751; Backlog: 6/100; Elapsed: 27s(copy), 33s(total); streamer: mysql-bin.002587:413142992; ETA: 22m31s
Copy: 100000/4467004 2.2%; Applied: 4795; Backlog: 6/100; Elapsed: 28s(copy), 34s(total); streamer: mysql-bin.002587:418380729; ETA: 20m22s
Copy: 112500/4467005 2.5%; Applied: 4835; Backlog: 1/100; Elapsed: 29s(copy), 35s(total); streamer: mysql-bin.002587:423592450; ETA: 18m42s
```
And `gh-ost` progressively provides an ETA.
Status frequency:
- In the first `60` seconds `gh-ost` emits a status entry every `1` second.
- Then, up till `3` miinutes into operation, status shows every `5` seconds.
- It then drops down to once per `30` seconds
- But goes into once-per-`5`-seconds again when it estimates < `3` minutes ETA
- And once per `1` second when it estimates < `1` minute ETA
```
Copy: 602500/4467053 13.5%; Applied: 6770; Backlog: 0/100; Elapsed: 1m14s(copy), 1m20s(total); streamer: mysql-bin.002587:630949369; ETA: 7m54s
Copy: 655000/4467060 14.7%; Applied: 6985; Backlog: 6/100; Elapsed: 1m19s(copy), 1m25s(total); streamer: mysql-bin.002587:652696032; ETA: 7m39s
Copy: 707500/4467066 15.8%; Applied: 7207; Backlog: 0/100; Elapsed: 1m24s(copy), 1m30s(total); streamer: mysql-bin.002587:674577141; ETA: 7m26s
...
Copy: 1975000/4466798 44.2%; Applied: 12919; Backlog: 2/100; Elapsed: 3m24s(copy), 3m30s(total); streamer: mysql-bin.002588:119901391; ETA: 4m17s
Copy: 2285000/4466855 51.2%; Applied: 14234; Backlog: 13/100; Elapsed: 3m54s(copy), 4m0s(total); streamer: mysql-bin.002588:243346615; ETA: 3m43s
...
Copy: 4397500/4467226 98.4%; Applied: 22996; Backlog: 8/100; Elapsed: 7m11s(copy), 7m17s(total); streamer: mysql-bin.002588:1063945589; ETA: 6s
Copy: 4410000/4467227 98.7%; Applied: 23045; Backlog: 5/100; Elapsed: 7m12s(copy), 7m18s(total); streamer: mysql-bin.002588:1068763841; ETA: 5s
Copy: 4420000/4467229 98.9%; Applied: 23086; Backlog: 5/100; Elapsed: 7m13s(copy), 7m19s(total); streamer: mysql-bin.002588:1072751966; ETA: 4s
2016-05-19 18:04:25 INFO rotate to next log name: mysql-bin.002589
2016-05-19 18:04:25 INFO rotate to next log name: mysql-bin.002589
Copy: 4430000/4467231 99.2%; Applied: 23124; Backlog: 3/100; Elapsed: 7m14s(copy), 7m20s(total); streamer: mysql-bin.002589:2944139; ETA: 3s
Copy: 4442500/4467231 99.4%; Applied: 23181; Backlog: 2/100; Elapsed: 7m15s(copy), 7m21s(total); streamer: mysql-bin.002589:8042490; ETA: 2s
Copy: 4452500/4467232 99.7%; Applied: 23235; Backlog: 5/100; Elapsed: 7m16s(copy), 7m22s(total); streamer: mysql-bin.002589:12084190; ETA: 1s
Copy: 4462500/4467235 99.9%; Applied: 23295; Backlog: 8/100; Elapsed: 7m17s(copy), 7m23s(total); streamer: mysql-bin.002589:16174016; ETA: 0s
2016-05-19 18:04:29 INFO Row copy complete
Copy: 4466492/4467235 100.0%; Applied: 23309; Backlog: 0/100; Elapsed: 7m17s(copy), 7m24s(total); streamer: mysql-bin.002589:17255091; ETA: 0s
2016-05-19 18:04:29 INFO Stopping replication
2016-05-19 18:04:29 INFO Replication stopped
2016-05-19 18:04:29 INFO Verifying SQL thread is running
2016-05-19 18:04:29 INFO SQL thread started
2016-05-19 18:04:29 INFO Replication IO thread at mysql-bin.001801:719204179. SQL thread is at mysql-bin.001801:719204179
2016-05-19 18:04:29 INFO Writing changelog state: AllEventsUpToLockProcessed
2016-05-19 18:04:29 INFO Waiting for events up to lock
Copy: 4466492/4467235 100.0%; Applied: 23309; Backlog: 1/100; Elapsed: 7m18s(copy), 7m24s(total); streamer: mysql-bin.002589:17702369; ETA: 0s
2016-05-19 18:04:30 INFO Done waiting for events up to lock
Copy: 4466492/4467235 100.0%; Applied: 23309; Backlog: 0/100; Elapsed: 7m18s(copy), 7m25s(total); streamer: mysql-bin.002589:17703056; ETA: 0s
```
This migration took, till this point, `7m25s`, had applied `23309` events from the binary log and has copied `4466492` rows onto the ghost table.