merged master

This commit is contained in:
Shlomi Noach 2016-08-26 15:57:19 +02:00
commit c45d2923c0
43 changed files with 1516 additions and 668 deletions


@ -0,0 +1,9 @@
> This is the place to report a bug, ask a question, suggest an enhancement.
> This is the place to make a discussion before creating a PR.
> Please label your Issue
> Please understand if this Issue is not addressed immediately or in a timeframe you were expecting.
> Thank you!


@ -0,0 +1,21 @@
## A Pull Request should be associated with an Issue.
> We wish to have discussions in Issues. A single issue may be targeted by multiple PRs.
> If you're offering a new feature or fixing anything, we'd like to know beforehand in Issues,
> and potentially we'll be able to point development in a particular direction.
Related issue: https://github.com/github/gh-ost/issues/0123456789
> Further notes in https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md
> Thank you! We are open to PRs, but please understand if for technical reasons we are unable to accept each and every PR
### Description
This PR [briefly explain what it does]
> In case this PR introduced Go code changes:
- [ ] contributed code is using same conventions as original code
- [ ] code is formatted via `gofmt` (please avoid `goimports`)
- [ ] code is built via `./build.sh`
- [ ] code is tested via `./test.sh`

.travis.yml Normal file (7 lines)

@ -0,0 +1,7 @@
language: go
go:
- 1.6
- tip
script: ./test.sh

README.md (109 lines changed)

@ -1,67 +1,94 @@
# gh-ost
#### GitHub's online schema migration for MySQL <img src="doc/images/gh-ost-logo-light-160.png" align="right">
`gh-ost` is a triggerless online schema migration solution for MySQL. It is testable and provides pausability, dynamic control/reconfiguration, auditing, and many operational perks.
`gh-ost` produces a light workload on the master throughout the migration, decoupled from the existing workload on the migrated table.
It has been designed based on years of experience with existing solutions, and changes the paradigm of table migrations.
## How?
All existing online-schema-change tools operate in a similar manner: they create a _ghost_ table in the likeness of your original table, migrate that table while empty, slowly and incrementally copy data from your original table to the _ghost_ table, meanwhile propagating ongoing changes (any `INSERT`, `DELETE`, `UPDATE` applied to your table) to the _ghost_ table. Finally, at the right time, they replace your original table with the _ghost_ table.
`gh-ost` uses the same pattern. However, it differs from all existing tools by not using triggers. We have recognized triggers to be the source of [many limitations and risks](doc/why-triggerless.md).
Instead, `gh-ost` [uses the binary log stream](doc/triggerless-design.md) to capture table changes, and asynchronously applies them onto the _ghost_ table. `gh-ost` takes upon itself some tasks that other tools leave for the database to perform. As a result, `gh-ost` has greater control over the migration process; it can truly suspend it; it can truly decouple the migration's write load from the master's workload.
In addition, it offers many [operational perks](doc/perks.md) that make it safer, trustworthy and fun to use.
![gh-ost general flow](doc/images/gh-ost-general-flow.png)
## Highlights
- Build your trust in `gh-ost` by testing it on replicas. `gh-ost` will issue the same migration flow on a replica as it would on the master, without actually replacing the original table, leaving the replica with two tables you can then compare and satisfy yourself that the tool operates correctly. This is how we continuously test `gh-ost` in production.
- True pause: when `gh-ost` [throttles](doc/throttle.md), it truly ceases writes on the master: no row copies and no ongoing event processing. By throttling, you return your master to its original workload.
- Dynamic control: you can [interactively](doc/interactive-commands.md) reconfigure `gh-ost`, even as the migration still runs. You may forcibly initiate throttling.
- Auditing: you may query `gh-ost` for status. `gh-ost` listens on a unix socket or TCP.
- Control over the cut-over phase: `gh-ost` can be instructed to postpone what is probably the most critical step: the swap of tables, until such time that you're comfortably available. No need to worry about the ETA being outside office hours.
Please refer to the [docs](doc) for more information. No, really, read the [docs](doc):
- [Why triggerless](doc/why-triggerless.md)
- [Triggerless design](doc/triggerless-design.md)
- [Cut over phase](doc/cut-over.md)
- [Testing on replica](doc/testing-on-replica.md)
- [Throttle](doc/throttle.md)
- [Operational perks](doc/perks.md)
- [Migrating with Statement Based Replication](doc/migrating-with-sbr.md)
- [Understanding output](doc/understanding-output.md)
- [Interactive commands](doc/interactive-commands.md)
- [Command line flags](doc/command-line-flags.md)
## Usage
The [cheatsheet](doc/cheatsheet.md) has it all. You may be interested in invoking `gh-ost` in various modes:
- a _noop_ migration (merely testing that the migration is valid and good to go)
- a real migration, utilizing a replica (the migration runs on the master; `gh-ost` figures out the identities of the servers involved. This mode is required if your master uses Statement Based Replication)
- a real migration, run directly on the master (but `gh-ost` prefers the former)
- a real migration on a replica (master untouched)
- a test migration on a replica, the way for you to build trust with `gh-ost`'s operation.
#### Where to execute
The recommended way of executing `gh-ost` is to have it connect to a _replica_, as opposed to having it connect to the master. `gh-ost` will crawl its way up the replication chain to figure out who the master is.
By connecting to a replica, `gh-ost` sets up a self-throttling mechanism; feels more comfortable querying `information_schema` tables; and more. Connecting `gh-ost` to a replica is also the trick to make it work even if your master is configured with `statement based replication`, as `gh-ost` is able to manipulate the replica to rewrite logs in `row based replication`. See [Migrating with Statement Based Replication](doc/migrating-with-sbr.md).
The replica would have to use binary logs and be configured with `log_slave_updates`.
It is still OK to connect `gh-ost` directly to the master; you will need to confirm this by providing `--allow-on-master`. The master would have to be using `row based replication`.
`gh-ost` itself may be executed from anywhere. It connects via `tcp` and does not have to be executed from a `MySQL` box. However, do note it generates a lot of traffic, as it connects as a replica and pulls binary log data.
Our tips:
- [Testing above all](doc/testing-on-replica.md): try out `--test-on-replica` the first few times. Better yet, make it continuous. We have multiple replicas where we iterate over our entire fleet of production tables, migrating them one by one, checksumming the results, verifying the migration is good.
- For each master migration, first issue a _noop_
- Then issue the real thing via `--execute`.
More tips:
- Use `--exact-rowcount` for accurate progress indication
- Use `--postpone-cut-over-flag-file` to gain control over cut-over timing
- Get familiar with the [interactive commands](doc/interactive-commands.md)
#### Testing on replica
Newcomer? We think you would enjoy building trust with this tool. You can ask `gh-ost` to simulate a migration on a replica -- this will not affect data on the master and will not actually do a complete migration. It will operate on a replica, and end up with two tables: the original (untouched), and the migrated. You will have your chance to compare the two and verify the tool works to your satisfaction.
```
gh-ost --conf=.my.cnf --database=mydb --table=mytable --verbose --alter="engine=innodb" --execute --initially-drop-ghost-table --initially-drop-old-table --max-load=Threads_running=30 --switch-to-rbr --chunk-size=2500 --exact-rowcount --test-on-replica --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --throttle-flag-file=/tmp/ghost.throttle.flag
```
Please read more on [testing on replica](doc/testing-on-replica.md)
#### Migrating a master table
```
gh-ost --conf=.my.cnf --database=mydb --table=mytable --verbose --alter="engine=innodb" --initially-drop-ghost-table --initially-drop-old-table --max-load=Threads_running=30 --switch-to-rbr --chunk-size=2500 --exact-rowcount --postpone-cut-over-flag-file=/tmp/ghost.postpone.flag --throttle-flag-file=/tmp/ghost.throttle.flag [--execute]
```
Note: in order to migrate a table on the master you don't need to _connect_ to the master. `gh-ost` is happy (and prefers) if you connect to a replica; it then figures out the identity of the master and makes the connection itself.
Also see:
- [requirements and limitations](doc/requirements-and-limitations.md)
- [what if?](doc/what-if.md)
- [the fine print](doc/the-fine-print.md)
## What's in a name?
Originally this was named `gh-osc`: GitHub Online Schema Change, in the likes of [Facebook online schema change](https://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932/) and [pt-online-schema-change](https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html).
But then a rare genetic mutation happened, and the `c` transformed into `t`. And that sent us down the path of trying to figure out a new acronym. `gh-ost` (pronounce: _Ghost_), stands for GitHub's Online Schema Transmogrifier/Translator/Transformer/Transfigurator
## License
`gh-ost` is licensed under the [MIT license](https://github.com/github/gh-ost/blob/master/LICENSE)
`gh-ost` uses 3rd party libraries, each with their own license. These are found [here](https://github.com/github/gh-ost/tree/master/vendor).
## Community
`gh-ost` is released at a stable state, but with mileage to go. We are [open to pull requests](https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md). Please first discuss your intentions via [Issues](https://github.com/github/gh-ost/issues).
We develop `gh-ost` at GitHub and for the community. We may have different priorities than others. From time to time we may suggest a contribution that is not on our immediate roadmap but which may appeal to others.
## Download/binaries/source
`gh-ost` is now GA and stable.
`gh-ost` is available in binary format for Linux and Mac OS/X
[Download latest release here](https://github.com/github/gh-ost/releases/latest)
`gh-ost` is a Go project; it is built with Go 1.5 with "experimental vendor". Soon to migrate to Go 1.6. See and use [build file](https://github.com/github/gh-ost/blob/master/build.sh) for compiling it on your own.
Generally speaking, `master` branch is stable, but only [releases](https://github.com/github/gh-ost/releases) are to be used in production.
## Authors

build.sh Normal file → Executable file (35 lines changed)

@ -1,22 +1,37 @@
#!/bin/bash
#
#
RELEASE_VERSION="0.9.9"
RELEASE_VERSION="1.0.14"
function build {
osname=$1
osshort=$2
GOOS=$3
GOARCH=$4
echo "Building ${osname} binary"
export GOOS
export GOARCH
go build -ldflags "$ldflags" -o $buildpath/$target go/cmd/gh-ost/main.go
if [ $? -ne 0 ]; then
echo "Build failed for ${osname}"
exit 1
fi
(cd $buildpath && tar cfz ./gh-ost-binary-${osshort}-${timestamp}.tar.gz $target)
}
buildpath=/tmp/gh-ost
target=gh-ost
timestamp=$(date "+%Y%m%d%H%M%S")
mkdir -p ${buildpath}
ldflags="-X main.AppVersion=${RELEASE_VERSION}"
gobuild="go build -ldflags \"$ldflags\" -o $buildpath/$target go/cmd/gh-ost/main.go"
export GO15VENDOREXPERIMENT=1
echo "Building OS/X binary"
echo "GO15VENDOREXPERIMENT=1 GOOS=darwin GOARCH=amd64 $gobuild" | bash
(cd $buildpath && tar cfz ./gh-ost-binary-osx-${timestamp}.tar.gz $target)
echo "Building linux binary"
echo "GO15VENDOREXPERIMENT=1 GOOS=linux GOARCH=amd64 $gobuild" | bash
(cd $buildpath && tar cfz ./gh-ost-binary-linux-${timestamp}.tar.gz $target)
mkdir -p ${buildpath}
build macOS osx darwin amd64
build GNU/Linux linux linux amd64
echo "Binaries found in:"
ls -1 $buildpath/gh-ost-binary*${timestamp}.tar.gz

doc/cheatsheet.md Normal file (123 lines)

@ -0,0 +1,123 @@
# Cheatsheet
![operation modes](images/gh-ost-operation-modes.png)
`gh-ost` operates by connecting to potentially multiple servers, as well as imposing itself as a replica in order to stream binary log events directly from one of those servers. There are various operation modes, which depend on your setup, configuration, and where you want to run the migration.
### a. Connect to replica, migrate on master
This is the mode `gh-ost` expects by default. `gh-ost` will investigate the replica, crawl up to find the topology's master, and will hook onto it as well. Migration will:
- Read and write row-data on master
- Read binary log events on the replica, apply the changes onto the master
- Investigate table format, columns & keys, and count rows on the replica
- Read internal changelog events (such as heartbeat) from the replica
- Cut-over (switch tables) on the master
If your master works with SBR, this is the mode to work with. The replica must be configured with binary logs enabled (`log_bin`, `log_slave_updates`) and should have `binlog_format=ROW` (`gh-ost` can apply the latter for you).
However, even with RBR, we suggest this is the least master-intrusive operation mode.
```shell
gh-ost \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--throttle-control-replicas="myreplica.1.com,myreplica.2.com" \
--max-lag-millis=1500 \
--user="gh-ost" \
--password="123456" \
--host=replica.with.rbr.com \
--database="my_schema" \
--table="my_table" \
--verbose \
--alter="engine=innodb" \
--switch-to-rbr \
--allow-master-master \
--cut-over=default \
--exact-rowcount \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
[--execute]
```
With `--execute`, migration actually copies data and flips tables. Without it this is a `noop` run.
### b. Connect to master
If you don't have replicas, or do not wish to use them, you are still able to operate directly on the master. `gh-ost` will do all operations directly on the master. You may still ask it to be considerate of replication lag.
- Your master must produce binary logs in RBR format.
- You must approve this mode via `--allow-on-master`.
```shell
gh-ost \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--throttle-control-replicas="myreplica.1.com,myreplica.2.com" \
--max-lag-millis=1500 \
--user="gh-ost" \
--password="123456" \
--host=master.with.rbr.com \
--allow-on-master \
--database="my_schema" \
--table="my_table" \
--verbose \
--alter="engine=innodb" \
--switch-to-rbr \
--allow-master-master \
--cut-over=default \
--exact-rowcount \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
[--execute]
```
### c. Migrate/test on replica
This will perform a migration on the replica. `gh-ost` will briefly connect to the master but will thereafter perform all operations on the replica without modifying anything on the master.
Throughout the operation, `gh-ost` will throttle such that the replica is up to date.
- `--migrate-on-replica` indicates to `gh-ost` that it must migrate the table directly on the replica. It will perform the cut-over phase even while replication is running.
- `--test-on-replica` indicates the migration is for purpose of testing only. Before cut-over takes place, replication is stopped. Tables are swapped and then swapped back: your original table returns to its original place.
Both tables are left with replication stopped. You may examine the two and compare data.
Test on replica cheatsheet:
```shell
gh-ost \
--user="gh-ost" \
--password="123456" \
--host=replica.with.rbr.com \
--test-on-replica \
--database="my_schema" \
--table="my_table" \
--verbose \
--alter="engine=innodb" \
--initially-drop-ghost-table \
--initially-drop-old-table \
--max-load=Threads_running=30 \
--switch-to-rbr \
--chunk-size=2500 \
--cut-over=default \
--exact-rowcount \
--serve-socket-file=/tmp/gh-ost.test.sock \
--panic-flag-file=/tmp/gh-ost.panic.flag \
--execute
```
### cnf file
You may use a `cnf` file in the following format:
```
[client]
user=gh-ost
password=123456
```
You may then remove `--user=gh-ost --password=123456` and specify `--conf=/path/to/config/file.cnf`.
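`gh-ost` also accepts credentials given as environment variable references: a `user` or `password` value of the form `${SOME_ENV_VARIABLE}` is expanded from the process environment. A sketch, with hypothetical variable names:
```
[client]
# hypothetical variable names; export them before invoking gh-ost
user=${GHOST_USER}
password=${GHOST_PASSWORD}
```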


@ -2,7 +2,27 @@
A more in-depth discussion of various `gh-ost` command line flags: implementation, implication, use cases.
### allow-on-master
By default, `gh-ost` would like you to connect to a replica, from where it figures out the master by itself. This wiring is required should your master execute using `binlog_format=STATEMENT`.
If, for some reason, you do not wish `gh-ost` to connect to a replica, you may connect it directly to the master and approve this via `--allow-on-master`.
### approve-renamed-columns
When your migration issues a column rename (`change column old_name new_name ...`), `gh-ost` analyzes the statement to try and associate the old column name with the new column name. Otherwise, the new structure may also look like some column was dropped and another was added.
`gh-ost` will print out what it thinks the _rename_ implied, but will not issue the migration unless you provide `--approve-renamed-columns`.
If you think `gh-ost` is mistaken and that there's actually no _rename_ involved, you may pass `--skip-renamed-columns` instead. This will cause `gh-ost` to disassociate the column values; data will not be copied between those columns.
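For illustration, a rename migration might be invoked as follows (a sketch; host, schema, table and column names are hypothetical):
```shell
gh-ost \
  --host=replica.with.rbr.com \
  --database="my_schema" \
  --table="my_table" \
  --alter="change column login_name user_name varchar(64) not null" \
  --approve-renamed-columns \
  --execute
```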
### assume-rbr
If you happen to _know_ your servers use RBR (Row Based Replication, i.e. `binlog_format=ROW`), you may specify `--assume-rbr`. This skips a verification step where `gh-ost` would issue a `STOP SLAVE; START SLAVE`.
Skipping this step means `gh-ost` would not need the `SUPER` privilege in order to operate.
You may want to use this on Amazon RDS.
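For instance, a migration against a host already running with `binlog_format=ROW` might look like this (a sketch; host and schema names are hypothetical):
```shell
gh-ost \
  --assume-rbr \
  --host=replica.with.rbr.com \
  --database="my_schema" \
  --table="my_table" \
  --alter="engine=innodb" \
  --execute
```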
### conf
`--conf=/path/to/my.cnf`: file where credentials are specified. Should be in (or contain) the following format:
@ -12,11 +32,11 @@ user=gromit
password=123456
```
### cut-over
Optional. Default is `safe`. See more discussion in [cut-over](cut-over.md)
### exact-rowcount
A `gh-ost` execution needs to copy whatever rows you have in your existing table onto the ghost table. This can be, and often is, a large number. But what exactly is that number?
`gh-ost` initially estimates the number of rows in your table by issuing an `explain select * from your_table`. This will use statistics on your table and return with a rough estimate. How rough? It might go as low as half or as high as double the actual number of rows in your table. This is the same method as used in [`pt-online-schema-change`](https://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html).
@ -29,20 +49,28 @@ A `gh-ost` execution need to copy whatever rows you have in your existing table
While the ongoing estimated number of rows is still heuristic, it's almost exact, such that the reported [ETA](understanding-output.md) or percentage progress is typically accurate to the second throughout a multiple-hour operation.
### execute
Without this parameter, migration is a _noop_: testing table creation and validity of migration, but not touching data.
### initially-drop-ghost-table
`gh-ost` maintains two tables while migrating: the _ghost_ table (which is synced from your original table and finally replaces it) and a changelog table, which is used internally for bookkeeping. By default, it panics and aborts if it sees those tables upon startup. Provide `--initially-drop-ghost-table` and `--initially-drop-old-table` to let `gh-ost` know it's OK to drop them beforehand.
We think `gh-ost` should not take chances or make assumptions about the user's tables. Dropping tables can be a dangerous, locking operation. We let the user explicitly approve such operations.
### initially-drop-old-table
See #initially-drop-ghost-table
### migrate-on-replica
Typically `gh-ost` is used to migrate tables on a master. If you wish to only perform the migration in full on a replica, connect `gh-ost` to said replica and pass `--migrate-on-replica`. `gh-ost` will briefly connect to the master but will otherwise issue no changes on the master. The migration will be fully executed on the replica, while making sure to maintain a small replication lag.
### skip-renamed-columns
See `approve-renamed-columns`
### test-on-replica
Issue the migration on a replica; do not modify data on the master. Useful for validating, testing and benchmarking. See [testing on replica](testing-on-replica.md)

Binary file not shown (added; 228 KiB)

Binary file not shown (added; 8.4 KiB)

Binary file not shown (added; 4.2 KiB)

Binary file not shown (added; 112 KiB)


@ -15,15 +15,16 @@ Both interfaces may serve at the same time. Both respond to simple text command,
### Known commands
- `help`: shows a brief list of available commands
- `status`: returns a detailed status summary of migration progress and configuration
- `sup`: returns a brief status summary of migration progress
- `chunk-size=<newsize>`: modify the `chunk-size`; applies on next running copy-iteration
- `max-lag-millis=<max-lag>`: modify the maximum replication lag threshold (milliseconds, minimum value is `1000`, i.e. 1 second)
- `max-load=<max-load-thresholds>`: modify the `max-load` config; applies on next running copy-iteration
The `max-load` format must be: `some_status=<numeric-threshold>[,some_status=<numeric-threshold>...]`. For example: `Threads_running=50,threads_connected=1000`, and you would then write/echo `max-load=Threads_running=50,threads_connected=1000` to the socket.
- `critical-load=<load>`: change critical load setting (exceeding given thresholds causes panic and abort)
- `nice-ratio=<ratio>`: change the _nice_ ratio: `0` for aggressive (not nice, not sleeping), or a positive floating point value `n`: for any `1ms` spent copying rows, spend `n*1ms` sleeping. Examples: assume a single rows-chunk copy takes `100ms` to complete. `nice-ratio=0.5` will cause `gh-ost` to sleep for `50ms` immediately following. `nice-ratio=1` will cause `gh-ost` to sleep for `100ms`, effectively doubling runtime; a value of `2` will effectively triple the runtime; etc.
- `throttle-query`: change throttle query
- `throttle-control-replicas='replica1,replica2'`: change the list of throttle-control replicas; these are replicas `gh-ost` will check. This takes a comma-separated list of replicas and replaces the previous list.
- `throttle`: force migration suspend
- `no-throttle`: cancel forced suspension (though other throttling reasons may still apply)
- `unpostpone`: at a time where `gh-ost` is postponing the [cut-over](cut-over.md) phase, instruct `gh-ost` to stop postponing and proceed immediately to cut-over.
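For example, a session driving a running migration over its unix socket might look like this (a sketch; the socket path is hypothetical, and `nc` is assumed to be available):
```shell
# ask for a brief status summary
echo "sup" | nc -U /tmp/gh-ost.mydb.mytable.sock
# relax the replication lag threshold to 2 seconds
echo "max-lag-millis=2000" | nc -U /tmp/gh-ost.mydb.mytable.sock
# force throttling, then cancel it
echo "throttle" | nc -U /tmp/gh-ost.mydb.mytable.sock
echo "no-throttle" | nc -U /tmp/gh-ost.mydb.mytable.sock
```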

doc/local-tests.md Normal file (22 lines)

@ -0,0 +1,22 @@
# Local tests
`gh-ost` is continuously tested in production via `--test-on-replica alter='engine=innodb'`. These tests check the GitHub workload and usage, but not necessarily the general case.
Local tests are an additional layer of tests. They will eventually be part of continuous integration tests.
Local tests test explicit use cases, such as column renames, mix of time zones, special types and alters. Traits of a single test:
- Composed of a single table.
- A single alter.
- By default the alter is `engine=innodb`, but this can be overridden per-test
- Scheduled DML operations, executed via `event_scheduler`.
- `gh-ost` is set to execute and throttle for `5` seconds, at which time all tested DMLs are expected to operate.
- The test requires a replication topology and utilizes `--test-on-replica`
- The test checksums the two tables (original and _ghost_) and expects identical checksum
- By default the test selects all (`*`) columns, but this can be overridden per-test
Tests are found under [localtests](https://github.com/github/gh-ost/tree/master/localtests). A single test is a subdirectory and tests are iterated alphabetically.
New data-integrity or synchronization issues, or other concerns, are expected to be covered by new test cases.
While this is merged, work is still ongoing.


@ -0,0 +1,34 @@
# Requirements and limitations
### Requirements
- You will need to have one server serving Row Based Replication (RBR) format binary logs. Right now, `FULL` row image is supported; `MINIMAL` is to be supported in the near future. `gh-ost` prefers to work with replicas. You may [still have your master configured with Statement Based Replication](migrating-with-sbr.md) (SBR).
- `gh-ost` requires an account with these privileges (an illustrative `GRANT` sketch follows this list):
- `ALTER, CREATE, DELETE, DROP, INDEX, INSERT, LOCK TABLES, SELECT, TRIGGER, UPDATE` on the database (schema) where your migrated table is, or of course on `*.*`
- either:
- `SUPER, REPLICATION SLAVE` on `*.*`, or:
- `REPLICATION CLIENT, REPLICATION SLAVE` on `*.*`
The `SUPER` privilege is required for `STOP SLAVE`, `START SLAVE` operations. These are used on:
- Switching your `binlog_format` to `ROW`, in the case where it is _not_ `ROW` and you explicitly specified `--switch-to-rbr`
- If your replication is already in RBR (`binlog_format=ROW`) you can specify `--assume-rbr` to avoid the `STOP SLAVE/START SLAVE` operations, hence no need for `SUPER`.
- Running `--test-on-replica`: before the cut-over phase, `gh-ost` stops replication so that you can compare the two tables and satisfy that the migration is sound.
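As an illustration, a privilege setup matching the above might be created as follows (a sketch only; the account, host and schema names are hypothetical):
```shell
# grant the schema-level and global privileges listed above
mysql -uroot -p -e "
  CREATE USER 'gh-ost'@'%' IDENTIFIED BY '123456';
  GRANT ALTER, CREATE, DELETE, DROP, INDEX, INSERT, LOCK TABLES, SELECT, TRIGGER, UPDATE ON my_schema.* TO 'gh-ost'@'%';
  GRANT SUPER, REPLICATION SLAVE ON *.* TO 'gh-ost'@'%';
"
```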
### Limitations
- Foreign keys not supported. They may be supported in the future, to some extent.
- Triggers are not supported. They may be supported in the future.
- MySQL 5.7 generated columns are not supported. They may be supported in the future.
- The two _before_ & _after_ tables must share some `UNIQUE KEY`. Such a key is used by `gh-ost` to iterate the table.
- As an example, if your table has a single `UNIQUE KEY` and no `PRIMARY KEY`, and you wish to replace it with a `PRIMARY KEY`, you will need two migrations: one to add the `PRIMARY KEY` (this migration will use the existing `UNIQUE KEY`), another to drop the now redundant `UNIQUE KEY` (this migration will use the `PRIMARY KEY`).
- The chosen migration key must not include columns with `NULL` values.
- `gh-ost` will do its best to pick a migration key with non-nullable columns. It will by default refuse a migration where the only possible `UNIQUE KEY` includes nullable-columns. You may override this refusal via `--allow-nullable-unique-key` but **you must** be sure there are no actual `NULL` values in those columns. Such `NULL` values would cause a data integrity problem and potentially a corrupted migration.
- It is not allowed to migrate a table where another table exists with the same name, differing only in upper/lower case.
- For example, you may not migrate `MyTable` if another table called `MYtable` exists in the same schema.
- Amazon RDS and Google Cloud SQL are currently not supported
- We began working towards removing this limitation. See tracking issue: https://github.com/github/gh-ost/issues/163
- Multisource is not supported when migrating via replica. It _should_ work (but was never tested) when connecting directly to the master (`--allow-on-master`)
- Master-master setup is only supported in active-passive setup. Active-active (where table is being written to on both masters concurrently) is unsupported. It may be supported in the future.


@ -63,3 +63,6 @@ $ gh-osc --host=myhost.com --conf=/etc/gh-ost.cnf --database=test --table=sample
### Further notes
Do not confuse `--test-on-replica` with `--migrate-on-replica`; the latter performs the migration and _keeps it that way_ (does not revert the table swap nor stops replication)
As part of testing on replica, `gh-ost` issues a `STOP SLAVE`. This requires the `SUPER` privilege.
See related discussion on https://github.com/github/gh-ost/issues/162

doc/the-fine-print.md Normal file (41 lines)

@ -0,0 +1,41 @@
# The Fine Print: What are You Not Telling Me?
Here are technical considerations you may be interested in: things that are not in the obvious [Requirements & Limitations](requirements-and-limitations.md).
# Connecting to replica
`gh-ost` prefers connecting to a replica. If your master uses Statement Based Replication, this is a _requirement_.
What does "connect to replica" mean?
- `gh-ost` connects to the replica as a normal client
- It additionally connects as a replica to the replica (pretends to be a MySQL replica itself)
- It auto-detects master
`gh-ost` reads the RBR binary logs from the replica, and applies events onto the master as tables are being migrated.
THE FINE PRINT:
- You trust the replica's binary logs to represent events applied on master.
If you don't trust the replica, or you suspect there's data drift between replica & master, take notice. If your master is RBR, do instead connect `gh-ost` to the master, via `--allow-on-master` (see [cheatsheet](cheatsheet.md)).
Our take: we trust replica data; if master dies in production, we promote a replica. Our read serving is based on replica(s).
- Replication needs to run.
This is obvious, but worth stating. You cannot perform a migration with "connect to replica" if your replica lags. `gh-ost` will actually do all it can so that replication does not lag, and will avoid critical operations at such times when replication does lag.
# Network usage
`gh-ost` reads binary logs and then applies them onto the migrated server.
THE FINE PRINT:
- `gh-ost` delivers more network traffic than other online-schema-change tools, which let MySQL handle all data transfer internally. This is part of the [triggerless design](triggerless-design.md).
Our take: we deal with cross-DC migration traffic and this is working well for us.
# Impersonating a replica
`gh-ost` impersonates a replica: it connects to a MySQL server and says "oh hey, I'm a replica, please send me binary logs kthx".
THE FINE PRINT:
- `SHOW SLAVE HOSTS` or `SHOW PROCESSLIST` will list this strange "replica" that you can't really connect to.


@ -44,105 +44,81 @@ Those are relatively self explanatory. Mostly they indicate that all goes well.
You will be mostly interested in following up on the migration and understanding whether it goes well. Once migration actually begins, you will see output as follows:
```
Copy: 0/752865 0.0%; Applied: 0; Backlog: 0/100; Time: 29s(total), 0s(copy); streamer: mysql-bin.007068:846528615; ETA: N/A
Copy: 0/752865 0.0%; Applied: 0; Backlog: 0/100; Time: 30s(total), 1s(copy); streamer: mysql-bin.007068:846875570; ETA: N/A
Copy: 7300/752865 1.0%; Applied: 0; Backlog: 0/100; Time: 31s(total), 2s(copy); streamer: mysql-bin.007068:855439063; ETA: N/A
Copy: 14100/752865 1.9%; Applied: 0; Backlog: 0/100; Time: 32s(total), 3s(copy); streamer: mysql-bin.007068:864722759; ETA: 2m37s
Copy: 20100/752865 2.7%; Applied: 0; Backlog: 0/100; Time: 33s(total), 4s(copy); streamer: mysql-bin.007068:874346340; ETA: 2m26s
Copy: 27000/752865 3.6%; Applied: 0; Backlog: 0/100; Time: 34s(total), 5s(copy); streamer: mysql-bin.007068:886997306; ETA: 2m14s
...
```
In the above, some time was spent on counting table rows: `29s` elapsed before the actual row copy began. `gh-ost` will not deliver an ETA before `1%` of the copy is complete.
```
...
Copy: 460900/752865 61.2%; Applied: 0; Backlog: 0/100; Time: 2m35s(total), 2m6s(copy); streamer: mysql-bin.007069:596112173; ETA: 1m19s
Copy: 466600/752865 62.0%; Applied: 0; Backlog: 0/100; Time: 2m40s(total), 2m11s(copy); streamer: mysql-bin.007069:622646704; ETA: throttled, my.replica-01.com:3306 replica-lag=3.000000s
Copy: 478500/752865 63.6%; Applied: 0; Backlog: 0/100; Time: 2m45s(total), 2m16s(copy); streamer: mysql-bin.007069:641258880; ETA: 1m17s
Copy: 496900/752865 66.0%; Applied: 0; Backlog: 0/100; Time: 2m50s(total), 2m21s(copy); streamer: mysql-bin.007069:678956577; ETA: throttled, my.replica-01.com:3306 replica-lag=2.000000s
Copy: 496900/752865 66.0%; Applied: 0; Backlog: 0/100; Time: 2m55s(total), 2m26s(copy); streamer: mysql-bin.007069:681610879; ETA: throttled, max-load Threads_running=26 >= 25
Copy: 528000/752865 70.1%; Applied: 0; Backlog: 0/100; Time: 3m0s(total), 2m31s(copy); streamer: mysql-bin.007069:711177703; ETA: throttled, lag=2.483039s
Copy: 564900/752865 75.0%; Applied: 0; Backlog: 0/100; Time: 3m30s(total), 3m1s(copy); streamer: mysql-bin.007069:795150744; ETA: throttled, lag=3.482914s
Copy: 577200/752865 76.7%; Applied: 0; Backlog: 0/100; Time: 3m39s(total), 3m10s(copy); streamer: mysql-bin.007069:819956052; ETA: 57s
Copy: 589300/752865 78.3%; Applied: 0; Backlog: 0/100; Time: 3m56s(total), 3m27s(copy); streamer: mysql-bin.007069:858738375; ETA: 57s
Copy: 595700/752865 79.1%; Applied: 0; Backlog: 0/100; Time: 3m57s(total), 3m28s(copy); streamer: mysql-bin.007069:860745762; ETA: 54s
```
In the above, the migration is throttled on occasion:
- A few times because one of the control replicas, specifically `my.replica-01.com:3306`, was lagging
- Once because `max-load` threshold has been reached (`Threads_running=26 >= 25`)
- Once because the migration replica itself (the server `gh-ost` connected to) lagged.
`gh-ost` will always specify the reason for throttling.
### Progress
- `Copy: 595700/752865 79.1%` indicates the number of existing table rows copied onto the _ghost_ table, out of an estimate of the total row count.
- `Applied: 0` indicates the number of entries processed in the binary log and applied onto the _ghost_ table. In the examples above there was no traffic on the migrated table, hence no rows processed.
A migration on a more intensively used table may look like this:
```
Copy: 30713100/43138319 71.2%; Applied: 381910; Backlog: 0/100; Time: 2h6m30s(total), 2h3m20s(copy); streamer: mysql-bin.006792:1001340307; ETA: 49m53s
Copy: 30852500/43138338 71.5%; Applied: 383365; Backlog: 0/100; Time: 2h7m0s(total), 2h3m50s(copy); streamer: mysql-bin.006792:1050191186; ETA: 49m18s
2016-07-25 03:20:41 INFO rotate to next log name: mysql-bin.006793
2016-07-25 03:20:41 INFO rotate to next log name: mysql-bin.006793
Copy: 30925700/43138360 71.7%; Applied: 384873; Backlog: 0/100; Time: 2h7m30s(total), 2h4m20s(copy); streamer: mysql-bin.006793:9144080; ETA: 49m5s
Copy: 31028800/43138380 71.9%; Applied: 386325; Backlog: 0/100; Time: 2h8m0s(total), 2h4m50s(copy); streamer: mysql-bin.006793:47984430; ETA: 48m43s
Copy: 31165600/43138397 72.2%; Applied: 387787; Backlog: 0/100; Time: 2h8m30s(total), 2h5m20s(copy); streamer: mysql-bin.006793:96139474; ETA: 48m8s
Copy: 31291200/43138418 72.5%; Applied: 389257; Backlog: 7/100; Time: 2h9m0s(total), 2h5m50s(copy); streamer: mysql-bin.006793:141094700; ETA: 47m38s
Copy: 31389700/43138432 72.8%; Applied: 390629; Backlog: 100/100; Time: 2h9m30s(total), 2h6m20s(copy); streamer: mysql-bin.006793:179473435; ETA: throttled, lag=1.548707s
```
And `gh-ost` progressively provides an ETA.
Status frequency:
- In the first `60` seconds, `gh-ost` emits a status entry every `1` second.
- Then, up until `3` minutes into the operation, status shows every `5` seconds.
- It then drops down to once per `30` seconds,
- but returns to once per `5` seconds when the estimated ETA is under `3` minutes,
- and to once per `1` second when the estimated ETA is under `1` minute.
Notes:
- `Applied: 381910`: `381910` events in the binary logs representing changes to the migrated table have been processed and applied on the _ghost_ table since the beginning of the migration.
- `Backlog: 0/100`: we are performing well on reading the binary log. There's nothing known in the binary log queue that awaits processing.
- `Backlog: 7/100`: while copying rows, a few events have piled up in the binary log _modifying our table_ that we spotted, and still need to apply.
- `Backlog: 100/100`: our buffer of `100` events is full; you may see this during or right after throttling (the binary logs keep filling up with relevant queries that are not being processed), or immediately following a high workload.
`gh-ost` will always prioritize binlog event processing (backlog) over row-copy; when next possible (throttling completes, in our example), `gh-ost` will drain the queue first, and only then proceed to resume row copy.
There is nothing wrong with seeing `100/100`; it just indicates we're behind at that point in time.
- `Copy: 31291200/43138418`, `Copy: 31389700/43138432`: this migration executed with `--exact-rowcount`. `gh-ost` continuously and heuristically updates the total number of expected row copies as migration proceeds, hence the change from `43138418` to `43138432`
- `Time: 2h9m0s(total), 2h5m50s(copy)`: `total` stands for the time since `gh-ost` was executed; `copy` stands for the time since `gh-ost` finished making preparations and began copying rows
- `streamer: mysql-bin.006793:179473435` tells us which binary log entry `gh-ost` is processing at this time.
### Status hint
In addition, once every `10` minutes, a friendly reminder is printed, in the following form:
```
# Migrating `mydb`.`mytable`; Ghost table is `mydb`.`_mytable_gst`
# Migrating mysql.master-01.com:3306; inspecting mysql.replica-05.com:3306; executing on some.host-17.com
# Migration started at Mon Jul 25 01:13:19 2016
# chunk-size: 500; max lag: 1000ms; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0
# throttle-additional-flag-file: /tmp/gh-ost.throttle.flag.file
# postpone-cut-over-flag-file: /tmp/gh-ost.postpone.flag.file [set]
# panic-flag-file: /tmp/gh-ost.panic.flag.file
# Serving on unix socket: /tmp/gh-ost.mydb.mytable.sock
```
- The above mostly prints out the current configuration. Remember you can [dynamically control](interactive-commands.md) most of these settings.
- `gh-ost` notes that the `postpone-cut-over-flag-file` file actually exists by printing `[set]`

doc/what-if.md Normal file (42 lines)

@ -0,0 +1,42 @@
# What if?
Technical questions and answers. This document will be updated as we go.
### What if I'm using Statement Based Replication?
You can still migrate tables with `gh-ost`. We do that. What you will need is a replica configured with:
- `log_bin`
- `log_slave_updates`
- `binlog_format=ROW`
Thus, the replica will transform the master's SBR binlogs into RBR binlogs. `gh-ost` is happy to read the binary logs from the replica. [Read more](migrating-with-sbr.md)
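A minimal sketch of the relevant replica configuration, per the list above (`my.cnf`-style):
```
[mysqld]
# have the replica write its own binary logs, in ROW format
log_bin
log_slave_updates
binlog_format=ROW
```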
### What if gh-ost crashes halfway through, or I kill it?
Unlike trigger-based solutions, there's nothing urgent to clean up in the event that `gh-ost` bails out or gets killed. There are two tables created by `gh-ost`:
- The _ghost_ table: `_yourtablename_gho`
- The _changelog_ table: `_yourtablename_ghc`
You may instruct `gh-ost` to drop these tables upon startup; or, better yet, drop them yourself.
### What if the cut-over (table switch) is unable to proceed due to locks/timeout?
There is a `lock_wait_timeout` explicitly associated with the cut-over operation. If your table suddenly suffers from a long running query, the cut-over (involving `LOCK` and `RENAME` statements) may be unable to proceed. There's a finite number of retries, and if none of these succeeds, `gh-ost` bails out.
### What if the migration is causing a high load on my master?
This is where `gh-ost` shines. There is no need to kill it as you may be used to with other tools. You can reconfigure `gh-ost` [on the fly](https://github.com/github/gh-ost/blob/master/doc/interactive-commands.md) to be nicer.
You're always able to actively begin [throttling](throttle.md): just touch the `throttle-file` or `echo throttle` into `gh-ost`. Otherwise, reconfigure `max-load`, `nice-ratio` or `throttle-query` to thresholds that better suit your needs.
### What if my replicas don't use binary logs?
If the master is running Row Based Replication (RBR) - point `gh-ost` to the master, and specify `--allow-on-master`. See the [cheatsheet](cheatsheet.md)
If the master is running Statement Based Replication (SBR) - you have no alternative but to reconfigure a replica with:
- `log_bin`
- `log_slave_updates`
- `binlog_format=ROW`


@ -12,7 +12,6 @@ are all using [triggers](http://dev.mysql.com/doc/refman/5.6/en/triggers.html) t
Use of triggers simplifies a lot of the flow in doing a live table migration, but also poses some limitations or difficulties. Here are the reasons why we chose to [design a triggerless solution](triggerless-design.md) to schema migrations.
## Overview
Triggers are stored routines which are invoked on a per-row operation upon `INSERT`, `DELETE`, `UPDATE` on a table.


@ -7,6 +7,8 @@ package base
import (
"fmt"
"os"
"regexp"
"strings"
"sync"
"sync/atomic"
@ -16,6 +18,7 @@ import (
"github.com/github/gh-ost/go/sql"
"gopkg.in/gcfg.v1"
gcfgscanner "gopkg.in/gcfg.v1/scanner"
)
// RowsEstimateMethod is the type of row number estimation
@ -31,10 +34,13 @@ type CutOver int
const (
CutOverAtomic CutOver = iota
CutOverSafe = iota
CutOverTwoStep = iota
)
var (
envVariableRegexp = regexp.MustCompile("[$][{](.*)[}]")
)
// MigrationContext has the general, global state of migration. It is used by
// all components throughout the migration process.
type MigrationContext struct {
@ -43,9 +49,11 @@ type MigrationContext struct {
AlterStatement string
CountTableRows bool
ConcurrentCountTableRows bool
AllowedRunningOnMaster bool
AllowedMasterMaster bool
SwitchToRowBinlogFormat bool
AssumeRBR bool
NullableUniqueKeyAllowed bool
ApproveRenamedColumns bool
SkipRenamedColumns bool
@ -58,26 +66,28 @@ type MigrationContext struct {
defaultNumRetries int64
ChunkSize int64
niceRatio float64
MaxLagMillisecondsThrottleThreshold int64
replicationLagQuery string
throttleControlReplicaKeys *mysql.InstanceKeyMap
ThrottleFlagFile string
ThrottleAdditionalFlagFile string
throttleQuery string
ThrottleCommandedByUser int64
maxLoad LoadMap
criticalLoad LoadMap
PostponeCutOverFlagFile string
CutOverLockTimeoutSeconds int64
PanicFlagFile string
DropServeSocket bool
ServeSocketFile string
ServeTCPPort int64
Noop bool
TestOnReplica bool
MigrateOnReplica bool
TestOnReplicaSkipReplicaStop bool
OkToDropTable bool
InitiallyDropOldTable bool
InitiallyDropGhostTable bool
@ -86,7 +96,9 @@ type MigrationContext struct {
TableEngine string
RowsEstimate int64
RowsDeltaEstimate int64
UsedRowsEstimateMethod RowsEstimateMethod
HasSuperPrivilege bool
OriginalBinlogFormat string
OriginalBinlogRowImage string
InspectorConnectionConfig *mysql.ConnectionConfig
@ -106,6 +118,7 @@ type MigrationContext struct {
throttleReason string
throttleMutex *sync.Mutex
IsPostponingCutOver int64
CountingRowsFlag int64
OriginalTableColumns *sql.ColumnList
OriginalTableUniqueKeys [](*sql.UniqueKey)
@ -149,12 +162,12 @@ func newMigrationContext() *MigrationContext {
ChunkSize: 1000,
InspectorConnectionConfig: mysql.NewConnectionConfig(),
ApplierConnectionConfig: mysql.NewConnectionConfig(),
MaxLagMillisecondsThrottleThreshold: 1500,
CutOverLockTimeoutSeconds: 3,
maxLoad: NewLoadMap(),
criticalLoad: NewLoadMap(),
throttleMutex: &sync.Mutex{},
throttleControlReplicaKeys: mysql.NewInstanceKeyMap(),
configMutex: &sync.Mutex{},
pointOfInterestTimeMutex: &sync.Mutex{},
ColumnRenameMap: make(map[string]string),
@ -211,6 +224,17 @@ func (this *MigrationContext) HasMigrationRange() bool {
return this.MigrationRangeMinValues != nil && this.MigrationRangeMaxValues != nil
}
func (this *MigrationContext) SetCutOverLockTimeoutSeconds(timeoutSeconds int64) error {
if timeoutSeconds < 1 {
return fmt.Errorf("Minimal timeout is 1sec. Timeout remains at %d", this.CutOverLockTimeoutSeconds)
}
if timeoutSeconds > 10 {
return fmt.Errorf("Maximal timeout is 10sec. Timeout remains at %d", this.CutOverLockTimeoutSeconds)
}
this.CutOverLockTimeoutSeconds = timeoutSeconds
return nil
}
func (this *MigrationContext) SetDefaultNumRetries(retries int64) {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
@ -218,6 +242,7 @@ func (this *MigrationContext) SetDefaultNumRetries(retries int64) {
this.defaultNumRetries = retries
}
}
func (this *MigrationContext) MaxRetries() int64 {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
@ -241,7 +266,14 @@ func (this *MigrationContext) IsTransactionalTable() bool {
// ElapsedTime returns time since very beginning of the process
func (this *MigrationContext) ElapsedTime() time.Duration {
return time.Since(this.StartTime)
}
// MarkRowCopyStartTime marks the time at which the row copy begins
func (this *MigrationContext) MarkRowCopyStartTime() {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
this.RowCopyStartTime = time.Now()
}
// ElapsedRowCopyTime returns time since starting to copy chunks of rows
@ -249,8 +281,13 @@ func (this *MigrationContext) ElapsedRowCopyTime() time.Duration {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
if this.RowCopyStartTime.IsZero() {
// Row copy hasn't started yet
return 0
}
if this.RowCopyEndTime.IsZero() {
return time.Since(this.RowCopyStartTime)
}
return this.RowCopyEndTime.Sub(this.RowCopyStartTime)
}
@ -284,7 +321,14 @@ func (this *MigrationContext) TimeSincePointOfInterest() time.Duration {
this.pointOfInterestTimeMutex.Lock()
defer this.pointOfInterestTimeMutex.Unlock()
return time.Since(this.pointOfInterestTime)
}
func (this *MigrationContext) SetMaxLagMillisecondsThrottleThreshold(maxLagMillisecondsThrottleThreshold int64) {
if maxLagMillisecondsThrottleThreshold < 1000 {
maxLagMillisecondsThrottleThreshold = 1000
}
atomic.StoreInt64(&this.MaxLagMillisecondsThrottleThreshold, maxLagMillisecondsThrottleThreshold)
}
func (this *MigrationContext) SetChunkSize(chunkSize int64) {
@ -310,13 +354,30 @@ func (this *MigrationContext) IsThrottled() (bool, string) {
return this.isThrottled, this.throttleReason
}
func (this *MigrationContext) GetReplicationLagQuery() string {
var query string
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
query = this.replicationLagQuery
return query
}
func (this *MigrationContext) SetReplicationLagQuery(newQuery string) {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
this.replicationLagQuery = newQuery
}
func (this *MigrationContext) GetThrottleQuery() string {
var query string
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
query = this.throttleQuery
return query
}
@ -324,7 +385,7 @@ func (this *MigrationContext) SetThrottleQuery(newQuery string) {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
this.throttleQuery = newQuery
}
func (this *MigrationContext) GetMaxLoad() LoadMap {
@ -341,6 +402,26 @@ func (this *MigrationContext) GetCriticalLoad() LoadMap {
return this.criticalLoad.Duplicate()
}
func (this *MigrationContext) GetNiceRatio() float64 {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
return this.niceRatio
}
func (this *MigrationContext) SetNiceRatio(newRatio float64) {
if newRatio < 0.0 {
newRatio = 0.0
}
if newRatio > 100.0 {
newRatio = 100.0
}
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
this.niceRatio = newRatio
}
// ReadMaxLoad parses the `--max-load` flag, which is in multiple key-value format,
// such as: 'Threads_running=100,Threads_connected=500'
// It only applies changes in case there's no parsing error.
@ -376,7 +457,7 @@ func (this *MigrationContext) GetThrottleControlReplicaKeys() *mysql.InstanceKey
defer this.throttleMutex.Unlock()
keys := mysql.NewInstanceKeyMap()
keys.AddKeys(this.ThrottleControlReplicaKeys.GetInstanceKeys())
keys.AddKeys(this.throttleControlReplicaKeys.GetInstanceKeys())
return keys
}
@ -389,7 +470,15 @@ func (this *MigrationContext) ReadThrottleControlReplicaKeys(throttleControlRepl
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
this.ThrottleControlReplicaKeys = keys
this.throttleControlReplicaKeys = keys
return nil
}
func (this *MigrationContext) AddThrottleControlReplicaKey(key mysql.InstanceKey) error {
this.throttleMutex.Lock()
defer this.throttleMutex.Unlock()
this.throttleControlReplicaKeys.AddKey(key)
return nil
}
@ -422,8 +511,20 @@ func (this *MigrationContext) ReadConfigFile() error {
if this.ConfigFile == "" {
return nil
}
gcfg.RelaxedParserMode = true
gcfgscanner.RelaxedScannerMode = true
if err := gcfg.ReadFileInto(&this.config, this.ConfigFile); err != nil {
return err
}
// We accept user & password in the form "${SOME_ENV_VARIABLE}" in which case we pull
// the given variable from the OS environment
if submatch := envVariableRegexp.FindStringSubmatch(this.config.Client.User); len(submatch) > 1 {
this.config.Client.User = os.Getenv(submatch[1])
}
if submatch := envVariableRegexp.FindStringSubmatch(this.config.Client.Password); len(submatch) > 1 {
this.config.Client.Password = os.Getenv(submatch[1])
}
return nil
}
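// A minimal sketch of the "${SOME_ENV_VARIABLE}" substitution above. The exact
// pattern of envVariableRegexp is an assumption here; only the shape of the logic
// mirrors the code. Assumes `import ("os" ; "regexp")`.
var envVarRegexpSketch = regexp.MustCompile(`^[$][{](.+)[}]$`)

func resolveEnvValue(value string) string {
	if submatch := envVarRegexpSketch.FindStringSubmatch(value); len(submatch) > 1 {
		// e.g. user = "${MYSQL_USER}" resolves to the contents of $MYSQL_USER
		return os.Getenv(submatch[1])
	}
	return value
}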

View File

@ -63,15 +63,6 @@ func (this *GoMySQLReader) ConnectBinlogStreamer(coordinates mysql.BinlogCoordin
return err
}
func (this *GoMySQLReader) Reconnect() error {
this.binlogSyncer.Close()
connectCoordinates := &mysql.BinlogCoordinates{LogFile: this.currentCoordinates.LogFile, LogPos: 4}
if err := this.ConnectBinlogStreamer(*connectCoordinates); err != nil {
return err
}
return nil
}
func (this *GoMySQLReader) GetCurrentBinlogCoordinates() *mysql.BinlogCoordinates {
this.currentCoordinatesMutex.Lock()
defer this.currentCoordinatesMutex.Unlock()
@ -137,33 +128,23 @@ func (this *GoMySQLReader) StreamEvents(canStopStreaming func() bool, entriesCha
if err != nil {
return err
}
// if rand.Intn(1000) == 0 {
// this.binlogSyncer.Close()
// log.Debugf("current: %+v, hint: %+v", this.currentCoordinates, this.LastAppliedRowsEventHint)
// return log.Errorf(".............haha got random error")
// }
// log.Debugf("0001 ........ currentCoordinates: %+v", this.currentCoordinates) //TODO
func() {
this.currentCoordinatesMutex.Lock()
defer this.currentCoordinatesMutex.Unlock()
this.currentCoordinates.LogPos = int64(ev.Header.LogPos)
}()
if rotateEvent, ok := ev.Event.(*replication.RotateEvent); ok {
// log.Debugf("0008 ........ currentCoordinates: %+v", this.currentCoordinates) //TODO
// ev.Dump(os.Stdout)
func() {
this.currentCoordinatesMutex.Lock()
defer this.currentCoordinatesMutex.Unlock()
this.currentCoordinates.LogFile = string(rotateEvent.NextLogName)
}()
// log.Debugf("0001 ........ currentCoordinates: %+v", this.currentCoordinates) //TODO
log.Infof("rotate to next log name: %s", rotateEvent.NextLogName)
} else if rowsEvent, ok := ev.Event.(*replication.RowsEvent); ok {
if err := this.handleRowsEvent(ev, rowsEvent, entriesChannel); err != nil {
return err
}
}
// log.Debugf("TODO ........ currentCoordinates: %+v", this.currentCoordinates) //TODO
}
log.Debugf("done streaming events")

View File

@ -39,7 +39,7 @@ func acceptSignals(migrationContext *base.MigrationContext) {
}()
}
// main is the application's entry point. It will either spawn a CLI or HTTP itnerfaces.
// main is the application's entry point. It will spawn either a CLI or an HTTP interface.
func main() {
migrationContext := base.GetMigrationContext()
@ -53,6 +53,7 @@ func main() {
flag.StringVar(&migrationContext.OriginalTableName, "table", "", "table name (mandatory)")
flag.StringVar(&migrationContext.AlterStatement, "alter", "", "alter statement (mandatory)")
flag.BoolVar(&migrationContext.CountTableRows, "exact-rowcount", false, "actually count table rows as opposed to estimate them (results in more accurate progress estimation)")
flag.BoolVar(&migrationContext.ConcurrentCountTableRows, "concurrent-rowcount", false, "(with --exact-rowcount), when true: count rows after row-copy begins, concurrently, and adjust row estimate later on; defaults false: first count rows, then start row copy")
flag.BoolVar(&migrationContext.AllowedRunningOnMaster, "allow-on-master", false, "allow this migration to run directly on master. Preferably it would run on a replica")
flag.BoolVar(&migrationContext.AllowedMasterMaster, "allow-master-master", false, "explicitly allow running in a master-master setup")
flag.BoolVar(&migrationContext.NullableUniqueKeyAllowed, "allow-nullable-unique-key", false, "allow gh-ost to migrate based on a unique key with nullable columns. As long as no NULL values exist, this should be OK. If NULL values exist in chosen key, data may be corrupted. Use at your own risk!")
@ -61,28 +62,32 @@ func main() {
executeFlag := flag.Bool("execute", false, "actually execute the alter & migrate the table. Default is noop: do some tests and exit")
flag.BoolVar(&migrationContext.TestOnReplica, "test-on-replica", false, "Have the migration run on a replica, not on the master. At the end of migration replication is stopped, and tables are swapped and immediately swap-revert. Replication remains stopped and you can compare the two tables for building trust")
flag.BoolVar(&migrationContext.TestOnReplicaSkipReplicaStop, "test-on-replica-skip-replica-stop", false, "When --test-on-replica is enabled, do not issue commands stop replication (requires --test-on-replica)")
flag.BoolVar(&migrationContext.MigrateOnReplica, "migrate-on-replica", false, "Have the migration run on a replica, not on the master. This will do the full migration on the replica including cut-over (as opposed to --test-on-replica)")
flag.BoolVar(&migrationContext.OkToDropTable, "ok-to-drop-table", false, "Shall the tool drop the old table at end of operation. DROPping tables can be a long locking operation, which is why I'm not doing it by default. I'm an online tool, yes?")
flag.BoolVar(&migrationContext.InitiallyDropOldTable, "initially-drop-old-table", false, "Drop a possibly existing OLD table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists")
flag.BoolVar(&migrationContext.InitiallyDropGhostTable, "initially-drop-ghost-table", false, "Drop a possibly existing Ghost table (remains from a previous run?) before beginning operation. Default is to panic and abort if such table exists")
cutOver := flag.String("cut-over", "atomic", "choose cut-over type (atomic, two-step, voluntary-lock)")
cutOver := flag.String("cut-over", "atomic", "choose cut-over type (default|atomic, two-step)")
flag.BoolVar(&migrationContext.ManagedRowCopy, "managed-rowcopy", false, "Copy row data by first reading rows into app, then applying them (default: rowcopy local to applied server via INSERT INTO ... SELECT)")
flag.BoolVar(&migrationContext.SwitchToRowBinlogFormat, "switch-to-rbr", false, "let this tool automatically switch binary log format to 'ROW' on the replica, if needed. The format will NOT be switched back. I'm too scared to do that, and wish to protect you if you happen to execute another migration while this one is running")
flag.BoolVar(&migrationContext.AssumeRBR, "assume-rbr", false, "set to 'true' when you know for certain your server uses 'ROW' binlog_format. gh-ost is unable to tell, event after reading binlog_format, whether the replication process does indeed use 'ROW', and restarts replication to be certain RBR setting is applied. Such operation requires SUPER privileges which you might not have. Setting this flag avoids restarting replication and you can proceed to use gh-ost without SUPER privileges")
chunkSize := flag.Int64("chunk-size", 1000, "amount of rows to handle in each iteration (allowed range: 100-100,000)")
defaultRetries := flag.Int64("default-retries", 60, "Default number of retries for various operations before panicking")
flag.Int64Var(&migrationContext.NiceRatio, "nice-ratio", 0, "force being 'nice', imply sleep time per chunk time. Example values: 0 is aggressive. 3: for every ms spend in a rowcopy chunk, spend 3ms sleeping immediately after")
cutOverLockTimeoutSeconds := flag.Int64("cut-over-lock-timeout-seconds", 3, "Max number of seconds to hold locks on tables while attempting to cut-over (retry attempted when lock exceeds timeout)")
niceRatio := flag.Float64("nice-ratio", 0, "force being 'nice', imply sleep time per chunk time; range: [0.0..100.0]. Example values: 0 is aggressive. 1: for every 1ms spent copying rows, sleep additional 1ms (effectively doubling runtime); 0.7: for every 10ms spent in a rowcopy chunk, spend 7ms sleeping immediately after")
flag.Int64Var(&migrationContext.MaxLagMillisecondsThrottleThreshold, "max-lag-millis", 1500, "replication lag at which to throttle operation")
flag.StringVar(&migrationContext.ReplictionLagQuery, "replication-lag-query", "", "Query that detects replication lag in seconds. Result can be a floating point (by default gh-ost issues SHOW SLAVE STATUS and reads Seconds_behind_master). If you're using pt-heartbeat, query would be something like: SELECT ROUND(UNIX_TIMESTAMP() - MAX(UNIX_TIMESTAMP(ts))) AS delay FROM my_schema.heartbeat")
maxLagMillis := flag.Int64("max-lag-millis", 1500, "replication lag at which to throttle operation")
replicationLagQuery := flag.String("replication-lag-query", "", "Query that detects replication lag in seconds. Result can be a floating point (by default gh-ost issues SHOW SLAVE STATUS and reads Seconds_behind_master). If you're using pt-heartbeat, query would be something like: SELECT ROUND(UNIX_TIMESTAMP() - MAX(UNIX_TIMESTAMP(ts))) AS delay FROM my_schema.heartbeat")
throttleControlReplicas := flag.String("throttle-control-replicas", "", "List of replicas on which to check for lag; comma delimited. Example: myhost1.com:3306,myhost2.com,myhost3.com:3307")
flag.StringVar(&migrationContext.ThrottleQuery, "throttle-query", "", "when given, issued (every second) to check if operation should throttle. Expecting to return zero for no-throttle, >0 for throttle. Query is issued on the migrated server. Make sure this query is lightweight")
throttleQuery := flag.String("throttle-query", "", "when given, issued (every second) to check if operation should throttle. Expecting to return zero for no-throttle, >0 for throttle. Query is issued on the migrated server. Make sure this query is lightweight")
flag.StringVar(&migrationContext.ThrottleFlagFile, "throttle-flag-file", "", "operation pauses when this file exists; hint: use a file that is specific to the table being altered")
flag.StringVar(&migrationContext.ThrottleAdditionalFlagFile, "throttle-additional-flag-file", "/tmp/gh-ost.throttle", "operation pauses when this file exists; hint: keep default, use for throttling multiple gh-ost operations")
flag.StringVar(&migrationContext.PostponeCutOverFlagFile, "postpone-cut-over-flag-file", "", "while this file exists, migration will postpone the final stage of swapping tables, and will keep on syncing the ghost table. Cut-over/swapping would be ready to perform the moment the file is deleted.")
flag.StringVar(&migrationContext.PanicFlagFile, "panic-flag-file", "", "when this file is created, gh-ost will immediately terminate, without cleanup")
flag.BoolVar(&migrationContext.DropServeSocket, "initially-drop-socket-file", false, "Should gh-ost forcibly delete an existing socket file. Be careful: this might drop the socket file of a running migration!")
flag.StringVar(&migrationContext.ServeSocketFile, "serve-socket-file", "", "Unix socket file to serve on. Default: auto-determined and advertised upon startup")
flag.Int64Var(&migrationContext.ServeTCPPort, "serve-tcp-port", 0, "TCP port to serve on. Default: disabled")
@ -144,12 +149,19 @@ func main() {
if migrationContext.MigrateOnReplica && migrationContext.TestOnReplica {
log.Fatalf("--migrate-on-replica and --test-on-replica are mutually exclusive")
}
if migrationContext.SwitchToRowBinlogFormat && migrationContext.AssumeRBR {
log.Fatalf("--switch-to-rbr and --assume-rbr are mutually exclusive")
}
if migrationContext.TestOnReplicaSkipReplicaStop {
if !migrationContext.TestOnReplica {
log.Fatalf("--test-on-replica-skip-replica-stop requires --test-on-replica to be enabled")
}
log.Warning("--test-on-replica-skip-replica-stop enabled. We will not stop replication before cut-over. Ensure you have a plugin that does this.")
}
switch *cutOver {
case "atomic", "default", "":
migrationContext.CutOverType = base.CutOverAtomic
case "safe":
migrationContext.CutOverType = base.CutOverSafe
case "two-step":
migrationContext.CutOverType = base.CutOverTwoStep
default:
@ -170,9 +182,16 @@ func main() {
if migrationContext.ServeSocketFile == "" {
migrationContext.ServeSocketFile = fmt.Sprintf("/tmp/gh-ost.%s.%s.sock", migrationContext.DatabaseName, migrationContext.OriginalTableName)
}
migrationContext.SetNiceRatio(*niceRatio)
migrationContext.SetChunkSize(*chunkSize)
migrationContext.SetMaxLagMillisecondsThrottleThreshold(*maxLagMillis)
migrationContext.SetReplicationLagQuery(*replicationLagQuery)
migrationContext.SetThrottleQuery(*throttleQuery)
migrationContext.SetDefaultNumRetries(*defaultRetries)
migrationContext.ApplyCredentials()
if err := migrationContext.SetCutOverLockTimeoutSeconds(*cutOverLockTimeoutSeconds); err != nil {
log.Errore(err)
}
log.Infof("starting gh-ost %+v", AppVersion)
acceptSignals(migrationContext)

View File

@ -107,7 +107,7 @@ func (this *Applier) ValidateOrDropExistingTables() error {
}
}
if this.tableExists(this.migrationContext.GetGhostTableName()) {
return fmt.Errorf("Table %s already exists. Panicking. Use --initially-drop-ghost-table to force dropping it", sql.EscapeName(this.migrationContext.GetGhostTableName()))
return fmt.Errorf("Table %s already exists. Panicking. Use --initially-drop-ghost-table to force dropping it, though I really prefer that you drop it or rename it away", sql.EscapeName(this.migrationContext.GetGhostTableName()))
}
if this.migrationContext.InitiallyDropOldTable {
if err := this.DropOldTable(); err != nil {
@ -115,7 +115,7 @@ func (this *Applier) ValidateOrDropExistingTables() error {
}
}
if this.tableExists(this.migrationContext.GetOldTableName()) {
return fmt.Errorf("Table %s already exists. Panicking. Use --initially-drop-old-table to force dropping it", sql.EscapeName(this.migrationContext.GetOldTableName()))
return fmt.Errorf("Table %s already exists. Panicking. Use --initially-drop-old-table to force dropping it, though I really prefer that you drop it or rename it away", sql.EscapeName(this.migrationContext.GetOldTableName()))
}
return nil
@ -419,7 +419,7 @@ func (this *Applier) ApplyIterationInsertQuery() (chunkSize int64, rowsAffected
return chunkSize, rowsAffected, duration, err
}
rowsAffected, _ = sqlResult.RowsAffected()
duration = time.Now().Sub(startTime)
duration = time.Since(startTime)
log.Debugf(
"Issued INSERT on range: [%s]..[%s]; iteration: %d; chunk-size: %d",
this.migrationContext.MigrationIterationRangeMinValues,
@ -488,22 +488,6 @@ func (this *Applier) SwapTablesQuickAndBumpy() error {
return nil
}
// RenameTable makes coffee. No, wait. It renames a table.
func (this *Applier) RenameTable(fromName, toName string) (err error) {
query := fmt.Sprintf(`rename /* gh-ost */ table %s.%s to %s.%s`,
sql.EscapeName(this.migrationContext.DatabaseName),
sql.EscapeName(fromName),
sql.EscapeName(this.migrationContext.DatabaseName),
sql.EscapeName(toName),
)
log.Infof("Renaming %s to %s", fromName, toName)
if _, err := sqlutils.ExecNoPrepare(this.db, query); err != nil {
return log.Errore(err)
}
log.Infof("Table renamed")
return nil
}
// RenameTablesRollback renames back both table: original back to ghost,
// _old back to original. This is used by `--test-on-replica`
func (this *Applier) RenameTablesRollback() (renameError error) {
@ -590,6 +574,7 @@ func (this *Applier) StopReplication() error {
if err := this.StopSlaveSQLThread(); err != nil {
return err
}
readBinlogCoordinates, executeBinlogCoordinates, err := mysql.GetReplicationBinlogCoordinates(this.db)
if err != nil {
return err
@ -603,151 +588,6 @@ func (this *Applier) GetSessionLockName(sessionId int64) string {
return fmt.Sprintf("gh-ost.%d.lock", sessionId)
}
// LockOriginalTableAndWait locks the original table, notifies the lock is in
// place, and awaits further instruction
func (this *Applier) LockOriginalTableAndWait(sessionIdChan chan int64, tableLocked chan<- error, okToUnlockTable <-chan bool, tableUnlocked chan<- error) error {
tx, err := this.db.Begin()
if err != nil {
tableLocked <- err
return err
}
defer func() {
tx.Rollback()
}()
var sessionId int64
if err := tx.QueryRow(`select connection_id()`).Scan(&sessionId); err != nil {
tableLocked <- err
return err
}
sessionIdChan <- sessionId
query := `select get_lock(?, 0)`
lockResult := 0
lockName := this.GetSessionLockName(sessionId)
log.Infof("Grabbing voluntary lock: %s", lockName)
if err := tx.QueryRow(query, lockName).Scan(&lockResult); err != nil || lockResult != 1 {
err := fmt.Errorf("Unable to acquire lock %s", lockName)
tableLocked <- err
return err
}
tableLockTimeoutSeconds := this.migrationContext.SwapTablesTimeoutSeconds * 2
log.Infof("Setting LOCK timeout as %d seconds", tableLockTimeoutSeconds)
query = fmt.Sprintf(`set session lock_wait_timeout:=%d`, tableLockTimeoutSeconds)
if _, err := tx.Exec(query); err != nil {
tableLocked <- err
return err
}
query = fmt.Sprintf(`lock /* gh-ost */ tables %s.%s write`,
sql.EscapeName(this.migrationContext.DatabaseName),
sql.EscapeName(this.migrationContext.OriginalTableName),
)
log.Infof("Locking %s.%s",
sql.EscapeName(this.migrationContext.DatabaseName),
sql.EscapeName(this.migrationContext.OriginalTableName),
)
this.migrationContext.LockTablesStartTime = time.Now()
if _, err := tx.Exec(query); err != nil {
tableLocked <- err
return err
}
log.Infof("Table locked")
tableLocked <- nil // No error.
// The cut-over phase will proceed to apply remaining backlog onto the ghost table,
// and issue RENAMEs. We wait here until told to proceed.
<-okToUnlockTable
// Release
query = `unlock tables`
log.Infof("Releasing lock from %s.%s",
sql.EscapeName(this.migrationContext.DatabaseName),
sql.EscapeName(this.migrationContext.OriginalTableName),
)
if _, err := tx.Exec(query); err != nil {
tableUnlocked <- err
return log.Errore(err)
}
log.Infof("Table unlocked")
tableUnlocked <- nil
return nil
}
// RenameOriginalTable will attempt renaming the original table into _old
func (this *Applier) RenameOriginalTable(sessionIdChan chan int64, originalTableRenamed chan<- error) error {
tx, err := this.db.Begin()
if err != nil {
return err
}
defer func() {
tx.Rollback()
originalTableRenamed <- nil
}()
var sessionId int64
if err := tx.QueryRow(`select connection_id()`).Scan(&sessionId); err != nil {
return err
}
sessionIdChan <- sessionId
log.Infof("Setting RENAME timeout as %d seconds", this.migrationContext.SwapTablesTimeoutSeconds)
query := fmt.Sprintf(`set session lock_wait_timeout:=%d`, this.migrationContext.SwapTablesTimeoutSeconds)
if _, err := tx.Exec(query); err != nil {
return err
}
query = fmt.Sprintf(`rename /* gh-ost */ table %s.%s to %s.%s`,
sql.EscapeName(this.migrationContext.DatabaseName),
sql.EscapeName(this.migrationContext.OriginalTableName),
sql.EscapeName(this.migrationContext.DatabaseName),
sql.EscapeName(this.migrationContext.GetOldTableName()),
)
log.Infof("Issuing and expecting this to block: %s", query)
if _, err := tx.Exec(query); err != nil {
return log.Errore(err)
}
log.Infof("Original table renamed")
return nil
}
// RenameGhostTable will attempt renaming the ghost table into original
func (this *Applier) RenameGhostTable(sessionIdChan chan int64, ghostTableRenamed chan<- error) error {
tx, err := this.db.Begin()
if err != nil {
return err
}
defer func() {
tx.Rollback()
}()
var sessionId int64
if err := tx.QueryRow(`select connection_id()`).Scan(&sessionId); err != nil {
return err
}
sessionIdChan <- sessionId
log.Infof("Setting RENAME timeout as %d seconds", this.migrationContext.SwapTablesTimeoutSeconds)
query := fmt.Sprintf(`set session lock_wait_timeout:=%d`, this.migrationContext.SwapTablesTimeoutSeconds)
if _, err := tx.Exec(query); err != nil {
return err
}
query = fmt.Sprintf(`rename /* gh-ost */ table %s.%s to %s.%s`,
sql.EscapeName(this.migrationContext.DatabaseName),
sql.EscapeName(this.migrationContext.GetGhostTableName()),
sql.EscapeName(this.migrationContext.DatabaseName),
sql.EscapeName(this.migrationContext.OriginalTableName),
)
log.Infof("Issuing and expecting this to block: %s", query)
if _, err := tx.Exec(query); err != nil {
ghostTableRenamed <- err
return log.Errore(err)
}
log.Infof("Ghost table renamed")
ghostTableRenamed <- nil
return nil
}
// ExpectUsedLock expects the special hint voluntary lock to exist on given session
func (this *Applier) ExpectUsedLock(sessionId int64) error {
var result int64
@ -861,7 +701,7 @@ func (this *Applier) AtomicCutOverMagicLock(sessionIdChan chan int64, tableLocke
return err
}
tableLockTimeoutSeconds := this.migrationContext.SwapTablesTimeoutSeconds * 2
tableLockTimeoutSeconds := this.migrationContext.CutOverLockTimeoutSeconds * 2
log.Infof("Setting LOCK timeout as %d seconds", tableLockTimeoutSeconds)
query = fmt.Sprintf(`set session lock_wait_timeout:=%d`, tableLockTimeoutSeconds)
if _, err := tx.Exec(query); err != nil {
@ -931,7 +771,7 @@ func (this *Applier) AtomicCutOverMagicLock(sessionIdChan chan int64, tableLocke
return nil
}
// RenameOriginalTable will attempt renaming the original table into _old
// AtomicCutoverRename attempts renaming the original table to _old and the ghost table to the original name
func (this *Applier) AtomicCutoverRename(sessionIdChan chan int64, tablesRenamed chan<- error) error {
tx, err := this.db.Begin()
if err != nil {
@ -948,8 +788,8 @@ func (this *Applier) AtomicCutoverRename(sessionIdChan chan int64, tablesRenamed
}
sessionIdChan <- sessionId
log.Infof("Setting RENAME timeout as %d seconds", this.migrationContext.SwapTablesTimeoutSeconds)
query := fmt.Sprintf(`set session lock_wait_timeout:=%d`, this.migrationContext.SwapTablesTimeoutSeconds)
log.Infof("Setting RENAME timeout as %d seconds", this.migrationContext.CutOverLockTimeoutSeconds)
query := fmt.Sprintf(`set session lock_wait_timeout:=%d`, this.migrationContext.CutOverLockTimeoutSeconds)
if _, err := tx.Exec(query); err != nil {
return err
}
@ -993,12 +833,12 @@ func (this *Applier) buildDMLEventQuery(dmlEvent *binlog.BinlogDMLEvent) (query
}
case binlog.InsertDML:
{
query, sharedArgs, err := sql.BuildDMLInsertQuery(dmlEvent.DatabaseName, this.migrationContext.GetGhostTableName(), this.migrationContext.OriginalTableColumns, this.migrationContext.MappedSharedColumns, dmlEvent.NewColumnValues.AbstractValues())
query, sharedArgs, err := sql.BuildDMLInsertQuery(dmlEvent.DatabaseName, this.migrationContext.GetGhostTableName(), this.migrationContext.OriginalTableColumns, this.migrationContext.SharedColumns, this.migrationContext.MappedSharedColumns, dmlEvent.NewColumnValues.AbstractValues())
return query, sharedArgs, 1, err
}
case binlog.UpdateDML:
{
query, sharedArgs, uniqueKeyArgs, err := sql.BuildDMLUpdateQuery(dmlEvent.DatabaseName, this.migrationContext.GetGhostTableName(), this.migrationContext.OriginalTableColumns, this.migrationContext.MappedSharedColumns, &this.migrationContext.UniqueKey.Columns, dmlEvent.NewColumnValues.AbstractValues(), dmlEvent.WhereColumnValues.AbstractValues())
query, sharedArgs, uniqueKeyArgs, err := sql.BuildDMLUpdateQuery(dmlEvent.DatabaseName, this.migrationContext.GetGhostTableName(), this.migrationContext.OriginalTableColumns, this.migrationContext.SharedColumns, this.migrationContext.MappedSharedColumns, &this.migrationContext.UniqueKey.Columns, dmlEvent.NewColumnValues.AbstractValues(), dmlEvent.WhereColumnValues.AbstractValues())
args = append(args, sharedArgs...)
args = append(args, uniqueKeyArgs...)
return query, args, 0, err
@ -1014,12 +854,46 @@ func (this *Applier) ApplyDMLEventQuery(dmlEvent *binlog.BinlogDMLEvent) error {
if err != nil {
return err
}
_, err = sqlutils.Exec(this.db, query, args...)
if err == nil {
atomic.AddInt64(&this.migrationContext.TotalDMLEventsApplied, 1)
}
if this.migrationContext.CountTableRows {
atomic.AddInt64(&this.migrationContext.RowsEstimate, rowDelta)
}
// TODO The below is in preparation for transactional writes on the ghost tables.
// Such writes would be, for example:
// - prepended with sql_mode setup
// - prepended with time zone setup
// - prepended with SET SQL_LOG_BIN=0
// - prepended with SET FK_CHECKS=0
// etc.
//
// a known problem: https://github.com/golang/go/issues/9373 -- bigint unsigned values, not supported in database/sql
// is solved by silently converting unsigned bigints to string values.
//
err = func() error {
tx, err := this.db.Begin()
if err != nil {
return err
}
if _, err := tx.Exec(`SET
SESSION time_zone = '+00:00',
sql_mode = CONCAT(@@session.sql_mode, ',STRICT_ALL_TABLES')
`); err != nil {
return err
}
if _, err := tx.Exec(query, args...); err != nil {
return err
}
if err := tx.Commit(); err != nil {
return err
}
return nil
}()
if err != nil {
err = fmt.Errorf("%s; query=%s; args=%+v", err.Error(), query, args)
return log.Errore(err)
}
// no error
atomic.AddInt64(&this.migrationContext.TotalDMLEventsApplied, 1)
if this.migrationContext.CountTableRows {
atomic.AddInt64(&this.migrationContext.RowsDeltaEstimate, rowDelta)
}
return nil
}
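// Hedged sketch of the golang/go#9373 workaround noted above: database/sql cannot
// bind uint64 values above math.MaxInt64, so such values are passed as strings and
// MySQL parses them back into BIGINT UNSIGNED. Names are illustrative; assumes
// `import ("math" ; "strconv")`.
func convertUnsignedBigintArg(arg interface{}) interface{} {
	if v, ok := arg.(uint64); ok && v > math.MaxInt64 {
		return strconv.FormatUint(v, 10)
	}
	return arg
}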

View File

@ -9,6 +9,7 @@ import (
gosql "database/sql"
"fmt"
"strings"
"sync/atomic"
"github.com/github/gh-ost/go/base"
"github.com/github/gh-ost/go/mysql"
@ -49,9 +50,6 @@ func (this *Inspector) InitDBConnections() (err error) {
if err := this.validateGrants(); err != nil {
return err
}
// if err := this.restartReplication(); err != nil {
// return err
// }
if err := this.validateBinlogs(); err != nil {
return err
}
@ -68,6 +66,9 @@ func (this *Inspector) ValidateOriginalTable() (err error) {
if err := this.validateTableForeignKeys(); err != nil {
return err
}
if err := this.validateTableTriggers(); err != nil {
return err
}
if err := this.estimateTableRowsViaExplain(); err != nil {
return err
}
@ -130,6 +131,13 @@ func (this *Inspector) InspectOriginalAndGhostTables() (err error) {
this.migrationContext.SharedColumns, this.migrationContext.MappedSharedColumns = this.getSharedColumns(this.migrationContext.OriginalTableColumns, this.migrationContext.GhostTableColumns, this.migrationContext.ColumnRenameMap)
log.Infof("Shared columns are %s", this.migrationContext.SharedColumns)
// By fact that a non-empty unique key exists we also know the shared columns are non-empty
// This additional step looks at which columns are unsigned. We could have merged this within
// the `getTableColumns()` function, but it's a later patch and introduces some complexity; I feel
// comfortable in doing this as a separate step.
this.applyUnsignedColumns(this.migrationContext.DatabaseName, this.migrationContext.OriginalTableName, this.migrationContext.OriginalTableColumns, this.migrationContext.SharedColumns)
this.applyUnsignedColumns(this.migrationContext.DatabaseName, this.migrationContext.GetGhostTableName(), this.migrationContext.GhostTableColumns, this.migrationContext.MappedSharedColumns)
return nil
}
@ -153,6 +161,7 @@ func (this *Inspector) validateGrants() error {
query := `show /* gh-ost */ grants for current_user()`
foundAll := false
foundSuper := false
foundReplicationClient := false
foundReplicationSlave := false
foundDBAll := false
@ -165,6 +174,9 @@ func (this *Inspector) validateGrants() error {
if strings.Contains(grant, `SUPER`) && strings.Contains(grant, ` ON *.*`) {
foundSuper = true
}
if strings.Contains(grant, `REPLICATION CLIENT`) && strings.Contains(grant, ` ON *.*`) {
foundReplicationClient = true
}
if strings.Contains(grant, `REPLICATION SLAVE`) && strings.Contains(grant, ` ON *.*`) {
foundReplicationSlave = true
}
@ -183,16 +195,22 @@ func (this *Inspector) validateGrants() error {
if err != nil {
return err
}
this.migrationContext.HasSuperPrivilege = foundSuper
if foundAll {
log.Infof("User has ALL privileges")
return nil
}
if foundSuper && foundReplicationSlave && foundDBAll {
log.Infof("User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on `%s`", this.migrationContext.DatabaseName)
log.Infof("User has SUPER, REPLICATION SLAVE privileges, and has ALL privileges on %s.*", sql.EscapeName(this.migrationContext.DatabaseName))
return nil
}
return log.Errorf("User has insufficient privileges for migration.")
if foundReplicationClient && foundReplicationSlave && foundDBAll {
log.Infof("User has REPLICATION CLIENT, REPLICATION SLAVE privileges, and has ALL privileges on %s.*", sql.EscapeName(this.migrationContext.DatabaseName))
return nil
}
log.Debugf("Privileges: Super: %t, REPLICATION CLIENT: %t, REPLICATION SLAVE: %t, ALL on *.*: %t, ALL on %s.*: %t", foundSuper, foundReplicationClient, foundReplicationSlave, foundAll, sql.EscapeName(this.migrationContext.DatabaseName), foundDBAll)
return log.Errorf("User has insufficient privileges for migration. Needed: SUPER, REPLICATION SLAVE and ALL on %s.*", sql.EscapeName(this.migrationContext.DatabaseName))
}
// restartReplication is required so that we are _certain_ the binlog format and
@ -225,33 +243,40 @@ func (this *Inspector) restartReplication() error {
// the replication thread apply it.
func (this *Inspector) applyBinlogFormat() error {
if this.migrationContext.RequiresBinlogFormatChange() {
if !this.migrationContext.SwitchToRowBinlogFormat {
return fmt.Errorf("Existing binlog_format is %s. Am not switching it to ROW unless you specify --switch-to-rbr", this.migrationContext.OriginalBinlogFormat)
}
if _, err := sqlutils.ExecNoPrepare(this.db, `set global binlog_format='ROW'`); err != nil {
return err
}
if _, err := sqlutils.ExecNoPrepare(this.db, `set session binlog_format='ROW'`); err != nil {
return err
}
log.Debugf("'ROW' binlog format applied")
}
if err := this.restartReplication(); err != nil {
return err
}
log.Debugf("'ROW' binlog format applied")
return nil
}
// We already have RBR, no explicit switch
if !this.migrationContext.AssumeRBR {
if err := this.restartReplication(); err != nil {
return err
}
}
return nil
}
// validateBinlogs checks that binary log configuration is good to go
func (this *Inspector) validateBinlogs() error {
query := `select @@global.log_bin, @@global.log_slave_updates, @@global.binlog_format`
var hasBinaryLogs, logSlaveUpdates bool
if err := this.db.QueryRow(query).Scan(&hasBinaryLogs, &logSlaveUpdates, &this.migrationContext.OriginalBinlogFormat); err != nil {
query := `select @@global.log_bin, @@global.binlog_format`
var hasBinaryLogs bool
if err := this.db.QueryRow(query).Scan(&hasBinaryLogs, &this.migrationContext.OriginalBinlogFormat); err != nil {
return err
}
if !hasBinaryLogs {
return fmt.Errorf("%s:%d must have binary logs enabled", this.connectionConfig.Key.Hostname, this.connectionConfig.Key.Port)
}
if !logSlaveUpdates {
return fmt.Errorf("%s:%d must have log_slave_updates enabled", this.connectionConfig.Key.Hostname, this.connectionConfig.Key.Port)
}
if this.migrationContext.RequiresBinlogFormatChange() {
if !this.migrationContext.SwitchToRowBinlogFormat {
return fmt.Errorf("You must be using ROW binlog format. I can switch it for you, provided --switch-to-rbr and that %s:%d doesn't have replicas", this.connectionConfig.Key.Hostname, this.connectionConfig.Key.Port)
@ -273,7 +298,7 @@ func (this *Inspector) validateBinlogs() error {
query = `select @@global.binlog_row_image`
if err := this.db.QueryRow(query).Scan(&this.migrationContext.OriginalBinlogRowImage); err != nil {
// Only as of 5.6. We wish to support 5.5 as well
this.migrationContext.OriginalBinlogRowImage = ""
this.migrationContext.OriginalBinlogRowImage = "FULL"
}
this.migrationContext.OriginalBinlogRowImage = strings.ToUpper(this.migrationContext.OriginalBinlogRowImage)
@ -281,6 +306,21 @@ func (this *Inspector) validateBinlogs() error {
return nil
}
// validateLogSlaveUpdates checks that log_slave_updates is set. This test is not required when migrating on replica or when migrating directly on master
func (this *Inspector) validateLogSlaveUpdates() error {
query := `select @@global.log_slave_updates`
var logSlaveUpdates bool
if err := this.db.QueryRow(query).Scan(&logSlaveUpdates); err != nil {
return err
}
if !logSlaveUpdates && !this.migrationContext.InspectorIsAlsoApplier() {
return fmt.Errorf("%s:%d must have log_slave_updates enabled", this.connectionConfig.Key.Hostname, this.connectionConfig.Key.Port)
}
log.Infof("binary logs updates validated on %s:%d", this.connectionConfig.Key.Hostname, this.connectionConfig.Key.Port)
return nil
}
// validateTable makes sure the table we need to operate on actually exists
func (this *Inspector) validateTable() error {
query := fmt.Sprintf(`show /* gh-ost */ table status from %s like '%s'`, sql.EscapeName(this.migrationContext.DatabaseName), this.migrationContext.OriginalTableName)
@ -311,7 +351,7 @@ func (this *Inspector) validateTable() error {
// validateTableForeignKeys makes sure no foreign keys exist on the migrated table
func (this *Inspector) validateTableForeignKeys() error {
query := `
SELECT COUNT(*) AS num_foreign_keys
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE
REFERENCED_TABLE_NAME IS NOT NULL
@ -321,8 +361,10 @@ func (this *Inspector) validateTableForeignKeys() error {
`
numForeignKeys := 0
err := sqlutils.QueryRowsMap(this.db, query, func(rowMap sqlutils.RowMap) error {
numForeignKeys = rowMap.GetInt("num_foreign_keys")
fkSchema := rowMap.GetString("TABLE_SCHEMA")
fkTable := rowMap.GetString("TABLE_NAME")
log.Infof("Found foreign key on %s.%s related to %s.%s", sql.EscapeName(fkSchema), sql.EscapeName(fkTable), sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.OriginalTableName))
numForeignKeys++
return nil
},
this.migrationContext.DatabaseName,
@ -334,12 +376,40 @@ func (this *Inspector) validateTableForeignKeys() error {
return err
}
if numForeignKeys > 0 {
return log.Errorf("Found %d foreign keys on %s.%s. Foreign keys are not supported. Bailing out", numForeignKeys, sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.OriginalTableName))
return log.Errorf("Found %d foreign keys related to %s.%s. Foreign keys are not supported. Bailing out", numForeignKeys, sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.OriginalTableName))
}
log.Debugf("Validated no foreign keys exist on table")
return nil
}
// validateTableTriggers makes sure no triggers exist on the migrated table
func (this *Inspector) validateTableTriggers() error {
query := `
SELECT COUNT(*) AS num_triggers
FROM INFORMATION_SCHEMA.TRIGGERS
WHERE
TRIGGER_SCHEMA=?
AND EVENT_OBJECT_TABLE=?
`
numTriggers := 0
err := sqlutils.QueryRowsMap(this.db, query, func(rowMap sqlutils.RowMap) error {
numTriggers = rowMap.GetInt("num_triggers")
return nil
},
this.migrationContext.DatabaseName,
this.migrationContext.OriginalTableName,
)
if err != nil {
return err
}
if numTriggers > 0 {
return log.Errorf("Found triggers on %s.%s. Triggers are not supported at this time. Bailing out", sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.OriginalTableName))
}
log.Debugf("Validated no triggers exist on table")
return nil
}
// estimateTableRowsViaExplain estimates number of rows on original table
func (this *Inspector) estimateTableRowsViaExplain() error {
query := fmt.Sprintf(`explain select /* gh-ost */ * from %s.%s where 1=1`, sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.OriginalTableName))
@ -364,13 +434,21 @@ func (this *Inspector) estimateTableRowsViaExplain() error {
// CountTableRows counts exact number of rows on the original table
func (this *Inspector) CountTableRows() error {
atomic.StoreInt64(&this.migrationContext.CountingRowsFlag, 1)
defer atomic.StoreInt64(&this.migrationContext.CountingRowsFlag, 0)
log.Infof("As instructed, I'm issuing a SELECT COUNT(*) on the table. This may take a while")
query := fmt.Sprintf(`select /* gh-ost */ count(*) as rows from %s.%s`, sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.OriginalTableName))
if err := this.db.QueryRow(query).Scan(&this.migrationContext.RowsEstimate); err != nil {
var rowsEstimate int64
if err := this.db.QueryRow(query).Scan(&rowsEstimate); err != nil {
return err
}
atomic.StoreInt64(&this.migrationContext.RowsEstimate, rowsEstimate)
this.migrationContext.UsedRowsEstimateMethod = base.CountRowsEstimate
log.Infof("Exact number of rows via COUNT: %d", this.migrationContext.RowsEstimate)
log.Infof("Exact number of rows via COUNT: %d", rowsEstimate)
return nil
}
@ -399,6 +477,26 @@ func (this *Inspector) getTableColumns(databaseName, tableName string) (*sql.Col
return sql.NewColumnList(columnNames), nil
}
// applyUnsignedColumns inspects the table's column types and flags unsigned columns in the given column lists
func (this *Inspector) applyUnsignedColumns(databaseName, tableName string, columnsLists ...*sql.ColumnList) error {
query := fmt.Sprintf(`
show columns from %s.%s
`,
sql.EscapeName(databaseName),
sql.EscapeName(tableName),
)
err := sqlutils.QueryRowsMap(this.db, query, func(rowMap sqlutils.RowMap) error {
columnName := rowMap.GetString("Field")
if strings.Contains(rowMap.GetString("Type"), "unsigned") {
for _, columnsList := range columnsLists {
columnsList.SetUnsigned(columnName)
}
}
return nil
})
return err
}
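// Why the unsigned flag matters (an illustrative assumption, not gh-ost code):
// a binlog row event may deliver an unsigned column's value through its signed
// counterpart, so the raw value needs reinterpretation before being applied.
func reinterpretUnsignedTinyint(v int8) uint8 {
	return uint8(v) // e.g. int8(-1) is really TINYINT UNSIGNED 255
}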
// getCandidateUniqueKeys investigates a table and returns the list of unique keys
// candidate for chunking
func (this *Inspector) getCandidateUniqueKeys(tableName string) (uniqueKeys [](*sql.UniqueKey), err error) {

View File

@ -45,7 +45,8 @@ type PrintStatusRule int
const (
HeuristicPrintStatusRule PrintStatusRule = iota
ForcePrintStatusRule = iota
ForcePrintStatusAndHint = iota
ForcePrintStatusOnlyRule = iota
ForcePrintStatusAndHintRule = iota
)
// Migrator is the main schema migration flow manager.
@ -148,17 +149,22 @@ func (this *Migrator) shouldThrottle() (result bool, reason string) {
}
}
// Replication lag throttle
maxLagMillisecondsThrottleThreshold := atomic.LoadInt64(&this.migrationContext.MaxLagMillisecondsThrottleThreshold)
lag := atomic.LoadInt64(&this.migrationContext.CurrentLag)
if time.Duration(lag) > time.Duration(this.migrationContext.MaxLagMillisecondsThrottleThreshold)*time.Millisecond {
if time.Duration(lag) > time.Duration(maxLagMillisecondsThrottleThreshold)*time.Millisecond {
return true, fmt.Sprintf("lag=%fs", time.Duration(lag).Seconds())
}
if (this.migrationContext.TestOnReplica || this.migrationContext.MigrateOnReplica) && (atomic.LoadInt64(&this.allEventsUpToLockProcessedInjectedFlag) == 0) {
replicationLag, err := mysql.GetMaxReplicationLag(this.migrationContext.InspectorConnectionConfig, this.migrationContext.ThrottleControlReplicaKeys, this.migrationContext.ReplictionLagQuery)
if err != nil {
return true, err.Error()
checkThrottleControlReplicas := true
if (this.migrationContext.TestOnReplica || this.migrationContext.MigrateOnReplica) && (atomic.LoadInt64(&this.allEventsUpToLockProcessedInjectedFlag) > 0) {
checkThrottleControlReplicas = false
}
if replicationLag > time.Duration(this.migrationContext.MaxLagMillisecondsThrottleThreshold)*time.Millisecond {
return true, fmt.Sprintf("replica-lag=%fs", replicationLag.Seconds())
if checkThrottleControlReplicas {
lagResult := mysql.GetMaxReplicationLag(this.migrationContext.InspectorConnectionConfig, this.migrationContext.GetThrottleControlReplicaKeys(), this.migrationContext.GetReplicationLagQuery())
if lagResult.Err != nil {
return true, fmt.Sprintf("%+v %+v", lagResult.Key, lagResult.Err)
}
if lagResult.Lag > time.Duration(maxLagMillisecondsThrottleThreshold)*time.Millisecond {
return true, fmt.Sprintf("%+v replica-lag=%fs", lagResult.Key, lagResult.Lag.Seconds())
}
}
@ -212,13 +218,15 @@ func (this *Migrator) initiateThrottler() error {
// calls callback functions, if any
func (this *Migrator) throttle(onThrottled func()) {
for {
// IsThrottled() is non-blocking; the throttling decision making takes place asynchronously.
// Therefore calling IsThrottled() is cheap
if shouldThrottle, _ := this.migrationContext.IsThrottled(); !shouldThrottle {
return
}
if onThrottled != nil {
onThrottled()
}
time.Sleep(time.Second)
time.Sleep(250 * time.Millisecond)
}
}
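// Illustrative usage of the hook above (hypothetical call site; a caller passes a
// callback of this shape so it can report while it is being held back):
//
//	this.throttle(func() {
//		log.Debugf("throttling row copy")
//	})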
@ -326,7 +334,7 @@ func (this *Migrator) onChangelogHeartbeat(heartbeatValue string) (err error) {
if err != nil {
return log.Errore(err)
}
lag := time.Now().Sub(heartbeatTime)
lag := time.Since(heartbeatTime)
atomic.StoreInt64(&this.migrationContext.CurrentLag, int64(lag))
@ -346,13 +354,31 @@ func (this *Migrator) validateStatement() (err error) {
if this.parser.HasNonTrivialRenames() && !this.migrationContext.SkipRenamedColumns {
this.migrationContext.ColumnRenameMap = this.parser.GetNonTrivialRenames()
if !this.migrationContext.ApproveRenamedColumns {
return fmt.Errorf("Alter statement has column(s) renamed. gh-ost suspects the following renames: %v; but to proceed you must approve via `--approve-renamed-columns` (or you can skip renamed columns via `--skip-renamed-columns`)", this.parser.GetNonTrivialRenames())
return fmt.Errorf("gh-ost believes the ALTER statement renames columns, as follows: %v; as precation, you are asked to confirm gh-ost is correct, and provide with `--approve-renamed-columns`, and we're all happy. Or you can skip renamed columns via `--skip-renamed-columns`, in which case column data may be lost", this.parser.GetNonTrivialRenames())
}
log.Infof("Alter statement has column(s) renamed. gh-ost finds the following renames: %v; --approve-renamed-columns is given and so migration proceeds.", this.parser.GetNonTrivialRenames())
}
return nil
}
func (this *Migrator) countTableRows() (err error) {
if !this.migrationContext.CountTableRows {
// Not counting; we stay with an estimate
return nil
}
if this.migrationContext.Noop {
log.Debugf("Noop operation; not really counting table rows")
return nil
}
if this.migrationContext.ConcurrentCountTableRows {
go this.inspector.CountTableRows()
log.Infof("As instructed, counting rows in the background; meanwhile I will use an estimated count, and will update it later on")
// and we ignore errors, because this runs as a background job
return nil
}
return this.inspector.CountTableRows()
}
// Migrate executes the complete migration logic. This is *the* major gh-ost function.
func (this *Migrator) Migrate() (err error) {
log.Infof("Migrating %s.%s", sql.EscapeName(this.migrationContext.DatabaseName), sql.EscapeName(this.migrationContext.OriginalTableName))
@ -379,7 +405,7 @@ func (this *Migrator) Migrate() (err error) {
return err
}
log.Debugf("Waiting for tables to be in place")
log.Infof("Waiting for tables to be in place")
<-this.tablesInPlace
log.Debugf("Tables are in place")
// Yay! We now know the Ghost and Changelog tables are good to examine!
@ -389,17 +415,15 @@ func (this *Migrator) Migrate() (err error) {
if err := this.inspector.InspectOriginalAndGhostTables(); err != nil {
return err
}
if this.migrationContext.CountTableRows {
if this.migrationContext.Noop {
log.Debugf("Noop operation; not really counting table rows")
} else if err := this.inspector.CountTableRows(); err != nil {
return err
}
}
if err := this.initiateServer(); err != nil {
return err
}
defer this.server.RemoveSocketFile()
if err := this.countTableRows(); err != nil {
return err
}
if err := this.addDMLEventsListener(); err != nil {
return err
}
@ -411,7 +435,7 @@ func (this *Migrator) Migrate() (err error) {
go this.initiateThrottler()
go this.executeWriteFuncs()
go this.iterateChunks()
this.migrationContext.RowCopyStartTime = time.Now()
this.migrationContext.MarkRowCopyStartTime()
go this.initiateStatus()
log.Debugf("Operating until row copy is complete")
@ -468,12 +492,18 @@ func (this *Migrator) cutOver() (err error) {
// the same cut-over phase as the master would use. That means we take locks
// and swap the tables.
// The difference is that we will later swap the tables back.
if this.migrationContext.TestOnReplicaSkipReplicaStop {
log.Warningf("--test-on-replica-skip-replica-stop enabled, we are not stopping replication.")
} else {
log.Debugf("testing on replica. Stopping replication IO thread")
if err := this.retryOperation(this.applier.StopReplication); err != nil {
return err
}
}
// We're merely testing; we don't want to keep this state. Roll back the renames where possible
defer this.applier.RenameTablesRollback()
// We further proceed to do the cut-over by normal means; the 'defer' above will roll back the swap
}
if this.migrationContext.CutOverType == base.CutOverAtomic {
// Atomic solution: we use low timeout and multiple attempts. But for
@ -485,16 +515,6 @@ func (this *Migrator) cutOver() (err error) {
)
return err
}
if this.migrationContext.CutOverType == base.CutOverSafe {
// Lock-based solution: we use low timeout and multiple attempts. But for
// each failed attempt, we throttle until replication lag is back to normal
err := this.retryOperation(
func() error {
return this.executeAndThrottleOnError(this.safeCutOver)
},
)
return err
}
if this.migrationContext.CutOverType == base.CutOverTwoStep {
err := this.retryOperation(
func() error {
@ -519,10 +539,10 @@ func (this *Migrator) waitForEventsUpToLock() (err error) {
log.Infof("Waiting for events up to lock")
atomic.StoreInt64(&this.allEventsUpToLockProcessedInjectedFlag, 1)
<-this.allEventsUpToLockProcessed
waitForEventsUpToLockDuration := time.Now().Sub(waitForEventsUpToLockStartTime)
waitForEventsUpToLockDuration := time.Since(waitForEventsUpToLockStartTime)
log.Infof("Done waiting for events up to lock; duration=%+v", waitForEventsUpToLockDuration)
this.printStatus(ForcePrintStatusAndHint)
this.printStatus(ForcePrintStatusAndHintRule)
return nil
}
@ -534,6 +554,7 @@ func (this *Migrator) waitForEventsUpToLock() (err error) {
func (this *Migrator) cutOverTwoStep() (err error) {
atomic.StoreInt64(&this.inCutOverCriticalActionFlag, 1)
defer atomic.StoreInt64(&this.inCutOverCriticalActionFlag, 0)
atomic.StoreInt64(&this.allEventsUpToLockProcessedInjectedFlag, 0)
if err := this.retryOperation(this.applier.LockOriginalTable); err != nil {
return err
@ -564,6 +585,8 @@ func (this *Migrator) atomicCutOver() (err error) {
this.applier.DropAtomicCutOverSentryTableIfExists()
}()
atomic.StoreInt64(&this.allEventsUpToLockProcessedInjectedFlag, 0)
lockOriginalSessionIdChan := make(chan int64, 2)
tableLocked := make(chan error, 2)
okToUnlockTable := make(chan bool, 3)
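// The channels above are deliberately buffered so that a writer never blocks on a
// reader that has already bailed out on error; there is enough buffer to support
// redundant writes (a descriptive note; buffer sizes match the potential writers).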
@ -642,137 +665,6 @@ func (this *Migrator) atomicCutOver() (err error) {
return nil
}
// cutOverSafe performs a safe cut over, where normally (no failure) the original table
// is being locked until swapped, hence DML queries being locked and unaware of the cut-over.
// In the worst case, there will be a minor outage, where the original table would not exist.
func (this *Migrator) safeCutOver() (err error) {
atomic.StoreInt64(&this.inCutOverCriticalActionFlag, 1)
defer atomic.StoreInt64(&this.inCutOverCriticalActionFlag, 0)
okToUnlockTable := make(chan bool, 2)
originalTableRenamed := make(chan error, 1)
var originalTableRenameIntended int64
defer func() {
log.Infof("Checking to see if we need to roll back")
// The following is to make sure we unlock the table no-matter-what!
// There's enough buffer in the channel to support a redundant write here.
okToUnlockTable <- true
if atomic.LoadInt64(&originalTableRenameIntended) == 1 {
log.Infof("Waiting for original table rename result")
// We need to make sure we wait for the original-rename, successful or not,
// so as to be able to rollback in case the ghost-rename fails.
// But we only wait on this queue if there's actually going to be a rename.
// As an example, what happens should the initial `lock tables` fail? We would
// never proceed to rename the table, hence this queue is never written to.
<-originalTableRenamed
}
// Rollback operation
if !this.applier.tableExists(this.migrationContext.OriginalTableName) {
log.Infof("Cannot find %s, rolling back", this.migrationContext.OriginalTableName)
err := this.applier.RenameTable(this.migrationContext.GetOldTableName(), this.migrationContext.OriginalTableName)
log.Errore(err)
} else {
log.Info("No need for rollback")
}
}()
lockOriginalSessionIdChan := make(chan int64, 1)
tableLocked := make(chan error, 1)
tableUnlocked := make(chan error, 1)
go func() {
if err := this.applier.LockOriginalTableAndWait(lockOriginalSessionIdChan, tableLocked, okToUnlockTable, tableUnlocked); err != nil {
log.Errore(err)
}
}()
if err := <-tableLocked; err != nil {
return log.Errore(err)
}
lockOriginalSessionId := <-lockOriginalSessionIdChan
log.Infof("Session locking original table is %+v", lockOriginalSessionId)
// At this point we know the table is locked.
// We know any newly incoming DML on original table is blocked.
this.waitForEventsUpToLock()
// Step 2
// We now attempt a RENAME on the original table, and expect it to block
renameOriginalSessionIdChan := make(chan int64, 1)
this.migrationContext.RenameTablesStartTime = time.Now()
atomic.StoreInt64(&originalTableRenameIntended, 1)
go func() {
this.applier.RenameOriginalTable(renameOriginalSessionIdChan, originalTableRenamed)
}()
renameOriginalSessionId := <-renameOriginalSessionIdChan
log.Infof("Session renaming original table is %+v", renameOriginalSessionId)
if err := this.retryOperation(
func() error {
return this.applier.ExpectProcess(renameOriginalSessionId, "metadata lock", "rename")
}); err != nil {
return err
}
log.Infof("Found RENAME on original table to be blocking, as expected. Double checking original is still being locked")
if err := this.applier.ExpectUsedLock(lockOriginalSessionId); err != nil {
// Abort operation; but make sure to unlock table!
return log.Errore(err)
}
log.Infof("Connection holding lock on original table still exists")
// Now that we've found the RENAME blocking, AND the locking connection still alive,
// we know it is safe to proceed to renaming ghost table.
// Step 3
// We now attempt a RENAME on the ghost table, and expect it to block
renameGhostSessionIdChan := make(chan int64, 1)
ghostTableRenamed := make(chan error, 1)
go func() {
this.applier.RenameGhostTable(renameGhostSessionIdChan, ghostTableRenamed)
}()
renameGhostSessionId := <-renameGhostSessionIdChan
log.Infof("Session renaming ghost table is %+v", renameGhostSessionId)
if err := this.retryOperation(
func() error {
return this.applier.ExpectProcess(renameGhostSessionId, "metadata lock", "rename")
}); err != nil {
return err
}
log.Infof("Found RENAME on ghost table to be blocking, as expected. Will next release lock on original table")
// Step 4
okToUnlockTable <- true
// BAM! original table lock is released, RENAME original->old released,
// RENAME ghost->original is released, queries on original are unblocked.
// (that is, assuming all went well)
if err := <-tableUnlocked; err != nil {
return log.Errore(err)
}
if err := <-ghostTableRenamed; err != nil {
return log.Errore(err)
}
this.migrationContext.RenameTablesEndTime = time.Now()
// ooh nice! We're actually truly and thankfully done
lockAndRenameDuration := this.migrationContext.RenameTablesEndTime.Sub(this.migrationContext.LockTablesStartTime)
log.Infof("Lock & rename duration: %s. During this time, queries on %s were blocked", lockAndRenameDuration, sql.EscapeName(this.migrationContext.OriginalTableName))
return nil
}
// stopWritesAndCompleteMigrationOnReplica will stop replication IO thread, apply
// what DML events are left, and that's it.
// This only applies in --test-on-replica. It leaves replication stopped, with both tables
// in sync. There is no table swap.
func (this *Migrator) stopWritesAndCompleteMigrationOnReplica() (err error) {
log.Debugf("testing on replica. Instead of LOCK tables I will STOP SLAVE")
if err := this.retryOperation(this.applier.StopReplication); err != nil {
return err
}
this.waitForEventsUpToLock()
log.Info("Table duplicated with new schema. Am not touching the original table. Replication is stopped. You may now compare the two tables to gain trust into this tool's operation")
return nil
}
// onServerCommand responds to a user's interactive command
func (this *Migrator) onServerCommand(command string, writer *bufio.Writer) (err error) {
defer writer.Flush()
@ -784,17 +676,22 @@ func (this *Migrator) onServerCommand(command string, writer *bufio.Writer) (err
arg = strings.TrimSpace(tokens[1])
}
throttleHint := "# Note: you may only throttle for as long as your binary logs are not purged\n"
switch command {
case "help":
{
fmt.Fprintln(writer, `available commands:
status # Print a status message
status # Print a detailed status message
sup # Print a short status message
chunk-size=<newsize> # Set a new chunk-size
nice-ratio=<ratio> # Set a new nice-ratio, integer (0 is agrressive)
nice-ratio=<ratio> # Set a new nice-ratio, immediate sleep after each row-copy operation, float (examples: 0 is agrressive, 0.7 adds 70% runtime, 1.0 doubles runtime, 2.0 triples runtime, ...)
critical-load=<load> # Set a new set of max-load thresholds
max-lag-millis=<max-lag> # Set a new replication lag threshold
replication-lag-query=<query> # Set a new query that determines replication lag (no quotes)
max-load=<load> # Set a new set of max-load thresholds
throttle-query=<query> # Set a new throttle-query
throttle-control-replicas=<replicas> #
throttle-query=<query> # Set a new throttle-query (no quotes)
throttle-control-replicas=<replicas> # Set a new comma delimited list of throttle control replicas
throttle # Force throttling
no-throttle # End forced throttling (other throttling may still apply)
unpostpone # Bail out a cut-over postpone; proceed to cut-over
@ -802,8 +699,10 @@ panic # panic and quit without cleanup
help # This message
`)
}
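// Usage note (illustrative): these commands are sent over the serve socket or TCP
// port, e.g. `echo "nice-ratio=0.5" | nc -U /tmp/gh-ost.mydb.mytable.sock`;
// the socket path shown is hypothetical, gh-ost advertises the real one on startup.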
case "sup":
this.printStatus(ForcePrintStatusOnlyRule, writer)
case "info", "status":
this.printStatus(ForcePrintStatusAndHint, writer)
this.printStatus(ForcePrintStatusAndHintRule, writer)
case "chunk-size":
{
if chunkSize, err := strconv.Atoi(arg); err != nil {
@ -811,17 +710,32 @@ help # This message
return log.Errore(err)
} else {
this.migrationContext.SetChunkSize(int64(chunkSize))
this.printStatus(ForcePrintStatusAndHint, writer)
this.printStatus(ForcePrintStatusAndHintRule, writer)
}
}
case "nice-ratio":
case "max-lag-millis":
{
if niceRatio, err := strconv.Atoi(arg); err != nil {
if maxLagMillis, err := strconv.Atoi(arg); err != nil {
fmt.Fprintf(writer, "%s\n", err.Error())
return log.Errore(err)
} else {
atomic.StoreInt64(&this.migrationContext.NiceRatio, int64(niceRatio))
this.printStatus(ForcePrintStatusAndHint, writer)
this.migrationContext.SetMaxLagMillisecondsThrottleThreshold(int64(maxLagMillis))
this.printStatus(ForcePrintStatusAndHintRule, writer)
}
}
case "replication-lag-query":
{
this.migrationContext.SetReplicationLagQuery(arg)
this.printStatus(ForcePrintStatusAndHintRule, writer)
}
case "nice-ratio":
{
if niceRatio, err := strconv.ParseFloat(arg, 64); err != nil {
fmt.Fprintf(writer, "%s\n", err.Error())
return log.Errore(err)
} else {
this.migrationContext.SetNiceRatio(niceRatio)
this.printStatus(ForcePrintStatusAndHintRule, writer)
}
}
case "max-load":
@ -830,7 +744,7 @@ help # This message
fmt.Fprintf(writer, "%s\n", err.Error())
return log.Errore(err)
}
this.printStatus(ForcePrintStatusAndHint, writer)
this.printStatus(ForcePrintStatusAndHintRule, writer)
}
case "critical-load":
{
@ -838,12 +752,13 @@ help # This message
fmt.Fprintf(writer, "%s\n", err.Error())
return log.Errore(err)
}
this.printStatus(ForcePrintStatusAndHint, writer)
this.printStatus(ForcePrintStatusAndHintRule, writer)
}
case "throttle-query":
{
this.migrationContext.SetThrottleQuery(arg)
this.printStatus(ForcePrintStatusAndHint, writer)
fmt.Fprintf(writer, throttleHint)
this.printStatus(ForcePrintStatusAndHintRule, writer)
}
case "throttle-control-replicas":
{
@ -852,11 +767,13 @@ help # This message
return log.Errore(err)
}
fmt.Fprintf(writer, "%s\n", this.migrationContext.GetThrottleControlReplicaKeys().ToCommaDelimitedList())
this.printStatus(ForcePrintStatusAndHint, writer)
this.printStatus(ForcePrintStatusAndHintRule, writer)
}
case "throttle", "pause", "suspend":
{
atomic.StoreInt64(&this.migrationContext.ThrottleCommandedByUser, 1)
fmt.Fprintf(writer, throttleHint)
this.printStatus(ForcePrintStatusAndHintRule, writer)
}
case "no-throttle", "unthrottle", "resume", "continue":
{
@ -930,12 +847,15 @@ func (this *Migrator) initiateInspector() (err error) {
*this.migrationContext.ApplierConnectionConfig.ImpliedKey, *this.migrationContext.InspectorConnectionConfig.ImpliedKey,
)
this.migrationContext.ApplierConnectionConfig = this.migrationContext.InspectorConnectionConfig.Duplicate()
if this.migrationContext.ThrottleControlReplicaKeys.Len() == 0 {
this.migrationContext.ThrottleControlReplicaKeys.AddKey(this.migrationContext.InspectorConnectionConfig.Key)
if this.migrationContext.GetThrottleControlReplicaKeys().Len() == 0 {
this.migrationContext.AddThrottleControlReplicaKey(this.migrationContext.InspectorConnectionConfig.Key)
}
} else if this.migrationContext.InspectorIsAlsoApplier() && !this.migrationContext.AllowedRunningOnMaster {
return fmt.Errorf("It seems like this migration attempt to run directly on master. Preferably it would be executed on a replica (and this reduces load from the master). To proceed please provide --allow-on-master")
}
if err := this.inspector.validateLogSlaveUpdates(); err != nil {
return err
}
log.Infof("Master found to be %+v", *this.migrationContext.ApplierConnectionConfig.ImpliedKey)
return nil
@ -943,7 +863,7 @@ func (this *Migrator) initiateInspector() (err error) {
// initiateStatus sets and activates the printStatus() ticker
func (this *Migrator) initiateStatus() error {
this.printStatus(ForcePrintStatusAndHint)
this.printStatus(ForcePrintStatusAndHintRule)
statusTick := time.Tick(1 * time.Second)
for range statusTick {
go this.printStatus(HeuristicPrintStatusRule)
@ -974,35 +894,52 @@ func (this *Migrator) printMigrationStatusHint(writers ...io.Writer) {
))
maxLoad := this.migrationContext.GetMaxLoad()
criticalLoad := this.migrationContext.GetCriticalLoad()
fmt.Fprintln(w, fmt.Sprintf("# chunk-size: %+v; max lag: %+vms; max-load: %s; critical-load: %s; nice-ratio: %d",
fmt.Fprintln(w, fmt.Sprintf("# chunk-size: %+v; max-lag-millis: %+vms; max-load: %s; critical-load: %s; nice-ratio: %f",
atomic.LoadInt64(&this.migrationContext.ChunkSize),
atomic.LoadInt64(&this.migrationContext.MaxLagMillisecondsThrottleThreshold),
maxLoad.String(),
criticalLoad.String(),
atomic.LoadInt64(&this.migrationContext.NiceRatio),
this.migrationContext.GetNiceRatio(),
))
if replicationLagQuery := this.migrationContext.GetReplicationLagQuery(); replicationLagQuery != "" {
fmt.Fprintln(w, fmt.Sprintf("# replication-lag-query: %+v",
replicationLagQuery,
))
}
if this.migrationContext.ThrottleFlagFile != "" {
fmt.Fprintln(w, fmt.Sprintf("# Throttle flag file: %+v",
this.migrationContext.ThrottleFlagFile,
setIndicator := ""
if base.FileExists(this.migrationContext.ThrottleFlagFile) {
setIndicator = "[set]"
}
fmt.Fprintln(w, fmt.Sprintf("# throttle-flag-file: %+v %+v",
this.migrationContext.ThrottleFlagFile, setIndicator,
))
}
if this.migrationContext.ThrottleAdditionalFlagFile != "" {
fmt.Fprintln(w, fmt.Sprintf("# Throttle additional flag file: %+v",
this.migrationContext.ThrottleAdditionalFlagFile,
setIndicator := ""
if base.FileExists(this.migrationContext.ThrottleAdditionalFlagFile) {
setIndicator = "[set]"
}
fmt.Fprintln(w, fmt.Sprintf("# throttle-additional-flag-file: %+v %+v",
this.migrationContext.ThrottleAdditionalFlagFile, setIndicator,
))
}
if throttleQuery := this.migrationContext.GetThrottleQuery(); throttleQuery != "" {
fmt.Fprintln(w, fmt.Sprintf("# Throttle query: %+v",
fmt.Fprintln(w, fmt.Sprintf("# throttle-query: %+v",
throttleQuery,
))
}
if this.migrationContext.PostponeCutOverFlagFile != "" {
fmt.Fprintln(w, fmt.Sprintf("# Postpone cut-over flag file: %+v",
this.migrationContext.PostponeCutOverFlagFile,
setIndicator := ""
if base.FileExists(this.migrationContext.PostponeCutOverFlagFile) {
setIndicator = "[set]"
}
fmt.Fprintln(w, fmt.Sprintf("# postpone-cut-over-flag-file: %+v %+v",
this.migrationContext.PostponeCutOverFlagFile, setIndicator,
))
}
if this.migrationContext.PanicFlagFile != "" {
fmt.Fprintln(w, fmt.Sprintf("# Panic flag file: %+v",
fmt.Fprintln(w, fmt.Sprintf("# panic-flag-file: %+v",
this.migrationContext.PanicFlagFile,
))
}
@ -1025,24 +962,31 @@ func (this *Migrator) printStatus(rule PrintStatusRule, writers ...io.Writer) {
elapsedTime := this.migrationContext.ElapsedTime()
elapsedSeconds := int64(elapsedTime.Seconds())
totalRowsCopied := this.migrationContext.GetTotalRowsCopied()
rowsEstimate := atomic.LoadInt64(&this.migrationContext.RowsEstimate)
rowsEstimate := atomic.LoadInt64(&this.migrationContext.RowsEstimate) + atomic.LoadInt64(&this.migrationContext.RowsDeltaEstimate)
var progressPct float64
if rowsEstimate > 0 {
if rowsEstimate == 0 {
progressPct = 100.0
} else {
progressPct = 100.0 * float64(totalRowsCopied) / float64(rowsEstimate)
}
// Before status, let's see if we should print a nice reminder for what exactly we're doing here.
shouldPrintMigrationStatusHint := (elapsedSeconds%600 == 0)
if rule == ForcePrintStatusAndHint {
if rule == ForcePrintStatusAndHintRule {
shouldPrintMigrationStatusHint = true
}
if rule == ForcePrintStatusOnlyRule {
shouldPrintMigrationStatusHint = false
}
if shouldPrintMigrationStatusHint {
this.printMigrationStatusHint(writers...)
}
var etaSeconds float64 = math.MaxFloat64
eta := "N/A"
if atomic.LoadInt64(&this.migrationContext.IsPostponingCutOver) > 0 {
if atomic.LoadInt64(&this.migrationContext.CountingRowsFlag) > 0 && !this.migrationContext.ConcurrentCountTableRows {
eta = "counting rows"
} else if atomic.LoadInt64(&this.migrationContext.IsPostponingCutOver) > 0 {
eta = "postponing cut-over"
} else if isThrottled, throttleReason := this.migrationContext.IsThrottled(); isThrottled {
eta = fmt.Sprintf("throttled, %s", throttleReason)
@ -1061,6 +1005,7 @@ func (this *Migrator) printStatus(rule PrintStatusRule, writers ...io.Writer) {
}
shouldPrintStatus := false
if rule == HeuristicPrintStatusRule {
if elapsedSeconds <= 60 {
shouldPrintStatus = true
} else if etaSeconds <= 60 {
@ -1074,7 +1019,8 @@ func (this *Migrator) printStatus(rule PrintStatusRule, writers ...io.Writer) {
} else {
shouldPrintStatus = (elapsedSeconds%30 == 0)
}
if rule == ForcePrintStatusRule || rule == ForcePrintStatusAndHint {
} else {
// Not heuristic
shouldPrintStatus = true
}
if !shouldPrintStatus {
@ -1219,6 +1165,10 @@ func (this *Migrator) iterateChunks() error {
return nil
}
copyRowsFunc := func() error {
if atomic.LoadInt64(&this.rowCopyCompleteFlag) == 1 {
// Done
return nil
}
hasFurtherRange, err := this.applier.CalculateNextIterationRangeEndValues()
if err != nil {
return terminateRowIteration(err)
@ -1281,9 +1231,10 @@ func (this *Migrator) executeWriteFuncs() error {
if err := copyRowsFunc(); err != nil {
return log.Errore(err)
}
if niceRatio := atomic.LoadInt64(&this.migrationContext.NiceRatio); niceRatio > 0 {
copyRowsDuration := time.Now().Sub(copyRowsStartTime)
sleepTime := copyRowsDuration * time.Duration(niceRatio)
if niceRatio := this.migrationContext.GetNiceRatio(); niceRatio > 0 {
copyRowsDuration := time.Since(copyRowsStartTime)
sleepTimeNanosecondFloat64 := niceRatio * float64(copyRowsDuration.Nanoseconds())
sleepTime := time.Duration(int64(sleepTimeNanosecondFloat64)) * time.Nanosecond
time.Sleep(sleepTime)
}
}
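
The nice-ratio hunk above is why the context now stores a float: after every row-copy burst, gh-ost sleeps for niceRatio times the duration the burst took. A minimal standalone sketch of that arithmetic (the 100ms sleep is a stand-in for copying one chunk of rows):

package main

import (
	"fmt"
	"time"
)

func main() {
	niceRatio := 0.5 // sleep half as long as each copy burst took
	copyRowsStartTime := time.Now()
	time.Sleep(100 * time.Millisecond) // stand-in for copying one chunk of rows
	copyRowsDuration := time.Since(copyRowsStartTime)
	sleepTime := time.Duration(niceRatio * float64(copyRowsDuration.Nanoseconds()))
	fmt.Printf("copied for %v, now sleeping %v\n", copyRowsDuration, sleepTime)
	time.Sleep(sleepTime)
}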

View File

@ -36,7 +36,7 @@ func (this *Server) BindSocketFile() (err error) {
if this.migrationContext.ServeSocketFile == "" {
return nil
}
if base.FileExists(this.migrationContext.ServeSocketFile) {
if this.migrationContext.DropServeSocket && base.FileExists(this.migrationContext.ServeSocketFile) {
os.Remove(this.migrationContext.ServeSocketFile)
}
this.unixListener, err = net.Listen("unix", this.migrationContext.ServeSocketFile)
@ -47,6 +47,11 @@ func (this *Server) BindSocketFile() (err error) {
return nil
}
func (this *Server) RemoveSocketFile() (err error) {
log.Infof("Removing socket file: %s", this.migrationContext.ServeSocketFile)
return os.Remove(this.migrationContext.ServeSocketFile)
}
func (this *Server) BindTCPPort() (err error) {
if this.migrationContext.ServeTCPPort == 0 {
return nil
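
The socket bound above is the transport for the interactive commands listed earlier. A hedged sketch of a client, assuming the migration was started with --serve-socket-file=/tmp/gh-ost.test.sock (the path localtests/test.sh uses):

package main

import (
	"fmt"
	"io"
	"net"
	"os"
)

func main() {
	conn, err := net.Dial("unix", "/tmp/gh-ost.test.sock")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer conn.Close()
	// Any command from the help text works here, e.g. "throttle" or "status".
	fmt.Fprintln(conn, "max-lag-millis=1500")
	io.Copy(os.Stdout, conn) // gh-ost answers with its status printout
}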

View File

@ -189,18 +189,27 @@ func (this *EventsStreamer) StreamEvents(canStopStreaming func() bool) error {
}
}()
// The next should block and execute forever, unless there's a serious error
var successiveFailures int64
var lastAppliedRowsEventHint mysql.BinlogCoordinates
for {
if err := this.binlogReader.StreamEvents(canStopStreaming, this.eventsChannel); err != nil {
log.Infof("StreamEvents encountered unexpected error: %+v", err)
this.migrationContext.MarkPointOfInterest()
time.Sleep(ReconnectStreamerSleepSeconds * time.Second)
// Reposition at same binlog file. Single attempt (TODO: make multiple attempts?)
lastAppliedRowsEventHint := this.binlogReader.LastAppliedRowsEventHint
// See if there's retry overflow
if this.binlogReader.LastAppliedRowsEventHint.Equals(&lastAppliedRowsEventHint) {
successiveFailures += 1
} else {
successiveFailures = 0
}
if successiveFailures > this.migrationContext.MaxRetries() {
return fmt.Errorf("%d successive failures in streamer reconnect at coordinates %+v", lastAppliedRowsEventHint)
}
// Reposition at same binlog file.
lastAppliedRowsEventHint = this.binlogReader.LastAppliedRowsEventHint
log.Infof("Reconnecting... Will resume at %+v", lastAppliedRowsEventHint)
// if err := this.binlogReader.Reconnect(); err != nil {
// return err
// }
if err := this.initBinlogReader(this.GetReconnectBinlogCoordinates()); err != nil {
return err
}
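
The reconnect loop above only gives up after MaxRetries successive failures at the same coordinates; any forward progress resets the budget. The same pattern in isolation (next and its fixed position are stand-ins for the binlog reader):

package main

import (
	"errors"
	"fmt"
)

func streamWithRetries(next func() (position int64, err error), maxRetries int64) error {
	var successiveFailures int64
	lastPosition := int64(-1)
	for {
		position, err := next()
		if err == nil {
			return nil // clean shutdown
		}
		if position == lastPosition {
			successiveFailures++ // stuck: no progress since the previous failure
		} else {
			successiveFailures = 0 // we advanced; reset the failure budget
		}
		if successiveFailures > maxRetries {
			return fmt.Errorf("%d successive failures at position %d", successiveFailures, position)
		}
		lastPosition = position
	}
}

func main() {
	err := streamWithRetries(func() (int64, error) {
		return 42, errors.New("stream broken") // always fails at the same spot
	}, 3)
	fmt.Println(err)
}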

View File

@ -7,6 +7,7 @@ package mysql
import (
"fmt"
"net"
)
// ConnectionConfig is the minimal configuration required to connect to a MySQL server
@ -47,5 +48,11 @@ func (this *ConnectionConfig) Equals(other *ConnectionConfig) bool {
}
func (this *ConnectionConfig) GetDBUri(databaseName string) string {
var ip = net.ParseIP(this.Key.Hostname)
if (ip != nil) && (ip.To4() == nil) {
// Wrap IPv6 literals in square brackets
return fmt.Sprintf("%s:%s@tcp([%s]:%d)/%s", this.User, this.Password, this.Key.Hostname, this.Key.Port, databaseName)
} else {
return fmt.Sprintf("%s:%s@tcp(%s:%d)/%s", this.User, this.Password, this.Key.Hostname, this.Key.Port, databaseName)
}
}
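
The change above brackets IPv6 literals, as the Go MySQL driver's DSN format requires. A sketch of the resulting URIs (user, password and database are placeholders):

package main

import (
	"fmt"
	"net"
)

func dbUri(hostname string, port int) string {
	if ip := net.ParseIP(hostname); ip != nil && ip.To4() == nil {
		return fmt.Sprintf("user:pass@tcp([%s]:%d)/test", hostname, port) // IPv6 literal
	}
	return fmt.Sprintf("user:pass@tcp(%s:%d)/test", hostname, port) // IPv4 or hostname
}

func main() {
	fmt.Println(dbUri("127.0.0.1", 3306)) // user:pass@tcp(127.0.0.1:3306)/test
	fmt.Println(dbUri("::1", 3306))       // user:pass@tcp([::1]:3306)/test
}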

View File

@ -14,6 +14,12 @@ import (
"github.com/outbrain/golib/sqlutils"
)
type ReplicationLagResult struct {
Key InstanceKey
Lag time.Duration
Err error
}
// GetReplicationLag returns replication lag for a given connection config; either by explicit query
// or via SHOW SLAVE STATUS
func GetReplicationLag(connectionConfig *ConnectionConfig, replicationLagQuery string) (replicationLag time.Duration, err error) {
@ -32,7 +38,7 @@ func GetReplicationLag(connectionConfig *ConnectionConfig, replicationLagQuery s
err = sqlutils.QueryRowsMap(db, `show slave status`, func(m sqlutils.RowMap) error {
secondsBehindMaster := m.GetNullInt64("Seconds_Behind_Master")
if !secondsBehindMaster.Valid {
return fmt.Errorf("Replication not running on %+v", connectionConfig.Key)
return fmt.Errorf("replication not running")
}
replicationLag = time.Duration(secondsBehindMaster.Int64) * time.Second
return nil
@ -42,30 +48,30 @@ func GetReplicationLag(connectionConfig *ConnectionConfig, replicationLagQuery s
// GetMaxReplicationLag concurrently checks for replication lag on given list of instance keys,
// each via GetReplicationLag
func GetMaxReplicationLag(baseConnectionConfig *ConnectionConfig, instanceKeyMap *InstanceKeyMap, replicationLagQuery string) (replicationLag time.Duration, err error) {
func GetMaxReplicationLag(baseConnectionConfig *ConnectionConfig, instanceKeyMap *InstanceKeyMap, replicationLagQuery string) (result *ReplicationLagResult) {
result = &ReplicationLagResult{Lag: 0}
if instanceKeyMap.Len() == 0 {
return 0, nil
return result
}
lagsChan := make(chan time.Duration, instanceKeyMap.Len())
errorsChan := make(chan error, instanceKeyMap.Len())
lagResults := make(chan *ReplicationLagResult, instanceKeyMap.Len())
for key := range *instanceKeyMap {
connectionConfig := baseConnectionConfig.Duplicate()
connectionConfig.Key = key
result := &ReplicationLagResult{Key: connectionConfig.Key}
go func() {
lag, err := GetReplicationLag(connectionConfig, replicationLagQuery)
lagsChan <- lag
errorsChan <- err
result.Lag, result.Err = GetReplicationLag(connectionConfig, replicationLagQuery)
lagResults <- result
}()
}
for range *instanceKeyMap {
if lagError := <-errorsChan; lagError != nil {
err = lagError
}
if lag := <-lagsChan; lag.Nanoseconds() > replicationLag.Nanoseconds() {
replicationLag = lag
lagResult := <-lagResults
if lagResult.Err != nil {
result = lagResult
} else if lagResult.Lag.Nanoseconds() > result.Lag.Nanoseconds() {
result = lagResult
}
}
return replicationLag, err
return result
}
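// A condensed illustration of the fan-out/fan-in shape above (hypothetical
// helper, not part of the original file); probe stands in for GetReplicationLag.
// The buffered channel means no goroutine ever blocks on send, and the final
// loop keeps either an error result or the largest lag seen.
func maxLagSketch(hosts []string, probe func(host string) (time.Duration, error)) (maxLag time.Duration, err error) {
	type hostLag struct {
		lag time.Duration
		err error
	}
	results := make(chan hostLag, len(hosts))
	for _, host := range hosts {
		host := host
		go func() {
			lag, probeErr := probe(host)
			results <- hostLag{lag: lag, err: probeErr}
		}()
	}
	for range hosts {
		result := <-results
		if result.err != nil {
			err = result.err
		} else if result.lag > maxLag {
			maxLag = result.lag
		}
	}
	return maxLag, err
}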
func GetMasterKeyFromSlaveStatus(connectionConfig *ConnectionConfig) (masterKey *InstanceKey, err error) {

View File

@ -32,6 +32,29 @@ func EscapeName(name string) string {
return fmt.Sprintf("`%s`", name)
}
func fixArgType(arg interface{}, isUnsigned bool) interface{} {
if !isUnsigned {
return arg
}
// unsigned
if i, ok := arg.(int8); ok {
return uint8(i)
}
if i, ok := arg.(int16); ok {
return uint16(i)
}
if i, ok := arg.(int32); ok {
return uint32(i)
}
if i, ok := arg.(int64); ok {
return strconv.FormatUint(uint64(i), 10)
}
if i, ok := arg.(int); ok {
return uint(i)
}
return arg
}
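// A hypothetical illustration, not part of the original file: fixArgType leans
// on Go's two's-complement conversions, so a signed value read from the binlog
// round-trips to its unsigned MySQL counterpart. int64 is special-cased into a
// decimal string, likely because unsigned values above math.MaxInt64 cannot be
// represented in the driver's signed int64.
func demoFixArgType() {
	fmt.Println(fixArgType(int8(-1), true))  // uint8(255)
	fmt.Println(fixArgType(int32(-1), true)) // uint32(4294967295)
	fmt.Println(fixArgType(int64(-1), true)) // "18446744073709551615"
	fmt.Println(fixArgType(int16(7), false)) // signed columns pass through as-is
}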
func buildPreparedValues(length int) []string {
values := make([]string, length, length)
for i := 0; i < length; i++ {
@ -309,7 +332,8 @@ func BuildDMLDeleteQuery(databaseName, tableName string, tableColumns, uniqueKey
}
for _, column := range uniqueKeyColumns.Names {
tableOrdinal := tableColumns.Ordinals[column]
uniqueKeyArgs = append(uniqueKeyArgs, args[tableOrdinal])
arg := fixArgType(args[tableOrdinal], uniqueKeyColumns.IsUnsigned(column))
uniqueKeyArgs = append(uniqueKeyArgs, arg)
}
databaseName = EscapeName(databaseName)
tableName = EscapeName(tableName)
@ -330,7 +354,7 @@ func BuildDMLDeleteQuery(databaseName, tableName string, tableColumns, uniqueKey
return result, uniqueKeyArgs, nil
}
func BuildDMLInsertQuery(databaseName, tableName string, tableColumns, sharedColumns *ColumnList, args []interface{}) (result string, sharedArgs []interface{}, err error) {
func BuildDMLInsertQuery(databaseName, tableName string, tableColumns, sharedColumns, mappedSharedColumns *ColumnList, args []interface{}) (result string, sharedArgs []interface{}, err error) {
if len(args) != tableColumns.Len() {
return result, args, fmt.Errorf("args count differs from table column count in BuildDMLInsertQuery")
}
@ -343,16 +367,17 @@ func BuildDMLInsertQuery(databaseName, tableName string, tableColumns, sharedCol
databaseName = EscapeName(databaseName)
tableName = EscapeName(tableName)
for _, column := range sharedColumns.Names {
for _, column := range mappedSharedColumns.Names {
tableOrdinal := tableColumns.Ordinals[column]
sharedArgs = append(sharedArgs, args[tableOrdinal])
arg := fixArgType(args[tableOrdinal], mappedSharedColumns.IsUnsigned(column))
sharedArgs = append(sharedArgs, arg)
}
sharedColumnNames := duplicateNames(sharedColumns.Names)
for i := range sharedColumnNames {
sharedColumnNames[i] = EscapeName(sharedColumnNames[i])
mappedSharedColumnNames := duplicateNames(mappedSharedColumns.Names)
for i := range mappedSharedColumnNames {
mappedSharedColumnNames[i] = EscapeName(mappedSharedColumnNames[i])
}
preparedValues := buildPreparedValues(sharedColumns.Len())
preparedValues := buildPreparedValues(mappedSharedColumns.Len())
result = fmt.Sprintf(`
replace /* gh-ost %s.%s */ into
@ -362,13 +387,13 @@ func BuildDMLInsertQuery(databaseName, tableName string, tableColumns, sharedCol
(%s)
`, databaseName, tableName,
databaseName, tableName,
strings.Join(sharedColumnNames, ", "),
strings.Join(mappedSharedColumnNames, ", "),
strings.Join(preparedValues, ", "),
)
return result, sharedArgs, nil
}
func BuildDMLUpdateQuery(databaseName, tableName string, tableColumns, sharedColumns, uniqueKeyColumns *ColumnList, valueArgs, whereArgs []interface{}) (result string, sharedArgs, uniqueKeyArgs []interface{}, err error) {
func BuildDMLUpdateQuery(databaseName, tableName string, tableColumns, sharedColumns, mappedSharedColumns, uniqueKeyColumns *ColumnList, valueArgs, whereArgs []interface{}) (result string, sharedArgs, uniqueKeyArgs []interface{}, err error) {
if len(valueArgs) != tableColumns.Len() {
return result, sharedArgs, uniqueKeyArgs, fmt.Errorf("value args count differs from table column count in BuildDMLUpdateQuery")
}
@ -390,21 +415,24 @@ func BuildDMLUpdateQuery(databaseName, tableName string, tableColumns, sharedCol
databaseName = EscapeName(databaseName)
tableName = EscapeName(tableName)
for _, column := range sharedColumns.Names {
for i, column := range sharedColumns.Names {
mappedColumn := mappedSharedColumns.Names[i]
tableOrdinal := tableColumns.Ordinals[column]
sharedArgs = append(sharedArgs, valueArgs[tableOrdinal])
arg := fixArgType(valueArgs[tableOrdinal], mappedSharedColumns.IsUnsigned(mappedColumn))
sharedArgs = append(sharedArgs, arg)
}
for _, column := range uniqueKeyColumns.Names {
tableOrdinal := tableColumns.Ordinals[column]
uniqueKeyArgs = append(uniqueKeyArgs, whereArgs[tableOrdinal])
arg := fixArgType(whereArgs[tableOrdinal], uniqueKeyColumns.IsUnsigned(column))
uniqueKeyArgs = append(uniqueKeyArgs, arg)
}
sharedColumnNames := duplicateNames(sharedColumns.Names)
for i := range sharedColumnNames {
sharedColumnNames[i] = EscapeName(sharedColumnNames[i])
mappedSharedColumnNames := duplicateNames(mappedSharedColumns.Names)
for i := range mappedSharedColumnNames {
mappedSharedColumnNames[i] = EscapeName(mappedSharedColumnNames[i])
}
setClause, err := BuildSetPreparedClause(sharedColumnNames)
setClause, err := BuildSetPreparedClause(mappedSharedColumnNames)
equalsComparison, err := BuildEqualsPreparedComparison(uniqueKeyColumns.Names)
result = fmt.Sprintf(`

View File

@ -397,6 +397,44 @@ func TestBuildDMLDeleteQuery(t *testing.T) {
}
}
func TestBuildDMLDeleteQuerySignedUnsigned(t *testing.T) {
databaseName := "mydb"
tableName := "tbl"
tableColumns := NewColumnList([]string{"id", "name", "rank", "position", "age"})
uniqueKeyColumns := NewColumnList([]string{"position"})
{
// test signed (expect no change)
args := []interface{}{3, "testname", "first", -1, 23}
query, uniqueKeyArgs, err := BuildDMLDeleteQuery(databaseName, tableName, tableColumns, uniqueKeyColumns, args)
test.S(t).ExpectNil(err)
expected := `
delete /* gh-ost mydb.tbl */
from
mydb.tbl
where
((position = ?))
`
test.S(t).ExpectEquals(normalizeQuery(query), normalizeQuery(expected))
test.S(t).ExpectTrue(reflect.DeepEqual(uniqueKeyArgs, []interface{}{-1}))
}
{
// test unsigned
args := []interface{}{3, "testname", "first", int8(-1), 23}
uniqueKeyColumns.SetUnsigned("position")
query, uniqueKeyArgs, err := BuildDMLDeleteQuery(databaseName, tableName, tableColumns, uniqueKeyColumns, args)
test.S(t).ExpectNil(err)
expected := `
delete /* gh-ost mydb.tbl */
from
mydb.tbl
where
((position = ?))
`
test.S(t).ExpectEquals(normalizeQuery(query), normalizeQuery(expected))
test.S(t).ExpectTrue(reflect.DeepEqual(uniqueKeyArgs, []interface{}{uint8(255)}))
}
}
func TestBuildDMLInsertQuery(t *testing.T) {
databaseName := "mydb"
tableName := "tbl"
@ -404,7 +442,7 @@ func TestBuildDMLInsertQuery(t *testing.T) {
args := []interface{}{3, "testname", "first", 17, 23}
{
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
query, sharedArgs, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, args)
query, sharedArgs, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, args)
test.S(t).ExpectNil(err)
expected := `
replace /* gh-ost mydb.tbl */
@ -418,7 +456,7 @@ func TestBuildDMLInsertQuery(t *testing.T) {
}
{
sharedColumns := NewColumnList([]string{"position", "name", "age", "id"})
query, sharedArgs, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, args)
query, sharedArgs, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, args)
test.S(t).ExpectNil(err)
expected := `
replace /* gh-ost mydb.tbl */
@ -432,16 +470,71 @@ func TestBuildDMLInsertQuery(t *testing.T) {
}
{
sharedColumns := NewColumnList([]string{"position", "name", "surprise", "id"})
_, _, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, args)
_, _, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, args)
test.S(t).ExpectNotNil(err)
}
{
sharedColumns := NewColumnList([]string{})
_, _, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, args)
_, _, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, args)
test.S(t).ExpectNotNil(err)
}
}
func TestBuildDMLInsertQuerySignedUnsigned(t *testing.T) {
databaseName := "mydb"
tableName := "tbl"
tableColumns := NewColumnList([]string{"id", "name", "rank", "position", "age"})
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
{
// testing signed
args := []interface{}{3, "testname", "first", int8(-1), 23}
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
query, sharedArgs, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, args)
test.S(t).ExpectNil(err)
expected := `
replace /* gh-ost mydb.tbl */
into mydb.tbl
(id, name, position, age)
values
(?, ?, ?, ?)
`
test.S(t).ExpectEquals(normalizeQuery(query), normalizeQuery(expected))
test.S(t).ExpectTrue(reflect.DeepEqual(sharedArgs, []interface{}{3, "testname", int8(-1), 23}))
}
{
// testing unsigned
args := []interface{}{3, "testname", "first", int8(-1), 23}
sharedColumns.SetUnsigned("position")
query, sharedArgs, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, args)
test.S(t).ExpectNil(err)
expected := `
replace /* gh-ost mydb.tbl */
into mydb.tbl
(id, name, position, age)
values
(?, ?, ?, ?)
`
test.S(t).ExpectEquals(normalizeQuery(query), normalizeQuery(expected))
test.S(t).ExpectTrue(reflect.DeepEqual(sharedArgs, []interface{}{3, "testname", uint8(255), 23}))
}
{
// testing unsigned
args := []interface{}{3, "testname", "first", int32(-1), 23}
sharedColumns.SetUnsigned("position")
query, sharedArgs, err := BuildDMLInsertQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, args)
test.S(t).ExpectNil(err)
expected := `
replace /* gh-ost mydb.tbl */
into mydb.tbl
(id, name, position, age)
values
(?, ?, ?, ?)
`
test.S(t).ExpectEquals(normalizeQuery(query), normalizeQuery(expected))
test.S(t).ExpectTrue(reflect.DeepEqual(sharedArgs, []interface{}{3, "testname", uint32(4294967295), 23}))
}
}
func TestBuildDMLUpdateQuery(t *testing.T) {
databaseName := "mydb"
tableName := "tbl"
@ -451,7 +544,7 @@ func TestBuildDMLUpdateQuery(t *testing.T) {
{
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
uniqueKeyColumns := NewColumnList([]string{"position"})
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
test.S(t).ExpectNil(err)
expected := `
update /* gh-ost mydb.tbl */
@ -467,7 +560,7 @@ func TestBuildDMLUpdateQuery(t *testing.T) {
{
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
uniqueKeyColumns := NewColumnList([]string{"position", "name"})
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
test.S(t).ExpectNil(err)
expected := `
update /* gh-ost mydb.tbl */
@ -483,7 +576,7 @@ func TestBuildDMLUpdateQuery(t *testing.T) {
{
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
uniqueKeyColumns := NewColumnList([]string{"age"})
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
test.S(t).ExpectNil(err)
expected := `
update /* gh-ost mydb.tbl */
@ -499,7 +592,7 @@ func TestBuildDMLUpdateQuery(t *testing.T) {
{
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
uniqueKeyColumns := NewColumnList([]string{"age", "position", "id", "name"})
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
test.S(t).ExpectNil(err)
expected := `
update /* gh-ost mydb.tbl */
@ -515,13 +608,72 @@ func TestBuildDMLUpdateQuery(t *testing.T) {
{
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
uniqueKeyColumns := NewColumnList([]string{"age", "surprise"})
_, _, _, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
_, _, _, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
test.S(t).ExpectNotNil(err)
}
{
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
uniqueKeyColumns := NewColumnList([]string{})
_, _, _, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
_, _, _, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
test.S(t).ExpectNotNil(err)
}
{
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
mappedColumns := NewColumnList([]string{"id", "name", "role", "age"})
uniqueKeyColumns := NewColumnList([]string{"id"})
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, mappedColumns, uniqueKeyColumns, valueArgs, whereArgs)
test.S(t).ExpectNil(err)
expected := `
update /* gh-ost mydb.tbl */
mydb.tbl
set id=?, name=?, role=?, age=?
where
((id = ?))
`
test.S(t).ExpectEquals(normalizeQuery(query), normalizeQuery(expected))
test.S(t).ExpectTrue(reflect.DeepEqual(sharedArgs, []interface{}{3, "testname", 17, 23}))
test.S(t).ExpectTrue(reflect.DeepEqual(uniqueKeyArgs, []interface{}{3}))
}
}
func TestBuildDMLUpdateQuerySignedUnsigned(t *testing.T) {
databaseName := "mydb"
tableName := "tbl"
tableColumns := NewColumnList([]string{"id", "name", "rank", "position", "age"})
valueArgs := []interface{}{3, "testname", "newval", int8(-17), int8(-2)}
whereArgs := []interface{}{3, "testname", "findme", int8(-3), 56}
sharedColumns := NewColumnList([]string{"id", "name", "position", "age"})
uniqueKeyColumns := NewColumnList([]string{"position"})
{
// test signed
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
test.S(t).ExpectNil(err)
expected := `
update /* gh-ost mydb.tbl */
mydb.tbl
set id=?, name=?, position=?, age=?
where
((position = ?))
`
test.S(t).ExpectEquals(normalizeQuery(query), normalizeQuery(expected))
test.S(t).ExpectTrue(reflect.DeepEqual(sharedArgs, []interface{}{3, "testname", int8(-17), int8(-2)}))
test.S(t).ExpectTrue(reflect.DeepEqual(uniqueKeyArgs, []interface{}{int8(-3)}))
}
{
// test unsigned
sharedColumns.SetUnsigned("age")
uniqueKeyColumns.SetUnsigned("position")
query, sharedArgs, uniqueKeyArgs, err := BuildDMLUpdateQuery(databaseName, tableName, tableColumns, sharedColumns, sharedColumns, uniqueKeyColumns, valueArgs, whereArgs)
test.S(t).ExpectNil(err)
expected := `
update /* gh-ost mydb.tbl */
mydb.tbl
set id=?, name=?, position=?, age=?
where
((position = ?))
`
test.S(t).ExpectEquals(normalizeQuery(query), normalizeQuery(expected))
test.S(t).ExpectTrue(reflect.DeepEqual(sharedArgs, []interface{}{3, "testname", int8(-17), uint8(254)}))
test.S(t).ExpectTrue(reflect.DeepEqual(uniqueKeyArgs, []interface{}{uint8(253)}))
}
}

View File

@ -14,24 +14,31 @@ import (
// ColumnsMap maps a column onto its ordinal position
type ColumnsMap map[string]int
func NewColumnsMap(orderedNames []string) ColumnsMap {
func NewEmptyColumnsMap() ColumnsMap {
columnsMap := make(map[string]int)
return ColumnsMap(columnsMap)
}
func NewColumnsMap(orderedNames []string) ColumnsMap {
columnsMap := NewEmptyColumnsMap()
for i, column := range orderedNames {
columnsMap[column] = i
}
return ColumnsMap(columnsMap)
return columnsMap
}
// ColumnList makes for a named list of columns
type ColumnList struct {
Names []string
Ordinals ColumnsMap
UnsignedFlags ColumnsMap
}
// NewColumnList creates an object given ordered list of column names
func NewColumnList(names []string) *ColumnList {
result := &ColumnList{
Names: names,
UnsignedFlags: NewEmptyColumnsMap(),
}
result.Ordinals = NewColumnsMap(result.Names)
return result
@ -41,11 +48,20 @@ func NewColumnList(names []string) *ColumnList {
func ParseColumnList(columns string) *ColumnList {
result := &ColumnList{
Names: strings.Split(columns, ","),
UnsignedFlags: NewEmptyColumnsMap(),
}
result.Ordinals = NewColumnsMap(result.Names)
return result
}
func (this *ColumnList) SetUnsigned(columnName string) {
this.UnsignedFlags[columnName] = 1
}
func (this *ColumnList) IsUnsigned(columnName string) bool {
return this.UnsignedFlags[columnName] == 1
}
func (this *ColumnList) String() string {
return strings.Join(this.Names, ",")
}
@ -89,7 +105,7 @@ func (this *UniqueKey) Len() int {
func (this *UniqueKey) String() string {
description := this.Name
if this.IsAutoIncrement {
description = fmt.Sprintf("%s (auto_incrmenet)", description)
description = fmt.Sprintf("%s (auto_increment)", description)
}
return fmt.Sprintf("%s: %s; has nullable: %+v", description, this.Columns.Names, this.HasNullable)
}
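
The UnsignedFlags map above is what builder.go consults (via IsUnsigned) when fixing DML argument types. A short usage sketch, assuming the repository's standard import path:

package main

import (
	"fmt"

	"github.com/github/gh-ost/go/sql"
)

func main() {
	columns := sql.NewColumnList([]string{"id", "name", "position"})
	columns.SetUnsigned("position")
	fmt.Println(columns.IsUnsigned("position")) // true
	fmt.Println(columns.IsUnsigned("id"))       // false: flags default to unset
}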

View File

@ -0,0 +1,27 @@
drop table if exists gh_ost_test;
create table gh_ost_test (
id int auto_increment,
i int not null,
e enum('red', 'green', 'blue', 'orange') null default null collate 'utf8_bin',
primary key(id)
) auto_increment=1;
drop event if exists gh_ost_test;
delimiter ;;
create event gh_ost_test
on schedule every 1 second
starts current_timestamp
ends current_timestamp + interval 60 second
on completion not preserve
enable
do
begin
insert into gh_ost_test values (null, 11, 'red');
insert into gh_ost_test values (null, 13, 'green');
insert into gh_ost_test values (null, 17, 'blue');
set @last_insert_id := last_insert_id();
update gh_ost_test set e='orange' where id = @last_insert_id;
insert into gh_ost_test values (null, 23, null);
set @last_insert_id := last_insert_id();
update gh_ost_test set i=i+1, e=null where id = @last_insert_id;
end ;;

View File

@ -0,0 +1 @@
--alter="change e e enum('red', 'green', 'blue', 'orange', 'yellow') null default null collate 'utf8_bin'"

View File

@ -0,0 +1,26 @@
drop table if exists gh_ost_test;
create table gh_ost_test (
id int auto_increment,
c1 int not null,
c2 int not null,
primary key (id)
) auto_increment=1;
drop event if exists gh_ost_test;
delimiter ;;
create event gh_ost_test
on schedule every 1 second
starts current_timestamp
ends current_timestamp + interval 60 second
on completion not preserve
enable
do
begin
insert ignore into gh_ost_test values (1, 11, 23);
insert ignore into gh_ost_test values (2, 13, 23);
insert into gh_ost_test values (null, 17, 23);
set @last_insert_id := last_insert_id();
update gh_ost_test set c1=c1+@last_insert_id, c2=c2+@last_insert_id where id=@last_insert_id order by id desc limit 1;
delete from gh_ost_test where id=1;
delete from gh_ost_test where c1=13; -- id=2
end ;;

View File

@ -0,0 +1 @@
--alter="change column c2 c3 int not null" --approve-renamed-columns

112
localtests/test.sh Executable file
View File

@ -0,0 +1,112 @@
#!/bin/bash
# Local integration tests. To be used by CI.
# See https://github.com/github/gh-ost/tree/doc/local-tests.md
#
tests_path=$(dirname $0)
test_logfile=/tmp/gh-ost-test.log
exec_command_file=/tmp/gh-ost-test.bash
master_host=
master_port=
replica_host=
replica_port=
verify_master_and_replica() {
if [ "$(gh-ost-test-mysql-master -e "select 1" -ss)" != "1" ] ; then
echo "Cannot verify gh-ost-test-mysql-master"
exit 1
fi
read master_host master_port <<< $(gh-ost-test-mysql-master -e "select @@hostname, @@port" -ss)
if [ "$(gh-ost-test-mysql-replica -e "select 1" -ss)" != "1" ] ; then
echo "Cannot verify gh-ost-test-mysql-replica"
exit 1
fi
read replica_host replica_port <<< $(gh-ost-test-mysql-replica -e "select @@hostname, @@port" -ss)
}
exec_cmd() {
echo "$@"
command "$@" 1> $test_logfile 2>&1
return $?
}
test_single() {
local test_name
test_name="$1"
echo "Testing: $test_name"
gh-ost-test-mysql-replica -e "start slave"
gh-ost-test-mysql-master test < $tests_path/$test_name/create.sql
extra_args=""
if [ -f $tests_path/$test_name/extra_args ] ; then
extra_args=$(cat $tests_path/$test_name/extra_args)
fi
columns="*"
if [ -f $tests_path/$test_name/test_columns ] ; then
columns=$(cat $tests_path/$test_name/test_columns)
fi
# graceful sleep for replica to catch up
sleep 1
#
cmd="go run go/cmd/gh-ost/main.go \
--user=gh-ost \
--password=gh-ost \
--host=$replica_host \
--port=$replica_port \
--database=test \
--table=gh_ost_test \
--alter='engine=innodb' \
--exact-rowcount \
--switch-to-rbr \
--initially-drop-old-table \
--initially-drop-ghost-table \
--throttle-query='select timestampdiff(second, min(last_update), now()) < 5 from _gh_ost_test_ghc' \
--serve-socket-file=/tmp/gh-ost.test.sock \
--initially-drop-socket-file \
--postpone-cut-over-flag-file=/tmp/gh-ost.postpone.flag \
--test-on-replica \
--default-retries=1 \
--verbose \
--debug \
--stack \
--execute ${extra_args[@]}"
echo $cmd > $exec_command_file
bash $exec_command_file 1> $test_logfile 2>&1
if [ $? -ne 0 ] ; then
echo "ERROR $test_name execution failure. See $test_logfile"
return 1
fi
orig_checksum=$(gh-ost-test-mysql-replica test -e "select ${columns} from gh_ost_test" -ss | md5sum)
ghost_checksum=$(gh-ost-test-mysql-replica test -e "select ${columns} from _gh_ost_test_gho" -ss | md5sum)
if [ "$orig_checksum" != "$ghost_checksum" ] ; then
echo "ERROR $test_name: checksum mismatch"
echo "---"
gh-ost-test-mysql-replica test -e "select ${columns} from gh_ost_test" -ss
echo "---"
gh-ost-test-mysql-replica test -e "select ${columns} from _gh_ost_test_gho" -ss
return 1
fi
}
test_all() {
find $tests_path ! -path . -type d -mindepth 1 -maxdepth 1 | cut -d "/" -f 3 | while read test_name ; do
test_single "$test_name"
if [ $? -ne 0 ] ; then
echo "+ FAIL"
return 1
else
echo "+ pass"
fi
gh-ost-test-mysql-replica -e "start slave"
done
}
verify_master_and_replica
test_all

41
localtests/tz/create.sql Normal file
View File

@ -0,0 +1,41 @@
drop table if exists gh_ost_test;
create table gh_ost_test (
id int auto_increment,
i int not null,
ts0 timestamp default current_timestamp,
ts1 timestamp,
ts2 timestamp,
updated tinyint unsigned default 0,
primary key(id),
key i_idx(i)
) auto_increment=1;
drop event if exists gh_ost_test;
delimiter ;;
create event gh_ost_test
on schedule every 1 second
starts current_timestamp
ends current_timestamp + interval 60 second
on completion not preserve
enable
do
begin
insert into gh_ost_test values (null, 11, null, now(), now(), 0);
update gh_ost_test set ts2=now() + interval 10 minute, updated = 1 where i = 11 order by id desc limit 1;
set session time_zone='system';
insert into gh_ost_test values (null, 13, null, now(), now(), 0);
update gh_ost_test set ts2=now() + interval 10 minute, updated = 1 where i = 13 order by id desc limit 1;
set session time_zone='+00:00';
insert into gh_ost_test values (null, 17, null, now(), now(), 0);
update gh_ost_test set ts2=now() + interval 10 minute, updated = 1 where i = 17 order by id desc limit 1;
set session time_zone='-03:00';
insert into gh_ost_test values (null, 19, null, now(), now(), 0);
update gh_ost_test set ts2=now() + interval 10 minute, updated = 1 where i = 19 order by id desc limit 1;
set session time_zone='+05:00';
insert into gh_ost_test values (null, 23, null, now(), now(), 0);
update gh_ost_test set ts2=now() + interval 10 minute, updated = 1 where i = 23 order by id desc limit 1;
end ;;

View File

@ -0,0 +1,24 @@
drop table if exists gh_ost_test;
create table gh_ost_test (
id int auto_increment,
i int not null,
bi bigint not null,
iu int unsigned not null,
biu bigint unsigned not null,
primary key(id)
) auto_increment=1;
drop event if exists gh_ost_test;
delimiter ;;
create event gh_ost_test
on schedule every 1 second
starts current_timestamp
ends current_timestamp + interval 60 second
on completion not preserve
enable
do
begin
insert into gh_ost_test values (null, -2147483647, -9223372036854775807, 4294967295, 18446744073709551615);
set @last_insert_id := cast(last_insert_id() as signed);
update gh_ost_test set i=-2147483647+@last_insert_id, bi=-9223372036854775807+@last_insert_id, iu=4294967295-@last_insert_id, biu=18446744073709551615-@last_insert_id where id < @last_insert_id order by id desc limit 1;
end ;;

13
test.sh Executable file
View File

@ -0,0 +1,13 @@
#!/bin/bash
retval=0
for testsuite in base mysql sql
do
pushd go/${testsuite} > /dev/null;
go test $*;
[ $? -ne 0 ] && retval=1
popd > /dev/null;
done
exit $retval

View File

@ -609,7 +609,7 @@ func decodeTimestamp2(data []byte, dec uint16) (string, int, error) {
return "0000-00-00 00:00:00", n, nil
}
t := time.Unix(sec, usec*1000)
t := time.Unix(sec, usec*1000).UTC()
return t.Format(TimeFormat), n, nil
}
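
The added .UTC() matters because time.Unix returns a Time in the process's local zone, while MySQL stores TIMESTAMP values as UTC; without the conversion, decoded values would shift by the machine's offset (the tz localtest above exercises exactly this). A quick sketch of the difference:

package main

import (
	"fmt"
	"time"
)

func main() {
	sec := time.Now().Unix()
	t := time.Unix(sec, 0)
	fmt.Println(t.Format("2006-01-02 15:04:05"))       // rendered in the local zone
	fmt.Println(t.UTC().Format("2006-01-02 15:04:05")) // stable, zone-independent
}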

View File

@ -16,12 +16,12 @@ import (
"path/filepath"
"unicode"
"unicode/utf8"
)
import (
"gopkg.in/gcfg.v1/token"
)
var RelaxedScannerMode = false
// An ErrorHandler may be provided to Scanner.Init. If a syntax error is
// encountered and a handler was installed, the handler is called with a
// position and an error message. The position points to the beginning of
@ -148,7 +148,7 @@ func (s *Scanner) scanComment() string {
}
func isLetter(ch rune) bool {
return 'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch >= 0x80 && unicode.IsLetter(ch)
return 'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_' || ch >= 0x80 && unicode.IsLetter(ch)
}
func isDigit(ch rune) bool {
@ -231,7 +231,7 @@ loop:
hasCR = true
s.next()
}
if s.ch != '\n' {
if s.ch != '\n' && !RelaxedScannerMode {
s.error(offs, "unquoted '\\' must be followed by new line")
break loop
}

8
vendor/gopkg.in/gcfg.v1/set.go generated vendored
View File

@ -16,6 +16,8 @@ type tag struct {
intMode string
}
var RelaxedParserMode = false
func newTag(ts string) tag {
t := tag{}
s := strings.Split(ts, ",")
@ -197,6 +199,9 @@ func set(cfg interface{}, sect, sub, name string, blank bool, value string) erro
vCfg := vPCfg.Elem()
vSect, _ := fieldFold(vCfg, sect)
if !vSect.IsValid() {
if RelaxedParserMode {
return nil
}
return fmt.Errorf("invalid section: section %q", sect)
}
if vSect.Kind() == reflect.Map {
@ -232,6 +237,9 @@ func set(cfg interface{}, sect, sub, name string, blank bool, value string) erro
}
vVar, t := fieldFold(vSect, name)
if !vVar.IsValid() {
if RelaxedParserMode {
return nil
}
return fmt.Errorf("invalid variable: "+
"section %q subsection %q variable %q", sect, sub, name)
}
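
RelaxedParserMode (together with RelaxedScannerMode in the scanner hunk above) makes the vendored gcfg skip unknown sections and variables instead of failing, so a full my.cnf can be read while only a few keys are bound. A hedged usage sketch; the struct fields and file path are illustrative:

package main

import (
	"fmt"

	gcfg "gopkg.in/gcfg.v1"
)

type mycnf struct {
	Client struct {
		User     string
		Password string
	}
}

func main() {
	gcfg.RelaxedParserMode = true // unknown sections/variables no longer error
	var cfg mycnf
	if err := gcfg.ReadFileInto(&cfg, "/etc/mysql/my.cnf"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(cfg.Client.User)
}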