438 Commits

Author SHA1 Message Date
Shlomi Noach
cd6b3c5e9e not throttling during cut-over operation 2016-06-21 09:21:58 +02:00
Shlomi Noach
80fcc05eb5 supporting interactive command throttle-control-replicas 2016-06-20 12:09:04 +02:00
Shlomi Noach
f0b012b238 support for 'panic' interactive command 2016-06-20 06:38:29 +02:00
Shlomi Noach
62b8a897e3 Retries, better visibility, documentation
- Rowcopy time is bounded by copy end-time
- Retries are configurable via `--default-retries` (default: `60`)
- `migrator` notes the hostname
- `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`)
- Added lots of code comments
- Adding documentation for "triggerless design"
2016-06-19 17:55:37 +02:00
Shlomi Noach
23cb8ea7e9 Throttling & critical load
- Added `--throttle-query` param (when returns > 0, throttling applies)
- Added `--critical-load`, similar to `--max-load` but implies panic and quit
- Recoded *-load as `LoadMap`
- More info on *-load throttle/panic
- `printStatus()` now gets printing heuristic. Always shows up on interactive `"status"`
- Fixed `change column` (aka rename) handling with quotes
- Removed legacy `mysqlbinlog` parser code
- Added tests
2016-06-18 21:12:07 +02:00
Shlomi Noach
d38ff68a15 minor formatting 2016-06-17 11:41:10 +02:00
Shlomi Noach
94f311ec7b supporting --panic-flag-file; when it exists - app panics and exits without cleanup 2016-06-17 11:40:08 +02:00
Shlomi Noach
836d0fe119 Supporting column rename
- Parsing `alter` statement to catch `change old_name new_name ...` statements
- Auto deducing renamed columns
- When suspecting renamed columns, requesting explicit `--approve-renamed-columns` or `--skip-renamed-columns`
- updated tests
2016-06-17 08:03:18 +02:00
Shlomi Noach
3e83202b97 more elaborate check that user has privileges 2016-06-16 16:06:26 +02:00
Shlomi Noach
7d0ec9c9dc added --migrate-on-replica flag; runs complete migration on replica 2016-06-15 12:18:59 +02:00
Shlomi Noach
85d6883e69 printing migration status on waitForEventsUpToLock() 2016-06-15 10:13:06 +02:00
Shlomi Noach
96bc3804eb test-on-replica stops replication completely 2016-06-14 12:50:07 +02:00
Shlomi Noach
97adbf1ff8 - --cut-over no longer mandatory; default to safe
- Removed `CutOverVoluntaryLock` and associated code
- Removed `CutOverUdfWait`
- `RenameTablesRollback()` first attempts an atomic swap
2016-06-14 09:01:06 +02:00
Shlomi Noach
cb1c61ac47 - --cut-over no longer mandatory; default to safe
- Removed `CutOverVoluntaryLock` and associated code
- Removed `CutOverUdfWait`
- `RenameTablesRollback()` first attempts an atomic swap
2016-06-14 09:00:56 +02:00
Shlomi Noach
8292f5608f Safe cut-over
- Supporting multi-step, safe cut-over phase, where queries are blocked throughout the phase, and worst case scenario is table outage (no data corruption)
- Self-rolls back in case of failure (restores original table)
2016-06-14 08:35:07 +02:00
Shlomi Noach
e4ed801df5 noting postponing status 2016-06-13 18:36:29 +02:00
Shlomi Noach
b8c7e046a1 test-on-replica to invoke cut-over swap 2016-06-10 11:15:11 +02:00
Shlomi Noach
087d1dd64d supporting dynamic reconfiguration of max-load 2016-06-09 11:25:01 +02:00
Shlomi Noach
2cdc72bd1c fixed nil TCP listener when TCP undefined 2016-06-07 14:24:30 +02:00
Shlomi Noach
a6c21dcdb0 - --postpone-swap-tables-flag-file renamed to --postpone-cut-over-flag-file
- More `README` documentation
- Added "throttle" documentation
2016-06-07 14:05:25 +02:00
Shlomi Noach
fc00cb2289 adding interactive user commands 2016-06-07 11:59:17 +02:00
Shlomi Noach
bbd19abc9a - requiring --cut-over argument to be two-step|voluntary-lock (will add udf-wait once it is ready)
The idea is that the user is forced to specify the cut-over type they wish to use, given that each type has some drawbacks.
- More data in status hint
- `select count(*)` is deferred till after we validate migration is valid. Also, it is skipped on `--noop`
2016-06-06 12:33:05 +02:00
Shlomi Noach
42ae3e37f5 dropping _osc (changelog) table at end of operation; also better status hint at end of operation 2016-06-01 10:40:49 +02:00
Shlomi Noach
2df94f9c51 printing courtesy reminder once per 10 minutes 2016-05-31 21:12:39 +02:00
Shlomi Noach
9519a66825 added courtesy-reminder 2016-05-26 14:25:32 +02:00
Shlomi Noach
583d6d3147 accepting SIGHUP. Reloads configuration and marks as point of interest 2016-05-25 12:27:58 +02:00
Shlomi Noach
e7239091d7 Merge pull request #45 from github/print-status-point-of-interest
support for marking point-of-interest in migration
2016-05-24 08:48:44 +02:00
Shlomi Noach
20f000833f support for marking point-of-interest in migration 2016-05-23 14:58:53 +02:00
Shlomi Noach
896f560dce after timeout: reconnecting as new replica; skipping queries correctly 2016-05-23 11:12:59 +02:00
Shlomi Noach
5375aa4f69 - Removed use of master_pos_wait(). It was unnecessary in the first place and introduced new problems.
- Supporting `--allow-nullable-unique-key`
  - Tool will bail out if chosen key has nullable columns and the above is not provided
- Fixed `OriginalBinlogRowImage` comparison (lower/upper case issue)
- Introduced reasonable streamer reconnect sleep time
2016-05-20 12:52:14 +02:00
Shlomi Noach
9b54d0208f - Handling gomysql.replication connection timeouts: reconnecting on last known position
- `printStatus()` takes ETA into account
- More info around `master_pos_wait()`
2016-05-19 15:11:36 +02:00
Shlomi Noach
ec34a5ef75 master_pos_wait is now OK to return NULL. We only care if it returns with -1 2016-05-18 15:08:47 +02:00
Shlomi Noach
9f56a84b57 Fixing single-row table migration
- `BuildUniqueKeyRangeEndPreparedQuery` supports `includeRangeStartValues` argument
- `applier` sends `this.migrationContext.GetIteration() == 0` as argument
2016-05-18 14:53:09 +02:00
Shlomi Noach
45371d9374 Merge pull request #36 from github/master-pos-wait-fix
some messages are now Info instead of Debug
2016-05-18 12:21:22 +02:00
Shlomi Noach
df0a7513f5 - user/password provided in CLI override those in config file
- user no longer defaults to .
- config is now part of Context, and is protected by mutex
2016-05-17 15:35:44 +02:00
Shlomi Noach
879b2b425e - Support for --postpone-swap-tables-flag-file: while this file exists, final table swap does not take place, and the ghost table keeps being synchronized
- Fixed version printing
- `rowCopyCompleteFlag` is a hint that allows us to escape the infinite loop of rowcopy once we are sure we have reached the end
2016-05-17 14:40:37 +02:00
Shlomi Noach
065d9c40ec some messages are now Info instead of Debug 2016-05-17 11:57:43 +02:00
Shlomi Noach
41b0a4f317 supporting --version 2016-05-17 11:51:21 +02:00
Shlomi Noach
21f6ae9dca renaming to gh-ost 2016-05-16 11:10:12 +02:00
Shlomi Noach
9d055dbda7 renaming to gh-ost 2016-05-16 11:09:17 +02:00
Shlomi Noach
1e10f1f29e Solved various race conditions:
- Operation would terminate after events lock noticed but before applying all events: race condition where the event would be captured asynchronously. The event is now handled sequentially with the DML events, hence now safe.
- Multiple rowcopy operations would still write to `rowCopyComplete` channel. This is still the case, but now we only wait for the first and then just flush (read and discard) any others, to avoid blocking
- Events DML listener is only added after table creation: the problem was that with very busy tables, the events func buffer would fill up, and the "tables-created" event would be blocked.
- `waitForEventsUpToLock()` unifies the waiting on all variants of complete-migration
- With `--test-on-replica`, now stopping replication "nicely", using `master_pos_wait()`
- With `--test-on-replica`, not throttling on replication after replication is stopped (duh)
- More debug output
2016-05-16 11:03:15 +02:00
Shlomi Noach
134bf385fd initial, simple solution to out-of-order applying of DML events 2016-05-05 17:14:55 +03:00
Shlomi Noach
6528010742 Adding ETA starting at 2% progress 2016-05-05 09:18:19 +03:00
Shlomi Noach
800c1109b0 fixed statistics query: getting the correct column names by unique key 2016-05-04 09:50:00 +03:00
Shlomi Noach
74d8b06db1 exact-rowcount implies updating number of rows as we make progress 2016-05-04 08:23:34 +03:00
Shlomi Noach
36905d82e3 - supporting --initially-drop-old-table
- supporting `--initially-drop-ghost-table`
- validating existence of `old` and `ghost` before beginning operation
2016-05-03 12:55:17 +03:00
Shlomi Noach
627e412b6b fixed password assignment 2016-05-03 11:56:53 +03:00
Shlomi Noach
86fd2b617a initial support for config file 2016-05-03 10:28:48 +03:00
Shlomi Noach
07063a4181 - added throttle-control-replicas flag, a list of control replicas
- when `--test-on-replica`, the tested replica is implicitly a control replica
- added `replication-lag-query`, an alternate query to `SHOW SLAVE STATUS` to get replication lag
- throttling takes both the above into consideration
2016-05-01 21:36:36 +03:00
Shlomi Noach
421ab0fc83 woohoo, logic complete
- Introduced `SwapTablesTimeoutSeconds`; `RENAME` is limited by this timeout
- If `RENAME` fails (due to the above), we throttle and retry
- `SwapTablesAtomic()` sets `lock_wait_timeout` and notifies with connection id
- `GrabVoluntaryLock()` intentionally grabs (and later releases) voluntary lock. It notifies when it is taken and awaits instructions as to when it can be released.
- `IssueBlockingQueryOnVoluntaryLock()` does what it says. It notifies with its connection_id so that it can be easily traced
- `stopWritesAndCompleteMigrationOnMasterViaLock()` does the thang. Oh dear this was agonizing and the code is a pain to look at, though under the limitations I do believe it is as clean as I could hope for.
2016-04-22 19:46:34 -07:00