332 Commits

Author SHA1 Message Date
Shlomi Noach
b53ee24a1f dynamic replication-lag-query 2016-07-26 14:14:25 +02:00
Shlomi Noach
5d23b72955 Merge pull request #107 from github/throttle-control-replicas
fix to throttle-control-replicas check
2016-07-26 12:13:22 +02:00
Shlomi Noach
7a70c24503 replica-migration cleanup; updating allEventsUpToLockProcessedInjectedFlag 2016-07-26 12:06:20 +02:00
Shlomi Noach
6dbf5c31a2 resolved conflict 2016-07-26 11:57:01 +02:00
Shlomi Noach
034ea7646a fix to throttle-control-replicas check 2016-07-26 11:51:24 +02:00
Shlomi Noach
1d77425fbe capped streamer retries 2016-07-25 15:17:30 +02:00
Shlomi Noach
74804559c8 supporting --initially-drop-socket-file
- by default gh-ost will not delete an existing socket file
  and thus, will fail running if socket file exists. This is the desired behavior.
- The flag --initially-drop-socket-file indicates we take responsibility and wish gh-ost to delete this file on startup
2016-07-22 17:34:18 +02:00
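The socket-file behavior described in this commit could be sketched in Go roughly as follows. This is an illustration under stated assumptions, not gh-ost's actual code: the helper name `prepareSocketFile`, its parameters, and the example path are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// prepareSocketFile is a hypothetical helper, not gh-ost's real implementation.
// If something already exists at path, it is deleted only when initiallyDrop
// is true; otherwise an error is returned so that startup fails.
func prepareSocketFile(path string, initiallyDrop bool) error {
	_, err := os.Stat(path)
	if errors.Is(err, os.ErrNotExist) {
		return nil // nothing in the way
	}
	if err != nil {
		return err
	}
	if !initiallyDrop {
		return fmt.Errorf("socket file %s already exists; refusing to delete it", path)
	}
	return os.Remove(path)
}

func main() {
	// Hypothetical path, for illustration only.
	if err := prepareSocketFile("/tmp/gh-ost.example.sock", false); err != nil {
		fmt.Println("startup would fail:", err)
		return
	}
	fmt.Println("socket path is free")
}
```

Failing by default keeps a second invocation from silently taking over a socket that a running migration may still own; the flag makes deletion an explicit decision.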
Shlomi Noach
ef59a866d8 Removed legacy 'safe cut-over'
Now that we have the atomic cut-over, the former is redundant
2016-07-16 08:12:19 -06:00
Shlomi Noach
8e46b4ceea max-lag-millis is dynamically controllable 2016-07-13 09:44:00 +02:00
Shlomi Noach
8217536898 supporting --cut-over-lock-timeout-seconds 2016-07-08 10:14:58 +02:00
Shlomi Noach
c116d84acb added nice-ratio 2016-07-04 14:29:09 +02:00
Shlomi Noach
37e3c94c87 supporting 'unpostpone' command 2016-07-01 10:59:09 +02:00
Shlomi Noach
0191b2897d an atomic cut-over implementation, as per issue #82 2016-06-27 11:08:06 +02:00
Shlomi Noach
4f299f320e noop more verbose 2016-06-27 08:49:26 +02:00
Shlomi Noach
e0de69b028 a noop operation dumps SHOW CREATE TABLE 2016-06-22 12:39:13 +02:00
Shlomi Noach
5b20122957 on noop operation, drop ghost table at end 2016-06-22 10:48:17 +02:00
Shlomi Noach
690e046c51 adding --allow-master-master 2016-06-22 10:38:13 +02:00
Shlomi Noach
96e8419a35 Solved cut-over stall; change of table names
- Cutover would stall after `lock tables` wait-timeout due to waiting on a channel that would never be written to. This has been identified, reproduced, fixed, confirmed.
- Change of table names. Here's the story:
  - Because we're testing this even while `pt-online-schema-change` is being used in production, the `_tbl_old` naming convention makes for a collision.
  - "old" table name is now `_tbl_del`, "del" standing for "delete"
  - ghost table name is now `_tbl_gho`
  - when issuing `--test-on-replica`, we keep the ghost table around, and we're also briefly renaming original table to "old". Well, this collides with a potentially existing "old" table on master (one that hasn't been dropped yet).
  `--test-on-replica` uses `_tbl_ght` (ghost-test)
  - similar problem with `--execute-on-replica`, and in this case the table doesn't stick around; calling it `_tbl_ghr` (ghost-replica)
  - changelog table is now `_tbl_ghc` (ghost-changelog)
  - To clarify, I don't want to go down the path of creating "old" tables with 2 or 3 or 4 or 5 or infinite leading underscores. I think this is very confusing and actually not operations friendly. It's OK that the migration will fail saying "hey, you ALREADY have an old table here, why don't you take care of it first", rather than create _yet_another_ `____tbl_old` table. We're always confused on which table it actually is that gets migrated, which is safe to `drop`, etc.
- just after rowcopy completing, just before cutover, during cutover: marking as point in time _of interest_ so as to increase logging frequency.
2016-06-21 12:56:01 +02:00
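The naming convention listed in this commit can be summarized with a small Go sketch. The suffixes come from the commit message; the constant names and the `derivedTableName` function are hypothetical and only illustrate the `_<table>_<suffix>` pattern.

```go
package main

import "fmt"

// Suffixes as listed in the commit message; the constant names are hypothetical.
const (
	suffixOld           = "del" // "old" table, "del" standing for "delete"
	suffixGhost         = "gho" // ghost table
	suffixGhostTest     = "ght" // --test-on-replica
	suffixGhostReplica  = "ghr" // --execute-on-replica
	suffixGhostChangelog = "ghc" // changelog table
)

// derivedTableName builds "_<original>_<suffix>", e.g. "_tbl_gho".
func derivedTableName(original, suffix string) string {
	return fmt.Sprintf("_%s_%s", original, suffix)
}

func main() {
	suffixes := []string{suffixOld, suffixGhost, suffixGhostTest, suffixGhostReplica, suffixGhostChangelog}
	for _, suffix := range suffixes {
		fmt.Println(derivedTableName("tbl", suffix))
	}
}
```

Each name carries exactly one leading underscore, so a leftover table from an earlier run causes a visible failure instead of another layer of underscores.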
Shlomi Noach
cd6b3c5e9e not throttling during cut-over operation 2016-06-21 09:21:58 +02:00
Shlomi Noach
80fcc05eb5 supporting interactive command throttle-control-replicas 2016-06-20 12:09:04 +02:00
Shlomi Noach
f0b012b238 support for 'panic' interactive command 2016-06-20 06:38:29 +02:00
Shlomi Noach
62b8a897e3 Retries, better visibility, documentation
- Rowcopy time is bounded by copy end-time
- Retries are configurable via `--default-retries` (default: `60`)
- `migrator` notes the hostname
- `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`)
- Added lots of code comments
- Adding documentation for "triggerless design"
2016-06-19 17:55:37 +02:00
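A bounded retry loop of the kind `--default-retries` configures might look like the sketch below. Only the default of `60` comes from the commit message; `retryOperation`, `defaultRetries`, and `errTransient` are hypothetical names, and the real tool may well add sleeps or logging between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultRetries mirrors the documented default of --default-retries.
const defaultRetries = 60

// errTransient is a stand-in error used only for this illustration.
var errTransient = errors.New("transient failure")

// retryOperation (hypothetical) calls operation until it succeeds or
// maxRetries attempts have been made, returning the last error seen.
func retryOperation(maxRetries int, operation func() error) (err error) {
	for attempt := 0; attempt < maxRetries; attempt++ {
		if err = operation(); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	attempts := 0
	err := retryOperation(defaultRetries, func() error {
		attempts++
		if attempts < 3 {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "error:", err)
}
```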
Shlomi Noach
23cb8ea7e9 Throttling & critical load
- Added `--throttle-query` param (when returns > 0, throttling applies)
- Added `--critical-load`, similar to `--max-load` but implies panic and quit
- Recoded *-load as `LoadMap`
- More info on *-load throttle/panic
- `printStatus()` now gets printing heuristic. Always shows up on interactive `"status"`
- Fixed `change column` (aka rename) handling with quotes
- Removed legacy `mysqlbinlog` parser code
- Added tests
2016-06-18 21:12:07 +02:00
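The `*-load` handling mentioned in this commit can be illustrated with a sketch of a load map. The `LoadMap` name appears in the commit, but everything else here is an assumption: the `name=threshold,name=threshold` input format, the `ParseLoadMap` and `Exceeded` signatures, and the sample values are illustrative and not taken from the source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// LoadMap maps a status variable name to its threshold.
// This definition is an assumption made for illustration.
type LoadMap map[string]int64

// ParseLoadMap parses a spec such as "Threads_running=25,Threads_connected=100".
// The format is assumed, not confirmed by the commit message.
func ParseLoadMap(spec string) (LoadMap, error) {
	result := LoadMap{}
	if strings.TrimSpace(spec) == "" {
		return result, nil
	}
	for _, pair := range strings.Split(spec, ",") {
		tokens := strings.Split(pair, "=")
		if len(tokens) != 2 {
			return nil, fmt.Errorf("invalid load entry: %q", pair)
		}
		threshold, err := strconv.ParseInt(strings.TrimSpace(tokens[1]), 10, 64)
		if err != nil {
			return nil, err
		}
		result[strings.TrimSpace(tokens[0])] = threshold
	}
	return result, nil
}

// Exceeded reports whether any current value is above its threshold,
// along with the name of one such variable.
func (m LoadMap) Exceeded(current map[string]int64) (string, bool) {
	for name, threshold := range m {
		if value, ok := current[name]; ok && value > threshold {
			return name, true
		}
	}
	return "", false
}

func main() {
	maxLoad, _ := ParseLoadMap("Threads_running=25")
	criticalLoad, _ := ParseLoadMap("Threads_running=1000")
	current := map[string]int64{"Threads_running": 40} // sample reading

	if name, hit := criticalLoad.Exceeded(current); hit {
		fmt.Println("critical load on", name, ": panic and quit")
	} else if name, hit := maxLoad.Exceeded(current); hit {
		fmt.Println("max load on", name, ": throttle")
	}
}
```

Using one type for both thresholds keeps the two checks symmetric: the same comparison drives either throttling or, for the critical map, an abort.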
Shlomi Noach
d38ff68a15 minor formatting 2016-06-17 11:41:10 +02:00
Shlomi Noach
94f311ec7b supporting --panic-flag-file; when it exists - app panics and exits without cleanup 2016-06-17 11:40:08 +02:00
Shlomi Noach
836d0fe119 Supporting column rename
- Parsing `alter` statement to catch `change old_name new_name ...` statements
- Auto deducing renamed columns
- When suspecting renamed columns, requesting explicit `--approve-renamed-columns` or `--skip-renamed-columns`
- updated tests
2016-06-17 08:03:18 +02:00
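Detecting `change old_name new_name ...` clauses could be approximated with a regular expression, as in the sketch below. This is not the parser the commit introduces: the pattern, `changeColumnRegexp`, and `renamedColumns` are hypothetical, and a regex like this will miss edge cases such as quoted names containing spaces.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// changeColumnRegexp (hypothetical) captures the old and new column names
// in "change [column] old_name new_name ..." clauses, case-insensitively.
var changeColumnRegexp = regexp.MustCompile(`(?i)\bchange\s+(?:column\s+)?(\S+)\s+(\S+)\s`)

// renamedColumns returns old-name -> new-name for every change clause
// whose two names differ; identical names mean no rename took place.
func renamedColumns(alter string) map[string]string {
	renames := map[string]string{}
	for _, match := range changeColumnRegexp.FindAllStringSubmatch(alter, -1) {
		oldName := strings.Trim(match[1], "`")
		newName := strings.Trim(match[2], "`")
		if oldName != newName {
			renames[oldName] = newName
		}
	}
	return renames
}

func main() {
	alter := "change id id bigint, change column c1 c2 int not null"
	fmt.Println(renamedColumns(alter))
}
```

Because a heuristic like this can only suspect a rename, asking the user for `--approve-renamed-columns` or `--skip-renamed-columns` leaves the final decision explicit.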
Shlomi Noach
3e83202b97 more elaborate check that user has privileges 2016-06-16 16:06:26 +02:00
Shlomi Noach
7d0ec9c9dc added --migrate-on-replica flag; runs complete migration on replica 2016-06-15 12:18:59 +02:00
Shlomi Noach
85d6883e69 printing migration status on waitForEventsUpToLock() 2016-06-15 10:13:06 +02:00
Shlomi Noach
96bc3804eb test-on-replica stops replication completely 2016-06-14 12:50:07 +02:00
Shlomi Noach
97adbf1ff8 - --cut-over no longer mandatory; default to safe
- Removed `CutOverVoluntaryLock` and associated code
- Removed `CutOverUdfWait`
- `RenameTablesRollback()` first attempts an atomic swap
2016-06-14 09:01:06 +02:00
Shlomi Noach
cb1c61ac47 - --cut-over no longer mandatory; default to safe
- Removed `CutOverVoluntaryLock` and associated code
- Removed `CutOverUdfWait`
- `RenameTablesRollback()` first attempts an atomic swap
2016-06-14 09:00:56 +02:00
Shlomi Noach
8292f5608f Safe cut-over
- Supporting multi-step, safe cut-over phase, where queries are blocked throughout the phase, and worst case scenario is table outage (no data corruption)
- Rolls itself back in case of failure (restores original table)
2016-06-14 08:35:07 +02:00
Shlomi Noach
e4ed801df5 noting postponing status 2016-06-13 18:36:29 +02:00
Shlomi Noach
b8c7e046a1 test-on-replica to invoke cut-over swap 2016-06-10 11:15:11 +02:00
Shlomi Noach
087d1dd64d supporting dynamic reconfiguration of max-load 2016-06-09 11:25:01 +02:00
Shlomi Noach
2cdc72bd1c fixed nil TCP listener when TCP undefined 2016-06-07 14:24:30 +02:00
Shlomi Noach
a6c21dcdb0 - --postpone-swap-tables-flag-file renamed to --postpone-cut-over-flag-file
- More `README` documentation
- Added "throttle" documentation
2016-06-07 14:05:25 +02:00
Shlomi Noach
fc00cb2289 adding interactive user commands 2016-06-07 11:59:17 +02:00
Shlomi Noach
bbd19abc9a - requiring --cut-over argument to be two-step|voluntary-lock (will add udf-wait once it is ready)
The idea is that the user is forced to specify the cut-over type they wish to use, given that each type has some drawbacks.
- More data in status hint
- `select count(*)` is deferred till after we validate migration is valid. Also, it is skipped on `--noop`
2016-06-06 12:33:05 +02:00
Shlomi Noach
42ae3e37f5 dropping _osc (changelog) table at end of operation; also better status hint at end of operation 2016-06-01 10:40:49 +02:00
Shlomi Noach
2df94f9c51 printing courtesy reminder once per 10 minutes 2016-05-31 21:12:39 +02:00
Shlomi Noach
9519a66825 added courtesy-reminder 2016-05-26 14:25:32 +02:00
Shlomi Noach
583d6d3147 accepting SIGHUP. Reloads configuration and marks as point of interest 2016-05-25 12:27:58 +02:00
Shlomi Noach
e7239091d7 Merge pull request #45 from github/print-status-point-of-interest
support for marking point-of-interest in migration
2016-05-24 08:48:44 +02:00
Shlomi Noach
20f000833f support for marking point-of-interest in migration 2016-05-23 14:58:53 +02:00
Shlomi Noach
896f560dce after timeout: reconnecting as new replica; skipping queries correctly 2016-05-23 11:12:59 +02:00
Shlomi Noach
5375aa4f69 - Removed use of master_pos_wait(). It was unnecessary in the first place and introduced new problems.
- Supporting `--allow-nullable-unique-key`
  - Tool will bail out if chosen key has nullable columns and the above is not provided
- Fixed `OriginalBinlogRowImage` comparison (lower/upper case issue)
- Introduced reasonable streamer reconnect sleep time
2016-05-20 12:52:14 +02:00
Shlomi Noach
9b54d0208f - Handling gomysql.replication connection timeouts: reconnecting on last known position
- `printStatus()` takes ETA into account
- More info around `master_pos_wait()`
2016-05-19 15:11:36 +02:00
Shlomi Noach
ec34a5ef75 master_pos_wait is now OK to return NULL. We only care if it returns with -1 2016-05-18 15:08:47 +02:00