gh-ost

Author	SHA1	Message	Date
Shlomi Noach	690e046c51	adding --allow-master-master	2016-06-22 10:38:13 +02:00
Shlomi Noach	96e8419a35	Solved cut-over stall; change of table names - Cutover would stall after `lock tables` wait-timeout due to waiting on a channel that would never be written to. This has been identified, reproduced, fixed, confirmed. - Change of table names. Here's the story: - Because we're testing this even while `pt-online-schema-change` is being used in production, the `_tbl_old` naming convention makes for a collision. - "old" table name is now `_tbl_del`, "del" standing for "delete" - ghost table name is now `_tbl_gho` - when issuing `--test-on-replica`, we keep the ghost table around, and we're also briefly renaming original table to "old". Well, this collides with a potentially existing "old" table on master (one that hasn't been dropped yet). `--test-on-replica` uses `_tbl_ght` (ghost-test) - similar problem with `--execute-on-replica`, and in this case the table doesn't stick around; calling it `_tbl_ghr` (ghost-replica) - changelog table is now `_tbl_ghc` (ghost-changelog) - To clarify, I don't want to go down the path of creating "old" tables with 2 or 3 or 4 or 5 or infinite leading underscores. I think this is very confusing and actually not operations friendly. It's OK that the migration will fail saying "hey, you ALREADY have an old table here, why don't you take care of it first", rather than create _yet_another_ `____tbl_old` table. We're always confused on which table it actually is that gets migrated, which is safe to `drop`, etc. - just after rowcopy completing, just before cutover, during cutover: marking as point in time _of interest_ so as to increase logging frequency.	2016-06-21 12:56:01 +02:00
Shlomi Noach	cd6b3c5e9e	not throttling during cut-over operation	2016-06-21 09:21:58 +02:00
Shlomi Noach	80fcc05eb5	supporting interactive command throttle-control-replicas	2016-06-20 12:09:04 +02:00
Shlomi Noach	f0b012b238	support for 'panic' interactive command	2016-06-20 06:38:29 +02:00
Shlomi Noach	62b8a897e3	Retries, better visibility, documentation - Rowcopy time is bounded by copy end-time - Retries are configurable via `--default-retries` (default: `60`) - `migrator` notes the hostname - `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`) - Added lots of code comments - Adding documentation for "triggerless design"	2016-06-19 17:55:37 +02:00
Shlomi Noach	23cb8ea7e9	Throttling & critical load - Added `--throttle-query` param (when returns > 0, throttling applies) - Added `--critical-load`, similar to `--max-load` but implies panic and quit - Recoded -load as `LoadMap` - More info on -load throttle/panic - `printStatus()` now gets printing heuristic. Always shows up on interactive `"status"` - Fixed `change column` (aka rename) handling with quotes - Removed legacy `mysqlbinlog` parser code - Added tests	2016-06-18 21:12:07 +02:00
Shlomi Noach	d38ff68a15	minor formatting	2016-06-17 11:41:10 +02:00
Shlomi Noach	94f311ec7b	supporting `--panic-flag-file`; when it exists - app panics and exits without cleanup	2016-06-17 11:40:08 +02:00
Shlomi Noach	836d0fe119	Supporting column rename - Parsing `alter` statement to catch `change old_name new_name ...` statements - Auto deducing renamed columns - When suspecting renamed columns, requesting explicit `--approve-renamed-columns` or `--skip-renamed-columns` - updated tests	2016-06-17 08:03:18 +02:00
Shlomi Noach	3e83202b97	more elaborate check that user has privileges	2016-06-16 16:06:26 +02:00
Shlomi Noach	7d0ec9c9dc	added --migrate-on-replica flag; runs complete migration on replica	2016-06-15 12:18:59 +02:00
Shlomi Noach	85d6883e69	printing migration status on waitForEventsUpToLock()	2016-06-15 10:13:06 +02:00
Shlomi Noach	96bc3804eb	test-on-replica stops replication completely	2016-06-14 12:50:07 +02:00
Shlomi Noach	97adbf1ff8	- `--cut-over` no longer mandatory; default to `safe` - Removed `CutOverVoluntaryLock` and associated code - Removed `CutOverUdfWait` - `RenameTablesRollback()` first attempts an atomic swap	2016-06-14 09:01:06 +02:00
Shlomi Noach	cb1c61ac47	- `--cut-over` no longer mandatory; default to `safe` - Removed `CutOverVoluntaryLock` and associated code - Removed `CutOverUdfWait` - `RenameTablesRollback()` first attempts an atomic swap	2016-06-14 09:00:56 +02:00
Shlomi Noach	8292f5608f	Safe cut-over - Supporting multi-step, safe cut-over phase, where queries are blocked throughout the phase, and worst case scenario is table outage (no data corruption) - Rolls itself back in case of failure (restores original table)	2016-06-14 08:35:07 +02:00
Shlomi Noach	e4ed801df5	noting postponing status	2016-06-13 18:36:29 +02:00
Shlomi Noach	b8c7e046a1	test-on-replica to invoke cut-over swap	2016-06-10 11:15:11 +02:00
Shlomi Noach	087d1dd64d	supporting dynamic reconfiguration of max-load	2016-06-09 11:25:01 +02:00
Shlomi Noach	2cdc72bd1c	fixed nil TCP listener when TCP undefined	2016-06-07 14:24:30 +02:00
Shlomi Noach	a6c21dcdb0	- `--postpone-swap-tables-flag-file` renamed to `--postpone-cut-over-flag-file` - More `README` documentation - Added "throttle" documentation	2016-06-07 14:05:25 +02:00
Shlomi Noach	fc00cb2289	adding interactive user commands	2016-06-07 11:59:17 +02:00
Shlomi Noach	bbd19abc9a	- requiring `--cut-over` argument to be `two-step|voluntary-lock` (will add `udf-wait` once it is ready). The idea is that the user is forced to specify the cut-over type they wish to use, given that each type has some drawbacks. - More data in status hint - `select count(*)` is deferred till after we validate migration is valid. Also, it is skipped on `--noop`	2016-06-06 12:33:05 +02:00
Shlomi Noach	42ae3e37f5	dropping _osc (changelog) table at end of operation; also better status hint at end of operation	2016-06-01 10:40:49 +02:00
Shlomi Noach	2df94f9c51	printing courtesy reminder once per 10 minutes	2016-05-31 21:12:39 +02:00
Shlomi Noach	9519a66825	added courtesy-reminder	2016-05-26 14:25:32 +02:00
Shlomi Noach	583d6d3147	accepting SIGHUP. Reloads configuration and marks as point of interest	2016-05-25 12:27:58 +02:00
Shlomi Noach	e7239091d7	Merge pull request #45 from github/print-status-point-of-interest support for marking point-of-interest in migration	2016-05-24 08:48:44 +02:00
Shlomi Noach	20f000833f	support for marking point-of-interest in migration	2016-05-23 14:58:53 +02:00
Shlomi Noach	896f560dce	after timeout: reconnecting as new replica; skipping queries correctly	2016-05-23 11:12:59 +02:00
Shlomi Noach	5375aa4f69	- Removed use of `master_pos_wait()`. It was unnecessary in the first place and introduced new problems. - Supporting `--allow-nullable-unique-key` - Tool will bail out if chosen key has nullable columns and the above is not provided - Fixed `OriginalBinlogRowImage` comparison (lower/upper case issue) - Introduced reasonable streamer reconnect sleep time	2016-05-20 12:52:14 +02:00
Shlomi Noach	9b54d0208f	- Handling gomysql.replication connection timeouts: reconnecting on last known position - `printStatus()` takes ETA into account - More info around `master_pos_wait()`	2016-05-19 15:11:36 +02:00
Shlomi Noach	ec34a5ef75	master_pos_wait is now OK to return NULL. We only care if it returns with -1	2016-05-18 15:08:47 +02:00
Shlomi Noach	9f56a84b57	Fixing single-row table migration - `BuildUniqueKeyRangeEndPreparedQuery` supports `includeRangeStartValues` argument - `applier` sends `this.migrationContext.GetIteration() == 0` as argument	2016-05-18 14:53:09 +02:00
Shlomi Noach	45371d9374	Merge pull request #36 from github/master-pos-wait-fix some messages are now Info instead of Debug	2016-05-18 12:21:22 +02:00
Shlomi Noach	879b2b425e	- Support for `--postpone-swap-tables-flag-file`: while this file exists, final table swap does not take place, and the ghost table keeps being synchronized - Fixed version printing - `rowCopyCompleteFlag` is a hint that allows us to escape the infinite loop of rowcopy once we are sure we have reached the end	2016-05-17 14:40:37 +02:00
Shlomi Noach	065d9c40ec	some messages are now Info instead of Debug	2016-05-17 11:57:43 +02:00
Shlomi Noach	9d055dbda7	renaming to gh-ost	2016-05-16 11:09:17 +02:00
Shlomi Noach	1e10f1f29e	Solved various race conditions: - Operation would terminate after events lock noticed but before applying all events: race condition where the event would be captured asynchronously. The event is now handled sequentially with the DML events, hence now safe. - Multiple rowcopy operations would still write to `rowCopyComplete` channel. This is still the case, but now we only wait for the first and then just flush (read and discard) any others, to avoid blocking - Events DML listener is only added after table creation: the problem was that with very busy tables, the events func buffer would fill up, and the "tables-created" event would be blocked. - `waitForEventsUpToLock()` unifies the waiting on all variants of complete-migration - With `--test-on-replica`, now stopping replication "nicely", using `master_pos_wait()` - With `--test-on-replica`, not throttling on replication after replication is stopped (duh) - More debug output	2016-05-16 11:03:15 +02:00
Shlomi Noach	134bf385fd	initial, simple solution to out-of-order applying of DML events	2016-05-05 17:14:55 +03:00
Shlomi Noach	6528010742	Adding ETA starting at 2% progress	2016-05-05 09:18:19 +03:00
Shlomi Noach	800c1109b0	fixed statistics query: getting the correct column names by unique key	2016-05-04 09:50:00 +03:00
Shlomi Noach	74d8b06db1	exact-rowcount implies updating number of rows as we make progress	2016-05-04 08:23:34 +03:00
Shlomi Noach	36905d82e3	- supporting `--initially-drop-old-table` - supporting `--initially-drop-ghost-table` - validating existence of `old` and `ghost` before beginning operation	2016-05-03 12:55:17 +03:00
Shlomi Noach	07063a4181	- added `throttle-control-replicas` flag, a list of control replicas - when `--test-on-replica`, the tested replica is implicitly a control replica - added `replication-lag-query`, an alternate query to `SHOW SLAVE STATUS` to get replication lag - throttling takes both the above into consideration	2016-05-01 21:36:36 +03:00
Shlomi Noach	421ab0fc83	woohoo, logic complete - Introduced `SwapTablesTimeoutSeconds`; `RENAME` is limited by this timeout - If `RENAME` fails (due to the above), we throttle and retry - `SwapTablesAtomic()` sets `lock_wait_timeout` and notifies with connection id - `GrabVoluntaryLock()` intentionally grabs (and later releases) voluntary lock. It notifies when it is taken and awaits instructions as for when it could be released. - `IssueBlockingQueryOnVoluntaryLock()` does what it says. It notifies with its connection_id so that it can be easily traced - `stopWritesAndCompleteMigrationOnMasterViaLock()` does the thang. Oh dear this was agonizing and the code is a pain to look at, though under the limitations I do believe it is as clean as I could hope for.	2016-04-22 19:46:34 -07:00
Shlomi Noach	1ed1b0d156	- `quick-and-bumpy-swap-tables` uses quicker swap tables, at the expense of a period where the table does not exist (non atomic renames) - refactored lock-and-swap code, in preparation for atomic swap	2016-04-22 13:41:20 -07:00
Shlomi Noach	54c6d059b5	- `quick-and-bumpy-swap-tables` uses quicker swap tables, at the expense of a period where the table does not exist (non atomic renames) - refactored lock-and-swap code, in preparation for atomic swap	2016-04-22 13:18:56 -07:00
Shlomi Noach	3c85298b77	- Better, fewer NOOP checks around the code - Keeping track of `TotalDMLEventsApplied`	2016-04-19 04:25:32 -07:00

66 Commits