gh-ost

Author	SHA1	Message	Date
Shlomi Noach	1773f338c2	keeping track of delta rows on concurrent count(*); this means we re-apply delta onto new estimate	2016-08-24 12:16:34 +02:00
Shlomi Noach	553f4c8d13	concurrent row-count	2016-08-24 11:39:44 +02:00
Shlomi Noach	56fd82a824	Merge pull request #174 from Wattpad/test-on-replica-manual-replication-control outstanding. Thank you!	2016-08-24 09:12:21 +02:00
Paulo Bittencourt	2e43718ef3	Add --test-on-replica-skip-replica-stop flag	2016-08-19 17:34:08 -04:00
Paulo Bittencourt	a62f9e0754	Add --test-on-replica-manual-replication-control flag. This will wait indefinitely for the replication status to change. This allows us to run test schema changes in RDS without needing custom RDS commands in gh-ost.	2016-08-18 11:53:25 -04:00
Shlomi Noach	00369d7e5d	relaxed config scanner mode - does not fail on MySQL 'prompt' config	2016-08-18 13:58:38 +02:00
Shlomi Noach	ac0b788153	rename trust-rbr to assume-rbr	2016-08-15 11:05:51 +02:00
Shlomi Noach	1995be2b3f	accepting `--trust-rbr` - avoiding need to restart replication - in turn avoiding need for SUPER	2016-08-12 14:26:58 +02:00
Damian Gryski	e02a49449e	all: use time.Since() instead of time.Now().Sub Patch created with: gofmt -w -r 'time.Now().Sub(a) -> time.Since(a)' .	2016-08-02 08:38:56 -04:00
Shlomi Noach	46bbea2a32	ETA counting rows, fixed copy time on count	2016-07-29 10:40:23 +02:00
Shlomi Noach	be8a023350	nice-ratio is now float64	2016-07-28 14:37:17 +02:00
Shlomi Noach	b53ee24a1f	dynamic replication-lag-query	2016-07-26 14:14:25 +02:00
Shlomi Noach	6dbf5c31a2	resolved conflict	2016-07-26 11:57:01 +02:00
Shlomi Noach	4774b67ffd	config file supports environment variables	2016-07-25 15:46:37 +02:00
Shlomi Noach	74804559c8	supporting --initially-drop-socket-file - by default gh-ost will not delete an existing socket file and thus, will fail running if socket file exists. This is the desired behavior. - The flag --initially-drop-socket-file indicates we take responsibility and wish gh-ost to delete this file on startup	2016-07-22 17:34:18 +02:00
Shlomi Noach	ef59a866d8	Removed legacy 'safe cut-over'. Now that we have the atomic cut-over, the former is redundant	2016-07-16 08:12:19 -06:00
Shlomi Noach	8e46b4ceea	max-lag-millis is dynamically controllable	2016-07-13 09:44:00 +02:00
Shlomi Noach	8217536898	supporting --cut-over-lock-timeout-seconds	2016-07-08 10:14:58 +02:00
Shlomi Noach	c116d84acb	added nice-ratio	2016-07-04 14:29:09 +02:00
Shlomi Noach	0191b2897d	an atomic cut-over implementation, as per issue #82	2016-06-27 11:08:06 +02:00
Shlomi Noach	690e046c51	adding --allow-master-master	2016-06-22 10:38:13 +02:00
Shlomi Noach	96e8419a35	Solved cut-over stall; change of table names - Cutover would stall after `lock tables` wait-timeout due to waiting on a channel that would never be written to. This has been identified, reproduced, fixed, confirmed. - Change of table names. Here's the story: - Because we're testing this even while `pt-online-schema-change` is being used in production, the `_tbl_old` naming convention makes for a collision. - "old" table name is now `_tbl_del`, "del" standing for "delete" - ghost table name is now `_tbl_gho` - when issuing `--test-on-replica`, we keep the ghost table around, and we're also briefly renaming original table to "old". Well, this collides with a potentially existing "old" table on master (one that hasn't been dropped yet). `--test-on-replica` uses `_tbl_ght` (ghost-test) - similar problem with `--execute-on-replica`, and in this case the table doesn't stick around; calling it `_tbl_ghr` (ghost-replica) - changelog table is now `_tbl_ghc` (ghost-changelog) - To clarify, I don't want to go down the path of creating "old" tables with 2 or 3 or 4 or 5 or infinite leading underscores. I think this is very confusing and actually not operations friendly. It's OK that the migration will fail saying "hey, you ALREADY have an old table here, why don't you take care of it first", rather than create _yet_another_ `____tbl_old` table. We're always confused on which table it actually is that gets migrated, which is safe to `drop`, etc. - just after rowcopy completing, just before cutover, during cutover: marking as point in time _of interest_ so as to increase logging frequency.	2016-06-21 12:56:01 +02:00
Shlomi Noach	80fcc05eb5	supporting interactive command throttle-control-replicas	2016-06-20 12:09:04 +02:00
Shlomi Noach	62b8a897e3	Retries, better visibility, documentation - Rowcopy time is bounded by copy end-time - Retries are configurable via `--default-retries` (default: `60`) - `migrator` notes the hostname - `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`) - Added lots of code comments - Adding documentation for "triggerless design"	2016-06-19 17:55:37 +02:00
Shlomi Noach	23cb8ea7e9	Throttling & critical load - Added `--throttle-query` param (when returns > 0, throttling applies) - Added `--critical-load`, similar to `--max-load` but implies panic and quit - Recoded -load as `LoadMap` - More info on -load throttle/panic - `printStatus()` now gets printing heuristic. Always shows up on interactive `"status"` - Fixed `change column` (aka rename) handling with quotes - Removed legacy `mysqlbinlog` parser code - Added tests	2016-06-18 21:12:07 +02:00
Shlomi Noach	94f311ec7b	supporting `--panic-flag-file`; when it exists - app panics and exits without cleanup	2016-06-17 11:40:08 +02:00
Shlomi Noach	836d0fe119	Supporting column rename - Parsing `alter` statement to catch `change old_name new_name ...` statements - Auto deducing renamed columns - When suspecting renamed columns, requesting explicit `--approve-renamed-columns` or `--skip-renamed-columns` - updated tests	2016-06-17 08:03:18 +02:00
Shlomi Noach	7d0ec9c9dc	added --migrate-on-replica flag; runs complete migration on replica	2016-06-15 12:18:59 +02:00
Shlomi Noach	97adbf1ff8	- `--cut-over` no longer mandatory; default to `safe` - Removed `CutOverVoluntaryLock` and associated code - Removed `CutOverUdfWait` - `RenameTablesRollback()` first attempts an atomic swap	2016-06-14 09:01:06 +02:00
Shlomi Noach	cb1c61ac47	- `--cut-over` no longer mandatory; default to `safe` - Removed `CutOverVoluntaryLock` and associated code - Removed `CutOverUdfWait` - `RenameTablesRollback()` first attempts an atomic swap	2016-06-14 09:00:56 +02:00
Shlomi Noach	e4ed801df5	noting postponing status	2016-06-13 18:36:29 +02:00
Shlomi Noach	b8c7e046a1	test-on-replica to invoke cut-over swap	2016-06-10 11:15:11 +02:00
Shlomi Noach	087d1dd64d	supporting dynamic reconfiguration of max-load	2016-06-09 11:25:01 +02:00
Shlomi Noach	a6c21dcdb0	- `--postpone-swap-tables-flag-file` renamed to `--postpone-cut-over-flag-file` - More `README` documentation - Added "throttle" documentation	2016-06-07 14:05:25 +02:00
Shlomi Noach	fc00cb2289	adding interactive user commands	2016-06-07 11:59:17 +02:00
Shlomi Noach	bbd19abc9a	- requiring `--cut-over` argument to be `two-step|voluntary-lock` (will add `udf-wait` once it is ready). The idea is that the user is forced to specify the cut-over type they wish to use, given that each type has some drawbacks. - More data in status hint - `select count(*)` is deferred till after we validate migration is valid. Also, it is skipped on `--noop`	2016-06-06 12:33:05 +02:00
Shlomi Noach	20f000833f	support for marking point-of-interest in migration	2016-05-23 14:58:53 +02:00
Shlomi Noach	5375aa4f69	- Removed use of `master_pos_wait()`. It was unnecessary in the first place and introduced new problems. - Supporting `--allow-nullable-unique-key` - Tool will bail out if chosen key has nullable columns and the above is not provided - Fixed `OriginalBinlogRowImage` comparison (lower/upper case issue) - Introduced reasonable streamer reconnect sleep time	2016-05-20 12:52:14 +02:00
Shlomi Noach	df0a7513f5	- user/password provided in CLI override those in config file - user no longer defaults to . - config is now part of Context, and is protected by mutex	2016-05-17 15:35:44 +02:00
Shlomi Noach	879b2b425e	- Support for `--postpone-swap-tables-flag-file`: while this file exists, final table swap does not take place, and the ghost table keeps being synchronized - Fixed version printing - `rowCopyCompleteFlag` is a hint that allows us to escape the infinite loop of rowcopy once we are sure we have reached the end	2016-05-17 14:40:37 +02:00
Shlomi Noach	9d055dbda7	renaming to gh-ost	2016-05-16 11:09:17 +02:00
Shlomi Noach	1e10f1f29e	Solved various race conditions: - Operation would terminate after events lock noticed but before applying all events: race condition where the event would be captured asynchronously. The event is now handled sequentially with the DML events, hence now safe. - Multiple rowcopy operations would still write to `rowCopyComplete` channel. This is still the case, but now we only wait for the first and then just flush (read and discard) any others, to avoid blocking - Events DML listener is only added after table creation: the problem was that with very busy tables, the events func buffer would fill up, and the "tables-created" event would be blocked. - `waitForEventsUpToLock()` unifies the waiting on all variants of complete-migration - With `--test-on-replica`, now stopping replication "nicely", using `master_pos_wait()` - With `--test-on-replica`, not throttling on replication after replication is stopped (duh) - More debug output	2016-05-16 11:03:15 +02:00
Shlomi Noach	36905d82e3	- supporting `--initially-drop-old-table` - supporting `--initially-drop-ghost-table` - validating existence of `old` and `ghost` before beginning operation	2016-05-03 12:55:17 +03:00
Shlomi Noach	627e412b6b	fixed password assignment	2016-05-03 11:56:53 +03:00
Shlomi Noach	86fd2b617a	initial support for config file	2016-05-03 10:28:48 +03:00
Shlomi Noach	07063a4181	- added `throttle-control-replicas` flag, a list of control replicas - when `--test-on-replica`, the tested replica is implicitly a control replica - added `replication-lag-query`, an alternate query to `SHOW SLAVE STATUS` to get replication lag - throttling takes both the above into consideration	2016-05-01 21:36:36 +03:00
Shlomi Noach	421ab0fc83	woohoo, logic complete - Introduced `SwapTablesTimeoutSeconds`; `RENAME` is limited by this timeout - If `RENAME` fails (due to the above), we throttle and retry - `SwapTablesAtomic()` sets `lock_wait_timeout` and notifies with connection id - `GrabVoluntaryLock()` intentionally grabs (and later releases) voluntary lock. It notifies when it is taken and awaits instructions as for when it could be released. - `IssueBlockingQueryOnVoluntaryLock()` does what it says. It notifies with its connection_id so that it can be easily traced - `stopWritesAndCompleteMigrationOnMasterViaLock()` does the thang. Oh dear this was agonizing and the code is a pain to look at, though under the limitations I do believe it is as clean as I could hope for.	2016-04-22 19:46:34 -07:00
Shlomi Noach	1ed1b0d156	- `quick-and-bumpy-swap-tables` uses quicker swap tables, at the expense of a period where the table does not exist (non atomic renames) - refactored lock-and-swap code, in preparation for atomic swap	2016-04-22 13:41:20 -07:00
Shlomi Noach	54c6d059b5	- `quick-and-bumpy-swap-tables` uses quicker swap tables, at the expense of a period where the table does not exist (non atomic renames) - refactored lock-and-swap code, in preparation for atomic swap	2016-04-22 13:18:56 -07:00
Shlomi Noach	3c85298b77	- Better, fewer NOOP checks around the code - Keeping track of `TotalDMLEventsApplied`	2016-04-19 04:25:32 -07:00
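
Commits 553f4c8d13 and 1773f338c2 run the exact `count(*)` concurrently with the migration and re-apply the rows changed in the meantime onto the new estimate. Below is a minimal Go sketch of that bookkeeping. The `rowCounter` type and its methods are hypothetical, not gh-ost's code, and the sketch assumes the count reflects the table as of the moment it started, so only changes applied after that point need re-applying.

```go
package main

import (
	"fmt"
	"sync"
)

// rowCounter is a hypothetical helper that keeps an estimated row count
// and the net delta of inserts/deletes applied while an exact count runs.
type rowCounter struct {
	mu       sync.Mutex
	estimate int64 // current best guess of the table's row count
	delta    int64 // net rows added since the concurrent count started
	counting bool  // true while the exact count is in flight
}

// OnDML records one applied event: +1 for insert, -1 for delete, 0 for update.
func (c *rowCounter) OnDML(change int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.estimate += change
	if c.counting {
		c.delta += change
	}
}

// StartCount marks the moment the exact count begins and resets the delta.
func (c *rowCounter) StartCount() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counting = true
	c.delta = 0
}

// FinishCount replaces the estimate with the exact count plus every change
// applied after the count's snapshot, and returns the new estimate.
func (c *rowCounter) FinishCount(exact int64) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counting = false
	c.estimate = exact + c.delta
	return c.estimate
}

func main() {
	c := &rowCounter{estimate: 1000} // rough initial estimate (assumed)
	c.StartCount()

	done := make(chan int64)
	go func() {
		// Stand-in for a slow SELECT COUNT(*) whose snapshot saw 1200 rows.
		done <- 1200
	}()

	// Events applied while the count is in flight.
	c.OnDML(+1) // insert
	c.OnDML(+1) // insert
	c.OnDML(-1) // delete

	fmt.Println(c.FinishCount(<-done)) // 1201
}
```

A single mutex, rather than separate atomics, keeps the estimate and the delta consistent with each other when an event lands while the count is finishing.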


64 Commits
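
Commit 23cb8ea7e9 recodes the `--max-load` thresholds as a `LoadMap` and adds `--critical-load`, which panics and quits where `--max-load` throttles. The sketch below assumes a comma-separated `name=threshold` format for the flag value. `parseLoadMap` and `exceeded` are hypothetical names, and the observed values are hard-coded rather than read from `SHOW GLOBAL STATUS`.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// LoadMap maps a MySQL status variable to a threshold. The type name echoes
// commit 23cb8ea7e9; this implementation is only an illustration.
type LoadMap map[string]int64

// parseLoadMap parses "name=threshold" pairs separated by commas, e.g.
// "Threads_running=25,Threads_connected=500" (assumed format).
func parseLoadMap(s string) (LoadMap, error) {
	m := LoadMap{}
	if strings.TrimSpace(s) == "" {
		return m, nil
	}
	for _, pair := range strings.Split(s, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid load entry %q: expected name=value", pair)
		}
		v, err := strconv.ParseInt(strings.TrimSpace(kv[1]), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid threshold in %q: %v", pair, err)
		}
		m[strings.TrimSpace(kv[0])] = v
	}
	return m, nil
}

// exceeded returns the first variable (in alphabetical order) whose observed
// value is above its threshold. A max-load caller would throttle on a hit;
// a critical-load caller would panic and quit.
func (m LoadMap) exceeded(status map[string]int64) (string, bool) {
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order over the map
	for _, name := range names {
		if status[name] > m[name] {
			return name, true
		}
	}
	return "", false
}

func main() {
	maxLoad, err := parseLoadMap("Threads_running=25,Threads_connected=500")
	if err != nil {
		panic(err)
	}
	// Observed values would come from the server; hard-coded for the sketch.
	status := map[string]int64{"Threads_running": 40, "Threads_connected": 120}
	if name, over := maxLoad.exceeded(status); over {
		fmt.Println("throttling on", name) // throttling on Threads_running
	}
}
```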