gh-ost

Author	SHA1	Message	Date
Shlomi Noach	4d903d0119	Merge pull request #264 from github/discard-foreign-keys Discard foreign keys	2016-10-12 08:16:27 +02:00
Shlomi Noach	5d92da4a74	Merge branch 'master' into tz-a-different-approach	2016-10-11 17:17:42 +02:00
Shlomi Noach	dbf50afbc7	reading time_zone settings for Inspector and Applier separately. --time-zone overrides both of them, if given	2016-10-11 16:00:26 +02:00
Shlomi Noach	15f4ddfd8a	support for --critical-load-interval-millis. This optional flag gives --critical-load a second chance. When configured to a positive value, meeting critical-load spawns a timer. When this timer expires, a second check for critical-load is made; if still met, gh-ost bails out. By default the interval is zero, in which case gh-ost bails out immediately on meeting critical load.	2016-10-10 13:21:01 +02:00
Shlomi Noach	6750959e1a	configurable time-zone, native time parsing	2016-10-10 11:39:57 +02:00
Shlomi Noach	1f65473e69	support for --discard-foreign-keys. This flag makes migration silently and happily discard any existing foreign keys on migrated table. This is useful for intentional dropping of foreign keys, as gh-ost does not otherwise have support for foreign key migration. At some time in the future gh-ost may support foreign key migration, at which time this flag will be removed.	2016-10-07 10:20:50 +02:00
Shlomi Noach	72f63d3042	safe access to applier/inspector hostnames for hooks	2016-10-04 21:18:44 +02:00
Shlomi Noach	e5e0444cc6	supporting --force-named-cut-over - when given, user _must_ specify table name and of course table name must match migrated table	2016-09-12 19:17:36 +02:00
Shlomi Noach	1c6f828091	refactored server command into server.go - added support for cut-over=<tablename> - refactored more code into context	2016-09-12 12:38:14 +02:00
Shlomi Noach	88f2af8111	support for --assume-master-host, master-master/tungsten	2016-09-02 13:09:18 +02:00
randall	82110fcfcf	Add -override-applier-host for use with -allow-master-master for configurations where writes are meant to go to one master, but gh-ost can't automatically determine which	2016-09-01 20:29:26 -07:00
Shlomi Noach	b2c71931c6	refactored all throttling code into throttler.so	2016-08-30 12:25:45 +02:00
Shlomi Noach	23357d0643	WIP: decoupling general throttling from throttle logic	2016-08-30 11:32:17 +02:00
Shlomi Noach	75b2542f26	Merge branch 'master' into reduce-minimum-max-lag	2016-08-30 09:47:33 +02:00
Shlomi Noach	2afb86b9e4	support for millisecond throttling - `--max-lag-millis` is at least `100ms` - `--heartbeat-interval-millis` introduced; defaults `500ms`, can range `100ms` - `1s` - Control replicas lag calculated asynchronously to throttle test - aggressive when `max-lag-millis < 1000` and when `replication-lag-query` is given	2016-08-30 09:41:59 +02:00
Jonah Berquist	10b222bc7b	Reduce minimum maxLagMillisecondsThrottleThreshold to 100ms	2016-08-26 16:44:40 -07:00
Shlomi Noach	c70f405d06	Merge branch 'master' into hooks	2016-08-26 08:39:02 +02:00
Shlomi Noach	fad6743150	added hooks hint message	2016-08-25 13:54:42 +02:00
Shlomi Noach	cb1a7e2805	merged master	2016-08-25 12:32:03 +02:00
Shlomi Noach	1773f338c2	keeping track of delta rows on concurrent count(*); this means we re-apply delta onto new estimate	2016-08-24 12:16:34 +02:00
Shlomi Noach	553f4c8d13	concurrent row-count	2016-08-24 11:39:44 +02:00
Shlomi Noach	56fd82a824	Merge pull request #174 from Wattpad/test-on-replica-manual-replication-control outstanding. Thank you!	2016-08-24 09:12:21 +02:00
Shlomi Noach	1c2a77ef95	hook names; added on-stop-replication hook	2016-08-23 11:35:48 +02:00
Shlomi Noach	6acbe7e3ae	detecting and executing hooks	2016-08-20 08:24:20 +02:00
Paulo Bittencourt	2e43718ef3	Add --test-on-replica-skip-replica-stop flag	2016-08-19 17:34:08 -04:00
Paulo Bittencourt	a62f9e0754	Add --test-on-replica-manual-replication-control flag. This will wait indefinitely for the replication status to change. This allows us to run test schema changes in RDS without needing custom RDS commands in gh-ost.	2016-08-18 11:53:25 -04:00
Shlomi Noach	00369d7e5d	relaxed config scanner mode - does not fail on MySQL 'prompt' config	2016-08-18 13:58:38 +02:00
Shlomi Noach	ac0b788153	rename trust-rbr to assume-rbr	2016-08-15 11:05:51 +02:00
Shlomi Noach	1995be2b3f	accepting `--trust-rbr` - avoiding need to restart replication - in turn avoiding need for SUPER	2016-08-12 14:26:58 +02:00
Damian Gryski	e02a49449e	all: use time.Since() instead of time.Now().Sub Patch created with: gofmt -w -r 'time.Now().Sub(a) -> time.Since(a)' .	2016-08-02 08:38:56 -04:00
Shlomi Noach	46bbea2a32	ETA counting rows, fixed copy time on count	2016-07-29 10:40:23 +02:00
Shlomi Noach	be8a023350	nice-ratio is now float64	2016-07-28 14:37:17 +02:00
Shlomi Noach	b53ee24a1f	dynamic replication-lag-query	2016-07-26 14:14:25 +02:00
Shlomi Noach	6dbf5c31a2	resolved conflict	2016-07-26 11:57:01 +02:00
Shlomi Noach	4774b67ffd	config file supports environment variables	2016-07-25 15:46:37 +02:00
Shlomi Noach	74804559c8	supporting --initially-drop-socket-file - by default gh-ost will not delete an existing socket file and thus, will fail running if socket file exists. This is the desired behavior. - The flag --initially-drop-socket-file indicates we take responsibility and wish gh-ost to delete this file on startup	2016-07-22 17:34:18 +02:00
Shlomi Noach	ef59a866d8	Removed legacy 'safe cut-over'. Now that we have the atomic cut-over, the former is redundant	2016-07-16 08:12:19 -06:00
Shlomi Noach	8e46b4ceea	max-lag-millis is dynamically controllable	2016-07-13 09:44:00 +02:00
Shlomi Noach	8217536898	supporting --cut-over-lock-timeout-seconds	2016-07-08 10:14:58 +02:00
Shlomi Noach	c116d84acb	added nice-ratio	2016-07-04 14:29:09 +02:00
Shlomi Noach	0191b2897d	an atomic cut-over implementation, as per issue #82	2016-06-27 11:08:06 +02:00
Shlomi Noach	690e046c51	adding --allow-master-master	2016-06-22 10:38:13 +02:00
Shlomi Noach	96e8419a35	Solved cut-over stall; change of table names - Cutover would stall after `lock tables` wait-timeout due to waiting on a channel that would never be written to. This has been identified, reproduced, fixed, confirmed. - Change of table names. Here's the story: - Because we're testing this even while `pt-online-schema-change` is being used in production, the `_tbl_old` naming convention makes for a collision. - "old" table name is now `_tbl_del`, "del" standing for "delete" - ghost table name is now `_tbl_gho` - when issuing `--test-on-replica`, we keep the ghost table around, and were also briefly renaming original table to "old". Well this collides with a potentially existing "old" table on master (one that hasn't been dropped yet). `--test-on-replica` uses `_tbl_ght` (ghost-test) - similar problem with `--execute-on-replica`, and in this case the table doesn't stick around; calling it `_tbl_ghr` (ghost-replica) - changelog table is now `_tbl_ghc` (ghost-changelog) - To clarify, I don't want to go down the path of creating "old" tables with 2 or 3 or 4 or 5 or infinite leading underscores. I think this is very confusing and actually not operations friendly. It's OK that the migration will fail saying "hey, you ALREADY have an old table here, why don't you take care of it first", rather than create _yet_another_ `____tbl_old` table. We're always confused on which table it actually is that gets migrated, which is safe to `drop`, etc. - just after rowcopy completing, just before cutover, during cutover: marking as point in time _of interest_ so as to increase logging frequency.	2016-06-21 12:56:01 +02:00
Shlomi Noach	80fcc05eb5	supporting interactive command throttle-control-replicas	2016-06-20 12:09:04 +02:00
Shlomi Noach	62b8a897e3	Retries, better visibility, documentation - Rowcopy time is bounded by copy end-time - Retries are configurable via `--default-retries` (default: `60`) - `migrator` notes the hostname - `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`) - Added lots of code comments - Adding documentation for "triggerless design"	2016-06-19 17:55:37 +02:00
Shlomi Noach	23cb8ea7e9	Throttling & critical load - Added `--throttle-query` param (when returns > 0, throttling applies) - Added `--critical-load`, similar to `--max-load` but implies panic and quit - Recoded -load as `LoadMap` - More info on -load throttle/panic - `printStatus()` now gets printing heuristic. Always shows up on interactive `"status"` - Fixed `change column` (aka rename) handling with quotes - Removed legacy `mysqlbinlog` parser code - Added tests	2016-06-18 21:12:07 +02:00
Shlomi Noach	94f311ec7b	supporting `--panic-flag-file`; when it exists - app panics and exits without cleanup	2016-06-17 11:40:08 +02:00
Shlomi Noach	836d0fe119	Supporting column rename - Parsing `alter` statement to catch `change old_name new_name ...` statements - Auto deducing renamed columns - When suspecting renamed columns, requesting explicit `--approve-renamed-columns` or `--skip-renamed-columns` - updated tests	2016-06-17 08:03:18 +02:00
Shlomi Noach	7d0ec9c9dc	added --migrate-on-replica flag; runs complete migration on replica	2016-06-15 12:18:59 +02:00
Shlomi Noach	97adbf1ff8	- `--cut-over` no longer mandatory; default to `safe` - Removed `CutOverVoluntaryLock` and associated code - Removed `CutOverUdfWait` - `RenameTablesRollback()` first attempts an atomic swap	2016-06-14 09:01:06 +02:00
Shlomi Noach	cb1c61ac47	- `--cut-over` no longer mandatory; default to `safe` - Removed `CutOverVoluntaryLock` and associated code - Removed `CutOverUdfWait` - `RenameTablesRollback()` first attempts an atomic swap	2016-06-14 09:00:56 +02:00
Shlomi Noach	e4ed801df5	noting postponing status	2016-06-13 18:36:29 +02:00
Shlomi Noach	b8c7e046a1	test-on-replica to invoke cut-over swap	2016-06-10 11:15:11 +02:00
Shlomi Noach	087d1dd64d	supporting dynamic reconfiguration of max-load	2016-06-09 11:25:01 +02:00
Shlomi Noach	a6c21dcdb0	- `--postpone-swap-tables-flag-file` renamed to `--postpone-cut-over-flag-file` - More `README` documentation - Added "throttle" documentation	2016-06-07 14:05:25 +02:00
Shlomi Noach	fc00cb2289	adding interactive user commands	2016-06-07 11:59:17 +02:00
Shlomi Noach	bbd19abc9a	- requiring `--cut-over` argument to be `two-step|voluntary-lock` (will add `udf-wait` once it is ready). The idea is that the user is forced to specify the cut-over type they wish to use, given that each type has some drawbacks. - More data in status hint - `select count(*)` is deferred till after we validate migration is valid. Also, it is skipped on `--noop`	2016-06-06 12:33:05 +02:00
Shlomi Noach	20f000833f	support for marking point-of-interest in migration	2016-05-23 14:58:53 +02:00
Shlomi Noach	5375aa4f69	- Removed use of `master_pos_wait()`. It was unnecessary in the first place and introduced new problems. - Supporting `--allow-nullable-unique-key` - Tool will bail out if chosen key has nullable columns and the above is not provided - Fixed `OriginalBinlogRowImage` comparison (lower/upper case issue) - Introduced reasonable streamer reconnect sleep time	2016-05-20 12:52:14 +02:00
Shlomi Noach	df0a7513f5	- user/password provided in CLI override those in config file - user no longer defaults to . - config is now part of Context, and is protected by mutex	2016-05-17 15:35:44 +02:00
Shlomi Noach	879b2b425e	- Support for `--postpone-swap-tables-flag-file`: while this file exists, final table swap does not take place, and the ghost table keeps being synchronized - Fixed version printing - `rowCopyCompleteFlag` is a hint that allows us to escape the infinite loop of rowcopy once we are sure we have reached the end	2016-05-17 14:40:37 +02:00
Shlomi Noach	9d055dbda7	renaming to gh-ost	2016-05-16 11:09:17 +02:00
Shlomi Noach	1e10f1f29e	Solved various race conditions: - Operation would terminate after events lock noticed but before applying all events: race condition where the event would be captured asynchronously. The event is now handled sequentially with the DML events, hence now safe. - Multiple rowcopy operations would still write to `rowCopyComplete` channel. This is still the case, but now we only wait for the first and then just flush (read and discard) any others, to avoid blocking - Events DML listener is only added after table creation: the problem was that with very busy tables, the events func buffer would fill up, and the "tables-created" event would be blocked. - `waitForEventsUpToLock()` unifies the waiting on all variants of complete-migration - With `--test-on-replica`, now stopping replication "nicely", using `master_pos_wait()` - With `--test-on-replica`, not throttling on replication after replication is stopped (duh) - More debug output	2016-05-16 11:03:15 +02:00
Shlomi Noach	36905d82e3	- supporting `--initially-drop-old-table` - supporting `--initially-drop-ghost-table` - validating existence of `old` and `ghost` before beginning operation	2016-05-03 12:55:17 +03:00
Shlomi Noach	627e412b6b	fixed password assignment	2016-05-03 11:56:53 +03:00
Shlomi Noach	86fd2b617a	initial support for config file	2016-05-03 10:28:48 +03:00
Shlomi Noach	07063a4181	- added `throttle-control-replicas` flag, a list of control replicas - when `--test-on-replica`, the tested replica is implicitly a control replica - added `replication-lag-query`, an alternate query to `SHOW SLAVE STATUS` to get replication lag - throttling takes both the above into consideration	2016-05-01 21:36:36 +03:00
Shlomi Noach	421ab0fc83	woohoo, logic complete - Introduced `SwapTablesTimeoutSeconds`; `RENAME` is limited by this timeout - If `RENAME` fails (due to the above), we throttle and retry - `SwapTablesAtomic()` sets `lock_wait_timeout` and notifies with connection id - `GrabVoluntaryLock()` intentionally grabs (and later releases) voluntary lock. It notifies when it is taken and awaits instructions as for when it could be released. - `IssueBlockingQueryOnVoluntaryLock()` does what it says. It notifies with its connection_id so that it can be easily traced - `stopWritesAndCompleteMigrationOnMasterViaLock()` does the thang. Oh dear this was agonizing and the code is a pain to look at, though under the limitations I do believe it is as clean as I could hope for.	2016-04-22 19:46:34 -07:00
Shlomi Noach	1ed1b0d156	- `quick-and-bumpy-swap-tables` uses quicker swap tables, at the expense of a period where the table does not exist (non atomic renames) - refactored lock-and-swap code, in preparation for atomic swap	2016-04-22 13:41:20 -07:00
Shlomi Noach	54c6d059b5	- `quick-and-bumpy-swap-tables` uses quicker swap tables, at the expense of a period where the table does not exist (non atomic renames) - refactored lock-and-swap code, in preparation for atomic swap	2016-04-22 13:18:56 -07:00
Shlomi Noach	3c85298b77	- Better, fewer NOOP checks around the code - Keeping track of `TotalDMLEventsApplied`	2016-04-19 04:25:32 -07:00
Shlomi Noach	4efbfd6e0f	Merge pull request #19 from github/oops-leftovers oops, leftover file	2016-04-18 10:59:58 -07:00
Shlomi Noach	9dce88e6c0	oops, leftover file	2016-04-18 10:59:34 -07:00
Shlomi Noach	eeffa701d6	- Added `ok-to-drop-table` flag - Added `switch-to-rbr` flag; applying binlog format change if needed - Using dedicated db instance for locking & renaming on applier (must be used from within same connection) - Heartbeat now uses `time.RFC3339Nano` - Swap tables works! Caveat: short table outage - `--test-on-replica` works! - retries: using `panicAbort`: from any goroutine, regardless of context, it is possible to terminate the operation - Reintroduced changelog events listener on streamer. This is the correct implementation.	2016-04-18 10:57:18 -07:00
Shlomi Noach	a4ee80df13	- Building and applying queries from binlog event data! - `INSERT`, `DELETE`, `UPDATE` statements - support for `--noop` - initial support for `--test-on-replica`. Verifying against `--allow-on-master` - Changelog events no longer read from binlog stream, because reading it may be throttled, and we have to be able to keep reading the heartbeat and state events. They are now being read directly from table, mapping already-seen-events to avoid confusion. Changelog listener polls table in 2*frequency of heartbeat injection	2016-04-14 13:37:56 +02:00
Shlomi Noach	04525887f3	- Throttling-check is now an async routine running once per second - Throttling variables protected by mutex - Added `--throttle-additional-flag-file`: `operation pauses when this file exists; hint: keep default, use for throttling multiple gh-osc operations` - ColumnList is now a `struct` which contains ordinal mapping - More implicit write changelog + audit changelog - builder now builds `DELETE` and `INSERT` queries from data it will eventually get from DML event - Sanity check for binlog_row_image - Restarting replication to be sure binlog settings apply - Prepare for accepting `SIGHUP` (reloading configuration)	2016-04-11 17:27:16 +02:00
Shlomi Noach	a1a34b8150	ongoing development: - accepts --max-load - accepts multiple conditions in --max-load - throttle includes reason - chunk-size sanity check - change log state writes both in appending (history) mode and in replacing (current) mode - more atomic checks - inspecting ghost table columns, unique key - comparing unique keys between tables; sanity - intersecting columns between tables - prettify status - refactored throttle() and retries()	2016-04-08 14:35:06 +02:00
Shlomi Noach	75f68c0752	- row copy and row events are now handled by a single routine which prioritizes events over rowcopy - Supporting `--throttle-file-flag` - Printing status - Supporting transactional table syntax - code cleanup; refactoring - proper use of atomic where required - iterations are in changelog (erm... maybe too much) - `LOCK TABLES`, `UNLOCK TABLES` working	2016-04-08 10:34:44 +02:00
Shlomi Noach	0e7b23e6fe	- Creating and populating Changelog table - Using heartbeat - Throttling works based on heartbeat - Refactored binlog_reader stuff. Now streaming events (into golang channel, which makes for nice buffering and throttling) - Binlog table listeners work - More Migrator logic; existing logic for waiting on `state` events (e.g. `TablesCreatedState`)	2016-04-07 15:57:12 +02:00
Shlomi Noach	3583ab5dc5	beginning support for ranges and iteration. Still WIP	2016-04-05 09:14:22 +02:00
Shlomi Noach	ea0906f4e5	reading table (range) min/max values, right now according to hardcoded unique key	2016-04-04 18:19:46 +02:00
Shlomi Noach	cf87d16044	detecting master (includes sanity checks). Introducing Applier. Creating and altering ghost table	2016-04-04 15:29:02 +02:00
Shlomi Noach	c75cd998fb	a bunch of 'inspector' initial tests on the replica	2016-04-04 12:27:51 +02:00
Shlomi Noach	f5b276415a	initial work on context	2016-04-01 16:05:44 +02:00
Shlomi Noach	39ebc75c43	initial work on sql query building	2016-04-01 13:36:56 +02:00

185 Commits