Commit Graph

161 Commits

Author SHA1 Message Date
Shlomi Noach
ad47f7c147 throttling just prior to leaving hibernation, so as to allow re-throttle checks to apply 2017-05-24 10:42:47 +03:00
Shlomi Noach
3955a6d67f hibernate on critical-load 2017-05-24 08:32:13 +03:00
Shlomi Noach
aca288e88d -postpone-cut-over-flag-file implies touching indicated file 2017-05-07 14:58:18 +03:00
Shlomi Noach
eaa3489fe0 Merge branch 'master' into server-report-coordinates 2017-05-01 07:50:13 +03:00
Shlomi Noach
79cee99f57 support for 'coordinates' command 2017-04-28 15:50:51 -07:00
Shlomi Noach
8a0f1413eb dropped columns are not 'shared' and no data copy attempted for such columns 2017-04-23 08:38:35 +03:00
Shlomi Noach
23ce390d69 supporting throttle-http 2017-03-26 13:10:34 +03:00
Shlomi Noach
8db8e127eb supporting --timestamp-old-table 2017-02-21 10:34:24 -07:00
Shlomi Noach
b289041ccb Merge branch 'master' into fix-reappearing-throttled-reasons 2017-02-08 13:11:34 +02:00
Shlomi Noach
ed5dd7f049 Collecting and presenting MySQL versions of applier and inspector 2017-02-07 09:41:33 +02:00
Shlomi Noach
baee4f69f9 fixing phantom throttle-control-replicas lag result 2017-01-29 10:18:39 +02:00
Shlomi Noach
220bf83a5b Merge branch 'master' into batch-apply-dml-events 2017-01-04 08:37:08 +02:00
Shlomi Noach
445c67635d batching DML writes, configurable --dml-batch-size 2017-01-03 14:31:19 +02:00
Shlomi Noach
9177f7553f Merge branch 'master' into heartbeat-control-replicas-unified 2016-12-30 08:07:07 +02:00
Cyprian Nosek
2bfc672b03 Add possibility to set serverId as gh-ost argument
Signed-off-by: Cyprian Nosek <cyprian.nosek@pearson.com>
2016-12-28 11:49:41 +01:00
Shlomi Noach
c66356bd05 --replication-lag-query is deprecated. 2016-12-26 21:38:37 +02:00
Shlomi Noach
ba2a9d9e55 support for --master-user and --master-password 2016-12-13 16:09:34 +01:00
Shlomi Noach
7126b28169 support for --skip-foreign-key-checks 2016-11-21 09:18:40 +01:00
Shlomi Noach
7ab6af8f5f never throttling inside cut-over critical section 2016-11-17 17:22:13 +01:00
Shlomi Noach
034683f482 Merge branch 'master' into close-streamer-connection 2016-11-01 12:22:21 +01:00
Shlomi Noach
7fa5e405d4 avoid writing heartbeat when throttle commanded by user
when throttling on user command there really is no need for injecting heartbeat. The user commanded, therefore gh-ost complies and trusts the reasoning for throttling. What this will allow is complete quiet time. This, in turn, will allow such features as relocating via orchestrator/pseudo-gtid at time of throttling
2016-10-27 14:51:38 +02:00
Shlomi Noach
7b63b4a275 proper cleanup of streamer connection 2016-10-27 13:52:37 +02:00
Shlomi Noach
bf92eec214 validating table structure on applier and migrator
- reading column list on applier
- comparing original table on applier and migrator, expecting exact column list
- or else bailing out
2016-10-20 11:29:30 +02:00
Shlomi Noach
f988309800 no need for Inspector timezone; it is irrelevant 2016-10-13 13:08:23 +02:00
Shlomi Noach
184643157b Merge branch 'master' into tz-a-different-approach 2016-10-12 08:33:19 +02:00
Shlomi Noach
c11c611755 Merge pull request #260 from github/hooks-nil-check
safe access to applier/inspector hostnames for hooks
2016-10-12 08:16:50 +02:00
Shlomi Noach
4d903d0119 Merge pull request #264 from github/discard-foreign-keys
Discard foreign keys
2016-10-12 08:16:27 +02:00
Shlomi Noach
5d92da4a74 Merge branch 'master' into tz-a-different-approach 2016-10-11 17:17:42 +02:00
Shlomi Noach
dbf50afbc7 reading time_zone settings for Inspector and Applier separately.
--time-zone overrides both of them, if given
2016-10-11 16:00:26 +02:00
Shlomi Noach
15f4ddfd8a support for --critical-load-interval-millis
this optional flag gives --critical-load a second chance. When configured to positive value, meeting with critical-load spawns a timer. When this timer expires a second check for critical-load is made; if still met, gh-ost bails out.
By default the interval is zero, in which case gh-ost bails out immediately on meeting critical load.
2016-10-10 13:21:01 +02:00
Shlomi Noach
6750959e1a configurable time-zone, native time parsing 2016-10-10 11:39:57 +02:00
Shlomi Noach
1f65473e69 support for --discard-foreign-keys
This flag makes migration silently and happily discard any existing foreign keys on migrated table. This is useful for intentional dropping of foreign keys, as gh-ost does not otherwise have support for foreign key migration.
At some time in the future gh-ost may support foreign key migration, at which time this flag will be removed
2016-10-07 10:20:50 +02:00
Shlomi Noach
72f63d3042 safe access to applier/inspector hostnames for hooks 2016-10-04 21:18:44 +02:00
Shlomi Noach
e5e0444cc6 supporting --force-named-cut-over
- when given, user _must_ specify table name
  and of course table name must match migrated table
2016-09-12 19:17:36 +02:00
Shlomi Noach
1c6f828091 refactored server command into server.go
- added support for cut-over=<tablename>
- refactored more code into context
2016-09-12 12:38:14 +02:00
Shlomi Noach
88f2af8111 support for --assume-master-host, master-master/tungsten 2016-09-02 13:09:18 +02:00
randall
82110fcfcf Add -override-applier-host for use with -allow-master-master
for configurations where writes are meant to go to one master, but gh-ost can't automatically determine which
2016-09-01 20:29:26 -07:00
Shlomi Noach
b2c71931c6 refactored all throttling code into throttler.so 2016-08-30 12:25:45 +02:00
Shlomi Noach
23357d0643 WIP: decoupling general throttling from throttle logic 2016-08-30 11:32:17 +02:00
Shlomi Noach
75b2542f26 Merge branch 'master' into reduce-minimum-max-lag 2016-08-30 09:47:33 +02:00
Shlomi Noach
2afb86b9e4 support for millisecond throttling
- `--max-lag-millis` is at least `100ms`
- `--heartbeat-interval-millis` introduced; defaults `500ms`, can range `100ms` - `1s`
- Control replicas lag calculated asynchronously to throttle test
  - aggressive when `max-lag-millis < 1000` and when `replication-lag-query` is given
2016-08-30 09:41:59 +02:00
Jonah Berquist
10b222bc7b Reduce minimum maxLagMillisecondsThrottleThreshold to 100ms 2016-08-26 16:44:40 -07:00
Shlomi Noach
c70f405d06 Merge branch 'master' into hooks 2016-08-26 08:39:02 +02:00
Shlomi Noach
fad6743150 added hooks hint message 2016-08-25 13:54:42 +02:00
Shlomi Noach
cb1a7e2805 merged master 2016-08-25 12:32:03 +02:00
Shlomi Noach
1773f338c2 keeping track of delta rows
on concurrent count(*) this means we re-apply delta onto new estimate
2016-08-24 12:16:34 +02:00
Shlomi Noach
553f4c8d13 concurrent row-count 2016-08-24 11:39:44 +02:00
Shlomi Noach
56fd82a824 Merge pull request #174 from Wattpad/test-on-replica-manual-replication-control
outstanding. Thank you!
2016-08-24 09:12:21 +02:00
Shlomi Noach
1c2a77ef95 hook names; added on-stop-replication hook 2016-08-23 11:35:48 +02:00
Shlomi Noach
6acbe7e3ae detecting and executing hooks 2016-08-20 08:24:20 +02:00
Paulo Bittencourt
2e43718ef3 Add --test-on-replica-skip-replica-stop flag 2016-08-19 17:34:08 -04:00
Paulo Bittencourt
a62f9e0754 Add --test-on-replica-manual-replication-control flag
This will wait indefinitely for the replication status to change.
This allows us to run test schema changes in RDS without needing
custom RDS commands in gh-ost.
2016-08-18 11:53:25 -04:00
Shlomi Noach
00369d7e5d relaxed config scanner mode
- does not fail on MySQL 'prompt' config
2016-08-18 13:58:38 +02:00
Shlomi Noach
ac0b788153 rename trust-rbr to assume-rbr 2016-08-15 11:05:51 +02:00
Shlomi Noach
1995be2b3f accepting --trust-rbr
- avoiding need to restart replication
- in turn avoiding need for SUPER
2016-08-12 14:26:58 +02:00
Damian Gryski
e02a49449e all: use time.Since() instead of time.Now().Sub
Patch created with:
    gofmt -w -r 'time.Now().Sub(a) -> time.Since(a)' .
2016-08-02 08:38:56 -04:00
Shlomi Noach
46bbea2a32 ETA counting rows, fixed copy time on count 2016-07-29 10:40:23 +02:00
Shlomi Noach
be8a023350 nice-ratio is now float64 2016-07-28 14:37:17 +02:00
Shlomi Noach
b53ee24a1f dynamic replication-lag-query 2016-07-26 14:14:25 +02:00
Shlomi Noach
6dbf5c31a2 resolved conflict 2016-07-26 11:57:01 +02:00
Shlomi Noach
4774b67ffd config file supports environment variables 2016-07-25 15:46:37 +02:00
Shlomi Noach
74804559c8 supporting --initially-drop-socket-file
- by default gh-ost will not delete an existing socket file
  and thus, will fail running if socket file exists. This is the desired behavior.
- The flag --initially-drop-socket-file indicates we take responsibility and wish gh-ost to delete this file on startup
2016-07-22 17:34:18 +02:00
Shlomi Noach
ef59a866d8 Removed legacy 'safe cut-over'
Now that we have the atomic cut-over, the former is redundant
2016-07-16 08:12:19 -06:00
Shlomi Noach
8e46b4ceea max-lag-millis is dynamicly controllable 2016-07-13 09:44:00 +02:00
Shlomi Noach
8217536898 supporting --cut-over-lock-timeout-seconds 2016-07-08 10:14:58 +02:00
Shlomi Noach
c116d84acb added nice-ratio 2016-07-04 14:29:09 +02:00
Shlomi Noach
0191b2897d an atomic cut-over implementation, as per issue #82 2016-06-27 11:08:06 +02:00
Shlomi Noach
690e046c51 adding --allow-master-master 2016-06-22 10:38:13 +02:00
Shlomi Noach
96e8419a35 Solved cut-over stall; change of table names
- Cutover would stall after `lock tables` wait-timeout due do waiting on a channel that would never be written to. This has been identified, reproduced, fixed, confirmed.
- Change of table names. Heres the story:
  - Because were testing this even while `pt-online-schema-change` is being used in production, the `_tbl_old` naming convention makes for a collision.
  - "old" table name is now `_tbl_del`, "del" standing for "delete"
  - ghost table name is now `_tbl_gho`
  - when issuing `--test-on-replica`, we keep the ghost table around, and were also briefly renaming original table to "old". Well this collides with a potentially existing "old" table on master (one that hasnt been dropped yet).
  `--test-on-replica` uses `_tbl_ght` (ghost-test)
  - similar problem with `--execute-on-replica`, and in this case the table doesnt stick around; calling it `_tbl_ghr` (ghost-replica)
  - changelog table is now `_tbl_ghc` (ghost-changelog)
  - To clarify, I dont want to go down the path of creating "old" tables with 2 or 3 or 4 or 5 or infinite leading underscored. I think this is very confusing and actually not operations friendly. Its OK that the migration will fail saying "hey, you ALREADY have an old table here, why dont you take care of it first", rather than create _yet_another_ `____tbl_old` table. Were always confused on which table it actually is that gets migrated, which is safe to `drop`, etc.
- just after rowcopy completing, just before cutover, during cutover: marking as point in time _of interest_ so as to increase logging frequency.
2016-06-21 12:56:01 +02:00
Shlomi Noach
80fcc05eb5 supporting interactive command throttle-control-replicas 2016-06-20 12:09:04 +02:00
Shlomi Noach
62b8a897e3 Retries, better visibility, documentation
- Rowcopy time is bounded by copy end-time
- Retries are configurable via `--default-retries` (default: `60`)
- `migrator` notes the hostname
- `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`)
- Added lots of code comments
- Adding documentation for "triggerless design"
2016-06-19 17:55:37 +02:00
Shlomi Noach
23cb8ea7e9 Throttling & critical load
- Added `--throttle-query` param (when returns > 0, throttling applies)
- Added `--critical-load`, similar to `--max-load` but implies panic and quit
- Recoded *-load as `LoadMap`
- More info on *-load throttle/panic
- `printStatus()` now gets printing heuristic. Always shows up on interactive `"status"`
- Fixed `change column` (aka rename) handling with quotes
- Removed legacy `mysqlbinlog` parser code
- Added tests
2016-06-18 21:12:07 +02:00
Shlomi Noach
94f311ec7b supporting --panic-flag-file; when it exists - app panics and exits without cleanup 2016-06-17 11:40:08 +02:00
Shlomi Noach
836d0fe119 Supporting column rename
- Parsing `alter` statement to catch `change old_name new_name ...` statements
- Auto deducing renamed columns
- When suspecting renamed columns, requesting explicit `--approve-renamed-columns` or `--skip-renamed-columns`
- updated tests
2016-06-17 08:03:18 +02:00
Shlomi Noach
7d0ec9c9dc added --migrate-on-replica flag; runs complete migration on replica 2016-06-15 12:18:59 +02:00
Shlomi Noach
97adbf1ff8 - --cut-over no longer mandatory; default to safe
- Removed `CutOverVoluntaryLock` and associated code
- Removed `CutOverUdfWait`
- `RenameTablesRollback()` first attempts an atomic swap
2016-06-14 09:01:06 +02:00
Shlomi Noach
cb1c61ac47 - --cut-over no longer mandatory; default to safe
- Removed `CutOverVoluntaryLock` and associated code
- Removed `CutOverUdfWait`
- `RenameTablesRollback()` first attempts an atomic swap
2016-06-14 09:00:56 +02:00
Shlomi Noach
e4ed801df5 noting posponing status 2016-06-13 18:36:29 +02:00
Shlomi Noach
b8c7e046a1 test-on-replica to invoke cut-over swap 2016-06-10 11:15:11 +02:00
Shlomi Noach
087d1dd64d suuporting dynamic reconfiguration of max-load 2016-06-09 11:25:01 +02:00
Shlomi Noach
a6c21dcdb0 - --postpone-swap-tables-flag-file renamed to --postpone-cut-over-flag-file
- More `README` documentation
- Added "throttle" documentation
2016-06-07 14:05:25 +02:00
Shlomi Noach
fc00cb2289 adding interactive user commands 2016-06-07 11:59:17 +02:00
Shlomi Noach
bbd19abc9a - requiring --cut-over argument to be two-step|voluntary-lock (will add udf-wait once it is ready)
The idea is that the user is forced to specify the cut-over type they wish to use, given that each type has some drawbacks.
- More data in status hint
- `select count(*)` is deferred till after we validate migration is valid. Also, it is skipped on `--noop`
2016-06-06 12:33:05 +02:00
Shlomi Noach
20f000833f support for marking point-of-interest in migration 2016-05-23 14:58:53 +02:00
Shlomi Noach
5375aa4f69 - Removed use of master_pos_wait(). It was unneccessary in the first place and introduced new problems.
- Supporting `--allow-nullable-unique-key`
  - Tool will bail out if chosen key has nullable columns and the above is not provided
- Fixed `OriginalBinlogRowImage` comaprison (lower/upper case issue)
- Introduced reasonable streamer reconnect sleep time
2016-05-20 12:52:14 +02:00
Shlomi Noach
df0a7513f5 - user/password provided in CLI override those in config file
- user no longer defaults to .
- config is now part of Context, and is protected by mutex
2016-05-17 15:35:44 +02:00
Shlomi Noach
879b2b425e - Support for --postpone-swap-tables-flag-file: while this file exists, final table swap does not take place, and the ghost table keeps being synchronized
- Fixed version printing
- `rowCopyCompleteFlag` is a hint that allows us to escape the infinite loop of rowcopy once we are sure we have reached the end
2016-05-17 14:40:37 +02:00
Shlomi Noach
9d055dbda7 renaming to gh-ost 2016-05-16 11:09:17 +02:00
Shlomi Noach
1e10f1f29e Solved various race conditions:
- Operation would terminate after events lock noticed but before applying all events: race condition where the event would be captured asynchronously. The event is now handled sequentially with the DML events, hence now safe.
- Multiple rowcopy operations would still write to `rowCopyComplete` channel. This is still the case, but now we only wait for the first and then just flush (read and discard) any others, to avoid blocking
- Events DML listener is only added after table creation: the problem was that with very busy tables, the events func buffer would fill up, and the "tables-created" event would be blocked.
- `waitForEventsUpToLock()` unifies the waiting on all variants of complete-migration
- With `--test-on-replica`, now stopping replication "nicely", using `master_pos_wait()`
- With `--test-on-replica`, not throttling on replication after replication is stopped (duh)
- More debug output
2016-05-16 11:03:15 +02:00
Shlomi Noach
36905d82e3 - supporting --initially-drop-old-table
- supporting `--initially-drop-ghost-table`
- validating existence of `old` and `ghost` before beginning operation
2016-05-03 12:55:17 +03:00
Shlomi Noach
627e412b6b fixed password assignment 2016-05-03 11:56:53 +03:00
Shlomi Noach
86fd2b617a initial support for config file 2016-05-03 10:28:48 +03:00
Shlomi Noach
07063a4181 - added throttle-control-replicas flag, a list of control replicas
- when `--test-on-replica`, the tested replica is implicitly a control replica
- added `replication-lag-query`, an alternate query to `SHOW SLAVE STATUS` to get replication lag
- throttling takes both the above into consideration
2016-05-01 21:36:36 +03:00
Shlomi Noach
421ab0fc83 woohoo, logic complete
- Introduced `SwapTablesTimeoutSeconds`; `RENAME` is limited by this timeout
- If `RENAME` fails (due to the above), we throttle and retry
- `SwapTablesAtomic()` sets `lock_wait_timeout` and notifies with connection id
- `GrabVoluntaryLock()` intentionally grabs (and later releases) voluntary lock. It notifies when it is taken and awaits instructions as for when it could be released.
- `IssueBlockingQueryOnVoluntaryLock()` does what it says. It notifies with its connection_id so that it can be easily traced
- `stopWritesAndCompleteMigrationOnMasterViaLock()` does the thang. Oh dear this was agonizing and the code is a pain to look at, though under the limitations I do believe it is as clean as I could hope for.
2016-04-22 19:46:34 -07:00
Shlomi Noach
1ed1b0d156 - quick-and-bumpy-swap-tables uses quicker swap tables, at the expense of a period where the table does not exist (non atomic renames)
- refactored lock-and-swap code, in preparation for atomic swap
2016-04-22 13:41:20 -07:00
Shlomi Noach
54c6d059b5 - quick-and-bumpy-swap-tables uses quicker swap tables, at the expense of a period where the table does not exist (non atomic renames)
- refactored lock-and-swap code, in preparation for atomic swap
2016-04-22 13:18:56 -07:00
Shlomi Noach
3c85298b77 - Better, fewer NOOP checks around the code
- Keeping track of `TotalDMLEventsApplied`
2016-04-19 04:25:32 -07:00
Shlomi Noach
4efbfd6e0f Merge pull request #19 from github/oops-leftovers
oops, leftover file
2016-04-18 10:59:58 -07:00
Shlomi Noach
9dce88e6c0 oops, leftover file 2016-04-18 10:59:34 -07:00
Shlomi Noach
eeffa701d6 - Added ok-to-drop-table flag
- Added `switch-to-rbr` flag; applying binlog format change if needed
- Using dedicated db instance for locking & renaming on applier (must be used from within same connection)
- Heartbeat now uses `time.RFC3339Nano`
- Swap tables works! Caveat: short table outage
- `--test-on-replica` works!
- retries: using `panicAbort`: from any goroutine, regardless of context, it is possible to terminate the operation
- Reintroduced changelog events listener on streamer. This is the correct implementation.
2016-04-18 10:57:18 -07:00