Shlomi Noach
bf92eec214
validating table structure on applier and migrator
...
- reading column list on applier
- comparing original table on applier and migrator, expecting exact column list
- or else bailing out
2016-10-20 11:29:30 +02:00
Shlomi Noach
c1a6773c02
better handling of --assume-master-host
...
separated logic and not even attempting to crawl topology
2016-10-11 16:42:19 +02:00
Shlomi Noach
ef04fa49f5
assume-master-host now applied ImpliedKey
2016-10-06 12:00:34 +02:00
Shlomi Noach
a7627091a7
Merge branch 'master' into named-cut-over
2016-09-13 05:25:16 -07:00
Shlomi Noach
0a8be1dd22
explicitly breaking on NoPrintStatusRule
2016-09-12 17:39:56 +02:00
Shlomi Noach
1c6f828091
refactored server command into server.go
...
- added support for cut-over=<tablename>
- refactored more code into context
2016-09-12 12:38:14 +02:00
Shlomi Noach
16fc19b157
rowcount progress at 100% when row-copy completes
2016-09-12 10:25:55 +02:00
Shlomi Noach
88f2af8111
support for --assume-master-host, master-master/tungsten
2016-09-02 13:09:18 +02:00
Shlomi Noach
96f108d3b4
Merge pull request #221 from twotwotwo/override-applier-host
...
Add -override-applier-host for use with -allow-master-master
2016-09-02 11:32:04 +02:00
Shlomi Noach
75d225353f
Merge pull request #220 from Wattpad/exit-on-hook-replication-stop-failure
...
Fail operation if onStopReplication hook fails
2016-09-02 09:39:43 +02:00
randall
82110fcfcf
Add -override-applier-host for use with -allow-master-master
...
for configurations where writes are meant to go to one master, but gh-ost can't automatically determine which
2016-09-01 20:29:26 -07:00
Paulo Bittencourt
e3662f2398
Fail operation if onStopReplication hook fails
2016-09-01 15:58:20 -04:00
Shlomi Noach
c562df42cd
status: State and ETA decoupling
2016-09-01 10:51:40 +02:00
Shlomi Noach
904215e286
Merge pull request #204 from github/reduce-minimum-max-lag
...
Reduce minimum maxLagMillisecondsThrottleThreshold to 100ms
2016-08-31 09:29:16 +02:00
Shlomi Noach
aef56c55f7
indicating 100% when rowcopy is complete
2016-08-30 17:02:29 +02:00
Shlomi Noach
b2c71931c6
refactored all throttling code into throttler.go
2016-08-30 12:25:45 +02:00
Shlomi Noach
23357d0643
WIP: decoupling general throttling from throttle logic
2016-08-30 11:32:17 +02:00
Shlomi Noach
75b2542f26
Merge branch 'master' into reduce-minimum-max-lag
2016-08-30 09:47:33 +02:00
Shlomi Noach
2afb86b9e4
support for millisecond throttling
...
- `--max-lag-millis` is at least `100ms`
- `--heartbeat-interval-millis` introduced; defaults `500ms`, can range `100ms` - `1s`
- Control replicas lag calculated asynchronously to throttle test
- aggressive when `max-lag-millis < 1000` and when `replication-lag-query` is given
2016-08-30 09:41:59 +02:00
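The bounds listed in the commit above (`--max-lag-millis` at least `100ms`; `--heartbeat-interval-millis` between `100ms` and `1s`) can be pictured as a simple clamp. This is only a sketch: the function names are hypothetical, and whether gh-ost silently clamps out-of-range values or rejects them is an assumption here, not something the commit message states.

```go
package main

import "fmt"

// Bounds taken from the commit message; the constant names are hypothetical.
const (
	minMaxLagMillis    int64 = 100
	minHeartbeatMillis int64 = 100
	maxHeartbeatMillis int64 = 1000
)

// effectiveMaxLagMillis raises values below the documented minimum of 100ms.
// Assumption: out-of-range input is clamped rather than rejected.
func effectiveMaxLagMillis(requested int64) int64 {
	if requested < minMaxLagMillis {
		return minMaxLagMillis
	}
	return requested
}

// effectiveHeartbeatMillis keeps the interval inside the documented 100ms-1s range.
func effectiveHeartbeatMillis(requested int64) int64 {
	if requested < minHeartbeatMillis {
		return minHeartbeatMillis
	}
	if requested > maxHeartbeatMillis {
		return maxHeartbeatMillis
	}
	return requested
}

func main() {
	fmt.Println(effectiveMaxLagMillis(20))      // below minimum, raised to 100
	fmt.Println(effectiveHeartbeatMillis(500))  // the stated default, unchanged
	fmt.Println(effectiveHeartbeatMillis(5000)) // above range, lowered to 1000
}
```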
Shlomi Noach
6dfa4873c2
removed excessive argument
2016-08-29 10:44:43 +02:00
Shlomi Noach
6e5db089c8
supporting onRowCountComplete hook
2016-08-29 09:58:31 +02:00
Shlomi Noach
c70f405d06
Merge branch 'master' into hooks
2016-08-26 08:39:02 +02:00
Shlomi Noach
2b595b15f2
Merge pull request #196 from github/concurrent-rowcount
...
concurrent row-count
2016-08-26 08:29:01 +02:00
Shlomi Noach
b064174ab4
added elapsedSeconds to status hook
2016-08-25 13:54:21 +02:00
Shlomi Noach
cb1a7e2805
merged master
2016-08-25 12:32:03 +02:00
Shlomi Noach
c7d88499af
Merge branch 'master' into row-copy-complete
2016-08-25 10:15:32 +02:00
Shlomi Noach
1773f338c2
keeping track of delta rows
...
on concurrent count(*) this means we re-apply delta onto new estimate
2016-08-24 12:16:34 +02:00
Shlomi Noach
553f4c8d13
concurrent row-count
2016-08-24 11:39:44 +02:00
Shlomi Noach
56fd82a824
Merge pull request #174 from Wattpad/test-on-replica-manual-replication-control
...
outstanding. Thank you!
2016-08-24 09:12:21 +02:00
Paulo Bittencourt
6b21ade6d0
Check for --test-on-replica-skip-replica-stop in cutOver method
2016-08-23 18:34:10 -04:00
Shlomi Noach
56d09c4105
avoiding writing rows when rowcopy complete
2016-08-23 14:26:47 +02:00
Shlomi Noach
a1e191078a
rename 'about'->'before'
2016-08-23 11:40:32 +02:00
Shlomi Noach
1c2a77ef95
hook names; added on-stop-replication hook
2016-08-23 11:35:48 +02:00
Shlomi Noach
1021a83ac0
Merge pull request #189 from dveeden/feedback_on_wait
...
thank you
2016-08-23 09:50:50 +02:00
Daniël van Eeden
d8cfd49e2c
Message about waiting should be INFO not DEBUG
2016-08-23 09:41:07 +02:00
Shlomi Noach
972728cf40
added onStatus hook
2016-08-22 16:24:41 +02:00
Shlomi Noach
1376f0af23
fixed UPDATE dml on renamed column
2016-08-22 08:49:27 +02:00
Shlomi Noach
6acbe7e3ae
detecting and executing hooks
2016-08-20 08:24:20 +02:00
Shlomi Noach
cdf393a30e
initial support for hooks
2016-08-19 14:52:49 +02:00
Shlomi Noach
d8e30fcd85
fixed sup printing heuristic
2016-08-19 09:41:25 +02:00
Shlomi Noach
9752179723
interactive command: sup
2016-08-19 09:16:17 +02:00
Shlomi Noach
e6a02d81e0
Merge pull request #170 from github/nice-ratio-doc-clarification
...
clarifying meaning of sleep-ratio
2016-08-19 08:27:22 +02:00
Shlomi Noach
7e9f578e12
progress is 100% when 0/0 rows copied
2016-08-18 13:20:09 +02:00
Shlomi Noach
5dbd2e1c85
clarifying meaning of sleep-ratio
2016-08-18 13:13:51 +02:00
Shlomi Noach
8bf07c506f
Merge pull request #147 from github/cleanup-socket-file
...
Cleanup socket file
2016-08-12 11:26:37 +02:00
Shlomi Noach
a46022f727
localized function name
2016-08-11 17:37:50 +02:00
Shlomi Noach
66ff5964ed
relaxed check for log_slave_updates
2016-08-11 14:49:14 +02:00
Shlomi Noach
dd1ef29dac
cleaning up socket file
2016-08-11 09:01:14 +02:00
Damian Gryski
e02a49449e
all: use time.Since() instead of time.Now().Sub
...
Patch created with:
gofmt -w -r 'time.Now().Sub(a) -> time.Since(a)' .
2016-08-02 08:38:56 -04:00
Shlomi Noach
46bbea2a32
ETA counting rows, fixed copy time on count
2016-07-29 10:40:23 +02:00
Shlomi Noach
25ce8b0758
status hint parameters using normalized names
2016-07-29 09:20:00 +02:00
Shlomi Noach
edacb8f959
Merge pull request #116 from github/nice-ratio-float
...
nice-ratio is now float64
2016-07-29 07:16:33 +02:00
Shlomi Noach
be8a023350
nice-ratio is now float64
2016-07-28 14:37:17 +02:00
Shlomi Noach
b99ce969c7
serving socket before counting table rows
2016-07-28 13:01:26 +02:00
Shlomi Noach
b548a6a172
adding human friendly hint re: throttling and binary logs
2016-07-27 10:45:22 +02:00
Shlomi Noach
dbcc0e09c7
status hint shows [set] next to existing flag files
2016-07-27 10:36:24 +02:00
Shlomi Noach
e900dae2e9
More information upon control-replicas lagging
2016-07-27 09:59:46 +02:00
Shlomi Noach
b53ee24a1f
dynamic replication-lag-query
2016-07-26 14:14:25 +02:00
Shlomi Noach
5d23b72955
Merge pull request #107 from github/throttle-control-replicas
...
fix to throttle-control-replicas check
2016-07-26 12:13:22 +02:00
Shlomi Noach
7a70c24503
replica-migration cleanup; updating allEventsUpToLockProcessedInjectedFlag
2016-07-26 12:06:20 +02:00
Shlomi Noach
034ea7646a
fix to throttle-control-replicas check
2016-07-26 11:51:24 +02:00
Shlomi Noach
ef59a866d8
Removed legacy 'safe cut-over'
...
Now that we have the atomic cut-over, the former is redundant
2016-07-16 08:12:19 -06:00
Shlomi Noach
8e46b4ceea
max-lag-millis is dynamically controllable
2016-07-13 09:44:00 +02:00
Shlomi Noach
c116d84acb
added nice-ratio
2016-07-04 14:29:09 +02:00
Shlomi Noach
37e3c94c87
supporting 'unpostpone' command
2016-07-01 10:59:09 +02:00
Shlomi Noach
0191b2897d
an atomic cut-over implementation, as per issue #82
2016-06-27 11:08:06 +02:00
Shlomi Noach
4f299f320e
noop more verbose
2016-06-27 08:49:26 +02:00
Shlomi Noach
e0de69b028
a noop operation dumps SHOW CREATE TABLE
2016-06-22 12:39:13 +02:00
Shlomi Noach
5b20122957
on noop operation, drop ghost table at end
2016-06-22 10:48:17 +02:00
Shlomi Noach
96e8419a35
Solved cut-over stall; change of table names
...
- Cutover would stall after `lock tables` wait-timeout due to waiting on a channel that would never be written to. This has been identified, reproduced, fixed, confirmed.
- Change of table names. Here's the story:
- Because we're testing this even while `pt-online-schema-change` is being used in production, the `_tbl_old` naming convention makes for a collision.
- "old" table name is now `_tbl_del`, "del" standing for "delete"
- ghost table name is now `_tbl_gho`
- when issuing `--test-on-replica`, we keep the ghost table around, and we're also briefly renaming original table to "old". Well, this collides with a potentially existing "old" table on master (one that hasn't been dropped yet).
`--test-on-replica` uses `_tbl_ght` (ghost-test)
- similar problem with `--execute-on-replica`, and in this case the table doesn't stick around; calling it `_tbl_ghr` (ghost-replica)
- changelog table is now `_tbl_ghc` (ghost-changelog)
- To clarify, I don't want to go down the path of creating "old" tables with 2 or 3 or 4 or 5 or infinite leading underscores. I think this is very confusing and actually not operations friendly. It's OK that the migration will fail saying "hey, you ALREADY have an old table here, why don't you take care of it first", rather than create _yet_another_ `____tbl_old` table. We're always confused on which table it actually is that gets migrated, which is safe to `drop`, etc.
- just after rowcopy completing, just before cutover, during cutover: marking as point in time _of interest_ so as to increase logging frequency.
2016-06-21 12:56:01 +02:00
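The naming scheme described in the commit above (`_tbl_gho`, `_tbl_del`, `_tbl_ght`, `_tbl_ghr`, `_tbl_ghc`) can be sketched as a single formatting helper plus a "fail rather than add more underscores" check. All function names and the `existing` lookup below are hypothetical illustrations; only the suffixes and the fail-fast policy come from the commit message.

```go
package main

import "fmt"

// Suffixes as listed in the commit message.
const (
	suffixGhost        = "gho" // ghost table
	suffixOld          = "del" // "old" table, "del" standing for "delete"
	suffixGhostTest    = "ght" // used with --test-on-replica
	suffixGhostReplica = "ghr" // used with --execute-on-replica
	suffixChangelog    = "ghc" // changelog table
)

// auxTableName is a hypothetical helper: it builds `_<table>_<suffix>`.
func auxTableName(original, suffix string) string {
	return fmt.Sprintf("_%s_%s", original, suffix)
}

// checkOldTableAbsent is a hypothetical pre-flight check. Instead of inventing
// yet another name with extra leading underscores, it returns an error when
// the "old" table already exists.
func checkOldTableAbsent(existing map[string]bool, original string) error {
	name := auxTableName(original, suffixOld)
	if existing[name] {
		return fmt.Errorf("table %s already exists; take care of it first", name)
	}
	return nil
}

func main() {
	for _, s := range []string{suffixGhost, suffixOld, suffixGhostTest, suffixGhostReplica, suffixChangelog} {
		fmt.Println(auxTableName("tbl", s))
	}
	existing := map[string]bool{"_tbl_del": true}
	fmt.Println(checkOldTableAbsent(existing, "tbl") != nil) // an old table is in the way
}
```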
Shlomi Noach
cd6b3c5e9e
not throttling during cut-over operation
2016-06-21 09:21:58 +02:00
Shlomi Noach
80fcc05eb5
supporting interactive command throttle-control-replicas
2016-06-20 12:09:04 +02:00
Shlomi Noach
f0b012b238
support for 'panic' interactive command
2016-06-20 06:38:29 +02:00
Shlomi Noach
62b8a897e3
Retries, better visibility, documentation
...
- Rowcopy time is bounded by copy end-time
- Retries are configurable via `--default-retries` (default: `60`)
- `migrator` notes the hostname
- `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`)
- Added lots of code comments
- Adding documentation for "triggerless design"
2016-06-19 17:55:37 +02:00
Shlomi Noach
23cb8ea7e9
Throttling & critical load
...
- Added `--throttle-query` param (when returns > 0, throttling applies)
- Added `--critical-load`, similar to `--max-load` but implies panic and quit
- Recoded *-load as `LoadMap`
- More info on *-load throttle/panic
- `printStatus()` now gets printing heuristic. Always shows up on interactive `"status"`
- Fixed `change column` (aka rename) handling with quotes
- Removed legacy `mysqlbinlog` parser code
- Added tests
2016-06-18 21:12:07 +02:00
Shlomi Noach
d38ff68a15
minor formatting
2016-06-17 11:41:10 +02:00
Shlomi Noach
94f311ec7b
supporting `--panic-flag-file`; when it exists - app panics and exits without cleanup
2016-06-17 11:40:08 +02:00
Shlomi Noach
836d0fe119
Supporting column rename
...
- Parsing `alter` statement to catch `change old_name new_name ...` statements
- Auto deducing renamed columns
- When suspecting renamed columns, requesting explicit `--approve-renamed-columns` or `--skip-renamed-columns`
- updated tests
2016-06-17 08:03:18 +02:00
Shlomi Noach
7d0ec9c9dc
added --migrate-on-replica flag; runs complete migration on replica
2016-06-15 12:18:59 +02:00
Shlomi Noach
85d6883e69
printing migration status on waitForEventsUpToLock()
2016-06-15 10:13:06 +02:00
Shlomi Noach
cb1c61ac47
- `--cut-over` no longer mandatory; default to safe
...
- Removed `CutOverVoluntaryLock` and associated code
- Removed `CutOverUdfWait`
- `RenameTablesRollback()` first attempts an atomic swap
2016-06-14 09:00:56 +02:00
Shlomi Noach
8292f5608f
Safe cut-over
...
- Supporting multi-step, safe cut-over phase, where queries are blocked throughout the phase, and worst case scenario is table outage (no data corruption)
- Self-rollback in case of failure (restores original table)
2016-06-14 08:35:07 +02:00
Shlomi Noach
e4ed801df5
noting postponing status
2016-06-13 18:36:29 +02:00
Shlomi Noach
b8c7e046a1
test-on-replica to invoke cut-over swap
2016-06-10 11:15:11 +02:00
Shlomi Noach
087d1dd64d
supporting dynamic reconfiguration of max-load
2016-06-09 11:25:01 +02:00
Shlomi Noach
a6c21dcdb0
- `--postpone-swap-tables-flag-file` renamed to `--postpone-cut-over-flag-file`
...
- More `README` documentation
- Added "throttle" documentation
2016-06-07 14:05:25 +02:00
Shlomi Noach
fc00cb2289
adding interactive user commands
2016-06-07 11:59:17 +02:00
Shlomi Noach
bbd19abc9a
- requiring `--cut-over` argument to be `two-step|voluntary-lock` (will add `udf-wait` once it is ready)
...
The idea is that the user is forced to specify the cut-over type they wish to use, given that each type has some drawbacks.
- More data in status hint
- `select count(*)` is deferred till after we validate migration is valid. Also, it is skipped on `--noop`
2016-06-06 12:33:05 +02:00
Shlomi Noach
42ae3e37f5
dropping _osc (changelog) table at end of operation; also better status hint at end of operation
2016-06-01 10:40:49 +02:00
Shlomi Noach
2df94f9c51
printing courtesy reminder once per 10 minutes
2016-05-31 21:12:39 +02:00
Shlomi Noach
9519a66825
added courtesy-reminder
2016-05-26 14:25:32 +02:00
Shlomi Noach
583d6d3147
accepting SIGHUP. Reloads configuration and marks as point of interest
2016-05-25 12:27:58 +02:00
Shlomi Noach
20f000833f
support for marking point-of-interest in migration
2016-05-23 14:58:53 +02:00
Shlomi Noach
9b54d0208f
- Handling gomysql.replication connection timeouts: reconnecting on last known position
...
- `printStatus()` takes ETA into account
- More info around `master_pos_wait()`
2016-05-19 15:11:36 +02:00
Shlomi Noach
879b2b425e
- Support for `--postpone-swap-tables-flag-file`: while this file exists, final table swap does not take place, and the ghost table keeps being synchronized
...
- Fixed version printing
- `rowCopyCompleteFlag` is a hint that allows us to escape the infinite loop of rowcopy once we are sure we have reached the end
2016-05-17 14:40:37 +02:00
Shlomi Noach
9d055dbda7
renaming to gh-ost
2016-05-16 11:09:17 +02:00
Shlomi Noach
1e10f1f29e
Solved various race conditions:
...
- Operation would terminate after events lock noticed but before applying all events: race condition where the event would be captured asynchronously. The event is now handled sequentially with the DML events, hence now safe.
- Multiple rowcopy operations would still write to `rowCopyComplete` channel. This is still the case, but now we only wait for the first and then just flush (read and discard) any others, to avoid blocking
- Events DML listener is only added after table creation: the problem was that with very busy tables, the events func buffer would fill up, and the "tables-created" event would be blocked.
- `waitForEventsUpToLock()` unifies the waiting on all variants of complete-migration
- With `--test-on-replica`, now stopping replication "nicely", using `master_pos_wait()`
- With `--test-on-replica`, not throttling on replication after replication is stopped (duh)
- More debug output
2016-05-16 11:03:15 +02:00
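The second fix in the commit above (wait for the first write to `rowCopyComplete`, then read and discard any others) matches a common Go channel pattern: one blocking receive followed by a non-blocking drain. The sketch below illustrates that pattern only; the function name and the buffered channel are assumptions, and the actual gh-ost code may be structured differently.

```go
package main

import "fmt"

// waitFirstAndFlush blocks until the first completion signal arrives, then
// reads and discards any further signals that are already queued, without
// blocking. It returns how many extra signals were discarded.
// Hypothetical helper, not taken from the gh-ost codebase.
func waitFirstAndFlush(rowCopyComplete chan bool) (flushed int) {
	<-rowCopyComplete // wait for the first signal only
	for {
		select {
		case <-rowCopyComplete:
			flushed++ // discard a duplicate signal
		default:
			return flushed // nothing more queued; do not block
		}
	}
}

func main() {
	// Assumption: a buffered channel, so several row-copy operations can
	// each report completion without blocking one another.
	rowCopyComplete := make(chan bool, 4)
	for i := 0; i < 3; i++ {
		rowCopyComplete <- true
	}
	fmt.Println(waitFirstAndFlush(rowCopyComplete)) // 2 duplicates discarded
}
```

Note that this drain only removes signals queued at the time it runs; a sender that writes later still depends on free buffer space or on another reader.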
Shlomi Noach
134bf385fd
initial, simple solution to out-of-order applying of DML events
2016-05-05 17:14:55 +03:00
Shlomi Noach
6528010742
Adding ETA starting at 2% progress
2016-05-05 09:18:19 +03:00
Shlomi Noach
74d8b06db1
exact-rowcount implies updating number of rows as we make progress
2016-05-04 08:23:34 +03:00