Commit Graph

192 Commits

Author SHA1 Message Date
Shlomi Noach
f11f6f978f mitigating cut-over/write race condition 2017-02-24 14:48:10 -07:00
Shlomi Noach
289ce46a2b Merge branch 'master' into fix-infinite-cutover-loop 2017-02-16 14:16:32 +02:00
Shlomi Noach
6ecf7cc0ee Auto-merged master into interactive-command-question on deployment 2017-02-08 13:03:45 +02:00
Shlomi Noach
10edf3c063 Migration only starting after first replication lag metric collected 2017-02-07 12:13:19 +02:00
Shlomi Noach
6fa32f6d54 cut-over failure on test-on-replica starts replication again 2017-02-07 09:31:52 +02:00
Shlomi Noach
be1ab175c7 status presents with '# throttle-control-replicas count:' 2017-01-29 09:56:25 +02:00
Shlomi Noach
f7d2beb4d2 handling a non-DML event at the end of a dml-event sequence 2017-01-05 08:13:51 +02:00
Shlomi Noach
baaa255182 bailing out from onApplyEventStruct() 2017-01-04 12:42:21 +02:00
Shlomi Noach
645af21d03 extracted onApplyEventStruct() 2017-01-04 12:39:57 +02:00
Shlomi Noach
220bf83a5b Merge branch 'master' into batch-apply-dml-events 2017-01-04 08:37:08 +02:00
Shlomi Noach
445c67635d batching DML writes, configurable --dml-batch-size 2017-01-03 14:31:19 +02:00
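The batching idea in 445c67635d can be illustrated with a small Go sketch (Go being gh-ost's own language). It only shows how a stream of DML events might be split into batches of at most `--dml-batch-size` events while preserving binlog order. `dmlEvent` and `splitIntoBatches` are hypothetical names. How gh-ost actually applies each batch to the ghost table is not shown here.

```go
package main

import "fmt"

// dmlEvent is a hypothetical stand-in for a binlog DML event (INSERT/UPDATE/DELETE).
type dmlEvent struct {
	Query string
}

// splitIntoBatches groups events into consecutive batches of at most batchSize,
// preserving their original (binlog) order. A batchSize below 1 is treated as 1.
func splitIntoBatches(events []dmlEvent, batchSize int) [][]dmlEvent {
	if batchSize < 1 {
		batchSize = 1
	}
	var batches [][]dmlEvent
	for start := 0; start < len(events); start += batchSize {
		end := start + batchSize
		if end > len(events) {
			end = len(events)
		}
		batches = append(batches, events[start:end])
	}
	return batches
}

func main() {
	events := []dmlEvent{{"q1"}, {"q2"}, {"q3"}, {"q4"}, {"q5"}}
	for i, batch := range splitIntoBatches(events, 2) {
		fmt.Println("batch", i, "size", len(batch))
	}
}
```

Order preservation is the important property. Replaying DML out of binlog order could leave the ghost table inconsistent with the original.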
Shlomi Noach
c66356bd05 --replication-lag-query is deprecated. 2016-12-26 21:38:37 +02:00
Shlomi Noach
ba2a9d9e55 support for --master-user and --master-password 2016-12-13 16:09:34 +01:00
Shlomi Noach
5119ea4d31 added tests to verify no false positives rename-column found 2016-11-29 11:08:35 +01:00
Shlomi Noach
7ab6af8f5f never throttling inside cut-over critical section 2016-11-17 17:22:13 +01:00
Shlomi Noach
b00cae11fa retry cut-over 2016-11-17 17:10:17 +01:00
Shlomi Noach
8d987b5aaf extracted parsing of ChangelogState 2016-11-17 15:56:59 +01:00
Shlomi Noach
ef874b8551 AllEventsUpToLockProcessed uses unique signature 2016-11-17 15:50:54 +01:00
Shlomi Noach
ee447ad560 waitForEventsUpToLock timeout
more info on AllEventsUpToLockProcessed, before and after injecting/intercepting
2016-11-17 15:20:44 +01:00
Shlomi Noach
88ffb75b8c reading and reporting replication lag before waiting on initial replication event 2016-11-02 12:48:35 +01:00
Shlomi Noach
034683f482 Merge branch 'master' into close-streamer-connection 2016-11-01 12:22:21 +01:00
Shlomi Noach
7fa5e405d4 avoid writing heartbeat when throttle commanded by user
when throttling on user command there is no need to inject heartbeats. The user commanded the throttle, so gh-ost complies and trusts the reasoning behind it. This allows complete quiet time, which in turn allows features such as relocating via orchestrator/pseudo-gtid at time of throttling
2016-10-27 14:51:38 +02:00
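The reasoning in 7fa5e405d4 amounts to a single decision per heartbeat tick. The sketch below assumes a hypothetical `throttleReason` enum and `shouldInjectHeartbeat` helper, neither taken from gh-ost. Lag-based throttling keeps writing heartbeats, since in this sketch they are what lag is measured from. A user-commanded throttle writes nothing.

```go
package main

import "fmt"

// throttleReason is a hypothetical enum describing why the migration is throttled.
type throttleReason int

const (
	notThrottled    throttleReason = iota
	throttledByLag                 // e.g. replication lag above --max-lag-millis
	throttledByUser                // user issued an interactive "throttle" command
)

// shouldInjectHeartbeat reports whether a heartbeat should be written on this tick.
// Lag-based throttling still needs heartbeats to keep measuring lag (and so to
// notice when it may resume). A user-commanded throttle is trusted as-is, so
// nothing is written and the server stays completely quiet.
func shouldInjectHeartbeat(reason throttleReason) bool {
	return reason != throttledByUser
}

func main() {
	for _, r := range []throttleReason{notThrottled, throttledByLag, throttledByUser} {
		fmt.Println(r, shouldInjectHeartbeat(r))
	}
}
```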
Shlomi Noach
7b63b4a275 proper cleanup of streamer connection 2016-10-27 13:52:37 +02:00
Shlomi Noach
bf92eec214 validating table structure on applier and migrator
- reading column list on applier
- comparing original table on applier and migrator, expecting exact column list
- or else bailing out
2016-10-20 11:29:30 +02:00
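The validation described in bf92eec214 expects an exact column-list match, meaning the same names in the same order. The following is a minimal sketch of that comparison. `validateColumnLists` is a hypothetical helper, and the two column lists are assumed to have already been read from the original table over the applier and migrator connections.

```go
package main

import "fmt"

// validateColumnLists returns an error unless both views of the original table
// list exactly the same columns in exactly the same order.
func validateColumnLists(applierColumns, migratorColumns []string) error {
	if len(applierColumns) != len(migratorColumns) {
		return fmt.Errorf("column count mismatch: applier has %d, migrator has %d",
			len(applierColumns), len(migratorColumns))
	}
	for i := range applierColumns {
		if applierColumns[i] != migratorColumns[i] {
			return fmt.Errorf("column mismatch at position %d: %q vs %q",
				i, applierColumns[i], migratorColumns[i])
		}
	}
	return nil
}

func main() {
	err := validateColumnLists([]string{"id", "name"}, []string{"id", "name", "age"})
	if err != nil {
		// Per the commit message: on any difference, bail out.
		fmt.Println("bailing out:", err)
	}
}
```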
Shlomi Noach
c1a6773c02 better handling of --assume-master-host
separated logic and not even attempting to crawl topology
2016-10-11 16:42:19 +02:00
Shlomi Noach
ef04fa49f5 assume-master-host now applied ImpliedKey 2016-10-06 12:00:34 +02:00
Shlomi Noach
a7627091a7 Merge branch 'master' into named-cut-over 2016-09-13 05:25:16 -07:00
Shlomi Noach
0a8be1dd22 explicitly breaking on NoPrintStatusRule 2016-09-12 17:39:56 +02:00
Shlomi Noach
1c6f828091 refactored server command into server.go
- added support for cut-over=<tablename>
- refactored more code into context
2016-09-12 12:38:14 +02:00
Shlomi Noach
16fc19b157 rowcount progress at 100% when row-copy completes 2016-09-12 10:25:55 +02:00
Shlomi Noach
88f2af8111 support for --assume-master-host, master-master/tungsten 2016-09-02 13:09:18 +02:00
Shlomi Noach
96f108d3b4 Merge pull request #221 from twotwotwo/override-applier-host
Add -override-applier-host for use with -allow-master-master
2016-09-02 11:32:04 +02:00
Shlomi Noach
75d225353f Merge pull request #220 from Wattpad/exit-on-hook-replication-stop-failure
Fail operation if onStopReplication hook fails
2016-09-02 09:39:43 +02:00
randall
82110fcfcf Add -override-applier-host for use with -allow-master-master
for configurations where writes are meant to go to one master, but gh-ost can't automatically determine which
2016-09-01 20:29:26 -07:00
Paulo Bittencourt
e3662f2398 Fail operation if onStopReplication hook fails 2016-09-01 15:58:20 -04:00
Shlomi Noach
c562df42cd status: State and ETA decoupling 2016-09-01 10:51:40 +02:00
Shlomi Noach
904215e286 Merge pull request #204 from github/reduce-minimum-max-lag
Reduce minimum maxLagMillisecondsThrottleThreshold to 100ms
2016-08-31 09:29:16 +02:00
Shlomi Noach
aef56c55f7 indicating 100% when rowcopy is complete 2016-08-30 17:02:29 +02:00
Shlomi Noach
b2c71931c6 refactored all throttling code into throttler.go 2016-08-30 12:25:45 +02:00
Shlomi Noach
23357d0643 WIP: decoupling general throttling from throttle logic 2016-08-30 11:32:17 +02:00
Shlomi Noach
75b2542f26 Merge branch 'master' into reduce-minimum-max-lag 2016-08-30 09:47:33 +02:00
Shlomi Noach
2afb86b9e4 support for millisecond throttling
- `--max-lag-millis` is at least `100ms`
- `--heartbeat-interval-millis` introduced; defaults to `500ms`, can range from `100ms` to `1s`
- Control-replica lag calculated asynchronously to the throttle test
  - aggressive when `max-lag-millis < 1000` and when `replication-lag-query` is given
2016-08-30 09:41:59 +02:00
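The bounds in 2afb86b9e4 can be expressed as two small normalization helpers. This sketch assumes that out-of-range values are clamped rather than rejected. That choice, and the helper names, are illustrative and not taken from gh-ost's source.

```go
package main

import "fmt"

// Bounds from the commit message; clamping out-of-range values is an assumption.
const (
	minMaxLagMillis            int64 = 100
	defaultHeartbeatMillis     int64 = 500
	minHeartbeatIntervalMillis int64 = 100
	maxHeartbeatIntervalMillis int64 = 1000
)

// normalizeMaxLagMillis raises --max-lag-millis to its 100ms floor.
func normalizeMaxLagMillis(v int64) int64 {
	if v < minMaxLagMillis {
		return minMaxLagMillis
	}
	return v
}

// normalizeHeartbeatIntervalMillis keeps --heartbeat-interval-millis within [100ms, 1s].
func normalizeHeartbeatIntervalMillis(v int64) int64 {
	if v < minHeartbeatIntervalMillis {
		return minHeartbeatIntervalMillis
	}
	if v > maxHeartbeatIntervalMillis {
		return maxHeartbeatIntervalMillis
	}
	return v
}

func main() {
	fmt.Println(normalizeMaxLagMillis(50), normalizeMaxLagMillis(1500))
	fmt.Println(normalizeHeartbeatIntervalMillis(defaultHeartbeatMillis),
		normalizeHeartbeatIntervalMillis(5000))
}
```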
Shlomi Noach
6dfa4873c2 removed excessive argument 2016-08-29 10:44:43 +02:00
Shlomi Noach
6e5db089c8 supporting onRowCountComplete hook 2016-08-29 09:58:31 +02:00
Shlomi Noach
c70f405d06 Merge branch 'master' into hooks 2016-08-26 08:39:02 +02:00
Shlomi Noach
2b595b15f2 Merge pull request #196 from github/concurrent-rowcount
concurrent row-count
2016-08-26 08:29:01 +02:00
Shlomi Noach
b064174ab4 added elapsedSeconds to status hook 2016-08-25 13:54:21 +02:00
Shlomi Noach
cb1a7e2805 merged master 2016-08-25 12:32:03 +02:00
Shlomi Noach
c7d88499af Merge branch 'master' into row-copy-complete 2016-08-25 10:15:32 +02:00
Shlomi Noach
1773f338c2 keeping track of delta rows
on concurrent count(*) this means we re-apply delta onto new estimate
2016-08-24 12:16:34 +02:00
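One way to read 1773f338c2: keep a running delta of rows added or removed by applied DML events, separate from the base estimate. When the concurrent `count(*)` finishes, only the base is replaced, and the delta is re-applied on top of it. The `rowEstimate` type below is a hypothetical sketch of that idea. It is a progress estimate only. Rows changed before the count started may be counted twice.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// rowEstimate holds a base row estimate plus a net delta from applied DML events.
// Fields are accessed atomically because the count and the binlog applier run concurrently.
type rowEstimate struct {
	base  int64 // initial rough estimate, replaced by the exact count(*) when it completes
	delta int64 // +1 per applied INSERT, -1 per applied DELETE
}

func (r *rowEstimate) applyDelta(n int64) { atomic.AddInt64(&r.delta, n) }

// onCountComplete replaces only the base; the accumulated delta is kept and
// therefore re-applied onto the new estimate.
func (r *rowEstimate) onCountComplete(count int64) { atomic.StoreInt64(&r.base, count) }

func (r *rowEstimate) total() int64 {
	return atomic.LoadInt64(&r.base) + atomic.LoadInt64(&r.delta)
}

func main() {
	e := &rowEstimate{base: 1000}
	e.applyDelta(+1)
	e.applyDelta(+1)
	e.applyDelta(+1)
	e.applyDelta(-1)
	fmt.Println("before count:", e.total())
	e.onCountComplete(1200)
	fmt.Println("after count:", e.total())
}
```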
Shlomi Noach
553f4c8d13 concurrent row-count 2016-08-24 11:39:44 +02:00
Shlomi Noach
56fd82a824 Merge pull request #174 from Wattpad/test-on-replica-manual-replication-control
outstanding. Thank you!
2016-08-24 09:12:21 +02:00
Paulo Bittencourt
6b21ade6d0 Check for --test-on-replica-skip-replica-stop in cutOver method 2016-08-23 18:34:10 -04:00
Shlomi Noach
56d09c4105 avoiding writing rows when rowcopy complete 2016-08-23 14:26:47 +02:00
Shlomi Noach
a1e191078a rename 'about'->'before' 2016-08-23 11:40:32 +02:00
Shlomi Noach
1c2a77ef95 hook names; added on-stop-replication hook 2016-08-23 11:35:48 +02:00
Shlomi Noach
1021a83ac0 Merge pull request #189 from dveeden/feedback_on_wait
thank you
2016-08-23 09:50:50 +02:00
Daniël van Eeden
d8cfd49e2c Message about waiting should be INFO not DEBUG 2016-08-23 09:41:07 +02:00
Shlomi Noach
972728cf40 added onStatus hook 2016-08-22 16:24:41 +02:00
Shlomi Noach
1376f0af23 fixed UPDATE dml on renamed column 2016-08-22 08:49:27 +02:00
Shlomi Noach
6acbe7e3ae detecting and executing hooks 2016-08-20 08:24:20 +02:00
Shlomi Noach
cdf393a30e initial support for hooks 2016-08-19 14:52:49 +02:00
Shlomi Noach
d8e30fcd85 fixed sup printing heuristic 2016-08-19 09:41:25 +02:00
Shlomi Noach
9752179723 interactive command: sup 2016-08-19 09:16:17 +02:00
Shlomi Noach
e6a02d81e0 Merge pull request #170 from github/nice-ratio-doc-clarification
clarifying meaning of sleep-ratio
2016-08-19 08:27:22 +02:00
Shlomi Noach
7e9f578e12 progress is 100% when 0/0 rows copied 2016-08-18 13:20:09 +02:00
Shlomi Noach
5dbd2e1c85 clarifying meaning of sleep-ratio 2016-08-18 13:13:51 +02:00
Shlomi Noach
8bf07c506f Merge pull request #147 from github/cleanup-socket-file
Cleanup socket file
2016-08-12 11:26:37 +02:00
Shlomi Noach
a46022f727 localized function name 2016-08-11 17:37:50 +02:00
Shlomi Noach
66ff5964ed relaxed check for log_slave_updates 2016-08-11 14:49:14 +02:00
Shlomi Noach
dd1ef29dac cleaning up socket file 2016-08-11 09:01:14 +02:00
Damian Gryski
e02a49449e all: use time.Since() instead of time.Now().Sub()
Patch created with:
    gofmt -w -r 'time.Now().Sub(a) -> time.Since(a)' .
2016-08-02 08:38:56 -04:00
Shlomi Noach
46bbea2a32 ETA counting rows, fixed copy time on count 2016-07-29 10:40:23 +02:00
Shlomi Noach
25ce8b0758 status hint parameters using normalized names 2016-07-29 09:20:00 +02:00
Shlomi Noach
edacb8f959 Merge pull request #116 from github/nice-ratio-float
nice-ratio is now float64
2016-07-29 07:16:33 +02:00
Shlomi Noach
be8a023350 nice-ratio is now float64 2016-07-28 14:37:17 +02:00
Shlomi Noach
b99ce969c7 serving socket before counting table rows 2016-07-28 13:01:26 +02:00
Shlomi Noach
b548a6a172 adding human friendly hint re: throttling and binary logs 2016-07-27 10:45:22 +02:00
Shlomi Noach
dbcc0e09c7 status hint shows [set] next to existing flag files 2016-07-27 10:36:24 +02:00
Shlomi Noach
e900dae2e9 More informative output when control replicas are lagging 2016-07-27 09:59:46 +02:00
Shlomi Noach
b53ee24a1f dynamic replication-lag-query 2016-07-26 14:14:25 +02:00
Shlomi Noach
5d23b72955 Merge pull request #107 from github/throttle-control-replicas
fix to throttle-control-replicas check
2016-07-26 12:13:22 +02:00
Shlomi Noach
7a70c24503 replica-migration cleanup; updating allEventsUpToLockProcessedInjectedFlag 2016-07-26 12:06:20 +02:00
Shlomi Noach
034ea7646a fix to throttle-control-replicas check 2016-07-26 11:51:24 +02:00
Shlomi Noach
ef59a866d8 Removed legacy 'safe cut-over'
Now that we have the atomic cut-over, the former is redundant
2016-07-16 08:12:19 -06:00
Shlomi Noach
8e46b4ceea max-lag-millis is dynamically controllable 2016-07-13 09:44:00 +02:00
Shlomi Noach
c116d84acb added nice-ratio 2016-07-04 14:29:09 +02:00
Shlomi Noach
37e3c94c87 supporting 'unpostpone' command 2016-07-01 10:59:09 +02:00
Shlomi Noach
0191b2897d an atomic cut-over implementation, as per issue #82 2016-06-27 11:08:06 +02:00
Shlomi Noach
4f299f320e noop more verbose 2016-06-27 08:49:26 +02:00
Shlomi Noach
e0de69b028 a noop operation dumps SHOW CREATE TABLE 2016-06-22 12:39:13 +02:00
Shlomi Noach
5b20122957 on noop operation, drop ghost table at end 2016-06-22 10:48:17 +02:00
Shlomi Noach
96e8419a35 Solved cut-over stall; change of table names
- Cut-over would stall after the `lock tables` wait-timeout due to waiting on a channel that would never be written to. This has been identified, reproduced, fixed, confirmed.
- Change of table names. Here's the story:
  - Because we're testing this even while `pt-online-schema-change` is being used in production, the `_tbl_old` naming convention makes for a collision.
  - "old" table name is now `_tbl_del`, "del" standing for "delete"
  - ghost table name is now `_tbl_gho`
  - when issuing `--test-on-replica`, we keep the ghost table around, and we're also briefly renaming the original table to "old". Well, this collides with a potentially existing "old" table on master (one that hasn't been dropped yet). Therefore:
    - `--test-on-replica` uses `_tbl_ght` (ghost-test)
  - similar problem with `--execute-on-replica`, and in this case the table doesn't stick around; calling it `_tbl_ghr` (ghost-replica)
  - changelog table is now `_tbl_ghc` (ghost-changelog)
  - To clarify, I don't want to go down the path of creating "old" tables with 2 or 3 or 4 or 5 or infinite leading underscores. I think this is very confusing and actually not operations friendly. It's OK that the migration will fail saying "hey, you ALREADY have an old table here, why don't you take care of it first", rather than create _yet another_ `____tbl_old` table. We're always confused about which table it actually is that gets migrated, which is safe to `drop`, etc.
- just after row-copy completes, just before cut-over, and during cut-over: marking these points in time as _of interest_ so as to increase logging frequency.
2016-06-21 12:56:01 +02:00
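The naming scheme in 96e8419a35 can be sketched as a set of name-derivation helpers plus a collision check that fails instead of adding more underscores. The suffixes come from the commit message. The function names and the in-memory `existing` set, which stands in for a real lookup of existing tables, are hypothetical.

```go
package main

import "fmt"

// Hypothetical helpers deriving auxiliary table names from the original table name.
func ghostTableName(t string) string            { return fmt.Sprintf("_%s_gho", t) }
func changelogTableName(t string) string        { return fmt.Sprintf("_%s_ghc", t) }
func oldTableName(t string) string              { return fmt.Sprintf("_%s_del", t) }
func testOnReplicaTableName(t string) string    { return fmt.Sprintf("_%s_ght", t) }
func executeOnReplicaTableName(t string) string { return fmt.Sprintf("_%s_ghr", t) }

// checkNoCollision fails if the name is already taken, rather than inventing
// yet another name with extra leading underscores.
func checkNoCollision(existing map[string]bool, name string) error {
	if existing[name] {
		return fmt.Errorf("table %s already exists; please take care of it first", name)
	}
	return nil
}

func main() {
	existing := map[string]bool{"_users_del": true} // hypothetical leftover from an earlier run
	fmt.Println(ghostTableName("users"), changelogTableName("users"))
	if err := checkNoCollision(existing, oldTableName("users")); err != nil {
		fmt.Println("bailing out:", err)
	}
}
```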
Shlomi Noach
cd6b3c5e9e not throttling during cut-over operation 2016-06-21 09:21:58 +02:00
Shlomi Noach
80fcc05eb5 supporting interactive command throttle-control-replicas 2016-06-20 12:09:04 +02:00
Shlomi Noach
f0b012b238 support for 'panic' interactive command 2016-06-20 06:38:29 +02:00
Shlomi Noach
62b8a897e3 Retries, better visibility, documentation
- Rowcopy time is bounded by copy end-time
- Retries are configurable via `--default-retries` (default: `60`)
- `migrator` notes the hostname
- `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`)
- Added lots of code comments
- Adding documentation for "triggerless design"
2016-06-19 17:55:37 +02:00
Shlomi Noach
23cb8ea7e9 Throttling & critical load
- Added `--throttle-query` param (when returns > 0, throttling applies)
- Added `--critical-load`, similar to `--max-load` but implies panic and quit
- Recoded *-load as `LoadMap`
- More info on *-load throttle/panic
- `printStatus()` now gets printing heuristic. Always shows up on interactive `"status"`
- Fixed `change column` (aka rename) handling with quotes
- Removed legacy `mysqlbinlog` parser code
- Added tests
2016-06-18 21:12:07 +02:00
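A `LoadMap` as described in 23cb8ea7e9 pairs status-variable names with thresholds. The same structure can then serve both `--max-load` (throttle) and `--critical-load` (panic and quit). This sketch assumes a comma-separated `name=value` input format and treats reaching a threshold (`>=`) as exceeding it. Both are assumptions, and `parseLoadMap` and `exceededBy` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// LoadMap maps a MySQL global status variable (e.g. Threads_running) to a threshold.
type LoadMap map[string]int64

// parseLoadMap parses a list such as "Threads_running=25,Threads_connected=500".
func parseLoadMap(s string) (LoadMap, error) {
	m := LoadMap{}
	if strings.TrimSpace(s) == "" {
		return m, nil
	}
	for _, part := range strings.Split(s, ",") {
		kv := strings.SplitN(part, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid load entry %q, expected name=value", part)
		}
		value, err := strconv.ParseInt(strings.TrimSpace(kv[1]), 10, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid threshold in %q: %w", part, err)
		}
		m[strings.TrimSpace(kv[0])] = value
	}
	return m, nil
}

// exceededBy returns a variable whose current value has reached its threshold, if any.
func (m LoadMap) exceededBy(current map[string]int64) (string, bool) {
	for name, threshold := range m {
		if value, ok := current[name]; ok && value >= threshold {
			return name, true
		}
	}
	return "", false
}

func main() {
	maxLoad, _ := parseLoadMap("Threads_running=25")
	criticalLoad, _ := parseLoadMap("Threads_running=100")
	status := map[string]int64{"Threads_running": 40} // hypothetical status sample
	if name, hit := criticalLoad.exceededBy(status); hit {
		fmt.Println("critical-load reached on", name, ": panic and quit")
	} else if name, hit := maxLoad.exceededBy(status); hit {
		fmt.Println("max-load reached on", name, ": throttle")
	}
}
```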
Shlomi Noach
d38ff68a15 minor formatting 2016-06-17 11:41:10 +02:00
Shlomi Noach
94f311ec7b supporting --panic-flag-file; when it exists - app panics and exits without cleanup 2016-06-17 11:40:08 +02:00