Commit Graph

138 Commits

Author SHA1 Message Date
Nikhil Mathew
84bdfdb1ad Cache DB connection pools on the migrationContext where applicable 2017-09-22 16:06:06 -07:00
Nikhil Mathew
5788ab5347 CR Revisions: Make concurrent operations safe 2017-09-22 15:17:58 -07:00
Zach Moazeni
df27c5b76f Allow gh-ost to modify the server using extra port
Both Percona and Maria allow MySQL to be configured to listen on an extra port when their thread pool is enable.

* https://www.percona.com/doc/percona-server/5.7/performance/threadpool.html
* https://mariadb.com/kb/en/the-mariadb-library/thread-pool-in-mariadb-51-53/

This is valuable because if the table has a lot of traffic (read or write load), gh-ost can end up starving the thread pool as incomming connections are immediately blocked.

By using gh-ost on the extra port, MySQL locking will still behave the same, but MySQL will keep a dedicated thread for each gh-ost connection.

When doing this, it's important to inspect the extra-max-connections variable. Both Percona and Maria default to 1, so gh-ost may easily exceed with its threads.

An example local run using this

```
$ mysql -S /tmp/mysql_sandbox20393.sock -e "select @@global.port, @@global.extra_port"
+---------------+---------------------+
| @@global.port | @@global.extra_port |
+---------------+---------------------+
|         20393 |               30393 |
+---------------+---------------------+

./bin/gh-ost \
--initially-drop-ghost-table \
--initially-drop-old-table \
--assume-rbr \
--port="20395" \
--assume-master-host="127.0.0.1:30393" \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--max-lag-millis=1500 \
--user="gh-ost" \
--password="gh-ost" \
--database="test" \
--table="mytable" \
--verbose \
--alter="ADD mynewcol decimal(11,2) DEFAULT 0.0 NOT NULL" \
--exact-rowcount \
--concurrent-rowcount \
--default-retries=120 \
--panic-flag-file=/tmp/ghost.panic.flag \
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag \
--execute
```
2017-09-20 16:05:20 -04:00
Shlomi Noach
a2847015d6 Merge branch 'master' into nm-refactor-migration-context 2017-09-10 08:08:39 +03:00
Shlomi Noach
e2171e0162 Validating password length 2017-09-03 10:27:04 +03:00
Nikhil Mathew
0a7e713e9f Ensure cleanup happens, even on error 2017-08-28 15:53:47 -07:00
Nikhil Mathew
3b21f4db37 Close remaining DB connections 2017-08-28 14:05:15 -07:00
Shlomi Noach
79e4f1cdbe detect range end based on OFFSET 2017-08-21 08:12:41 +03:00
Nikhil Mathew
a1473dfb58 Stop infinite goroutines 2017-08-08 15:31:25 -07:00
Nikhil Mathew
982b8eede9 Refactor global migration context 2017-08-08 13:36:54 -07:00
Shlomi Noach
3955a6d67f hibernate on critical-load 2017-05-24 08:32:13 +03:00
Shlomi Noach
7517238bd3 Merge branch 'master' into fix-infinite-cutover-loop 2017-03-06 21:50:26 +02:00
Shlomi Noach
06c909bd10 Validating table name length 2017-02-21 17:34:49 -07:00
Shlomi Noach
289ce46a2b Merge branch 'master' into fix-infinite-cutover-loop 2017-02-16 14:16:32 +02:00
Shlomi Noach
ed5dd7f049 Collecting and presenting MySQL versions of applier and inspector 2017-02-07 09:41:33 +02:00
Shlomi Noach
6fa32f6d54 cut-over failure on test-on-replica starts replication again 2017-02-07 09:31:52 +02:00
Shlomi Noach
8d0faa55e3 explicit rollback in ApplyDMLEventQueries() 2017-01-04 08:44:04 +02:00
Shlomi Noach
445c67635d batching DML writes, configurable --dml-batch-size 2017-01-03 14:31:19 +02:00
Shlomi Noach
3e28f462d8 Initial support for batching multiple DMLs when writing to ghost table 2017-01-03 13:44:52 +02:00
rj03hou
ffbd35e180 fix TableEngine correlates to the 3rd placeholder in the template string, not the 1st 2016-12-02 11:56:29 +08:00
Shlomi Noach
9b068ec222 Merge branch 'master' into myisam-gtid 2016-12-01 09:43:38 +01:00
rj03hou
a11bec1785 If the original table is MyISAM and the default engine is Innodb, and the gtid mode is on, there will be error when execute 'LOCK TABLES tbl WRITE, tbl_magic WRITE'. If make the magic cut-over table's engine same with the original table, there will not have this problem. 2016-12-01 16:04:04 +08:00
Shlomi Noach
b00cae11fa retry cut-over 2016-11-17 17:10:17 +01:00
Shlomi Noach
7fa5e405d4 avoid writing heartbeat when throttle commanded by user
when throttling on user command there really is no need for injecting heartbeat. The user commanded, therefore gh-ost complies and trusts the reasoning for throttling. What this will allow is complete quiet time. This, in turn, will allow such features as relocating via orchestrator/pseudo-gtid at time of throttling
2016-10-27 14:51:38 +02:00
Shlomi Noach
ac6159791d merged master, resolved conflicts 2016-10-26 09:57:59 +02:00
Shlomi Noach
bf92eec214 validating table structure on applier and migrator
- reading column list on applier
- comparing original table on applier and migrator, expecting exact column list
- or else bailing out
2016-10-20 11:29:30 +02:00
Shlomi Noach
25166e33c7 solving the enum-as-part-of-pk bug 2016-10-19 15:22:29 +02:00
Shlomi Noach
f9c15127cd simplified applier read of timezone 2016-10-14 12:56:43 +02:00
Shlomi Noach
ac304def4d applier always uses UTC 2016-10-13 13:08:02 +02:00
Shlomi Noach
dbf50afbc7 reading time_zone settings for Inspector and Applier separately.
--time-zone overrides both of them, if given
2016-10-11 16:00:26 +02:00
Shlomi Noach
6750959e1a configurable time-zone, native time parsing 2016-10-10 11:39:57 +02:00
Shlomi Noach
0c35a811f7 setting time_zoe='+00:00' on rowcopy 2016-10-08 11:06:27 +02:00
Shlomi Noach
791d963ea0 Character set recognition and manipulation
- Identifying textual characters sets; converting into specific type when applying dml events
- Refactored `ColumnsList`: introducing `Column` type
- Refactored `unsigned` handling, as part of `Column`
- `Column` type supports `convertArg()`: converting value of argument according to column data type
- DB URI attempts `utf8mb4,utf8,latin1` charsets in that order (first one to be recognized wins)
- Local tests filter by pattern
- Local tests append table schema on failure
- Local tests do not have postpone flag file
- Added character set local tests: `utf8`, `utf8mb4`, `latin1`
2016-09-07 14:24:11 +02:00
Shlomi Noach
2afb86b9e4 support for millisecond throttling
- `--max-lag-millis` is at least `100ms`
- `--heartbeat-interval-millis` introduced; defaults `500ms`, can range `100ms` - `1s`
- Control replicas lag calculated asynchronously to throttle test
  - aggressive when `max-lag-millis < 1000` and when `replication-lag-query` is given
2016-08-30 09:41:59 +02:00
Shlomi Noach
c7edd1ef84 Merge branch 'master' into concurrent-rowcount 2016-08-25 09:44:12 +02:00
Shlomi Noach
1773f338c2 keeping track of delta rows
on concurrent count(*) this means we re-apply delta onto new estimate
2016-08-24 12:16:34 +02:00
Shlomi Noach
4c972184a8 Merge branch 'master' into sql-mode-strict 2016-08-24 09:34:00 +02:00
Shlomi Noach
56fd82a824 Merge pull request #174 from Wattpad/test-on-replica-manual-replication-control
outstanding. Thank you!
2016-08-24 09:12:21 +02:00
Paulo Bittencourt
6b21ade6d0 Check for --test-on-replica-skip-replica-stop in cutOver method 2016-08-23 18:34:10 -04:00
Shlomi Noach
8b76d0e75b DML write sets sql_mode to STRICT ALL TABLES 2016-08-23 11:58:52 +02:00
Shlomi Noach
b63cc3e75e fix INSERT DML handling on renamed column 2016-08-22 16:00:15 +02:00
Shlomi Noach
9cf4819a98 Merge branch 'master' into fix-rename
Wish to incorporate important time_zone fix
2016-08-22 11:54:52 +02:00
Shlomi Noach
1376f0af23 fixed UPDATE dml on renamed column 2016-08-22 08:49:27 +02:00
Paulo Bittencourt
2e43718ef3 Add --test-on-replica-skip-replica-stop flag 2016-08-19 17:34:08 -04:00
Shlomi Noach
6d80340e4f setting time_zone on DML apply 2016-08-19 09:06:00 +02:00
Paulo Bittencourt
a62f9e0754 Add --test-on-replica-manual-replication-control flag
This will wait indefinitely for the replication status to change.
This allows us to run test schema changes in RDS without needing
custom RDS commands in gh-ost.
2016-08-18 11:53:25 -04:00
Shlomi Noach
75e0d12302 simplified error logic; fixed incorrect RowsEstimate handling on error 2016-08-18 13:38:23 +02:00
Shlomi Noach
74593ec010 DML write wrapped in transaction
- solving the golang problem: 'sql: converting Exec argument #2's type: uint64 values with high bit set are not supported'
2016-08-18 13:31:53 +02:00
Shlomi Noach
3a0ee9b4a5 clarified commented transactional apply 2016-08-17 10:50:41 +02:00
Shlomi Noach
4c8edf6372 elaborate error message on applying event data: printing out the error, query and args 2016-08-17 06:50:40 +02:00
Shlomi Noach
596dce5993 elaborate output on error in apply dml 2016-08-15 15:23:30 +02:00
Damian Gryski
e02a49449e all: use time.Since() instead of time.Now().Sub
Patch created with:
    gofmt -w -r 'time.Now().Sub(a) -> time.Since(a)' .
2016-08-02 08:38:56 -04:00
Shlomi Noach
ef59a866d8 Removed legacy 'safe cut-over'
Now that we have the atomic cut-over, the former is redundant
2016-07-16 08:12:19 -06:00
Shlomi Noach
8217536898 supporting --cut-over-lock-timeout-seconds 2016-07-08 10:14:58 +02:00
Shlomi Noach
0191b2897d an atomic cut-over implementation, as per issue #82 2016-06-27 11:08:06 +02:00
Shlomi Noach
96e8419a35 Solved cut-over stall; change of table names
- Cutover would stall after `lock tables` wait-timeout due do waiting on a channel that would never be written to. This has been identified, reproduced, fixed, confirmed.
- Change of table names. Heres the story:
  - Because were testing this even while `pt-online-schema-change` is being used in production, the `_tbl_old` naming convention makes for a collision.
  - "old" table name is now `_tbl_del`, "del" standing for "delete"
  - ghost table name is now `_tbl_gho`
  - when issuing `--test-on-replica`, we keep the ghost table around, and were also briefly renaming original table to "old". Well this collides with a potentially existing "old" table on master (one that hasnt been dropped yet).
  `--test-on-replica` uses `_tbl_ght` (ghost-test)
  - similar problem with `--execute-on-replica`, and in this case the table doesnt stick around; calling it `_tbl_ghr` (ghost-replica)
  - changelog table is now `_tbl_ghc` (ghost-changelog)
  - To clarify, I dont want to go down the path of creating "old" tables with 2 or 3 or 4 or 5 or infinite leading underscored. I think this is very confusing and actually not operations friendly. Its OK that the migration will fail saying "hey, you ALREADY have an old table here, why dont you take care of it first", rather than create _yet_another_ `____tbl_old` table. Were always confused on which table it actually is that gets migrated, which is safe to `drop`, etc.
- just after rowcopy completing, just before cutover, during cutover: marking as point in time _of interest_ so as to increase logging frequency.
2016-06-21 12:56:01 +02:00
Shlomi Noach
62b8a897e3 Retries, better visibility, documentation
- Rowcopy time is bounded by copy end-time
- Retries are configurable via `--default-retries` (default: `60`)
- `migrator` notes the hostname
- `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`)
- Added lots of code comments
- Adding documentation for "triggerless design"
2016-06-19 17:55:37 +02:00
Shlomi Noach
23cb8ea7e9 Throttling & critical load
- Added `--throttle-query` param (when returns > 0, throttling applies)
- Added `--critical-load`, similar to `--max-load` but implies panic and quit
- Recoded *-load as `LoadMap`
- More info on *-load throttle/panic
- `printStatus()` now gets printing heuristic. Always shows up on interactive `"status"`
- Fixed `change column` (aka rename) handling with quotes
- Removed legacy `mysqlbinlog` parser code
- Added tests
2016-06-18 21:12:07 +02:00
Shlomi Noach
836d0fe119 Supporting column rename
- Parsing `alter` statement to catch `change old_name new_name ...` statements
- Auto deducing renamed columns
- When suspecting renamed columns, requesting explicit `--approve-renamed-columns` or `--skip-renamed-columns`
- updated tests
2016-06-17 08:03:18 +02:00
Shlomi Noach
96bc3804eb test-on-replica stops replication completely 2016-06-14 12:50:07 +02:00
Shlomi Noach
97adbf1ff8 - --cut-over no longer mandatory; default to safe
- Removed `CutOverVoluntaryLock` and associated code
- Removed `CutOverUdfWait`
- `RenameTablesRollback()` first attempts an atomic swap
2016-06-14 09:01:06 +02:00
Shlomi Noach
cb1c61ac47 - --cut-over no longer mandatory; default to safe
- Removed `CutOverVoluntaryLock` and associated code
- Removed `CutOverUdfWait`
- `RenameTablesRollback()` first attempts an atomic swap
2016-06-14 09:00:56 +02:00
Shlomi Noach
8292f5608f Safe cut-over
- Supporting multi-step, safe cut-over phase, where queries are blocked throughout the phase, and worst case scenario is table outage (no data corruption)
- Self-rollsback in case of failure (restored original table)
2016-06-14 08:35:07 +02:00
Shlomi Noach
b8c7e046a1 test-on-replica to invoke cut-over swap 2016-06-10 11:15:11 +02:00
Shlomi Noach
fc00cb2289 adding interactive user commands 2016-06-07 11:59:17 +02:00
Shlomi Noach
5375aa4f69 - Removed use of master_pos_wait(). It was unneccessary in the first place and introduced new problems.
- Supporting `--allow-nullable-unique-key`
  - Tool will bail out if chosen key has nullable columns and the above is not provided
- Fixed `OriginalBinlogRowImage` comaprison (lower/upper case issue)
- Introduced reasonable streamer reconnect sleep time
2016-05-20 12:52:14 +02:00
Shlomi Noach
9b54d0208f - Handling gomysql.replication connection timeouts: reconnecting on last known position
- `printStatus()` takes ETA into account
- More info around `master_pos_wait()`
2016-05-19 15:11:36 +02:00
Shlomi Noach
ec34a5ef75 master_pos_wait is now OK to return NULL. We only care if it returns with -1 2016-05-18 15:08:47 +02:00
Shlomi Noach
9f56a84b57 Fixing single-row table migration
- `BuildUniqueKeyRangeEndPreparedQuery` supports `includeRangeStartValues` argument
- `applier` sends `this.migrationContext.GetIteration() == 0` as argument
2016-05-18 14:53:09 +02:00
Shlomi Noach
065d9c40ec some messagages are now Info instead of Debug 2016-05-17 11:57:43 +02:00
Shlomi Noach
9d055dbda7 renaming to gh-ost 2016-05-16 11:09:17 +02:00
Shlomi Noach
1e10f1f29e Solved various race conditions:
- Operation would terminate after events lock noticed but before applying all events: race condition where the event would be captured asynchronously. The event is now handled sequentially with the DML events, hence now safe.
- Multiple rowcopy operations would still write to `rowCopyComplete` channel. This is still the case, but now we only wait for the first and then just flush (read and discard) any others, to avoid blocking
- Events DML listener is only added after table creation: the problem was that with very busy tables, the events func buffer would fill up, and the "tables-created" event would be blocked.
- `waitForEventsUpToLock()` unifies the waiting on all variants of complete-migration
- With `--test-on-replica`, now stopping replication "nicely", using `master_pos_wait()`
- With `--test-on-replica`, not throttling on replication after replication is stopped (duh)
- More debug output
2016-05-16 11:03:15 +02:00
Shlomi Noach
74d8b06db1 exact-rowcount implices updating number of rows as we make progress 2016-05-04 08:23:34 +03:00
Shlomi Noach
36905d82e3 - supporting --initially-drop-old-table
- supporting `--initially-drop-ghost-table`
- validating existence of `old` and `ghost` before beginning operation
2016-05-03 12:55:17 +03:00
Shlomi Noach
421ab0fc83 woohoo, logic complete
- Introduced `SwapTablesTimeoutSeconds`; `RENAME` is limited by this timeout
- If `RENAME` fails (due to the above), we throttle and retry
- `SwapTablesAtomic()` sets `lock_wait_timeout` and notifies with connection id
- `GrabVoluntaryLock()` intentionally grabs (and later releases) voluntary lock. It notifies when it is taken and awaits instructions as for when it could be released.
- `IssueBlockingQueryOnVoluntaryLock()` does what it says. It notifies with its connection_id so that it can be easily traced
- `stopWritesAndCompleteMigrationOnMasterViaLock()` does the thang. Oh dear this was agonizing and the code is a pain to look at, though under the limitations I do believe it is as clean as I could hope for.
2016-04-22 19:46:34 -07:00
Shlomi Noach
1ed1b0d156 - quick-and-bumpy-swap-tables uses quicker swap tables, at the expense of a period where the table does not exist (non atomic renames)
- refactored lock-and-swap code, in preparation for atomic swap
2016-04-22 13:41:20 -07:00
Shlomi Noach
3c85298b77 - Better, fewer NOOP checks around the code
- Keeping track of `TotalDMLEventsApplied`
2016-04-19 04:25:32 -07:00
Shlomi Noach
eeffa701d6 - Added ok-to-drop-table flag
- Added `switch-to-rbr` flag; applying binlog format change if needed
- Using dedicated db instance for locking & renaming on applier (must be used from within same connection)
- Heartbeat now uses `time.RFC3339Nano`
- Swap tables works! Caveat: short table outage
- `--test-on-replica` works!
- retries: using `panicAbort`: from any goroutine, regardless of context, it is possible to terminate the operation
- Reintroduced changelog events listener on streamer. This is the correct implementation.
2016-04-18 10:57:18 -07:00
Shlomi Noach
a4ee80df13 - Building and applying queries from binlog event data!
- `INSERT`, `DELETE`, `UPDATE` statements
- support for `--noop`
- initial support for `--test-on-replica`. Verifying against `--allow-on-master`
- Changelog events no longer read from binlog stream, because reading it may be throttled, and we have to be able to keep reading the heartbeat and state events.
  They are now being read directly from table, mapping already-seen-events to avoid confusion
  Changlelog listener pools table in 2*frequency of heartbeat injection
2016-04-14 13:37:56 +02:00
Shlomi Noach
04525887f3 - Throttling-check is now an async routine running once per second
- Throttling variables protected by mutex
- Added `--throttle-additional-flag-file`: `operation pauses when this file exists; hint: keep default, use for throttling multiple gh-osc operations`
- ColumnList is not a `struct` which contains ordinal mapping
- More implicit write changelog + audit changelog
- builder now builds `DELETE` and `INSERT` queries from data it will eventually get from DML event
- Sanity check for binlog_row_image
- Restarting replication to be sure binlog settings apply
- Prepare for accepting `SIGHUP` (reloading configuration)
2016-04-11 17:27:16 +02:00
Shlomi Noach
a1a34b8150 ongoing development:
- accepts --max-load
- accepts multiple conditions in --max-load
- throttle includes reason
- chunk-size sanity check
- change log state writes both in appending (history) mode and in replacing (current) mode
- more atomic checks
- inspecting ghost table columns, unique key
- comparing unique keys between tables; sanity
- intersecting columns between tables
- prettify status
- refactored throttle() and retries()
2016-04-08 14:35:06 +02:00
Shlomi Noach
75f68c0752 - row copy and row events are now handled by a single routine which prioritizes events over rowcopy
- Supporting `--throttle-file-flag`
- Printing status
- Supporting transactional table syntax
- code cleanup; refactoring
- proper use of atomic where required
- iterations are in changelog (erm... maybe too much)
- `LOCK TABLES`, `UNLOCK TABLES` working
2016-04-08 10:34:44 +02:00
Shlomi Noach
0e7b23e6fe - Creating an populating Changelog table
- Using heartbeat
- Throttling works based on heartbeat
- Refactored binlog_reader stuff. Now streaming events (into golang channel, which makes for nice buffering and throttling)
- Binlog table listeners work
- More Migrator logic; existing logic for waiting on `state` events (e.g. `TablesCreatedState`)
2016-04-07 15:57:12 +02:00
Shlomi Noach
5deff2adb6 - working POC of row-copy iteration cycle
- initial work on table columns
- initial work on events streamer
2016-04-06 13:05:58 +02:00
Shlomi Noach
d8fefb3d6f exploded args on range query building; iteration works 2016-04-05 19:50:49 +02:00
Shlomi Noach
3583ab5dc5 beginning support for ranges and iteration. Still WIP 2016-04-05 09:14:22 +02:00
Shlomi Noach
ea0906f4e5 reading table (range) min/max values, right now according to hardcoded unique key 2016-04-04 18:19:46 +02:00
Shlomi Noach
937491674c adding applier, instance_key, instance_key_map 2016-04-04 15:30:49 +02:00