This will wait indefinitely for the replication status to change.
This allows us to run test schema changes in RDS without needing
custom RDS commands in gh-ost.
This helps when the table itself doesn't have foreign keys defined but if there are other tables with foreign keys pointing to the table on which gh-ost runs.
This gives INFO messages for each FK. Note that it now informs the user about `payment` being involved.
```
$ ./gh-ost -database sakila -table rental -alter 'ADD COLUMN ghost_test_001 tinyint DEFAULT NULL' -port 19590 -user msandbox -password msandbox -verbose
2016-08-03 10:18:45 INFO starting gh-ost 1.0.8
2016-08-03 10:18:45 INFO Migrating `sakila`.`rental`
2016-08-03 10:18:45 INFO connection validated on 127.0.0.1:19590
2016-08-03 10:18:45 INFO User has ALL privileges
2016-08-03 10:18:45 INFO binary logs validated on 127.0.0.1:19590
2016-08-03 10:18:45 INFO Restarting replication on 127.0.0.1:19590 to make sure binlog settings apply to replication thread
2016-08-03 10:18:46 INFO Table found. Engine=InnoDB
2016-08-03 10:18:47 INFO Found foreign key on `sakila`.`payment` related to `sakila`.`rental`
2016-08-03 10:18:47 INFO Found foreign key on `sakila`.`rental` related to `sakila`.`rental`
2016-08-03 10:18:47 INFO Found foreign key on `sakila`.`rental` related to `sakila`.`rental`
2016-08-03 10:18:47 INFO Found foreign key on `sakila`.`rental` related to `sakila`.`rental`
2016-08-03 10:18:47 ERROR Found 4 foreign keys related to `sakila`.`rental`. Foreign keys are not supported. Bailing out
2016-08-03 10:18:47 FATAL 2016-08-03 10:18:47 ERROR Found 4 foreign keys related to `sakila`.`rental`. Foreign keys are not supported. Bailing out
```
Related Issue: https://github.com/github/gh-ost/issues/129
Related Issue: https://github.com/github/gh-ost/issues/136
New output:
```
2016-08-05 11:39:07 DEBUG Privileges: Super: true, REPLICATION SLAVE: false, ALL on *.*: false, ALL on `sakila`.*: true
2016-08-05 11:39:07 ERROR User has insufficient privileges for migration. Needed: SUPER, REPLICATION SLAVE and ALL on `sakila`.*
2016-08-05 11:39:07 FATAL 2016-08-05 11:39:07 ERROR User has insufficient privileges for migration. Needed: SUPER, REPLICATION SLAVE and ALL on `sakila`.*
```
- by default gh-ost will not delete an existing socket file
and thus, will fail running if socket file exists. This is the desired behavior.
- The flag --initially-drop-socket-file indicates we take responsibility and wish gh-ost to delete this file on startup
- Cutover would stall after `lock tables` wait-timeout due do waiting on a channel that would never be written to. This has been identified, reproduced, fixed, confirmed.
- Change of table names. Heres the story:
- Because were testing this even while `pt-online-schema-change` is being used in production, the `_tbl_old` naming convention makes for a collision.
- "old" table name is now `_tbl_del`, "del" standing for "delete"
- ghost table name is now `_tbl_gho`
- when issuing `--test-on-replica`, we keep the ghost table around, and were also briefly renaming original table to "old". Well this collides with a potentially existing "old" table on master (one that hasnt been dropped yet).
`--test-on-replica` uses `_tbl_ght` (ghost-test)
- similar problem with `--execute-on-replica`, and in this case the table doesnt stick around; calling it `_tbl_ghr` (ghost-replica)
- changelog table is now `_tbl_ghc` (ghost-changelog)
- To clarify, I dont want to go down the path of creating "old" tables with 2 or 3 or 4 or 5 or infinite leading underscored. I think this is very confusing and actually not operations friendly. Its OK that the migration will fail saying "hey, you ALREADY have an old table here, why dont you take care of it first", rather than create _yet_another_ `____tbl_old` table. Were always confused on which table it actually is that gets migrated, which is safe to `drop`, etc.
- just after rowcopy completing, just before cutover, during cutover: marking as point in time _of interest_ so as to increase logging frequency.
- Rowcopy time is bounded by copy end-time
- Retries are configurable via `--default-retries` (default: `60`)
- `migrator` notes the hostname
- `applier` and `inspector` note `impliedKey` (`@@hostname` and `@@port`)
- Added lots of code comments
- Adding documentation for "triggerless design"
- Supporting multi-step, safe cut-over phase, where queries are blocked throughout the phase, and worst case scenario is table outage (no data corruption)
- Self-rollsback in case of failure (restored original table)
The idea is that the user is forced to specify the cut-over type they wish to use, given that each type has some drawbacks.
- More data in status hint
- `select count(*)` is deferred till after we validate migration is valid. Also, it is skipped on `--noop`
- Supporting `--allow-nullable-unique-key`
- Tool will bail out if chosen key has nullable columns and the above is not provided
- Fixed `OriginalBinlogRowImage` comaprison (lower/upper case issue)
- Introduced reasonable streamer reconnect sleep time
- Fixed version printing
- `rowCopyCompleteFlag` is a hint that allows us to escape the infinite loop of rowcopy once we are sure we have reached the end
- Operation would terminate after events lock noticed but before applying all events: race condition where the event would be captured asynchronously. The event is now handled sequentially with the DML events, hence now safe.
- Multiple rowcopy operations would still write to `rowCopyComplete` channel. This is still the case, but now we only wait for the first and then just flush (read and discard) any others, to avoid blocking
- Events DML listener is only added after table creation: the problem was that with very busy tables, the events func buffer would fill up, and the "tables-created" event would be blocked.
- `waitForEventsUpToLock()` unifies the waiting on all variants of complete-migration
- With `--test-on-replica`, now stopping replication "nicely", using `master_pos_wait()`
- With `--test-on-replica`, not throttling on replication after replication is stopped (duh)
- More debug output
- when `--test-on-replica`, the tested replica is implicitly a control replica
- added `replication-lag-query`, an alternate query to `SHOW SLAVE STATUS` to get replication lag
- throttling takes both the above into consideration
- Introduced `SwapTablesTimeoutSeconds`; `RENAME` is limited by this timeout
- If `RENAME` fails (due to the above), we throttle and retry
- `SwapTablesAtomic()` sets `lock_wait_timeout` and notifies with connection id
- `GrabVoluntaryLock()` intentionally grabs (and later releases) voluntary lock. It notifies when it is taken and awaits instructions as for when it could be released.
- `IssueBlockingQueryOnVoluntaryLock()` does what it says. It notifies with its connection_id so that it can be easily traced
- `stopWritesAndCompleteMigrationOnMasterViaLock()` does the thang. Oh dear this was agonizing and the code is a pain to look at, though under the limitations I do believe it is as clean as I could hope for.