Merge branch 'master' into fix-inspector-column-types

This commit is contained in:
Shlomi Noach 2018-12-16 10:28:28 +02:00 committed by GitHub
commit 179a3c2fc4
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 68 additions and 26 deletions

View File

@ -69,6 +69,10 @@ This is somewhat similar to a Nagios `n`-times test, where `n` in our case is al
Optional. Default is `safe`. See more discussion in [`cut-over`](cut-over.md) Optional. Default is `safe`. See more discussion in [`cut-over`](cut-over.md)
### cut-over-lock-timeout-seconds
Default `3`. Max number of seconds to hold locks on tables while attempting to cut-over (retry attempted when lock exceeds timeout).
### discard-foreign-keys ### discard-foreign-keys
**Danger**: this flag will _silently_ discard any foreign keys existing on your table. **Danger**: this flag will _silently_ discard any foreign keys existing on your table.
@ -107,6 +111,10 @@ While the ongoing estimated number of rows is still heuristic, it's almost exact
Without this parameter, migration is a _noop_: testing table creation and validity of migration, but not touching data. Without this parameter, migration is a _noop_: testing table creation and validity of migration, but not touching data.
### force-table-names
Table name prefix to be used on the temporary tables.
### gcp ### gcp
Add this flag when executing on a 1st generation Google Cloud Platform (GCP). Add this flag when executing on a 1st generation Google Cloud Platform (GCP).
@ -125,6 +133,10 @@ We think `gh-ost` should not take chances or make assumptions about the user's t
See [`initially-drop-ghost-table`](#initially-drop-ghost-table) See [`initially-drop-ghost-table`](#initially-drop-ghost-table)
### initially-drop-socket-file
Default False. Should `gh-ost` forcibly delete an existing socket file. Be careful: this might drop the socket file of a running migration!
### max-lag-millis ### max-lag-millis
On a replication topology, this is perhaps the most important migration throttling factor: the maximum lag allowed for migration to work. If lag exceeds this value, migration throttles. On a replication topology, this is perhaps the most important migration throttling factor: the maximum lag allowed for migration to work. If lag exceeds this value, migration throttles.
@ -169,6 +181,10 @@ See [`approve-renamed-columns`](#approve-renamed-columns)
Issue the migration on a replica; do not modify data on master. Useful for validating, testing and benchmarking. See [`testing-on-replica`](testing-on-replica.md) Issue the migration on a replica; do not modify data on master. Useful for validating, testing and benchmarking. See [`testing-on-replica`](testing-on-replica.md)
### test-on-replica-skip-replica-stop
Default `False`. When `--test-on-replica` is enabled, do not issue commands stop replication (requires `--test-on-replica`).
### throttle-control-replicas ### throttle-control-replicas
Provide a command delimited list of replicas; `gh-ost` will throttle when any of the given replicas lag beyond [`--max-lag-millis`](#max-lag-millis). The list can be queried and updated dynamically via [interactive commands](interactive-commands.md) Provide a command delimited list of replicas; `gh-ost` will throttle when any of the given replicas lag beyond [`--max-lag-millis`](#max-lag-millis). The list can be queried and updated dynamically via [interactive commands](interactive-commands.md)

View File

@ -1,10 +1,10 @@
# Shared key # Shared key
A requirement for a migration to run is that the two _before_ and _after_ tables have a shared unique key. This is to elaborate and illustrate on the matter. gh-ost requires for every migration that both the _before_ and _after_ versions of the table share the same unique not-null key columns. This page illustrates this rule.
### Introduction ### Introduction
Consider a classic, simple migration. The table is any normal: Consider a simple migration, with a normal table,
```sql ```sql
CREATE TABLE tbl ( CREATE TABLE tbl (
@ -15,54 +15,72 @@ CREATE TABLE tbl (
) )
``` ```
And the migration is a simple `add column ts timestamp`. and the migration `add column ts timestamp`. The _after_ table version would be:
In such migration there is no change in indexes, and in particular no change to any unique key, and specifically no change to the `PRIMARY KEY`. To run this migration, `gh-ost` would iterate the `tbl` table using the primary key, copy rows from `tbl` to the _ghost_ table `_tbl_gho` by order of `id`, and then apply binlog events onto `_tbl_gho`. ```sql
CREATE TABLE tbl (
id bigint unsigned not null auto_increment,
data varchar(255),
more_data int,
ts timestamp,
PRIMARY KEY(id)
)
```
Applying the binlog events assumes the existence of a shared unique key. For example, an `UPDATE` statement in the binary log translate to a `REPLACE` statement which `gh-ost` applies to the _ghost_ table. Such statement expects to add or replace an existing row based on given row data. In particular, it would _replace_ an existing row if a unique key violation is met. (This is also the definition of the _ghost_ table, except that that table would be called `_tbl_gho`).
So `gh-ost` correlates `tbl` and `_tbl_gho` rows using a unique key. In the above example that would be the `PRIMARY KEY`. In this migration, the _before_ and _after_ versions contain the same unique not-null key (the PRIMARY KEY). To run this migration, `gh-ost` would iterate through the `tbl` table using the primary key, copy rows from `tbl` to the _ghost_ table `_tbl_gho` in primary key order, while also applying the binlog event writes from `tble` onto `_tbl_gho`.
### Rules The applying of the binlog events is what requires the shared unique key. For example, an `UPDATE` statement to `tbl` translates to a `REPLACE` statement which `gh-ost` applies to `_tbl_gho`. A `REPLACE` statement expects to insert or replace an existing row based on its row's values and the table's unique key constraints. In particular, if inserting that row would result in a unique key violation (e.g., a row with that primary key already exists), it would _replace_ that existing row with the new values.
There must be a shared set of not-null columns for which there is a unique constraint in both the original table and the migration (_ghost_) table. So `gh-ost` correlates `tbl` and `_tbl_gho` rows one to one using a unique key. In the above example that would be the `PRIMARY KEY`.
### Interpreting the rules ### Interpreting the rule
The same columns must be covered by a unique key in both tables. This doesn't have to be the `PRIMARY KEY`. This doesn't have to be a key of the same name. The _before_ and _after_ versions of the table share the same unique not-null key, but:
- the key doesn't have to be the PRIMARY KEY
- the key can have a different name between the _before_ and _after_ versions (e.g., renamed via DROP INDEX and ADD INDEX) so long as it contains the exact same column(s)
Upon migration, `gh-ost` inspects both the original and _ghost_ table and attempts to find at least one such unique key (or rather, a set of columns) that is shared between the two. Typically this would just be the `PRIMARY KEY`, but sometimes you may change the `PRIMARY KEY` itself, in which case `gh-ost` will look for other options. At the start of the migration, `gh-ost` inspects both the original and _ghost_ table it created, and attempts to find at least one such unique key (or rather, a set of columns) that is shared between the two. Typically this would just be the `PRIMARY KEY`, but some tables don't have primary keys, or sometimes it is the primary key that is being modified by the migration. In these cases `gh-ost` will look for other options.
`gh-ost` expects unique keys where no `NULL` values are found, i.e. all columns covered by the unique key are defined as `NOT NULL`. This is implicitly true for `PRIMARY KEY`s. If no such key can be found, `gh-ost` bails out. In the event there is no such key, but you happen to _know_ your columns have no `NULL` values even though they're `NULL`-able, you may take responsibility and pass the `--allow-nullable-unique-key`. The migration will run well as long as no `NULL` values are found in the unique key's columns. Any actual `NULL`s may corrupt the migration. `gh-ost` expects unique keys where no `NULL` values are found, i.e. all columns contained in the unique key are defined as `NOT NULL`. This is implicitly true for primary keys. If no such key can be found, `gh-ost` bails out.
### Examples: allowed and not allowed If the table contains a unique key with nullable columns, but you know your columns contain no `NULL` values, use the `--allow-nullable-unique-key` option. The migration will run well as long as no `NULL` values are found in the unique key's columns. **Any actual `NULL`s may corrupt the migration.**
### Examples: Allowed and Not Allowed
```sql ```sql
create table some_table ( create table some_table (
id int auto_increment, id int not null auto_increment,
ts timestamp, ts timestamp,
name varchar(128) not null, name varchar(128) not null,
owner_id int not null, owner_id int not null,
loc_id int, loc_id int not null,
primary key(id), primary key(id),
unique key name_uidx(name) unique key name_uidx(name)
) )
``` ```
Following are examples of migrations that are _good to run_: Note the two unique, not-null indexes: the primary key and `name_uidx`.
Allowed migrations:
- `add column i int` - `add column i int`
- `add key owner_idx(owner_id)` - `add key owner_idx (owner_id)`
- `add unique key owner_name_idx(owner_id, name)` - though you need to make sure to not write conflicting rows while this migration runs - `add unique key owner_name_idx (owner_id, name)` - **be careful not to write conflicting rows while this migration runs**
- `drop key name_uidx` - `primary key` is shared between the tables - `drop key name_uidx` - `primary key` is shared between the tables
- `drop primary key, add primary key(owner_id, loc_id)` - `name_uidx` is shared between the tables and is used for migration - `drop primary key, add primary key(owner_id, loc_id)` - `name_uidx` is shared between the tables
- `change id bigint unsigned` - the `'primary key` is used. The change of type still makes the `primary key` workable. - `change id bigint unsigned not null auto_increment` - the `primary key` changes datatype but not value, and can be used
- `drop primary key, drop key name_uidx, create primary key(name), create unique key id_uidx(id)` - swapping the two keys. `gh-ost` is still happy because `id` is still unique in both tables. So is `name`. - `drop primary key, drop key name_uidx, add primary key(name), add unique key id_uidx(id)` - swapping the two keys. Either `id` or `name` could be used
Not allowed:
- `drop primary key, drop key name_uidx` - the _ghost_ table has no unique key
- `drop primary key, drop key name_uidx, create primary key(name, owner_id)` - no shared columns to the unique keys on both tables. Even though `name` exists in the _ghost_ table's `primary key`, it is only part of the key and in itself does not guarantee uniqueness in the _ghost_ table.
Following are examples of migrations that _cannot run_: ### Workarounds
- `drop primary key, drop key name_uidx` - no unique key to _ghost_ table, so clearly cannot run If you need to change your primary key or only not-null unique index to use different columns, you will want to do it as two separate migrations:
- `drop primary key, drop key name_uidx, create primary key(name, owner_id)` - no shared columns to both tables. Even though `name` exists in the _ghost_ table's `primary key`, it is only part of the key and in itself does not guarantee uniqueness in the _ghost_ table. 1. `ADD UNIQUE KEY temp_pk (temp_pk_column,...)`
1. `DROP PRIMARY KEY, DROP KEY temp_pk, ADD PRIMARY KEY (temp_pk_column,...)`
Also, you cannot run a migration on a table that doesn't have some form of `unique key` in the first place, such as `some_table (id int, ts timestamp)`

View File

@ -46,6 +46,14 @@ Note that you may dynamically change both `--max-lag-millis` and the `throttle-c
An example query could be: `--throttle-query="select hour(now()) between 8 and 17"` which implies throttling auto-starts `8:00am` and migration auto-resumes at `18:00pm`. An example query could be: `--throttle-query="select hour(now()) between 8 and 17"` which implies throttling auto-starts `8:00am` and migration auto-resumes at `18:00pm`.
#### HTTP Throttle
The `--throttle-http` flag allows for throttling via HTTP. Every 100ms `gh-ost` issues a `HEAD` request to the provided URL. If the response status code is not `200` throttling will kick in until a `200` response status code is returned.
If no URL is provided or the URL provided doesn't contain the scheme then the HTTP check will be disabled. For example `--throttle-http="http://1.2.3.4:6789/throttle"` will enable the HTTP check/throttling, but `--throttle-http="1.2.3.4:6789/throttle"` will not.
The URL can be queried and updated dynamically via [interactive interface](interactive-commands.md).
#### Manual control #### Manual control
In addition to the above, you are able to take control and throttle the operation any time you like. In addition to the above, you are able to take control and throttle the operation any time you like.