diff --git a/doc/requirements-and-limitations.md b/doc/requirements-and-limitations.md index 81e7226..7d246a8 100644 --- a/doc/requirements-and-limitations.md +++ b/doc/requirements-and-limitations.md @@ -30,11 +30,12 @@ The `SUPER` privilege is required for `STOP SLAVE`, `START SLAVE` operations. Th - MySQL 5.7 `JSON` columns are not supported. They are likely to be supported shortly. -- The two _before_ & _after_ tables must share some `UNIQUE KEY`. Such key would be used by `gh-ost` to iterate the table. - - As an example, if your table has a single `UNIQUE KEY` and no `PRIMARY KEY`, and you wish to replace it with a `PRIMARY KEY`, you will need two migrations: one to add the `PRIMARY KEY` (this migration will use the existing `UNIQUE KEY`), another to drop the now redundant `UNIQUE KEY` (this migration will use the `PRIMARY KEY`). - -- The chosen migration key must not include columns with `NULL` values. - - `gh-ost` will do its best to pick a migration key with non-nullable columns. It will by default refuse a migration where the only possible `UNIQUE KEY` includes nullable-columns. You may override this refusal via `--allow-nullable-unique-key` but **you must** be sure there are no actual `NULL` values in those columns. Such `NULL` values would cause a data integrity problem and potentially a corrupted migration. +- The two _before_ & _after_ tables must share a `PRIMARY KEY` or other `UNIQUE KEY`. This key will be used by `gh-ost` to iterate through the table rows when copying. [Read more](shared-key.md) + - The migration key must not include columns with NULL values. This means either: + 1. The columns are `NOT NULL`, or + 2. The columns are nullable but don't contain any NULL values. + - by default, `gh-ost` will not run if the only `UNIQUE KEY` includes nullable columns. + - You may override this via `--allow-nullable-unique-key` but make sure there are no actual `NULL` values in those columns. Existing NULL values can't guarantee data integrity on the migrated table. - It is not allowed to migrate a table where another table exists with same name and different upper/lower case. - For example, you may not migrate `MyTable` if another table called `MYtable` exists in the same schema. @@ -48,4 +49,4 @@ The `SUPER` privilege is required for `STOP SLAVE`, `START SLAVE` operations. Th - If you have en `enum` field as part of your migration key (typically the `PRIMARY KEY`), migration performance will be degraded and potentially bad. [Read more](https://github.com/github/gh-ost/pull/277#issuecomment-254811520) -- Migrating a `FEDERATED` table is unsupported and is irrelevant to the problem `gh-ost` tackles. +- Migrating a `FEDERATED` table is unsupported and is irrelevant to the problem `gh-ost` tackles. diff --git a/doc/shared-key.md b/doc/shared-key.md new file mode 100644 index 0000000..480ad97 --- /dev/null +++ b/doc/shared-key.md @@ -0,0 +1,68 @@ +# Shared key + +A requirement for a migration to run is that the two _before_ and _after_ tables have a shared unique key. This is to elaborate and illustrate on the matter. + +### Introduction + +Consider a classic, simple migration. The table is any normal: + +``` +CREATE TABLE tbl ( + id bigint unsigned not null auto_increment, + data varchar(255), + more_data int, + PRIMARY KEY(id) +) +``` + +And the migration is a simple `add column ts timestamp`. + +In such migration there is no change in indexes, and in particular no change to any unique key, and specifically no change to the `PRIMARY KEY`. To run this migration, `gh-ost` would iterate the `tbl` table using the primary key, copy rows from `tbl` to the _ghost_ table `_tbl_gho` by order of `id`, and then apply binlog events onto `_tbl_gho`. + +Applying the binlog events assumes the existence of a shared unique key. For example, an `UPDATE` statement in the binary log translate to a `REPLACE` statement which `gh-ost` applies to the _ghost_ table. Such statement expects to add or replace an existing row based on given row data. In particular, it would _replace_ an existing row if a unique key violation is met. + +So `gh-ost` correlates `tbl` and `_tbl_gho` rows using a unique key. In the above example that would be the `PRIMARY KEY`. + +### Rules + +There must be a shared set of not-null columns for which there is a unique constraint in both the original table and the migration (_ghost_) table. + +### Interpreting the rules + +The same columns must be covered by a unique key in both tables. This doesn't have to be the `PRIMARY KEY`. This doesn't have to be a key of the same name. + +Upon migration, `gh-ost` inspects both the original and _ghost_ table and attempts to find at least one such unique key (or rather, a set of columns) that is shared between the two. Typically this would just be the `PRIMARY KEY`, but sometimes you may change the `PRIMARY KEY` itself, in which case `gh-ost` will look for other options. + +`gh-ost` expects unique keys where no `NULL` values are found, i.e. all columns covered by the unique key are defined as `NOT NULL`. This is implicitly true for `PRIMARY KEY`s. If no such key can be found, `gh-ost` bails out. In the event there is no such key, but you happen to _know_ your columns have no `NULL` values even though they're `NULL`-able, you may take responsibility and pass the `--allow-nullable-unique-key`. The migration will run well as long as no `NULL` values are found in the unique key's columns. Any actual `NULL`s may corrupt the migration. + +### Examples: allowed and not allowed + +``` +create table some_table ( + id int auto_increment, + ts timestamp, + name varchar(128) not null, + owner_id int not null, + loc_id int, + primary key(id), + unique key name_uidx(name) +) +``` + +Following are examples of migrations that are _good to run_: + +- `add column i int` +- `add key owner_idx(owner_id)` +- `add unique key owner_name_idx(owner_id, name)` - though you need to make sure to not write conflicting rows while this migration runs +- `drop key name_uidx` - `primary key` is shared between the tables +- `drop primary key, add primary key(owner_id, loc_id)` - `name_uidx` is shared between the tables and is used for migration +- `change id bigint unsigned` - the `'primary key` is used. The change of type still makes the `primary key` workable. +- `drop primary key, drop key name_uidx, create primary key(name), create unique key id_uidx(id)` - swapping the two keys. `gh-ost` is still happy because `id` is still unique in both tables. So is `name`. + + +Following are examples of migrations that _cannot run_: + +- `drop primary key, drop key name_uidx` - no unique key to _ghost_ table, so clearly cannot run +- `drop primary key, drop key name_uidx, create primary key(name, owner_id)` - no shared columns to both tables. Even though `name` exists in the _ghost_ table's `primary key`, it is only part of the key and in itself does not guarantee uniqueness in the _ghost_ table. + +Also, you cannot run a migration on a table that doesn't have some form of `unique key` in the first place, such as `some_table (id int, ts timestamp)` diff --git a/localtests/fail-drop-pk/create.sql b/localtests/fail-drop-pk/create.sql new file mode 100644 index 0000000..5bb45f2 --- /dev/null +++ b/localtests/fail-drop-pk/create.sql @@ -0,0 +1,22 @@ +drop table if exists gh_ost_test; +create table gh_ost_test ( + id int auto_increment, + i int not null, + ts timestamp, + primary key(id) +) auto_increment=1; + +drop event if exists gh_ost_test; +delimiter ;; +create event gh_ost_test + on schedule every 1 second + starts current_timestamp + ends current_timestamp + interval 60 second + on completion not preserve + enable + do +begin + insert into gh_ost_test values (null, 11, now()); + insert into gh_ost_test values (null, 13, now()); + insert into gh_ost_test values (null, 17, now()); +end ;; diff --git a/localtests/fail-drop-pk/expect_failure b/localtests/fail-drop-pk/expect_failure new file mode 100644 index 0000000..021ae87 --- /dev/null +++ b/localtests/fail-drop-pk/expect_failure @@ -0,0 +1 @@ +No PRIMARY nor UNIQUE key found in table diff --git a/localtests/fail-drop-pk/extra_args b/localtests/fail-drop-pk/extra_args new file mode 100644 index 0000000..2017cbc --- /dev/null +++ b/localtests/fail-drop-pk/extra_args @@ -0,0 +1 @@ +--alter="change id id int, drop primary key" diff --git a/localtests/fail-no-shared-uk/create.sql b/localtests/fail-no-shared-uk/create.sql new file mode 100644 index 0000000..5bb45f2 --- /dev/null +++ b/localtests/fail-no-shared-uk/create.sql @@ -0,0 +1,22 @@ +drop table if exists gh_ost_test; +create table gh_ost_test ( + id int auto_increment, + i int not null, + ts timestamp, + primary key(id) +) auto_increment=1; + +drop event if exists gh_ost_test; +delimiter ;; +create event gh_ost_test + on schedule every 1 second + starts current_timestamp + ends current_timestamp + interval 60 second + on completion not preserve + enable + do +begin + insert into gh_ost_test values (null, 11, now()); + insert into gh_ost_test values (null, 13, now()); + insert into gh_ost_test values (null, 17, now()); +end ;; diff --git a/localtests/fail-no-shared-uk/expect_failure b/localtests/fail-no-shared-uk/expect_failure new file mode 100644 index 0000000..d3ef0f1 --- /dev/null +++ b/localtests/fail-no-shared-uk/expect_failure @@ -0,0 +1 @@ +No shared unique key can be found after ALTER diff --git a/localtests/fail-no-shared-uk/extra_args b/localtests/fail-no-shared-uk/extra_args new file mode 100644 index 0000000..379a77d --- /dev/null +++ b/localtests/fail-no-shared-uk/extra_args @@ -0,0 +1 @@ +--alter="drop primary key, add primary key (id, i)" diff --git a/localtests/swap-pk-uk/create.sql b/localtests/swap-pk-uk/create.sql new file mode 100644 index 0000000..aa712b8 --- /dev/null +++ b/localtests/swap-pk-uk/create.sql @@ -0,0 +1,24 @@ +drop table if exists gh_ost_test; +create table gh_ost_test ( + id bigint, + i int not null, + ts timestamp(6), + primary key(id), + unique key its_uidx(i, ts) +) ; + +drop event if exists gh_ost_test; +delimiter ;; +create event gh_ost_test + on schedule every 1 second + starts current_timestamp + ends current_timestamp + interval 60 second + on completion not preserve + enable + do +begin + insert into gh_ost_test values ((unix_timestamp() << 2) + 0, 11, now(6)); + insert into gh_ost_test values ((unix_timestamp() << 2) + 1, 13, now(6)); + insert into gh_ost_test values ((unix_timestamp() << 2) + 2, 17, now(6)); + insert into gh_ost_test values ((unix_timestamp() << 2) + 3, 19, now(6)); +end ;; diff --git a/localtests/swap-pk-uk/extra_args b/localtests/swap-pk-uk/extra_args new file mode 100644 index 0000000..f4de64a --- /dev/null +++ b/localtests/swap-pk-uk/extra_args @@ -0,0 +1 @@ +--alter="drop primary key, drop key its_uidx, add primary key (i, ts), add unique key id_uidx(id)" diff --git a/localtests/swap-pk-uk/order_by b/localtests/swap-pk-uk/order_by new file mode 100644 index 0000000..074d1ee --- /dev/null +++ b/localtests/swap-pk-uk/order_by @@ -0,0 +1 @@ +id diff --git a/localtests/swap-uk-uk/create.sql b/localtests/swap-uk-uk/create.sql new file mode 100644 index 0000000..5fcbf32 --- /dev/null +++ b/localtests/swap-uk-uk/create.sql @@ -0,0 +1,24 @@ +drop table if exists gh_ost_test; +create table gh_ost_test ( + id bigint, + i int not null, + ts timestamp(6), + unique key id_uidx(id), + unique key its_uidx(i, ts) +) ; + +drop event if exists gh_ost_test; +delimiter ;; +create event gh_ost_test + on schedule every 1 second + starts current_timestamp + ends current_timestamp + interval 60 second + on completion not preserve + enable + do +begin + insert into gh_ost_test values ((unix_timestamp() << 2) + 0, 11, now(6)); + insert into gh_ost_test values ((unix_timestamp() << 2) + 1, 13, now(6)); + insert into gh_ost_test values ((unix_timestamp() << 2) + 2, 17, now(6)); + insert into gh_ost_test values ((unix_timestamp() << 2) + 3, 19, now(6)); +end ;; diff --git a/localtests/swap-uk-uk/extra_args b/localtests/swap-uk-uk/extra_args new file mode 100644 index 0000000..84161a5 --- /dev/null +++ b/localtests/swap-uk-uk/extra_args @@ -0,0 +1 @@ +--alter="drop key id_uidx, drop key its_uidx, add unique key its2_uidx(i, ts), add unique key id2_uidx(id)" diff --git a/localtests/swap-uk-uk/order_by b/localtests/swap-uk-uk/order_by new file mode 100644 index 0000000..074d1ee --- /dev/null +++ b/localtests/swap-uk-uk/order_by @@ -0,0 +1 @@ +id diff --git a/localtests/swap-uk/create.sql b/localtests/swap-uk/create.sql new file mode 100644 index 0000000..5bb45f2 --- /dev/null +++ b/localtests/swap-uk/create.sql @@ -0,0 +1,22 @@ +drop table if exists gh_ost_test; +create table gh_ost_test ( + id int auto_increment, + i int not null, + ts timestamp, + primary key(id) +) auto_increment=1; + +drop event if exists gh_ost_test; +delimiter ;; +create event gh_ost_test + on schedule every 1 second + starts current_timestamp + ends current_timestamp + interval 60 second + on completion not preserve + enable + do +begin + insert into gh_ost_test values (null, 11, now()); + insert into gh_ost_test values (null, 13, now()); + insert into gh_ost_test values (null, 17, now()); +end ;; diff --git a/localtests/swap-uk/extra_args b/localtests/swap-uk/extra_args new file mode 100644 index 0000000..ad80561 --- /dev/null +++ b/localtests/swap-uk/extra_args @@ -0,0 +1 @@ +--alter="drop primary key, add unique key(id)" diff --git a/localtests/test.sh b/localtests/test.sh index eb0274a..c0b97a5 100755 --- a/localtests/test.sh +++ b/localtests/test.sh @@ -59,12 +59,16 @@ test_single() { fi orig_columns="*" ghost_columns="*" + order_by="" if [ -f $tests_path/$test_name/orig_columns ] ; then orig_columns=$(cat $tests_path/$test_name/orig_columns) fi if [ -f $tests_path/$test_name/ghost_columns ] ; then ghost_columns=$(cat $tests_path/$test_name/ghost_columns) fi + if [ -f $tests_path/$test_name/order_by ] ; then + order_by="order by $(cat $tests_path/$test_name/order_by)" + fi # graceful sleep for replica to catch up echo_dot sleep 1 @@ -129,8 +133,8 @@ test_single() { fi echo_dot - orig_checksum=$(gh-ost-test-mysql-replica --default-character-set=utf8mb4 test -e "select ${orig_columns} from gh_ost_test" -ss | md5sum) - ghost_checksum=$(gh-ost-test-mysql-replica --default-character-set=utf8mb4 test -e "select ${ghost_columns} from _gh_ost_test_gho" -ss | md5sum) + orig_checksum=$(gh-ost-test-mysql-replica --default-character-set=utf8mb4 test -e "select ${orig_columns} from gh_ost_test ${order_by}" -ss | md5sum) + ghost_checksum=$(gh-ost-test-mysql-replica --default-character-set=utf8mb4 test -e "select ${ghost_columns} from _gh_ost_test_gho ${order_by}" -ss | md5sum) if [ "$orig_checksum" != "$ghost_checksum" ] ; then echo "ERROR $test_name: checksum mismatch"