gh-ost/localtests
Josh Bielick 84e55ff904
copy and update text using convert when charset changes
addresses #290

Note: there is currently no issue backfilling the ghost table when the
character set changes, likely because it's an insert-into-select-from
and it all occurs within mysql.

However, when applying DML events (UPDATE, DELETE, etc.), the values are
sprintf'd into a prepared statement. Because the text column data being
migrated may contain bytes that are invalid in the destination charset,
a conversion step is often necessary.

For example, when migrating a table/column from latin1 to utf8mb4, the
latin1 column may contain bytes that are not valid single-byte utf8
characters. Bytes in the \x80-\xFF range are the most common case. When
written to a utf8mb4 column without conversion, they fail because a lone
byte in that range is not a valid utf8 sequence.

Converting these texts/characters to the destination charset using
convert(? using {charset}) will convert appropriately and the
update/replace will succeed.
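
As a rough illustration of the idea, the sketch below builds a replace
statement in which only columns with a changed charset get the convert()
wrapper. The names placeholder and buildReplace, and the charsets map, are
assumptions made for this example and do not reflect gh-ost's actual
query builder.

```go
package main

import (
	"fmt"
	"strings"
)

// placeholder returns a bare "?" or, when the column's charset changes,
// "convert(? using <charset>)". Hypothetical helper, not gh-ost's API.
// The charset name is interpolated into SQL, so it is assumed to come
// from trusted schema metadata, never from user input.
func placeholder(toCharset string) string {
	if toCharset == "" {
		return "?"
	}
	return fmt.Sprintf("convert(? using %s)", toCharset)
}

// buildReplace assembles a replace statement for the given columns.
// charsets maps a column name to its destination charset, if it changed.
func buildReplace(table string, columns []string, charsets map[string]string) string {
	names := make([]string, len(columns))
	values := make([]string, len(columns))
	for i, col := range columns {
		names[i] = "`" + col + "`"
		values[i] = placeholder(charsets[col])
	}
	return fmt.Sprintf("replace into %s (%s) values (%s)",
		table, strings.Join(names, ", "), strings.Join(values, ", "))
}

func main() {
	q := buildReplace("`test`.`_gh_ost_test_gho`",
		[]string{"id", "t"},
		map[string]string{"t": "utf8mb4"})
	fmt.Println(q)
}
```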

I only point out the "Note:" above because there are two tests added
for this: latin1text-to-utf8mb4 and latin1text-to-ut8mb4-insert

The former is a test that fails prior to this commit. The latter is a
test that succeeds prior to this commit. Both are affected by the code
in this commit.

convert text to original charset, then destination

Converting text first to the original charset and then to the
destination charset produces the most consistent results, as inserting
the raw bytes into a utf8-charset column may fail if there is no prior
context indicating that they are latin1-encoded.

mysql> select hex(convert(char(189) using utf8mb4));
+---------------------------------------+
| hex(convert(char(189) using utf8mb4)) |
+---------------------------------------+
|                                       |
+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> select hex(convert(convert(char(189) using latin1) using utf8mb4));
+-------------------------------------------------------------+
| hex(convert(convert(char(189) using latin1) using utf8mb4)) |
+-------------------------------------------------------------+
| C2BD                                                        |
+-------------------------------------------------------------+
1 row in set (0.00 sec)

As seen in this failure on 5.5.62:

 Error 1300: Invalid utf8mb4 character string: 'BD'; query=
			replace /* gh-ost `test`.`_gh_ost_test_gho` */ into
				`test`.`_gh_ost_test_gho`
					(`id`, `t`)
				values
					(?, convert(? using utf8mb4))
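
The failing statement above uses a single convert(). The two-step
conversion described in this message could be generated along the
following lines. This is only a sketch: nestedConvert is a hypothetical
name, and both charset names are assumed to come from the table
definitions rather than from user input.

```go
package main

import "fmt"

// nestedConvert returns a placeholder that first tags the bound value
// with its original charset and then converts it to the destination.
// Hypothetical helper for illustration; not taken from gh-ost.
func nestedConvert(fromCharset, toCharset string) string {
	return fmt.Sprintf("convert(convert(? using %s) using %s)",
		fromCharset, toCharset)
}

func main() {
	fmt.Println(nestedConvert("latin1", "utf8mb4"))
}
```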
2021-07-14 09:20:24 -04:00
alter-charset Add test for modifying columns to different charsets 2016-09-19 09:37:08 -04:00
alter-charset-all-dml added update/delete tests for multi-charset/alter tests 2016-09-27 12:57:05 +02:00
autoinc-copy-deletes Copy auto increment (#967) 2021-05-14 15:32:56 +02:00
autoinc-copy-deletes-user-defined Copy auto increment (#967) 2021-05-14 15:32:56 +02:00
autoinc-copy-simple Copy auto increment (#967) 2021-05-14 15:32:56 +02:00
autoinc-zero-value Always use NO_AUTO_VALUE_ON_ZERO 2019-03-24 11:32:37 +02:00
bigint-change-nullable Testing nullable int 2019-01-03 11:18:07 +02:00
bit-add BIT datatype tests 2018-11-01 15:27:28 +02:00
bit-dml BIT datatype tests 2018-11-01 15:27:28 +02:00
convert-utf8mb4 copy and update text using convert when charset changes 2021-07-14 09:20:24 -04:00
datetime 5.5 excluded tests 2018-02-26 18:38:39 +02:00
datetime-1970 more attempts at session time zone 2018-10-16 11:25:46 +03:00
datetime-submillis Merge branch 'master' into tests-updates 2019-02-10 11:24:19 +02:00
datetime-submillis-zeroleading 5.5 excluded tests 2018-02-26 18:38:39 +02:00
datetime-to-timestamp testing 5.7 and JSON 2017-07-26 12:29:32 +03:00
datetime-to-timestamp-pk-fail 5.5 excluded tests 2018-02-26 18:41:42 +02:00
decimal Testing DECIMAL datatype 2018-11-28 10:19:28 +02:00
discard-fk more tests for foreign keys, including expected failures 2016-10-10 12:29:25 +02:00
drop-null-add-not-null tests to only check non dropped columns 2017-04-23 08:48:06 +03:00
enum added enum tests 2016-08-23 12:13:40 +02:00
enum-pk testing 5.7 and JSON 2017-07-26 12:29:32 +03:00
enum-to-varchar Enum to varchar (#963) 2021-06-10 17:17:49 +02:00
fail-drop-pk supporting customized 'order by' in tests 2017-01-10 12:35:10 +02:00
fail-fk refined failure tests 2016-10-14 09:34:27 +02:00
fail-fk-parent refined failure tests 2016-10-14 09:34:27 +02:00
fail-float-unique-key tests belonged in another branch 2017-09-05 06:38:55 +03:00
fail-no-shared-uk adding PK, UK, PK-to-UK conversion tests 2017-01-10 12:35:42 +02:00
fail-no-unique-key Validating shared key column types 2017-09-03 09:57:24 +03:00
fail-password-length added test 2017-09-05 06:56:19 +03:00
fail-rename-table Rejecting RENAME TO|AS 2018-05-06 11:19:03 +03:00
fail-update-pk-column supporting update to columns of migration key 2017-11-20 08:17:20 +02:00
gbk-charset more elaborate test 2018-03-08 07:25:52 +02:00
generated-columns-add57 test: add generated column 2018-05-22 14:09:48 +03:00
generated-columns-rename57 added generated column rename test 2018-05-22 12:55:57 +03:00
generated-columns57 Support for GENERATED (aka virtual) columns 2018-05-22 12:36:52 +03:00
generated-columns57-unique Generated column as part of UNIQUE (or PRIMARY) KEY (#919) 2021-05-24 20:16:49 +02:00
geometry57 Added spatial (GEOMETRY, POINT) tests 2018-03-05 09:06:58 +02:00
json57 support for ignored versions 2018-02-26 15:22:50 +02:00
json57dml Support for GENERATED (aka virtual) columns 2018-05-22 12:36:52 +03:00
keyword-column Adding keywork-column tests 2018-01-11 09:59:53 +02:00
latin1 more DML tests for latin1 2016-10-26 20:03:09 +02:00
latin1text latin1 tests with TEXT columns 2017-07-20 17:05:45 +03:00
latin1text-to-utf8mb4 copy and update text using convert when charset changes 2021-07-14 09:20:24 -04:00
mixed-charset added mixed-charset tests 2016-09-08 09:27:18 +02:00
modify-change-case handling column name case change 2017-07-12 11:59:39 +03:00
modify-change-case-pk added test expecting failure in case-change to shared key column name 2017-07-12 12:41:10 +03:00
rename improved rename:DELETE test 2016-08-22 16:35:21 +02:00
rename-inserts-only fixed integer random values 2016-09-27 12:56:37 +02:00
rename-none-column added tests to verify no false positives rename-column found 2016-11-29 11:08:35 +01:00
rename-none-comment added tests to verify no false positives rename-column found 2016-11-29 11:08:35 +01:00
rename-reorder-column added test for renaming and reordering same column 2016-09-21 09:45:36 +02:00
rename-reorder-columns added rename & reorder test 2016-09-21 09:01:33 +02:00
reorder-columns added tests for column-reorder 2016-09-20 16:00:07 +02:00
spatial57 Added spatial (GEOMETRY, POINT) tests 2018-03-05 09:06:58 +02:00
swap-pk-uk 5.5 excluded tests 2018-02-26 18:35:55 +02:00
swap-uk adding PK, UK, PK-to-UK conversion tests 2017-01-10 12:35:42 +02:00
swap-uk-uk not null for unique key test 2020-02-05 10:12:29 +02:00
timestamp 5.5 excluded tests 2018-02-26 18:29:45 +02:00
timestamp-datetime added/fixed tests 2016-10-13 13:13:51 +02:00
timestamp-to-datetime 5.5 excluded tests 2018-02-26 18:29:45 +02:00
trivial added 'trivial' test 2016-12-05 13:41:31 +01:00
tz 5.5 excluded tests 2018-02-26 18:24:42 +02:00
tz-datetime refined tests: tz-datetime only tests datetime issues; tz-datetime-ts converts datetime to timestamp 2016-09-28 10:28:21 +02:00
tz-datetime-ts 5.5 excluded tests 2018-02-26 18:24:42 +02:00
unsigned remove redundant file 2016-08-22 11:14:06 +02:00
unsigned-modify fixing mediumint unsigned problem 2017-01-10 09:57:42 +02:00
unsigned-rename columns not null in test 2016-09-27 12:45:22 +02:00
unsigned-reorder columns not null in test 2016-09-27 12:45:22 +02:00
utf8 text tests 2016-09-06 09:38:41 +02:00
utf8mb4 added utf8mb4 test 2016-09-06 12:15:53 +02:00
varbinary Adding binary/varbinary tests 2018-10-29 10:09:15 +02:00
test.sh Add go mod (#935) 2021-06-24 20:19:37 +02:00