gh-ost aborts on trying to UPDATE, errors on invalid character #290
v1.0.21

gh-ost with --test-on-replica failed while applying an UPDATE. `column_name` is a latin1 VARCHAR, and the value coming in the UPDATE statement contained this problematic character: á (\xE1). So this may be related to the safe SQL mode settings. The binlogs showed the offending statement. The row with this `id` had been INSERTed into the ghost table; at that moment, the data contained no \xE1 character.
I attempted to gh-ost ALTER a one-row table with \xE1 already in the original table. This finished fine, so this sort of character is properly handled by gh-ost when it is INSERTing rows as part of the table-to-table copy. But when the character comes from a binlog UPDATE it is not handled properly. So it may be possible to construct a simple testcase over a small table where a problematic character is fed through the binlog. I need to learn how to stop gh-ost at exactly the right moment; there will likely be no time for an interactive command if the table is small.
@mizioumt thank you. This is strange, because this bug was fixed previously.
please share
See #291
the table description is too long and I can't guess which parts are more essential than others.
and gh-ost was invoked as follows:
Not saying your alter isn't valid, but OMG. I'm curious whether this is a production DDL you must perform, or a random collection of modifications, or maybe the accumulation of migration requests over months, or...? It is very difficult to debug with such complexity. Again, this is valid, and on my side I should try and find the problem. But on your side, since you're experimenting with gh-ost, don't you feel it would be easier to try and isolate those migrations? Throwing everything out at once turns this hunt into a search for a needle in a haystack.
the table structure is real but the ALTER is not. I'm just testing gh-ost.
Is that the same setup where you don't have log-slave-updates?
no, on this one I always had log slave updates. In particular, when I browsed through the binlogs for the last reoccurrence, I saw a record for the offending UPDATE.
Replication-wise this is a sibling of the host from #287; however, on this one the MySQL version is lower.
I'm trying to ensure gh-ost works on 5.6 and 5.7, and also in different time zones.
I believe we've just run into the same issue while trying to run a migration on a large latin1-encoded table that contains UTF8 characters. The plan has been to use gh-ost to run migrations which will convert our tables so that they are stored and labeled correctly as UTF8. However, until we reach that state, we have many rows containing UTF8 data stored in latin1-labeled columns. That data is perfectly readable over a latin1 connection, but would become incorrectly re-converted under a UTF8 connection, rendering garbage characters. The problem we're facing is that during a migration run ("MODIFY mycol TINYINT(1) DEFAULT '0'"), gh-ost reached one of those rows containing UTF8 data, and we observed the following:
The garbage text above is how it appears rendered under a UTF8 connection; under latin1 it is: The hex value is: After that error, gh-ost kept repeating the following:
Eventually, I had to kill the migration. Here is the command I used:
It's also worth noting that the database and server default charset is utf8, but all apps connect to it as latin1 at this time.
@jeisen thank you for the detailed comment. I'm going to try and reproduce this; however I'm not too certain I understand the meaning of having UTF8 data stored in latin1-labeled columns.
To give a little more context, the database is entirely latin1 for historical reasons that no longer apply. The application connects to the DB using a latin1 connection, but receives UTF8 data as user input. The app has no idea the data is in a different character set, and just sees it as a sequence of bytes, so it writes those bytes out to the DB in "latin1", because MySQL doesn't check to see if the characters are valid when in latin1. So the HEX() value of this data in latin1 would be identical to if it were written over a UTF8 connection to a UTF8-labeled column.
The main problem is that if you were to connect over a UTF8 connection, MySQL will attempt to convert the "latin1" data. Instead of returning the correct string of bytes, it would send a series of garbage characters that are the result of running them through the UTF8 conversion.
You can see the severity of the problem by looking at the HEX() values before and after running it through CONVERT():
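For illustration (a made-up example, not the reporter's actual data), take 'é', whose UTF8 encoding is the two bytes 0xC3A9, mis-stored in a latin1-labeled column:

```sql
-- the raw bytes, exactly as the app wrote them over its latin1 connection
SELECT HEX(_latin1 0xC3A9);                        -- C3A9
-- what a UTF8 connection receives: each byte re-encoded, data doubled
SELECT HEX(CONVERT(_latin1 0xC3A9 USING utf8mb4)); -- C383C2A9
```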
Worse, if you try to directly run an ALTER changing the charset of the column, it would automatically run that CONVERT() step, permanently corrupting the data. The solution is that we need to force MySQL to forget its charset label by temporarily changing the VARCHAR into a VARBINARY. Once that's done, the app can start connecting via a UTF8 connection and the data will still be in the correct encoding. Then, we can do a second migration to change the VARBINARY back into a VARCHAR, this time with the right label, and we're good moving forward.
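A minimal sketch of that two-step relabeling, assuming a hypothetical `mytable`.`mycol` VARCHAR(255):

```sql
-- step 1: drop the charset label; the bytes pass through untouched,
-- because VARCHAR -> VARBINARY triggers no CONVERT() step
ALTER TABLE mytable MODIFY mycol VARBINARY(255);

-- (switch application connections from latin1 to utf8 here)

-- step 2: relabel the same bytes as utf8; binary -> charset also
-- carries no implicit conversion
ALTER TABLE mytable MODIFY mycol VARCHAR(255) CHARACTER SET utf8;
```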
Note to self: We will likely want to add a note about the outcome of this to the docs (whether double encoding is a caveat, or requires a special flag, or if there's a feature to fix it). I'm not sure how common double encoding still is, but it certainly was a problem years ago.
@mizioumt can you confirm your case is similar to @jeisen's? i.e., you write and read UTF8 data over a latin1 connection?
I have the code to be able to write back the original value. Here's the problem: I cannot identify those cases beforehand. Even if the user submits some flag, there may still be different columns in the migrated table which behave differently, so a "fix" to one would break another. I would certainly not want the user to specify per-column configs. By the time I read the column value it has already been converted. Unsure how this should be tackled.
I think my problem is kind of independent of the problem reported by @jeisen. My problem is that gh-ost refuses to handle at least one character that does belong to latin1: á (\xE1), based on http://dev.mysql.com/doc/refman/5.7/en/charset-we-sets.html and http://www.cp1252.com/. So I simply want this character to stay the same. The UTF considerations are, I think, beyond the scope of gh-ost. If the column is latin1 and we do not ask gh-ost to change the charset on it, then it should stay binary-identical to the original. If we do change the charset, or need to decide how to deal with latin1-looking characters that are part of UTF sequences, then I could express an opinion. However, in my case the column stays latin1, so UTF does not really play a role.
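A quick sanity check (mine, not from the thread) that \xE1 round-trips losslessly when everything stays latin1:

```sql
-- 0xE1 is 'á' in latin1; interpreting and re-hexing it is byte-identical
SELECT CONVERT(0xE1 USING latin1) AS ch,
       HEX(CONVERT(0xE1 USING latin1)) AS hx;   -- ch: 'á', hx: 'E1'
```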
@shlomi-noach If I were to strip away any unicode beforehand and leave only "pure" latin1 in the VARCHARs, would you consider gh-ost still safe to use on an ALTER table CHARACTER SET utf8 COLLATE utf8_unicode_ci (when the table starts as latin1)?
Yes, this conversion is supported and tested. I strongly urge you to try this with --test-on-replica first.
Unless I get good insight on how to reliably detect these cases, I'm not sure how to proceed. If anyone else wants to pick this up, there's initial code to check/support this scenario, in the form of "not converting when reading and not converting when writing", which breaks "normal" migrations.
reproduces with 1.0.28, will try to find time to create a simple testcase
looks like getting a simple testcase is not as easy as I thought...
@mizioumt let's take a step back, then. The particular character you mention, á (\xE1), is valid latin1. Do you know what the character set is for the connection that generates the UPDATE? Answering a previous question of yours: see a previous comment I made above. Lastly, can you confirm once again that the table definition on master & replica is identical? Perhaps a character set is changed between the two?
@shlomi-noach I've just got it to work. Now my column has mixed results: what was inserted by the copy routine is in plain text (charset working fine), but what was inserted by the UPDATE/REPLACE/INSERT is in binary format and is only readable if I convert using latin1. From what I see, the REPLACE and UPDATE commands simply need to have a convert(... using latin1) applied to the text values.
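A sketch of the suggested change, assuming a hypothetical ghost table `_t_gho` with a latin1 `comment` column:

```sql
-- current behavior (failing): the text parameter arrives as raw bytes
REPLACE INTO `_t_gho` (`id`, `comment`) VALUES (?, ?);

-- suggested: wrap the parameter so MySQL interprets the bytes as latin1
REPLACE INTO `_t_gho` (`id`, `comment`) VALUES (?, CONVERT(? USING latin1));
```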
Last suggestion, and I promise to stop bugging you until you find time to answer. I am no pro, but can we have the same for charset customization? Letting me control whether all columns of a given type get an explicit conversion applied. This would definitely fix my case and potentially the others' too.
@MrMauricioLeite To clarify, you're suggesting a flag to control this behavior? I think this would be doable. There was a past attempt at doing something similar. I need a precise test case, though. Is #455 a good test case? Can you please suggest another?
Well, the issue here happens when the INSERT prepared statement receives arguments carrying the binary representation of a text value (maybe with special chars), not the value in plain text. Can you reproduce INSERT or UPDATE statements doing that? Any test that gives you that is the perfect test case. All you will need is a table using latin1 charset and a statement like this one, and the table will refuse to insert the value. I can try to build a test case for you next week. To solve my problem here, I compiled my own gh-ost code that forces any column named comment (my problematic column) to use an explicit convert. About your question: exactly, I am suggesting a flag like that.
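An illustration of the argument pattern described above (values made up): the text column arrives as a raw binary literal rather than plain text, so MySQL has no charset context for it:

```sql
-- 0x53E36F is 'São' encoded in latin1; sent as bare bytes it may be
-- rejected or mangled, while wrapping it in CONVERT() succeeds
INSERT INTO `t` (`id`, `comment`) VALUES (42, _binary 0x53E36F);               -- problematic
INSERT INTO `t` (`id`, `comment`) VALUES (42, CONVERT(0x53E36F USING latin1)); -- ok
```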
Can you please submit a pull request so I can see the change?
this test assumes a latin1-encoded table with content containing bytes in the \x80-\xFF range, which are invalid single-byte characters in utf8 and cannot be inserted into the altered table when the column containing them is changed to utf8(mb4). Since these characters cannot be inserted, gh-ost fails.
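A minimal reproduction along those lines (illustrative; not the repo's actual test fixtures):

```sql
CREATE TABLE gh_ost_test (
  id INT PRIMARY KEY,
  t  VARCHAR(128) CHARACTER SET latin1
);
-- seed a byte in the \x80-\xFF range: 0xE1 is 'á' in latin1
INSERT INTO gh_ost_test VALUES (1, CONVERT(0xE1 USING latin1));
-- then migrate with: --alter='MODIFY t VARCHAR(128) CHARACTER SET utf8mb4'
```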
addresses github#290

Note: there is currently no issue backfilling the ghost table when the character set changes, likely because it's an insert-into-select-from and it all occurs within mysql. However, when applying DML events (UPDATE, DELETE, etc.) the values are sprintf'd into a prepared statement, and due to the possibility of migrating text column data containing invalid characters in the destination charset, a conversion step is often necessary.

For example, when migrating a table/column from latin1 to utf8mb4, the latin1 column may contain characters that are invalid single-byte utf8 characters. Characters in the \x80-\xFF range are most common. When written to a utf8mb4 column without conversion, they fail as they do not exist in the utf8 codepage. Converting these texts/characters to the destination charset using convert(? using {charset}) will convert appropriately and the update/replace will succeed.

I only point out the "Note:" above because there are two tests added for this: latin1text-to-utf8mb4 and latin1text-to-utf8mb4-insert. The former is a test that fails prior to this commit. The latter is a test that succeeds prior to this commit. Both are affected by the code in this commit.
PR for this at openark#27
convert text to original charset, then destination

Converting text first to the original charset and then to the destination charset produces the most consistent results, as inserting the binary into a utf8-charset column may encounter an error if there is no prior context of latin1 encoding.

```
mysql> select hex(convert(char(189) using utf8mb4));
+---------------------------------------+
| hex(convert(char(189) using utf8mb4)) |
+---------------------------------------+
|                                       |
+---------------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> select hex(convert(convert(char(189) using latin1) using utf8mb4));
+--------------------------------------------------------------+
| hex(convert(convert(char(189) using latin1) using utf8mb4))  |
+--------------------------------------------------------------+
| C2BD                                                         |
+--------------------------------------------------------------+
1 row in set (0.00 sec)
```

as seen in this failure on 5.5.62:

```
Error 1300: Invalid utf8mb4 character string: 'BD'; query= replace /* gh-ost `test`.`_gh_ost_test_gho` */ into `test`.`_gh_ost_test_gho` (`id`, `t`) values (?, convert(? using utf8mb4))
```
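Per the commit text above, the generated DML would take roughly this shape (a sketch, not the exact generated code):

```sql
replace /* gh-ost `test`.`_gh_ost_test_gho` */ into `test`.`_gh_ost_test_gho` (`id`, `t`)
  values (?, convert(convert(? using latin1) using utf8mb4));
```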