[ORM] Emojis and utf8mb4 #8062
This would become a high-impact issue for us if it affects 4-byte Chinese characters, for example. Anecdotally, that seems to be the case.
We should also mention this in both the fluent README and our i18n docs. Of course, it's not strictly an i18n issue (emojis can be used in the "base language"), but that's where people are most likely to look for this info.
Wholeheartedly agree with this, and there's a lot of evidence to support it [1, 2, 3]. I guess this is a breaking change, so it would need to be made in
edit: I think that from MySQL 5.7 onwards it no longer complains about the 767-byte index key limit, because the default row format changed to DYNAMIC (Barracuda file format). I haven't researched that extensively, though, and we'll need to support older versions of MySQL anyway.
edit2: [4] is interesting, showing that the original purpose of
[1] https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434
Some clarity around this:
This will switch for new projects without breaking APIs in upgrades. Fixes silverstripe/silverstripe-framework#8062
MySQLiConnector doesn't always properly switch connection_collation (for more details see #9160). As such, we cannot recommend using
I ran into this issue again last night. I made a long edit to a blog post that included an emoji near the start, and when I saved it, all my changes were lost because the content was truncated just before the emoji. If this is complicated to solve, could we add graceful degradation? Before writing to the database, we would strip any 4-byte characters whenever the field uses a non-mb4 character set. It could work something like this, on save:

    // Runs on save (e.g. from an onBeforeWrite() hook).
    // get_db_field_charset() is a hypothetical helper returning the column's charset name.
    $fourByteChars = '/[\x{10000}-\x{10FFFF}]/u';
    if (preg_match($fourByteChars, $value)) {
        $charset = get_db_field_charset($field);
        // MySQL's utf8 (alias utf8mb3) stores at most 3 bytes per character
        if (in_array($charset, ['utf8', 'utf8mb3'], true)) {
            // see https://stackoverflow.com/questions/8491431/how-to-replace-remove-4-byte-characters-from-a-utf-8-string-in-php
            // Replace every 4-byte character with U+FFFD (the Unicode replacement character)
            $value = preg_replace($fourByteChars, "\xEF\xBF\xBD", $value);
        }
    }

Once this is in place, we can implement the better default for new projects and start working out how to migrate existing applications.
I believe a switch from AFAIK
@dnsl48, I would be perfectly happy with I guess we could
This way, it stays backwards compatible with everything and solves or mitigates the original issue in all situations.
@JorisDebonnet Do you want to help with 1 and 3 through pull requests to https://github.com/silverstripe/silverstripe-installer/tree/4/app/_config and docs?
@JorisDebonnet Would you be keen to send a pull request, so other devs are less likely to go through this? :) We should also add this to https://github.com/silverstripe/cwp-installer
Is it possible to change the charset for just one table / DataObject?
By default, MySQL's utf8 character set (an alias for utf8mb3) allows a maximum of 3 bytes per character, so emoji don't work. If you try to insert a string containing an emoji, it is truncated just before the first emoji. Presumably this happens with any character that needs 4 bytes in UTF-8 (any code point above U+FFFF), but emojis are where I have come across it.
On a project basis you can enable this pretty easily:
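Here is a minimal sketch of such a project-level config. It assumes the charset and collation options on SilverStripe\ORM\Connect\MySQLDatabase (connection_collation is the option discussed in #9160). The block name myproject-database and the utf8mb4_unicode_ci collation are illustrative choices, not requirements. Run dev/build afterwards.

```yaml
---
# Block name is illustrative; any unique Name works.
Name: myproject-database
---
# Store and transmit text as 4-byte UTF-8 so that emoji and other
# characters above U+FFFF are not truncated.
SilverStripe\ORM\Connect\MySQLDatabase:
  connection_charset: utf8mb4
  connection_collation: utf8mb4_unicode_ci
  charset: utf8mb4
  collation: utf8mb4_unicode_ci
```

The connection_* options govern the client connection, and charset/collation govern the tables that dev/build creates.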
However, given the increasing prevalence of emojis in everyday application usage, I think that this is a confusing pitfall, and we should enable this as a default setting.
Caveats
There's a caveat: a character can now take up to 4 bytes rather than 3, so checks on the maximum length of a row, index, etc. will sometimes fail. For example, under InnoDB's 767-byte index prefix limit, a string-based index can cover at most 191 characters (767 / 4, rounded down) rather than 255 (767 / 3, rounded down). This can lead to errors when you run dev/build after making this change, and whether they occur also depends on the MySQL version.
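Worked out explicitly, with $b$ the maximum bytes per character, this assumes the 767-byte index prefix limit of InnoDB's COMPACT and REDUNDANT row formats:

```latex
n_{\max} = \left\lfloor \frac{767}{b} \right\rfloor, \qquad
n_{\max}^{\mathrm{utf8}} = \left\lfloor \frac{767}{3} \right\rfloor = 255, \qquad
n_{\max}^{\mathrm{utf8mb4}} = \left\lfloor \frac{767}{4} \right\rfloor = 191
```

The DYNAMIC and COMPRESSED row formats allow large index prefixes of up to 3072 bytes. Under them the same formula gives 3072 / 4 = 768 utf8mb4 characters, which is why newer MySQL versions rarely hit this error.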
However, if this is in place from the beginning of a project, it's not a big deal. It is only a pain if the error is triggered during a minor-release upgrade.
Recommended release
So I would recommend:
- 4: enable it in the installer's default config, so that new projects have this applied.
- master: make it the framework default.