Conditionally upgrade utf8 to utf8mb4 for MySQL 5.5.3 #317

Closed
nicolas-grekas wants to merge 6 commits into doctrine:master from nicolas-grekas:patch-1

@nicolas-grekas
Member

See http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

As utf8mb4 is a superset of utf8, this should be transparent and backward compatible.
Those who really need the "utf8" that MySQL means can explicitly use the utf8mb3 charset.
But IMHO, Doctrine should really use utf8mb4 by default, which is what everybody expects from a charset named "utf8".

@nicolas-grekas
Member Author

See also http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-upgrading.html for more detailed background.

Member

Align = signs

Member

CS: if ($mb4 !== $sql) {

@beberlei
Member

I don't think this should happen magically. Developers should do this explicitly themselves.

@nicolas-grekas
Member Author

Well, what I personally think is that Doctrine should do something about utf8mb4. What exactly it should do is the point of this pull request.
In my humble opinion, when people write (or stick to the default) "utf8", they really mean UTF-8 as defined by Unicode.
Also, nobody expects to lose data, even high-plane Unicode characters. Think http://www.fileformat.info/info/unicode/char/1f4a9/index.htm for example ;-)
Then, once people learn that utf8 === utf8mb3 and Unicode UTF-8 === utf8mb4, they can choose.
My patch is coded with this "learning path" in mind.
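
To make the data-loss point concrete, here is a minimal sketch showing that the character linked above needs four bytes in UTF-8, one more than MySQL's 3-byte "utf8" can hold. The function name `needsUtf8mb4` is hypothetical and is not part of the patch or of Doctrine.

```php
<?php
// Hypothetical helper, not part of Doctrine: reports whether a UTF-8 string
// contains a code point above U+FFFF. Such characters take 4 bytes in UTF-8,
// which is one more than MySQL's 3-byte "utf8" (utf8mb3) charset can store.
function needsUtf8mb4(string $s): bool
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    return preg_match('/[\x{10000}-\x{10FFFF}]/u', $s) === 1;
}

$pileOfPoo = "\u{1F4A9}";                // the character linked above

var_dump(strlen($pileOfPoo));            // int(4): four bytes in UTF-8
var_dump(needsUtf8mb4($pileOfPoo));      // bool(true)
var_dump(needsUtf8mb4("caf\u{E9}"));     // bool(false): U+00E9 fits in 2 bytes
```

A check like this could be used to audit existing data before deciding whether a column has to move to utf8mb4.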

At least the default for Doctrine should be safe for any Unicode UTF-8 string.
The only problem is that utf8mb4 has existed only since MySQL 5.5.3, and Doctrine supports older MySQL server versions.
To deal with that, we could either use PDO::getAttribute(PDO::ATTR_SERVER_VERSION) (which would be needed to upgrade MySqlPlatform.php), or the conditional comment tricks that the MySQL parser allows.
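
The two options could be sketched roughly as follows. This is an illustration only, not the code from the patch: `pickCharset` and `UPGRADE_SQL` are hypothetical names, and the PDO calls are commented out because they need a live connection.

```php
<?php
// Hypothetical helper, not Doctrine's API: choose a connection charset from
// the server version string. utf8mb4 is available from MySQL 5.5.3 onward.
function pickCharset(string $serverVersion): string
{
    // Version strings may carry suffixes such as "-log"; keep "x.y.z" only.
    if (preg_match('/^\d+\.\d+\.\d+/', $serverVersion, $m) !== 1) {
        return 'utf8'; // unknown format: stay on the conservative default
    }
    return version_compare($m[0], '5.5.3', '>=') ? 'utf8mb4' : 'utf8';
}

// Option 1: ask the server for its version (requires an open PDO connection).
// $version = $pdo->getAttribute(PDO::ATTR_SERVER_VERSION);
// $pdo->exec('SET NAMES ' . pickCharset($version));

// Option 2: let the MySQL parser decide. The body of a /*!50503 ... */
// comment is executed only by servers at version 5.5.3 or later.
const UPGRADE_SQL = '/*!50503 SET NAMES utf8mb4 */';

var_dump(pickCharset('5.1.73-log')); // string(4) "utf8"
var_dump(pickCharset('5.5.3'));      // string(7) "utf8mb4"
```

Option 1 keeps the decision in PHP, where it can be tested. Option 2 needs no version lookup, but the upgrade is invisible to the calling code.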

@nicolas-grekas
Member Author

So, just to be consistent, I updated my patch so that both mysqli and pdomysql drivers also upgrade to utf8mb4 when possible.

@beberlei
Member

Sorry, but I think this is too dangerous. This is something we need to leave for developers to decide.

@beberlei closed this May 26, 2013
@gagarine

But how can you force utf8mb4 when you create a schema?

@beberlei
Member

@gagarine Doctrine does not create a schema, you have to do this yourself anyways. At that point you can do it.

@gagarine

@beberlei I used \Doctrine\DBAL\Schema\Schema();
$schema->toSql()
$app['db']->exec();

The SQL produced by "toSql" used the utf8 encoding, and I don't see any way to force utf8mb4, even if I create the database by hand using utf8mb4.

I even created the table by hand, but inserting did not work either. It looks like the connection encoding is not right. How can I test it?

I'm very new to Doctrine, and perhaps this is not the place to get support... any pointer would be welcome.
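
One way to test the connection encoding, sketched here with plain PDO rather than Doctrine, is to read the session's `character_set_*` variables and check that the client, connection, and results charsets are all utf8mb4. The helper names, host, database name, and credentials below are placeholders.

```php
<?php
// Hypothetical helpers for checking a connection's encoding; not Doctrine API.

// Reads the session's character-set variables as [name => value].
function fetchCharsetVars(PDO $pdo): array
{
    // SHOW VARIABLES returns two columns, so FETCH_KEY_PAIR maps
    // Variable_name to Value.
    return $pdo->query("SHOW VARIABLES LIKE 'character_set_%'")
               ->fetchAll(PDO::FETCH_KEY_PAIR);
}

// True when the three variables that govern client/server traffic are utf8mb4.
function connectionIsUtf8mb4(array $vars): bool
{
    $names = ['character_set_client', 'character_set_connection', 'character_set_results'];
    foreach ($names as $name) {
        if (($vars[$name] ?? null) !== 'utf8mb4') {
            return false;
        }
    }
    return true;
}

// Usage sketch; host, database name and credentials are placeholders.
// $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'secret');
// var_dump(connectionIsUtf8mb4(fetchCharsetVars($pdo)));
```

With Doctrine DBAL, the `charset` connection parameter should play the same role as the `charset` part of the DSN here, though that is an assumption worth checking against the DBAL version in use.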

@mathiasbynens

5 participants