Skip to content

Support utf8mb4 in com_localise component#286

Closed
andrepereiradasilva wants to merge 28 commits intojoomla-projects:developfrom
andrepereiradasilva:utf8mb4
Closed

Support utf8mb4 in com_localise component#286
andrepereiradasilva wants to merge 28 commits intojoomla-projects:developfrom
andrepereiradasilva:utf8mb4

Conversation

@andrepereiradasilva
Copy link
Contributor

Description

This PR is for adding utf8mb4 support to com_localise.

You will need two packages for the tests:

This tests need to be made in a mysql db server with (5.5.3+) AND without (lower than 5.5.3) utf8mb4 support.

How to test

Note: For 3.5.0 tests (WITH or WITHOUT utf8mb4 support) first apply richard patch in joomla latest staging (joomla/joomla-cms#9468).

Database server WITH utf8mb4 support (Mysql 5.5.3+)

Joomla 3.5.0
New install
  1. Install the patched com_localise utf8mb4 package [package 2]
  2. Check in phpmyadmin if #__localise and #__localise_revised_values tables collations are now utf8mb4_unicode_ci
Update
  1. Install 4.0.14-dev official package [package 1]
  2. Check that there is only one table #__localise in utf8_general_ci
  3. Update com_localise with the patched com_localise utf8mb4 package [package 2]
  4. Check in phpmyadmin if #__localise and #__localise_revised_values tables collations are now utf8mb4_unicode_ci
Joomla 3.4.8

Do exactly the same tests as for Joomla 3.5.0, the only difference will be the results in the Joomla 3.5.0 install and update tests: the result table collations will be utf8_general_ci, instead of utf8mb4_unicode_ci.

Database server WITHOUT utf8mb4 support (lower than Mysql 5.5.3)

Do exactly the same tests done in Database server WITH utf8mb4 support (Mysql 5.5.3+), the only difference will be the results in the Joomla 3.5.0 install and update tests: the result table collations will be utf8_general_ci, instead of utf8mb4_unicode_ci.

Observations

This test removes also duplicated sql instructions that are already in another PR, but for simplicity they were inserted in this PR too.

@andrepereiradasilva
Copy link
Contributor Author

oh a new PR has merged right now by @infograf768 with new sql tables ... talk about bad luck!

Will have to do a revision to the PR.

@richard67
Copy link

Well, what I saw looked good to me, same way as the core does it. The DEFAULT='' aded to one of the columns I am not sure if that is necessary, but should not do any harm.

@andrepereiradasilva
Copy link
Contributor Author

Well, what I saw looked good to me, same way as the core does it. The DEFAULT='' aded to one of the columns I am not sure if that is necessary, but should not do any harm.

Nice. so i will just have to do the same process to the other tables merged in a PR an hour ago ...

@richard67
Copy link

Yes.

@richard67
Copy link

@andrepereiradasilva But one thing I should mention: The conversion script has to be run not only when updating to this particular version where utf8mb4 was introduced, it also has to be run when someone in future updates from a version before the one when utf8mb4 was introduced to a version after the one, means as long as you still expect old installations to be converted on update, you have to keep the conversion script in the manifest and run it, and it has to be adapted in future maybe, so it always fits to the data structures of the particular new version.

@andrepereiradasilva
Copy link
Contributor Author

@richard67
Copy link

Sure, I just say it has to stay like this also in future version of this component.

The conversion then runs after each update, not doing any harm if already done with last update, but this then covers also the conversion if somone migrated from old utf8 to new utf8mb4 database. Those will be converted to utf8mb4 then as soon as they update this component to a new version, and for the meantime they will still run utf8_unicode_ci (which should not be a problem).

@andrepereiradasilva
Copy link
Contributor Author

yes i think so.

@richard67
Copy link

The changes in this PR here after last commits still look good to me. So theoretically all is covered.

@andrepereiradasilva
Copy link
Contributor Author

so is the same process that is needed for weblinks too, right?

@infograf768
Copy link
Contributor

Folks, I have already prepared a PR, but as it works fine as it is, both on staging and 3.4.8, I think it is not urgent to change.

BTW, this is what it should be as otherwise it breaks (look at path:

CREATE TABLE IF NOT EXISTS `#__localise` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `asset_id` int(11) NOT NULL,
  `path` varchar(400) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
  `checked_out` int(10) unsigned NOT NULL DEFAULT '0',
  `checked_out_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  KEY `idx_path` (`path`(100))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 DEFAULT COLLATE=utf8mb4_unicode_ci;

CREATE TABLE IF NOT EXISTS `#__localise_revised_values` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `client` varchar(255) NOT NULL,
  `reftag` varchar(6) NOT NULL,
  `tag` varchar(6) NOT NULL,
  `filename` varchar(255) NOT NULL,
  `revised` tinyint(1) NOT NULL DEFAULT 0,
  `key` varchar(255) NOT NULL,
  `target_text` mediumtext NOT NULL,
  `source_text` mediumtext NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 DEFAULT COLLATE=utf8mb4_unicode_ci;

@richard67
Copy link

Yes, if they wanna convert + migrate data in this way.

Another was which some more complicated extensions seem to go is to save old data, create new tables and then import back old data.

@richard67
Copy link

@infograf768 Correct, utf8mb4_bin if path is to be used in join queries or as foreign key.

@andrepereiradasilva
Copy link
Contributor Author

So @infograf768 shall i make that changes (on create and on update) or remove the PR?

@infograf768
Copy link
Contributor

Please do and let's talk on Glip tomorrow. :)

@andrepereiradasilva
Copy link
Contributor Author

ok

@andrepereiradasilva
Copy link
Contributor Author

Conflicts fixed, All seems ok now,

@infograf768
Copy link
Contributor

@richard67 @andrepereiradasilva
A question: if we merge this, will com_localise install and work OK on 3.4.8 and below?

@andrepereiradasilva
Copy link
Contributor Author

so, for instance, if your solution was already working com_localise could work in all 3.x version with all supported mysql db versions with the xml posted in above comment and the necessary sql files.

IMHO it would be a good improvement.

updated the xml

@richard67
Copy link

@andrepereiradasilva PR #9468 for the core is ready for code review, testing insructions will follow. If you want you can have a look on it and maybe even test with your test package for com_localise.

also correct 4.0.4 filename (a space existed inthe beggining of the
filename)
@andrepereiradasilva
Copy link
Contributor Author

updated test instructions

@andrepereiradasilva
Copy link
Contributor Author

@infograf768 with @richard67 joomla-cms new PR (joomla/joomla-cms#9468) and the changes i made here this changes will work on 3.3+ versions and also on non utf8mb4 capable servers, so they there is no need to wait for 4.0 after.

@richard67
Copy link

I tested this item ✅ successfully on fd79cb3

Tested all cases installation and then update to new version, uninstall this then, then install directly the new version, then uninstall this, with checking database after each step, for following Joomla! Version and utf8mb4 support combinations:

  • 3.4.8 without utf8mb4 support
  • 3.4.8 with utf8mb4 support
  • 3.5.0 RC 4 + my PR joomla/joomla-cms#9468 without utf8mb4 support
  • 3.5.0 RC 4 + my PR with utf8mb4 support

I checked data structures only, not if the localise component works well, because I am not familiar with the component.

Result: All as expected, i.e. on 3.5.0 RC 4 + my PR with utf8mb4 support tables are in utf8mb4, other cases tables are in utf8.

You should update your test instructions to make clear that on a 3.5.0 (+ my PR), the result is only utf8mb4_unicode_ci (and the 1 utf8mb4_bin column) when utf8mb4 is supported by the database server and client api, otherwise the result is like for the other case (3.4.8).

@infograf768 How can I mark a test result here without having a fancy issue tracker? I never marked test results with GitHub UI only so I have no idea. I helped myself by creating the above test result check mark and link to the commit manually.

@andrepereiradasilva
Copy link
Contributor Author

You should update your test instructions to make clear that on a 3.5.0 (+ my PR), the result is only utf8mb4_unicode_ci (and the 1 utf8mb4_bin column) when utf8mb4 is supported by the database server and client api, otherwise the result is like for the other case (3.4.8).

updated

@infograf768 How can I mark a test result here without having a fancy issue tracker? I never marked test results with GitHub UI only so I have no idea. I helped myself by creating the above test result check mark and link to the commit manually.

i think there is no issue tracker yet for com_localise
See https://issues.joomla.org/projects

@infograf768
Copy link
Contributor

For now, I have merged #289

When joomla/joomla-cms#9468 will be merged, we can test this again after updating it.

@infograf768
Copy link
Contributor

btw, I do not think we need 191. It is used in core for some varchar. Nowhere in core do we use 191 for a KEY. It is always 100,

@anibalsanchez
Copy link

Test OK

@richard67
Copy link

@infograf768 You do not need 191, yes, but 191 is the correct value for max. possible.

I would also have used that in core instead of the 100 if the decision was on me.

But someone had a favour for even decimal numbers and so did not like the 191 before I started to work for the utf8mb4 stuff.

Is a bit like someone not familiar with dual system says: No I want this buffer be 1000 long and not 1024 because 1024 does not look nice 😄

But for real programmers the 1024 looks perfect, and so does the 191 for people knowing about mysql and utf8mb4.

@andrepereiradasilva
Copy link
Contributor Author

First of all let me say that i'm by no mean an expert in database systems, just trying to help the project.

Why the 191 index size?

In this case (the language file path index), when i added a size of 100 i noticed some values in the index were being cropped.
As i don't know exactly the consequences of the index values being cropped in this component (search, order, etc) i resized it to the maximum allowed value in utf8mb4 to prevent issues.

I was told that the maximum index length in utf8mb4 is 767 bytes (191 * 4 bytes per character = 764), so there is the reason for the 191.

If anyone can confirm that having an index size of 100 will not cause issues, i will change it with no problems.

BTW fixed the conflicts after latest merge. Now the check for code difference is more clear.

@infograf768
Copy link
Contributor

As joomla/joomla-cms#9468 is not going to make it in 3.5.0, I am going to release a 4.0.15-dev version of com_localise without this PR.

I am just waiting for this to be tested:
#290

@richard67
Copy link

@infograf768 I agree, better wait with moving to utf8mb4 ... I think we can make the core handle conversion scripts for extensions, so extensions not have to do it with their schema updates or own php scripts (but still have to provide sql script(s) for conversion). Am almost sure I can have something ready soon, maybe for 3.5.1. What Ande does with this PR is a good proof of concept and still needed, only the conversion to utf8mb4 would then in a future solution be moved from the schemapath to a new conversionpath or so, and be executed by the installer on update or after migration when is necessary.

@andrepereiradasilva
Copy link
Contributor Author

closed as does not makes sense anymore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants