-
-
Notifications
You must be signed in to change notification settings - Fork 138
Prevent Unicode conversion to 8-bit code page in wp-db.php #422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add upper-case "N" to quoted string parameter. Without this (and without the database in which Project Nami is installed having a UTF-8 default collation, and most do not), then Unicode characters get converted to the 8-bit code page associated with the default collation of the database in which Project Nami is installed. In both approaches, UTF-8 mode does need to be enabled for the SQL Server driver / connection.
I just checked the schema and can't find any columns of the following datatypes:
There are only 11 date columns (technically 14, but 3 of them have a duplicate in GMT), and some appear to be non-updatable (e.g. "registered_date"). So that reduces the number of potential issues. Looking at the schema I found other areas that should be tested. Some dates that might not have been tested:
Some string columns that might not have been tested:
|
Terms are both Categories and Tags. |
Regarding the consistent use of DATETIME2, it was the closest match I could find to the expected behavior of MySQL dates. Even then, there was still the problem of the normal default date in WP being invalid to SQL Server. |
@patrickebates Yer welcome 😺. And sounds good re: testing / DATETIME2 / etc. To have everything in one place, here is the testing that was discussed (and Thom A. did and found to be successful) in that SqlServerCentral.com/forums thread (all non-Date tests assume using one or more supplementary characters):
It was also suggested that we test filenames for media (are those stored in the
Finally, there are some areas that I didn't include in the previous comments list of tables/columns:
If everything else works, I don't see how these wouldn't. And it would be rare (though not impossible) to even have non-standard ASCII characters in those columns. What about plugins? In that RegEx issue I worked on 2 months ago you stated that there are plugins that use their own SQL directly instead of the WP query objects. I just did a quick check of translations.php and don't see a generic "prepare" handling like this PR deals with, but I did find some areas of concern (not an exhaustive list):
|
Media is actually stored as posts. I believe the Links table was abandoned in Core shortly after we started work on PN. The functionality was moved to a plugin and the table left in place. |
There's a function we added to resolve some issues with writing options that bypasses Prepare. What are your thoughts on changes it might need? projectnami/wp-includes/option.php Line 582 in 1ca8c1b
projectnami/wp-includes/wp-db.php Line 3738 in 1ca8c1b
|
So, "comments", which are structurally quite similar to posts/pages have their own table, yet "media" which appears to be not nearly as similar to posts/pages are stored in the "posts" table? Sometimes I wonder if the folks who originally designed WordPress made an agreement with hardware vendors to get a kick-back on sales of CPU, RAM, and storage for systems running WordPress ;-). I mean, there are some parts of that data model that are sheer insanity.
As far as I can see, there don't appear to be any issues in either of those locations due to them using parameterized queries instead of string literals. As long as the parameters are declared as being |
The data model is beyond insanity. After getting this deep in Core, I think Spencer and I are qualified to declare that. I would go so far as to say that Spencer is one of only a handful of people currently with Automattic who knows just how twisted it is... The parameterized query is a bit tricky as well. SqlSrv operates strictly at the ODBC level. There's no way to declare type on the parameters that I'm aware of. Can't name parameters either, and therefore can't pass them out of order or reuse a single parameter in multiple places in the query. Have to pass it multiple times in the correct position. |
Hmm, looking at that query, that looks like a This is actually made even worse by the Bertrand writes a great article on this, but ideally that statement should probably be changed as it could easily result in (very) unexpected behaviour. I would attempt this, but the parameter count will change, and my familiarity with PHP is very old; I haven't worked with it for about 15 years. |
That lock hint is actually a holdover from some work done prior to fully converting it to parameters. Might need to be changed to an UPDLOCK at this point. |
Using |
@patrickebates Hi there. I was just checking in regarding the status of this PR. I realize that there was additional discussion from @LarnuUK re: the " @LarnuUK Speaking of testing, I assume you have been running your installation with this change for the past 9 months? If so, have you run into any problems or potentially odd behavior? I don't have time or ability to do this testing myself so any feedback would help. Thanks! |
@srutzky yep, I have been on a couple of sites and have had no issues. |
Going to push this with the update for WP 5.8.1 |
I think this fix was mistakenly reverted during merge 4b6b3d4 „Initial merge with WP 6.1.1” :( |
Looks like the file $query = preg_replace( '/(?<!%)%s/', " N'%s'", $query ); // Quote the strings, avoiding escaped strings like %%s. |
Thank you, yes, you are right. But the fix does not exist in the current master nor latest versions, does it? Now I think another merge - commit 467da7e „Initial merge with WP 6.2.0” reverts this fix. |
Yes, the commit to merge 6.2.0 completely removes the line, @adam-czaplinski-vavatech : |
I've posted an Issue, #515, to note the regression. |
@LarnuUK and @adam-czaplinski-vavatech : I am looking into this and should be able to submit a pull request in just a moment. |
This fixes #417 .
Add upper-case "N" to quoted string parameter. Without this (and without the database in which Project Nami is installed having a UTF-8 default collation, and most do not), then Unicode characters get converted to the 8-bit code page associated with the default collation of the database in which Project Nami is installed.
In both approaches, UTF-8 mode does need to be enabled for the SQL Server driver / connection.
This change (adding a single character) will fix Project Nami running on versions of SQL Server prior to 2019 (which introduced the
_UTF8
collations).While I've not tested this change myself, minimal testing has been done as noted in the following forum thread (starting at the linked post):
https://www.sqlservercentral.com/forums/topic/can-post-please-be-stored-in-nvarchar#post-3834511
Still, it would probably be a good idea to do more extensive testing, just to be sure given the nature of this change.
Take care,
Solomon...
https://SqlQuantumLift.com/
https://SqlQuantumLeap.com/
https://SQLsharp.com/