Skip to content

Conversation

srutzky
Copy link
Contributor

@srutzky srutzky commented Jan 28, 2021

This fixes #417 .

Add upper-case "N" to quoted string parameter. Without this (and without the database in which Project Nami is installed having a UTF-8 default collation, and most do not), then Unicode characters get converted to the 8-bit code page associated with the default collation of the database in which Project Nami is installed.

In both approaches, UTF-8 mode does need to be enabled for the SQL Server driver / connection.

This change (adding a single character) will fix Project Nami running on versions of SQL Server prior to 2019 (which introduced the _UTF8 collations).


While I've not tested this change myself, minimal testing has been done as noted in the following forum thread (starting at the linked post):

https://www.sqlservercentral.com/forums/topic/can-post-please-be-stored-in-nvarchar#post-3834511

Still, it would probably be a good idea to do more extensive testing, just to be sure given the nature of this change.


Take care,
Solomon...
https://SqlQuantumLift.com/
https://SqlQuantumLeap.com/
https://SQLsharp.com/

Add upper-case "N" to quoted string parameter. Without this (and without the database in which Project Nami is installed having a UTF-8 default collation, and most do not), then Unicode characters get converted to the 8-bit code page associated with the default collation of the database in which Project Nami is installed.

In both approaches, UTF-8 mode does need to be enabled for the SQL Server driver / connection.
@srutzky
Copy link
Contributor Author

srutzky commented Jan 28, 2021

I just checked the schema and can't find any columns of the following datatypes:

  • DATE
  • TIME
  • DATETIMEOFFSET
  • UNIQUEIDENTIFIER
  • SQL_VARIANT
  • DATETIME (all "timestamps" are stored as DATETIME2)

There are only 11 date columns (technically 14, but 3 of them have a duplicate in GMT), and some appear to be non-updatable (e.g. "registered_date").

So that reduces the number of potential issues.


Looking at the schema I found other areas that should be tested.

Some dates that might not have been tested:

  1. Email User signing up / registering (signups table)
  2. User activating their registration (signups table)
  3. WordPress follower registration (registration_log table, I think)
  4. Updating a user (users table)
  5. Creating / Updating a link (links table)
  6. Updating blog metadata / appearance / layout / etc (blog_versions table)
  7. Creating / updating blogs on a multi-site installation (blogs table)

Some string columns that might not have been tested:

  1. Terms (are these Categories? they have a "slug" and a taxonomy that includes a "description") (terms and term_taxonomy tables)
  2. Comments: "author" (comments table)
  3. Links: "name", "description", and "notes" (links table)
  4. Options? "name" (options table)
  5. Posts: "name" (different from "title"), "excerpt", and "password" (posts table)
  6. Users: "login", "password", "nicename", and "display_name" (users table)
  7. Signups (WordPress users, I think): "title" (their blog title, I think) (signups table)

@patrickebates
Copy link
Member

Terms are both Categories and Tags.
Hopefully I'll have time this weekend to dig into this myself.
Thanks for the research on this.

@patrickebates
Copy link
Member

patrickebates commented Jan 28, 2021

Regarding the consistent use of DATETIME2, it was the closest match I could find to the expected behavior of MySQL dates. Even then, there was still the problem of the normal default date in WP being invalid to SQL Server.

@srutzky
Copy link
Contributor Author

srutzky commented Jan 28, 2021

@patrickebates Yer welcome 😺. And sounds good re: testing / DATETIME2 / etc.

To have everything in one place, here is the testing that was discussed (and Thom A. did and found to be successful) in that SqlServerCentral.com/forums thread (all non-Date tests assume using one or more supplementary characters):

  1. Create / Update Post / Page. Check for:
    1. Title
    2. Content
    3. Description
    4. Post Slug
  2. After post is created, change post date to a future date (or a different future date).
  3. Create / Update Tags
  4. Create / Update Categories

It was also suggested that we test filenames for media (are those stored in the links table?). For example, you can have (or at least upload from):

  • C:\some\path\👾.png
  • C:\some\path\test_𒍅𒍖.xlsx

Finally, there are some areas that I didn't include in the previous comments list of tables/columns:

  • Post / Comment / User meta — names or values
  • Email addresses
  • URLs

If everything else works, I don't see how these wouldn't. And it would be rare (though not impossible) to even have non-standard ASCII characters in those columns.


What about plugins? In that RegEx issue I worked on 2 months ago you stated that there are plugins that use their own SQL directly instead of the WP query objects. I just did a quick check of translations.php and don't see a generic "prepare" handling like this PR deals with, but I did find some areas of concern (not an exhaustive list):

@patrickebates
Copy link
Member

Media is actually stored as posts. I believe the Links table was abandoned in Core shortly after we started work on PN. The functionality was moved to a plugin and the table left in place.

@patrickebates
Copy link
Member

There's a function we added to resolve some issues with writing options that bypasses Prepare. What are your thoughts on changes it might need?

$result = $wpdb->query_with_params( "IF NOT EXISTS (SELECT * FROM [$wpdb->options] with (nolock) WHERE [option_name] = ?) INSERT INTO [$wpdb->options] ([option_name], [option_value], [autoload]) VALUES (?, ?, ?) else UPDATE [$wpdb->options] set [option_value] = ?, [autoload] = ? where [option_name] = ?", array( array($option, SQLSRV_PARAM_IN), array($option, SQLSRV_PARAM_IN), array($serialized_value, SQLSRV_PARAM_IN), array($autoload, SQLSRV_PARAM_IN), array($serialized_value, SQLSRV_PARAM_IN), array($autoload, SQLSRV_PARAM_IN), array($option, SQLSRV_PARAM_IN) ) );

public function query_with_params( $query, $params = array() ) {

@srutzky
Copy link
Contributor Author

srutzky commented Feb 3, 2021

Hi @patrickebates

Media is actually stored as posts.

So, "comments", which are structurally quite similar to posts/pages have their own table, yet "media" which appears to be not nearly as similar to posts/pages are stored in the "posts" table? Sometimes I wonder if the folks who originally designed WordPress made an agreement with hardware vendors to get a kick-back on sales of CPU, RAM, and storage for systems running WordPress ;-). I mean, there are some parts of that data model that are sheer insanity.

There's a function we added to resolve some issues with writing options that bypasses Prepare. What are your thoughts on changes it might need?

As far as I can see, there don't appear to be any issues in either of those locations due to them using parameterized queries instead of string literals. As long as the parameters are declared as being NVARCHAR / Unicode string, then nothing more to do there.

@patrickebates
Copy link
Member

The data model is beyond insanity. After getting this deep in Core, I think Spencer and I are qualified to declare that. I would go so far as to say that Spencer is one of only a handful of people currently with Automattic who knows just how twisted it is...

The parameterized query is a bit tricky as well. SqlSrv operates strictly at the ODBC level. There's no way to declare type on the parameters that I'm aware of. Can't name parameters either, and therefore can't pass them out of order or reuse a single parameter in multiple places in the query. Have to pass it multiple times in the correct position.

@LarnuUK
Copy link

LarnuUK commented Mar 2, 2021

There's a function we added to resolve some issues with writing options that bypasses Prepare. What are your thoughts on changes it might need?

$result = $wpdb->query_with_params( "IF NOT EXISTS (SELECT * FROM [$wpdb->options] with (nolock) WHERE [option_name] = ?) INSERT INTO [$wpdb->options] ([option_name], [option_value], [autoload]) VALUES (?, ?, ?) else UPDATE [$wpdb->options] set [option_value] = ?, [autoload] = ? where [option_name] = ?", array( array($option, SQLSRV_PARAM_IN), array($option, SQLSRV_PARAM_IN), array($serialized_value, SQLSRV_PARAM_IN), array($autoload, SQLSRV_PARAM_IN), array($serialized_value, SQLSRV_PARAM_IN), array($autoload, SQLSRV_PARAM_IN), array($option, SQLSRV_PARAM_IN) ) );

public function query_with_params( $query, $params = array() ) {

Hmm, looking at that query, that looks like a MERGE antipattern. Checking if a row exists first, then inserting it if it wasn't found, and otherwise updating said row firstly results in more seeks/scans against the table, but also (and far worse), it could suffer a race condition.

This is actually made even worse by the NOLOCK hint there, as even if you did have an explicit table or row lock (which you would need to stop the race condition) that lock wouldn't stop the EXISTS query. Also, if a row was being inserted with the value in another transactions, and then the other transaction was rolled back the UPDATE statement would update no rows in this transaction (as the row no longer exists).

Bertrand writes a great article on this, but ideally that statement should probably be changed as it could easily result in (very) unexpected behaviour.

I would attempt this, but the parameter count will change, and my familiarity with PHP is very old; I haven't worked with it for about 15 years.

@patrickebates
Copy link
Member

That lock hint is actually a holdover from some work done prior to fully converting it to parameters. Might need to be changed to an UPDLOCK at this point.

@LarnuUK
Copy link

LarnuUK commented Mar 2, 2021

Using UPDLOCK would certainly stop the race conditions. Changing the ordering, to attempt an UPDATE and then only INSERT if no rows are effected would be the ideal, however, that would require changes to the calling PHP as well, as it would result in fewer parameters being passed.

@srutzky
Copy link
Contributor Author

srutzky commented Sep 3, 2021

@patrickebates Hi there. I was just checking in regarding the status of this PR. I realize that there was additional discussion from @LarnuUK re: the "MERGE antipattern" and NOLOCK vs UPDLOCK, but those two items are unrelated and could/should be dealt with separately. Is the delay a matter of waiting for resolution of those items, or a matter of testing, or something else? Just curious.

@LarnuUK Speaking of testing, I assume you have been running your installation with this change for the past 9 months? If so, have you run into any problems or potentially odd behavior? I don't have time or ability to do this testing myself so any feedback would help. Thanks!

@LarnuUK
Copy link

LarnuUK commented Sep 3, 2021

@srutzky yep, I have been on a couple of sites and have had no issues.

@patrickebates
Copy link
Member

Going to push this with the update for WP 5.8.1

@patrickebates patrickebates merged commit bc178b3 into ProjectNami:master Sep 11, 2021
@srutzky srutzky deleted the patch-2 branch November 17, 2021 03:41
@adam-czaplinski-vavatech

I think this fix was mistakenly reverted during merge 4b6b3d4 „Initial merge with WP 6.1.1” :(

@LarnuUK
Copy link

LarnuUK commented Nov 20, 2023

Looks like the file wp-includes/wp-db.php has been deprecated, @adam-czaplinski-vavatech and replaced by wp-includes/class-wpdb.php, however, the new file does have the change made from this merge code on line 1470:

$query = preg_replace( '/(?<!%)%s/', " N'%s'", $query ); // Quote the strings, avoiding escaped strings like %%s.

@adam-czaplinski-vavatech
Copy link

adam-czaplinski-vavatech commented Nov 20, 2023

Thank you, yes, you are right. But the fix does not exist in the current master nor latest versions, does it? Now I think another merge - commit 467da7e „Initial merge with WP 6.2.0” reverts this fix.

@LarnuUK
Copy link

LarnuUK commented Nov 20, 2023

Yes, the commit to merge 6.2.0 completely removes the line, @adam-czaplinski-vavatech :
image

@LarnuUK
Copy link

LarnuUK commented Nov 20, 2023

I've posted an Issue, #515, to note the regression.

@srutzky
Copy link
Contributor Author

srutzky commented Nov 20, 2023

I've posted an Issue, #515, to note the regression.

@LarnuUK and @adam-czaplinski-vavatech : I am looking into this and should be able to submit a pull request in just a moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unicode (NVARCHAR) data converted to 8-bit code page (VARCHAR) as it is stored

4 participants