Unicode (NVARCHAR) data converted to 8-bit code page (VARCHAR) as it is stored #417

@srutzky

I'm using a site that runs on Project Nami, and I'm unable to save any Unicode characters in my posts. They all get converted to ? or to a best-fit mapping. Supplementary characters such as "𒍅" (U+12345) get saved as ?? (one ? for each of the two UTF-16 surrogate code units). All of this behavior points to a conversion to VARCHAR somewhere along the way.

The data model appears to use NVARCHAR to store the data (which is correct), so that can't be the source of the problem. And since a character like "〛" (U+301B) gets converted to ] (i.e. a best-fit mapping), I don't think PHP is doing the conversion, even if running on Windows (though I haven't tested that specifically, so I can't rule it out entirely).

I can HTML-encode those characters as a workaround, but whenever the post is edited, the editor displays the decoded characters. If I then save without re-encoding every previously encoded character, they all get converted.
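The symptoms described above can be approximated outside SQL Server. The following is a minimal Python sketch, not Project Nami or SQL Server code: `simulate_varchar` and `utf16_units` are hypothetical helper names, and code page 1252 is an assumption about the database's default collation. It does not model best-fit mappings, so "〛" comes out as ? here, whereas the issue reports ].

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to store text (as NVARCHAR would)."""
    return len(text.encode("utf-16-le")) // 2


def simulate_varchar(text: str, codepage: str = "cp1252") -> str:
    """Rough approximation of an NVARCHAR -> VARCHAR conversion.

    Characters that exist in the code page are kept. Anything else is
    replaced by one '?' per UTF-16 code unit, which is why a supplementary
    character (a surrogate pair) turns into '??'.
    Assumption: best-fit mappings are NOT modeled here.
    """
    pieces = []
    for ch in text:
        try:
            ch.encode(codepage)
            pieces.append(ch)
        except UnicodeEncodeError:
            pieces.append("?" * utf16_units(ch))
    return "".join(pieces)


if __name__ == "__main__":
    for sample in ["café", "〛", "𒍅"]:
        print(repr(sample), utf16_units(sample), repr(simulate_varchar(sample)))
```

Characters that fit in the code page survive, which matches the observation that only some text is damaged on save.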

So, now I'm looking for code that results in something along the lines of:

INSERT INTO table (column_list) VALUES ('value_from_form', 'value_from_form', ..);

Which should be:

INSERT INTO table (column_list) VALUES (N'value_from_form', N'value_from_form', ..);

The only difference is the N prefix on string literals. (Dates, times, datetimes, and uniqueidentifiers are also single-quoted, but they don't need the N prefix.)
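The quoting rule above can be sketched as a small helper. This is an illustration in Python under stated assumptions, not code from Project Nami: the function name `tsql_literal` and the set of handled types are hypothetical.

```python
import datetime
import uuid


def tsql_literal(value) -> str:
    """Render a Python value as a T-SQL literal (illustrative only)."""
    if isinstance(value, str):
        # Strings get the N prefix; embedded single quotes are doubled.
        return "N'" + value.replace("'", "''") + "'"
    if isinstance(value, (datetime.date, datetime.time)):
        # date, time and datetime (a subclass of date): quoted, no N prefix.
        return "'" + value.isoformat() + "'"
    if isinstance(value, uuid.UUID):
        # uniqueidentifier: quoted, no N prefix.
        return "'" + str(value) + "'"
    if isinstance(value, bool):
        # Checked before int because bool is a subclass of int.
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    print(tsql_literal("value 〛 from form"))
    print(tsql_literal(datetime.date(2020, 1, 2)))
```

Building literals by hand like this is shown only to make the N-prefix rule concrete; it is not a substitute for bound parameters.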

I had found translations.php but was informed that isn't the main code path for DB interaction. However, I didn't see any other place with SQL Server-specific T-SQL. The rest all appeared to be MySQL-specific SQL.

I found a line in the main DB include file that is at least very close (conceptually) to what I'm looking for:

https://github.com/ProjectNami/projectnami/blob/master/wp-includes/wp-db.php#L1351

That line wraps string parameters in single quotes while the SQL is being prepared. But everything else around it still appears to be MySQL-specific, so I doubt that it is truly the place I'm looking for.
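If quoting does happen in a prepare step of that kind, the change being looked for would amount to emitting N'%s' instead of '%s' for string placeholders. Here is a hedged Python sketch of the idea. The `prepare` function below is hypothetical and only mimics the general shape of placeholder-based preparation; it is not the code in wp-db.php.

```python
import re


def prepare(query: str, *args) -> str:
    """Substitute string arguments into %s placeholders as N'...' literals."""
    # Normalize placeholders the caller already quoted, so they are not
    # quoted twice.
    query = query.replace("'%s'", "%s").replace('"%s"', "%s")
    # Wrap each bare %s (but not an escaped %%s) as an N-prefixed literal.
    query = re.sub(r"(?<!%)%s", "N'%s'", query)
    # Double embedded single quotes, then fill in the placeholders.
    escaped = tuple(str(arg).replace("'", "''") for arg in args)
    return query % escaped


if __name__ == "__main__":
    print(prepare("INSERT INTO t (title, body) VALUES (%s, %s)", "〛", "𒍅"))
```

With the N prefix in place, the literal is typed as NVARCHAR from the start, so no code-page conversion should occur before the value reaches an NVARCHAR column.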

I did see this open issue, #303, but I'm not sure whether it pertains to what I'm experiencing.
