<format>: Incorrect handling of UTF-8 encoded format strings #1820
Comments
There is no way to detect the execution charset at compile time, so we assume that the active code page and the execution charset are the same. This is obviously broken, but it's what we do throughout the STL :(
You can easily detect if the literal encoding is UTF-8 at compile time as follows:
constexpr bool is_utf8() {
  const unsigned char micro[] = "\u00B5";
  return sizeof(micro) == 3 && micro[0] == 0xC2 && micro[1] == 0xB5;
}
This has been suggested for P0355 and elsewhere. Shift-JIS can be detected similarly and, in fact, you might only need to detect Shift-JIS because it's the one that collides with UTF-8 and requires special handling at parsing. And of course you don't need to detect single-byte encodings, which are the vast majority.
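As a rough sketch of the "detected similarly" remark, the same trick can be applied to a character whose Shift-JIS encoding is distinctive. The helper names below are hypothetical and not part of the STL. The probe character U+30A2 (KATAKANA LETTER A) is my own choice; it encodes as 0x83 0x41 in Shift-JIS and as three bytes in UTF-8. This assumes the compiler can represent the character in its literal encoding at all; with a single-byte literal encoding the compiler may substitute a replacement character or reject the literal.

```cpp
// Hypothetical helpers for illustration; the names are assumptions, not STL APIs.

// True if ordinary string literals are encoded as UTF-8.
constexpr bool literal_is_utf8() {
    const unsigned char micro[] = "\u00B5"; // MICRO SIGN: C2 B5 in UTF-8
    return sizeof(micro) == 3 && micro[0] == 0xC2 && micro[1] == 0xB5;
}

// True if ordinary string literals are encoded as Shift-JIS.
constexpr bool literal_is_shift_jis() {
    const unsigned char kana[] = "\u30A2"; // KATAKANA LETTER A: 83 41 in Shift-JIS
    return sizeof(kana) == 3 && kana[0] == 0x83 && kana[1] == 0x41;
}

// The two encodings are mutually exclusive, so at most one probe can succeed.
static_assert(!(literal_is_utf8() && literal_is_shift_jis()),
              "a literal encoding cannot be both UTF-8 and Shift-JIS");
```

Both functions are usable in constant expressions, so a parser could select its code path with `if constexpr (literal_is_utf8())` without any run-time cost.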
Yeah, I was just going to paste https://godbolt.org/z/YhnnrdT6e.
We also need to detect Big5 and some of the ISO-2022 encodings. But if that detection works, we can special-case it and not be broken for UTF-8.
Well, we don't need to detect them, but we want to work for them.
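To illustrate why a double-byte locale encoding can break parsing of a UTF-8 format string, here is a minimal sketch. It is not the STL's actual parser; the function names are hypothetical, and the lead-byte ranges (0x81-0x9F and 0xE0-0xFC) are the ones commonly given for code page 932. A scanner that skips the byte after every Shift-JIS lead byte will swallow a `{` that follows certain UTF-8 sequences.

```cpp
#include <cstddef>
#include <string_view>

// Assumed lead-byte ranges for Shift-JIS (code page 932).
constexpr bool is_sjis_lead(unsigned char c) {
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

// Hypothetical scanner: counts '{' bytes in s. When treat_as_sjis is true,
// the byte after each Shift-JIS lead byte is skipped as a trail byte.
constexpr std::size_t count_open_braces(std::string_view s, bool treat_as_sjis) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const auto c = static_cast<unsigned char>(s[i]);
        if (treat_as_sjis && is_sjis_lead(c)) {
            ++i; // skip the presumed trail byte
            continue;
        }
        if (c == '{') {
            ++n;
        }
    }
    return n;
}
```

For the bytes `"\xE3\x81\x82{}"` (U+3042 in UTF-8 followed by `{}`), `count_open_braces` finds one `{` when the bytes are scanned as-is, but none when they are scanned as Shift-JIS: 0xE3 consumes 0x81, and then 0x82 consumes the `{`, leaving an unmatched `}`.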
@statementreply Closing this as fixed now that your #1824 (and @CaseyCarter's #1834) are merged; it wasn't auto-closed because the Word Of Power "Fixes #NNN" works for the default branch only (as documented in GitHub docs, but which I wasn't aware of until now).
Describe the bug
std::format throws on formatting a valid UTF-8 string when the literal (execution) encoding is UTF-8 and the locale encoding is Shift-JIS (and possibly other cases). This is wrong because format strings are almost always literals and therefore the locale encoding shouldn't affect parsing.
Command-line test case
Expected behavior
The test program should print 1.
STL version
https://github.com/microsoft/STL/commit/4d7d4f1