Wrong formatting of string arguments containing umlauts #2888
Comments
I think I made some progress. I somehow got the idea that both UTF-8 and ISO-8859-15 encode the letter 'ö' as 0xF6, but this is wrong:
UTF-8: 0xC3 0xB6
ISO-8859-15: 0xF6
0xF6 translates to 11110110, which starts with four 1s. In UTF-8, a first byte of that form means the character is encoded in 4 bytes. I have never worked deeply with encodings, so I'm not sure if this is really correct, but from what I see it makes sense. Can somebody confirm that?
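The bit-pattern reasoning above can be checked with a small sketch. It is not taken from fmt: `utf8_length_from_lead` is a hypothetical helper that only inspects the first byte, the way a UTF-8 decoder would, and it assumes the input is meant to be UTF-8.

```cpp
// Hypothetical helper (not fmt's API): the number of bytes a UTF-8 sequence
// claims to occupy, judged only by the bit pattern of its first byte.
// Returns 0 for a continuation byte (10xxxxxx) or an 11111xxx byte.
constexpr int utf8_length_from_lead(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // not a valid first byte
}
```

With this rule, 0xC3 (the first UTF-8 byte of 'ö') yields 2, while 0xF6 (the ISO-8859-15 byte for 'ö') yields 4, which matches the value observed from code_point_length.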
If my comment above is correct, this would mean that libfmt has UTF-8-specific functions that cannot work with other encodings. Since other popular non-UTF-8 encodings are in use, I think this will cause issues for many users of libfmt.
Fixed in 358f5a7, thanks for reporting. Now [...]
Thanks for the fast fix!
repro: https://godbolt.org/z/T5PWoEhdW
Issue
We switched from libfmt 6 to 8 on our Red Hat Linux development environments and noticed a change in formatting behavior. When using precision formatting on strings (e.g. %-5.5s or {0:<5.5}, see repro for test case), fmt adds spaces at the end of the output (the more umlauts, the more spaces).
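For comparison, the byte-oriented result expected here can be sketched with the C library's `snprintf`, which for a narrow `%s` treats both width and precision as byte counts. `pad_left_5_5` is a hypothetical helper name, and the input bytes are assumed to be ISO-8859-15, where each umlaut is a single byte.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: left-align `s` in a field of width 5 with precision 5,
// using the C library's byte-oriented "%-5.5s" conversion.
std::string pad_left_5_5(const char* s) {
    char buf[16];
    int n = std::snprintf(buf, sizeof buf, "%-5.5s", s);
    // The result is always exactly 5 bytes, so it fits in the buffer.
    assert(n >= 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Under these assumptions, the three bytes `\xE4\xF6\xFC` ('äöü' in ISO-8859-15) come back followed by exactly two spaces, for five bytes in total.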
Details
I tried to debug the library to find where the problem comes from. It looks like the extra spaces originate in the function 'code_point_length' in core.h. The umlaut causes this function to return 4. If I hard-code the result to 1, I get the expected result.
This happens on our Red Hat Linux servers with LC_CTYPE=en_US.iso885915.
I will try to continue, but I have to admit that I am struggling a bit with this one.
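One way such a miscount could turn into extra padding is sketched below. This is an illustration only, not fmt's actual implementation: `length_from_lead`, `count_by_lead_byte`, and `field_padding` are hypothetical names, and the sketch assumes the displayed width is derived by stepping through the bytes with a UTF-8 first-byte rule while all bytes are still written out.

```cpp
#include <cstddef>

// Hypothetical: sequence length implied by a first byte under UTF-8 rules.
constexpr int length_from_lead(unsigned char lead) {
    if (lead < 0x80) return 1;            // ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx (includes 0xF6)
    return 1;  // continuation or invalid first byte: step over one byte
}

// Hypothetical: counts "characters" by jumping ahead by the length that
// each first byte claims, without checking the bytes that follow.
constexpr std::size_t count_by_lead_byte(const char* s, std::size_t n) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < n) {
        i += static_cast<std::size_t>(
            length_from_lead(static_cast<unsigned char>(s[i])));
        ++count;
    }
    return count;
}

// Hypothetical: number of fill characters needed to reach `width`.
constexpr std::size_t field_padding(std::size_t width, std::size_t counted) {
    return counted < width ? width - counted : 0;
}
```

For the ISO-8859-15 bytes `\xF6`, `a`, `b` ('öab') and a field width of 5, the lead-byte walk counts 1 character and asks for 4 spaces, whereas a byte count of 3 asks for 2. Forcing the length to 1 per byte, as tried above, gives the byte count again.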