Skip to content

Conversation

@bschoenmaeckers
Copy link
Member

@bschoenmaeckers bschoenmaeckers commented Sep 17, 2025

While testing OsStr conversions with ill-formed UTF-16 strings on windows, I found out that the current implementation is wrong.

When wstr is NULL, instead return the size that would be required to store all of unicode including a terminating null.

Note that the resulting wchar_t* string may or may not be null-terminated. It is the responsibility of the caller to make sure that the wchar_t* string is null-terminated in case this is required by the application. Also, note that the wchar_t* string might contain null characters, which would cause the string to be truncated when used with most C functions.

PyUnicode_AsWideChar returns the size including the null byte, but does not copy the null byte. This fails the assert in this function. Furthermore we want the byte slice without the null byte.

ref #5368

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to assess if this is better to class as a "fix" or a breaking change. At the moment I guess the result is that we always include a trailing NUL on the extracted string, only on Windows?

So this is a good consistency fix, to match the other platforms, but it also might cause issues in the worst case? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current case is that it can only succeed when the string is valid utf8. Because it will take the shortcut using utf8 conversion. When the string is WTF16 it will continue and use PyUnicode_AsWideChar. But this currently never succeeds because the assertion assert_eq!(bytes_read, size); always fails (bytes_read does not contain the null byte but size does). So this is not a breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess no one is hitting this assert because valid utf8 is the common case.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

@davidhewitt davidhewitt added this pull request to the merge queue Sep 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 18, 2025
@bschoenmaeckers bschoenmaeckers added this pull request to the merge queue Sep 18, 2025
@bschoenmaeckers bschoenmaeckers removed this pull request from the merge queue due to a manual request Sep 18, 2025
@davidhewitt davidhewitt added this pull request to the merge queue Sep 19, 2025
Merged via the queue into PyO3:main with commit 5e10ea1 Sep 19, 2025
45 checks passed
@bschoenmaeckers bschoenmaeckers deleted the os-str-conversions branch September 20, 2025 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants