-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve std::format's width estimation #3903
Conversation
improve generator source code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should no longer cite N4928, since it didn't contain the changes in P2675R1.
The first C++26 working draft is likely to complete before landing this PR. But I think it makes sense to still cite N4950 since it the final draft of C++23, and MSVC STL is currently citing N4950 in the whole product codes.
Co-authored-by: A. Jiang <[email protected]>
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
tools/unicode_properties_parse/format_width_estimate_intervals.cpp
Outdated
Show resolved
Hide resolved
// Copyright (c) Microsoft Corporation. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
|
||
// The following code generates data for `_Width_estimate_intervals` in <format>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tools/unicode_properties_parse/format_width_estimate_intervals.cpp
Outdated
Show resolved
Hide resolved
tools/unicode_properties_parse/format_width_estimate_intervals.cpp
Outdated
Show resolved
Hide resolved
tools/unicode_properties_parse/format_width_estimate_intervals.cpp
Outdated
Show resolved
Hide resolved
tools/unicode_properties_parse/format_width_estimate_intervals.cpp
Outdated
Show resolved
Hide resolved
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
add const
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the full list of all remaining issues I'm concerned about. Aside from these issues, should I also offer a test for all affected ranges(those in the annex of P267521) in this pr?
table_u read_from(ifstream& source) { | ||
table_u table; | ||
|
||
// "The unassigned code points in the following blocks default to "W":" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the citation, the original comment in "EastAsianWidth.txt"
for 20000
~2FFFFD
, 30000
~3FFFFD
is "All undesignated code points in Planes 2 and 3, whether inside or outside of allocated blocks, default to "W" : "
What's its difference with this sentence? Aren't they of the same meaning?
Also see #3903 (comment) and #3903 (comment)
(My idea is that we can safely keep this part, as it is not harmful, and hopefully for a long time will not become outdated.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the question here - where is this comment coming from, and why is it different from what EastAsianWidth.txt says? Can't we just quote the EastAsianWidth.txt comment verbatim?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC EastAsianWidth.txt
is trying to communicate that not all of Planes 2 & 3 are allocated. In particular, the following ranges are not allocated for any block:
- 0002A6E0-0002A6FF
- 0002EE50-0002F7FF
- 0002FA20-0002FFFD
- 000323B0-0003FFFD
But I don't think we care about the relationship between planes and blocks. For our purposes, they are just "big areas" and "small areas".
This comment was marked as resolved.
This comment was marked as resolved.
add comments for `_Unicode_width_estimate`; add test cases: all 2->1 cases; and some 1->2 cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, we should create an issue to port the generator to python.
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
Thanks for implementing this C++23 feature and bringing the codebase closer to completeness! ✅ 🚀 😻 |
Thanks for consistently citing N4950 in the whole product code! |
Resolves #3446
This pr introduces a generator for the new
_Width_estimate_intervals
table. The new project should includeEastAsianWidth.txt
(link) somewhere.Here is its raw output; the
"Old table:"
part can help to confirm that the interval-generating algorithm is correct; the"Was 1, now 2:" and "Was 2, now 1:"
part can help to confirm that this implementation is conformant to the standard (by comparing them with those in the annex of the paper)Output
Thanks @frederick-vs-ja for pointing out that the old
_Width_estimate_intervals
table can be overwritten directly 👀