-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use version-sorting for all sorting #115046
Conversation
Add a description of a version-sorting algorithm. (This algorithm does not precisely match `strverscmp`; it's intentionally simpler in its handling of leading zeroes, and produces a result easier for humans to easily understand and do by hand.) Change all references to sorting to use version-sorting. Change all references to "ASCIIbetically" to instead say "sort non-lowercase before lowercase".
(rustbot has picked a reviewer for you, use r? to override) |
Some changes occurred in src/doc/style-guide cc @rust-lang/style |
src/doc/style-guide/src/README.md
Outdated
leading zeroes, longer sequences of digits are larger numbers, and | ||
equal-length sequences of digits can be sorted lexicographically.) | ||
- If the numbers have the same numeric value, the one with more leading zeroes | ||
comes first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a deep reason for this? I think I would have expected the one with more leading zeroes to come later, similar to how with usual lexicographic ordering the longer string (in that case with more trailing characters) comes later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roughly, consistency with numeric sorting. Suppose you have 000, 001, 002, ..., 010, 011, ..., 100, 101. We sort those in that order, not for any reason involving the leading zeroes, but because of their numeric value. But the net effect is also that numbers with three leading zeroes come first, then two leading zeroes, then one leading zero, then no leading zeroes. So it seemed consistent, to me, to put "more leading zeroes" first in this case too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I follow (roughly), with a couple inline questions for clarity.
I don't have any strong opinions one way or another about the resultant sort order, but I do think it would be worth expanding on the set of examples (if not in the guide itself, then at least as for discussion with t-style members to help everyone visualize the differences with current state)
src/doc/style-guide/src/README.md
Outdated
For the purposes of the Rust style, to compare two strings for version-sorting: | ||
|
||
- Compare the strings by (Unicode) character as normal, finding the index of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I presume correctly that the usage of the term "strings" here does not bound the following prescriptions just to literal strings? I.e. we want the same algorithm to also apply in all other sorting contexts (e.g. idents in imports)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the intention of "strings" in "compare two strings for version-sorting" here is in the sense of it being a string to the tool parsing it, not that it's a literal string in the code being parsed.
Suggestions for clearer, more unambiguous wording welcome.
@joshtriplett: Would you like to start an FCP on this, or do you want to have synchronous discussion on this? If you want sync discussion, please re-nominate this, otherwise just kick off the FCP 😺. |
cc rust-lang/style-team#143 as well |
@rfcbot merge |
Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members: Concerns:
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
@rfcbot concern require all checkboxes (we require all members to provide their check due to small team size) |
@rfcbot resolve require all checkboxes |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
Should that last bit be "b" vs "a" or have I gotten confused somewhere 🤔 |
Oops, yes, sorry. The keys are ... err ... like right next to each other? Edited to fix it, thanks. |
We discussed this at length in today's @rust-lang/style meeting. For context on my original writeup of this algorithm, I have no attachment to being anywhere close to the What we discussed and agreed on in the style meeting was that we shouldn't be path-dependent and end up with " We're also mindful of the fact that this is mostly a sort for Rust identifiers (and possibly future Rust style needs like We do need to cover more corner cases in the document (it included some, but not all the ones raised here), by listing a series of identifiers and how they sort. (We don't think it's critical to document every difference between this and other version-sorting algorithms, but we can give a high-level difference like "This differs from other version-sorting algorithms, notably in the treatment of underscores and leading zeroes.".) Properties we want to achieve, as agreed in the style meeting:
I'm going to revise this proposal to support these properties. |
@ijackson wrote:
As far as I can tell, we're going to need an explicit check at the end of identical-other-than-leading-zeroes strings based on remembering the first leading-zero difference. We shouldn't just do "if they're identical other than leading zeroes, compare lexically", because AFAICT that would produce this order: I think we should do one of the following two orders, for consistency:
Of the two, I would favor the former over the latter, because other than the case of numeric substrings that are all zero but have a different number of zeroes (which seems far more unlikely), it's consistent with lexical ordering ( |
Is that because it's like hex? The problem with this is that it requires a fallback rescan, which makes the algorithm more difficult to explain. I don't think we actually care very much about the precise ordering of strings which differ only in number of leading zeroes in numerical components. So I think we could have an algorithm which doesn't have a fallback rescan, and just has a rule which treats numerical pieces with the same numerical vlaue but different lengths, as unequal right away, rather than treating them as equal now and looking at them again later if the rest of the string is equal too. But, the way it looks like hex got me thinking. Can we make it so that identifiers containing hex values come out in the right order? Identifiers with hex in aren't rare. Ideally we don't want an algorithm which guesses the base, so it falls out of the rest of the algorithm, but I'm not sure that's possible. Ie for hex I think we want: This is hard because If we had a very simple base-guessing heuristic perhaps people woudl be able to structure their identifiers to trigger it. "Previous character was |
|
No, this is nothing to do with hex; that the examples I gave started with an
The proposal I was making was "when comparing numeric components, compare by numeric value and ignore the number of leading zeroes". We don't have to do a second pass, though; if they're equal, we go on to the next group, but we remember if one or the other had less leading zeroes, and if we get to the end and the two strings happened to be equal in every group, we base the return value on that first difference.
That's a reasonable thought. As you noted, I don't think there's any way to make hex sort numerically without attempting to detect hexadecimal. And I think we're not going to be able to do that reliably. My current inclination would be to not attempt to handle hexadecimal numbers at all. |
Treat numeric chunks with equal value but differing numbers of leading zeroes as equal, unless we get to the end of the entire string in which case we use "more leading zeroes in the earliest differing chunk" as a tiebreaker. Treat `_` as a word separator, sorting it before anything other than space. Give more examples.
@yaahc I've addressed the concern; could you please review? |
@rfcbot resolve discuss edge-cases raised by ijackson |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
LGTM too now, thanks for the updates :-) |
@bors r+ |
@bors rollup |
…iaskrgr Rollup of 11 pull requests Successful merges: - rust-lang#115046 (Use version-sorting for all sorting) - rust-lang#118915 (Add some comments, add `can_define_opaque_ty` check to `try_normalize_ty_recur`) - rust-lang#119006 (Fix is_global special address handling) - rust-lang#119637 (Pass LLVM error message back to pass wrapper.) - rust-lang#119715 (Exhaustiveness: abort on type error) - rust-lang#119763 (Cleanup things in and around `Diagnostic`) - rust-lang#119788 (change function name in comments) - rust-lang#119790 (Fix all_trait* methods to return all traits available in StableMIR) - rust-lang#119803 (Silence some follow-up errors [1/x]) - rust-lang#119804 (Stabilize mutex_unpoison feature) - rust-lang#119832 (Meta: Add project const traits to triagebot config) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#115046 - joshtriplett:master, r=compiler-errors Use version-sorting for all sorting Add a description of a version-sorting algorithm. (This algorithm does not precisely match `strverscmp`; it's intentionally simpler in its handling of leading zeroes, and produces a result easier for humans to easily understand and do by hand.) Change all references to sorting to use version-sorting. Change all references to "ASCIIbetically" to instead say "sort non-lowercase before lowercase".
Pkgsrc changes: * Adapt checksums and patches. Upstream chnages: Version 1.77.0 (2024-03-21) ========================== - [Reveal opaque types within the defining body for exhaustiveness checking.] (rust-lang/rust#116821) - [Stabilize C-string literals.] (rust-lang/rust#117472) - [Stabilize THIR unsafeck.] (rust-lang/rust#117673) - [Add lint `static_mut_refs` to warn on references to mutable statics.] (rust-lang/rust#117556) - [Support async recursive calls (as long as they have indirection).] (rust-lang/rust#117703) - [Undeprecate lint `unstable_features` and make use of it in the compiler.] (rust-lang/rust#118639) - [Make inductive cycles in coherence ambiguous always.] (rust-lang/rust#118649) - [Get rid of type-driven traversal in const-eval interning] (rust-lang/rust#119044), only as a [future compatiblity lint] (rust-lang/rust#122204) for now. - [Deny braced macro invocations in let-else.] (rust-lang/rust#119062) Compiler -------- - [Include lint `soft_unstable` in future breakage reports.] (rust-lang/rust#116274) - [Make `i128` and `u128` 16-byte aligned on x86-based targets.] (rust-lang/rust#116672) - [Use `--verbose` in diagnostic output.] (rust-lang/rust#119129) - [Improve spacing between printed tokens.] (rust-lang/rust#120227) - [Merge the `unused_tuple_struct_fields` lint into `dead_code`.] (rust-lang/rust#118297) - [Error on incorrect implied bounds in well-formedness check] (rust-lang/rust#118553), with a temporary exception for Bevy. - [Fix coverage instrumentation/reports for non-ASCII source code.] (rust-lang/rust#119033) - [Fix `fn`/`const` items implied bounds and well-formedness check.] (rust-lang/rust#120019) - [Promote `riscv32{im|imafc}-unknown-none-elf` targets to tier 2.] (rust-lang/rust#118704) - Add several new tier 3 targets: - [`aarch64-unknown-illumos`] (rust-lang/rust#112936) - [`hexagon-unknown-none-elf`] (rust-lang/rust#117601) - [`riscv32imafc-esp-espidf`] (rust-lang/rust#119738) - [`riscv32im-risc0-zkvm-elf`] (rust-lang/rust#117958) Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Implement `From<&[T; N]>` for `Cow<[T]>`.] (rust-lang/rust#113489) - [Remove special-case handling of `vec.split_off (0)`.](rust-lang/rust#119917) Stabilized APIs --------------- - [`array::each_ref`] (https://doc.rust-lang.org/stable/std/primitive.array.html#method.each_ref) - [`array::each_mut`] (https://doc.rust-lang.org/stable/std/primitive.array.html#method.each_mut) - [`core::net`] (https://doc.rust-lang.org/stable/core/net/index.html) - [`f32::round_ties_even`] (https://doc.rust-lang.org/stable/std/primitive.f32.html#method.round_ties_even) - [`f64::round_ties_even`] (https://doc.rust-lang.org/stable/std/primitive.f64.html#method.round_ties_even) - [`mem::offset_of!`] (https://doc.rust-lang.org/stable/std/mem/macro.offset_of.html) - [`slice::first_chunk`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first_chunk) - [`slice::first_chunk_mut`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.first_chunk_mut) - [`slice::split_first_chunk`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first_chunk) - [`slice::split_first_chunk_mut`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_first_chunk_mut) - [`slice::last_chunk`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last_chunk) - [`slice::last_chunk_mut`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.last_chunk_mut) - [`slice::split_last_chunk`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last_chunk) - [`slice::split_last_chunk_mut`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.split_last_chunk_mut) - [`slice::chunk_by`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunk_by) - [`slice::chunk_by_mut`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.chunk_by_mut) - [`Bound::map`] (https://doc.rust-lang.org/stable/std/ops/enum.Bound.html#method.map) - [`File::create_new`] (https://doc.rust-lang.org/stable/std/fs/struct.File.html#method.create_new) - [`Mutex::clear_poison`] (https://doc.rust-lang.org/stable/std/sync/struct.Mutex.html#method.clear_poison) - [`RwLock::clear_poison`] (https://doc.rust-lang.org/stable/std/sync/struct.RwLock.html#method.clear_poison) Cargo ----- - [Extend the build directive syntax with `cargo::`.] (rust-lang/cargo#12201) - [Stabilize metadata `id` format as `PackageIDSpec`.] (rust-lang/cargo#12914) - [Pull out as `cargo-util-schemas` as a crate.] (rust-lang/cargo#13178) - [Strip all debuginfo when debuginfo is not requested.] (rust-lang/cargo#13257) - [Inherit jobserver from env for all kinds of runners.] (rust-lang/cargo#12776) - [Deprecate rustc plugin support in cargo.] (rust-lang/cargo#13248) Rustdoc ----- - [Allows links in markdown headings.] (rust-lang/rust#117662) - [Search for tuples and unit by type with `()`.] (rust-lang/rust#118194) - [Clean up the source sidebar's hide button.] (rust-lang/rust#119066) - [Prevent JS injection from `localStorage`.] (rust-lang/rust#120250) Misc ---- - [Recommend version-sorting for all sorting in style guide.] (rust-lang/rust#115046) Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Add more weirdness to `weird-exprs.rs`.] (rust-lang/rust#119028)
Add a description of a version-sorting algorithm. (This algorithm does
not precisely match
strverscmp
; it's intentionally simpler in itshandling of leading zeroes, and produces a result easier for humans to
easily understand and do by hand.)
Change all references to sorting to use version-sorting.
Change all references to "ASCIIbetically" to instead say "sort
non-lowercase before lowercase".