-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC2603] Extend <const>
to include str
and structural constants.
#3161
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Based on the RFC having been a @rfcbot merge |
Team member @eddyb has proposed to merge this. The next step is review by the rest of the tagged team members:
Concerns:
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
<const-data> = ["n"] {<hex-digit>} "_" | ||
<const-fields> = "U" // X | ||
| "T" {<const>} "E" // X(a, b, c, ...) | ||
| "S" {<identifier> <const>} "E" // X { field: value, ... } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why U
is necessary, UnitStruct
and UnitStruct {}
are the same constant, so U
can be S {} E
, the only benefit is slight compression.
I'm also not sure why identifiers for field names are necessary, fields are uniquely identifiable by their indices, so "S" {<const>} "E"
should work equally well?
In other words, "S" {<const>} "E"
appears to be usable for any structs and variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also "just" flatten the constant and only put the leaf values in, we wouldn't need any names or tuple/array distinctions and nesting.
The only reason we tell apart any of these things is being able to recover useful syntax for the users, and I somewhat assumed that to be a given, I suppose.
I'm obviously biased towards the approach I implemented, but maybe you could register a concern with rfcbot, that could get discussed by compiler team? (though I'm not sure in what kind of meeting)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say we keep the U
just in order to make it consistent: we already encode ()
as u
where we could also encode it as an empty tuple TE
.
Nominating since we're close to being able to turn |
After last weeks discussion I think the main open question is what to do with field names in struct values. As @petrochenkov points out, from the perspective of keeping symbol names unambiguous the field names are not necessary. They would only be needed for making demangler output a bit more readable. Without them you would get
instead of
As the grammar is now, we unfortunately can't just make the field name optional: both That being said, I don't think we can make a decision without having numbers on how much field names cost us in terms of symbol name length/compile times/binary sizes. So my suggestion is:
All of the above would need to happen before const generics are fully stabilized. I don't think the new mangling scheme should be blocked on this one aspect that only comes to bear when using an unstable feature. |
FWIW there are two ways one could do that, while staying within the already-implemented syntax:
If neither of these is satisfactory, we can always come up with a different letter (than |
Error: This repository is not enabled to use triagebot. Please let |
I assume that there are no plans to support unions (since it's not clear what it would mean to encode one)? |
Correct, though they are disallowed by In general, you would need a way to know whether For edge cases that may eventually be supported (such as single-case |
…woerister,oli-obk rustc_symbol_mangling: support structural constants and &str in v0. This PR should unblock rust-lang#85530 (except for float `const` generics, which AFAIK should've never worked). (cc `@tmiasko` could the rust-lang#85530 (comment) failures be retried with a quick crater "subset" run of this PR + changing the default to `v0`? Just to make sure I didn't miss anything other than the floats) The encoding is the one suggested before in e.g. rust-lang#61486 (comment), tho this PR won't by itself finish rust-lang#61486, before closing that we'd likely want to move to `@oli-obk's` "valtrees" (i.e. rust-lang#83234 and other associated work). <hr> **EDITs**: 1. switched unit/tuple/braced-with-named-fields `<const-fields>` prefixes from `"u"`/`"T"`/`""` to `"U"`/`"T"`/`"S"` to avoid the ambiguity reported by `@tmiasko` in rust-lang#87194 (comment). 2. `rustc-demangle` PR: rust-lang/rustc-demangle#55 3. RFC amendment PR: rust-lang/rfcs#3161 * also removed the grammar changes included in that PR, from this description 4. added tests (temporarily using my fork of `rustc-demangle`) <hr> r? `@michaelwoerister`
Feedback from @Havvy elsewhere (#3224 (comment)): we should probably not be amending RFCs. Even if we make an exception for "fixing the RFC to reflect reality as it shipped" (#3224 (comment)), we would still want to split this PR into two:
The first half would be roughly this (plus comments and -<const> = <type> <const-data>
+<const> = <int-type> <const-int>
+ | "b" <const-int> // false, true
+ | "c" <const-int> // '...' Whereas the second half would be adding the new structural cases to |
Hello checking this RFC for current status: are the raised concerns (comment) still to be untangled, has there been any progress? Is there a path to get this RFC approved or do I read correctly that the general feeling is close it? thanks @eddyb for an update (and sorry for my hand-waving comment, can't fully grasp the context here 😅 ) |
@apiraino This amendment is describing functionality that has been implemented for almost two years (but I believe is still nightly-only, simply because I don't know if this should be the responsibility of people involved with mangling (@wesleywiser @michaelwoerister @ehuss, off the top of my head), or with const generics (@oli-obk @BoxyUwU @lcnr, maybe?). Looking again at the comments above, #3161 (comment) seems to be describing an overall plan, and of its 5 steps, step 1. happened already, and step 2. is/will be happening soon (AIUI, anyway, I haven't kept up that well).
At most the specifics of the current form, maybe - something or other will have to be stabilized eventually, to support |
I think @michaelwoerister's plan in #3161 (comment) makes a lot of sense. Reducing the length of symbols by not encoding the field names as @petrochenkov suggests is very valuable especially for space constrained environments. I think the best way to determine the impact is to implement the feature with field names and then perform measurements to see what the cost actually ends up being. We may even want to allow users to decide for themselves; some users might prefer the smaller symbols and others prefer the more detailed symbols in backtraces. @petrochenkov would it resolve your concerns if we left whether to encode field names in the symbols as an open question to be resolved before we stabilize more complex types in generics? |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
Is this not merged because of the merge conflict? |
Sorry, this kind of fell of my plate. Process-wise, instead of amending RFCs, this should probably be done as a PR to update the documentation at https://github.com/rust-lang/rust/blob/master/src/doc/rustc/src/symbol-mangling/v0.md. However, I think part of my hesitation was that I wanted to avoid documenting unstable features. IIUC, some of these consts aren't stabilized, right? I'm a little uncertain how to handle that. Is the intent that these encodings are stable before the features themselves are stabilized? If so, we can just go ahead and document them. If not, then we might want to consider a different approach of how to document them. I would suggest that the encodings just get added to the docs whenever the corresponding const features are stabilized. I guess, part of my uncertainty was what was the intent of this RFC? |
From a pure grammar point of view, I don't see a problem making it expressive enough to enable encoding things that aren't stable yet, and then also document that. The only thing we need to make sure is that what we add to the grammar will actually be what we need for the stable version of complex const generics. But the set being proposed here looks pretty straightforward. |
@ehuss @michaelwoerister is the status of this RFC still uncertain? What could help to push this RFC to the finish line? I'm trying to understand if there are blockers or other things it depend on. Thanks |
ping @ehuss @michaelwoerister about status for this RFC. Thanks :-) |
This depends on two things, as far as I can tell:
|
The main point of instability with When references are involved there's data that we currently don't encode that we might want to start doing at some point, these are solely issues with allowing references in const generics and do not occur otherwise. In general see rust-lang/rust#120961 for an overview of what we're not currently representing in values for const generics. Some of the stuff we might want to start representing in const generic values would work perfectly fine in this scheme as is (e.g. padding bytes and what static a borrow is a reference to). Other stuff I have no idea how we would even handle mangling (e.g. bytes with provenance and more generally representing borrows as having their pointee be arbitrary bytes). The stuff that wouldn't work under this scheme I don't know how we could make it work under any scheme so I think that largely does not pose an issue for this RFC. My only real point would be that if we wind up forbidding references in const parameter types, the edit: sorry for any ghost pings, I originally had this message contain some other information but removed it |
Thanks for the info, @BoxyUwU!
Overall though, the fewer opaque hash values the better. |
I'm not sure what the process for amending RFCs is, but here goes nothing:
This is one of the last pieces of the
v0
mangling, namely arbitrary constant values (for the fullconst
generics feature). This has been tracked by rust-lang/rust#61486, and there's some discussion there.The main takeaway is that the mangling is structural (ADT-like tree with integer-like leaves), matching the structural equality that type-level constants are required to follow, for soundness reasons.
Accompanying implementation PRs:
The summary of the added forms is:
e
:str
, followed by bytes encoded as two hex nibbles per byteR
/Q
:&
/&mut
, followed by the pointee valueA...E
:[...]
, containing any number of element valuesT...E
:(...)
, containing any number of field valuesV
: named variant/struct
, followed by the constructor path and one of:U
: unit variant/struct
(e.g.None
)T...E
: tuple variant/struct
(e.g.Some(...)
), containing any number of field valuesS...E
: struct-like variant/struct
(e.g.Foo { ... }
), containing any number of (disambiguated-)identifier-prefixed (i.e. named) field valuesEven if there may be constants in the future not covered by these forms, we can rely on the nominal
V
form to encode all sorts of pseudo-paths (while waiting for demanglers to support dedicated manglings), such as these hypothetical examples:const::<SomeType>::h54723863eb99e89f
(hashed constant, masquerading as unitstruct
)core::mem::transmute::<usize, *mut T>(1)
(function call, masquerading as tuplestruct
)cc @michaelwoerister