Unify structures and enum variants in AST/HIR #28816
☔ The latest upstream changes (presumably #28521) made this pull request unmergeable. Please resolve the merge conflicts.
54c3b79 to f1a5608
I don't think we should unify structs and enums in the AST, only in the HIR, with the unification happening at the lowering stage. Is there a specific reason you want to unify the AST too? My reasoning is that the HIR should be as small as possible, but the AST should be as close as possible to the source text.
This unification simplifies code in libsyntax in the same way as everywhere else: structures and variants are parsed in the same way, pretty-printed in the same way, and the deriving logic is also simplified. (See the first commit; it's self-sufficient (i.e. passes make check) and reflects the unification in AST only and its consequences.) Besides,
Ah, and as I mentioned above, structs and enums are not unified completely like in type checker, only structs and enum variants are, i.e. it's still close to the source.
☔ The latest upstream changes (presumably #28697) made this pull request unmergeable. Please resolve the merge conflicts.
f1a5608 to 27de390
Rebased.
OK, that makes sense. I would like to see all data unified in the HIR (but not the AST), but that doesn't have to block this PR. I'll review later today.
#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Hash, Debug)]
pub struct StructDef {
    /// Fields, not including ctor
    pub fields: Vec<StructField>,
    /// ID of the constructor. This is only used for tuple- or enum-like
    /// structs.
    pub ctor_id: Option<NodeId>,
    pub id: NodeId,
I'm a bit worried by the new id setup (the old one was kind of bad too though). Why have an id here if not using it for just the constructor? And if you are using it for just the constructor, why change the name? In particular, structs are always items, which have their own id, so having an id here and in the Item is weird. It seems leaving the id in variant might be better.
I had an impression that in variants (unlike in structs) `id`s are used not only for their constructors, but for something else (as an `id` of "variant itself", I don't remember where exactly, need to investigate), i.e. `id`s for `{}`-variants are also used despite not being constructor `id`s. I can rename `id` back to `ctor_id` if that is not true.
Here's the first example I've found: https://github.com/rust-lang/rust/blob/master/src/librustc/front/map/collector.rs#L137
Fields of variants are parented to variant's id even if the variant doesn't have a constructor.
The variant ids are indeed used for identifying the variant itself (variants are items). The "ctor id" hack is to allow tuple-structs to have 2 item types - the struct itself is `Foo<T>` and the variant is `for<'a> fn(&'a T) -> Foo<T>`. Tuple-like variants just have the fn type.
It may be better to use the ctor id as the "variant id" of a tuple-like struct, but I didn't try that.
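To illustrate the "2 item types" point, here is a minimal sketch in current Rust. The type `Foo` and the helper functions are hypothetical and not taken from the compiler sources; the sketch only shows that the name of a tuple struct denotes both a type and a constructor function, which is why the compiler needs a separate identity for the constructor.

```rust
// Hypothetical example type; not part of the compiler.
#[derive(Debug, PartialEq)]
struct Foo<T>(T);

// In type position, `Foo<i32>` names the struct type.
fn wrap_directly(x: i32) -> Foo<i32> {
    Foo(x)
}

// In value position, `Foo` is the constructor, and it coerces to an
// ordinary function pointer.
fn wrap_via_ctor(x: i32) -> Foo<i32> {
    let ctor: fn(i32) -> Foo<i32> = Foo;
    ctor(x)
}

// Because the constructor is a function, it can be passed to `map`.
fn wrap_all(xs: Vec<i32>) -> Vec<Foo<i32>> {
    xs.into_iter().map(Foo).collect()
}
```

A dict-like struct such as `struct Bar { x: i32 }` has no such constructor function, which matches the remark that only tuple-like (and unit) forms need a ctor id.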
> It may be better to use the ctor id as the "variant id" of a tuple-like struct, but I didn't try that.

Isn't that what petrochenkov@5abb670 does? It merges "variant id" and "ctor id" and as a result we have one id in the outer `enum`/`struct` item with `enum`/`struct` type and another id inside of `variant.def` with `fn` type (also used as "variant id").
I've made a table! :)
Now:
What | Kind | NodeId in StructDef | NodeId in Variant | NodeId in Item |
---|---|---|---|---|
Variant in Enum | Unit | Used | N/A | Used |
Variant in Enum | Tuple | Used | N/A | Used |
Variant in Enum | Dict | Used | N/A | Used |
Struct | Unit | Used | N/A | Used |
Struct | Tuple | Used | N/A | Used |
Struct | Dict | Not used | N/A | Used |
After moving NodeId in Variant:
What | Kind | NodeId in StructDef | NodeId in Variant | NodeId in Item |
---|---|---|---|---|
Variant in Enum | Unit | Some(Not used) | Used | Used |
Variant in Enum | Tuple | Some(Not used) | Used | Used |
Variant in Enum | Dict | None | Used | Used |
Struct | Unit | Some(Used) | N/A | Used |
Struct | Tuple | Some(Used) | N/A | Used |
Struct | Dict | None | N/A | Used |
Personally, I like the first variant better
Nice, I so almost did this. I think what I am proposing is:
What | Kind | NodeId in Variant | NodeId in VariantKind |
---|---|---|---|
Variant in Enum | Unit | Used | Used |
Variant in Enum | Tuple | Used | Used |
Variant in Enum | Dict | Used | N/A |
Struct | Unit | N/A | Used |
Struct | Tuple | N/A | Used |
Struct | Dict | N/A | N/A |
But I think we disagree about where a ctor id is used - I thought that Tuple and Unit variants use their ctor id - tuple variants are certainly valid functions in Rust. In the first table there is also duplication between the `NodeId in StructDef` and the `NodeId in Item` in the non-variant struct cases (iiuc).
`Variant in Enum |...|Used |Used` will require some additional refactoring, because now both `Used` successfully share the same id. I'll look at it and at `with_fields(|field| { /* ... */ })` tomorrow.
(I've also updated the table with `NodeId in Item` column.)
I guess #28888 will have to be reverted if variant id and variant ctor id are split in HIR.
Can't we just stick a "ctor id" on the variant even in dict-like structs? Just using the struct id there would be fine too, as you can't really access the "ctor" of a dict-like struct.
I think this PR is a step in the right direction, but I think the structure of the AST types needs some refinement, and I'd like to improve the id story. I'll review in detail once these have been addressed.
Updated with some renamings.
Updated with this setup:
It doesn't even look as bad as I thought (with exception of folds). Edit: but personally, I'd revert this. I'll try to split variant id and variant ctor id tomorrow, although I'm still not sure if it is a good idea. |
I am not sure why we need separate variant id and variant ctor id. We need one id for the type (which we already have on an enum) + one id for each variant - it may be better to have a "ctor id" even on dict-like structs, even if not strictly required. All enums need v+1 node-ids - one for the enum, one for each variant. |
I think so, too, and the patch currently does this, because it's the simplest solution, but @nrc seems to not like unused NodeIds |
@arielb1, @petrochenkov I feel like I'm missing something - I don't want to make @petrochenkov do unnecessary work here if it's just because I'm not getting something here, so to go over it again: why is it better or simpler to have the id on the StructDef rather than the VariantData_? My motivation here is that as many constraints as possible should be explicit in the data structure - here it seems there is a constraint that dict-like structs as items should never have a ctor-id, that can be enforced in the data structure so we should (unless there is a reason not to). It seems an easy-enough mistake to make for some future developer to use that id by accident. Likewise, the constraint that Unit variants have no fields should be explicit in the data structure. Otherwise some syntax extension could add fields and they would be ignored by the compiler in some places and used in others and cause complex bugs. |
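The idea of making these constraints explicit in the data structure can be sketched as follows. This is only an illustration of the shape argued for in the comment above, not the code that was merged; `NodeId`, `StructField`, and this `VariantData` are simplified, hypothetical stand-ins for the compiler's types.

```rust
// Simplified, hypothetical stand-ins for the compiler's types.
type NodeId = u32;

#[derive(Debug, Clone, PartialEq)]
struct StructField {
    name: Option<String>,
}

// Assumed shape: the constructor id exists only where a constructor
// exists, and a unit variant has nowhere to store fields.
#[derive(Debug, Clone, PartialEq)]
enum VariantData {
    // Dict-like: fields, but no constructor id at all.
    Struct(Vec<StructField>),
    // Tuple-like: fields plus the id of the constructor function.
    Tuple(Vec<StructField>, NodeId),
    // Unit-like: a constructor id and no field list.
    Unit(NodeId),
}

impl VariantData {
    // Callers cannot read a ctor id of a dict-like struct by accident,
    // because the type forces them to handle `None`.
    fn ctor_id(&self) -> Option<NodeId> {
        match *self {
            VariantData::Struct(..) => None,
            VariantData::Tuple(_, id) | VariantData::Unit(id) => Some(id),
        }
    }
}
```

With this shape, a syntax extension has no way to attach fields to a unit variant, and no code path can observe an unused id on a dict-like struct. The cost is that every consumer has to match on the kind, which is the trade-off being debated in this thread.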
enum variants have I think there should be 1 id on the item + 1 id on each |
If we are talking about enforcing constraints in data structures, could we make tuple-like variants have only unnamed fields and dict-like variants have only named ones? |
@arielb1 There is some discussion of that (named/unnamed fields) inline, @petrochenkov thinks we iterate over fields more often without caring whether they are named or not. I find that persuasive. |
|
@arielb1 the intention being that you can always lookup the id on the |
The item id is always on the item. The variant/ctor id is always on the variant. There is no place for confusion, and anyway using the wrong id will tend to cause an ICE. |
@arielb1 ah, so the simplification is that you want the ctor id and thus you know that you are (or should be) dealing with a non-dict struct, then you can just look on the VariantData, rather than having to re-check the kind? |
Hm, can plugins/syntax extension modify HIR? If not, then we can check the invariants during lowering from AST and keep the data structures simple and uniform (i.e. unused |
Syntax extensions cannot modify the HIR, they operate only on the AST. But some kind of future compiler plugin could. Or some compiler dev could. The trouble with asserting invariants at one point like this is that it only ensures the invariant at that point. Since you can't expect a dev to know what happens throughout the compiler (or may happen in the future), to be safe they must assert it at every use. Whereas, if it is impossible for something to exist due to the shape of the data, then you are free to assume always (well, you don't have to) and can be safe. For the price of a little extra complexity you win big in terms of defensive programming. |
@nrc |
9e0bde2 to c11bf36
d7ac587 to 607b8c3
@nrc |
@bors: r+ |
📌 Commit 607b8c3 has been approved by |
⌛ Testing commit 607b8c3 with merge b9695f9... |
⛄ The build was interrupted to prioritize another pull request. |
This patch uses the same data structures for structs and enum variants in AST and HIR. These changes in data structures lead to noticeable simplification in most of code dealing with them.
I didn't touch the top level, i.e. `ItemStruct` is still `ItemStruct` and not `ItemEnum` with one variant, like in the type checker.

As part of this patch, structures and variants get the `kind` field making distinction between "normal" structs, tuple structs and unit structs explicit instead of relying on the number of fields and presence of constructor `NodeId`. In particular, we can now distinguish empty tuple structs from unit structs, which was impossible before! Comprehensive tests for empty structs are added and some improvements to empty struct feature gates are made. Some tests don't pass due to issue #28692, they are still there for completeness, but are commented out.

This patch fixes issue mentioned in #16819 (comment), now emptiness of tuple structs is checked after expansion.
It also touches #28750 by providing span for visit_struct_def

cc #28336

r? @nrc
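The three "empty" forms that the description distinguishes can be written out in current stable Rust as a small sketch. The type and function names are hypothetical, and at the time of this PR some of these forms were still behind feature gates, as the description notes.

```rust
// Hypothetical example types showing the three kinds of empty struct.
#[derive(Debug, PartialEq)]
struct Unit; // unit struct

#[derive(Debug, PartialEq)]
struct EmptyTuple(); // empty tuple struct

#[derive(Debug, PartialEq)]
struct EmptyDict {} // empty "normal" (dict-like) struct

fn make_unit() -> Unit {
    // The bare name of a unit struct is a value.
    Unit
}

fn make_empty_tuple() -> EmptyTuple {
    // The bare name of an empty tuple struct is a zero-argument constructor.
    let ctor: fn() -> EmptyTuple = EmptyTuple;
    ctor()
}

fn make_empty_dict() -> EmptyDict {
    // A dict-like struct is built only with braces.
    EmptyDict {}
}
```

All three have zero fields, so the number of fields alone cannot tell them apart; that is the reason an explicit kind is needed.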
match *self {
    VariantData::Struct(ref fields, _) | VariantData::Tuple(ref fields, _) => Some(fields),
    _ => None,
}.into_iter().flat_map(vec_iter)
Not sure if anyone's reading this anymore, but there's a much simpler way to write this method, because `&[]: &'static [T]`:

    pub fn fields(&self) -> slice::Iter<StructField> {
        match *self {
            VariantData::Struct(ref fields, _) | VariantData::Tuple(ref fields, _) => fields.iter(),
            _ => [].iter()
        }
    }

Could also just return the slice and the `for field in s.fields()` usage would be the same.
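A self-contained sketch of the slice-returning variant of this suggestion, written for current Rust. The types are simplified, hypothetical stand-ins (the real `StructField` and `NodeId` live in the compiler), and `fields_slice` is a name chosen here for illustration.

```rust
use std::slice;

// Simplified, hypothetical stand-ins for the compiler's types.
type NodeId = u32;

struct StructField {
    name: &'static str,
}

enum VariantData {
    Struct(Vec<StructField>, NodeId),
    Tuple(Vec<StructField>, NodeId),
    Unit(NodeId),
}

impl VariantData {
    // Return the fields as a slice; unit variants yield the empty
    // slice `&[]`, which is valid for any lifetime.
    fn fields_slice(&self) -> &[StructField] {
        match self {
            VariantData::Struct(fields, _) | VariantData::Tuple(fields, _) => fields.as_slice(),
            VariantData::Unit(_) => &[],
        }
    }

    // Iterator form, built on top of the slice form.
    fn fields(&self) -> slice::Iter<'_, StructField> {
        self.fields_slice().iter()
    }
}
```

Either form supports `for field in s.fields()`; with the slice form the loop can also be written `for field in s.fields_slice()`, and callers additionally get `len()` and indexing without an extra allocation or an `Option` wrapper.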
I'm reading! (I'm subscribed)

> `&[]: &'static [T]`

This is... not what I'd expect from a temporary, but it makes everything much simpler, yes.
It's a special case, until rvalue promotion is turned on (`&constexpr` has been pointing to `.rodata` from before 1.0, but, alas, no RFC).
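For readers on current Rust: references to constant expressions are now promoted to `'static` in general (rvalue static promotion, RFC 1414), so the special case for `&[]` is no longer special. A minimal sketch, with hypothetical function names:

```rust
// A reference to the empty array literal is promoted to a static, so it
// can be returned with the 'static lifetime for any element type.
fn empty_slice<T: 'static>() -> &'static [T] {
    &[]
}

// The same promotion applies to other constant expressions.
fn answer() -> &'static i32 {
    &42
}
```

Promotion applies only to constant expressions; a reference to a value computed at run time is still an ordinary temporary with a limited lifetime.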
And use `VariantData` instead of `P<VariantData>` in `Item_` and `Variant_` Improvements suggested by @eddyb in #28816 (comment) and #28816 (comment) plugin-[breaking-change] r? @eddyb