Unify structures and enum variants in AST/HIR #28816

petrochenkov · 2015-10-02T21:48:53Z

This patch uses the same data structures for structs and enum variants in AST and HIR. These changes in data structures lead to noticeable simplification in most of code dealing with them.
I didn't touch the top level, i.e. ItemStruct is still ItemStruct and not ItemEnum with one variant, like in the type checker.
As part of this patch, structures and variants get the kind field making distinction between "normal" structs, tuple structs and unit structs explicit instead of relying on the number of fields and presence of constructor NodeId. In particular, we can now distinguish empty tuple structs from unit structs, which was impossible before! Comprehensive tests for empty structs are added and some improvements to empty struct feature gates are made. Some tests don't pass due to issue #28692 , they are still there for completeness, but are commented out.
This patch fixes issue mentioned in #16819 (comment), now emptiness of tuple structs is checked after expansion.
It also touches #28750 by providing span for visit_struct_def
cc #28336

r? @nrc

bors · 2015-10-03T20:20:53Z

☔ The latest upstream changes (presumably #28521) made this pull request unmergeable. Please resolve the merge conflicts.

nrc · 2015-10-05T21:38:53Z

I don't think we should unify structs and enums in the AST, only in the HIR, with the unification happening at the lowering stage. Is there a specific reason you want to unify the AST too? My reasoning is that the HIR should be as small as possible, but the AST should be as close as possible to the source text.

petrochenkov · 2015-10-05T22:03:43Z

This unification simplifies code in libsyntax in the same way as everywhere else - structures and variants are parsed in the same way, pretty-printed in the same way, deriving logic is also simplified. (See the first commit, it's self-sufficient (i.e. passes make check) and reflect the unification in AST only and its consequences). Besides, kind should present in AST already, otherwise it can't be propagated to HIR.

petrochenkov · 2015-10-05T22:11:09Z

Ah, and as I mentioned above, structs and enums are not unified completely like in type checker, only structs and enum variants are, i.e. it's still close to the source.

bors · 2015-10-06T01:22:01Z

☔ The latest upstream changes (presumably #28697) made this pull request unmergeable. Please resolve the merge conflicts.

petrochenkov · 2015-10-06T10:26:38Z

Rebased.

nrc · 2015-10-06T18:53:47Z

OK, that makes sense. I would like to see all data unified in the HIR (but not the AST), but that doesn't have to block this PR. I'll review later today.

nrc · 2015-10-07T02:44:56Z

src/libsyntax/ast.rs

 #[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Hash, Debug)]
 pub struct StructDef {
    /// Fields, not including ctor
    pub fields: Vec<StructField>,
    /// ID of the constructor. This is only used for tuple- or enum-like
    /// structs.
-    pub ctor_id: Option<NodeId>,
+    pub id: NodeId,


I'm a bit worried by the new id setup (the old one was kind of bad too though). Why have an id here if not using it for just the constructor? And if you are using it for just the constructor, why change the name? In particular, structs are always items, which have their own id, so having an id here and in the Item is weird. It seems leaving the id in variant might be better.

I had an impression, that in variants (unlike in structs) ids are used not only for their constructors, but for something else (as an id of "variant itself", I don't remember where exactly, need to investigate), i.e. ids for {}-variants are also used despite not being constructor ids. I can rename id back to ctor_id if that is not true.

Here's the first example I've found: https://github.com/rust-lang/rust/blob/master/src/librustc/front/map/collector.rs#L137
Fields of variants are parented to variant's id even if the variant doesn't have a constructor.

@petrochenkov

The variant ids are indeed used for identifying the variant itself (variants are items). The "ctor id" hack is to allow tuple-structs to have 2 item types - the struct itself is Foo<T> and the variant is for<'a> fn(&'a T) -> Foo<T>. Tuple-like variants just have the fn type.

It may be better to use the ctor id as the "variant id" of a tuple-like struct, but I didn't try that.

@arielb1

It may be better to use the ctor id as the "variant id" of a tuple-like struct, but I didn't try that.

Isn't that what petrochenkov@5abb670 does? It merges "variant id" and "ctor id" and as a result we have one id in the outer enum/struct item with enum/struct type and another id inside of variant.def with fn type (also used as "variant id").

I've made a table! :)

Now:

What Kind NodeId in StructDef NodeId in Variant NodeId in Item

Variant in Enum Unit Used N/A Used

Variant in Enum Tuple Used N/A Used

Variant in Enum Dict Used N/A Used

Struct Unit Used N/A Used

Struct Tuple Used N/A Used

Struct Dict Not used N/A Used

After moving NodeId in Variant:

What Kind NodeId in StructDef NodeId in Variant NodeId in Item

Variant in Enum Unit Some(Not used) Used Used

Variant in Enum Tuple Some(Not used) Used Used

Variant in Enum Dict None Used Used

Struct Unit Some(Used) N/A Used

Struct Tuple Some(Used) N/A Used

Struct Dict None N/A Used

Personally, I like the first variant better

Nice, I so almost did this. I think what I am proposing is:

What Kind NodeId in Variant NodeId in VariantKind

Variant in Enum Unit Used Used

Variant in Enum Tuple Used Used

Variant in Enum Dict Used N/A

Struct Unit N/A Used

Struct Tuple N/A Used

Struct Dict N/A N/A

But I think we disagree about where a ctor id is used - I thought that Tuple and Unit variants use their ctor id - tuple variants are certainly valid functions in Rust. In the first table there is also duplication between the node id in StructDef and the NodeId in Item in the non-variant struct cases (iiuc).

Variant in Enum |...|Used |Used will require some additional refactoring, because now both Used successfully share the same id. I'll look at it and at with_fields(|field| { /* ... */ }) tomorrow.
(I've also updated the table with NodeId in Item column.)

I guess #28888 will have to be reverted if variant id and variant ctor id are splitted in HIR

Can't we just stick a "ctor id" on the variant even in dict-like structs? Just using the struct id there would be fine too, as you can't really access the "ctor" of a dict-like struct.

nrc · 2015-10-07T02:56:39Z

I think this PR is a step in the right direction, but I think the structure of the AST types needs some refinement, and I'd like to improve the id story. I'll review in detail once these have been addressed.

petrochenkov · 2015-10-08T00:22:16Z

Updated with some renamings

petrochenkov · 2015-10-08T20:51:24Z

Updated with this setup:

enum VariantData {
    Struct(Vec<StructField>),
    Tuple(Vec<StructField>),
    Unit,
}

It doesn't even look as bad as I thought (with exception of folds). Edit: but personally, I'd revert this.

I'll try to split variant id and variant ctor id tomorrow, although I'm still not sure if it is a good idea.

arielb1 · 2015-10-08T21:02:50Z

@petrochenkov

I am not sure why we need separate variant id and variant ctor id. We need one id for the type (which we already have on an enum) + one id for each variant - it may be better to have a "ctor id" even on dict-like structs, even if not strictly required.

All enums need v+1 node-ids - one for the enum, one for each variant.
Tuple/unit-like structs need 2 node-ids - one for the struct, one for the ctor
Dict-like structs need 1 node-id - only for the struct, but adding a node-id for the ctor won't really hurt anyone.

petrochenkov · 2015-10-08T21:14:08Z

it may be better to have a "ctor id" even on dict-like structs, even if not strictly required

I think so, too, and the patch currently does this, because it's the simplest solution, but @nrc seems to not like unused NodeIds

nrc · 2015-10-08T22:35:00Z

@arielb1, @petrochenkov I feel like I'm missing something - I don't want to make @petrochenkov do unnecessary work here if it's just because I'm not getting something here, so to go over it again:

why is it better or simpler to have the id on the StructDef rather than the VariantData_?

My motivation here is that as many constraints as possible should be explicit in the data structure - here it seems there is a constraint that dict-like structs as items should never have a ctor-id, that can be enforced in the data structure so we should (unless there is a reason not to). It seems an easy-enough mistake to make for some future developer to use that id by accident.

Likewise, the constraint that Unit variants have no fields should be explicit in the data structure. Otherwise some syntax extension could add fields and they would be ignored by the compiler in some places and used in others and cause complex bugs.

arielb1 · 2015-10-08T22:39:26Z

@nrc

enum variants have ctor id == vid, and structs can have that too (this would lead to dict-like structs having a useless ctor id, but that is the price of uniformity).

I think there should be 1 id on the item + 1 id on each VariantData (that is the variant id on all enums, the ctor id on tuple-like structs and useless on dict-like structs).

arielb1 · 2015-10-08T22:44:47Z

If we are talking about enforcing constraints in data structures, could we make tuple-like variants have only unnamed fields and dict-like variants have only named ones?

nrc · 2015-10-08T22:52:14Z

@arielb1 There is some discussion of that (named/unnamed fields) inline, @petrochenkov thinks we iterate over fields more often without caring whether they are named or not. I find that persuasive.

arielb1 · 2015-10-08T22:55:08Z

ty uses a sentinel to mark an unnamed field, as it does not care about names in 99% of all cases

nrc · 2015-10-08T23:10:10Z

@arielb1 the intention being that you can always lookup the id on the VariantData and it will be what you want? Vs, having to check the data kind before checking the id? That seems to assume that if you do this, then the ctor_id, not the item id is the one you want, which seems dangerous. I.e., I guess my hypothesis is that it is dangerous to check the id without knowing the kind of variant.

arielb1 · 2015-10-08T23:25:34Z

@nrc

The item id is always on the item. The variant/ctor id is always on the variant. There is no place for confusion, and anyway using the wrong id will tend to cause an ICE.

nrc · 2015-10-08T23:29:22Z

@arielb1 ah, so the simplification is that you want the ctor id and thus you know that you are (or should be) dealing with a non-dict struct, then you can just look on the VariantData, rather than having to re-check the kind?

petrochenkov · 2015-10-09T02:25:17Z

@nrc

Otherwise some syntax extension could add fields and they would be ignored by the compiler in some places and used in others and cause complex bugs.

Hm, can plugins/syntax extension modify HIR? If not, then we can check the invariants during lowering from AST and keep the data structures simple and uniform (i.e. unused NodeId for dict-style structs, empty fields for unit structs).

nrc · 2015-10-09T03:16:10Z

Syntax extensions cannot modify the HIR, they operate only on the AST. But some kind of future compiler plugin could. Or some compiler dev could.

The trouble with asserting invariants at one point like this is that it only ensures the invariant at that point. Since you can't expect a dev to know what happens throughout the compiler (or may happen in the future), to be safe they must assert it at every use. Whereas, if it is impossible for something to exist due to the shape of the data, then you are free to assume always (well, you don't have to) and can be safe. For the price of a little extra complexity you win big in terms of defensive programming.

petrochenkov · 2015-10-10T00:17:11Z

@nrc
Today I tried to split variant ids and ctor ids and it turned out to require non-trivial knowledge about parts of the compiler that use these ids. I'm not sure I'll be able to do it in a reasonable time. Based on these trials I can also say that dealing with two subtly different ids is certainly more tricky than with one sometimes unused id, so the argument about less probable mistakes by future developers doesn't really apply. So, I think I'm done here.
I've pushed a rebased version with VariantData_ merged with VariantData, so it should be ready for a final review.

petrochenkov · 2015-10-13T13:20:48Z

@nrc
Updated with the comments. (Hopefully, they address your questions as well)

nrc · 2015-10-13T20:06:07Z

@bors: r+

bors · 2015-10-13T20:06:08Z

📌 Commit 607b8c3 has been approved by nrc

bors · 2015-10-14T01:40:16Z

⌛ Testing commit 607b8c3 with merge b9695f9...

bors · 2015-10-14T02:00:18Z

⛄ The build was interrupted to prioritize another pull request.

bors · 2015-10-14T04:21:59Z

⌛ Testing commit 607b8c3 with merge 2939666...

@nrc

This patch uses the same data structures for structs and enum variants in AST and HIR. These changes in data structures lead to noticeable simplification in most of code dealing with them. I didn't touch the top level, i.e. `ItemStruct` is still `ItemStruct` and not `ItemEnum` with one variant, like in the type checker. As part of this patch, structures and variants get the `kind` field making distinction between "normal" structs, tuple structs and unit structs explicit instead of relying on the number of fields and presence of constructor `NodeId`. In particular, we can now distinguish empty tuple structs from unit structs, which was impossible before! Comprehensive tests for empty structs are added and some improvements to empty struct feature gates are made. Some tests don't pass due to issue #28692 , they are still there for completeness, but are commented out. This patch fixes issue mentioned in #16819 (comment), now emptiness of tuple structs is checked after expansion. It also touches #28750 by providing span for visit_struct_def cc #28336 r? @nrc

bors · 2015-10-14T06:35:11Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-gnu-32-nopt-t, auto-win-gnu-32-opt, auto-win-gnu-64-nopt-t, auto-win-gnu-64-opt, auto-win-msvc-32-opt, auto-win-msvc-64-opt

Closes #28750 `Arm` and `Generics` don't have spans at all, so it's not a visitor's problem, `visit_struct_def` was fixed in #28816

eddyb · 2015-10-20T11:30:46Z

src/librustc_front/hir.rs

+        match *self {
+            VariantData::Struct(ref fields, _) | VariantData::Tuple(ref fields, _) => Some(fields),
+            _ => None,
+        }.into_iter().flat_map(vec_iter)


Not sure if anyone's reading this anymore, but there a much simpler way to write this method, because &[]: &'static [T]:

pub fn fields(&self) -> slice::Iter<StructField> { match *self { VariantData::Struct(ref fields, _) | VariantData::Tuple(ref fields, _) => fields.iter(), _ => [].iter() } }

Could also just return the slice and the for field in s.fields() usage would be the same.

I'm reading! (I'm subscribed)

&[]: &'static [T]

This is... not what I'd expect from a temporary, but it makes everything much simpler, yes.

It's a special case, until rvalue promotion is turned on (&constexpr has been pointing to .rodata from before 1.0, but, alas, no RFC).

@eddyb

And use `VariantData` instead of `P<VariantData>` in `Item_` and `Variant_` Improvements suggested by @eddyb in #28816 (comment) and #28816 (comment) plugin-[breaking-change] r? @eddyb

@eddyb

And use `VariantData` instead of `P<VariantData>` in `Item_` and `Variant_` Improvements suggested by @eddyb in rust-lang/rust#28816 (comment) and rust-lang/rust#28816 (comment) plugin-[breaking-change] r? @eddyb

rust-highfive assigned nrc Oct 2, 2015

petrochenkov force-pushed the unistruct branch from 54c3b79 to f1a5608 Compare October 4, 2015 10:12

petrochenkov force-pushed the unistruct branch from f1a5608 to 27de390 Compare October 6, 2015 10:26

nrc reviewed Oct 7, 2015
View reviewed changes

petrochenkov force-pushed the unistruct branch from 9e0bde2 to c11bf36 Compare October 10, 2015 00:20

petrochenkov added 4 commits October 13, 2015 15:19

Merge struct fields and struct kind

40aa09e

Fix rebase 2

a5225cb

Merge VariantData and VariantData_

46750d0

Comment on the purpose(s) of NodeId in VariantData

607b8c3

petrochenkov force-pushed the unistruct branch from d7ac587 to 607b8c3 Compare October 13, 2015 13:19

bors merged commit 607b8c3 into rust-lang:master Oct 14, 2015

bors mentioned this pull request Oct 14, 2015

Do not allow a module and tuple struct of the same name to coexist. #26421

Merged

marcusklaas mentioned this pull request Oct 15, 2015

rustfmt doesn't compile rust-lang/rustfmt#468

Closed

oli-obk mentioned this pull request Oct 16, 2015

fix(update to current nightly) serde-deprecated/aster#43

Closed

petrochenkov mentioned this pull request Oct 16, 2015

Provide span for visit_enum_def #29102

Merged

bors added a commit that referenced this pull request Oct 17, 2015

Auto merge of #29102 - petrochenkov:spanvis, r=alexcrichton

32a4bd9

Closes #28750 `Arm` and `Generics` don't have spans at all, so it's not a visitor's problem, `visit_struct_def` was fixed in #28816

eddyb reviewed Oct 20, 2015
View reviewed changes

petrochenkov mentioned this pull request Oct 25, 2015

syntax/rustc_front: Simplify VariantData::fields #29303

Merged

petrochenkov deleted the unistruct branch November 22, 2015 11:46

petrochenkov mentioned this pull request Sep 29, 2016

Refactoring/bugfixing around definitions for struct/variant constructors #36814

Merged

petrochenkov mentioned this pull request Apr 26, 2017

Do not show ::{{constructor}} on tuple struct diagnostics #41433

Merged

petrochenkov mentioned this pull request Mar 24, 2019

Separate DefIds for variants and their constructors #59382

Merged

What	Kind	NodeId in StructDef	NodeId in Variant	NodeId in Item
Variant in Enum	Unit	Used	N/A	Used
Variant in Enum	Tuple	Used	N/A	Used
Variant in Enum	Dict	Used	N/A	Used
Struct	Unit	Used	N/A	Used
Struct	Tuple	Used	N/A	Used
Struct	Dict	Not used	N/A	Used

What	Kind	NodeId in StructDef	NodeId in Variant	NodeId in Item
Variant in Enum	Unit	Some(Not used)	Used	Used
Variant in Enum	Tuple	Some(Not used)	Used	Used
Variant in Enum	Dict	None	Used	Used
Struct	Unit	Some(Used)	N/A	Used
Struct	Tuple	Some(Used)	N/A	Used
Struct	Dict	None	N/A	Used

Unify structures and enum variants in AST/HIR #28816

Unify structures and enum variants in AST/HIR #28816

Conversation

petrochenkov commented Oct 2, 2015

bors commented Oct 3, 2015

nrc commented Oct 5, 2015

petrochenkov commented Oct 5, 2015

petrochenkov commented Oct 5, 2015

bors commented Oct 6, 2015

petrochenkov commented Oct 6, 2015

nrc commented Oct 6, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nrc commented Oct 7, 2015

petrochenkov commented Oct 8, 2015

petrochenkov commented Oct 8, 2015

arielb1 commented Oct 8, 2015

petrochenkov commented Oct 8, 2015

nrc commented Oct 8, 2015

arielb1 commented Oct 8, 2015

arielb1 commented Oct 8, 2015

nrc commented Oct 8, 2015

arielb1 commented Oct 8, 2015

nrc commented Oct 8, 2015

arielb1 commented Oct 8, 2015

nrc commented Oct 8, 2015

petrochenkov commented Oct 9, 2015

nrc commented Oct 9, 2015

petrochenkov commented Oct 10, 2015

petrochenkov commented Oct 13, 2015

nrc commented Oct 13, 2015

bors commented Oct 13, 2015

bors commented Oct 14, 2015

bors commented Oct 14, 2015

bors commented Oct 14, 2015

bors commented Oct 14, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment