Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify structures and enum variants in AST/HIR #28816

Merged
merged 13 commits into from
Oct 14, 2015

Conversation

petrochenkov
Copy link
Contributor

This patch uses the same data structures for structs and enum variants in AST and HIR. These changes in data structures lead to noticeable simplification in most of code dealing with them.
I didn't touch the top level, i.e. ItemStruct is still ItemStruct and not ItemEnum with one variant, like in the type checker.
As part of this patch, structures and variants get the kind field making distinction between "normal" structs, tuple structs and unit structs explicit instead of relying on the number of fields and presence of constructor NodeId. In particular, we can now distinguish empty tuple structs from unit structs, which was impossible before! Comprehensive tests for empty structs are added and some improvements to empty struct feature gates are made. Some tests don't pass due to issue #28692 , they are still there for completeness, but are commented out.
This patch fixes issue mentioned in #16819 (comment), now emptiness of tuple structs is checked after expansion.
It also touches #28750 by providing span for visit_struct_def
cc #28336

r? @nrc

@bors
Copy link
Contributor

bors commented Oct 3, 2015

☔ The latest upstream changes (presumably #28521) made this pull request unmergeable. Please resolve the merge conflicts.

@nrc
Copy link
Member

nrc commented Oct 5, 2015

I don't think we should unify structs and enums in the AST, only in the HIR, with the unification happening at the lowering stage. Is there a specific reason you want to unify the AST too? My reasoning is that the HIR should be as small as possible, but the AST should be as close as possible to the source text.

@petrochenkov
Copy link
Contributor Author

This unification simplifies code in libsyntax in the same way as everywhere else - structures and variants are parsed in the same way, pretty-printed in the same way, deriving logic is also simplified. (See the first commit, it's self-sufficient (i.e. passes make check) and reflect the unification in AST only and its consequences). Besides, kind should present in AST already, otherwise it can't be propagated to HIR.

@petrochenkov
Copy link
Contributor Author

Ah, and as I mentioned above, structs and enums are not unified completely like in type checker, only structs and enum variants are, i.e. it's still close to the source.

@bors
Copy link
Contributor

bors commented Oct 6, 2015

☔ The latest upstream changes (presumably #28697) made this pull request unmergeable. Please resolve the merge conflicts.

@petrochenkov
Copy link
Contributor Author

Rebased.

@nrc
Copy link
Member

nrc commented Oct 6, 2015

OK, that makes sense. I would like to see all data unified in the HIR (but not the AST), but that doesn't have to block this PR. I'll review later today.

#[derive(Clone, PartialEq, Eq, RustcEncodable, RustcDecodable, Hash, Debug)]
pub struct StructDef {
/// Fields, not including ctor
pub fields: Vec<StructField>,
/// ID of the constructor. This is only used for tuple- or enum-like
/// structs.
pub ctor_id: Option<NodeId>,
pub id: NodeId,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried by the new id setup (the old one was kind of bad too though). Why have an id here if not using it for just the constructor? And if you are using it for just the constructor, why change the name? In particular, structs are always items, which have their own id, so having an id here and in the Item is weird. It seems leaving the id in variant might be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had an impression, that in variants (unlike in structs) ids are used not only for their constructors, but for something else (as an id of "variant itself", I don't remember where exactly, need to investigate), i.e. ids for {}-variants are also used despite not being constructor ids. I can rename id back to ctor_id if that is not true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the first example I've found: https://github.com/rust-lang/rust/blob/master/src/librustc/front/map/collector.rs#L137
Fields of variants are parented to variant's id even if the variant doesn't have a constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@petrochenkov

The variant ids are indeed used for identifying the variant itself (variants are items). The "ctor id" hack is to allow tuple-structs to have 2 item types - the struct itself is Foo<T> and the variant is for<'a> fn(&'a T) -> Foo<T>. Tuple-like variants just have the fn type.

It may be better to use the ctor id as the "variant id" of a tuple-like struct, but I didn't try that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arielb1

It may be better to use the ctor id as the "variant id" of a tuple-like struct, but I didn't try that.

Isn't that what petrochenkov@5abb670 does? It merges "variant id" and "ctor id" and as a result we have one id in the outer enum/struct item with enum/struct type and another id inside of variant.def with fn type (also used as "variant id").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a table! :)

Now:

What Kind NodeId in StructDef NodeId in Variant NodeId in Item
Variant in Enum Unit Used N/A Used
Variant in Enum Tuple Used N/A Used
Variant in Enum Dict Used N/A Used
Struct Unit Used N/A Used
Struct Tuple Used N/A Used
Struct Dict Not used N/A Used

After moving NodeId in Variant:

What Kind NodeId in StructDef NodeId in Variant NodeId in Item
Variant in Enum Unit Some(Not used) Used Used
Variant in Enum Tuple Some(Not used) Used Used
Variant in Enum Dict None Used Used
Struct Unit Some(Used) N/A Used
Struct Tuple Some(Used) N/A Used
Struct Dict None N/A Used

Personally, I like the first variant better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I so almost did this. I think what I am proposing is:

What Kind NodeId in Variant NodeId in VariantKind
Variant in Enum Unit Used Used
Variant in Enum Tuple Used Used
Variant in Enum Dict Used N/A
Struct Unit N/A Used
Struct Tuple N/A Used
Struct Dict N/A N/A

But I think we disagree about where a ctor id is used - I thought that Tuple and Unit variants use their ctor id - tuple variants are certainly valid functions in Rust. In the first table there is also duplication between the node id in StructDef and the NodeId in Item in the non-variant struct cases (iiuc).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variant in Enum |...|Used |Used will require some additional refactoring, because now both Used successfully share the same id. I'll look at it and at with_fields(|field| { /* ... */ }) tomorrow.
(I've also updated the table with NodeId in Item column.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess #28888 will have to be reverted if variant id and variant ctor id are splitted in HIR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just stick a "ctor id" on the variant even in dict-like structs? Just using the struct id there would be fine too, as you can't really access the "ctor" of a dict-like struct.

@nrc
Copy link
Member

nrc commented Oct 7, 2015

I think this PR is a step in the right direction, but I think the structure of the AST types needs some refinement, and I'd like to improve the id story. I'll review in detail once these have been addressed.

@petrochenkov
Copy link
Contributor Author

Updated with some renamings

@petrochenkov
Copy link
Contributor Author

Updated with this setup:

enum VariantData {
    Struct(Vec<StructField>),
    Tuple(Vec<StructField>),
    Unit,
}

It doesn't even look as bad as I thought (with exception of folds). Edit: but personally, I'd revert this.

I'll try to split variant id and variant ctor id tomorrow, although I'm still not sure if it is a good idea.

@arielb1
Copy link
Contributor

arielb1 commented Oct 8, 2015

@petrochenkov

I am not sure why we need separate variant id and variant ctor id. We need one id for the type (which we already have on an enum) + one id for each variant - it may be better to have a "ctor id" even on dict-like structs, even if not strictly required.

All enums need v+1 node-ids - one for the enum, one for each variant.
Tuple/unit-like structs need 2 node-ids - one for the struct, one for the ctor
Dict-like structs need 1 node-id - only for the struct, but adding a node-id for the ctor won't really hurt anyone.

@petrochenkov
Copy link
Contributor Author

it may be better to have a "ctor id" even on dict-like structs, even if not strictly required

I think so, too, and the patch currently does this, because it's the simplest solution, but @nrc seems to not like unused NodeIds

@nrc
Copy link
Member

nrc commented Oct 8, 2015

@arielb1, @petrochenkov I feel like I'm missing something - I don't want to make @petrochenkov do unnecessary work here if it's just because I'm not getting something here, so to go over it again:

why is it better or simpler to have the id on the StructDef rather than the VariantData_?

My motivation here is that as many constraints as possible should be explicit in the data structure - here it seems there is a constraint that dict-like structs as items should never have a ctor-id, that can be enforced in the data structure so we should (unless there is a reason not to). It seems an easy-enough mistake to make for some future developer to use that id by accident.

Likewise, the constraint that Unit variants have no fields should be explicit in the data structure. Otherwise some syntax extension could add fields and they would be ignored by the compiler in some places and used in others and cause complex bugs.

@arielb1
Copy link
Contributor

arielb1 commented Oct 8, 2015

@nrc

enum variants have ctor id == vid, and structs can have that too (this would lead to dict-like structs having a useless ctor id, but that is the price of uniformity).

I think there should be 1 id on the item + 1 id on each VariantData (that is the variant id on all enums, the ctor id on tuple-like structs and useless on dict-like structs).

@arielb1
Copy link
Contributor

arielb1 commented Oct 8, 2015

If we are talking about enforcing constraints in data structures, could we make tuple-like variants have only unnamed fields and dict-like variants have only named ones?

@nrc
Copy link
Member

nrc commented Oct 8, 2015

@arielb1 There is some discussion of that (named/unnamed fields) inline, @petrochenkov thinks we iterate over fields more often without caring whether they are named or not. I find that persuasive.

@arielb1
Copy link
Contributor

arielb1 commented Oct 8, 2015

ty uses a sentinel to mark an unnamed field, as it does not care about names in 99% of all cases

@nrc
Copy link
Member

nrc commented Oct 8, 2015

@arielb1 the intention being that you can always lookup the id on the VariantData and it will be what you want? Vs, having to check the data kind before checking the id? That seems to assume that if you do this, then the ctor_id, not the item id is the one you want, which seems dangerous. I.e., I guess my hypothesis is that it is dangerous to check the id without knowing the kind of variant.

@arielb1
Copy link
Contributor

arielb1 commented Oct 8, 2015

@nrc

The item id is always on the item. The variant/ctor id is always on the variant. There is no place for confusion, and anyway using the wrong id will tend to cause an ICE.

@nrc
Copy link
Member

nrc commented Oct 8, 2015

@arielb1 ah, so the simplification is that you want the ctor id and thus you know that you are (or should be) dealing with a non-dict struct, then you can just look on the VariantData, rather than having to re-check the kind?

@petrochenkov
Copy link
Contributor Author

@nrc

Otherwise some syntax extension could add fields and they would be ignored by the compiler in some places and used in others and cause complex bugs.

Hm, can plugins/syntax extension modify HIR? If not, then we can check the invariants during lowering from AST and keep the data structures simple and uniform (i.e. unused NodeId for dict-style structs, empty fields for unit structs).

@nrc
Copy link
Member

nrc commented Oct 9, 2015

Syntax extensions cannot modify the HIR, they operate only on the AST. But some kind of future compiler plugin could. Or some compiler dev could.

The trouble with asserting invariants at one point like this is that it only ensures the invariant at that point. Since you can't expect a dev to know what happens throughout the compiler (or may happen in the future), to be safe they must assert it at every use. Whereas, if it is impossible for something to exist due to the shape of the data, then you are free to assume always (well, you don't have to) and can be safe. For the price of a little extra complexity you win big in terms of defensive programming.

@petrochenkov
Copy link
Contributor Author

@nrc
Today I tried to split variant ids and ctor ids and it turned out to require non-trivial knowledge about parts of the compiler that use these ids. I'm not sure I'll be able to do it in a reasonable time. Based on these trials I can also say that dealing with two subtly different ids is certainly more tricky than with one sometimes unused id, so the argument about less probable mistakes by future developers doesn't really apply. So, I think I'm done here.
I've pushed a rebased version with VariantData_ merged with VariantData, so it should be ready for a final review.

@petrochenkov
Copy link
Contributor Author

@nrc
Updated with the comments. (Hopefully, they address your questions as well)

@nrc
Copy link
Member

nrc commented Oct 13, 2015

@bors: r+

@bors
Copy link
Contributor

bors commented Oct 13, 2015

📌 Commit 607b8c3 has been approved by nrc

@bors
Copy link
Contributor

bors commented Oct 14, 2015

⌛ Testing commit 607b8c3 with merge b9695f9...

@bors
Copy link
Contributor

bors commented Oct 14, 2015

⛄ The build was interrupted to prioritize another pull request.

@bors
Copy link
Contributor

bors commented Oct 14, 2015

⌛ Testing commit 607b8c3 with merge 2939666...

bors added a commit that referenced this pull request Oct 14, 2015
This patch uses the same data structures for structs and enum variants in AST and HIR. These changes in data structures lead to noticeable simplification in most of code dealing with them.
I didn't touch the top level, i.e. `ItemStruct` is still `ItemStruct` and not `ItemEnum` with one variant, like in the type checker.
As part of this patch, structures and variants get the `kind` field making distinction between "normal" structs, tuple structs and unit structs explicit instead of relying on the number of fields and presence of constructor `NodeId`. In particular, we can now distinguish empty tuple structs from unit structs, which was impossible before! Comprehensive tests for empty structs are added and some improvements to empty struct feature gates are made. Some tests don't pass due to issue #28692 , they are still there for completeness, but are commented out.
This patch fixes issue mentioned in #16819 (comment), now emptiness of tuple structs is checked after expansion.
It also touches #28750 by providing span for visit_struct_def
cc #28336

r? @nrc
@bors bors merged commit 607b8c3 into rust-lang:master Oct 14, 2015
bors added a commit that referenced this pull request Oct 17, 2015
Closes #28750
`Arm` and `Generics` don't have spans at all, so it's not a visitor's problem, `visit_struct_def` was fixed in #28816
match *self {
VariantData::Struct(ref fields, _) | VariantData::Tuple(ref fields, _) => Some(fields),
_ => None,
}.into_iter().flat_map(vec_iter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if anyone's reading this anymore, but there a much simpler way to write this method, because &[]: &'static [T]:

    pub fn fields(&self) -> slice::Iter<StructField> {
        match *self {
            VariantData::Struct(ref fields, _) | VariantData::Tuple(ref fields, _) => fields.iter(),
            _ => [].iter()
        }
    }

Could also just return the slice and the for field in s.fields() usage would be the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading! (I'm subscribed)

&[]: &'static [T]

This is... not what I'd expect from a temporary, but it makes everything much simpler, yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a special case, until rvalue promotion is turned on (&constexpr has been pointing to .rodata from before 1.0, but, alas, no RFC).

bors added a commit that referenced this pull request Oct 26, 2015
And use `VariantData` instead of `P<VariantData>` in `Item_` and `Variant_`

Improvements suggested by @eddyb in #28816 (comment) and #28816 (comment)

plugin-[breaking-change]

r? @eddyb
@petrochenkov petrochenkov deleted the unistruct branch November 22, 2015 11:46
critiqjo pushed a commit to critiqjo/rustdoc that referenced this pull request Dec 16, 2016
And use `VariantData` instead of `P<VariantData>` in `Item_` and `Variant_`

Improvements suggested by @eddyb in rust-lang/rust#28816 (comment) and rust-lang/rust#28816 (comment)

plugin-[breaking-change]

r? @eddyb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants