Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement consistent attribute validation #1901

Closed
rsmmr opened this issue Oct 18, 2024 · 1 comment
Closed

Implement consistent attribute validation #1901

rsmmr opened this issue Oct 18, 2024 · 1 comment
Assignees

Comments

@rsmmr
Copy link
Member

rsmmr commented Oct 18, 2024

Our validation of which attributes are allowed where is currently a bit messy/spotty. I think it would be helpful to centralize this. Couple ideas:

  • Right now attributes are identified as strings throughout the code, making it easy to typo them in validation code and elsewhere (like checking for &byte_order instead of &byte-order). It would be good to switch to enums instead: we'd translate the name of the attribute into an enum label right at parsing time and then carry that around. (Note that the Spicy lexer has a list of valid attributes already).

  • I think we could define a central table specifying for each attribute in which contexts it's allowed to be used. Like: &eod is a "parse attribute", &cxxname is a "type attribute", etc.. Then the validation code could just ask at the corresponding locations: is this attribute I'm seeing here a type attribute as I would expect? It might take a bit of trial-and-error to find the right context categories (e.g., is it "parse attribute", or instead a combination "field attribute" & "unit attribute"? Also: Would we have separate categories for type-specific parse attributes, like &ipv4 only applies to address fields, or would be continue to hardcode that distinction directly in the validator?) Also note that an attribute may apply to multiple context categories. But independent of the details, the idea would be centralize the knowledge about where an attribute it expected to been.

(Much of this applies to properties, too, btw).

@evantypanski evantypanski self-assigned this Nov 21, 2024
evantypanski added a commit that referenced this issue Nov 21, 2024
Closes #1901

Instead of `attributes()->find("&my-attr")` it'll be
`attributes()->find(hilti::Attribute::MY_ATTR)` - this also lets me go
through and find all of the used attributes.

Still TODO:
 - Remove accessing the tag at all to get better coverage
 - Organize/document all of that new stuff
 - Classify attributes to contexts
 - Maybe have nodes tell which attributes may apply to them, then
   validation is easy

I kind of also want easy `has` checks - there are a surprising number of
places that go through all of the attributes in order to check if
they're present at all, doing that for many different attributes, then
just selecting one. That check can easily be constant time for both
`has` and `hasOneOf` checks. It probably isn't a big performance impact,
but putting that in `AttributeSet` could really make things nicer IMO.
evantypanski added a commit that referenced this issue Nov 25, 2024
Part 1 of #1901

Instead of `attributes()->find("&my-attr")` it'll be
`attributes()->find(hilti::Attribute::MY_ATTR)` - this also lets me go
through and find all of the used attributes.

Still TODO:
 - Remove accessing the tag at all to get better coverage
 - Organize/document all of that new stuff
 - Classify attributes to contexts
 - Maybe have nodes tell which attributes may apply to them, then
   validation is easy

I kind of also want easy `has` checks - there are a surprising number of
places that go through all of the attributes in order to check if
they're present at all, doing that for many different attributes, then
just selecting one. That check can easily be constant time for both
`has` and `hasOneOf` checks. It probably isn't a big performance impact,
but putting that in `AttributeSet` could really make things nicer IMO.
evantypanski added a commit that referenced this issue Nov 26, 2024
Part 1 of #1901

Instead of `attributes()->find("&my-attr")` it'll be
`attributes()->find(hilti::Attribute::MY_ATTR)` - this also lets me go
through and find all of the used attributes.

Now internally, all attributes should use the enum representation and
not even be able to use the string representation.
evantypanski added a commit that referenced this issue Nov 27, 2024
Closes #1901

This is where the actual validation changes happen. The issue mentions
which contexts an attribute is allowed to be used, specifically "parse
attribute" and "type attribute" - but I think that conflates how an
attribute is used with where it is. In my opinion, how it is used is a
completely separate and harder to solve issue because that requires
extra validation (if this is a "parse" attribute, how do I confirm
that's *how* it is used?)

Instead, this just validates the where in one place. The big list of
attributes is allowed on Field, but CXXNAME is not allowed on a Field.
Then, anything with attributes can tell whether all of their attributes
are within that.

Some implementation details: this is implemented as a bitmask that is
created in a constexpr. That's not really because of any performance
considerations, since I believe those would be pretty easy to solve.
It's just an easier way to store a large amount of things that are just
present or not. I think bringing this over to `AttributeSet` in some
capacity could be reasonable. That has complications when you consider
removing attributes, though. Either way you'll need to loop through the
attributes, so that may not matter anyway.

I don't really know where this should big table should live. I think the
validators make sense. I absolutely do NOT want that list in the nodes,
I think they should try to be as close as possible to POD. In some
`attribute` namespace is where this started, but that started to look a
little more clunky to me since the validation is in spicy and hilti, but
the hilti side would need to know about spicy node tags.

Also, using the node tags as keys could change. It doesn't do well for
the diagnostic (though maybe the caller can just give the diagnostic
like 'in functions' etc.). But, it's really really convenient to index
by, and no other downside. We could also just accept a linear search
through some init list or something, idk.
@evantypanski
Copy link
Contributor

So looking at part 2 of this (with a WIP commit), I think minimally we could use a node's tag as some sort of key (or in a switch or something) in order to look up which attributes are allowed on.

I talked with both Robin and Benjamin and it seems like that's safe. We may lose some information for derived types - if that ends up having a noticeable impact then switching to using the whole _node_tags seems reasonable. I'll wait on that until there's a pressing need, though, because I'm not sure if there is one.

Doing this mapping isn't really complete. Some attributes may only be valid in some contexts, and we can't validate that just with a node's tag. I don't think there's a very easy way to get that full context though, since it's dependent on what node you're asking about. There are two important points here:

  1. Fields are complex, there are a bunch of attributes that they accept. Then most of those attributes are context dependent. Just validating fields have the "field attributes" seems a bit crude. Maybe there will be another table that uses the field's attributes for the node, so that way you can have one table to verify questions like "Is this &ipv4 attribute on an addr type within a field?" - separate from the question "Is this &ipv4 attribute valid on a field?"
  2. The validator should still be validating attributes apply in correct contexts, but I think some of these questions can be made easier, so it's easier to tell what attributes apply when. One example could be asking an AttributeSet if it containsOneOf a list of attributes. That's just spitballing, but having more convenient methods to ask about attributes seems like it could make it simpler to tell how an attribute is used.

So using nodeTag seems like it can fully verify the question "WHERE is this attribute used?" - asking HOW it's used may have to remain ad-hoc. I don't really know how to centralize that information, other than making it easier to see within the validator.

evantypanski added a commit that referenced this issue Dec 2, 2024
Closes #1901

This is where the actual validation changes happen. The issue mentions
which contexts an attribute is allowed to be used, specifically "parse
attribute" and "type attribute" - but I think that conflates how an
attribute is used with where it is. In my opinion, how it is used is a
completely separate and harder to solve issue because that requires
extra validation (if this is a "parse" attribute, how do I confirm
that's *how* it is used?)

Instead, this just validates the where in one place. The big list of
attributes is allowed on Field, but CXXNAME is not allowed on a Field.
Then, anything with attributes can tell whether all of their attributes
are within that.

Some implementation details: this is implemented as a bitmask that is
created in a constexpr. That's not really because of any performance
considerations, since I believe those would be pretty easy to solve.
It's just an easier way to store a large amount of things that are just
present or not. I think bringing this over to `AttributeSet` in some
capacity could be reasonable. That has complications when you consider
removing attributes, though. Either way you'll need to loop through the
attributes, so that may not matter anyway.

I don't really know where this should big table should live. I think the
validators make sense. I absolutely do NOT want that list in the nodes,
I think they should try to be as close as possible to POD. In some
`attribute` namespace is where this started, but that started to look a
little more clunky to me since the validation is in spicy and hilti, but
the hilti side would need to know about spicy node tags.

Also, using the node tags as keys could change. It doesn't do well for
the diagnostic (though maybe the caller can just give the diagnostic
like 'in functions' etc.). But, it's really really convenient to index
by, and no other downside. We could also just accept a linear search
through some init list or something, idk.
evantypanski added a commit that referenced this issue Dec 4, 2024
Closes #1901

This updates attribute validation to use some fancy bitmask magic in
order to see if attributes are in the right places. The goal here is to
have one place (ie a big map) to answer the question "Can I use X
attribute on Y node?"

There are a lot more moving parts with attribute validation, but those
generally have to do with behavior. A lot of that requires extra context
in the validator, which is exactly what the validator is meant to do.
Much of that is pretty ad-hoc, so it could get cleaned up as well.
evantypanski added a commit that referenced this issue Dec 4, 2024
Closes #1901

This updates attribute validation to use some fancy bitmask magic in
order to see if attributes are in the right places. The goal here is to
have one place (ie a big map) to answer the question "Can I use X
attribute on Y node?"

There are a lot more moving parts with attribute validation, but those
generally have to do with behavior. A lot of that requires extra context
in the validator, which is exactly what the validator is meant to do.
Much of that is pretty ad-hoc, so it could get cleaned up as well.
evantypanski added a commit that referenced this issue Dec 6, 2024
Closes #1901

This updates attribute validation to use some fancy bitmask magic in
order to see if attributes are in the right places. The goal here is to
have one place (ie a big map) to answer the question "Can I use X
attribute on Y node?"

There are a lot more moving parts with attribute validation, but those
generally have to do with behavior. A lot of that requires extra context
in the validator, which is exactly what the validator is meant to do.
Much of that is pretty ad-hoc, so it could get cleaned up as well.
@rsmmr rsmmr closed this as completed in 9ad62e5 Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants