Implement consistent attribute validation #1901

rsmmr · 2024-10-18T09:28:16Z

Our validation of which attributes are allowed where is currently a bit messy/spotty. I think it would be helpful to centralize this. Couple ideas:

Right now attributes are identified as strings throughout the code, making it easy to typo them in validation code and elsewhere (like checking for &byte_order instead of &byte-order). It would be good to switch to enums instead: we'd translate the name of the attribute into an enum label right at parsing time and then carry that around. (Note that the Spicy lexer has a list of valid attributes already).
I think we could define a central table specifying for each attribute in which contexts it's allowed to be used. Like: &eod is a "parse attribute", &cxxname is a "type attribute", etc.. Then the validation code could just ask at the corresponding locations: is this attribute I'm seeing here a type attribute as I would expect? It might take a bit of trial-and-error to find the right context categories (e.g., is it "parse attribute", or instead a combination "field attribute" & "unit attribute"? Also: Would we have separate categories for type-specific parse attributes, like &ipv4 only applies to address fields, or would be continue to hardcode that distinction directly in the validator?) Also note that an attribute may apply to multiple context categories. But independent of the details, the idea would be centralize the knowledge about where an attribute it expected to been.

(Much of this applies to properties, too, btw).

The text was updated successfully, but these errors were encountered:

Closes #1901 Instead of `attributes()->find("&my-attr")` it'll be `attributes()->find(hilti::Attribute::MY_ATTR)` - this also lets me go through and find all of the used attributes. Still TODO: - Remove accessing the tag at all to get better coverage - Organize/document all of that new stuff - Classify attributes to contexts - Maybe have nodes tell which attributes may apply to them, then validation is easy I kind of also want easy `has` checks - there are a surprising number of places that go through all of the attributes in order to check if they're present at all, doing that for many different attributes, then just selecting one. That check can easily be constant time for both `has` and `hasOneOf` checks. It probably isn't a big performance impact, but putting that in `AttributeSet` could really make things nicer IMO.

Part 1 of #1901 Instead of `attributes()->find("&my-attr")` it'll be `attributes()->find(hilti::Attribute::MY_ATTR)` - this also lets me go through and find all of the used attributes. Still TODO: - Remove accessing the tag at all to get better coverage - Organize/document all of that new stuff - Classify attributes to contexts - Maybe have nodes tell which attributes may apply to them, then validation is easy I kind of also want easy `has` checks - there are a surprising number of places that go through all of the attributes in order to check if they're present at all, doing that for many different attributes, then just selecting one. That check can easily be constant time for both `has` and `hasOneOf` checks. It probably isn't a big performance impact, but putting that in `AttributeSet` could really make things nicer IMO.

Part 1 of #1901 Instead of `attributes()->find("&my-attr")` it'll be `attributes()->find(hilti::Attribute::MY_ATTR)` - this also lets me go through and find all of the used attributes. Now internally, all attributes should use the enum representation and not even be able to use the string representation.

Closes #1901 This is where the actual validation changes happen. The issue mentions which contexts an attribute is allowed to be used, specifically "parse attribute" and "type attribute" - but I think that conflates how an attribute is used with where it is. In my opinion, how it is used is a completely separate and harder to solve issue because that requires extra validation (if this is a "parse" attribute, how do I confirm that's *how* it is used?) Instead, this just validates the where in one place. The big list of attributes is allowed on Field, but CXXNAME is not allowed on a Field. Then, anything with attributes can tell whether all of their attributes are within that. Some implementation details: this is implemented as a bitmask that is created in a constexpr. That's not really because of any performance considerations, since I believe those would be pretty easy to solve. It's just an easier way to store a large amount of things that are just present or not. I think bringing this over to `AttributeSet` in some capacity could be reasonable. That has complications when you consider removing attributes, though. Either way you'll need to loop through the attributes, so that may not matter anyway. I don't really know where this should big table should live. I think the validators make sense. I absolutely do NOT want that list in the nodes, I think they should try to be as close as possible to POD. In some `attribute` namespace is where this started, but that started to look a little more clunky to me since the validation is in spicy and hilti, but the hilti side would need to know about spicy node tags. Also, using the node tags as keys could change. It doesn't do well for the diagnostic (though maybe the caller can just give the diagnostic like 'in functions' etc.). But, it's really really convenient to index by, and no other downside. We could also just accept a linear search through some init list or something, idk.

evantypanski · 2024-12-02T16:01:53Z

So looking at part 2 of this (with a WIP commit), I think minimally we could use a node's tag as some sort of key (or in a switch or something) in order to look up which attributes are allowed on.

I talked with both Robin and Benjamin and it seems like that's safe. We may lose some information for derived types - if that ends up having a noticeable impact then switching to using the whole _node_tags seems reasonable. I'll wait on that until there's a pressing need, though, because I'm not sure if there is one.

Doing this mapping isn't really complete. Some attributes may only be valid in some contexts, and we can't validate that just with a node's tag. I don't think there's a very easy way to get that full context though, since it's dependent on what node you're asking about. There are two important points here:

Fields are complex, there are a bunch of attributes that they accept. Then most of those attributes are context dependent. Just validating fields have the "field attributes" seems a bit crude. Maybe there will be another table that uses the field's attributes for the node, so that way you can have one table to verify questions like "Is this &ipv4 attribute on an addr type within a field?" - separate from the question "Is this &ipv4 attribute valid on a field?"
The validator should still be validating attributes apply in correct contexts, but I think some of these questions can be made easier, so it's easier to tell what attributes apply when. One example could be asking an AttributeSet if it containsOneOf a list of attributes. That's just spitballing, but having more convenient methods to ask about attributes seems like it could make it simpler to tell how an attribute is used.

So using nodeTag seems like it can fully verify the question "WHERE is this attribute used?" - asking HOW it's used may have to remain ad-hoc. I don't really know how to centralize that information, other than making it easier to see within the validator.

Closes #1901 This is where the actual validation changes happen. The issue mentions which contexts an attribute is allowed to be used, specifically "parse attribute" and "type attribute" - but I think that conflates how an attribute is used with where it is. In my opinion, how it is used is a completely separate and harder to solve issue because that requires extra validation (if this is a "parse" attribute, how do I confirm that's *how* it is used?) Instead, this just validates the where in one place. The big list of attributes is allowed on Field, but CXXNAME is not allowed on a Field. Then, anything with attributes can tell whether all of their attributes are within that. Some implementation details: this is implemented as a bitmask that is created in a constexpr. That's not really because of any performance considerations, since I believe those would be pretty easy to solve. It's just an easier way to store a large amount of things that are just present or not. I think bringing this over to `AttributeSet` in some capacity could be reasonable. That has complications when you consider removing attributes, though. Either way you'll need to loop through the attributes, so that may not matter anyway. I don't really know where this should big table should live. I think the validators make sense. I absolutely do NOT want that list in the nodes, I think they should try to be as close as possible to POD. In some `attribute` namespace is where this started, but that started to look a little more clunky to me since the validation is in spicy and hilti, but the hilti side would need to know about spicy node tags. Also, using the node tags as keys could change. It doesn't do well for the diagnostic (though maybe the caller can just give the diagnostic like 'in functions' etc.). But, it's really really convenient to index by, and no other downside. We could also just accept a linear search through some init list or something, idk.

Closes #1901 This updates attribute validation to use some fancy bitmask magic in order to see if attributes are in the right places. The goal here is to have one place (ie a big map) to answer the question "Can I use X attribute on Y node?" There are a lot more moving parts with attribute validation, but those generally have to do with behavior. A lot of that requires extra context in the validator, which is exactly what the validator is meant to do. Much of that is pretty ad-hoc, so it could get cleaned up as well.

rsmmr added Cleanup Compiler labels Oct 18, 2024

evantypanski mentioned this issue Nov 7, 2024

Move type alias attribute validation to parser #1916

Merged

evantypanski self-assigned this Nov 21, 2024

evantypanski mentioned this issue Nov 25, 2024

Put attributes in an enum #1927

Merged

evantypanski mentioned this issue Dec 4, 2024

Add some consistency to attribute validation #1939

Merged

rsmmr closed this as completed in 9ad62e5 Dec 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement consistent attribute validation #1901

Implement consistent attribute validation #1901

rsmmr commented Oct 18, 2024

evantypanski commented Dec 2, 2024

Implement consistent attribute validation #1901

Implement consistent attribute validation #1901

Comments

rsmmr commented Oct 18, 2024

evantypanski commented Dec 2, 2024