-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Relay][Module] Make tags for ADT constructors and ConstructorValues more robust #3369
Conversation
@@ -264,6 +244,26 @@ def __init__(self, mod, ctx, target): | |||
self.target = target | |||
self._intrp = _backend.CreateInterpreter(mod, ctx, target) | |||
|
|||
def _arg_to_ast(self, arg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _arg_to_ast(self, arg): | |
def arg_to_ast(self, arg): |
ditto below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out it can't be a method because another object also uses it, so I am going to make it a function that takes the mod as a parameter instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _
is to mark private in Python, its convention don't change name.
I realized that just the index can be a sufficient tag if you have type-checked the code and therefore know what global var to check. Perhaps it is not strictly necessary to keep the reverse mapping but having it makes something like |
@@ -110,15 +111,22 @@ void ModuleNode::Add(const GlobalVar& var, | |||
AddUnchecked(var, checked_func); | |||
} | |||
|
|||
void ModuleNode::RegisterConstructors(const GlobalTypeVar& var, const TypeData& type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using hash value as tag makes opcode not very intuitive to understand, How about maintaining a continuous range [0, num_constructors) for each ADT type?
Then for the following code:
type t = | Alice | Bob | Charlie | David
let test v =
match v with
| Alice -> 1
| Bob -> 2
| Charlie -> 3
| David -> 4
We can generate opcode like:
(switch* v
case int 0: 1
case int 1: 2
case int 2: 3
case int 3: 4)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point and I agree hashes are not ideal. The reason I went with hashes is to ensure that the tag will be determined completely by the name of the global type var and the index of the constructor in the type data and thus will not be affected, for example, by the order in which types are defined. It also means that tags will not collide across different ADTs. If hashes are undesirable, we can take the approach (as I alluded to before) of requiring a type check and only looking at the index of the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I vote for eliminating the hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the constructor value carry information about its type, then? The case that is most difficult is the one in arg_to_ast
: given only the constructor value, how do you work backwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for VM, we might need to discard type information in Object
. As we already drop the type information in byte codes we generate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see what I can do about arg_to_ast
then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, though, that it will be possible to drop type information for ADT values without ensuring that all tags be globally unique (and I'm not sure of any better way to do that than hashing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree there is no other way . And seems like interpreter has the need to map runtime value back to AST node to support user provided bindings.
How about make a 8 bit hash instead and put it in the most significant byte of the tag. So when we debug, we can easily tell the variant from the least significant byte. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting suggestion, definitely more readable too. I will look into implementing that, thanks.
@wweic I've gone with your suggestion. I hope I've done the hash truncation in the right way. |
src/relay/ir/module.cc
Outdated
// The hash will be used as the most significant byte of the tag, with the index of | ||
// the constructor in the less significant bytes | ||
size_t hash = std::hash<std::string>()(var->var->name_hint); | ||
int64_t prefix = static_cast<int64_t>(hash & 0xff) << 56; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use int32
? I think that's more than enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, if we're truncating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
I think as far as RTTI for the interpreter this is a good tradeoff. |
…more robust (apache#3369) * Use hash of ADT name and constructor idx to generate tag, add reverse mapping to module and use where appropriate * Lint and build fixes * Add round-tripping test for getting constructors by tag * Use int64_t everywhere for tags * Add additional identity check * Bring out _arg_to_ast again * Use 8-bit hash of GTV name as MSB of tag, index as LSB for more readable tags * Use int32 instead of int64 for tag
…more robust (apache#3369) * Use hash of ADT name and constructor idx to generate tag, add reverse mapping to module and use where appropriate * Lint and build fixes * Add round-tripping test for getting constructors by tag * Use int64_t everywhere for tags * Add additional identity check * Bring out _arg_to_ast again * Use 8-bit hash of GTV name as MSB of tag, index as LSB for more readable tags * Use int32 instead of int64 for tag
…more robust (apache#3369) * Use hash of ADT name and constructor idx to generate tag, add reverse mapping to module and use where appropriate * Lint and build fixes * Add round-tripping test for getting constructors by tag * Use int64_t everywhere for tags * Add additional identity check * Bring out _arg_to_ast again * Use 8-bit hash of GTV name as MSB of tag, index as LSB for more readable tags * Use int32 instead of int64 for tag
Since #3299 changed the representation of
ConstructorValue
to rely on the ADT tag rather than a constructor object, I wrote this change to ensure thatPreviously tags simply consisted of the constructor's index in the typedata array, but this means that constructors for different ADTs could have clashing tags. Here I use the hash of the ADT name (guaranteed to be globally unique by the module) and the index to produce the tag for the constructor. The module also keeps a reverse mapping of tags to constructors to make it easy to retrieve.
I followed the example of the structural hash in passing around the hashed values and I hope I have done so correctly. I have added some round-trip tests to ensure that hashes map back to the same constructors each time. Please review @jroesch, @wweic, @icemelon9, @MarisaKirisame (reviewers on #3299)