Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Relay][Module] Make tags for ADT constructors and ConstructorValues more robust #3369

Merged
merged 8 commits into from
Jul 5, 2019

Conversation

slyubomirsky
Copy link
Contributor

@slyubomirsky slyubomirsky commented Jun 13, 2019

Since #3299 changed the representation of ConstructorValue to rely on the ADT tag rather than a constructor object, I wrote this change to ensure that

  1. it would be easy to get back a constructor given the tag and
  2. tags could be reasonably guaranteed to be globally unique

Previously tags simply consisted of the constructor's index in the typedata array, but this means that constructors for different ADTs could have clashing tags. Here I use the hash of the ADT name (guaranteed to be globally unique by the module) and the index to produce the tag for the constructor. The module also keeps a reverse mapping of tags to constructors to make it easy to retrieve.

I followed the example of the structural hash in passing around the hashed values and I hope I have done so correctly. I have added some round-trip tests to ensure that hashes map back to the same constructors each time. Please review @jroesch, @wweic, @icemelon9, @MarisaKirisame (reviewers on #3299)

@@ -264,6 +244,26 @@ def __init__(self, mod, ctx, target):
self.target = target
self._intrp = _backend.CreateInterpreter(mod, ctx, target)

def _arg_to_ast(self, arg):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _arg_to_ast(self, arg):
def arg_to_ast(self, arg):

ditto below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out it can't be a method because another object also uses it, so I am going to make it a function that takes the mod as a parameter instead

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _ is to mark private in Python, its convention don't change name.

@slyubomirsky
Copy link
Contributor Author

I realized that just the index can be a sufficient tag if you have type-checked the code and therefore know what global var to check. Perhaps it is not strictly necessary to keep the reverse mapping but having it makes something like arg_to_ast easier to implement, namely by not needing to run a type check to do it.

@@ -110,15 +111,22 @@ void ModuleNode::Add(const GlobalVar& var,
AddUnchecked(var, checked_func);
}

void ModuleNode::RegisterConstructors(const GlobalTypeVar& var, const TypeData& type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using hash value as tag makes opcode not very intuitive to understand, How about maintaining a continuous range [0, num_constructors) for each ADT type?

Then for the following code:

type t = | Alice | Bob | Charlie | David

let test v =
  match v with
  | Alice   -> 1
  | Bob     -> 2
  | Charlie -> 3
  | David   -> 4

We can generate opcode like:

(switch* v
          case int 0: 1
          case int 1: 2
          case int 2: 3
          case int 3: 4)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point and I agree hashes are not ideal. The reason I went with hashes is to ensure that the tag will be determined completely by the name of the global type var and the index of the constructor in the type data and thus will not be affected, for example, by the order in which types are defined. It also means that tags will not collide across different ADTs. If hashes are undesirable, we can take the approach (as I alluded to before) of requiring a type check and only looking at the index of the constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I vote for eliminating the hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the constructor value carry information about its type, then? The case that is most difficult is the one in arg_to_ast: given only the constructor value, how do you work backwards?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for VM, we might need to discard type information in Object. As we already drop the type information in byte codes we generate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see what I can do about arg_to_ast then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, though, that it will be possible to drop type information for ADT values without ensuring that all tags be globally unique (and I'm not sure of any better way to do that than hashing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree there is no other way . And seems like interpreter has the need to map runtime value back to AST node to support user provided bindings.

How about make a 8 bit hash instead and put it in the most significant byte of the tag. So when we debug, we can easily tell the variant from the least significant byte. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting suggestion, definitely more readable too. I will look into implementing that, thanks.

@slyubomirsky
Copy link
Contributor Author

@wweic I've gone with your suggestion. I hope I've done the hash truncation in the right way.

// The hash will be used as the most significant byte of the tag, with the index of
// the constructor in the less significant bytes
size_t hash = std::hash<std::string>()(var->var->name_hint);
int64_t prefix = static_cast<int64_t>(hash & 0xff) << 56;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use int32? I think that's more than enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably, if we're truncating.

Copy link
Contributor

@wweic wweic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jroesch
Copy link
Member

jroesch commented Jul 5, 2019

I think as far as RTTI for the interpreter this is a good tradeoff.

@jroesch jroesch merged commit 3d2a056 into apache:master Jul 5, 2019
wweic pushed a commit to wweic/tvm that referenced this pull request Jul 11, 2019
…more robust (apache#3369)

* Use hash of ADT name and constructor idx to generate tag, add reverse mapping to module and use where appropriate

* Lint and build fixes

* Add round-tripping test for getting constructors by tag

* Use int64_t everywhere for tags

* Add additional identity check

* Bring out _arg_to_ast again

* Use 8-bit hash of GTV name as MSB of tag, index as LSB for more readable tags

* Use int32 instead of int64 for tag
wweic pushed a commit to wweic/tvm that referenced this pull request Jul 11, 2019
…more robust (apache#3369)

* Use hash of ADT name and constructor idx to generate tag, add reverse mapping to module and use where appropriate

* Lint and build fixes

* Add round-tripping test for getting constructors by tag

* Use int64_t everywhere for tags

* Add additional identity check

* Bring out _arg_to_ast again

* Use 8-bit hash of GTV name as MSB of tag, index as LSB for more readable tags

* Use int32 instead of int64 for tag
wweic pushed a commit to neo-ai/tvm that referenced this pull request Jul 11, 2019
…more robust (apache#3369)

* Use hash of ADT name and constructor idx to generate tag, add reverse mapping to module and use where appropriate

* Lint and build fixes

* Add round-tripping test for getting constructors by tag

* Use int64_t everywhere for tags

* Add additional identity check

* Bring out _arg_to_ast again

* Use 8-bit hash of GTV name as MSB of tag, index as LSB for more readable tags

* Use int32 instead of int64 for tag
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants