[Relay][VM] Add AllocTensor instruction and better instruction printer #3306
Conversation
runtime::TVMArgsSetter setter(values.data(), codes.data());
size_t arity = 0;
for (Index i = 0; i < arg_count; i++) {
  if (args[i].ptr_->tag == ObjectTag::kDatatype) {
When will this happen?
Ops like concatenate take a tuple of tensors as input, and fusion can sometimes make a fused function take a tuple as input.
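(Roughly, the flattening looks like the sketch below; the types are simplified stand-ins rather than the actual TVM runtime code.)

```cpp
// Minimal sketch, not the actual TVM code: how a tuple (ADT) argument could be
// flattened into individual tensor arguments before calling a packed function.
#include <cassert>
#include <vector>

enum class ObjectTag { kTensor, kDatatype };

struct Obj {
  ObjectTag tag;
  void* tensor = nullptr;    // set when tag == kTensor
  std::vector<Obj*> fields;  // set when tag == kDatatype (e.g. a tuple)
};

// Append the tensor(s) carried by one VM argument to the packed-call arguments.
inline void FlattenArg(const Obj* arg, std::vector<void*>* packed_args) {
  if (arg->tag == ObjectTag::kDatatype) {
    // e.g. the tuple passed to concatenate, or a tuple input of a fused function
    for (const Obj* field : arg->fields) {
      assert(field->tag == ObjectTag::kTensor);  // nested tuples not handled here
      packed_args->push_back(field->tensor);
    }
  } else {
    packed_args->push_back(arg->tensor);
  }
}
```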
LGTM
Sorry for the slow review, Haichen. Could you provide an example of the updated instruction printer? My only concern is that I'm not sure we should introduce duplicate instructions just for readability wins; should we really differentiate static vs. non-static allocation? I am not 100% sure this matches the design I had in mind for memory manifestation and optimization. We could do this for now and change it later when the memory work ships, but I was thinking we should introduce a low-level concept of storage, independent from the allocation of tensors. For example:
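(Something along these lines; just a sketch, the instruction names and fields are illustrative rather than anything final.)

```cpp
// Illustrative sketch of a storage-first design: a raw storage buffer is
// allocated once, and tensors are created as views over it. The names and
// fields here are hypothetical, not from this PR.
#include <cstdint>
#include <vector>
#include <dlpack/dlpack.h>  // DLDataType

using RegName = int64_t;
using Index = int64_t;

struct AllocStorageInsn {  // dst = alloc_storage(size_bytes, alignment)
  RegName dst;
  Index size_bytes;
  Index alignment;
};

struct AllocTensorFromStorageInsn {  // dst = alloc_tensor(storage, offset, shape, dtype)
  RegName dst;
  RegName storage;  // register produced by AllocStorageInsn
  Index offset;
  std::vector<int64_t> shape;
  DLDataType dtype;
};
```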
I guess we could have two low-level instruction variants which take storage, dtype, and shape, but in general I don't like specializing/duplicating, as it increases the number of potential code paths, and the allocation mechanism should have no real differences AFAICT. The rest of the changes look good to me.
@jroesch I don't have a strong argument for adding the new static tensor allocation, but I don't see how adding this new instruction causes any overhead either. I suggest we keep this static alloc instruction for now and later change it to allocating tensors from storage. I can also reserve an opcode for an alloc-storage instruction. What do you think? I'll add examples of the instruction printer.
Updated instruction printer outputs:
Okay, looks good. I plan on moving back to the VM after the tutorial.
apache#3306)
* Update vm print & add AllocTensor instruction
* patch
* fix invoke packed
* update cmake
* tweak move
* update invoke_closure
* lint
* add doc
* tweak
This PR renames the AllocTensor instruction to AllocTensorReg for dynamic shape allocation, and adds an AllocTensor instruction for constant shapes. Currently, every AllocTensor requires a LoadConst instruction before it, which I think significantly increases the number of instructions and reduces readability. Therefore I think it's better to have both AllocTensor and AllocTensorReg instructions.

cc: @jroesch @wweic @zhiics @tqchen
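(For illustration, a rough sketch of the two instruction payloads; the field names here are illustrative and may differ from the merged code.)

```cpp
// Sketch of the two allocation forms (illustrative only). With a const-shape
// AllocTensor, a static allocation is a single instruction; previously it
// needed a LoadConst for the shape plus an allocation that read the shape
// back out of a register.
#include <cstdint>
#include <vector>
#include <dlpack/dlpack.h>  // DLDataType

using RegName = int64_t;

struct AllocTensorInsn {       // static shape, known at compile time
  RegName dst;
  std::vector<int64_t> shape;  // e.g. {1, 224, 224, 3}
  DLDataType dtype;
};

struct AllocTensorRegInsn {  // dynamic shape, computed at runtime
  RegName dst;
  RegName shape_register;    // register holding the shape
  DLDataType dtype;
};
```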