-
Notifications
You must be signed in to change notification settings - Fork 58
Allocate and Deallocate HostIr insertion in FusionKernelRuntime #4329
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you separate this into two PRs, the first one being a rollback of #4303 (with fixes of course) and the second being insert deallocations?
e6d02e1
to
3fce074
Compare
output_tensor, info.tv, expr_eval); | ||
info.shape_info.allocation_sizes = alloc_sizes; | ||
info.shape_info.allocation_strides = alloc_strides; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure whether this may have any unintended side effects. I will take a look into it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jjsjann123 do you recall why it was problematic for pre-allocated outputs to have allocation domains?
848f41f
to
d0d169e
Compare
!test |
1 similar comment
!test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks good overall. I'm also unsure about the change executor.cpp. So cc'ed Jie
output_tensor, info.tv, expr_eval); | ||
info.shape_info.allocation_sizes = alloc_sizes; | ||
info.shape_info.allocation_strides = alloc_strides; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @jjsjann123 do you recall why it was problematic for pre-allocated outputs to have allocation domains?
507b9b1
to
a103e0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with comments!
The PR description seems outdated. Please fix that as well.
!test |
1 similar comment
!test |
!test |
d02cdb5
to
c0848ed
Compare
!test |
Co-authored-by: Jingyue Wu <[email protected]>
5d8db2d
to
96427a0
Compare
!test |
Follow-up PR to #4286. The PR will insert allocate ops for every LaunchKernel output, and also insert a Deallocate right after the last use of every input expr in the Hostir container. It adds a test to check the number of Deallocate ops and the max memory usage is correct for an example fusion as well.