-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AMD] Add buffer support #4716
base: main
Are you sure you want to change the base?
[AMD] Add buffer support #4716
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just putting a blocker on this as some pieces will need a bit more discussions. Lei had mentioned those to me so it's not a surprise but haven't had a chance to discuss it with Phil and the rest of the team yet
Do we have some data on the performance impact of this feature? Considering the cost in extra compilations and maintenance it would be good to have this information |
After having a quick chat with @ptillet one problem is that the specialization will apply to all backends even the ones that can't take advantage of that. |
Absolutely fine, I put it here exactly to have discussions while we get on with #4638 |
fbeb3a6
to
b426ce4
Compare
b426ce4
to
6dafa47
Compare
8894638
to
0eae68c
Compare
In this PR I am trying to refactor the specializations that we apply to the signature of a given function in Triton. Basically, given a kernel there are some argument properties that can help compilation. E.g., divisibility by 16 and the fact that an integer is equal to 1. In a previous PR: #4716, I needed other specializations to add buffer support in the AMD backend (and get back some performance when we were using unaligned masked loads). So this is my attempt to redesign the specialization support to introduce per-backend specializations. The idea is that `AttrsDescriptor` is now the class that is taking care of doing the analysis of the parameters and adding the specialization. It also has a function table where more specializations can be added per-backend.
0eae68c
to
de67cc1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff thanks a ton for this @giuseros! Overall looks quite good to me; I just have a bunch of small issues.
queue.push_back(operand.getDefiningOp()); | ||
} | ||
|
||
// 2. Check the that pointer is not a block argument. We cannot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is duplicated from L255?
Thanks for the review @antiagainst ! I addressed almost all the comments except testing. I will do that in the next commit (possible later today or tomorrow) |
This PR is building on top of #4638 to finally add support for buffer operations. For now we will focus on buffer load/store, but in the future we might add more. What this PR is doing:
tt.non_negative
) and the size of the memory buffers passed (tt.within_2gb
)tt.within_2gb
propertyAMDGCN_USE_BUFFER_OPS
. In this way we can enable the feature gradually and check for possible performance/correctness issues.