Do not promote broadcast only groups #4154
This PR attempts to not promote a loop group when it only consists of broadcast IDs. For example, consider a fusion as shown below:

Here, the interesting part is the innermost loop ID of `T2`, `bS11{1}`. Because `bS5` is promoted to `iS3` (or `iS7`), `bS11` is also promoted to a non-broadcast ID that is exact mapped with `iS13` and `iS15`. However, in this case, `bS11` doesn't really need to be promoted. More specifically, as long as a loop group only consists of broadcast IDs, the group should not need to be promoted.

Currently, the generated CUDA kernel with `NVFUSER_ENABLE=id_model(all)` looks like below:

The code is not incorrect, but `T2` is redundantly defined over the loop of `T1.logical_size[1]` because of the promotion of `bS11`. Note that the allocation of `T2` is not affected because broadcast IDs are excluded before promotion.

In this PR, for loop groups that only consist of broadcast IDs, promotion to non-broadcast is reverted. With the change, the above fusion results in the kernel below:
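
Separately from the generated kernel, the rule itself (revert the promotion when every ID in a loop group is a broadcast) can be illustrated with a minimal C++ sketch. This is not the code in this PR and does not use nvFuser's classes: `Id`, `LoopGroup`, `isBroadcastOnly`, and `revertBroadcastOnlyPromotion` are hypothetical names introduced only for illustration, and falling back to the first member of the group is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an iteration domain; not nvFuser's IterDomain.
struct Id {
  std::string name;
  bool is_broadcast;
};

// Hypothetical loop group: its member IDs plus the ID picked by promotion.
struct LoopGroup {
  std::vector<Id> members;
  Id promoted;
};

// True when the group is non-empty and every member is a broadcast ID.
bool isBroadcastOnly(const LoopGroup& group) {
  return !group.members.empty() &&
      std::all_of(
             group.members.begin(),
             group.members.end(),
             [](const Id& id) { return id.is_broadcast; });
}

// If a broadcast-only group was promoted to a non-broadcast ID, fall back
// to one of its own broadcast members (assumption: the first member).
// Groups containing any non-broadcast member are left untouched.
void revertBroadcastOnlyPromotion(LoopGroup& group) {
  if (isBroadcastOnly(group) && !group.promoted.is_broadcast) {
    group.promoted = group.members.front();
  }
}
```

In this sketch, a group whose only member is a broadcast ID such as `bS11` and whose promotion is a non-broadcast ID ends up with the broadcast ID again after `revertBroadcastOnlyPromotion`, while a group that contains at least one non-broadcast ID keeps its promotion.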