Select by warp id in AsyncWarp if Register Sharing is enabled#4334
@zasdfgbnm I had to push 56b6c0b because I ran into the e.g., I can clean things up with #4395, which keeps the same padding rules even when register sharing is disabled. TL;DR: Having different padding rules for register sharing is annoying.
Background: Lowering pads a thread block ParallelType for the mbarrier async operations in the fusion with WarpSpecialized circular buffering.

Problem: Picking Warps

Solution - How to pick warp and threads?

Problem: ElectSync doesn't work if blockDim.x < 32

Solution: Replace ElectSync with thread_id == 0

Code snippet from HopperMatmulTest/MLPGemmPersistentBroadcastInputs.NumWarpGroups/2

Code snippet from Hopper/TmaCircularBufferingTest.Matmul/stage_2_prefetch_neg2_M_500_N_2048_WarpSpecializedOnTIDyRegisterSharing_64_168_CpAsyncBulkTensorTile
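
For readers without the generated kernels at hand, the selection idea in the title and in the ElectSync replacement can be sketched on the host. This is not the generated code from the tests listed above, and it is not nvFuser's implementation: `Dim3`, `linearThreadId`, `warpId`, `inAsyncWarp`, `isIssuingThread`, and `countThreads` are hypothetical names, and the sketch assumes that `thread_id == 0` means "lane 0 of the selected warp" and that the async warp is identified by a single warp id computed from the linearized thread index.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's threadIdx / blockDim (not an nvFuser type).
struct Dim3 {
  unsigned x, y, z;
};

constexpr unsigned kWarpSize = 32;

// Linearized thread index within the block, x fastest. CUDA groups
// consecutive linear indices into warps of 32 threads.
unsigned linearThreadId(Dim3 tid, Dim3 bdim) {
  return tid.x + bdim.x * (tid.y + bdim.y * tid.z);
}

// Warp id derived from the linear index, so it is well defined even when
// blockDim.x < 32 and one warp spans several threadIdx.y rows.
unsigned warpId(Dim3 tid, Dim3 bdim) {
  return linearThreadId(tid, bdim) / kWarpSize;
}

// Assumption: the async warp is selected by comparing against one warp id.
bool inAsyncWarp(Dim3 tid, Dim3 bdim, unsigned async_warp_id) {
  return warpId(tid, bdim) == async_warp_id;
}

// Assumption: the ElectSync replacement picks lane 0 of the async warp.
bool isIssuingThread(Dim3 tid, Dim3 bdim, unsigned async_warp_id) {
  return inAsyncWarp(tid, bdim, async_warp_id) &&
      linearThreadId(tid, bdim) % kWarpSize == 0;
}

// Counts the threads of a block that pass one of the predicates above.
unsigned countThreads(Dim3 bdim, unsigned async_warp_id, bool issuing_only) {
  unsigned count = 0;
  for (unsigned z = 0; z < bdim.z; ++z) {
    for (unsigned y = 0; y < bdim.y; ++y) {
      for (unsigned x = 0; x < bdim.x; ++x) {
        Dim3 tid{x, y, z};
        bool selected = issuing_only
            ? isIssuingThread(tid, bdim, async_warp_id)
            : inAsyncWarp(tid, bdim, async_warp_id);
        count += selected ? 1u : 0u;
      }
    }
  }
  return count;
}
```

With an illustrative block of `{16, 12, 1}` threads and warp id 4 chosen as the async warp, the warp covers `threadIdx.y` rows 8 and 9. Thread `(0, 8, 0)` is the only one that passes `isIssuingThread`; thread `(0, 9, 0)` also has `threadIdx.x == 0` but sits at lane 16, so a predicate on `threadIdx.x` alone would not single out one thread when `blockDim.x < 32`.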