Fix maskSpanAffineOffset bitmask in ldmatrix/stmatrix subslice check by ianbarber · Pull Request #10066 · triton-lang/triton

ianbarber · 2026-04-17T19:40:58Z

The subslice safety check in lowerLdStMatrix uses a bitmask to verify that affine offsets don't touch the contiguous part of the tile's offset dimension. It was using getOutDimSizeLog2 (which returns log2 of the size) instead of getOutDimSize (the actual size) to construct this mask.

For outDimSize=8: log2(8)-1 = 2 (0b010) only checks bit 1, whereas 8-1 = 7 (0b111) correctly checks all bits within the tile span.
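The arithmetic above can be checked with a small standalone sketch. The helpers `log2Of`, `buggyMask`, and `fixedMask` are hypothetical names introduced only for illustration; they model what the description says about `getOutDimSizeLog2` and `getOutDimSize` and are not the actual Triton code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for getOutDimSizeLog2: log2 of a power-of-two size.
constexpr uint32_t log2Of(uint32_t size) {
  uint32_t result = 0;
  while (size > 1) {
    size >>= 1;
    ++result;
  }
  return result;
}

// Mask as built before the fix (assumed form): log2(size) - 1.
constexpr uint32_t buggyMask(uint32_t outDimSize) {
  return log2Of(outDimSize) - 1;
}

// Mask as built after the fix (assumed form): size - 1, i.e. all low bits set.
constexpr uint32_t fixedMask(uint32_t outDimSize) {
  return outDimSize - 1;
}

// outDimSize = 8: the buggy mask covers only bit 1, the fixed mask bits 0..2.
static_assert(buggyMask(8) == 0b010, "log2(8) - 1 == 2");
static_assert(fixedMask(8) == 0b111, "8 - 1 == 7");

// outDimSize = 16: the buggy mask covers bits 0..1, the fixed mask bits 0..3.
static_assert(buggyMask(16) == 0b0011, "log2(16) - 1 == 3");
static_assert(fixedMask(16) == 0b1111, "16 - 1 == 15");
```

For any power-of-two size, only `size - 1` sets every bit below `log2(size)`, which is what a mask covering the whole tile span needs.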

The bug makes the check too permissive: it could allow subslices that overlap the contiguous tile region. It has stayed latent because the specific bit patterns in maskSpanAffineOffset rarely trigger the difference.
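To make the permissiveness concrete, here is a minimal sketch for outDimSize=8. It assumes the check has the form "accept when no affine-offset bit falls inside the mask"; `isAccepted`, `kBuggyMask`, and `kFixedMask` are hypothetical names, and the real check in lowerLdStMatrix may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Assumed form of the check: the subslice is accepted only if none of the
// bits touched by affine offsets fall inside the mask.
constexpr bool isAccepted(uint32_t affineOffsetBits, uint32_t mask) {
  return (affineOffsetBits & mask) == 0;
}

constexpr uint32_t kBuggyMask = 3 - 1;  // log2(8) - 1 = 0b010
constexpr uint32_t kFixedMask = 8 - 1;  // 8 - 1       = 0b111

// Bit 2 lies inside the 8-element tile span, yet the buggy mask misses it.
static_assert(isAccepted(0b100, kBuggyMask), "buggy check lets bit 2 through");
static_assert(!isAccepted(0b100, kFixedMask), "fixed check rejects bit 2");

// Bit 1 is the only in-span bit both masks cover, so both reject it.
static_assert(!isAccepted(0b010, kBuggyMask), "buggy check rejects bit 1");
static_assert(!isAccepted(0b010, kFixedMask), "fixed check rejects bit 1");

// Bit 3 lies outside the tile span, so both masks accept it.
static_assert(isAccepted(0b1000, kBuggyMask), "buggy check accepts bit 3");
static_assert(isAccepted(0b1000, kFixedMask), "fixed check accepts bit 3");
```

Under this model the two checks disagree only when the offset bits include bit 0 or bit 2 but not bit 1, which is consistent with the bug rarely showing up in practice.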

Fix both the NVIDIA (Utility.cpp) and AMD (MemoryOpToLLVM.cpp) backends.

New contributor declaration

I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these rules (https://cbea.ms/git-commit/#why-not-how).
I have run pre-commit run --from-ref origin/main --to-ref HEAD.
This PR does not need a test because existing tests pass, and the fix only makes the safety check stricter.
I have not added any lit tests.

ianbarber · 2026-04-19T15:23:18Z

I think this is a flake with mi300? rebased to fix merge

lezcano · 2026-04-20T07:09:08Z

~~please stop rebasing. If there is any flaky test I'll just rerun it manually~~ nvm, I see that there was a fix for CI on top of main. Fair enough. It seems that there's other flaky tests on amd tho...

ianbarber requested review from antiagainst, ptillet and zhanglx13 as code owners April 17, 2026 19:41

lezcano approved these changes Apr 18, 2026

lezcano enabled auto-merge (squash) April 18, 2026 17:32

auto-merge was automatically disabled April 19, 2026 15:22
Head branch was pushed to by a user without write access

ianbarber force-pushed the fix-mask-span-affine-offset branch from 02ce4bb to ae7008b Compare April 19, 2026 15:22

lezcano enabled auto-merge (squash) April 19, 2026 19:42

auto-merge was automatically disabled April 19, 2026 22:26
Head branch was pushed to by a user without write access

ianbarber force-pushed the fix-mask-span-affine-offset branch from ae7008b to 0af838d Compare April 19, 2026 22:26

ianbarber force-pushed the fix-mask-span-affine-offset branch from 0af838d to 6d55593 Compare April 20, 2026 00:43

lezcano enabled auto-merge (squash) April 20, 2026 07:08

lezcano merged commit 6684293 into triton-lang:main Apr 20, 2026
17 of 18 checks passed

leijurv mentioned this pull request Apr 23, 2026

Potential bug in ld/stmatrix lowering #9606

Closed

lezcano merged 1 commit into triton-lang:main from ianbarber:fix-mask-span-affine-offset

ianbarber commented Apr 17, 2026

Uh oh!

ianbarber commented Apr 19, 2026

Uh oh!

lezcano commented Apr 20, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ianbarber commented Apr 17, 2026

New contributor declaration

Uh oh!

ianbarber commented Apr 19, 2026

Uh oh!

lezcano commented Apr 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lezcano commented Apr 20, 2026 •

edited

Loading