Fix maskSpanAffineOffset bitmask in ldmatrix/stmatrix subslice check #10066

Merged
lezcano merged 1 commit into triton-lang:main from ianbarber:fix-mask-span-affine-offset
Apr 20, 2026

@ianbarber
Contributor

The subslice safety check in lowerLdStMatrix uses a bitmask to verify that affine offsets don't touch the contiguous part of the tile's offset dimension. It was using getOutDimSizeLog2 (which returns log2 of the size) instead of getOutDimSize (the actual size) to construct this mask.

For outDimSize=8: log2(8)-1 = 2 (0b010) only checks bit 1, whereas 8-1 = 7 (0b111) correctly checks all bits within the tile span.

The bug makes the check too permissive: it could allow subslices that overlap the contiguous tile region. It stayed latent because the bit patterns that occur in maskSpanAffineOffset rarely hit the bits where the two masks differ.

Fix both the NVIDIA (Utility.cpp) and AMD (MemoryOpToLLVM.cpp) backends.
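
To make the arithmetic above concrete, here is a minimal standalone sketch of the two mask constructions. It is not the Triton code: `log2OfPow2`, `buggyTileMask`, `fixedTileMask`, and `offsetsAvoidTile` are hypothetical helpers, and the sketch assumes the check rejects a subslice whenever the offset mask shares a set bit with the tile mask, and that the size is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; these are not Triton functions.

// log2 of a power-of-two size, standing in for the value that
// getOutDimSizeLog2 is described as returning.
inline uint32_t log2OfPow2(uint32_t size) {
  assert(size >= 2 && (size & (size - 1)) == 0);
  uint32_t bits = 0;
  while (size > 1) {
    size >>= 1;
    ++bits;
  }
  return bits;
}

// Mask as described before the fix: log2(size) - 1.
inline uint32_t buggyTileMask(uint32_t outDimSize) {
  return log2OfPow2(outDimSize) - 1;
}

// Mask as described after the fix: size - 1 sets every bit inside the tile span.
inline uint32_t fixedTileMask(uint32_t outDimSize) {
  return outDimSize - 1;
}

// Assumed shape of the check: true when the affine-offset bits do not
// overlap the tile's contiguous bits.
inline bool offsetsAvoidTile(uint32_t maskSpanAffineOffset, uint32_t tileMask) {
  return (maskSpanAffineOffset & tileMask) == 0;
}
```

With `outDimSize = 8`, an offset mask of `0x1` or `0x4` passes the check built from `buggyTileMask` (which is `0b010`) but is rejected with `fixedTileMask` (which is `0b111`). An offset mask of `0x2` is rejected by both, which is consistent with the bug staying hidden for some inputs.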

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • This PR does not need a test because existing tests pass, and the fix only makes the safety check stricter.

  • I have not added any lit tests.

@lezcano lezcano enabled auto-merge (squash) April 18, 2026 17:32
auto-merge was automatically disabled April 19, 2026 15:22

Head branch was pushed to by a user without write access

@ianbarber ianbarber force-pushed the fix-mask-span-affine-offset branch from 02ce4bb to ae7008b Compare April 19, 2026 15:22
@ianbarber
Contributor Author

I think this is a flake on MI300? Rebased to fix the merge.

@lezcano lezcano enabled auto-merge (squash) April 19, 2026 19:42
auto-merge was automatically disabled April 19, 2026 22:26

Head branch was pushed to by a user without write access

@ianbarber ianbarber force-pushed the fix-mask-span-affine-offset branch from ae7008b to 0af838d Compare April 19, 2026 22:26
@ianbarber ianbarber force-pushed the fix-mask-span-affine-offset branch from 0af838d to 6d55593 Compare April 20, 2026 00:43
@lezcano lezcano enabled auto-merge (squash) April 20, 2026 07:08
@lezcano
Contributor

lezcano commented Apr 20, 2026

Please stop rebasing. If there is any flaky test, I'll just rerun it manually. nvm, I see that there was a fix for CI on top of main. Fair enough. It seems that there are other flaky tests on AMD, though...

@lezcano lezcano merged commit 6684293 into triton-lang:main Apr 20, 2026
17 of 18 checks passed
bingyizh233 pushed a commit to bingyizh233/triton that referenced this pull request Apr 20, 2026
…riton-lang#10066)
