From 6d55593c2e7efe34922901e6e998caedf00c1698 Mon Sep 17 00:00:00 2001
From: Ian Barber
Date: Fri, 17 Apr 2026 12:34:53 -0700
Subject: [PATCH] Fix maskSpanAffineOffset bitmask in ldmatrix/stmatrix subslice check
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The subslice safety check in lowerLdStMatrix uses a bitmask to verify
that affine offsets don't touch the contiguous part of the tile's
offset dimension. It was using getOutDimSizeLog2 (which returns log2 of
the size) instead of getOutDimSize (the actual size) to construct this
mask.

For outDimSize=8: log2(8)-1 = 2 (0b010) only checks bit 1, whereas
8-1 = 7 (0b111) correctly checks all bits within the tile span.

The bug makes the check too permissive: it could allow subslices that
overlap the contiguous tile region. It has stayed latent because the
specific bit patterns in maskSpanAffineOffset rarely trigger the
difference.

Fix both the NVIDIA (Utility.cpp) and AMD (MemoryOpToLLVM.cpp)
backends.
---
 third_party/amd/lib/TritonAMDGPUToLLVM/MemoryOpToLLVM.cpp | 2 +-
 third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/Utility.cpp  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/third_party/amd/lib/TritonAMDGPUToLLVM/MemoryOpToLLVM.cpp b/third_party/amd/lib/TritonAMDGPUToLLVM/MemoryOpToLLVM.cpp
index 78226c707886..323900cf9df6 100644
--- a/third_party/amd/lib/TritonAMDGPUToLLVM/MemoryOpToLLVM.cpp
+++ b/third_party/amd/lib/TritonAMDGPUToLLVM/MemoryOpToLLVM.cpp
@@ -187,7 +187,7 @@ class TransLocalLoadOpConversion
 
     // If we are lowering a subslice, the subslice offsets shall not touch the
     // contiguous part of the tile
-    if (maskSpanAffineOffset & (tile.getOutDimSizeLog2(kOffset) - 1)) {
+    if (maskSpanAffineOffset & (tile.getOutDimSize(kOffset) - 1)) {
       return failure();
     }
 
diff --git a/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/Utility.cpp b/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/Utility.cpp
index c2aff0a770b0..5571af976b91 100644
--- a/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/Utility.cpp
+++ b/third_party/nvidia/lib/TritonNVIDIAGPUToLLVM/Utility.cpp
@@ -332,7 +332,7 @@ LogicalResult lowerLdStMatrix(
 
   // If we are lowering a subslice, the subslice offsets shall not touch the
   // contiguous part of the tile
-  if (maskSpanAffineOffset & (tile.getOutDimSizeLog2(kOffset) - 1)) {
+  if (maskSpanAffineOffset & (tile.getOutDimSize(kOffset) - 1)) {
     return failure();
   }
 
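
Note for reviewers: the outDimSize=8 arithmetic from the commit message can be checked in isolation with the standalone sketch below. It is not Triton code. buggyMask, fixedMask and touches are hypothetical helpers introduced only for illustration, and the sketch assumes the out-dimension size is a power of two, so that the size equals 1 << log2(size).

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two mask expressions; not Triton's API.
// sizeLog2 plays the role of tile.getOutDimSizeLog2(kOffset).
constexpr uint32_t buggyMask(uint32_t sizeLog2) { return sizeLog2 - 1; }
constexpr uint32_t fixedMask(uint32_t sizeLog2) {
  return (1u << sizeLog2) - 1; // same value as getOutDimSize(kOffset) - 1
}

// True when the affine-offset mask overlaps the bits selected by `mask`,
// i.e. the case in which the lowering returns failure().
constexpr bool touches(uint32_t maskSpanAffineOffset, uint32_t mask) {
  return (maskSpanAffineOffset & mask) != 0;
}

// outDimSize = 8 (sizeLog2 = 3): the old mask is 0b010, the new one 0b111.
static_assert(buggyMask(3) == 0b010, "log2(8) - 1");
static_assert(fixedMask(3) == 0b111, "8 - 1");

// Offsets with bit 0 or bit 2 set fall inside the tile span, but only the
// fixed mask rejects them.
static_assert(!touches(0b001, buggyMask(3)) && touches(0b001, fixedMask(3)),
              "bit 0 is missed by the old mask");
static_assert(!touches(0b100, buggyMask(3)) && touches(0b100, fixedMask(3)),
              "bit 2 is missed by the old mask");

// Bit 1 is the one position where both masks agree on rejection.
static_assert(touches(0b010, buggyMask(3)) && touches(0b010, fixedMask(3)),
              "bit 1 is caught by both masks");

// An offset at bit 3 or above lies outside the span; both masks accept it.
static_assert(!touches(0b1000, buggyMask(3)) && !touches(0b1000, fixedMask(3)),
              "bit 3 is outside the tile span");
```

The static_asserts make the comparison a compile-time check: if the file compiles as C++14 or later, the old mask is confirmed to miss offsets at bits 0 and 2 that the new mask catches.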