[Canonicalize] Transform ptr_to_int->add->int_to_ptr to addptr #9971
Fix AxisInfoAnalysis to preserve contiguity information through tt.int_to_ptr and tt.ptr_to_int operations by registering them as CastOpAxisInfoVisitor. Previously, these operations would reset contiguity to 1, preventing proper vectorization.

This fixes async_copy_local_to_global failing to generate wide vector stores (e.g., b128) when pointer arithmetic is performed via integer operations (ptr->int->add->int->ptr pattern).

Test case added for async_copy_local_to_global with ptr_to_int/int_to_ptr casts, verifying 128-bit stores are generated for sizePerThread=[8] with f16 elements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CastOpAxisInfoVisitor<triton::IntToPtrOp>,
CastOpAxisInfoVisitor<triton::PtrToIntOp>,
I don't think that's correct since contiguous pointers wouldn't generate contiguous integers.
Also, ideally you would add a unit test that runs only the analysis, to show the bug and the new behavior.
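To make the objection concrete, here is a minimal sketch in plain C++ (not Triton's AxisInfo implementation; `contiguousRun` and `addressesOf` are hypothetical helpers introduced only for illustration). It assumes that "contiguity" of integer values means consecutive values differ by 1. Pointers to consecutive f16 elements are contiguous in units of elements, but their integer addresses advance by the element size in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: length of the leading run where v[i + 1] == v[i] + 1.
// This loosely models what contiguity means for plain integer values.
std::size_t contiguousRun(const std::vector<std::int64_t> &v) {
  if (v.empty())
    return 0;
  std::size_t run = 1;
  while (run < v.size() && v[run] == v[run - 1] + 1)
    ++run;
  return run;
}

// Hypothetical helper: byte addresses of `n` consecutive elements of size
// `elemSizeBytes` starting at `base`, i.e. what a ptr_to_int-style cast
// would yield for a contiguous run of pointers.
std::vector<std::int64_t> addressesOf(std::int64_t base, std::size_t n,
                                      std::int64_t elemSizeBytes) {
  assert(elemSizeBytes > 0 && "element size must be positive");
  std::vector<std::int64_t> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    out.push_back(base + static_cast<std::int64_t>(i) * elemSizeBytes);
  return out;
}
```

With 2-byte elements, `contiguousRun(addressesOf(0x1000, 8, 2))` reports a run of length 1, even though the eight pointers themselves are contiguous. Only for 1-byte elements do the two notions coincide.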
> I don't think that's correct since contiguous pointers wouldn't generate contiguous integers.

Right. Perhaps I should just handle the specific pattern PtrToInt -> add -> IntToPtr, because this pattern is basically equivalent to pointer arithmetic and should preserve the contiguity from the original pointer.
Can we canonicalize this into ptr arithmetic?
Add canonicalization pattern for IntToPtrOp that recognizes the pattern:
int_to_ptr(addi(ptr_to_int(ptr), constant_offset))
and transforms it to:
addptr(ptr, element_offset)
where element_offset = constant_offset / element_size_bytes.

This pattern appears when performing pointer arithmetic via integer operations (e.g., adding byte offsets to pointers). By canonicalizing to addptr, AxisInfoAnalysis can correctly track contiguity, enabling proper vectorization for operations like async_copy_local_to_global.

The pattern only applies when:
- The offset is a compile-time constant (IntegerAttr or SplatElementsAttr)
- The byte offset is evenly divisible by the element size

Added to both standard canonicalize and gluon-canonicalize passes. Tests added for positive cases (f32, f16, commutative) and negative cases (non-constant offset, indivisible offset).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
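The divisibility condition in the commit message can be sketched as follows. This is a plain C++ model of the arithmetic only, not the code from the PR; the name `toElementOffset` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper: converts a constant byte offset into an element
// offset, or returns std::nullopt when the byte offset is not a multiple of
// the element size (in which case the rewrite should not fire).
std::optional<std::int64_t> toElementOffset(std::int64_t byteOffset,
                                            std::int64_t elemSizeBytes) {
  assert(elemSizeBytes > 0 && "element size must be positive");
  if (byteOffset % elemSizeBytes != 0)
    return std::nullopt;
  return byteOffset / elemSizeBytes;
}
```

For example, a 16-byte offset corresponds to 4 elements for f32 (4 bytes each) and 8 elements for f16 (2 bytes each), while a 3-byte offset on f16 has no element-offset equivalent and leaves the original ops untouched.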
return failure();
}

auto ptrToIntOp = ptrToIntValue.getDefiningOp<PtrToIntOp>();
nit: make ptrToIntValue a PtrToIntOp directly to avoid redundant cast
} else {
// Scalar case
elementOffsetValue = arith::ConstantOp::create(
rewriter, loc, rewriter.getI32IntegerAttr(elementOffset));
I don't think you can assume the offset is an i32.
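One way to see the concern: the integer produced by ptr_to_int may be wider than 32 bits, so an element offset derived from it is not guaranteed to fit in an i32 constant. The sketch below is plain C++ with a hypothetical helper, `fitsInSignedWidth`, showing a possible guard. It is an assumption about how one might check this, not what the PR does; creating the constant with the offset's own integer type is another option.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if `value` is representable as a signed
// two's-complement integer of `bits` width (1 <= bits <= 64).
bool fitsInSignedWidth(std::int64_t value, unsigned bits) {
  assert(bits >= 1 && bits <= 64 && "unsupported bit width");
  if (bits == 64)
    return true; // every int64_t value fits
  const std::int64_t hi = (std::int64_t{1} << (bits - 1)) - 1;
  const std::int64_t lo = -hi - 1;
  return value >= lo && value <= hi;
}
```

An element offset of `1LL << 32` fits a 64-bit constant but not a 32-bit one, so unconditionally building the constant as i32 would silently change the value.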
//-- IntToPtrOp --
// Canonicalize: int_to_ptr(addi(ptr_to_int(ptr), constant_offset)) ->
// addptr(ptr, element_offset) Only when offset is constant and divisible by
// element size.
Wouldn't it be more general to split this into 2 patterns:
int_to_ptr(addi(val, offset)) -> addptr(int_to_ptr(val), element_offset)
int_to_ptr(ptr_to_int(ptr)) -> ptr
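The two suggested patterns compose to cover the original three-op pattern. The sketch below models this on a toy expression tree in plain C++. It is not MLIR and not the PR's code: `Expr`, `node`, `canonicalize`, and `render` are hypothetical names, pointer element types are reduced to a single `elemSizeBytes` parameter, and the constant is assumed to be the right-hand operand of addi.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Toy expression node (NOT MLIR); kinds model the ops discussed above.
enum class Kind { Ptr, Int, Const, PtrToInt, IntToPtr, AddI, AddPtr };

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  Kind kind = Kind::Const;
  ExprPtr lhs, rhs;       // operands; rhs is unused for casts and leaves
  std::int64_t value = 0; // payload for Const
  std::string name;       // payload for Ptr / Int leaves
};

ExprPtr node(Kind k, ExprPtr l = nullptr, ExprPtr r = nullptr,
             std::int64_t v = 0, std::string n = "") {
  auto e = std::make_shared<Expr>();
  e->kind = k;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  e->value = v;
  e->name = std::move(n);
  return e;
}

// Applies the two patterns at the root of `e`:
//   1. int_to_ptr(ptr_to_int(p))  -> p
//   2. int_to_ptr(addi(x, const)) -> addptr(int_to_ptr(x), const / elemSize)
//      (only when const is divisible by the element size)
ExprPtr canonicalize(const ExprPtr &e, std::int64_t elemSizeBytes) {
  assert(elemSizeBytes > 0 && "element size must be positive");
  if (!e || e->kind != Kind::IntToPtr)
    return e;
  const ExprPtr &src = e->lhs;
  if (src->kind == Kind::PtrToInt)
    return src->lhs;
  if (src->kind == Kind::AddI && src->rhs->kind == Kind::Const &&
      src->rhs->value % elemSizeBytes == 0) {
    // Re-run on the new int_to_ptr so pattern 1 can fold the cast pair.
    ExprPtr base = canonicalize(node(Kind::IntToPtr, src->lhs), elemSizeBytes);
    ExprPtr off =
        node(Kind::Const, nullptr, nullptr, src->rhs->value / elemSizeBytes);
    return node(Kind::AddPtr, base, off);
  }
  return e;
}

std::string render(const ExprPtr &e) {
  switch (e->kind) {
  case Kind::Ptr:
  case Kind::Int:
    return e->name;
  case Kind::Const:
    return std::to_string(e->value);
  case Kind::PtrToInt:
    return "ptr_to_int(" + render(e->lhs) + ")";
  case Kind::IntToPtr:
    return "int_to_ptr(" + render(e->lhs) + ")";
  case Kind::AddI:
    return "addi(" + render(e->lhs) + ", " + render(e->rhs) + ")";
  case Kind::AddPtr:
    return "addptr(" + render(e->lhs) + ", " + render(e->rhs) + ")";
  }
  return "?";
}
```

In this model, `int_to_ptr(addi(ptr_to_int(p), 16))` with 4-byte elements becomes `addptr(p, 4)`: pattern 2 hoists the add, and pattern 1 then folds the remaining cast pair. When the source is a plain integer `x` rather than a ptr_to_int, pattern 2 still applies on its own, which is the extra generality the split provides.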
Good point. I have refactored this into 2 patterns.
offsetValue = addOp.getRhs();
} else if (auto rhsPtrToInt = addOp.getRhs().getDefiningOp<PtrToIntOp>()) {
ptrToIntOp = rhsPtrToInt;
offsetValue = addOp.getLhs();
addi is canonicalized so constants are always the rhs argument.
…uity

Adding a runtime scalar offset to an existing pointer tensor (output_ptrs + elem_delta) breaks AxisInfoAnalysis contiguity in current Triton, causing fallback to 16-bit async stores which gfx1250 doesn't support (assertion failure).

Instead of modifying the existing pointer tensor, build the remote pointer tensor from scratch: (output_ptr + tile_base + elem_delta) + flat_idx. This gives AxisInfoAnalysis the splat(scalar) + arange pattern it trusts, enabling b128 vectorized async stores.

Upstream Triton fix: triton-lang/triton#9971

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…n-lang#9971)

Add canonicalization pattern for IntToPtrOp that recognizes the pattern:
int_to_ptr(addi(ptr_to_int(ptr), constant_offset))
and transforms it to:
addptr(ptr, element_offset)
where element_offset = constant_offset / element_size_bytes.

This pattern appears when performing pointer arithmetic via integer operations (e.g., adding byte offsets to pointers). By canonicalizing to addptr, AxisInfoAnalysis can correctly track contiguity, enabling proper vectorization for operations like async_copy_local_to_global.

The pattern only applies when:
- The offset is a compile-time constant (IntegerAttr or SplatElementsAttr)
- The byte offset is evenly divisible by the element size

Added to both standard canonicalize and gluon-canonicalize passes. Tests added for positive cases (f32, f16, commutative) and negative cases (non-constant offset, indivisible offset).