[Canonicalize] Transform ptr_to_int->add->int_to_ptr to addptr #9971

Merged
ThomasRaoux merged 8 commits into triton-lang:main from yiqian1:qian/fix-axisInfo-with-cast
Apr 10, 2026

@yiqian1
Contributor

@yiqian1 yiqian1 commented Apr 8, 2026

Add canonicalization pattern for IntToPtrOp that recognizes the pattern:
  int_to_ptr(addi(ptr_to_int(ptr), constant_offset))

and transforms it to:
  addptr(ptr, element_offset)

where element_offset = constant_offset / element_size_bytes.

This pattern appears when performing pointer arithmetic via integer
operations (e.g., adding byte offsets to pointers). By canonicalizing
to addptr, AxisInfoAnalysis can correctly track contiguity, enabling
proper vectorization for operations like async_copy_local_to_global.

The pattern only applies when:
- The offset is a compile-time constant (IntegerAttr or SplatElementsAttr)
- The byte offset is evenly divisible by the element size
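The offset check and the `element_offset` arithmetic above can be modeled in a few lines. This is a minimal sketch, assuming the pattern only needs to know whether the offset is a scalar constant, a splat constant, or neither; `OffsetKind`, `Offset` and `elementOffsetFor` are hypothetical names and not the code that landed in Ops.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the offset operand of the addi. In MLIR the
// constant cases would be an IntegerAttr (scalar) or a SplatElementsAttr
// (tensor whose elements all share one value); here only the value is kept.
enum class OffsetKind { ScalarConstant, SplatConstant, NonConstant };

struct Offset {
  OffsetKind kind;
  int64_t bytes;  // byte offset; meaningful only for the two constant kinds
};

// Returns the element offset to pass to addptr, or std::nullopt when the
// rewrite must not fire (offset not a compile-time constant, or the byte
// offset is not a multiple of the element size).
std::optional<int64_t> elementOffsetFor(const Offset &off,
                                        int64_t elemSizeBytes) {
  if (off.kind == OffsetKind::NonConstant)
    return std::nullopt;
  if (elemSizeBytes <= 0 || off.bytes % elemSizeBytes != 0)
    return std::nullopt;
  return off.bytes / elemSizeBytes;
}
```

For example, a 16-byte offset on an f32 pointer maps to `addptr(ptr, 4)`, the same offset on an f16 pointer maps to `addptr(ptr, 8)`, and a 6-byte offset on f32 is left untouched.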

Added to both standard canonicalize and gluon-canonicalize passes.

Tests added for positive cases (f32, f16, commutative) and negative
cases (non-constant offset, indivisible offset).

Fix AxisInfoAnalysis to preserve contiguity information through
tt.int_to_ptr and tt.ptr_to_int operations by handling them with
CastOpAxisInfoVisitor. Previously, these operations would reset
contiguity to 1, preventing proper vectorization.

This fixes async_copy_local_to_global failing to generate wide
vector stores (e.g., b128) when pointer arithmetic is performed
via integer operations (ptr->int->add->int->ptr pattern).

Test case added for async_copy_local_to_global with ptr_to_int/
int_to_ptr casts, verifying 128-bit stores are generated for
sizePerThread=[8] with f16 elements.
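The 128-bit figure is 8 f16 elements × 16 bits. A simplified sketch, assuming a thread can vectorize over min(contiguity, sizePerThread) elements capped at 128 bits; real lowering also depends on alignment and target support, and `storeWidthBits` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of the widest store a thread can issue:
// it covers min(contiguity, sizePerThread) consecutive elements, and a
// single vector store is assumed to be at most 128 bits wide.
int64_t storeWidthBits(int64_t contiguity, int64_t sizePerThread,
                       int64_t elemBits) {
  int64_t elems = std::min(contiguity, sizePerThread);
  int64_t bits = elems * elemBits;
  return std::min<int64_t>(bits, 128);
}
```

Under this model, contiguity 8 with sizePerThread=[8] and f16 gives 128-bit stores, while a contiguity of 1 drops each store to 16 bits.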

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread lib/Analysis/AxisInfo.cpp Outdated
Comment on lines +1175 to +1176
CastOpAxisInfoVisitor<triton::IntToPtrOp>,
CastOpAxisInfoVisitor<triton::PtrToIntOp>,
Collaborator


I don't think that's correct since contiguous pointers wouldn't generate contiguous integers.

Also, ideally you would add a unit test that runs only the analysis, to show the bug and the new behavior.
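The objection that contiguous pointers do not give contiguous integers can be checked with plain address arithmetic. A minimal sketch, assuming "contiguity" means a run of values that each increase by exactly 1 (AxisInfo's real definition is more involved); `addressesOf` and `integerContiguity` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte addresses that ptr_to_int would produce for n pointers that are
// contiguous in *elements* (p, p+1, p+2, ...), each element elemSize bytes.
std::vector<int64_t> addressesOf(int64_t base, int64_t n, int64_t elemSize) {
  std::vector<int64_t> out;
  for (int64_t i = 0; i < n; ++i)
    out.push_back(base + i * elemSize);
  return out;
}

// Length of the leading run whose values increase by exactly 1, i.e. what
// contiguity would mean if these values were treated as plain integers.
int64_t integerContiguity(const std::vector<int64_t> &v) {
  if (v.empty())
    return 0;
  int64_t run = 1;
  for (size_t i = 1; i < v.size() && v[i] == v[i - 1] + 1; ++i)
    ++run;
  return run;
}
```

Eight f32 pointers that are contiguous in elements produce byte addresses 4 apart, so viewed as integers their contiguity is 1; only 1-byte elements would yield contiguous integers. This is why forwarding pointer contiguity unchanged through a cast is not sound.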

Contributor Author


> I don't think that's correct since contiguous pointers wouldn't generate contiguous integers.

Right. Perhaps I should just handle the specific pattern PtrToInt -> add -> IntToPtr, because this pattern is essentially equivalent to pointer arithmetic and should preserve the contiguity of the original pointer.

Collaborator


Can we canonicalize this into ptr arithmetic?

yiqian1 and others added 2 commits April 9, 2026 02:06
Comment thread lib/Dialect/Triton/IR/Ops.cpp Outdated
return failure();
}

auto ptrToIntOp = ptrToIntValue.getDefiningOp<PtrToIntOp>();
Collaborator


nit: make ptrToIntValue a PtrToIntOp directly to avoid redundant cast

Comment thread lib/Dialect/Triton/IR/Ops.cpp Outdated
} else {
// Scalar case
elementOffsetValue = arith::ConstantOp::create(
rewriter, loc, rewriter.getI32IntegerAttr(elementOffset));
Collaborator


I don't think you can assume the offset is an i32.
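One way to see the concern: if the new offset constant is hard-coded as i32 while the original `addi` operand is i64, a large element offset silently wraps. A sketch, assuming two's-complement truncation; `wrapToWidth` is a hypothetical helper, and the natural fix is presumably to build the constant with the offset's own integer type.

```cpp
#include <cassert>
#include <cstdint>

// Value that v would have after being stored in a two's-complement integer
// of `bits` width (bits must be in [1, 64]), i.e. what materializing the
// constant with a too-narrow type would produce.
int64_t wrapToWidth(int64_t v, unsigned bits) {
  if (bits >= 64)
    return v;
  uint64_t mask = (uint64_t{1} << bits) - 1;
  uint64_t u = static_cast<uint64_t>(v) & mask;
  // Sign-extend from bit (bits - 1) back to 64 bits.
  uint64_t sign = uint64_t{1} << (bits - 1);
  return static_cast<int64_t>((u ^ sign) - sign);
}
```

A 2^33-byte offset on f16 is 2^32 elements: at 64 bits the value survives, but truncated to 32 bits it becomes 0.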

@yiqian1 yiqian1 changed the title [AxisInfo] Preserve contiguity through IntToPtr/PtrToInt casts [Canonicalize] Transform ptr_to_int->add->int_to_ptr to addptr Apr 9, 2026
@yiqian1 yiqian1 marked this pull request as ready for review April 9, 2026 14:31
Comment thread lib/Dialect/Triton/IR/Ops.cpp Outdated
//-- IntToPtrOp --
// Canonicalize: int_to_ptr(addi(ptr_to_int(ptr), constant_offset)) ->
// addptr(ptr, element_offset) Only when offset is constant and divisible by
// element size.
Contributor


Wouldn't it be more general to split this into 2 patterns:

int_to_ptr(addi(val, offset)) -> addptr(int_to_ptr(val), element_offset)
int_to_ptr(ptr_to_int(ptr)) -> ptr
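To see that the two smaller rewrites compose into the original fused one, here is a toy model on a hand-rolled expression tree. Everything in it (`Kind`, `Expr`, `rewrite`, `show`) is hypothetical and unrelated to MLIR's RewritePattern API; it assumes one element size for the whole expression and that the second rule only fires when the pointer types match.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Toy expression tree with just enough node kinds for the two rules.
enum class Kind { Leaf, PtrToInt, IntToPtr, AddI, Const, AddPtr };

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

struct Expr {
  Kind kind;
  std::string name;  // used by Leaf nodes
  int64_t value;     // used by Const nodes
  ExprPtr a, b;      // operands (unused ones are null)
};

ExprPtr mk(Kind k, ExprPtr a = nullptr, ExprPtr b = nullptr) {
  return std::make_shared<Expr>(Expr{k, "", 0, a, b});
}
ExprPtr leaf(const std::string &n) {
  return std::make_shared<Expr>(Expr{Kind::Leaf, n, 0, nullptr, nullptr});
}
ExprPtr cst(int64_t v) {
  return std::make_shared<Expr>(Expr{Kind::Const, "", v, nullptr, nullptr});
}

// Applies both rules bottom-up. elemSizeBytes stands in for the pointee
// size of the int_to_ptr result type.
ExprPtr rewrite(const ExprPtr &e, int64_t elemSizeBytes) {
  if (!e)
    return e;
  ExprPtr a = rewrite(e->a, elemSizeBytes);
  ExprPtr b = rewrite(e->b, elemSizeBytes);
  ExprPtr cur = std::make_shared<Expr>(Expr{e->kind, e->name, e->value, a, b});
  if (cur->kind != Kind::IntToPtr)
    return cur;
  const ExprPtr &in = cur->a;
  // Rule 2: int_to_ptr(ptr_to_int(ptr)) -> ptr.
  if (in->kind == Kind::PtrToInt)
    return in->a;
  // Rule 1: int_to_ptr(addi(val, c)) -> addptr(int_to_ptr(val), c / size),
  // only for a constant byte offset divisible by the element size.
  if (in->kind == Kind::AddI && in->b->kind == Kind::Const &&
      in->b->value % elemSizeBytes == 0) {
    // The new int_to_ptr may itself match Rule 2, so rewrite it again.
    ExprPtr base = rewrite(mk(Kind::IntToPtr, in->a), elemSizeBytes);
    return mk(Kind::AddPtr, base, cst(in->b->value / elemSizeBytes));
  }
  return cur;
}

std::string show(const ExprPtr &e) {
  switch (e->kind) {
  case Kind::Leaf: return e->name;
  case Kind::Const: return std::to_string(e->value);
  case Kind::PtrToInt: return "ptr_to_int(" + show(e->a) + ")";
  case Kind::IntToPtr: return "int_to_ptr(" + show(e->a) + ")";
  case Kind::AddI: return "addi(" + show(e->a) + ", " + show(e->b) + ")";
  case Kind::AddPtr: return "addptr(" + show(e->a) + ", " + show(e->b) + ")";
  }
  return "?";
}
```

With f32 (4-byte) elements, `int_to_ptr(addi(ptr_to_int(p), 16))` becomes `addptr(p, 4)`. The split also covers more inputs: `int_to_ptr(addi(x, 8))` becomes `addptr(int_to_ptr(x), 2)` even when `x` did not come from a `ptr_to_int`.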

Contributor Author


Good point. Have refactored to 2 patterns.

Comment thread lib/Dialect/Triton/IR/Ops.cpp Outdated
offsetValue = addOp.getRhs();
} else if (auto rhsPtrToInt = addOp.getRhs().getDefiningOp<PtrToIntOp>()) {
ptrToIntOp = rhsPtrToInt;
offsetValue = addOp.getLhs();
Contributor


addi is canonicalized so constants are always the rhs argument.
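The point about operand order can be illustrated with a toy model: once a commutative add has been canonicalized so that a lone constant sits on the RHS, a matcher only needs to inspect the RHS, which makes the commutative branch in the pattern unnecessary. `Operand`, `AddI`, `canonicalize` and `constantRhs` are hypothetical names for this sketch, not MLIR APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical operand: either a constant or an opaque SSA value id.
struct Operand {
  bool isConst;
  int64_t value;  // constant value, or an id for non-constant operands
};

struct AddI {
  Operand lhs, rhs;
};

// Mimics canonical ordering for a commutative op: if only the LHS is a
// constant, swap the operands so the constant ends up on the RHS.
AddI canonicalize(AddI op) {
  if (op.lhs.isConst && !op.rhs.isConst)
    std::swap(op.lhs, op.rhs);
  return op;
}

// After canonicalization, a matcher only has to look at the RHS.
std::optional<int64_t> constantRhs(const AddI &op) {
  if (op.rhs.isConst)
    return op.rhs.value;
  return std::nullopt;
}
```

Here `addi(16, %x)` is missed by an RHS-only matcher before canonicalization but matched afterwards.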

Collaborator

@ThomasRaoux ThomasRaoux left a comment


LGTM

mawad-amd added a commit to ROCm/iris that referenced this pull request Apr 10, 2026
…uity

Adding a runtime scalar offset to an existing pointer tensor
(output_ptrs + elem_delta) breaks AxisInfoAnalysis contiguity in
current Triton, causing fallback to 16-bit async stores which
gfx1250 doesn't support (assertion failure).

Instead of modifying the existing pointer tensor, build the remote
pointer tensor from scratch: (output_ptr + tile_base + elem_delta)
+ flat_idx. This gives AxisInfoAnalysis the splat(scalar) + arange
pattern it trusts, enabling b128 vectorized async stores.

Upstream Triton fix: triton-lang/triton#9971

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ThomasRaoux ThomasRaoux enabled auto-merge (squash) April 10, 2026 01:50
@ThomasRaoux ThomasRaoux merged commit 89d154d into triton-lang:main Apr 10, 2026
9 checks passed
plognjen pushed a commit to plognjen/triton that referenced this pull request Apr 14, 2026
…n-lang#9971)


3 participants