[GPU] Inject index hints during MMA lane distribution #23152

Merged
Max191 merged 3 commits into iree-org:main from Max191:inject-transpose-load-hints
Jan 16, 2026

@Max191 Max191 commented Jan 15, 2026

Injects iree_codegen.index_hint ops on the offsets computed in the populateOperandOffsetsSizesStrides functions for MMAAttrs. The hints are injected here because the semantic information about the offsets is readily available at this point and can be carried down to the later optimization pass that uses the hints to convert loads into transpose loads. Although the hints are intended for the load-to-transpose-load optimization, they are set unconditionally, whether or not the operand is transposed, for simplicity. The later optimization pass is responsible for determining when a load is actually transposed, since that is more explicit at that point.

The hint ops are dropped right after LLVMGPULowerExecutableTarget, since by that point the index_hint ops should already have been consumed. The pass that consumes these hint ops is not enabled yet, so the hints have no effect until that pass is added.


@krzysz00 krzysz00 left a comment

Overall, lgtm

Wrap delinearized lane IDs with iree_codegen.index_hint ops during MMA
operand distribution. This annotates indices with their lane-variance
semantics, enabling downstream passes to recognize transpose load
opportunities.

When distributing MMA operands across lanes, the lane ID is delinearized
into row and column components. The row component is uniform within
16-lane groups (lane_constant<16>), while the column component increments
across consecutive lanes (lane_increment<16>). These hints propagate
through index arithmetic and are later consumed by the transpose load
lowering pass.
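
The lane-variance semantics described above can be illustrated outside of MLIR with a small standalone sketch. Everything in it is an assumption made for illustration: the helper names (delinearize, componentPerLane, isLaneConstant, isLaneIncrement), the 64-lane subgroup, and the {4, 16} basis are hypothetical and are not the IREE API. The two predicates encode one plausible reading of lane_constant<16> (uniform within each aligned 16-lane group) and lane_increment<16> (increasing by one per lane within each aligned 16-lane group).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the IREE API): split a linear lane ID into
// per-dimension indices for `basis`, outermost dimension first.
std::vector<int64_t> delinearize(int64_t laneId,
                                 const std::vector<int64_t> &basis) {
  assert(!basis.empty() && "basis must have at least one dimension");
  std::vector<int64_t> indices(basis.size());
  // Peel off the innermost dimension first, then move outward.
  for (size_t i = basis.size(); i-- > 0;) {
    indices[i] = laneId % basis[i];
    laneId /= basis[i];
  }
  return indices;
}

// Collect component `dim` of the delinearized ID for lanes 0..numLanes-1.
std::vector<int64_t> componentPerLane(int64_t numLanes,
                                      const std::vector<int64_t> &basis,
                                      size_t dim) {
  std::vector<int64_t> values;
  for (int64_t lane = 0; lane < numLanes; ++lane)
    values.push_back(delinearize(lane, basis)[dim]);
  return values;
}

// Assumed reading of lane_constant<N>: the value is the same for every lane
// in each aligned group of N consecutive lanes.
bool isLaneConstant(const std::vector<int64_t> &perLane, int64_t groupSize) {
  for (int64_t lane = 0, e = static_cast<int64_t>(perLane.size()); lane < e;
       ++lane) {
    int64_t groupStart = lane - lane % groupSize;
    if (perLane[lane] != perLane[groupStart])
      return false;
  }
  return true;
}

// Assumed reading of lane_increment<N>: within each aligned group of N
// consecutive lanes, the value grows by exactly one per lane.
bool isLaneIncrement(const std::vector<int64_t> &perLane, int64_t groupSize) {
  for (int64_t lane = 0, e = static_cast<int64_t>(perLane.size()); lane < e;
       ++lane) {
    int64_t groupStart = lane - lane % groupSize;
    if (perLane[lane] != perLane[groupStart] + (lane - groupStart))
      return false;
  }
  return true;
}
```

Under these assumptions, the row component (dimension 0 of the {4, 16} basis) satisfies isLaneConstant with a group size of 16, and the column component (dimension 1) satisfies isLaneIncrement with a group size of 16, which mirrors the two hints named in the commit message.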

This is the producer side of the transpose load optimization - hints are
injected here and consumed by ROCDLLoadToTransposeLoadPass.

Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
@Max191 Max191 force-pushed the inject-transpose-load-hints branch from 261bf8f to 91c92b4 Compare January 16, 2026 16:38
@Max191 Max191 marked this pull request as ready for review January 16, 2026 16:39
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
@Max191 Max191 merged commit 7e23975 into iree-org:main Jan 16, 2026
74 of 83 checks passed
@Max191 Max191 deleted the inject-transpose-load-hints branch January 16, 2026 22:17
Comment on lines +801 to +802
return SmallVector<Value>(delinearizedLaneId.begin(),
delinearizedLaneId.end());

prefer llvm::to_vector or llvm::to_vector_of<T>

Comment on lines +814 to +815
for (size_t i = 1; i <= basis.size(); ++i) {
groupSize = basis[basis.size() - i];

Do not recalculate the end

Suggested change
- for (size_t i = 1; i <= basis.size(); ++i) {
-   groupSize = basis[basis.size() - i];
+ for (size_t i = 1, e = basis.size(); i <= e; ++i) {
+   groupSize = basis[e - i];
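
The suggestion follows the LLVM coding-standard habit of not re-evaluating a container's end or size on every loop iteration. As a minimal standalone sketch of the same loop shape, the function below walks a basis from its last element to its first with the bound cached in e. The function name and what it computes (the innermost basis element greater than 1) are hypothetical and chosen only to make the loop complete; the body of the loop in the PR is not shown in this conversation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical example, not code from the PR: return the innermost basis
// element greater than 1, or 1 if there is none.
int64_t innermostNonUnitSize(const std::vector<int64_t> &basis) {
  // `e` caches basis.size(), so the bound is computed once and `e - i`
  // indexes the basis from the last element (i == 1) to the first (i == e).
  for (size_t i = 1, e = basis.size(); i <= e; ++i) {
    int64_t groupSize = basis[e - i];
    if (groupSize > 1)
      return groupSize;
  }
  return 1;
}
```

Starting i at 1 and comparing with i <= e keeps the index e - i non-negative for an unsigned size_t, which avoids the wrap-around that a descending loop with i >= 0 would hit.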

keshavvinayak01 pushed a commit that referenced this pull request Jan 27, 2026
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Signed-off-by: Keshav Vinayak Jha <keshavvinayakjha@gmail.com>
MaheshRavishankar pushed a commit to MaheshRavishankar/iree that referenced this pull request Feb 24, 2026
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>