Refactor and fix creation of ElementWise Region for Gemm+Gemm like ops #1960

umangyadav · 2025-08-26T20:47:26Z

Motivation

Fixes https://github.com/ROCm/rocMLIR-internal/issues/1969

Technical Details

The current logic uses DFS without caching, so it revisits some ops multiple times. As a result, it adds a block argument for the first GEMM multiple times, but keeps the index of only one of them (the last one) as firstGemmIndex.

This causes a size mismatch between preSoftmaxElementwiseInputs() and the block argument list, which eventually leads to out-of-bounds indexing into preSoftmaxElementwiseInputs().

This PR refactors the logic so that ops and values found during the "match" phase are cached and used directly during the "rewrite" phase.
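To illustrate the failure mode described above, here is a minimal standalone sketch. It does not use MLIR: `Node`, `collectUncached`, `collectCached`, and `countFirstGemmArgs` are hypothetical names made up for this example, and the "block argument list" is modelled as a plain vector. It assumes a diamond-shaped elementwise tree in which two paths reach the first GEMM.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Toy stand-in for an op in the elementwise tree (hypothetical, not an MLIR type).
struct Node {
  bool isFirstGemm = false;
  std::vector<const Node *> operands;
};

// Uncached DFS: every path that reaches the first GEMM appends another entry,
// so a GEMM result used twice is recorded twice.
void collectUncached(const Node *node, std::vector<const Node *> &blockArgs) {
  assert(node && "expected a non-null node");
  if (node->isFirstGemm) {
    blockArgs.push_back(node);
    return;
  }
  for (const Node *operand : node->operands)
    collectUncached(operand, blockArgs);
}

// Cached DFS: the visited set guarantees each node is processed at most once.
void collectCached(const Node *node, std::unordered_set<const Node *> &visited,
                   std::vector<const Node *> &blockArgs) {
  assert(node && "expected a non-null node");
  if (!visited.insert(node).second)
    return; // already handled via another path
  if (node->isFirstGemm) {
    blockArgs.push_back(node);
    return;
  }
  for (const Node *operand : node->operands)
    collectCached(operand, visited, blockArgs);
}

// Builds root(lhs(gemm), rhs(gemm)) and returns how many entries the
// chosen traversal records for the first GEMM.
std::size_t countFirstGemmArgs(bool cached) {
  Node gemm;
  gemm.isFirstGemm = true;
  Node lhs;
  lhs.operands = {&gemm};
  Node rhs;
  rhs.operands = {&gemm};
  Node root;
  root.operands = {&lhs, &rhs};

  std::vector<const Node *> blockArgs;
  if (cached) {
    std::unordered_set<const Node *> visited;
    collectCached(&root, visited, blockArgs);
  } else {
    collectUncached(&root, blockArgs);
  }
  return blockArgs.size();
}
```

In this toy setup the uncached traversal records the first GEMM twice while the cached one records it once, which is the kind of size mismatch the description refers to.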

Test Plan

Added an E2E test that exposes the problem; it passes with this change.

Copilot

Pull Request Overview

This PR refactors the creation of ElementWise Region for Gemm+Gemm like operations to fix a bug where the Depth-First Search (DFS) algorithm revisited operations multiple times, causing incorrect block argument indexing and eventual out-of-bounds access.

Replaced the recursive DFS approach with a cached visitor pattern to eliminate redundant visits
Introduced ElementwiseRegionFinder struct to encapsulate region finding and rewrite logic
Updated all affected pattern matchers to use the new cached approach
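The match/rewrite split summarized above could look roughly like the following sketch. This is not the actual ElementwiseRegionFinder from TosaToRock.cpp: `Op`, `RegionFinderSketch`, and all member names are hypothetical, and the sketch only assumes that the match phase records each region input once and that the rewrite phase reads those cached results instead of traversing again.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_set>
#include <vector>

// Toy op model (hypothetical): the first GEMM, an elementwise op, or an
// input that comes from outside the region.
struct Op {
  enum class Kind { FirstGemm, Elementwise, ExternalInput } kind;
  std::vector<const Op *> operands;
};

struct RegionFinderSketch {
  // Results cached by match() and consumed later by the rewrite phase.
  std::vector<const Op *> regionInputs;       // one entry per distinct input
  std::optional<std::size_t> firstGemmIndex;  // position of the GEMM input

  // Match phase: walk the tree once and cache what was found.
  bool match(const Op *root) {
    regionInputs.clear();
    firstGemmIndex.reset();
    visited.clear();
    visit(root);
    return firstGemmIndex.has_value();
  }

  // Rewrite phase: no second traversal, only reads of the cached state.
  std::size_t firstGemmBlockArg() const {
    assert(firstGemmIndex && "match() must succeed before rewriting");
    return *firstGemmIndex;
  }

private:
  std::unordered_set<const Op *> visited;

  void visit(const Op *op) {
    assert(op && "expected a non-null op");
    if (!visited.insert(op).second)
      return; // each op contributes at most once
    switch (op->kind) {
    case Op::Kind::FirstGemm:
      firstGemmIndex = regionInputs.size();
      regionInputs.push_back(op);
      return;
    case Op::Kind::ExternalInput:
      regionInputs.push_back(op);
      return;
    case Op::Kind::Elementwise:
      for (const Op *operand : op->operands)
        visit(operand);
      return;
    }
  }
};
```

Because the index and the input list are filled in by the same single traversal, they cannot drift apart the way they could when the traversal was repeated.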

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
mlir/test/fusion/pr-e2e/gemm-gemm/mixr-gemm-gemm-multiple-traces-to-first-gemm.mlir	Adds end-to-end test case that exposes the multiple traces bug
mlir/lib/Conversion/TosaToRock/TosaToRock.cpp	Refactors element-wise region finding from recursive DFS to cached visitor pattern

bdevorem · 2025-08-26T22:38:01Z

I tested commit 6d11fa2 with my branch of MIGraphX (which has the GEMM+GEMM fusion on the MIGraphX side) with the testcase and it compiles successfully e2e:

[2025-08-26 22:31:40]
[ MIGraphX Version: 2.14.0.cb9bc5d01-dirty ] Complete(0.659944s): ./build/bin/driver compile test.mxr

As discussed elsewhere, commit bfb7605 has a faulty Copilot fix.

justinrosner

Maybe this is something that can be addressed by a future ticket, but do you think it would be worthwhile creating some variant of the df_iterator framework for our Rock ops similar to what LLVM has? This could potentially help with code reusability when implementing custom DFS traversals of our ops.

justinrosner · 2025-08-28T14:19:28Z

mlir/lib/Conversion/TosaToRock/TosaToRock.cpp

+    }
+    // Right now, this is a bit restricted that we only allow reshape-like
+    // ops between in the elementwise tree that get fused to the fusion point.
+    // TODO: however, the latest code gridwise-gemm-to-blockwise should tackle


Is there a ticket opened for this TODO so that we don't lose track of it?

I don't think there is an issue. I copied the earlier TODO over as is.

We haven't yet hit any case where an attention kernel fails to be generated because of this limitation.

Also, in most (or all) of the cases I've seen, tensor.expand and tensor.collapse along with tosa.add cover all the invertible transforms, e.g. broadcasts, reshapes, squeeze/unsqueeze.

umangyadav · 2025-08-28T14:38:14Z

Maybe this is something that can be addressed by a future ticket, but do you think it would be worthwhile creating some variant of the df_iterator framework for our Rock ops similar to what LLVM has? This could potentially help with code reusability when implementing custom DFS traversals of our ops.

Yes, that sounds like a good idea, but as far as I can tell we are only doing DFS in TosaToRock.cpp at this point. We do use graph traversals with bufferDependencyAnalysis in some places. We will have to think about whether it would be worth using df_iterator.
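For reference, the reusable-traversal idea discussed here can be sketched in plain standard C++. This is not LLVM's df_iterator and not existing rocMLIR code: `depthFirstVisit` and `visitOrder` are hypothetical helpers, and the graph is assumed to be a simple adjacency list indexed by node id.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Generic pre-order DFS with a visited set. `children(node)` must return an
// iterable of successors; `visit(node)` is called once per reachable node.
template <typename NodeT, typename ChildrenFn, typename VisitFn>
void depthFirstVisit(NodeT root, ChildrenFn children, VisitFn visit) {
  std::unordered_set<NodeT> visited;
  std::vector<NodeT> stack{root};
  while (!stack.empty()) {
    NodeT node = stack.back();
    stack.pop_back();
    if (!visited.insert(node).second)
      continue; // reached again through another path
    visit(node);
    // Children are pushed in order, so the last child is explored first.
    for (NodeT child : children(node))
      stack.push_back(child);
  }
}

// Example client: returns the order in which nodes of an adjacency-list
// graph are visited starting from `root`.
std::vector<std::size_t>
visitOrder(const std::vector<std::vector<std::size_t>> &adjacency,
           std::size_t root) {
  assert(root < adjacency.size() && "root must be a valid node id");
  std::vector<std::size_t> order;
  depthFirstVisit(
      root,
      [&](std::size_t n) -> const std::vector<std::size_t> & {
        return adjacency[n];
      },
      [&](std::size_t n) { order.push_back(n); });
  return order;
}
```

Callers supply only the successor function and the per-node action, so the visited-set bookkeeping lives in one place rather than in each custom traversal.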

umangyadav added 2 commits August 26, 2025 20:38

Refactor matching logic for elemwise tree

f79d525

formatting

dc92124

umangyadav requested a review from causten as a code owner August 26, 2025 20:47

umangyadav self-assigned this Aug 26, 2025

umangyadav requested review from Copilot, djramic, justinrosner and stefankoncarevic and removed request for causten August 26, 2025 20:49

Copilot AI reviewed Aug 26, 2025

umangyadav and others added 4 commits August 26, 2025 16:53

Update mlir/lib/Conversion/TosaToRock/TosaToRock.cpp

6d11fa2

Update mlir/lib/Conversion/TosaToRock/TosaToRock.cpp

bfb7605

Co-authored-by: Copilot

Remove unused param

b16900e

add some comments

be1866b

umangyadav and others added 2 commits August 26, 2025 23:20

Revert back copilot reivew comment

c707440

Merge branch 'develop' into fixGemmBlockArgs

8280838

justinrosner reviewed Aug 28, 2025

justinrosner approved these changes Aug 28, 2025

Merge branch 'develop' into fixGemmBlockArgs

d6f6853

umangyadav merged commit 97a0085 into develop Aug 28, 2025
16 of 22 checks passed

umangyadav deleted the fixGemmBlockArgs branch August 28, 2025 20:02

umangyadav added a commit that referenced this pull request Aug 28, 2025

Refactor and fix creation of ElementWise Region for Gemm+Gemm like ops (#1960) * Refactor matching logic for elemwise tree

9e34019

umangyadav mentioned this pull request Aug 28, 2025

[Backport] Refactor and fix creation of ElementWise Region for Gemm+Gemm like ops #1965

Merged

bdevorem mentioned this pull request Aug 30, 2025

Fuse GEMM+GEMM with rocMLIR ROCm/AMDMIGraphX#4261

Merged
