Skip to content

Fix issue in prefetching column major matrix.#4611

Merged
whitneywhtsang merged 8 commits intomainfrom
chengjun/fix_prefetch_issue
Jul 14, 2025
Merged

Fix issue in prefetching column major matrix.#4611
whitneywhtsang merged 8 commits intomainfrom
chengjun/fix_prefetch_issue

Conversation

@chengjunlu
Copy link
Copy Markdown
Contributor

@chengjunlu chengjunlu commented Jul 3, 2025

The prefetching lowering uses the incorrect shape sizes to get the tiling shape for column major matrix.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes the tiling-shape computation for column-major matrices in the prefetch lowering by swapping the dimensions and updating the tensor type; it also adds new row-major prefetch tests.

  • Swap the tensor shape dimensions for column-major support and recreate the tensor type
  • Add MLIR tests covering scalar-mask and block-pointer prefetch in row-major mode

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp Swap dimensions and reconstruct tensorType for column-major matrices
test/TritonIntelGPU/prefetch-to-llvm.mlir Add new test cases for row-major ttig.prefetch scenarios
Comments suppressed due to low confidence (1)

test/TritonIntelGPU/prefetch-to-llvm.mlir:266

  • The new tests cover row_major prefetch paths but lack a column_major case. Add a test with ttig.block_io = "column_major" to verify the column-major fix.
}

Comment thread third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp
etiotto
etiotto previously approved these changes Jul 7, 2025
@etiotto
Copy link
Copy Markdown
Contributor

etiotto commented Jul 7, 2025

@chengjunlu pls add an attached issue to the PR.

@etiotto
Copy link
Copy Markdown
Contributor

etiotto commented Jul 7, 2025

@chengjunlu any impact on the benchmarks ?

@whitneywhtsang
Copy link
Copy Markdown
Contributor

Copy link
Copy Markdown
Contributor

@whitneywhtsang whitneywhtsang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regression on GEMM (A^t@B): https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16122585980/job/45491842774
GEMM (A^t@B) passes with second attempt, but there may be some issues of this change with that benchmark, as the same change caused failures before with the same benchmark: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14777632147/job/41503097292.

@chengjunlu chengjunlu force-pushed the chengjun/fix_prefetch_issue branch 2 times, most recently from d671bdc to dd6c641 Compare July 8, 2025 02:54
@etiotto
Copy link
Copy Markdown
Contributor

etiotto commented Jul 8, 2025

Regression on GEMM (A^t@B): https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16122585980/job/45491842774 GEMM (A^t@B) passes with second attempt, but there may be some issues of this change with that benchmark, as the same change caused failures before with the same benchmark: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14777632147/job/41503097292.

Thanks for the benchmark run @whitneywhtsang, looks like the code generated for the 2D block prefetch is incorrect.

@etiotto etiotto dismissed their stale review July 8, 2025 20:29

Failure in unit tests

@chengjunlu chengjunlu force-pushed the chengjun/fix_prefetch_issue branch 3 times, most recently from 88aed22 to 3c87de3 Compare July 9, 2025 08:46
@etiotto
Copy link
Copy Markdown
Contributor

etiotto commented Jul 9, 2025

Waiting for a PVC 1550 to run the benchmark tests again.

@chengjunlu chengjunlu force-pushed the chengjun/fix_prefetch_issue branch from 87b7652 to 1e9fd7e Compare July 10, 2025 00:05
Signed-off-by: Lu,Chengjun <chengjun.lu@intel.com>
@whitneywhtsang
Copy link
Copy Markdown
Contributor

@whitneywhtsang
Copy link
Copy Markdown
Contributor

Fixed the error shown in #4611 (comment) in main branch and rebased this PR.
New Benchmark BMG CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16202126345

@chengjunlu
Copy link
Copy Markdown
Contributor Author

Rerun the failed tests which is caused by test of interpreter. The changes has no impact to the Triton Interpreter.

@chengjunlu
Copy link
Copy Markdown
Contributor Author

Created an upstream PR triton-lang/triton#7470 for the Interpreter failure.

@etiotto
Copy link
Copy Markdown
Contributor

etiotto commented Jul 11, 2025

Created an upstream PR triton-lang/triton#7470 for the Interpreter failure.

That PR is merged in and in our latest main branch. Rebasing and trying CI again.

Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>
@whitneywhtsang
Copy link
Copy Markdown
Contributor

Depends on #4690

@whitneywhtsang
Copy link
Copy Markdown
Contributor

whitneywhtsang commented Jul 11, 2025

@whitneywhtsang whitneywhtsang enabled auto-merge (squash) July 11, 2025 21:58
@chengjunlu
Copy link
Copy Markdown
Contributor Author

chengjunlu commented Jul 14, 2025

There are improvements on the flex attention kernel with the fixes.
image

image

No regression on GEMM performance.
image

The result is positive.

@whitneywhtsang whitneywhtsang merged commit c104666 into main Jul 14, 2025
15 checks passed
@whitneywhtsang whitneywhtsang deleted the chengjun/fix_prefetch_issue branch July 14, 2025 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants