Fix issue in prefetching column major matrix. by chengjunlu · Pull Request #4611 · intel/intel-xpu-backend-for-triton

chengjunlu · 2025-07-03T05:52:13Z

The prefetch lowering uses the wrong shape sizes when computing the tiling shape for a column-major matrix.

Copilot

Pull Request Overview

This PR fixes the tiling-shape computation for column-major matrices in the prefetch lowering by swapping the dimensions and updating the tensor type; it also adds new row-major prefetch tests.

- Swap the tensor shape dimensions for column-major support and recreate the tensor type
- Add MLIR tests covering scalar-mask and block-pointer prefetch in row-major mode
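
For illustration only, the dimension swap described above can be sketched in standalone C++. This is not the code in LoadStoreOpToLLVM.cpp: `Layout`, `Shape2D`, `memoryShape`, and `tileCounts` are hypothetical names, and the sketch assumes a 2D tensor whose tiling is computed from the shape as laid out in memory.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the layout tag; the real lowering reads the
// `ttig.block_io` attribute instead.
enum class Layout { RowMajor, ColumnMajor };

using Shape2D = std::array<int64_t, 2>;

// Returns the shape as laid out in memory. For a column-major tensor the
// logical (rows, cols) pair is swapped so that the last entry is always the
// contiguous dimension.
Shape2D memoryShape(Shape2D logicalShape, Layout layout) {
  if (layout == Layout::ColumnMajor)
    std::swap(logicalShape[0], logicalShape[1]);
  return logicalShape;
}

// Number of tiles needed to cover the tensor along each memory dimension,
// using ceiling division. The tile shape is given in memory order.
Shape2D tileCounts(Shape2D logicalShape, Layout layout, Shape2D tileShape) {
  assert(tileShape[0] > 0 && tileShape[1] > 0 && "tile dims must be positive");
  Shape2D shape = memoryShape(logicalShape, layout);
  return {(shape[0] + tileShape[0] - 1) / tileShape[0],
          (shape[1] + tileShape[1] - 1) / tileShape[1]};
}
```

With a logical shape of 64x32 and an 8x16 tile, this sketch yields 8x2 tiles for row-major and 4x4 tiles for column-major. Using the unswapped shape in the column-major case would give the row-major answer, which is the kind of mismatch the PR description refers to.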

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp | Swap dimensions and reconstruct `tensorType` for column-major matrices |
| test/TritonIntelGPU/prefetch-to-llvm.mlir | Add new test cases for row-major `ttig.prefetch` scenarios |

Comments suppressed due to low confidence (1)

test/TritonIntelGPU/prefetch-to-llvm.mlir:266

The new tests cover `row_major` prefetch paths but lack a `column_major` case. Add a test with `ttig.block_io = "column_major"` to verify the column-major fix.

etiotto · 2025-07-07T15:13:41Z

@chengjunlu please attach an issue to the PR.

etiotto · 2025-07-07T15:54:20Z

@chengjunlu any impact on the benchmarks?

whitneywhtsang · 2025-07-07T16:31:59Z

> @chengjunlu any impact on the benchmarks?

CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16122585980

whitneywhtsang

Regression on GEMM (A^t@B): https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16122585980/job/45491842774
GEMM (A^t@B) passes on the second attempt, but this change may still have issues with that benchmark, since the same change caused failures on it before: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14777632147/job/41503097292.

etiotto · 2025-07-08T14:09:41Z

> Regression on GEMM (A^t@B): https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16122585980/job/45491842774
> GEMM (A^t@B) passes on the second attempt, but this change may still have issues with that benchmark, since the same change caused failures on it before: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/14777632147/job/41503097292.

Thanks for the benchmark run @whitneywhtsang, looks like the code generated for the 2D block prefetch is incorrect.

Failure in unit tests

etiotto · 2025-07-09T22:08:14Z

Waiting for a PVC 1550 to run the benchmark tests again.

Signed-off-by: Lu,Chengjun <chengjun.lu@intel.com>

whitneywhtsang · 2025-07-10T12:33:46Z

Benchmark BMG CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16184418309
It fails with GEMM (A^t@B).

whitneywhtsang · 2025-07-10T17:43:18Z

Fixed the error shown in #4611 (comment) in the main branch and rebased this PR.
New Benchmark BMG CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16202126345

chengjunlu · 2025-07-11T00:59:29Z

Reran the failed tests; the failures are caused by an interpreter test. The changes have no impact on the Triton Interpreter.

chengjunlu · 2025-07-11T01:43:42Z

Created an upstream PR triton-lang/triton#7470 for the Interpreter failure.

etiotto · 2025-07-11T13:53:08Z

> Created an upstream PR triton-lang/triton#7470 for the Interpreter failure.

That PR has been merged and is in our latest main branch. Rebasing and trying CI again.

Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>

whitneywhtsang · 2025-07-11T16:47:38Z

Depends on #4690

whitneywhtsang · 2025-07-11T17:32:29Z

Benchmark PVC 1100 CI: https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/16228680948

chengjunlu · 2025-07-14T08:08:47Z

The fix improves performance of the flex attention kernel.

No regression on GEMM performance.

The result is positive.

chengjunlu requested review from Copilot, etiotto and whitneywhtsang July 3, 2025 05:52

Copilot AI reviewed Jul 3, 2025

Comment thread third_party/intel/lib/TritonIntelGPUToLLVM/LoadStoreOpToLLVM.cpp

chengjunlu force-pushed the chengjun/fix_prefetch_issue branch from 88fac6a to dc8c06c Compare July 3, 2025 09:16

chengjunlu mentioned this pull request Jul 3, 2025

[Performance] Add the support of tensor of pointer in the prefetching and loop pipelining #3634

Closed

chengjunlu force-pushed the chengjun/fix_prefetch_issue branch from dc8c06c to c13fa76 Compare July 3, 2025 09:20

etiotto previously approved these changes Jul 7, 2025

whitneywhtsang reviewed Jul 7, 2025

chengjunlu force-pushed the chengjun/fix_prefetch_issue branch 2 times, most recently from d671bdc to dd6c641 Compare July 8, 2025 02:54

chengjunlu force-pushed the chengjun/fix_prefetch_issue branch 3 times, most recently from 88aed22 to 3c87de3 Compare July 9, 2025 08:46

chengjunlu force-pushed the chengjun/fix_prefetch_issue branch from 87b7652 to 1e9fd7e Compare July 10, 2025 00:05

Fix issue in prefetching column major matrix.

1e9fd7e

Signed-off-by: Lu,Chengjun <chengjun.lu@intel.com>

Merge branch 'main' into chengjun/fix_prefetch_issue

4c69c8a

Merge branch 'main' into chengjun/fix_prefetch_issue

31b7bca

Merge branch 'main' into chengjun/fix_prefetch_issue

ab0f713

[TritonGEN] Lower to GenISA for 2d_block_prefetch_16b_16r8x1c

8a4287d

Signed-off-by: Whitney Tsang <whitney.tsang@intel.com>

Merge branch 'main' into chengjun/fix_prefetch_issue

18c95b9

whitneywhtsang enabled auto-merge (squash) July 11, 2025 21:58

chengjunlu added 2 commits July 14, 2025 16:10

Merge branch 'main' into chengjun/fix_prefetch_issue

dd3888a

Merge branch 'main' into chengjun/fix_prefetch_issue

a73a5d9

whitneywhtsang approved these changes Jul 14, 2025

whitneywhtsang merged commit c104666 into main Jul 14, 2025
15 checks passed

whitneywhtsang deleted the chengjun/fix_prefetch_issue branch July 14, 2025 13:03
