cpu:ppc64: fix GEMM reorder build issue on Power system by Tiwari-Avanish · Pull Request #4002 · uxlfoundation/oneDNN

Tiwari-Avanish · 2025-09-25T04:35:22Z

My previous changes #3156 got reverted back because of build issue on different power system.

This PR addresses the build issues for the GEMM reorder kernels on ppc64 architectures
including Power8, Power9, and Power10.

The changes include:

Fix alignment and vectorization issues in gemm_driver.cpp and ppc64_gemm_reorder.cpp
Update reorder logic in cpu_reorder_regular_f32_u8.cpp
Ensure compatibility with all supported Power architectures

Test Results:

Power10:

Build successfully:
[100%] Linking CXX executable benchdnn
[100%] Linking CXX executable test_binary
[100%] Built target test_binary
[100%] Linking CXX executable test_internals
[100%] Built target test_internals
[100%] Built target test_graph_unit
[100%] Built target benchdnn

Testcase:
338/338 Test #338: noexcept-cpp ............................................ Passed    0.04 sec
100% tests passed, 0 tests failed out of 338

Power 9:

Build successfully: 

[100%] Built target test_internals
[100%] Linking CXX executable test_convolution_forward_f32
[100%] Built target test_convolution_forward_f32
[100%] Linking CXX executable test_binary
[100%] Linking CXX executable test_rnn_forward
[100%] Built target test_binary
[100%] Built target test_rnn_forward
[100%] Linking CXX executable benchdnn
[100%] Built target benchdnn
[100%] Linking CXX executable test_reorder
[100%] Built target test_reorder 

TestCase: 
338/338 Test #338: noexcept-cpp ............................................   Passed    0.01 sec

100% tests passed, 0 tests failed out of 338

Power8:

Build successfully:
[100%] Built target test_gemm_u8s8s32
[100%] Linking CXX executable test_internals
[100%] Built target test_rnn_forward
[100%] Built target test_graph_cpp_api_compile
[100%] Built target test_internals
[100%] Built target test_graph_unit
[100%] Linking CXX executable test_sum
[100%] Built target test_sum
[100%] Linking CXX executable benchdnn
[100%] Built target benchdnn

Testcase:
338/338 Test #338: noexcept-cpp ............................................ Passed    0.06 sec
100% tests passed, 0 tests failed out of 338

Related PRs:

Power8 compatibility fixes: PR #3968
Saved GEMM/reorder changes: PR #3974

Thanks to @vpirogov and @spalicki for helping identify and preserve the relevant changes.

Please review this PR. All changes have been tested locally across Power8, Power9, and Power10.

Tiwari-Avanish · 2025-09-25T19:13:14Z

Thank you, @vpirogov, for reviewing the changes!

@spalicki, could you please review this PR when you have a moment? Let me know if any changes are needed, and I’ll address them promptly.

dzarukin · 2025-09-25T20:36:16Z

Could you, please, elaborate this bullet?

Ensure compatibility with all supported Power architectures

I don't see changes to the build system. How this compatibility was achieved?

spalicki · 2025-09-26T08:11:04Z

@Tiwari-Avanish Our conda build succeeds now.

For future reference to test it you can run:

# Download conda feedstock
git clone https://github.com/conda-forge/onednn-feedstock.git && cd onednn-feedstock
# Check SHA of your changes, e.g. for this change:
wget https://github.com/Tiwari-Avanish/oneDNN/archive/refs/heads/ppc64-gemm-reorder-fix.zip
sha256sum ppc64-gemm-reorder-fix.zip
# Add your change to the recipe
vim recipe/recipe.yaml
-  url: https://github.com/oneapi-src/${{ name }}/archive/v${{ version }}.tar.gz
-  sha256: fa44702f5979ed5ab927f7ccc1d2947adb4e6d0e58c433149465c5fc71e3bd45
+  url: https://github.com/Tiwari-Avanish/oneDNN/archive/refs/heads/ppc64-gemm-reorder-fix.zip
+  sha256: 5235d10691833f44041aac9a2f0afaf82aa2134c3a8dde63ebe88e0f47dcbdb4
# Build the package
./build-locally.py linux_ppc64le_

Currently the build shows 2 warnings that need to be fixed:

 │ │ $SRC_DIR/src/cpu/ppc64/ppc64_gemm_reorder.cpp:107:9: warning: unused variable 'i' [-Wunused-variable]
 │ │   107 |     int i = 0;

which is quite obvious, and

 │ │ $SRC_DIR/src/cpu/ppc64/gemm/gemm_driver.cpp:17:5: warning: "__MMA__" is not defined, evaluates to 0 [-Wundef]
 │ │    17 | #if __MMA__

which I assume is just a typo since in all other __MMA__ guards you correctly use #ifdef.

Please also check if the performance change is in line with your expectations on Power10 and there is no degradation in case of Power8/9. You do not need to run the entire perf test suite, since you tested that last time, just a smaller subset.

Tiwari-Avanish · 2025-09-26T14:20:41Z

Hi @dzarukin

Thanks for reviewing this PR.

By “compatibility with all supported Power architectures,” I meant source-level compatibility:

Power10 (MMA + VSX): MMA paths are now guarded with #ifdef MMA.
Power7/8/9 (VSX-only): Reorder implementation rewritten with VSX intrinsics. Earlier code used Power10-only intrinsics unconditionally, causing build failures; this is now fixed.
Build system: No changes. Fallbacks (ref_gemm_s8x8s32() for GEMM, fp32_u8 reorder) are already handled in oneDNN.

So “compatibility” means correct compilation and execution across Power7–Power10 using conditional compilation, not build-system modifications.

Thanks @spalicki for reviewing this and checking the conda build system.

I’ll fix the unused variable warning and the MMA guard typo.
For performance validation: could you confirm if I should run the following perf tests only on Power10 (where MMA changes apply), or also on Power8/9?
- perf_matmul_inference_batched
- perf_matmul_inference_lb
- perf_matmul_training

Previously, I ran these tests during performance validation, so I can rerun them if needed. My assumption is that Power8/9 performance should remain unchanged, but I’m happy to test if you’d like confirmation.

Tiwari-Avanish · 2025-09-30T07:11:55Z

Hi @spalicki,

I’ve fixed the warning and corrected the typo.
I also checked for any performance degradation with these changes and did not observe any.

Top 15 Benchmark improvements for perf_matmul_inference_lb (earlier vs current changes):

Click to expand

Workload	Reverted Code(With MMA) GFLOPS	Reverted code(With MMA) + Power8 and 9 support GFLOPS	Improvement
Transformer_lt:Decoder_vocabulary*40	1.6099	1.7071	+6.04%
Transformer_lt:Encoder_MM_1*18	1.6180	1.7147	+5.98%
Transformer_lt:Decoder_MM_1*720	1.6202	1.7151	+5.86%
Transformer_lt:Encoder_MM_2*12	1.6195	1.7137	+5.82%
Transformer_lt:Encoder_MM_6*6	1.6202	1.7137	+5.77%
Transformer_lt:Encoder_MM_7*6	1.6089	1.6999	+5.65%
Transformer_lt:Decoder_MM_6*480	1.6161	1.7069	+5.62%
Transformer_lt:Decoder_MM_2*240	1.6212	1.7117	+5.58%
Transformer_lt:Decoder_MM_8*240	1.6234	1.7110	+5.40%
Transformer_lt:Encoder_MM_8*6	1.6228	1.7099	+5.37%
Transformer_lt:Decoder_MM_7*240	1.6056	1.6898	+5.24%
Transformer_lt:Decoder_MM_xx20*240	0.9286	0.9545	+2.79%
DLRM:3*1	1251.86	1286.54	+2.77%
Transformer_lt:Encoder_MM_5*6	129.267	132.162	+2.24%
DLRM:4*1	2087.73	2134.24	+2.23%

Whenever you have a moment, please review this PR. If everything looks good, kindly help merge it.

spalicki · 2025-10-01T14:40:56Z

My assumption is that Power8/9 performance should remain unchanged, but I’m happy to test if you’d like confirmation.

Yes, that is why I asked. Power 8/9 should remain unchanged with Power10 getting an uplift.

Macro is always defined, must have a value of 1 instead.

vpirogov · 2025-10-03T21:41:56Z

Cherry-picked to rls-v3.10 as well.

Tiwari-Avanish requested a review from a team as a code owner September 25, 2025 04:35

github-actions bot added platform:cpu-ppc64 PowerPC component:common labels Sep 25, 2025

Tiwari-Avanish changed the title ~~cpu:ppc64: fix GEMM reorder build issues on Power8, Power9 and Power10~~ cpu:ppc64: fix GEMM reorder build issue on Power system Sep 25, 2025

Tiwari-Avanish force-pushed the ppc64-gemm-reorder-fix branch from 0d66ac0 to ab697b0 Compare September 25, 2025 05:00

vpirogov mentioned this pull request Sep 25, 2025

cpu: ppc64: add gemm and reorder kernels #3974

Closed

vpirogov approved these changes Sep 25, 2025

View reviewed changes

Tiwari-Avanish force-pushed the ppc64-gemm-reorder-fix branch from ab697b0 to d3e1264 Compare September 25, 2025 17:57

Tiwari-Avanish mentioned this pull request Sep 25, 2025

[PPC64] '-mmma' requiring instructions not guarded by __MMA__ #3749

Closed

Tiwari-Avanish force-pushed the ppc64-gemm-reorder-fix branch 2 times, most recently from 2aea3d4 to 7c8e525 Compare September 30, 2025 05:35

spalicki approved these changes Oct 3, 2025

View reviewed changes

Tiwari-Avanish and others added 3 commits October 3, 2025 12:47

cpu: ppc64: add gemm and reorder kernels

f0351ba

fixup: cpu: ppc64: add gemm and reorder kernels

aa33691

Macro is always defined, must have a value of 1 instead.

cpu: ppc64: fix GEMM reorder build issues on Power8, Power9, Power10

aca5334

vpirogov force-pushed the ppc64-gemm-reorder-fix branch from 7c8e525 to aca5334 Compare October 3, 2025 19:48

vpirogov merged commit c8cd40f into uxlfoundation:main Oct 3, 2025
29 checks passed

vpirogov added this to the v3.10 milestone Oct 3, 2025

Akashcodes732 mentioned this pull request Nov 12, 2025

[Hardware][PowerPC] Fix fp16 compilation error for Power in cpu attention backend and bump oneDNN version vllm-project/vllm#28535

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cpu:ppc64: fix GEMM reorder build issue on Power system#4002

cpu:ppc64: fix GEMM reorder build issue on Power system#4002
vpirogov merged 3 commits intouxlfoundation:mainfrom
Tiwari-Avanish:ppc64-gemm-reorder-fix

Tiwari-Avanish commented Sep 25, 2025

Uh oh!

Tiwari-Avanish commented Sep 25, 2025

Uh oh!

dzarukin commented Sep 25, 2025

Uh oh!

spalicki commented Sep 26, 2025

Uh oh!

Tiwari-Avanish commented Sep 26, 2025 •

edited

Loading

Uh oh!

Tiwari-Avanish commented Sep 30, 2025 •

edited

Loading

Uh oh!

spalicki commented Oct 1, 2025

Uh oh!

Uh oh!

vpirogov commented Oct 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Tiwari-Avanish commented Sep 25, 2025

Uh oh!

Tiwari-Avanish commented Sep 25, 2025

Uh oh!

dzarukin commented Sep 25, 2025

Uh oh!

spalicki commented Sep 26, 2025

Uh oh!

Tiwari-Avanish commented Sep 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Tiwari-Avanish commented Sep 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

spalicki commented Oct 1, 2025

Uh oh!

Uh oh!

vpirogov commented Oct 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Tiwari-Avanish commented Sep 26, 2025 •

edited

Loading

Tiwari-Avanish commented Sep 30, 2025 •

edited

Loading