Replace vmlaq_f32 with vfmaq_f32 (fused multiply-add) #25669
Conversation
@skottmckay @snnn @yufenglee, appreciate it if this tiny PR could be reviewed & CI triggered. Thanks!
@hariharans29 could you review this PR & run CI? Thanks!
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
Looks like the Windows x64 build timed out after 2 hours.
/azp run Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
The Windows x64 QNN CI Pipeline timed out again. The logs suggest that the build was still in progress rather than hung. What could be the issue?
I think you may have to wait for this: #25864
Thanks @hariharans29. It looks like #25864 has been closed, as the updated VM SKU seems to be running better. Do you mind re-triggering the CI for this PR? Additionally, could you also review (or request someone to review) #25580? Thanks!
/azp run Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Thanks for the contribution and patience. I am trying to find someone to review it (and a few other MLAS PRs). It may take some time. I will get back on that. Thanks. |
Thanks @hariharans29! The CI for this PR unfortunately timed out again 🤔
/azp run Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Thanks @hariharans29! |
Description
The vfmaq_f32 intrinsic compiles to the FMLA instruction, which is more performant than the separate fmul + fadd instructions that vmlaq_f32 compiles to on the latest GCC versions: https://godbolt.org/z/aYc9as5Wh
Note that this is not a breaking change: vmlaq_f32 already compiles to FMLA instructions on the latest clang compilers (which are already the default for macOS ORT builds).
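For illustration, below is a minimal standalone sketch of the swap. The helper names are hypothetical stand-ins rather than the actual MLAS code, but both intrinsics take the accumulator first followed by the two multiplicands, so the change is a drop-in replacement at each call site.

```c
#include <arm_neon.h>

// Before: vmlaq_f32 computes accumulator + (a * b); recent GCC versions may
// lower it to separate FMUL + FADD instructions with an intermediate rounding step.
static inline float32x4_t MultiplyAddBefore(float32x4_t a, float32x4_t b, float32x4_t accumulator)
{
    return vmlaq_f32(accumulator, a, b);
}

// After: vfmaq_f32 computes the same accumulator + (a * b), but maps to a single
// fused FMLA instruction with one rounding step.
static inline float32x4_t MultiplyAddAfter(float32x4_t a, float32x4_t b, float32x4_t accumulator)
{
    return vfmaq_f32(accumulator, a, b);
}
```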
Motivation and Context
With this change, the NEON version of MlasMultiplyAddFloat32x4 achieves parity with the x86 version, which uses _mm_fmadd_ps. It also achieves up to ~15% speedups compared to the current vmlaq_f32 implementation when tested on top of #25580.
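For reference, the x86 side relies on _mm_fmadd_ps for the same fused operation. The snippet below is an illustrative sketch only (the function name is hypothetical, and FMA hardware support is assumed, e.g. building with -mfma), not the actual MLAS implementation.

```c
#include <immintrin.h>

// _mm_fmadd_ps(a, b, c) computes (a * b) + c as a single fused operation,
// matching the behavior of vfmaq_f32 on NEON.
static inline __m128 MultiplyAddFloat32x4_X86(__m128 a, __m128 b, __m128 accumulator)
{
    return _mm_fmadd_ps(a, b, accumulator);
}
```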