
@Rohanjames1997
Contributor

Rohanjames1997 commented Aug 6, 2025

Description

The vfmaq_f32 intrinsic compiles to a single fused FMLA instruction, which is faster than the separate fmul + fadd instructions that vmlaq_f32 compiles to on recent GCC versions: https://godbolt.org/z/aYc9as5Wh
This is not a breaking change: on recent Clang versions (the default compiler for macOS ORT builds), vmlaq_f32 already compiles to FMLA instructions.
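A minimal sketch of the idea, assuming an AArch64 target for the NEON path. The helpers fused_mul_add_f32x4 and unfused_mul_add_f32x4 are hypothetical and are not the MLAS API (the real MlasMultiplyAddFloat32x4 is not reproduced here). The fused helper computes a * b + c on four lanes with vfmaq_f32, which rounds once per lane, and falls back to std::fma on other targets. The unfused helper rounds the product to float before the add, mirroring an fmul followed by an fadd. Note that vfmaq_f32 takes the accumulator first: vfmaq_f32(c, a, b) returns c + a * b.

```cpp
#include <cassert>
#include <cmath>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper, not the MLAS API: out[i] = a[i] * b[i] + c[i] for four
// lanes, rounded once per lane (fused multiply-add).
void fused_mul_add_f32x4(const float* a, const float* b, const float* c, float* out) {
#if defined(__aarch64__)
    // vfmaq_f32(acc, x, y) returns acc + x * y with a single rounding (FMLA).
    float32x4_t r = vfmaq_f32(vld1q_f32(c), vld1q_f32(a), vld1q_f32(b));
    vst1q_f32(out, r);
#else
    // Portable fallback: std::fma also rounds only once.
    for (int i = 0; i < 4; ++i) {
        out[i] = std::fma(a[i], b[i], c[i]);
    }
#endif
}

// Hypothetical reference: the product is rounded to float before the add,
// which mirrors a separate multiply followed by an add (fmul + fadd).
void unfused_mul_add_f32x4(const float* a, const float* b, const float* c, float* out) {
    for (int i = 0; i < 4; ++i) {
        // volatile keeps the compiler from contracting the two steps into an FMA.
        volatile float product = a[i] * b[i];
        out[i] = product + c[i];
    }
}

// Lane 0 uses a = b = 1 + 2^-12 and c = -(1 + 2^-11). The exact result is 2^-24,
// but rounding a * b to float first gives exactly 1 + 2^-11, so the unfused sum is 0.
void check_rounding_difference() {
    const float a[4] = {1.000244140625f, 1.0f, 2.0f, -3.0f};
    const float b[4] = {1.000244140625f, 1.0f, 3.0f, 0.5f};
    const float c[4] = {-1.00048828125f, 0.0f, 1.0f, 1.5f};
    float fused[4], unfused[4];
    fused_mul_add_f32x4(a, b, c, fused);
    unfused_mul_add_f32x4(a, b, c, unfused);
    assert(fused[0] == 5.9604644775390625e-08f);  // 2^-24
    assert(unfused[0] == 0.0f);
    for (int i = 1; i < 4; ++i) {
        assert(fused[i] == unfused[i]);  // lanes whose product is exact agree
    }
}
```

Because the fused form skips the intermediate rounding, switching from fmul + fadd to FMLA can change results in the last bit, as lane 0 above shows; lanes whose products are exactly representable give identical results either way.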

Motivation and Context

With this change, the NEON version of MlasMultiplyAddFloat32x4 matches the x86 version, which uses the fused _mm_fmadd_ps.
It also gives speedups of up to ~15% over the current vmlaq_f32 implementation when tested on top of #25580.

@Rohanjames1997
Contributor Author

@skottmckay @snnn @yufenglee, I'd appreciate it if this small PR could be reviewed and CI triggered.

Thanks!

@Rohanjames1997
Contributor Author

@hariharans29 could you review this PR & run CI? Thanks!

@hariharans29
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 5 pipeline(s).

@Rohanjames1997
Contributor Author

Looks like the Windows x64 build timed out after 2 hours.

@hariharans29
Member

/azp run Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

hariharans29 requested a review from edgchen1 August 25, 2025 19:42
@Rohanjames1997
Contributor Author

The Windows x64 QNN CI Pipeline timed out again. The logs suggest that the build was in progress, and not hung.

What could be the issue?

@hariharans29
Member

I think you may have to wait for this: #25864

@Rohanjames1997
Contributor Author

Thanks @hariharans29. It looks like #25864 has been closed as the updated VM SKU seems to be running better.

Do you mind re-triggering the CI for this PR?

Additionally, could you review #25580 or ask someone else to review it?

Thanks!

@hariharans29
Member

/azp run Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@hariharans29
Member

hariharans29 commented Aug 27, 2025

Thanks for the contribution and your patience. I am trying to find someone to review this PR (and a few other MLAS PRs). It may take some time; I will get back to you on that. Thanks.

@Rohanjames1997
Contributor Author

Thanks @hariharans29 !

The CI for this PR timed out again unfortunately 🤔

@hariharans29
Member

/azp run Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

hariharans29 merged commit af4bf43 into microsoft:main Sep 2, 2025
157 checks passed
@Rohanjames1997
Contributor Author

Thanks @hariharans29!



2 participants