Add build option for ARM NCHWc kernels #26171
You can commit the suggested changes from lintrunner.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Added a comment regarding the performance of NCHWc ARM kernels and their default state.
Hi all, I have noticed a unit test failure, likely associated with this PR, when using KleidiAI on a Mac M4. Interestingly, the failure only happens when the test runs as part of the full onnxruntime_test_all suite; run in isolation, it passes. This suggests that some state (possibly a variable) is not being reset between tests.
Unit Test Name: NchwcOptimizerTests.ConvNoBiasAddFusion
Reproduce instructions: ./onnxruntime_test_all - Shows test failure
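For anyone trying to confirm the "passes alone, fails in the full suite" behaviour described above, here is a minimal sketch that runs the test both ways and compares exit codes. It assumes that onnxruntime_test_all is a GoogleTest binary (so the standard --gtest_filter flag applies) and that it sits in the current directory as in the report. The helper names build_cmd, run and compare_runs are made up for illustration and are not part of ONNX Runtime.

```python
import subprocess

# Path taken from the report; adjust to your own build directory (assumption).
TEST_BINARY = "./onnxruntime_test_all"
TEST_NAME = "NchwcOptimizerTests.ConvNoBiasAddFusion"


def build_cmd(binary, gtest_filter=None):
    """Return the argv list for a GoogleTest binary, optionally restricted by a filter."""
    cmd = [binary]
    if gtest_filter is not None:
        cmd.append(f"--gtest_filter={gtest_filter}")
    return cmd


def run(cmd):
    """Run the command and return its exit code (GoogleTest exits non-zero on failure)."""
    return subprocess.run(cmd).returncode


def compare_runs(binary=TEST_BINARY, test_name=TEST_NAME):
    """Run the single test alone, then the whole suite, and report both exit codes."""
    isolated = run(build_cmd(binary, test_name))
    full = run(build_cmd(binary))
    print(f"isolated run exit code: {isolated}")
    print(f"full suite exit code:   {full}")
    if isolated == 0 and full != 0:
        # The full suite can fail for other reasons too, so check its log for TEST_NAME.
        print("Passes alone but the suite fails: consistent with order-dependent state.")
    return isolated, full
```

Calling compare_runs() from the build directory executes both runs. If the isolated run passes while the suite fails on the same test, narrowing the filter (for example to NchwcOptimizerTests.*) is one way to start looking for the test that leaves state behind.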
Thanks @damdoo01-arm! Hi @Rohanjames1997, could you please take a look when you get a chance? Our partners from ARM recently encountered the above test failure, which seems to originate from the NCHWc ARM64 support (#25580). Thanks!
Hi @damdoo01-arm , thanks for reporting! I tried reproducing it, but I don't have the same setup. So a few questions:
Also, any idea why the CI did not catch this, @hariharans29? 🤔
@hariharans29, please make sure the README or other documentation is updated so that it is clear to everyone how to enable this amazing feature. Thanks!
Hi @Rohanjames1997 - If I were to take an educated guess, I think this will only repro on a machine that supports SME2 (such as the Mac M4), not on just any build with KleidiAI enabled. This is the PR that introduced KleidiAI SME2 Conv kernels for ARM64: https://github.com/microsoft/onnxruntime/pull/25187/files#diff-ae80f8c17f8c3c31a01bff6f1058df55c4287ce3f6741a4bb73df3a24253b7c0. Perhaps there is an edge case to be accounted for somewhere at the boundary of the two PRs. Unfortunately, that is all I can think of right now. any idea why the CI did not catch this?
We will document it and announce it in the next release. For now, enabling it is as simple as using the build flag from this PR to build the feature from main.
Hi @Rohanjames1997,
Thanks @damdoo01-arm. Is the test failing only on an SME2-supported machine, as @hariharans29 suggested? I couldn't reproduce this on a Neoverse-V1 or a V2 machine.
Apologies for the delay, @Rohanjames1997. Since I have an M4, I can try to diagnose and fix it. I'll post here with any updates. Damien.
Description
Add a build option for the new kernels introduced in #25580.
Motivation and Context
This enables building ORT with the NCHWc ARM kernels.
At the time of writing, the option is turned OFF by default because the NCHWc kernels perform worse than the "regular" NCHW kernels at smaller thread counts. At higher thread counts, however, they give a non-negligible speed-up on supporting ARM platforms.
Once the gap at smaller thread counts is closed, the option can be turned on by default.