OQS_DIST_BUILD with strange results on M1 #1201

Closed
baentsch opened this issue Apr 18, 2022 · 4 comments

@baentsch
Member

When looking at the performance results at https://openquantumsafe.org/benchmarking/visualization/speed_kem.html and filtering for aarch64 and Kyber (as an algorithm supporting run-time switching), it becomes apparent that setting OQS_DIST_BUILD yields the slowest-running code on that architecture. At first blush I attributed that to "weak" CPU features on the AWS ARM VMs we use for profiling. However, the same now shows up when trying things on M1.

Isn't this counterintuitive, as this flag should dynamically select the fastest-running code? Especially on M1 silicon, which supports all of the available optimizations, shouldn't code built with this flag be expected to perform as well as code built with OQS_OPT_TARGET=auto and OQS_DIST_BUILD=OFF (the "-noport" option in the benchmarking suite)?
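The expectation behind this question can be sketched as a small model of run-time dispatch. This is an illustration only, not liboqs code: the implementation names, the `neon` feature string, and the `select_implementation` helper are all hypothetical. It assumes the dispatcher walks a list ordered fastest-first and picks the first entry whose required CPU features are all present.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical implementation table, ordered fastest first.
# Names and feature strings are illustrative, not liboqs identifiers.
IMPLEMENTATIONS: List[Tuple[str, FrozenSet[str]]] = [
    ("optimized", frozenset({"neon"})),
    ("reference", frozenset()),  # no requirements, always usable
]


def select_implementation(cpu_features: Set[str]) -> str:
    """Return the first (fastest) implementation the CPU can run."""
    for name, required in IMPLEMENTATIONS:
        # Subset test: every required feature must be available.
        if required <= cpu_features:
            return name
    raise RuntimeError("no usable implementation")


if __name__ == "__main__":
    print(select_implementation({"neon"}))  # CPU with the feature
    print(select_implementation(set()))     # CPU without it
```

Under this model, a CPU that reports every required feature never falls back to the reference code, so a dispatching build would be expected to match a natively optimized one apart from the dispatch overhead.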

On "x86_64" the performance behaviour is as expected: on machines/VMs with CPU features available, code built with OQS_DIST_BUILD=ON runs as fast as code built with OQS_OPT_TARGET=auto (or skylake) and OQS_DIST_BUILD=OFF. The slowest performance is seen with OQS_DIST_BUILD=OFF and OQS_OPT_TARGET=generic (i.e., the "-ref" setting).

On "aarch64", by contrast, as long as OQS_DIST_BUILD=OFF, no performance difference can be observed regardless of the choice of OQS_OPT_TARGET. This in turn means that the "-ref" and "-noport" benchmark numbers are basically the same, which is also confusing (at least to me), as one was meant to show the performance of the reference implementation and the other that of the best optimized code. This also debunks my initial thought that AWS aarch64 machines do not have all ARM performance features: they clearly do, as the performance numbers are (much) higher than with OQS_DIST_BUILD=ON.
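For reference, the three configurations compared above can be written down as CMake option sets. This is a minimal sketch: the option names and values are the ones quoted in this issue, while the `dist` label, the `VARIANTS` table, and the `cmake_args` helper are assumptions made for illustration and are not part of the benchmarking suite.

```python
from typing import Dict, List

# Benchmark variants mapped to the CMake options named in this issue.
# "dist" is a hypothetical label for the OQS_DIST_BUILD=ON build.
VARIANTS: Dict[str, Dict[str, str]] = {
    "dist": {"OQS_DIST_BUILD": "ON"},
    "-noport": {"OQS_DIST_BUILD": "OFF", "OQS_OPT_TARGET": "auto"},
    "-ref": {"OQS_DIST_BUILD": "OFF", "OQS_OPT_TARGET": "generic"},
}


def cmake_args(variant: str) -> List[str]:
    """Turn a variant label into a list of -D<option>=<value> arguments."""
    return [f"-D{key}={value}" for key, value in VARIANTS[variant].items()]


if __name__ == "__main__":
    for label in VARIANTS:
        # Print a configure command line for each variant.
        print(" ".join(["cmake", "-S", ".", "-B", "build", *cmake_args(label)]))
```

Building and benchmarking all three on the same machine gives a direct check of whether "-ref" and "-noport" really coincide there.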

This issue is a continuation of #1146, which makes me wonder whether #1148 is a correct fix.

@baentsch
Member Author

Partial fix in https://github.com/open-quantum-safe/liboqs/tree/mb-aarch64-dist. @Martyrshot: I'd be glad for a glance-over before doing a PR, especially wrt ARM32.

Remaining question: Is there any reason for running "-ref" (non-optimized) code on M1 ever? If so, which build option combination should activate it?

@Martyrshot
Member

I pushed a small change to make the naming consistent for ARM32_V7 (here), otherwise it looks good to me!

I personally think running the reference implementation on M1 is worth it to see the relative performance improvements -noport offers.

@baentsch
Member Author

> I pushed a small change to make the naming consistent for ARM32_V7 (here), otherwise it looks good to me!

Thanks for this.

> I personally think running the reference implementation on M1 is worth it to see the relative performance improvements -noport offers.

This performance differential is only visible if we have a platform that needs reference code to run. If there is no such ARM platform (as seems to be the case for M1), I'd suggest doing profiling only for a single setting, i.e., the default (-DOQS_DIST_BUILD=OFF).

Or asked another way: What setting of OQS_OPT_TARGET would/should trigger execution of reference code? For x86_64, it's "generic".

@baentsch
Member Author

baentsch commented May 4, 2022

As decided in our call, leave the semantics as-is: DIST_BUILD basically behaves as -mnative for M1; thus, run profiling only with this setting.

@baentsch closed this as completed May 4, 2022