Fix HQC Performance - Draft #2054

Closed
wants to merge 3 commits into from

@BartBBM BartBBM commented Jan 25, 2025

This PR shows that a fix for #2047 is possible: it brings keygen, encaps and decaps back from milliseconds to sub-millisecond times, improving performance on my machine by one to two orders of magnitude (see the numbers below).

This PR is only a draft PR focusing on the performance side of things. I do not know if this code is secure.

This PR is for people who need a quick fix for the performance of HQC in liboqs (like me :D ).

What I have done

  1. I downloaded the latest submission (2024/10/30) from the official HQC Website and copied the optimized version into liboqs.
  2. I renamed all global symbols for HQC192 and HQC256 to avoid naming conflicts.
  3. Made sure every optimization is on.

What this PR lacks

  1. It does not use shared code of liboqs.
  2. It does not have optional optimizations.
  3. It is not beautiful code, just code that works.

Some Numbers

old hqc-128

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       64bceb37fafa9b90cf228965079de9ebd77a83b9
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Started at 2025-01-26 00:19:04
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-128                              |            |                |                 |            |                           |           
keygen                               |       3148 |          3.001 |         953.162 |    533.640 |                   3430799 |    1920946
encaps                               |       1580 |          3.000 |        1898.947 |     96.527 |                   6835317 |     346923
decaps                               |       1034 |          3.002 |        2903.450 |    143.596 |                  10451388 |     516315
Ended at 2025-01-26 00:19:13

new hqc-128

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       15f40e6af53270874672786db8cd1756beb1b7c2 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_LIBJADE_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2025-01-25 22:49:45
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-128                              |            |                |                 |            |                           |           
keygen                               |     126559 |          3.000 |          23.705 |     38.937 |                     85172 |     139316
encaps                               |      55479 |          3.000 |          54.075 |     13.453 |                    194472 |      48308
decaps                               |      30065 |          3.000 |          99.786 |     15.852 |                    359043 |      56917
Ended at 2025-01-25 22:49:54

old hqc-192

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       64bceb37fafa9b90cf228965079de9ebd77a83b9
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Started at 2025-01-26 00:20:12
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-192                              |            |                |                 |            |                           |           
keygen                               |       1006 |          3.001 |        2983.080 |    608.481 |                  10737845 |    2190153
encaps                               |        510 |          3.002 |        5886.704 |    345.127 |                  21190920 |    1241984
decaps                               |        344 |          3.004 |        8731.360 |    212.984 |                  31431855 |     766423
Ended at 2025-01-26 00:20:21

new hqc-192

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       15f40e6af53270874672786db8cd1756beb1b7c2 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_LIBJADE_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2025-01-25 22:48:59
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-192                              |            |                |                 |            |                           |           
keygen                               |      52198 |          3.000 |          57.474 |     20.694 |                    206684 |      74155
encaps                               |      22420 |          3.000 |         133.812 |    179.441 |                    481384 |     645844
decaps                               |      14081 |          3.000 |         213.058 |     19.846 |                    766719 |      71295
Ended at 2025-01-25 22:49:08

old hqc-256

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       64bceb37fafa9b90cf228965079de9ebd77a83b9
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Started at 2025-01-26 00:21:25
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-256                              |            |                |                 |            |                           |           
keygen                               |        558 |          3.005 |        5385.332 |    483.098 |                  19385631 |    1738663
encaps                               |        278 |          3.007 |       10817.669 |   1205.905 |                  38941805 |    4341028
decaps                               |        182 |          3.002 |       16495.527 |   2415.614 |                  59381746 |    8695994
Ended at 2025-01-26 00:21:34

new hqc-256

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.1-dev (major: 0, minor: 12, patch: 1, pre-release: -dev)
Git commit:       15f40e6af53270874672786db8cd1756beb1b7c2 (+ local modifications)
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_DIST_BUILD OQS_LIBJADE_BUILD OQS_OPT_TARGET=generic CMAKE_BUILD_TYPE=Release 
CPU exts active:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3
Speed test
==========
Started at 2025-01-25 22:47:35
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-256                              |            |                |                 |            |                           |           
keygen                               |      26591 |          3.000 |         112.824 |     29.642 |                    406000 |     106564
encaps                               |      12265 |          3.000 |         244.601 |     30.475 |                    880243 |     109519
decaps                               |       7203 |          3.000 |         416.517 |     36.322 |                   1499109 |     130517
Ended at 2025-01-25 22:47:44
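
For a quick read of the tables above, the per-operation speedup can be computed from the `Time (us): mean` column. This is only a minimal sketch: the values are copied from the logs above, while the dictionary layout and the `speedups` helper are illustrative names and not part of liboqs.

```python
# Mean times in microseconds, copied from the speed test logs above.
# Tuple order: (keygen, encaps, decaps). Names and layout are illustrative.
OLD_US = {
    "HQC-128": (953.162, 1898.947, 2903.450),
    "HQC-192": (2983.080, 5886.704, 8731.360),
    "HQC-256": (5385.332, 10817.669, 16495.527),
}
NEW_US = {
    "HQC-128": (23.705, 54.075, 99.786),
    "HQC-192": (57.474, 133.812, 213.058),
    "HQC-256": (112.824, 244.601, 416.517),
}


def speedups(old, new):
    """Return the old/new mean-time ratio for each operation."""
    return tuple(o / n for o, n in zip(old, new))


for alg in OLD_US:
    keygen, encaps, decaps = speedups(OLD_US[alg], NEW_US[alg])
    print(f"{alg}: keygen {keygen:.1f}x, encaps {encaps:.1f}x, decaps {decaps:.1f}x")
```

On these numbers every ratio falls between roughly 29x and 52x. Note that the two builds also differ in their build configuration as shown in the logs (`-march=native` with `OQS_OPT_TARGET=auto` versus `OQS_DIST_BUILD` with `OQS_OPT_TARGET=generic`), so the ratios compare whole configurations rather than the HQC code alone.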

@baentsch
Member

Thank you very much @BartBBM to "(re)start from first principles" with this PR. To me, this might be indicative of something being "seriously fishy" either with the common code and/or its integration. Regarding this PR, I'm not sure this is the right approach, though (to not use the shared code): While this solves the problem for HQC, it may hide a bigger problem of OQS...

@BartBBM
Author

BartBBM commented Jan 27, 2025

In March I may have time to tidy this code and look further into it. In the meantime, I just wanted to put it here for people to see, not for liboqs to adopt.

As a heads-up, though, here is a small comparison between the kilo CPU cycles stated by the authors and the kilo CPU cycles needed by liboqs, computed as:
(liboqs_kilo_cycles_keygen + liboqs_kilo_cycles_encaps + liboqs_kilo_cycles_decaps) / (stated_kilo_cycles_keygen + stated_kilo_cycles_encaps + stated_kilo_cycles_decaps)

hqc128 of PR: (80+189+353)/(75+177+323) = 1.08
bikel1: (774+129+2678)/(589+97+1135) = 1.97

So it is not as bad as it may seem.
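
The ratio above can be reproduced with a few lines. This is a minimal sketch: the kilo-cycle values are the ones quoted in this comment, and `slowdown` is an illustrative helper name.

```python
def slowdown(liboqs_kcycles, stated_kcycles):
    """Ratio of summed liboqs kilo-cycles to summed author-stated kilo-cycles.

    Both arguments are (keygen, encaps, decaps) tuples.
    """
    return sum(liboqs_kcycles) / sum(stated_kcycles)


# Values as quoted in the comment above.
hqc128_pr = slowdown((80, 189, 353), (75, 177, 323))
bikel1 = slowdown((774, 129, 2678), (589, 97, 1135))

print(f"hqc128 of PR: {hqc128_pr:.2f}")  # 1.08
print(f"bikel1:       {bikel1:.2f}")     # 1.97
```

Summing the three operations before dividing weights each operation by its cycle count, so the most expensive one (decaps in both cases) dominates the result.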

@baentsch
Member

> So it is not as bad as it may seem.

Can I ask how you did the cycle count, @BartBBM? Also, is the comparison implying that liboqs is only 8% off for HQC but 97% worse for Bike? The former indeed may not be so much (but still doesn't quite explain the measurements in #2047, does it?). But for Bike, this seems to be really bad, no?

@BartBBM
Author

BartBBM commented Jan 27, 2025

I just ran the speed_kem test from liboqs, like before. Look for the "CPU cycles: mean" column in the logs. The numbers behind my last comment are below.
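
Pulling the mean cycle counts out of such a log by hand is error-prone, so here is a minimal sketch of a parser. It assumes the pipe-separated table layout shown in this thread (seven columns, with the mean CPU cycles in the sixth); `parse_speed_rows` is a hypothetical helper, not something liboqs ships. The sample rows are copied from the HQC-128 log further down in this comment.

```python
def parse_speed_rows(log_text):
    """Map operation name -> mean CPU cycles from a speed test table."""
    cycles = {}
    for line in log_text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Data rows have 7 cells and a numeric iteration count; this skips
        # the header, the separator row and the algorithm-name row.
        if len(cells) == 7 and cells[1].isdigit():
            cycles[cells[0]] = int(cells[5])
    return cycles


SAMPLE = """\
Operation | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean | pop. stdev
HQC-128   |            |                |                 |            |                  |
keygen    |     131695 |          3.000 |          22.780 |      8.575 |            80095 |      30120
encaps    |      55762 |          3.000 |          53.801 |      9.233 |           189259 |      32429
decaps    |      29891 |          3.000 |         100.368 |     16.131 |           353124 |      56678
"""

print(parse_speed_rows(SAMPLE))
```

Dividing by 1000 and rounding down gives the kilo-cycle figures 80, 189 and 353 used in the ratio. The sketch keys results by operation name only, so a log containing several algorithms would need one call per algorithm section.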

> Also, is the comparison implying that liboqs is only 8% off for HQC but 97% worse for Bike?

Yes, but the 8% applies to the HQC in this draft PR, not to the HQC in liboqs 0.12.0.

Taking my numbers in the initial description of this PR leads us to the value for liboqs 0.12.0:
(3430+6835+10451)/(75+177+323) = 36.03
So where the optimized submitted version of hqc128 takes 1 unit of time, liboqs 0.12.0 takes about 36 units of time.

I meant it is not that bad for bike: a slowdown of 2 is not really noticeable in a wider context, whereas the slowdown of 36 for hqc in 0.12.0 is (at least I found it :D ).

Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.0 (major: 0, minor: 12, patch: 0)
Git commit:       80962f0eef63610e14442645cc5962725e4bacab
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Started at 2025-01-27 11:24:09
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
HQC-128                              |            |                |                 |            |                           |           
keygen                               |     131695 |          3.000 |          22.780 |      8.575 |                     80095 |      30120
encaps                               |      55762 |          3.000 |          53.801 |      9.233 |                    189259 |      32429
decaps                               |      29891 |          3.000 |         100.368 |     16.131 |                    353124 |      56678
Ended at 2025-01-27 11:24:18
Configuration info
==================
Target platform:  x86_64-Linux-5.15.153.1-bebbo-WSL2-local-166808-g33cad9854e0b
Compiler:         gcc (13.3.0)
Compile options:  [-march=native;-Wa,--noexecstack;-O3;-fomit-frame-pointer;-fdata-sections;-ffunction-sections;-Wl,--gc-sections;-Wbad-function-cast]
OQS version:      0.12.0 (major: 0, minor: 12, patch: 0)
Git commit:       80962f0eef63610e14442645cc5962725e4bacab
OpenSSL enabled:  Yes (OpenSSL 3.4.0 22 Oct 2024)
AES:              NI
SHA-2:            OpenSSL
SHA-3:            C
OQS build flags:  OQS_LIBJADE_BUILD OQS_OPT_TARGET=auto CMAKE_BUILD_TYPE=Release 
CPU exts compile-time:  ADX AES AVX AVX2 BMI1 BMI2 PCLMULQDQ POPCNT SSE SSE2 SSE3

Speed test
==========
Started at 2025-01-27 11:24:25
Operation                            | Iterations | Total time (s) | Time (us): mean | pop. stdev | CPU cycles: mean          | pop. stdev
------------------------------------ | ----------:| --------------:| ---------------:| ----------:| -------------------------:| ----------:
BIKE-L1                              |            |                |                 |            |                           |           
keygen                               |      13624 |          3.000 |         220.200 |    665.831 |                    774814 |    2347132
encaps                               |      81451 |          3.000 |          36.832 |      9.993 |                    129471 |      34914
decaps                               |       3942 |          3.000 |         761.054 |     46.829 |                   2677765 |     164674
Ended at 2025-01-27 11:24:34

@baentsch
Member

Thanks for the additional explanation, @BartBBM. I originally did not latch on to the baseline correctly: you compared this PR's HQC performance with the authors' cycle counts, where 8% is, I guess, within the margin of error. As this PR uses the original code pretty much as-is, that is reasonably to be expected, right?

> I meant it is not that bad for bike, since a slowdown of 2 is not really noticeable

Hmm, would you agree one could also consider this 100% worse than expected? Admittedly, not as bad as 3500% off :} but still pretty undesirable and worth a serious investigation into where this performance gets lost, no?
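
The percentages used in this exchange follow directly from the slowdown ratios quoted earlier in the thread (1.08, 1.97 and 36.03). A minimal sketch of the conversion, with `percent_worse` as an illustrative name:

```python
def percent_worse(ratio):
    """Convert a slowdown ratio into 'percent more cycles than the baseline'."""
    return (ratio - 1.0) * 100.0


# Ratios as quoted earlier in this thread.
for label, ratio in [("hqc128 of PR", 1.08), ("bikel1", 1.97), ("hqc128 in 0.12.0", 36.03)]:
    print(f"{label}: {percent_worse(ratio):.0f}% worse")
```

This gives about 8%, 97% and 3503%, which is where the rounded "100%" and "3500%" figures come from.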

@SWilson4
Member

Thanks for the benchmarking comparisons. I agree that the performance issues should be investigated, but please bear the following in mind.

> This PR is for people who need a quick fix for the performance of HQC in liboqs (like me :D ).

Please be advised that the current reference implementation of HQC has a serious correctness and security flaw that has not been patched in the official implementation from https://pqc-hqc.org.

> What I have done
>
> 1. I downloaded the latest submission (2024/10/30) from the [official HQC Website](https://pqc-hqc.org/implementation.html) and copied the optimized version into liboqs.
> 2. I renamed all global symbols for HQC192 and HQC256 to avoid naming conflicts.
> 3. Made sure every optimization is on.
>
> What this PR lacks
>
> 1. It does not use shared code of liboqs.
> 2. It does not have optional optimizations.
> 3. It is not beautiful code, just code that works.

Unfortunately, it does not "work" in the sense that it does not correctly handle invalid decapsulation inputs—as stated in the security advisory linked above. Even for correct inputs, the reference code mishandles the private key.

@BartBBM
Author

BartBBM commented Jan 28, 2025

Thank you @SWilson4 for bringing this to my attention.

@baentsch
Member

In light of the latest conversation, would it be OK to close this PR then, @BartBBM? I don't envision you'd want to use it at high performance but with security problems, right?

@SWilson4, shall we create a separate issue (and eventually a regression-test PR) for general performance "sanity checks"? Also, apologies for completely forgetting that you updated only the immediate upstream in #2026 and not the original HQC code base :-( What about adding more explicit wording for each algorithm as to its "full" maintenance status (including all direct and indirect upstreams)? In this example it might read "HQC code unmaintained at code origin, maintained on a best-effort basis at direct upstream PQClean and integrated without patch into liboqs" (or so).

@BartBBM
Author

BartBBM commented Jan 28, 2025

It's fine for me to close it, as it is only for my performance comparisons, not for real-world applications. As I said, I might revisit the performance issue of liboqs in a month.

For context: the security patch referenced by @SWilson4 seems simple and should not result in a performance loss for this hqc implementation.

@SWilson4
Member

SWilson4 commented Feb 6, 2025

> It's fine for me to close it, as it is only for my performance comparisons, not for real-world applications. As I said, I might revisit the performance issue of liboqs in a month.
>
> For context: the security patch referenced by @SWilson4 seems simple and should not result in a performance loss for this hqc implementation.

FYI @BartBBM, I took a look at the performance degradation in more detail here: #2047 (comment)

If you do revisit this, I would be more than happy to support/review any work to improve HQC (while maintaining security, of course).
