-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster VSHA256? #3787
Comments
Assuming we'd eventually use openssl for TLS support, would the 5x throughput increase be enough to marginalize the SHA256 CPU footprint in your workload? |
@dridi I think so, yes. |
Since we already have a breakage planned for VUT, we could also retire vsha256 in favor of libcrypto as part of the next soname bump for libvarnishapi. |
correction with apologies: Though Compiling again with
$ make V=1 libvarnish.la
....
/bin/bash ../../libtool --tag=CC --mode=compile clang -DHAVE_CONFIG_H -I. -I../.. -I../../include -I../../include -DVARNISH_STATE_DIR='"/var/run"' -g -O3 -MT libvarnish_la-vsha256.lo -MD -MP -MF .deps/libvarnish_la-vsha256.Tpo -c -o libvarnish_la-vsha256.lo `test -f 'vsha256.c' || echo './'`vsha256.c
libtool: compile: clang -DHAVE_CONFIG_H -I. -I../.. -I../../include -I../../include -DVARNISH_STATE_DIR=\"/var/run\" -g -O3 -MT libvarnish_la-vsha256.lo -MD -MP -MF .deps/libvarnish_la-vsha256.Tpo -c vsha256.c -fPIC -DPIC -o .libs/libvarnish_la-vsha256.o
mv -f .deps/libvarnish_la-vsha256.Tpo .deps/libvarnish_la-vsha256.Plo
|
I was intrigued by the performance numbers shown for that SHA implementation. If I understand it correctly, this library achieves its impressive numbers only when it can be handling multiple separate hashes in parallel. That is, there needs to be a need to simultaneously produce many distinct SHA256 in the same core. That doesn't match well with the way hashes are used in Varnish, where a single hash is produced in one thread and the result of that is needed to move forward. I did a naive hack to the perf example to make it non-parallel to get a feeling how it compares to libcrypto in something more similar to our use cases. To achieve this I asked it to flush the mgr immediately after each data set is added.
So unless I did something very wrong, I think it performs worse than libcrypto in the non-parallel use case, and that libcrypto is still the better drop in replacement to VSHA256. |
@mbgrydeland good point, see also intel/isa-l_crypto#45 To begin with, my primary question really was if there would be any interest in adding a faster implementation and if platform specific assembly code was at all something we would consider to pull in. If the answer was no, we would not even need to look into any more details of isa-l. Regarding how we could best integrate this specific implementation, I have not looked at the details yet. For my use case, hashing multiple objects in parallel would actually be an option. In fact, I already considered parallelizing VSHA256 over multiple threads, and with the mb implementation it seems possible to avoid multiple threads and still get a speed up. Overall, a faster VSHA256 would be good, and while isa-l just appeared particularly attractive at first sight, I have no stakes in any particular implementation. |
I second using a faster SHA256, and I believe libcrypto's implementation does all the hard parts of figuring the level of processor support and such at runtime already. So I would love to see a configure check if libcrypto is available and use it if so. In my opinion bringing in libcrypto is justifiable regardless of any feelings of OpenSSL in general.
I assume that for your use case the cryptographic qualities of SHA256 are not important. If it is only data consistency check that you are after, there are much faster alternatives available. Maybe have a look at https://github.com/Cyan4973/xxHash |
thank you for the pointer :) The current choice of SHA256 was primarily for being very well known and accepted to reduce FUD potential. But you are right that I should consider other options :) |
See #3787 (comment) and #3787 (comment), I would advise against adding ifdefery and simply migrate our SHA256 usage to libcrypto. We already agreed that regardless of how we |
FTR: When considering alternatives, we should also look at a realistic benchmark comparison for the primary varnish-cache use case of creating the hash for cache lookup.The results could differ significantly from the benchmarks on bulk data. |
bugwash: @mbgrydeland volunteered to look after benchmarks, thank you |
I've some benchmark of VSHA256, OpenSSL's SHA256 (and the one from The program was compiled with:
The libvarnish.a was compiled using the standard options (ie not a debug build, and using -O2). When using TEST_LEN_URL==39 and TEST_LEN_HOST==11:
When using TEST_LEN_URL==89 and TEST_LEN_HOST==67:
It looks like libcrypto's SHA256 is ~50% faster than VSHA on my laptop's i7-7500U. The numbers for The program used as a diff against
|
FYI if someone opens a pull request, this comment from @simonvik on IRC:
|
I patched varnish to use libcrypto instead of VSHA256 with the following patch: https://allg.one/2dZm The server i tested on handles about 2k req/s, ive not done any more performance tests than this. It's ofc possible that i failed with the patch in some way! |
I'm not thrilled about adding a dependency, but there are compelling arguments for it. Let's see an actual patch incl. live-testing |
IIRC @mbgrydeland wanted to suggest using libcrypt, but we do not need the ticket for that. |
ZeSecretProject (slash/fellow storage engine) is going to use SHA256 a lot to ensure on-disk data integrity.
Benchmarking the current code with
perf
on cache-misses only,VSHA256_Transform
is the top CPU consumer:So I looked for alternative, faster implementations and this post led me to https://github.com/intel/isa-l_crypto
To get a rough idea of the relevance, I hacked their benchmark code to include vsha256 ...
... and got this result (varnish compiled with
clang -g -O2
,clang 11.0.1-2
):So I wonder if I should pull in some custom code into ZeSecretProject or if there would be any interest in adding a faster implementation with platform specific assembly code? For the particular project mentioned, the license is quite permissive, but would require inclusion of it.
The text was updated successfully, but these errors were encountered: