Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blake2: integrate blake2b_simd/blake2s_simd crates #228

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

tarcieri
Copy link
Member

@tarcieri tarcieri commented Feb 4, 2021

Integrates original sources from these crates, which provide AVX2-accelerated SIMD backends:

https://github.com/oconnor663/blake2_simd

Taken from this commit:

  • Hash: 7bf791e67245bb84132d1ee0e6a893bb8c85c093
  • Author: Jack O'Connor [email protected]
  • Date: Fri Nov 13 15:50:16 2020 -0500
  • Title: AES-CTR benchmarks

Next steps

  • Determine if we want to proceed with a combined blake2 crate or if we can get the blake2b crate and use blake2b/blake2s
  • Define Blake2b/Blake2s structs which provide impls of the digest traits. These can also be used to replace the blake2::blake2b::blake2b/blake2::blake2s::blake2s static functions e.g. with Blake2b::digest(...)
  • Also define VarBlake2b/VarBlake2s(?)
  • Address compile errors/panics on no_std targets
  • Figure out solution for handling KATs
  • Address clippy errors (presently TODOs) in the PR (caused by arrayref, can circle back on that)

Platform::AVX2 | Platform::SSE41 => unsafe {
sse41::compress2_loop(jobs, finalize, stride)
},
_ => panic!("unsupported"),
Copy link
Member Author

@tarcieri tarcieri Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oconnor663 noticing compile failures (well, more like warnings) for this and the other cases below when testing on a thumbv7em-none-eabi target (which is SOP for us).

Is there some sort of fallback on these platforms that will avoid this panic? If not, these should probably be changed to compile_error! (and separately, we should address the issue before merging if so)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tarcieri are you sure you can trigger this panic in practice? If you look at how these functions are used in compress_many, imp.degree() is checked first, which should prevent this panic. In particular, I expect the Portable platform should get selected on thumbv7em-none-eabi, which should have a degree() of 1.

@tarcieri tarcieri force-pushed the blake2_simd branch 2 times, most recently from e086b47 to 40180b5 Compare February 4, 2021 15:43
@tarcieri
Copy link
Member Author

tarcieri commented Feb 4, 2021

As discussed in #88, this PR merges the blake2b_simd and blake2s_simd crates into the blake2 crate, placing each algorithm under blake2b and blake2s modules respectively, and promoting the blake2bp and blake2sp modules to the toplevel of the crate.

Alternatively it might be possible to retain the two-crate structure, however unfortunately while we own the blake2s crate, we don't own the blake2b crate. I have contacted its current owner (as presently all releases are yanked so it's unused) to see if we can obtain it, in which case we can do a more straightforward import of blake2b_simd => blake2b and blake2s_simd => blake2s.

This PR is presently about 170 klocs(!) owing to the vendored C implementations in blake2/benches/. We should probably decide if we want to retain all that code or figure out a different solution. It'd be nice if there were *-sys crates of the C code sourced from elsewhere which we could use for comparison instead. (Edit: removed the benches for now. We can follow up on them separately)

@tarcieri
Copy link
Member Author

tarcieri commented Feb 4, 2021

Also note: I haven't tried to impl any of the digest traits. I'd like to sort out whether we'd like to proceed with a combined single blake2 crate with submodules for blake2b and blake2s or if we can get the blake2b crate first.

@newpavlov any thoughts on this?

@tarcieri tarcieri force-pushed the blake2_simd branch 4 times, most recently from 5f449a9 to 704dde6 Compare February 4, 2021 16:01
@tarcieri
Copy link
Member Author

tarcieri commented Feb 4, 2021

Oof, the KATs are enormous:

$ ls -lh tests/blake2-kat.json
-rw-r--r--  1 bascule  staff   1.8M Feb  4 05:15 tests/blake2-kat.json

It seems like https://github.com/oconnor663/blake2_simd was able to keep them conveniently out-of-band by virtue of having a workspace and keeping it outside the crates themselves. We should definitely do something similar to keep the crate size small.

Edit: moved them to .kat/blake2-kat.json in the toplevel of the repo. This unfortunately means that these tests will fail if anyone tries to test the packaged crate. Perhaps we could have e.g. a special cfg directive (e.g. --cfg kat) to enable these tests so they're ignored when testing from a released crate?

@tarcieri
Copy link
Member Author

tarcieri commented Feb 4, 2021

The good news: it compiles and all of the original tests are now passing! 🎉

I'll update the toplevel description with some next steps.

@tarcieri
Copy link
Member Author

tarcieri commented Feb 5, 2021

@oconnor663 @newpavlov so it turns out I was able to obtain ownership of the blake2b crate after all.

So the question is: should we proceed with a combined blake2 crate as this PR currently implements, or deprecate the blake2 crate and split it into blake2b and blake2s like blake2_simd did?

@newpavlov
Copy link
Member

Assuming that amount of code reuse between b and s modules is minimal, I am fine with the crate split. But it could be worth to postpone it until digest v0.10 release.

Oof, the KATs are enormous:

This is why we have blobby.

@tarcieri
Copy link
Member Author

tarcieri commented Feb 6, 2021

Assuming that amount of code reuse between b and s modules is minimal...

When I was making this PR I did notice considerable overlap/duplication between the two crates. There are large swaths of the code that are completely identical. For example, the file many.rs is nearly identical, with some small changes to constants (e.g. 2/4/8), the blake2b vs blake2s names, and some small differences in comments.

I think there's quite a bit of opportunity to extract a common module shared between them which eliminates such duplication.

This is why we have blobby.

I did a quick ballparking exercise, ignoring all framing and just decoding and concatenating the hex strings of the 1.8MB blake2-kat.json file.

This still results in 827kB of binary data, which I would consider enormous and probably a bad idea to include in a crate that would otherwise be ~60kB otherwise.

@tarcieri
Copy link
Member Author

tarcieri commented Feb 6, 2021

But it could be worth to postpone it until digest v0.10 release.

I definitely think we should land #217 first and do another release of the current blake2 crate before moving forward with this.

@oconnor663
Copy link

oconnor663 commented Feb 8, 2021

Yes there is quite a lot of duplicated code between 2b and 2s. I experimented a couple times with refactoring it to share more, but I wound up getting frustrated with having everything in a giant macro, and I gave up and decided copy-pasting was a better result (even as a maintainer, but especially for anyone who has to read the code). Some scattered thoughts:

  • There are some quirky differences in the way the parameter blocks are laid out between 2b and 2s, particularly the "node offset", which needs different code and not just different constants. This sort of thing is annoying to model in a shared macro.
  • I don't think it's possible to share SIMD code between 2b and 2s. Of course they use different intrinsics, though maybe that could be abstracted out. But more fundamentally, the word loading in the single-block compression functions is all optimized by hand (originally by Samuel upstream), and the two versions don't look similar.
  • In contrast, most of the code that might be easier to share is the relatively straightforward parts like buffer management and the public parameters API. I'm not thrilled to maintain two copies of this code, but so far I've preferred to do that rather than make the SIMD code any harder to reason about.
  • The many.rs code is in between. It's duplicated, but it's definitely tricky code that's not easy to reason about. That might be the highest-value opportunity for finding a way to share it. But on the other hand, again, the fact that it's so tricky makes me even less excited about making it a big macro.

Maybe there are some code sharing strategies besides "giant macro" that could work here, that I haven't thought about? Maybe some interesting possibilities open up when we can use const generics?

@oconnor663
Copy link

Another scattered thought: The many APIs were super valuable to me back when I was working on BLAKE3 prototypes. Now that BLAKE3 is its own thing and doesn't depend on this code, it might be that the only existing callers of the many APIs are the BLAKE2bp and BLAKE2sp implementations also in these crates. Having a totally generic "hash many" capability is neat, but it might be that in practice no one is ever going to use it. It's possible that we could simplify the API, if it was just a private implementation detail of BLAKE2bp and BLAKE2sp. Something to think about, if many ends up getting in the way of code sharing.

@newpavlov
Copy link
Member

newpavlov commented Feb 9, 2021

@oconnor663
Thank you for your comments!

In digest v0.10 we have buffer management included behind a feature (see the core_api module). So implementation crates only have to define state initialization, blocks compression, and finalization parts.

One possible alternative to giant macros is to be generic over a private trait with necessary associated constants and types. Since Rust monomorphizes generics by default, it should be identical to the macro approach performance-wise. I have successfully refactored some macros using this approach (though const generics would have been a better fit in some cases), so it may apply here as well.

@tarcieri
Copy link
Member Author

tarcieri commented Feb 9, 2021

I'm weakly in favor of merging the crates for the following reasons:

  1. It will provide the simplest transition path for existing users of the blake2 crate
  2. It leaves the door open for DRYing out some of the duplicated code

I think the core implementation of update_many could be converted into a sealed trait as @newpavlov suggested, with the constant value differences hoisted into associated consts. That could probably be made easier/more futureproof by making the many modules private and re-exporting the functions that comprise the public APIs from the blake2b/blake2s modules.

@oconnor663
Copy link

Apologies for the delay in getting to this. I notice that we have Blake2b, VarBlake2b, and State in the same module, with overlapping purposes. Similarly we have Hash and Params, but those are only used by State, and perhaps confusingly the Blake2b::with_params constructor doesn't involve Params. What do you think the long-term strategy will be with these API details?

@oconnor663
Copy link

Pie-in-the-sky, this has me wondering about an alternative strategy: What if we separated out the blake2_simd backends into a couple standalone crates (e.g. blake2b_backend and blake2s_backend) that provided a very low-level API, like the raw compression function and a minimal parallel compression function for BLAKE2bp/sp? That could be maintained here in the hashes repo, and then depended on by all the user-facing crates. That might let us kick the can down the road on API questions. How ridiculous would this be?

@tarcieri
Copy link
Member Author

tarcieri commented Sep 7, 2021

I notice that we have Blake2b, VarBlake2b, and State in the same module, with overlapping purposes. Similarly we have Hash and Params, but those are only used by State, and perhaps confusingly the Blake2b::with_params constructor doesn't involve Params. What do you think the long-term strategy will be with these API details?

There's definitely a lot of work and consolidation to be done there.

One thing that can definitely go away is Blake2* vs VarBlake2*, which is already addressed in #217.

From there the rest is largely up in the air. I'd probably suggest getting rid of State or at least making it non-pub. Hash should probably go away in favor of digest::Output and crypto_mac::Output (which would also eliminate the arrayvec dependency)

What if we separated out theblake2_simd backends into a couple standalone crates (e.g. blake2b_backend and blake2s_backend) that provided a very low-level API, like the raw compression function and a minimal parallel compression function for BLAKE2bp/sp? That could be maintained here in the hashes repo, and then depended on by all the user-facing crates. That might let us kick the can down the road on API questions. How ridiculous would this be?

Personally I'm still in favor of trying to bring everything under one roof. There is a lot of duplication right now.

The current blake2 crate does this using macros, which isn't great. An alternative to that which is worth considering IMO is writing the internals more using (sealed) traits/generics, and then wrapping that up in a higher-level API which hides the internal implementation strategy.

I don't think it's a good idea to explore that sort of thing as part of this PR, but merging everything into a single crate at least leaves the door open to such refactorings.

Replaces the current implementation of `blake2` by integrating the
original sources from these crates, which provide AVX2-accelerated
SIMD backends:

https://github.com/oconnor663/blake2_simd

Taken from this commit:

Hash:   7bf791e67245bb84132d1ee0e6a893bb8c85c093
Author: Jack O'Connor <[email protected]>
Date:   Fri Nov 13 15:50:16 2020 -0500
Title:  AES-CTR benchmarks
Renames the `guts` module to `backend`, and refactors the various
implementations to be submodules thereof.
Clearing out the `blake2b` and `blake2s` modules in order to add the
primary user-facing types there.
Clearing out the `blake2b` and `blake2s` modules in order to add the
primary user-facing types there.
Clearing out the `blake2b` and `blake2s` modules in order to add the
primary user-facing types there.
This is more consistent with our other crates
Adds a set of `Blake2b`/`VarBlake2b` and `Blake2s`/`VarBlake2s` which
are API-compatible with the ones in the current `blake2` v0.9.x release.

This commit does not yet include proper tests besides the rustdoc tests.
These are the original tests from:

https://github.com/RustCrypto/hashes/tree/4a2845c226ba6748babdb2e704713fa9103d09f0/blake2/tests

They show that the new `blake2_simd` crate is API-compatible with the
old one.
These clippy warnings all originate in macros from the `arrayref` crate.
Many of the warnings were occurring because the parallel backend
requires x86 SIMD intrinsics (AVX2 or SSE41) and therefore the parallel
code is dead on other targets.

This commit feature gates all such code, ensuring that everything will
build warning-free on other targets, and that users who attempt to use
the parallel APIs on unsupported targets get compile errors rather than
warnings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants