Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does xxhsum utility support xxh3 (64bit)? #447

Closed
AstolfoKawaii opened this issue Aug 20, 2020 · 12 comments · Fixed by #615
Closed

Does xxhsum utility support xxh3 (64bit)? #447

AstolfoKawaii opened this issue Aug 20, 2020 · 12 comments · Fixed by #615

Comments

@AstolfoKawaii
Copy link

xxhsum definitely supports xxh3_128bit, but I didn't find support of 64 variant there, this is seems wierd for me, is the man lagging behind the actual features or is it a really missing one?

@Cyan4973
Copy link
Owner

This feature is currently not exposed,
as it could lead to confusion with existing XXH64.

@AstolfoKawaii
Copy link
Author

This feature is currently not exposed,
as it could lead to confusion with existing XXH64.

Formally it has resolved my issue, but I'm not sure if I should close it, because if I do, I will open new with similar content, but with accent on "please implement this 🥺".

@Cyan4973
Copy link
Owner

Cyan4973 commented Aug 21, 2020

It's not because something can be implemented that it should be implemented.
The confusion risk is very real, and its potential damage to existing ecosystem should not be under-estimated.
I believe any solution must take this risk in consideration quite seriously, and come with some good mitigation proposals attached.

@devinrsmith
Copy link

I think exposing xxh3 64 bit to xxhsum makes sense if there is to be wider adoption of this variation. I do agree though that careful consideration needs to be made to make sure that there is as little confusion as possible. I think part of a solution would be to try and make the command line arguments more explicit, while keeping -H# around for backwards compatibility.

Maybe something like, xxhsum --type xxh3, xxhsum --type xxh128, xxhsum --type xxh32; or shorter variations xxhsum --xxh3, xxhsum --xxh128, xxhsum --xxh32. Trying to specify -H# and an explicit type would result in an error.

On a side-note: I'm a brand new user of xxHash, and assumed on first use that xxhsum -H64 was the xxh3 variation, and couldn't figure out why my use of the library was outputting a different hash than xxhsum. Of course, this type of confusion for new users is preferable compared to confusing existing users who are already relying on the functionality. But just thought I'd share my initial experience.

@Cyan4973
Copy link
Owner

Cyan4973 commented Dec 19, 2020

We may have a way to do that by using the BSD output format, which explicitly states the hash used (rather than implying it based on the hash length).

@easyaspi314
Copy link
Contributor

Yeah, the options I see are this:

  • Don't use XXH3_64 (current)
  • Only allow in BSD format
  • Use some magic prefix/suffix (perhaps XXH3_0123456789abcdef)

@MaxPeal
Copy link

MaxPeal commented Mar 4, 2021

my +1 for:

  • allow in BSD format
  • Use some magic prefix/suffix (perhaps XXH3_0123456789abcdef)

it took me some time to figure out why the hashing is so slower

@sergeevabc
Copy link

sergeevabc commented Apr 17, 2021

Whatever the consensus will be, I urge you to bear in mind the notorious fate of confusing Blake2 flavours (b, s, bp, sp), which is why the subsequent release of Blake3 in a single, outperforming flavour brought a great relief to the community. People around the world develop various digest calculators as we speak and often they embed everything that can be embedded to boast “my app is that powerful” regardless of the actual tasks to be solved, which creates a paralyzing quagmire of abundance “what algo to choose”. So right now at the design stage it is important to lay down an understanding of what needs this or that option is for. That is, if XXH3_64 outperforms XXH64 and it surely does, then let’s endorse the former as a new default and gradually eliminate the latter from circulation before it’s not spread that much in the wild.

@Cyan4973
Copy link
Owner

Cyan4973 commented May 5, 2021

if XXH3_64 outperforms XXH64 and it surely does, then let’s endorse the former as a new default and gradually eliminate the latter from circulation before it’s not spread that much in the wild.

XXH64 has been released long ago (circa ~2014), and it's already in widespread circulation.
That's also why it needs to be protected,
and XXH3_64bits() is the new guy that needs to pay attention to its potential disruption of existing XXH64 ecosystem.

That being said, we have in this thread a few suggestions on how it could be done while reducing sensibly risks of confusion.

@konsolebox
Copy link

I just tried to modify xxhsum so I could produce test vectors for XXH3_64bits. There are some parts of it that seem to be too difficult to hack already because of its reliance on digest length. Maybe it's time to introduce a new tool with more properly designed options and overhauled code, and then perhaps name it xxhsum2 or xxh.

switch (hash_len)
{
case 8:

Also maybe it's time to make naming of the new algorithms more formal (and strict) in all parts of the code and all discussions to lessen the confusion. Like maybe always call XXH3 as XXH3_64bits (or xxh3-64), and XXH128 as XXH_128bits (or xxh3-128).

@konsolebox
Copy link

Just in case anyone's interested, I created a patch to make XXH3_64bits work in xxhsum: https://github.com/konsolebox/digest-xxhash-ruby/blob/master/test/xxhsum.c.c0e86bc0.diff

This also allows secrets to be entered.

@Cyan4973
Copy link
Owner

-H3 supported in latest dev branch update

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants