
WIP: Collisionless containers (DO NOT MERGE) #217

Conversation

NorfairKing

No description provided.

showString "fromList " . shows (toList m)
-- instance (Show k, Show v) => Show (HashMap k v) where
-- showsPrec d m = showParen (d > 10) $
-- showString "fromList " . shows (toList m)
Collaborator


This instance must be restored!
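
For reference, un-commenting the hunk above restores the instance in question:

    instance (Show k, Show v) => Show (HashMap k v) where
        showsPrec d m = showParen (d > 10) $
          showString "fromList " . shows (toList m)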

denial of service exploit. However, we can use the fact that `hashable` supports
salted hashing by default to alleviate at least some of this problem. In the
current version, insertion time is not `O(number of collisions)` but `O(number
of consequtive levels of collision)`. While this is not better in case all the
Contributor


s/consequtive/consecutive/
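
To make the quoted description concrete, here is a minimal standalone sketch of the salted-nesting idea, using hashWithSalt from hashable. It is an illustration only, not the PR's actual implementation (which works inside the HAMT of Data.HashMap); the names Salted, Entry, emptyS, insertS and lookupS are hypothetical. When two distinct keys land in the same bucket, both entries are pushed into a nested map built with the next salt, so the cost of an insert is bounded by the number of consecutive levels of collision rather than by the total number of colliding keys.

    import Data.Hashable (Hashable, hashWithSalt)
    import qualified Data.IntMap.Strict as IM

    -- One level of the structure: the salt used at this level, plus buckets
    -- indexed by hashWithSalt salt key.
    data Salted k v = Salted Int (IM.IntMap (Entry k v))

    data Entry k v
      = Single k v          -- no collision in this bucket
      | Deeper (Salted k v) -- colliding entries, stored one level down

    emptyS :: Salted k v
    emptyS = Salted 0 IM.empty

    insertS :: (Eq k, Hashable k) => k -> v -> Salted k v -> Salted k v
    insertS k v (Salted salt buckets) =
        Salted salt (IM.alter step (hashWithSalt salt k) buckets)
      where
        step Nothing = Just (Single k v)
        step (Just (Single k' v'))
          | k' == k   = Just (Single k v)   -- same key: overwrite
          | otherwise =                     -- genuine collision: push both keys
              Just (Deeper (insertS k v     -- into a fresh level with a new salt
                              (insertS k' v' (Salted (salt + 1) IM.empty))))
        step (Just (Deeper inner)) = Just (Deeper (insertS k v inner))

    lookupS :: (Eq k, Hashable k) => k -> Salted k v -> Maybe v
    lookupS k (Salted salt buckets) =
        case IM.lookup (hashWithSalt salt k) buckets of
          Nothing             -> Nothing
          Just (Single k' v') -> if k == k' then Just v' else Nothing
          Just (Deeper inner) -> lookupS k inner

Note that if two keys collide for every salt, the "otherwise" branch above recurses without bound; that failure mode comes up again near the end of this thread.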

@sjakobi
Member

sjakobi commented Jun 19, 2020

Hi @NorfairKing!

Since @emilypi and I have joined @treeowl and @tibbe as maintainers of this package, we've been discussing what to do about the HashDoS potential and what to make of this PR.

I currently think it's very intriguing; others are more sceptical.

So far the focus of this package and hashable has been very much on performance. hashable even explicitly recommends against using HashMaps for untrusted input:

http://hackage.haskell.org/package/hashable-1.3.0.0/docs/Data-Hashable.html#g:1

But as you are fully aware, some users still use u-c (unordered-containers) with untrusted data.

So if your approach can sufficiently reduce the HashDoS potential without reducing u-c's performance too much and without introducing new pitfalls or security holes, I believe it has a good chance of being merged. After all, a lot of users could benefit!

In any case, we have lots of questions. I'll start with my own:

  1. Is this approach your own invention? Has it been tested in other hash map implementations or are there any (reviewed) publications about it?
  2. Do you have a practical exploit that is fixed by this PR? If so, please share it with us maintainers privately.
  3. How does this PR affect performance? We'll need benchmark results for this, both for "common" inputs, and inputs that could be used for a HashDoS. The most interesting operations to check are probably inserts and lookups. (For comparing the results, I recommend criterion-compare; see the benchmark sketch after this list.)
  4. Is this PR "correct" in that it doesn't introduce any new bugs? Any tests that still need to be added?
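
A minimal criterion sketch for the kind of comparison described in question 3, illustration only: commonKeys, collidingKeys and buildMap are hypothetical placeholders, and a real HashDoS benchmark would substitute adversarially chosen colliding keys. Running this against both the old and the new implementation and diffing the reports (e.g. with criterion-compare) is the comparison asked for above.

    import Criterion.Main
    import qualified Data.HashMap.Strict as HM

    commonKeys :: [Int]
    commonKeys = [1 .. 10000]

    -- Placeholder: replace with keys engineered to collide under the hash.
    collidingKeys :: [Int]
    collidingKeys = [1 .. 10000]

    buildMap :: [Int] -> HM.HashMap Int ()
    buildMap = foldr (\k m -> HM.insert k () m) HM.empty

    main :: IO ()
    main = defaultMain
      [ bgroup "insert"
          [ bench "common"    $ whnf buildMap commonKeys
          , bench "colliding" $ whnf buildMap collidingKeys
          ]
      , env (pure (buildMap commonKeys)) $ \m ->
          bench "lookup/common" $
            whnf (\ks -> length (filter (`HM.member` m) ks)) commonKeys
      ]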

@sjakobi
Member

sjakobi commented Jun 19, 2020

@treeowl There are a lot of open "conversations" resulting from your previous reviews. Could you possibly hide the ones that have been resolved by clicking "Resolve conversation"?

@NorfairKing
Author

NorfairKing commented Jun 19, 2020

  • Is this approach your own invention? Has it been tested in other hash map implementations or are there any (reviewed) publications about it?

The approach is new-ish. See https://stackoverflow.com/questions/53495133/is-this-approach-to-dealing-with-hash-collisions-new-unique

  • Do you have a practical exploit that is fixed by this PR? If so, please share it with us maintainers privately.

Yes, I have >30K collisions in a ~1MiB JSON object that can keep a JSON-parsing server busy for minutes. I've already shared this with tibbe and treeowl.

  • How does this PR affect performance? We'll need benchmark results for this, both for "common" inputs, and inputs that could be used for a HashDoS. The most interesting operations to check are probably inserts and lookups. (For comparing the results, I recommend criterion-compare)

We ran the benchmarks as well and saw no difference. That makes sense, because this implementation only differs when there are (many) collisions. If I remember correctly, the results are in the package that I sent to treeowl and tibbe.

  • Is this PR "correct" in that it doesn't introduce any new bugs? Any tests that still need to be added?

Well, the PR needs to be rebased onto something more recent, but all the tests still passed when I last did that.

@sjakobi
Member

sjakobi commented Jun 20, 2020

Many thanks for the quick response, @NorfairKing! :)

For now, I'd just like to let you know that our internal discussion is progressing further, and I think we'll be able to give you more feedback within the next few weeks.

@NorfairKing
Author

Many thanks for the quick response, @NorfairKing! :)

For now, I'd just like to let you know that our internal discussion is progressing further, and I think we'll be able to give you more feedback within the next few weeks.

Thanks for picking this up. I'm curious to hear more, but I'm no longer being paid to work on this, so I won't be able to justify spending a lot of time on it.

@sjakobi
Member

sjakobi commented Jun 20, 2020

Out of curiosity: Have you ever considered using hashmap instead, @NorfairKing? Since it uses Data.Map to store hash collisions, it should be more resistant to HashDoS, in my understanding.

hashmap is currently deprecated of course, but it could be revamped.
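
A rough sketch of why that design helps, for illustration only (this is not the actual hashmap package code; BucketedMap, emptyB, insertB and lookupB are hypothetical names): each bucket keeps its entries in an ordered Map, so even a flood of keys with identical hashes costs O(log n) per operation instead of a linear scan, at the price of an Ord constraint on keys.

    import Data.Hashable (Hashable, hash)
    import qualified Data.IntMap.Strict as IM
    import qualified Data.Map.Strict as M

    -- Buckets indexed by the key's hash; collisions inside a bucket are
    -- kept in an ordered Map rather than a list.
    newtype BucketedMap k v = BucketedMap (IM.IntMap (M.Map k v))

    emptyB :: BucketedMap k v
    emptyB = BucketedMap IM.empty

    insertB :: (Hashable k, Ord k) => k -> v -> BucketedMap k v -> BucketedMap k v
    insertB k v (BucketedMap m) =
      -- M.union is left-biased, so re-inserting a key takes the new value.
      BucketedMap (IM.insertWith M.union (hash k) (M.singleton k v) m)

    lookupB :: (Hashable k, Ord k) => k -> BucketedMap k v -> Maybe v
    lookupB k (BucketedMap m) = IM.lookup (hash k) m >>= M.lookup k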

@NorfairKing
Author

@sjakobi I haven't. We did this work because aeson uses unordered-containers instead of containers, which makes every Haskell JSON server vulnerable. So unless you're able to convince the aeson maintainers to break their API and use another map type, that doesn't matter to me.
The collisionless containers in this PR fix the problem without breaking the API or sacrificing performance.

@TomMD
Contributor

TomMD commented Sep 11, 2021

@NorfairKing Is there a reason this is still WIP and DO NOT MERGE?

@sjakobi The "internal discussion" seems to have gotten lost in the shuffle a year back. A timely fix would be good at this point. Can we merge?

@treeowl
Collaborator

treeowl commented Sep 11, 2021 via email

@TomMD
Contributor

TomMD commented Sep 11, 2021

Johan Tibell has said no

Is there a link to that discussion so we can hear the objections first hand? As an open source project it would be traditional to have the objection available in the code review.

@treeowl
Collaborator

treeowl commented Sep 11, 2021 via email

@NorfairKing
Author

Is there a link to that discussion so we can hear the objections first hand? As an open source project it would be traditional to have the objection available in the code review.

This was a private conversation as part of the responsible disclosure procedure.

@Boarders

@NorfairKing You suggested this doesn't change performance; where are the benchmarks to show that?

@NorfairKing
Author

NorfairKing commented Sep 12, 2021

@NorfairKing You suggested this doesn't change performance; where are the benchmarks to show that?

Yes, they were included in the original communication, but you can also just try to run the benchmarks locally to see.

Like I said:

We ran the benchmarks as well and saw no difference. That makes sense, because this implementation only differs when there are (many) collisions. If I remember correctly, the results are in the package that I sent to treeowl and tibbe.

@Bodigrim
Contributor

@NorfairKing thanks for your work, the investigation, and this PR. Nicely done!

@TomMD it seems a bit unfair to put pressure on the maintainers of u-c: they are responsible neither for the choice of hash function upstream, nor for downstream packages that disregarded the security warnings in the documentation. Performance of u-c is a difficult business, and validating such a change to the underlying data structure is no small feat, especially given that this is a novel approach.

I guess that from a pragmatic perspective it would be much simpler to replace Array with Map, but this imposes an Ord constraint. That is likely a non-issue for mainstream languages, so they never explored Ord-free approaches such as the proposed sequence of nested HashMaps, which I personally find very appealing and beautiful.

@sjakobi
Member

sjakobi commented Sep 13, 2021

@NorfairKing Thanks for publishing this great write-up, and apologies for responding so slowly to your PR!

I'm happy that, now that the vulnerability is properly published, we can discuss it more easily in public.

I have opened #319, so we can discuss possible responses to the vulnerability separately from the proposed fix that you have provided in this PR. I invite anyone interested to participate in that discussion!

@sjakobi
Member

sjakobi commented Sep 14, 2021

@sjakobi The "internal discussion" seems to have gotten lost in the shuffle a year back. A timely fix would be good at this point. Can we merge?

Unfortunately we (at least @tibbe) are not convinced that this PR is of much help with preventing collision attacks. Citing #319 (comment):

@tibbe's assessment was that it would still require a strong hash function and a random seed to be reasonably secure. With many weaker hash functions, it is possible to generate seed-independent collisions, see e.g. this blog post. By this assessment, #217 "adds" little security of its own.

This is no final judgment on the PR, but I don't currently expect that it will be part of a "timely fix" in u-c, nor do I know of a way to provide a quick fix within u-c itself at all.

If you're looking for potential mitigating measures, see #319 (comment) and #319 (comment).

@treeowl
Collaborator

treeowl commented Sep 14, 2021

I agree; this is a fair bit of extra complexity and maintenance burden for something we don't actually know for sure is helpful.

@phadej
Contributor

phadej commented Oct 2, 2021

I agree; this is a fair bit of extra complexity and maintenance burden for something we don't actually know for sure is helpful.

Hopefully this is a helpful argument:

https://medium.com/@robertgrosse/generating-64-bit-hash-collisions-to-dos-python-5b21404a5306

Even though Robert (the author of that blog post) doesn't share the universal collision sets (i.e. sets of keys that collide for all salts), he goes a long way towards showing that they are feasible to construct. FNV-1 is simply a weak hash. So resolving the collisions of a hash map with another hash map is potentially very bad, as that chain simply will not terminate: it's enough to have just two keys which collide universally! The collision resolution method should be vastly different from reusing the (weak) hash function.
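
Spelled out against the kind of nested-salt sketch shown earlier in the thread, the failure mode looks like this (assuming, hypothetically, a pair of keys that collide for every salt):

    -- Suppose k1 /= k2 but hashWithSalt s k1 == hashWithSalt s k2 for every
    -- salt s (a seed-independent collision pair for a weak hash such as FNV-1).
    -- Then inserting k2 after k1 clashes at every level of the nested scheme:
    --
    --   level 0, salt s0: bucket already holds k1 -> push both one level down
    --   level 1, salt s1: they collide again      -> push both one level down
    --   level 2, salt s2: they collide again      -> push both one level down
    --   ...
    --
    -- The recursion that creates new levels never bottoms out, so a single
    -- universally colliding pair of keys is enough to hang insertion.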

@NorfairKing
Author

Alright, that convinced me. Closing this for good.

@NorfairKing NorfairKing closed this Oct 2, 2021
@sjakobi
Member

sjakobi commented Oct 3, 2021

So resolving the collisions of a hash map with another hash map is potentially very bad, as that chain simply will not terminate: it's enough to have just two keys which collide universally!

That's a very good insight, @phadej. I hadn't realized this earlier.
