
Possible DOS due to hash collision #864

Closed
TomMD opened this issue Sep 11, 2021 · 48 comments

Comments

@TomMD
Contributor

TomMD commented Sep 11, 2021

As detailed in a blog post, Aeson is susceptible to DoS attacks when given maliciously crafted JSON. This is likely a tired conversation for many developers, but with the public posting today it seems good to have a public issue tracking the discussion and resolution.

@NorfairKing

Author here: Happy to help with this.

@phadej
Collaborator

phadej commented Sep 11, 2021

Indeed, it's an old and known problem. See e.g. https://oleg.fi/gists/posts/2021-05-19-dont-default-to-hashmap.html

Yet I'm not keen on changing Value; rather, I would skip Value in the decoding pipeline entirely. That would allow people to e.g. process key-value pairs in order (if that matters for them), deal with duplicate keys (again, if it matters), possibly work with the as-written values of number literals, etc.

@NorfairKing

Indeed, it's an old and known problem. See e.g. https://oleg.fi/gists/posts/2021-05-19-dont-default-to-hashmap.html

With all due respect, that makes this worse, not better.

Yet I'm not keen on changing Value; rather, I would skip Value in the decoding pipeline entirely.

That's a much bigger breaking change and would make it an entirely new library.
We have played around with this idea by making the equivalent of optparse-applicative, but for JSON parsing without an intermediate type.
That approach has different tradeoffs and can be cool, but it doesn't solve the problem at hand in aeson right now.

@TomMD
Contributor Author

TomMD commented Sep 11, 2021

@NorfairKing In the post you mentioned a rejected PR; could you link it here now that the issue is being discussed publicly?

@phadej Among the proposed paths forward, is there a short-term solution you'd be in favor of? This issue deserves a short-term fix, even if it isn't the desired end state.

@NorfairKing

@NorfairKing In the post you mentioned a rejected PR; could you link it here now that the issue is being discussed publicly?

It's linked in the post but here's the link again: haskell-unordered-containers/unordered-containers#217

@phadej
Collaborator

phadej commented Sep 11, 2021

@TomMD:

  • Introduce a Data.Aeson.Object module with an opaque Object newtype (fromList, toList, member); maybe TextMap v would be simpler, to avoid circular dependencies.
  • Add a manual flag to switch between Map and HashMap; the internals shouldn't matter if they are hidden behind an opaque TextMap.

EDIT: bonus points for providing a HashMap SHA256HashedText variant too; maybe the 64 lower bits would be enough to get collision resistance. That would be interesting benchmark-wise.
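
For illustration, here is a minimal sketch of what such an opaque wrapper could look like (the module name, the TextMap type, and the ORDERED_KEYMAP CPP flag are placeholders, not actual aeson API; as the rest of the thread shows, aeson 2 eventually shipped this idea as Data.Aeson.KeyMap behind an ordered-keymap cabal flag):

{-# LANGUAGE CPP #-}
module Data.Aeson.TextMap (TextMap, fromList, toList, member) where

import Data.Text (Text)

#ifdef ORDERED_KEYMAP
import qualified Data.Map.Strict as M

-- Ordered backend: no hashing at all, so no hash flooding, at the cost of O(log n) operations.
newtype TextMap v = TextMap (M.Map Text v)

fromList :: [(Text, v)] -> TextMap v
fromList = TextMap . M.fromList

toList :: TextMap v -> [(Text, v)]
toList (TextMap m) = M.toList m

member :: Text -> TextMap v -> Bool
member k (TextMap m) = M.member k m
#else
import qualified Data.HashMap.Strict as HM

-- Hashed backend: fast on average, but vulnerable to crafted collisions.
newtype TextMap v = TextMap (HM.HashMap Text v)

fromList :: [(Text, v)] -> TextMap v
fromList = TextMap . HM.fromList

toList :: TextMap v -> [(Text, v)]
toList (TextMap m) = HM.toList m

member :: Text -> TextMap v -> Bool
member k (TextMap m) = HM.member k m
#endif

Because the constructor is not exported, callers only ever see fromList/toList/member, so the flag can flip the representation without breaking downstream code.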

@lehins

lehins commented Sep 11, 2021

Just a suggestion, but maybe Blake2b_64/Blake2s_32 is a good solution here. Blake2b_256 is much faster than SHA-256, but maybe doing Blake2b_64 is even faster than just cropping a 256-bit hash to 64 bits.

I suspect that cooking up collisions for Blake2 (even with a small digest size) would be prohibitively expensive for the purposes of a DoS attack.
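
To make that concrete, here is a hedged sketch using cryptonite's Blake2b_256 and keeping only the first eight bytes of the digest (blakeHash64 is a made-up name, and this crops a 256-bit digest rather than computing a 64-bit one directly; it is not code from aeson or unordered-containers):

import Data.Bits (shiftL, (.|.))
import Data.Word (Word64)
import Data.Text (Text)
import qualified Data.Text.Encoding as TE
import qualified Data.ByteString as BS
import qualified Data.ByteArray as BA
import Crypto.Hash (Digest, hash)
import Crypto.Hash.Algorithms (Blake2b_256)

-- Hash a Text key with Blake2b-256 and fold the first eight bytes of the
-- digest into a Word64, so an attacker would need (truncated) Blake2
-- collisions to trigger worst-case behaviour.
blakeHash64 :: Text -> Word64
blakeHash64 t = BS.foldl' step 0 (BS.take 8 (BA.convert digest))
  where
    digest :: Digest Blake2b_256
    digest = hash (TE.encodeUtf8 t)
    step acc b = (acc `shiftL` 8) .|. fromIntegral b

Whether such a hash is fast enough compared to the default one is exactly the kind of thing that would need benchmarking.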

@konsumlamm

Another solution could be using some map data structure specialized to Text keys (something like a trie, for example text-trie (although I have no idea how performant that implementation is) or a ternary search tree). This would of course be a breaking change, but so would using Data.Map (and it could be hidden behind a newtype, as suggested by @phadej).

This would have the advantage (apart from not being susceptible to hash DoS attacks obviously) that it doesn't (potentially) cause lots of string comparisons (this could be a problem with Data.Map if lots of the keys share the same prefix). Disclaimer: I didn't run any benchmarks, so I'm not sure if this would actually be faster in practice than using Data.Map (which has been extensively optimized after all), especially since text-2.0 will apparently use a more efficient compare implementation (see here).
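
To make the trie idea concrete, here is a hedged, minimal character-trie sketch (illustrative only; it is not the text-trie API and works over String rather than Text):

import qualified Data.Map.Strict as M

-- Each node holds an optional value plus children keyed by the next character.
data Trie v = Trie (Maybe v) (M.Map Char (Trie v))

empty :: Trie v
empty = Trie Nothing M.empty

insert :: String -> v -> Trie v -> Trie v
insert []     v (Trie _  cs) = Trie (Just v) cs
insert (c:cs) v (Trie mv m)  =
  Trie mv (M.insert c (insert cs v (M.findWithDefault empty c m)) m)

-- Lookup walks the key once: no hashing, so no hash flooding, and shared
-- prefixes are only traversed once.
lookupTrie :: String -> Trie v -> Maybe v
lookupTrie []     (Trie mv _) = mv
lookupTrie (c:cs) (Trie _  m) = lookupTrie cs =<< M.lookup c m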

@phadej
Collaborator

phadej commented Sep 11, 2021

@konsumlamm my solution is to make Object opaque. Then people are free to experiment with different representations: secure, fast, both, or neither.

@phadej
Collaborator

phadej commented Sep 11, 2021

More concretely, something like: https://github.com/haskell/aeson/compare/textmap

The rest is just H.foo -> TM.foo all over the place. (And something to make FromJSONKeyCoerce work in happy situations)

I hope someone will complete that and make a PR.

@Boarders
Contributor

I'm going to try to push that through, @phadej.

@joelmccracken

@Boarders @phadej I was just considering that too. Please flag me if you could use any help.

@Boarders
Contributor

@phadej What did you have in mind for the Lift instance? Is there an easy way to write it?

@TravisCardwell

That would allow people to e.g. process key-value pairs in order (if that matters for them), ...

I often work with YAML files where the order of object properties is important. Humans edit YAML files, and a logical order of object properties maximizes usability. I do not know of a way to preserve that order in Haskell using existing libraries, and this issue unfortunately sometimes results in having to use a different language.

This comment is in agreement with the current direction, but I am adding it as a concrete example of a real-world issue that Oleg's suggestion would resolve, in addition to the hash collision issue.

@phadej
Collaborator

phadej commented Sep 13, 2021

@Boarders, thanks for working on this!

@phadej What did you have in mind for the Lift instance? Is there an easy way to write it?

Look at the diff; probably something like:

lift (TextMap m) = [| TextMap (H.fromList . map (first pack) $ m') |]
  where
    m' = map (first unpack) . H.toList $ m

It would be good to first introduce TextMap, and only then fiddle with its implementation.

@piyush-kurur

Just a suggestion, but maybe Blake2b_64/Blake2s_32 is a good solution here. Blake2b_256 is much faster than SHA-256, but maybe doing Blake2b_64 is even faster than just cropping a 256-bit hash to 64 bits.

I suspect that cooking up collisions for Blake2 (even with a small digest size) would be prohibitively expensive for the purposes of a DoS attack.

This is probably relevant:

google/highwayhash#28

@brendanhay

brendanhay commented Sep 13, 2021

I realise the following probably isn't advisable because of (.:) and friends, but it'd be nicer (IMO) as the lowest common denominator (aka default) to use Vector (Text, Value) and avoid {Hash,}Map completely. It's more in line with the spec in that it doesn't specify how duplicate keys are handled, preserves the original ordering of the keys, and doesn't tie you to Ord or Hashable, etc.

That said, TextMap looks like a clear improvement over the status quo. 👍🏽

@TravisCardwell

it'd be nicer (IMO) as the lowest common denominator (aka default) to use Vector (Text, Value) and avoid {Hash,}Map completely. It's more in line with the spec in that it doesn't specify how duplicate keys are handled, preserves the original ordering of the keys, and doesn't tie you to Ord or Hashable, etc.

I completely agree with this, as the order issue has been a significant pain point for me with the current implementation. I brainstormed about Aeson Object Design in my blog. Here are my ideas in brief, which address the (.:) issue.

The type is named ObjectEntries to emphasize that it is a list (Vector):

type ObjectEntries = Vector (Text, Value)

For working with other types, an Object type class defines the required API:

class Object o where
  lookup :: Text -> o -> Maybe Value

(.:) :: (FromJSON a, Object o) => o -> Text -> Parser a

...

Instances can be provided for built-in types, and users are free to define instances for other types:

instance Object ObjectEntries where
  lookup k = fmap snd . Vector.find ((== k) . fst)

instance Object (HashMap Text Value) where
  lookup = HashMap.lookup

...

The "with" function is named withObjectEntries to emphasize that it uses the list (Vector) by default:

withObjectEntries
  :: String
  -> (ObjectEntries -> Parser a)
  -> Value
  -> Parser a

Helper functions can be provided to perform conversions to other types:

asHashMap
  :: (HashMap Text Value -> Parser a)
  -> ObjectEntries
  -> Parser a
asHashMap f = f . HashMap.fromList . Vector.toList

The example from the documentation would be written as follows:

instance FromJSON Coord where
  parseJSON = withObjectEntries "Coord" . asHashMap $ \v -> Coord
    <$> v .: "x"
    <*> v .: "y"

This allows the user to decide how to implement each instance. For example, a public API may prioritize security while an internal or administrative API may prioritize performance.

Generic instances could have a suitable default and provide a compile-time flag that allows users to select an alternative, much like the current fix.

@brendanhay

For my own codebases, I'm not convinced that (.:) introducing a linear scan by way of Vector.find would even be slower, at least since the deserialisation routines (for, say, a record) never perform repeated lookups of the same object key. As mentioned in the Z-Data code I linked above, you're probably paying more to materialise a {Hash,}Map than what the linear scan for a single key introduces? It would naturally require actual benchmarks to prove that I am not, in fact, a moron for claiming this.

@phadej
Collaborator

phadej commented Sep 14, 2021

Isn't this issue about a supposedly O(1) operation degenerating to O(n) (i.e. linear)...

@TravisCardwell

TravisCardwell commented Sep 15, 2021

Isn't this issue about a supposedly O(1) operation degenerating to O(n) (i.e. linear)...

Indeed. (The DoS vulnerability is in the construction of the HashMap. Without collisions, fromList is O(n log n), which is practically O(n) given the high log base. With collisions, however, each insertion becomes linear, resulting in quadratic performance overall.)

The current fix allows users to globally choose between Map and HashMap using a flag. My suggestion provides a way for users to choose which data structure to use for each instance, using the flag to set the default for automatic (Generic) instances.

In some cases, it would even be possible to use the length of the list (better than Vector for this kind of conversion) to mitigate attacks. For example, consider an API that uses 10 properties. If the length of the list is above a certain threshold (10 for a strict API), an error could be returned; otherwise, the HashMap could be constructed and used as usual. This would let users mitigate attacks while keeping the performance of HashMap, with only the additional cost of determining the length of the list (a one-time O(n)).
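
A hedged sketch of that threshold idea (parseBoundedObject and its limit are illustrative, not proposed aeson API):

import qualified Data.HashMap.Strict as HM
import Data.Text (Text)

-- Refuse to build the HashMap for oversized objects, so a collision attack
-- never reaches the quadratic fromList behaviour.
parseBoundedObject :: Int -> [(Text, v)] -> Either String (HM.HashMap Text v)
parseBoundedObject maxPairs pairs
  | length pairs > maxPairs = Left "object has more keys than this API accepts"
  | otherwise               = Right (HM.fromList pairs)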

Would delaying the construction of the (Hash)Map until parseJSON be problematic? If the goal is to minimize API change, the withObject function could do the conversion to HashMap and keep the same type, while users can use the constructor (or new API functions) to access the list/Vector when necessary, in order to use Map for example.

Thank you very much for your time and patience.

@enobayram

enobayram commented Sep 15, 2021

Just wanted to chime in that I agree it'll be great for the Haskell ecosystem if whatever solution comes out of this discussion lets us keep the order of JSON object properties. Regardless of how wrong it may be to rely on the order of the properties, many tools and standards still do it, and it's a nightmare to work with them using Haskell. It suddenly feels like the entire ecosystem is taken away from you (i.e. not only aeson itself, but all the downstream packages that expose a convenient interface over aeson). I find it very unfortunate that this opinionated treatment of object properties is baked so deeply into Haskell's de-facto standard package for working with JSON. Now that it seems like a breaking change is inevitable, this might be the best time to tackle this issue as well.

@TravisCardwell

TravisCardwell commented Sep 18, 2021

For my own codebases, I'm not convinced that (.:) introducing a linear scan by way of Vector.find would even be slower, at least since the deserialisation routines (for, say, a record) never perform repeated lookups of the same object key. As mentioned in the Z-Data code I linked above, you're probably paying more to materialise a {Hash,}Map than what the linear scan for a single key introduces? It would naturally require actual benchmarks to prove that I am not, in fact, a moron for claiming this.

I ended up running some benchmarks (blog entry, code).

On my laptop (UPDATED after feedback from Taylor):

  • Lists marginally outperformed Vector for objects with 1 pair.
  • Lists marginally outperformed Map for objects with 6 or fewer pairs.
  • Lists marginally outperformed HashMap for objects with 8 or fewer pairs.
  • Vector marginally outperformed HashMap for objects with 17 or fewer pairs.
  • Vector marginally outperformed Map for objects with 35 or fewer pairs.
  • Map marginally outperformed HashMap for objects with 13 or fewer pairs.

In the context of the current fix, using Map should not result in decreased performance when dealing with small objects.

@tfausak

tfausak commented Sep 18, 2021

@TravisCardwell: Did you run those benchmarks with stack script as this suggests? If so, your script was effectively interpreted with runghc rather than compiled with ghc and then run. The docs say that if you want to compile with ghc -O0 you can pass --compile and if you want to compile with ghc -O1 you can pass --optimize.

I wouldn't expect Vector to be slower than [], and I especially wouldn't expect looking up about 40 keys to take 200 microseconds. I wrote my own benchmarks and got these results (note this is a log-log plot):

[log-log plot of the lookup benchmark results]

Once you get to n=32, HashMaps are the fastest. InsOrd is slower by a constant factor since it has to track the order of the keys. After n=64, Map.Lazy is the next fastest with Map.Strict just behind. ZVector and Vector are quite a bit slower, and the List is slower still. Looking at smaller numbers, at n=16 the vectors are the fastest. At n=2 List edges out Vector, and at n=0 List is the fastest, which makes sense because it doesn't have any associated startup costs.

My takeaway is that you should basically never use List, Vector is pretty good until you get more than 32 elements, and Map is worse than HashMap but not by much.

@TravisCardwell

Did you run those benchmarks with stack script as this suggests? If so, your script was effectively interpreted with runghc rather than compiled with ghc and then run. The docs say that if you want to compile with ghc -O0 you can pass --compile and if you want to compile with ghc -O1 you can pass --optimize.

Indeed I did! Thank you very much for catching that! I have updated my code and blog entry to fix the mistake. I also updated the stats in my comment above to make sure people are not misled by the problematic stats.

Thanks for sharing your benchmark code. I look forward to using gauge in the future!

My takeaway is that you should basically never use List, Vector is pretty good until you get more than 32 elements, and Map is worse than HashMap but not by much.

That sounds good. On my (old and slow) laptop, HashMap overtakes Vector from 18 pairs onward, so the exact crossover is system-dependent.

@ketzacoatl

Indeed, it's an old and known problem.

@phadej, if it's an old and known problem, why hasn't it been resolved? Is it too difficult? Too time-consuming? Not enough interest? Not in agreement there is a problem?

@hasufell
Member

If anyone has an API-compatible fork that fixes this issue, please let me know. I'll replace all my uses of aeson with it, even if performance is 10 times worse.

@phadej
Collaborator

phadej commented Oct 9, 2021

aeson-2.0.0.0 with the ordered-keymap flag is released.

@phadej phadej closed this as completed Oct 9, 2021
@NorfairKing

NorfairKing commented Oct 9, 2021

@phadej, do I understand correctly that this vulnerability is still there by default?

aeson/aeson.cabal

Lines 62 to 67 in 4ad6b40

flag ordered-keymap
  description:
    Use ordered @Data.Map.Strict@ for KeyMap implementation.
  default: False
  manual: True

@phadej
Collaborator

phadej commented Oct 9, 2021

I changed the default of ordered-keymap in 2.0.1.0: https://hackage.haskell.org/package/aeson-2.0.1.0

Though there is no guarantee it won't be changed again: that's the point of it being internal. If it is important, please pin the flag explicitly in your cabal.project / stack.yaml / ...

@ketzacoatl

@phadej, for us laymen, can you explain how this is resolved and what level of certainty we can place in the solution?

@phadej
Collaborator

phadej commented Oct 10, 2021

@ketzacoatl upgrade to aeson-2.0.1.0 or later; the underlying object implementation then uses Data.Map.

EDIT: i.e., the plan from #864 (comment) was implemented, but with different module/type names.

@ketzacoatl

@phadej, thank you for clarifying!

Are there tests that exist or could be added to this package as a regression test or otherwise help us confirm this is resolved?

@Lysxia
Collaborator

Lysxia commented Oct 11, 2021

@ketzacoatl What's a good way to test this? We could generate colliding inputs and make sure that the resulting map gets constructed in reasonable time, but that only prevents reverting back to unordered-containers specifically. Doesn't that seem overly narrow?

If the lesson learned is "don't use vulnerable hashmaps", I think enough noise was made about this issue to make maintainers vigilant for years to come, at least as far as aeson is concerned. I'm unsure about this, though: I could imagine abusing this line of thinking to justify not testing for any bugs, but at the same time I'm not convinced this issue in particular is one that can easily be tested against.

@phadej
Collaborator

phadej commented Oct 11, 2021

Testing performance issues is tricky. I'll be happy to learn how to make non-flaky performance tests.

@Lysxia
Collaborator

Lysxia commented Oct 11, 2021

When it's about the difference between O(n) and O(n^2), you can set n to a big number like n=10k, and then it's about detecting a difference on the order of a factor of 10k. That doesn't require much sophistication, does it? Make a test that runs in 0.1s without collisions but 100s with collisions, and set the cut-off at 1s.
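
As a hedged sketch of how small that test could be (it assumes some way of producing a collidingJson input whose keys all collide under the hash being tested, which is the genuinely hard part; the names here are made up):

import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Data.Aeson (Value, decode)
import Data.Maybe (isJust)
import qualified Data.ByteString.Lazy as BL
import System.Timeout (timeout)

-- With an O(n log n) object representation this finishes almost instantly;
-- with a quadratic blow-up it hits the one-second cut-off and returns False.
noHashFlooding :: BL.ByteString -> IO Bool
noHashFlooding collidingJson = do
  r <- timeout 1000000 $  -- one second, in microseconds
         evaluate (force (decode collidingJson :: Maybe Value))
  pure (isJust r)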

@phadej
Collaborator

phadej commented Oct 11, 2021

For Data.Map it's O(n log n) always (or it should be). By making one measurement you don't know anything. We'd need to construct a series and fit the n log n curve with some confidence interval (I forgot all the statistics I knew to figure that out). (CPU) time measurements are not very precise (especially in cloud CIs, where the machines are very noisy). It would be better to count instructions, or something else stable. How? perf on Linux?

Doable, but I don't have time to spend on making that work reliably. Such a complexity-testing tool would be nice, though.

@NorfairKing

@phadej, this might fit your use case: https://github.com/nh2/haskell-cpu-instruction-counter

@phadej
Collaborator

phadej commented Oct 11, 2021

Sadly, it's not on Hackage: nh2/haskell-cpu-instruction-counter#8

@Lysxia
Collaborator

Lysxia commented Oct 11, 2021

By making one measurement you don't know anything.

I just explained why that's not the case. You do not need to actually measure the curve, just find one point where the discrepancy is large enough for all reasonable machines where one might run the test:

Make a test that runs in 0.1s without collisions but 100s with collisions, and set the cut-off at 1s.

How do you expect 0.1s on one machine to ever take more than 1s on another machine? (And these numbers are just for the sake of example; if a 10x discrepancy is actually not enough slack you can easily turn it up to 100x or 1000x.) You might get a false positive on a very busy machine, but then you're likely to be in much more trouble already anyway.

@phadej
Collaborator

phadej commented Oct 11, 2021

Why not a curve? For the test to pass, the curve should be e.g. n log n with some confidence. It is easy to make a test which reports true negatives, but we want a test which doesn't report false positives, i.e. one that doesn't say everything is OK when it isn't.

@Bodigrim
Contributor

You can test two points: if processing ten times as much data increases the time 50x, then it's highly unlikely to be O(n log n) and more likely O(n^2). One can use measureCpuTime to measure run time, and there is probably something similar available from criterion-measurement.
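
A hedged sketch of that two-point check, using base's getCPUTime rather than measureCpuTime (function names are illustrative):

import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Time a fully forced computation in seconds (getCPUTime reports picoseconds).
timed :: NFData a => a -> IO Double
timed x = do
  t0 <- getCPUTime
  _  <- evaluate (force x)
  t1 <- getCPUTime
  pure (fromIntegral (t1 - t0) / 1e12)

-- Pass when ten times the input costs well under 100x the time; a quadratic
-- implementation would be roughly 100x slower on the larger input.
looksSubquadratic :: NFData b => (Int -> b) -> Int -> IO Bool
looksSubquadratic run n = do
  small <- timed (run n)
  big   <- timed (run (10 * n))
  pure (big < 50 * small)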

@ulysses4ever

I think @phadej wants to test a stronger hypothesis, to find out which asymptotics we are in: linearithmic or quadratic. But @Lysxia makes a (perfectly valid, IMO) point that a much easier test -- catching the difference between the two asymptotics (e.g. with ordered-keymap set explicitly vs. the implicit default) -- perfectly suffices: for large enough data you will have a large enough difference to cover any possible noise on CI machines.

@phadej
Collaborator

phadej commented Oct 12, 2021

@ulysses4ever but we have only one implementation at a time to test; there is nothing to compare it to.

Maybe I'm failing to explain myself, so I'd welcome a patch with your suggestions, so that we can discuss something concrete.

@Lysxia
Collaborator

Lysxia commented Oct 13, 2021

but we have only one implementation at a time to test; there is nothing to compare it to.

I could be misunderstanding you, but like all regression tests, I thought the point was to distinguish the implementation before and the implementation after the bug was fixed.

@NorfairKing Your opinion would be valuable on this. Is a regression test worth adding to this library, considering that it could realistically only catch a regression to a specific hashtable implementation with a specific hashing function, which is likely to end up being obsolete if people start to act on the issue in the unordered-containers library (haskell-unordered-containers/unordered-containers#319)?

@NorfairKing

@NorfairKing Your opinion would be valuable on this. Is a regression test worth adding to this library, [...] ?

I don't think so.
The best you could do against a resource exhaustion attack is to create a situation in which resources would be exhausted and then assert that resources are not exhausted.
We could do that in unordered-containers using a type for which collisions are common, but in aeson that involves using text-based collisions and I don't think it's worth the effort and maintenance work TBH.

@juhp
Contributor

juhp commented Oct 15, 2021

(I think it would be interesting to see any benchmark comparison(s) of 1.5 and 2.0 anyway)

@phadej
Collaborator

phadej commented Oct 15, 2021

@juhp

(I think it would be interesting to see any benchmark comparison(s) of 1.5 and 2.0 anyway)

See #883
