
Hashtable performance regression #674

Closed
andreistefanescu opened this issue Apr 13, 2020 · 16 comments

@andreistefanescu
Contributor

andreistefanescu commented Apr 13, 2020

The change in GaloisInc/parameterized-utils#73 results in a performance regression that makes the BIKE proofs run several times slower. To reproduce, use 57d4a49 with GaloisInc/parameterized-utils@43cb6d2. Use BIKE R2 proofs from GaloisInc/s2n@caa6bea. A possible place to start would be the use of IdxCache in https://github.com/GaloisInc/crucible/blob/5c46b3ae77ddb2e95d2005f8cca675163a55fa1b/crucible-saw/src/Lang/Crucible/Backend/SAWCore.hs#L779.

@andreistefanescu
Contributor Author

cc @atomb

@brianhuffman
Contributor

I did some timing measurements on the bike_r2 proofs with the original Data.HashTable.ST.Cuckoo hash table implementation, and then with the recent change to Data.HashTable.ST.Basic. I found that most of the slowdown was in the verifications of functions find_err1 and find_err2, so I did some additional timings of find_err1 by itself.

Proving find_err1 with Data.HashTable.ST.Cuckoo:

real	2m43.682s
user	2m36.929s
sys	0m6.169s

Proving find_err1 with Data.HashTable.ST.Basic:

real	20m19.691s
user	20m1.885s
sys	0m12.511s

(Profiling shows that the hash table operations went from about 1% of runtime with cuckoo up to more than 70% of total runtime with basic)

I also tried switching out the definition of the IdxCache type in what4 to use an IORef of a Data.Parameterized.Map (basically a clone of Data.Map but with fancier types) instead of a hash table. Here's the result:

real	2m33.826s
user	2m28.080s
sys	0m5.278s

Yes, that's right: It's actually faster than the old cuckoo-based hash table. (Although, to be honest, the difference in the measurements is small enough that it's probably not significant.) If parameterized-utils provided an IntMap-based table type, that would be even faster.
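For reference, the IORef-of-map idea fits in a few lines. This is a simplified, monomorphic stand-in with hypothetical names; the real IdxCache in what4 uses Data.Parameterized.Map keyed by Nonce so each cached entry keeps its term's type index:

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Hypothetical, monomorphic stand-in for what4's IdxCache: an IORef
-- holding an ordered map from nonce values to cached results.
newtype IdxCache a = IdxCache (IORef (Map.Map Word64 a))

newIdxCache :: IO (IdxCache a)
newIdxCache = IdxCache <$> newIORef Map.empty

-- Return the cached value for a nonce, computing and storing it on a miss.
idxCacheEval :: IdxCache a -> Word64 -> IO a -> IO a
idxCacheEval (IdxCache ref) n act = do
  m <- readIORef ref
  case Map.lookup n m of
    Just v  -> pure v
    Nothing -> do
      v <- act
      modifyIORef' ref (Map.insert n v)
      pure v

main :: IO ()
main = do
  cache <- newIdxCache
  runs  <- newIORef (0 :: Int)
  let compute = modifyIORef' runs (+ 1) >> pure "result"
  _ <- idxCacheEval cache 42 compute
  v <- idxCacheEval cache 42 compute  -- second call hits the cache
  n <- readIORef runs
  print (v, n)  -- ("result",1)
```

Lookups are O(log n) rather than the hash table's nominal O(1), but for this workload the constant factors evidently dominate.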

@joehendrix
Contributor

I don't quite see how to reconcile the profiling data: Cuckoo accounted for only 1% of the runtime, yet switching to Data.Parameterized.Map still yields more than a 1% speedup. In any case, this seems like good news.
I'm pretty sure I tested Cuckoo to make sure it had performance advantages over Data.Map, but I don't recall the exact test I used.

@brianhuffman
Contributor

The difference in the timings is probably due to thermal throttling; I expect the actual runtime is almost exactly the same. Another run with Data.Parameterized.Map took 2:44, so there is quite a bit of variability between runs (at least on my machine).

@robdockins
Contributor

Wow, that's intense. I'm sort of amazed the basic hashtable performs so badly.

Internally, Nonces use Word64, which doesn't quite fit into an IntMap's Int keys. Sigh.

@brianhuffman
Contributor

We can fit nonces into an IntMap. Word64 is 64 bits. Int is 64 bits. What's the problem?

@robdockins
Contributor

Isn't that platform-specific? If we can reliably make that coercion work, I'm happy. I guess we could just switch Nonce to use Int as well, but I like getting the compiler to promise we get as many bits as we want.

@brianhuffman
Contributor

Right, using IntMap would only work if Int is actually 64 bits, so we'd need to either make the code explicitly non-portable or else use some CPP trickery to select between IntMap and doubly-nested IntMap (for the 32-bit case).

It would be nice if someone would provide an explicitly 64-bit version of the IntMap library.
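One way to make the 64-bit assumption explicit is to hide the IntMap behind a Word64-keyed interface and check the word size at startup. A hypothetical sketch (not part of parameterized-utils), relying on fromIntegral :: Word64 -> Int being a bit-for-bit round trip:

```haskell
import Data.Bits (finiteBitSize)
import qualified Data.IntMap.Strict as IntMap
import Data.Word (Word64)

-- Hypothetical Word64-keyed map backed by IntMap. Keys round-trip
-- through fromIntegral, which preserves all bits only when Int is
-- (at least) 64 bits wide.
newtype Word64Map a = Word64Map (IntMap.IntMap a)

emptyW :: Word64Map a
emptyW = Word64Map IntMap.empty

insertW :: Word64 -> a -> Word64Map a -> Word64Map a
insertW k v (Word64Map m) = Word64Map (IntMap.insert (fromIntegral k) v m)

lookupW :: Word64 -> Word64Map a -> Maybe a
lookupW k (Word64Map m) = IntMap.lookup (fromIntegral k) m

main :: IO ()
main = do
  -- The scheme is only sound when Int carries all 64 bits; a 32-bit
  -- GHC would truncate keys and allow collisions.
  print (finiteBitSize (0 :: Int) == 64)
  let m = insertW maxBound "hi" (insertW 0 "lo" emptyW)
  print (lookupW maxBound m)  -- Just "hi"
  print (lookupW 1 m)         -- Nothing
```

On a 32-bit platform the finiteBitSize check fails, which is where the doubly-nested IntMap (or CPP selection) would have to kick in.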

brianhuffman pushed a commit to GaloisInc/what4 that referenced this issue Apr 16, 2020
This avoids a major performance regression with the hash table
implementation now used by parameterized-utils in module
`Data.Parameterized.HashTable`. (see GaloisInc/saw-script#674)
@brianhuffman
Contributor

I have submitted a PR on the what4 repo that switches IdxCache from Data.Parameterized.HashTable to an IORef containing a Data.Parameterized.Map. It's probably not optimal, but it seems to be good enough (at least for the s2n BIKE proofs).

I looked at the profiling numbers more closely for the find_err1 subproofs, to get a clearer picture of the cuckoo vs basic hash table slowdown:

  • Each version did 4.8 million hash table insertions:
    • cuckoo: 4.3% of 285.39s runtime = 12.3s
    • basic: 32.1% of 1011.48s runtime = 325s
    • 26x slowdown
  • Each version did 8.5 million hash table lookups:
    • cuckoo: 1.4% of 285.39s runtime = 4.0s
    • basic: 40.4% of 1011.48s runtime = 408s
    • 102x slowdown

A hundred-fold slowdown seems pretty extreme. Is this ratio considered reasonable for the algorithms in the hashtables package?
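(The per-operation ratios follow directly from the quoted percentages and wall-clock times; a quick script reproduces them:)

```haskell
-- Cross-checking the profiling arithmetic quoted above:
-- seconds spent in hash table ops = fraction of runtime * total runtime.
main :: IO ()
main = do
  let insertCuckoo = 0.043 * 285.39 :: Double  -- ~12.3 s
      insertBasic  = 0.321 * 1011.48           -- ~325 s
      lookupCuckoo = 0.014 * 285.39            -- ~4.0 s
      lookupBasic  = 0.404 * 1011.48           -- ~409 s
  print (round (insertBasic / insertCuckoo) :: Int)  -- 26
  print (round (lookupBasic / lookupCuckoo) :: Int)  -- 102
```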

@robdockins
Contributor

That seems really bad. I wouldn't be very happy about that if I were maintaining hashtables.

@brianhuffman
Contributor

I was just wondering if this is something worth reporting to the hashtables maintainers.

@robdockins
Contributor

Especially given that the package documentation reads:

Data.HashTable.ST.Basic contains a basic open-addressing hash table using linear probing as the collision strategy. On a pure speed basis it should currently be the fastest available Haskell hash table implementation for lookups, although it has a higher memory overhead than the other tables and can suffer from long delays when the table is resized because all of the elements in the table need to be rehashed.
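For context on why resizing is expensive here: in an open-addressing table with linear probing, entries live in one flat array and a key's slot depends on the table size, so a resize must rehash and reinsert every element. A toy illustration of the probing strategy (trivial mod hash, an immutable array, not the hashtables implementation):

```haskell
import Data.Array

-- Minimal linear-probing lookup over a fixed-size table, using a
-- trivial hash (key `mod` size). A full table with an absent key
-- would loop forever; real tables cap the load factor to avoid this.
type Table = Array Int (Maybe (Int, String))

probe :: Table -> Int -> Maybe String
probe t k = go (k `mod` n)
  where
    n = rangeSize (bounds t)
    go i = case t ! i of
      Nothing       -> Nothing                -- empty slot: key absent
      Just (k', v)
        | k' == k   -> Just v                 -- found it
        | otherwise -> go ((i + 1) `mod` n)   -- collision: scan forward

-- Keys 1 and 5 both hash to slot 1 in a table of size 4, so 5 has
-- been bumped to slot 2 by a prior insertion.
example :: Table
example = listArray (0, 3)
  [Nothing, Just (1, "a"), Just (5, "b"), Nothing]

main :: IO ()
main = do
  print (probe example 1)  -- Just "a"
  print (probe example 5)  -- Just "b" (collided with 1, found one slot over)
  print (probe example 9)  -- Nothing (probes slots 1 and 2, stops at empty 3)
```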

@brianhuffman
Contributor

Resizing takes up about 10% of the total runtime for insert with the basic hash tables.

@robdockins
Contributor

100x slowdown is clearly not "the fastest available Haskell hash table implementation for lookups". I'd say that's bug-worthy.

@atomb
Contributor

atomb commented Apr 27, 2020

From what I understand, this is fixed now. Am I right about that?

@brianhuffman
Contributor

brianhuffman commented Apr 28, 2020

Fixed by GaloisInc/what4#30 and #684.
