
Hashtable performance regression #674

Closed
andreistefanescu opened this issue Apr 13, 2020 · 16 comments

@andreistefanescu
Contributor

andreistefanescu commented Apr 13, 2020

The change in GaloisInc/parameterized-utils#73 results in a performance regression that makes the BIKE proofs run several times slower. To reproduce, use 57d4a49 with GaloisInc/parameterized-utils@43cb6d2. Use BIKE R2 proofs from GaloisInc/s2n@caa6bea. A possible place to start would be the use of IdxCache in https://github.com/GaloisInc/crucible/blob/5c46b3ae77ddb2e95d2005f8cca675163a55fa1b/crucible-saw/src/Lang/Crucible/Backend/SAWCore.hs#L779.

@andreistefanescu
Contributor Author

cc @atomb

@brianhuffman
Contributor

I did some timing measurements on the bike_r2 proofs with the original Data.HashTable.ST.Cuckoo hash table implementation, and then with the recent change to Data.HashTable.ST.Basic. I found that most of the slowdown was in the verifications of functions find_err1 and find_err2, so I did some additional timings of find_err1 by itself.

Proving find_err1 with Data.HashTable.ST.Cuckoo:

real	2m43.682s
user	2m36.929s
sys	0m6.169s

Proving find_err1 with Data.HashTable.ST.Basic:

real	20m19.691s
user	20m1.885s
sys	0m12.511s

(Profiling shows that the hash table operations went from about 1% of runtime with cuckoo up to more than 70% of total runtime with basic)

I also tried switching out the definition of the IdxCache type in what4 to use an IORef of a Data.Parameterized.Map (basically a clone of Data.Map but with fancier types) instead of a hash table. Here's the result:

real	2m33.826s
user	2m28.080s
sys	0m5.278s

Yes, that's right: It's actually faster than the old cuckoo-based hash table. (Although, to be honest, the difference in the measurements is small enough that it's probably not significant.) If parameterized-utils provided an IntMap-based table type, that would be even faster.
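For reference, the IORef-of-map idea fits in a few lines. This is a simplified, monomorphic stand-in with hypothetical names; the real IdxCache in what4 uses Data.Parameterized.Map keyed by Nonce so each cached entry keeps its term's type index:

```haskell
import Data.IORef
import qualified Data.Map.Strict as Map
import Data.Word (Word64)

-- Hypothetical, monomorphic stand-in for what4's IdxCache: an IORef
-- holding an ordered map from nonce values to cached results.
newtype IdxCache a = IdxCache (IORef (Map.Map Word64 a))

newIdxCache :: IO (IdxCache a)
newIdxCache = IdxCache <$> newIORef Map.empty

-- Return the cached value for a nonce, computing and storing it on a miss.
idxCacheEval :: IdxCache a -> Word64 -> IO a -> IO a
idxCacheEval (IdxCache ref) n act = do
  m <- readIORef ref
  case Map.lookup n m of
    Just v  -> pure v
    Nothing -> do
      v <- act
      modifyIORef' ref (Map.insert n v)
      pure v

main :: IO ()
main = do
  cache <- newIdxCache
  runs  <- newIORef (0 :: Int)
  let compute = modifyIORef' runs (+ 1) >> pure "result"
  _ <- idxCacheEval cache 42 compute
  v <- idxCacheEval cache 42 compute  -- second call hits the cache
  n <- readIORef runs
  print (v, n)  -- ("result",1)
```

Lookups are O(log n) rather than the hash table's nominal O(1), but for this workload the constant factors evidently dominate.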

@joehendrix
Contributor

I don't quite see how to reconcile the profiling data: Cuckoo accounted for only 1% of the runtime, yet switching to Data.Parameterized.Map still yields more than a 1% speedup. In any case, this seems like good news.
I'm pretty sure I tested Cuckoo to make sure it had performance advantages over Data.Map, but I don't recall the exact test I used.

@brianhuffman
Contributor

The difference in the timings is probably due to thermal throttling; I expect the actual runtime is almost exactly the same. Another run with Data.Parameterized.Map took 2:44, so there is quite a bit of variability between runs (at least on my machine).

@robdockins
Contributor

Wow, that's intense. I'm sort of amazed the basic hashtable performs so badly.

Internally, Nonces use Word64, which doesn't quite fit into an IntMap's Int keys. Sigh.

@brianhuffman
Contributor

We can fit nonces into an IntMap. Word64 is 64 bits. Int is 64 bits. What's the problem?

@robdockins
Contributor

Isn't that platform-specific? If we can reliably make that coercion work, I'm happy. I guess we could just switch Nonce to use Int as well, but I like getting the compiler to promise we get as many bits as we want.

@brianhuffman
Contributor

Right, using IntMap would only work if Int is actually 64 bits, so we'd need to either make the code explicitly non-portable or else use some CPP trickery to select between IntMap and doubly-nested IntMap (for the 32-bit case).

It would be nice if someone would provide an explicitly 64-bit version of the IntMap library.
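One way to make the 64-bit assumption explicit is to hide the IntMap behind a Word64-keyed interface and check the word size at startup. A hypothetical sketch (not part of parameterized-utils), relying on fromIntegral :: Word64 -> Int being a bit-for-bit round trip:

```haskell
import Data.Bits (finiteBitSize)
import qualified Data.IntMap.Strict as IntMap
import Data.Word (Word64)

-- Hypothetical Word64-keyed map backed by IntMap. Keys round-trip
-- through fromIntegral, which preserves all bits only when Int is
-- (at least) 64 bits wide.
newtype Word64Map a = Word64Map (IntMap.IntMap a)

emptyW :: Word64Map a
emptyW = Word64Map IntMap.empty

insertW :: Word64 -> a -> Word64Map a -> Word64Map a
insertW k v (Word64Map m) = Word64Map (IntMap.insert (fromIntegral k) v m)

lookupW :: Word64 -> Word64Map a -> Maybe a
lookupW k (Word64Map m) = IntMap.lookup (fromIntegral k) m

main :: IO ()
main = do
  -- The scheme is only sound when Int carries all 64 bits; a 32-bit
  -- GHC would truncate keys and allow collisions.
  print (finiteBitSize (0 :: Int) == 64)
  let m = insertW maxBound "hi" (insertW 0 "lo" emptyW)
  print (lookupW maxBound m)  -- Just "hi"
  print (lookupW 1 m)         -- Nothing
```

On a 32-bit platform the finiteBitSize check fails, which is where the doubly-nested IntMap (or CPP selection) would have to kick in.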

brianhuffman pushed a commit to GaloisInc/what4 that referenced this issue Apr 16, 2020
This avoids a major performance regression with the hash table
implementation now used by parameterized-utils in module
`Data.Parameterized.HashTable`. (see GaloisInc/saw-script#674)
@brianhuffman
Contributor

I have submitted a PR on the what4 repo that switches IdxCache from Data.Parameterized.HashTable to an IORef containing a Data.Parameterized.Map. It's probably not optimal, but it seems to be good enough (at least for the s2n BIKE proofs).

I looked at the profiling numbers more closely for the find_err1 subproofs, to get a clearer picture of the cuckoo vs basic hash table slowdown:

  • Each version did 4.8 million hash table insertions:
    • cuckoo: 4.3% of 285.39s runtime = 12.3s
    • basic: 32.1% of 1011.48s runtime = 325s
    • 26x slowdown
  • Each version did 8.5 million hash table lookups:
    • cuckoo: 1.4% of 285.39s runtime = 4.0s
    • basic: 40.4% of 1011.48s runtime = 408s
    • 102x slowdown

A hundred-fold slowdown seems pretty extreme. Is this ratio considered reasonable for the algorithms in the hashtables package?
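(The per-operation ratios follow directly from the quoted percentages and wall-clock times; a quick script reproduces them:)

```haskell
-- Cross-checking the profiling arithmetic quoted above:
-- seconds spent in hash table ops = fraction of runtime * total runtime.
main :: IO ()
main = do
  let insertCuckoo = 0.043 * 285.39 :: Double  -- ~12.3 s
      insertBasic  = 0.321 * 1011.48           -- ~325 s
      lookupCuckoo = 0.014 * 285.39            -- ~4.0 s
      lookupBasic  = 0.404 * 1011.48           -- ~409 s
  print (round (insertBasic / insertCuckoo) :: Int)  -- 26
  print (round (lookupBasic / lookupCuckoo) :: Int)  -- 102
```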

@robdockins
Contributor

That seems really bad. I wouldn't be very happy about that if I were maintaining hashtables.

@brianhuffman
Contributor

I was just wondering if this is something worth reporting to the hashtables maintainers.

@robdockins
Contributor

Especially given that the package documentation reads:

Data.HashTable.ST.Basic contains a basic open-addressing hash table using linear probing as the collision strategy. On a pure speed basis it should currently be the fastest available Haskell hash table implementation for lookups, although it has a higher memory overhead than the other tables and can suffer from long delays when the table is resized because all of the elements in the table need to be rehashed.
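For context on why resizing is expensive here: in an open-addressing table with linear probing, entries live in one flat array and a key's slot depends on the table size, so a resize must rehash and reinsert every element. A toy illustration of the probing strategy (trivial mod hash, an immutable array, not the hashtables implementation):

```haskell
import Data.Array

-- Minimal linear-probing lookup over a fixed-size table, using a
-- trivial hash (key `mod` size). A full table with an absent key
-- would loop forever; real tables cap the load factor to avoid this.
type Table = Array Int (Maybe (Int, String))

probe :: Table -> Int -> Maybe String
probe t k = go (k `mod` n)
  where
    n = rangeSize (bounds t)
    go i = case t ! i of
      Nothing       -> Nothing                -- empty slot: key absent
      Just (k', v)
        | k' == k   -> Just v                 -- found it
        | otherwise -> go ((i + 1) `mod` n)   -- collision: scan forward

-- Keys 1 and 5 both hash to slot 1 in a table of size 4, so 5 has
-- been bumped to slot 2 by a prior insertion.
example :: Table
example = listArray (0, 3)
  [Nothing, Just (1, "a"), Just (5, "b"), Nothing]

main :: IO ()
main = do
  print (probe example 1)  -- Just "a"
  print (probe example 5)  -- Just "b" (collided with 1, found one slot over)
  print (probe example 9)  -- Nothing (probes slots 1 and 2, stops at empty 3)
```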

@brianhuffman
Contributor

Resizing takes up about 10% of the total runtime for insert with the basic hash tables.

@robdockins
Contributor

100x slowdown is clearly not "the fastest available Haskell hash table implementation for lookups". I'd say that's bug-worthy.

@atomb
Contributor

atomb commented Apr 27, 2020

From what I understand, this is fixed now. Am I right about that?

@brianhuffman
Contributor

brianhuffman commented Apr 28, 2020

Fixed by GaloisInc/what4#30 and #684.
