We recently switched one of our projects from Data.HashTable.ST.Cuckoo to Data.HashTable.ST.Basic, to avoid a bug in Cuckoo (related to #55, I believe). As a result, we noticed a surprisingly large performance hit. Here are some numbers from profiling a part of our test suite that makes heavy use of hash table operations, comparing runs with the Cuckoo and Basic hash table implementations:
Each version did 4.8 million hash table insertions:
cuckoo: 4.3% of 285.39s runtime = 12.3s
basic: 32.1% of 1011.48s runtime = 325s
26x slowdown
Each version did 8.5 million hash table lookups:
cuckoo: 1.4% of 285.39s runtime = 4.0s
basic: 40.4% of 1011.48s runtime = 408s
102x slowdown
This was especially surprising considering that the hashtables documentation claims Basic should have the fastest lookups. Profiling showed that resizing of the Basic hash table happened 42 times, taking up about 10% of total runtime, so resizing only accounts for a small portion of the slowdown.
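As an aside (not something we tried in the original report): hashtables does let you pre-size a table with newSized, which should avoid the incremental resizes. A minimal sketch, assuming roughly the 4.8 million insertions counted above:

```haskell
import           Control.Monad.ST        (ST)
import qualified Data.HashTable.ST.Basic as B
import           Data.Word               (Word64)

-- Pre-allocating for the ~4.8 million keys counted above lets the table skip
-- its incremental resizes (whether that recovers the ~10% spent resizing is untested here).
newPreSized :: ST s (B.HashTable s Word64 Int)
newPreSized = B.newSized 4800000
```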
To check whether profiling itself was skewing the numbers, I recorded a trace of the keys for all insert and lookup operations we perform, and then ran a testbench that performs only those hash table operations. Timing (without profiling) shows that, after parsing, Cuckoo takes an additional ten seconds or so, while Basic takes approximately an additional 8 minutes. So the slowdown ratio indicated by profiling is not far off.
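For reference, here is a minimal sketch of what such a replay testbench could look like, using the IO wrappers from Data.HashTable.IO. The key streams and value type are stand-ins, since the recorded trace isn't included in this ticket (keys are Word64, per the comment below):

```haskell
module Main (main) where

import           Control.Monad     (forM_)
import qualified Data.HashTable.IO as H
import           Data.Word         (Word64)
import           System.CPUTime    (getCPUTime)
import           Text.Printf       (printf)

-- Change this alias to H.CuckooHashTable to time the other implementation.
type Table = H.BasicHashTable Word64 Int

main :: IO ()
main = do
  -- Stand-in key streams; the real testbench replays keys recorded from the
  -- application's trace.
  let insertKeys = [1 .. 4800000] :: [Word64]
      lookupKeys = [1 .. 8500000] :: [Word64]
  table <- H.new :: IO Table
  t0 <- getCPUTime
  forM_ insertKeys $ \k -> H.insert table k 0
  t1 <- getCPUTime
  forM_ lookupKeys $ \k -> do
    r <- H.lookup table k
    r `seq` return ()
  t2 <- getCPUTime
  printf "inserts: %.2f s, lookups: %.2f s\n" (secs (t1 - t0)) (secs (t2 - t1))
  where
    secs :: Integer -> Double
    secs p = fromIntegral p / 1e12  -- getCPUTime reports picoseconds
```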
These tests were done on macOS with ghc-8.8.3; I haven't done thorough testing with other compiler versions.
Have you tried Data.HashMap from unordered-containers? As I mentioned here, although it is a pure data structure, it is actually faster than this library (including Cuckoo), as indicated by benchmarks run by @ninegua on an internal company use-case from a year ago. Since you're doing this now, perhaps you could try that out? I'd be interested to see the numbers for someone else's use-case.
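A sketch of what that swap could look like: the pure Data.HashMap.Strict kept behind an IORef so it drops in where a mutable table was used. The helper names and the Word64 key type are assumptions for illustration, not code from the benchmarks mentioned above:

```haskell
import qualified Data.HashMap.Strict as HM
import           Data.IORef          (IORef, modifyIORef', newIORef, readIORef)
import           Data.Word           (Word64)

-- A pure HashMap behind an IORef, standing in for a mutable hash table.
type Table v = IORef (HM.HashMap Word64 v)

newTable :: IO (Table v)
newTable = newIORef HM.empty

insertTable :: Table v -> Word64 -> v -> IO ()
insertTable ref k v = modifyIORef' ref (HM.insert k v)

lookupTable :: Table v -> Word64 -> IO (Maybe v)
lookupTable ref k = HM.lookup k <$> readIORef ref
```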
Around the same time I opened this ticket, we switched our application to use an IORef around an ordinary Data.Map. (The keys in our application are Word64s, so something like Data.IntMap would probably be better, except Data.IntMap keys are fixed to Int, which is not guaranteed to have 64 bits.) It turned out that this was fast enough not to be a bottleneck, so we've left it at that.
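Roughly the shape of that workaround, as a sketch (the helper names are illustrative, not our actual code). The finiteBitSize check captures the Data.IntMap caveat mentioned above: IntMap keys are Int, which is not guaranteed to have 64 bits, so Word64 keys only fit without truncation where Int happens to be 64 bits wide:

```haskell
import           Data.Bits  (finiteBitSize)
import           Data.IORef (IORef, newIORef)
import qualified Data.Map   as M
import           Data.Word  (Word64)

-- The mutable-table stand-in: an ordinary Data.Map behind an IORef.
type Table v = IORef (M.Map Word64 v)

newTable :: IO (Table v)
newTable = newIORef M.empty

-- True only on platforms where Int is 64 bits wide, i.e. where an IntMap
-- keyed by Word64 values converted to Int would not lose information.
intMapWouldBeSafe :: Bool
intMapWouldBeSafe = finiteBitSize (0 :: Int) >= 64
```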
This ticket is based on GaloisInc/saw-script#674.