Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor performance of Basic compared to Cuckoo #65

Open
brianhuffman opened this issue Apr 21, 2020 · 3 comments
Open

Poor performance of Basic compared to Cuckoo #65

brianhuffman opened this issue Apr 21, 2020 · 3 comments

Comments

@brianhuffman
Copy link

This ticket is based on GaloisInc/saw-script#674.

We recently switched one of our projects from from Data.HashTable.ST.Cuckoo to Data.HashTable.ST.Basic, to avoid a bug in Cuckoo (related to #55, I believe). As a result, we noticed a surprisingly large performance hit. Here are some numbers from profiling a part of our test suite that makes heavy use of hash table operations, where I compared runs with Cuckoo and Basic hash table libraries:

  • Each version did 4.8 million hash table insertions:
    • cuckoo: 4.3% of 285.39s runtime = 12.3s
    • basic: 32.1% of 1011.48 runtime = 325s
    • 26x slowdown
  • Each version did 8.5 million hash table lookups:
    • cuckoo: 1.4% of 285.39s runtime = 4.0s
    • basic: 40.4% of 1011.48s runtime = 408s
    • 102x slowdown

This was especially surprising considering that the hashtables documentation claims Basic should have the fastest lookups. Profiling showed that resizing of the Basic hash table happened 42 times, taking up about 10% of total runtime, so resizing only accounts for a small portion of the slowdown.

To see whether profiling itself had anything to do with the numbers, I recorded a trace of the keys for all insert and lookup operations we were doing, and then ran a testbench that only performed the hash table operations. Timing (without profiling) shows that after parsing, Cuckoo takes an additional ten seconds or so, while Basic takes approximately an additional 8 minutes. So the slowdown ratio indicated by profiling is not far off.

These tests were done on MacOS with ghc-8.8.3; I haven't done thorough testing with other compiler versions.

@infinity0
Copy link

Have you tried Data.HashMap from unordered-containers? As I mentioned here, although it is a pure data structure, it is actually faster than this library (including Cuckoo), as indicated by benchmarks ran by @ninegua on an internal company use-case from a year ago. Since you're doing this now, perhaps you could try that out? Would be interested to see the numbers for someone else's use-case.

@infinity0
Copy link

i.e. just wrap it in a MutVar or something, that's what we did

@brianhuffman
Copy link
Author

Around the same time I opened this ticket, we switched our application to use an IORef around an ordinary Data.Map. (The keys in our application are Word64s, so something like Data.IntMap would probably be better, except Data.IntMap keys are fixed to Int, which is not guaranteed to have 64 bits.) It turned out that this was fast enough not to be a bottleneck, so we've left it at that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants