opt_merge: hashing performance and correctness #4677

Draft: wants to merge 31 commits into main


widlarizer
Collaborator

This is a direct remake of #4175 sans the 64-bit hash value. It uses the interface from #4524, so it depends on that PR and currently contains its commits. Instead of xorshifts, values are sorted, though a final xorshift is included as part of the fudge (--hash-seed=N) mechanism.
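A minimal sketch of the sort-then-combine idea, assuming a 32-bit hash type. The names `hash32`, `mix`, `xorshift32`, and `hash_unordered` are hypothetical and are not taken from this PR or from #4524; the real combiner and seed handling may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 32-bit hash type standing in for the hash value used by Yosys.
using hash32 = std::uint32_t;

// Order-sensitive combiner; any such combiner would do for this sketch.
inline hash32 mix(hash32 acc, hash32 v) {
    return ((acc << 5) + acc) ^ v;
}

// One round of the classic 32-bit xorshift (a bijection on 32-bit values).
inline hash32 xorshift32(hash32 x) {
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

// Hash a collection of partial hashes so that their order does not matter:
// sort first, then feed them to the order-sensitive combiner.
// The seed (an assumption modelled on --hash-seed=N) is folded in by a final xorshift.
inline hash32 hash_unordered(std::vector<hash32> parts, hash32 seed = 0) {
    std::sort(parts.begin(), parts.end());
    hash32 h = 5381;
    for (hash32 p : parts)
        h = mix(h, p);
    return xorshift32(h ^ seed);
}

// Usage: the same values in a different order produce the same hash.
inline void check_order_independence() {
    assert(hash_unordered({3, 1, 2}) == hash_unordered({2, 3, 1}));
}
```

Sorting costs O(n log n) per hashed collection, but it keeps the combiner order-sensitive, which avoids the cancellation problems of purely commutative mixing such as XOR-ing all parts together.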

Additionally, I discovered that opt_merge behaves incorrectly in the case of hash collisions: prior to this PR, a hash collision would inhibit merging. This PR might therefore, in rare cases, improve quality of results for flows that use opt_merge. I changed the sharemap from a dict<hash_t, Cell*> to an equivalent std::unordered_multimap so that multiple cells can be associated with the same hash. This can't be a separate change, since the bug actually broke the build as soon as the way the hashes are constructed changed.
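A minimal sketch of a collision-tolerant sharemap, assuming a simplified `Cell` struct in place of the real RTLIL cell class. `weak_hash`, `same_function`, and `find_or_insert` are hypothetical helpers for illustration, not functions from opt_merge.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a netlist cell.
struct Cell {
    std::string type;
    std::string inputs; // stand-in for parameters and connections
};

using hash32 = std::uint32_t;

// Deliberately weak hash so that collisions are easy to provoke in a demo.
inline hash32 weak_hash(const Cell &c) {
    return static_cast<hash32>(c.type.size());
}

// Full comparison, used to tell real duplicates from mere hash collisions.
inline bool same_function(const Cell &a, const Cell &b) {
    return a.type == b.type && a.inputs == b.inputs;
}

// Returns an equivalent cell seen earlier, or registers `cell` and returns nullptr.
// All cells sharing a hash are scanned, so a collision cannot hide a duplicate.
inline Cell *find_or_insert(std::unordered_multimap<hash32, Cell *> &sharemap, Cell *cell) {
    assert(cell != nullptr);
    const hash32 h = weak_hash(*cell);
    auto range = sharemap.equal_range(h);
    for (auto it = range.first; it != range.second; ++it)
        if (same_function(*it->second, *cell))
            return it->second;
    sharemap.emplace(h, cell);
    return nullptr;
}
```

With a single-valued map keyed by hash, a second `$xor` cell whose hash slot is already occupied by a colliding `$and` cell would be compared only against the `$and` cell and never merged with the first `$xor`; the multimap keeps both candidates reachable.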

Sorry for the spam to code owners: this PR is based on the wide-reaching PR #4524 mentioned above, and I don't have a way of removing you from the reviewer list. The diff for this PR is not going to be very readable on GitHub either until that one is merged.

@widlarizer widlarizer force-pushed the emil/opt_merge-hashing branch from ae88e5f to c68b6c2 Compare November 8, 2024 19:36
Collaborator Author

widlarizer commented Nov 11, 2024

The current implementation regresses opt_merge runtime many times over. I'll see if std::unordered_multimap is salvageable.
Currently, std::unordered_multimap, as I use it:

  • maps from hash_t to Cell*
  • hashes hash_t with std::hash (unnecessary)
  • compares Cell*s with pointer equality

Instead it should:

  • map from Cell* to Cell*
  • hash with hash_cell_function
  • compare with compare_cell_parameters_and_connections
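A minimal sketch of the proposed container shape, assuming a simplified `Cell` struct. `CellHash` and `CellEqual` are hypothetical functors standing in for hash_cell_function and compare_cell_parameters_and_connections, and a set of `Cell*` is used here instead of a `Cell*`-to-`Cell*` map because the stored pointer can serve as its own representative.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a netlist cell.
struct Cell {
    std::string type;
    std::string inputs; // stand-in for parameters and connections
};

// Hashes what the cell computes, not the pointer value.
struct CellHash {
    std::size_t operator()(const Cell *c) const {
        const std::size_t h = std::hash<std::string>{}(c->type);
        return h * 31 + std::hash<std::string>{}(c->inputs);
    }
};

// Two pointers are "equal" when the cells they point to are functionally identical.
struct CellEqual {
    bool operator()(const Cell *a, const Cell *b) const {
        return a->type == b->type && a->inputs == b->inputs;
    }
};

using CellSet = std::unordered_set<Cell *, CellHash, CellEqual>;

// Returns the representative for `cell`: an equivalent cell seen earlier,
// or `cell` itself if it is the first of its kind.
inline Cell *representative(CellSet &seen, Cell *cell) {
    assert(cell != nullptr);
    return *seen.insert(cell).first;
}
```

A cell is a merge candidate when `representative(seen, cell) != cell`. The container itself resolves hash collisions through `CellEqual`, so no separate hashing of an intermediate hash value is needed.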

@widlarizer widlarizer force-pushed the emil/opt_merge-hashing branch from c68b6c2 to 2e1e5a8 Compare November 12, 2024 13:43
@widlarizer widlarizer force-pushed the emil/opt_merge-hashing branch from 2e1e5a8 to 45cbadc Compare November 12, 2024 13:59
Collaborator Author

widlarizer commented Nov 13, 2024

pool fails to bring a clear advantage over std::unordered_set:

ibex pool
Elapsed time: 0:42.16[h:]min:sec. CPU time: user 41.87 sys 0.17 (99%). Peak memory: 164800KB.

ibex std::unordered_set
Elapsed time: 0:42.85[h:]min:sec. CPU time: user 42.56 sys 0.16 (99%). Peak memory: 164076KB.

jpeg pool
Elapsed time: 0:49.41[h:]min:sec. CPU time: user 48.76 sys 0.49 (99%). Peak memory: 634672KB.

jpeg std::unordered_set
Elapsed time: 0:47.62[h:]min:sec. CPU time: user 46.95 sys 0.51 (99%). Peak memory: 629852KB.

Both variants are on par on these designs with the state prior to touching opt_merge (cf5585e):

ibex
Elapsed time: 0:42.12[h:]min:sec. CPU time: user 41.87 sys 0.13 (99%). Peak memory: 162396KB.

jpeg
Elapsed time: 0:48.41[h:]min:sec. CPU time: user 47.70 sys 0.54 (99%). Peak memory: 621304KB.

When opt_merge is run in isolation, std::unordered_set takes home the perf prize:

$ hyperfine "./uut/std/bin/yosys -p \"read_rtlil garbage/ibex-pre-opt-merge.il; opt_merge\"" --warmup 5
Benchmark 1: ./uut/std/bin/yosys -p "read_rtlil garbage/ibex-pre-opt-merge.il; opt_merge"
  Time (mean ± σ):      99.8 ms ±   1.4 ms    [User: 90.6 ms, System: 8.8 ms]
  Range (min … max):    97.7 ms … 102.7 ms    29 runs

$ hyperfine "./uut/pool/bin/yosys -p \"read_rtlil garbage/ibex-pre-opt-merge.il; opt_merge\"" --warmup 5
Benchmark 1: ./uut/pool/bin/yosys -p "read_rtlil garbage/ibex-pre-opt-merge.il; opt_merge"
  Time (mean ± σ):     151.6 ms ±   1.5 ms    [User: 142.4 ms, System: 8.6 ms]
  Range (min … max):   149.4 ms … 154.2 ms    19 runs

Memory consumption is equal because reading the RTLIL and the Yosys state dominate memory consumption in this case.

@rmlarsen
Contributor

@widlarizer nice work!
