Update benchmarks to report the egraph size for each case#265
Update benchmarks to report the egraph size for each case#265
Conversation
…the return value assertion only once).
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## ale/3.0 #265 +/- ##
========================================
Coverage 81.42% 81.42%
========================================
Files 19 19
Lines 1491 1491
========================================
Hits 1214 1214
Misses 277 277 ☔ View full report in Codecov by Sentry. |
…prove function in older versions.
Benchmark Results
Benchmark PlotsA plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR. |
|
Using the updated scripts from nmheim/egg-benchmark#10 this should also produce the egraph size. Probably, we still need to update the CI script to extract the second table. It should look similar to the following. Values in the second table are (num_classes, num_nodes, length(memo)). Edit: updated for 641212a (after fixing inconsistencies in the benchmarks) | | egg-sym | egg-cust | MT@641212a | MT@ale/3.0 | egg-sym/MT@641... | egg-cust/MT@64... | MT@ale/3.0/MT@... |
|:--------------------------------|:-------:|:--------:|:----------:|:----------:|:-----------------:|:-----------------:|:-----------------:|
| egraph_addexpr | 1.62 ms | | 5.38 ms | 5.05 ms | 0.3 | | 0.937 |
| basic_maths_simpl2 | 12 ms | 5.05 ms | 27.5 ms | 24.7 ms | 0.438 | 0.183 | 0.898 |
| prop_logic_freges_theorem | 2.56 ms | 1.61 ms | 2.42 ms | 2.31 ms | 1.06 | 0.664 | 0.955 |
| calc_logic_demorgan | 64.1 μs | 36.8 μs | 80.2 μs | 78.8 μs | 0.799 | 0.458 | 0.983 |
| calc_logic_freges_theorem | 23 ms | 11.7 ms | 44.3 ms | 42.3 ms | 0.518 | 0.263 | 0.953 |
| basic_maths_simpl1 | 5.77 ms | 2.9 ms | 4.99 ms | 4.6 ms | 1.16 | 0.582 | 0.922 |
| egraph_constructor | 0.13 μs | | 0.1 μs | 0.1 μs | 1.3 | | 1 |
| prop_logic_prove1 | 38 ms | 14.6 ms | 44.7 ms | 39 ms | 0.851 | 0.327 | 0.873 |
| prop_logic_demorgan | 89.9 μs | 49.7 μs | 98.8 μs | 97 μs | 0.91 | 0.503 | 0.982 |
| while_superinterpreter_while_10 | | | 19.7 ms | 18.8 ms | | | 0.954 |
| prop_logic_rewrite | | | 131 μs | 116 μs | | | 0.891 |
| time_to_load | | | 121 ms | 111 ms | | | 0.92 |
[ Info: Saving table at output.md
| | egg-sym | egg-cust | MT@641212a | MT@ale/3.0 | egg-sym/MT@641... | egg-cust/MT@64... | MT@ale/3.0/MT@... |
|:--------------------------------|:----------------:|:----------------:|:----------------:|:----------------:|:-----------------:|:-----------------:|:-----------------:|
| egraph_addexpr | 6762 6762 6762 | | 6663 6663 6663 | 6640 6640 6640 | | | |
| basic_maths_simpl2 | 440 2235 2836 | 440 2235 2839 | 7092 14530 15001 | 7092 14530 15001 | | | |
| prop_logic_freges_theorem | 316 1197 2315 | 316 1197 2322 | 297 899 1313 | 297 899 1313 | | | |
| calc_logic_demorgan | 16 33 35 | 16 33 35 | 15 30 32 | 15 30 32 | | | |
| calc_logic_freges_theorem | 1072 4289 17394 | 1072 4289 17280 | 1527 5402 18092 | 1527 5402 18092 | | | |
| basic_maths_simpl1 | 368 1910 2543 | 368 1910 2567 | 528 1815 3042 | 528 1815 3042 | | | |
| egraph_constructor | | | | | | | |
| prop_logic_prove1 | 5510 17371 27976 | 4668 13644 18522 | 3061 9176 15637 | 3061 9176 15637 | | | |
| prop_logic_demorgan | 16 35 42 | 16 35 42 | 15 30 33 | 15 30 33 | | | |
| while_superinterpreter_while_10 | | | 39 205 499 | 39 205 499 | | | |
| prop_logic_rewrite | | | | | | | |
| time_to_load | | | | | | | |
|
|
Nice, thanks a lot! Can we keep the old |
Yes, separating the two prove functions (for tests and benchmarks) is better because then we can always use the prove function from benchmarks.jl to calculate sizes even for other branches. |
…nction for benchmarks, rollback changes to tests.
I think they should call each other ( |
I tried at first, but it is problematic because the benchmarks.jl script is taken from the reference revision and examples/prove.jl is taken from the benchmarked branch. I think it is better to say that benchmarks.jl has its own prove which is then the same for all branches. (We would also need to move |
|
@gkronber is this ready to merge? |
|
@0x0f0f0f yes |
... and check he return value assertion only once.
This is useful to compare the behaviour of MT to egg additionally to the runtime.