Update benchmarks to report the egraph size for each case by gkronber · Pull Request #265 · JuliaSymbolics/Metatheory.jl

gkronber · 2025-02-18T13:58:29Z

... and check the return value assertion only once.

This is useful for comparing the behaviour of MT with egg in terms of e-graph size, in addition to the runtime.

codecov-commenter · 2025-02-18T14:01:34Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.42%. Comparing base (32786c0) to head (641212a).

Additional details and impacted files

@@           Coverage Diff            @@
##           ale/3.0     #265   +/-   ##
========================================
  Coverage    81.42%   81.42%           
========================================
  Files           19       19           
  Lines         1491     1491           
========================================
  Hits          1214     1214           
  Misses         277      277

github-actions · 2025-02-18T15:04:33Z

Benchmark Results

|                                 | egg-sym   | egg-cust | MT@641212a8e55... | MT@32786c06710... | egg-sym/MT@641... | egg-cust/MT@64... | MT@32786c06710... |
|:--------------------------------|:---------:|:--------:|:-----------------:|:-----------------:|:-----------------:|:-----------------:|:-----------------:|
| egraph_addexpr                  | 1.45 ms   |          | 6.16 ms           | 6.33 ms           | 0.236             |                   | 1.03              |
| basic_maths_simpl2              | 14 ms     | 5.16 ms  | 25.8 ms           | 25.6 ms           | 0.543             | 0.2               | 0.992             |
| prop_logic_freges_theorem       | 2.55 ms   | 1.56 ms  | 2.43 ms           | 2.41 ms           | 1.05              | 0.642             | 0.991             |
| calc_logic_demorgan             | 60.8 μs   | 34.4 μs  | 81.2 μs           | 80.2 μs           | 0.748             | 0.423             | 0.988             |
| calc_logic_freges_theorem       | 24 ms     | 11.6 ms  | 49 ms             | 47.3 ms           | 0.49              | 0.237             | 0.964             |
| basic_maths_simpl1              | 6.51 ms   | 2.89 ms  | 4.93 ms           | 4.89 ms           | 1.32              | 0.587             | 0.992             |
| egraph_constructor              | 0.0846 μs |          | 0.106 μs          | 0.11 μs           | 0.8               |                   | 1.04              |
| prop_logic_prove1               | 37.5 ms   | 14.6 ms  | 45.4 ms           | 43 ms             | 0.827             | 0.321             | 0.949             |
| prop_logic_demorgan             | 79.9 μs   | 45.5 μs  | 98.8 μs           | 95.5 μs           | 0.809             | 0.46              | 0.967             |
| while_superinterpreter_while_10 |           |          | 19.1 ms           | 19.9 ms           |                   |                   | 1.04              |
| prop_logic_rewrite              |           |          | 117 μs            | 114 μs            |                   |                   | 0.973             |
| time_to_load                    |           |          | 132 ms            | 128 ms            |                   |                   | 0.971             |

Benchmark Plots

A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

gkronber · 2025-02-18T16:49:23Z

Using the updated scripts from nmheim/egg-benchmark#10 this should also produce the egraph size. Probably, we still need to update the CI script to extract the second table.

It should look similar to the following. Values in the second table are (num_classes, num_nodes, length(memo)). Edit: updated for 641212a (after fixing inconsistencies in the benchmarks)

|                                 | egg-sym | egg-cust | MT@641212a | MT@ale/3.0 | egg-sym/MT@641... | egg-cust/MT@64... | MT@ale/3.0/MT@... |
|:--------------------------------|:-------:|:--------:|:----------:|:----------:|:-----------------:|:-----------------:|:-----------------:|
| egraph_addexpr                  | 1.62 ms |          | 5.38 ms    | 5.05 ms    | 0.3               |                   | 0.937             |
| basic_maths_simpl2              | 12 ms   | 5.05 ms  | 27.5 ms    | 24.7 ms    | 0.438             | 0.183             | 0.898             |
| prop_logic_freges_theorem       | 2.56 ms | 1.61 ms  | 2.42 ms    | 2.31 ms    | 1.06              | 0.664             | 0.955             |
| calc_logic_demorgan             | 64.1 μs | 36.8 μs  | 80.2 μs    | 78.8 μs    | 0.799             | 0.458             | 0.983             |
| calc_logic_freges_theorem       | 23 ms   | 11.7 ms  | 44.3 ms    | 42.3 ms    | 0.518             | 0.263             | 0.953             |
| basic_maths_simpl1              | 5.77 ms | 2.9 ms   | 4.99 ms    | 4.6 ms     | 1.16              | 0.582             | 0.922             |
| egraph_constructor              | 0.13 μs |          | 0.1 μs     | 0.1 μs     | 1.3               |                   | 1                 |
| prop_logic_prove1               | 38 ms   | 14.6 ms  | 44.7 ms    | 39 ms      | 0.851             | 0.327             | 0.873             |
| prop_logic_demorgan             | 89.9 μs | 49.7 μs  | 98.8 μs    | 97 μs      | 0.91              | 0.503             | 0.982             |
| while_superinterpreter_while_10 |         |          | 19.7 ms    | 18.8 ms    |                   |                   | 0.954             |
| prop_logic_rewrite              |         |          | 131 μs     | 116 μs     |                   |                   | 0.891             |
| time_to_load                    |         |          | 121 ms     | 111 ms     |                   |                   | 0.92              |

[ Info: Saving table at output.md
|                                 | egg-sym          | egg-cust         | MT@641212a       | MT@ale/3.0       | egg-sym/MT@641... | egg-cust/MT@64... | MT@ale/3.0/MT@... |
|:--------------------------------|:----------------:|:----------------:|:----------------:|:----------------:|:-----------------:|:-----------------:|:-----------------:|
| egraph_addexpr                  | 6762 6762 6762   |                  | 6663 6663 6663   | 6640 6640 6640   |                   |                   |                   |
| basic_maths_simpl2              | 440 2235 2836    | 440 2235 2839    | 7092 14530 15001 | 7092 14530 15001 |                   |                   |                   |
| prop_logic_freges_theorem       | 316 1197 2315    | 316 1197 2322    | 297 899 1313     | 297 899 1313     |                   |                   |                   |
| calc_logic_demorgan             | 16 33 35         | 16 33 35         | 15 30 32         | 15 30 32         |                   |                   |                   |
| calc_logic_freges_theorem       | 1072 4289 17394  | 1072 4289 17280  | 1527 5402 18092  | 1527 5402 18092  |                   |                   |                   |
| basic_maths_simpl1              | 368 1910 2543    | 368 1910 2567    | 528 1815 3042    | 528 1815 3042    |                   |                   |                   |
| egraph_constructor              |                  |                  |                  |                  |                   |                   |                   |
| prop_logic_prove1               | 5510 17371 27976 | 4668 13644 18522 | 3061 9176 15637  | 3061 9176 15637  |                   |                   |                   |
| prop_logic_demorgan             | 16 35 42         | 16 35 42         | 15 30 33         | 15 30 33         |                   |                   |                   |
| while_superinterpreter_while_10 |                  |                  | 39 205 499       | 39 205 499       |                   |                   |                   |
| prop_logic_rewrite              |                  |                  |                  |                  |                   |                   |                   |
| time_to_load                    |                  |                  |                  |                  |                   |                   |                   |
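Each cell in the size table holds three space-separated integers. As a minimal sketch of how such a triple could be represented, printed, and parsed back for comparison, assuming a plain struct: the names `SizeTriple`, `format_size`, and `parse_size` are hypothetical and are not taken from benchmarks.jl, whose `EGraphSize` type may be defined differently.

```julia
# Hypothetical container for the three reported quantities; the real
# EGraphSize in the benchmark script may look different.
struct SizeTriple
    num_classes::Int  # number of e-classes
    num_nodes::Int    # number of e-nodes
    num_memo::Int     # length of the memo table
end

# Render as "classes nodes memo", matching the cells of the size table.
format_size(s::SizeTriple) = string(s.num_classes, " ", s.num_nodes, " ", s.num_memo)

# Parse a table cell back into a triple; empty cells map to `nothing`.
function parse_size(cell::AbstractString)
    parts = split(strip(cell))
    isempty(parts) && return nothing
    length(parts) == 3 || throw(ArgumentError("expected three integers, got: $cell"))
    return SizeTriple(parse.(Int, parts)...)
end

egg = parse_size("440 2235 2836")     # egg-sym, basic_maths_simpl2
mt  = parse_size("7092 14530 15001")  # MT@641212a, basic_maths_simpl2

# Ratio of e-class counts between the two systems for this case.
class_ratio = mt.num_classes / egg.num_classes
```

Keeping the three counts in one value makes it straightforward to compare individual components (for example, e-class counts) across egg and MT per benchmark case.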

~~One issue that I couldn't resolve yet, is that for three benchmarks we get a reported size for ale/3.0. This cannot be correct.~~ Edit: misunderstanding on my side, we are using simplify from the new benchmark.jl script, and therefore also get size results for other branches.

~~I also need to double check whether the hyperparameters and benchmark implementation for egg and MT are equal.~~ Edit: done

0x0f0f0f · 2025-02-18T21:47:51Z

Nice, thanks a lot! Can we keep the old prove interface by renaming the function that returns also the egraph to something like saturate_prove, then have prove to check if the condition is met and return just the boolean? Also, for the changes to be picked up by the Github CI, we need to merge the benchmarks.jl file into master. I will do it after 3.0 merge

gkronber · 2025-02-19T07:11:01Z

> Can we keep the old prove interface by renaming the function that returns also the egraph to something like saturate_prove, then have prove to check if the condition is met and return just the boolean?

Yes, separating the two prove functions (for tests and benchmarks) is better because then we can always use the prove function from benchmarks.jl to calculate sizes even for other branches.

0x0f0f0f · 2025-02-19T07:53:42Z

> > Can we keep the old prove interface by renaming the function that returns also the egraph to something like saturate_prove, then have prove to check if the condition is met and return just the boolean?
>
> Yes, separating the two prove functions (for tests and benchmarks) is better because then we can always use the prove function from benchmarks.jl to calculate sizes even for other branches.

I think they should call each other (prove should call prove_saturate and return the bool)
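The delegation suggested here could look roughly like the sketch below. It uses a toy stand-in for saturation: the signature and return shape of `prove_saturate`, as well as the `ToySize` type, are assumptions for illustration only and do not reflect the code in examples/prove.jl.

```julia
# Toy stand-in for an e-graph size report (hypothetical type).
struct ToySize
    num_classes::Int
    num_nodes::Int
end

# Hypothetical worker: runs the proof attempt and returns both the outcome
# and the size information that the benchmarks want to record.
# "Proving" is faked here by checking membership in a set of known facts.
function prove_saturate(facts::Set{Symbol}, goal::Symbol)
    proved = goal in facts
    sz = ToySize(length(facts), length(facts))
    return (proved, sz)
end

# Boolean-only interface for the tests: delegate and drop the size.
prove(facts::Set{Symbol}, goal::Symbol) = first(prove_saturate(facts, goal))

facts = Set([:p, :q])
```

With this shape, tests would call `prove(facts, :p)` and only see a `Bool`, while a benchmark would call `prove_saturate` directly to obtain the size as well.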

gkronber · 2025-02-19T08:01:53Z

> I think they should call each other (prove should call prove_saturate and return the bool)

I tried at first, but it is problematic because the benchmarks.jl script is taken from the reference revision and examples/prove.jl is taken from the benchmarked branch. I think it is better to say that benchmarks.jl has its own prove which is then the same for all branches. (We would also need to move EGraphSize and its methods into examples/prove.jl.)

0x0f0f0f · 2025-02-24T19:28:22Z

@gkronber is this ready to merge?

gkronber · 2025-02-25T07:11:02Z

@0x0f0f0f yes

Update benchmarks to report the egraph size for each case (and check …

0c020ee

…the return value assertion only once).

gkronber added 2 commits February 18, 2025 15:18

Move report_size function

7aa49b8

Improve benchmarks script to remove duplicate expressions and handle …

e4d1b4a

…prove function in older versions.

gkronber added 2 commits February 18, 2025 16:50

Actually evaluate the benchmarked expression.

31db7e8

Change identifiers.

ddd5619

gkronber mentioned this pull request Feb 18, 2025

Inform schedulers with correct number of matches. #259

Open

gkronber added 2 commits February 19, 2025 08:42

Track sizes of all steps in prove and implement a separate prove() fu…

3cfb889

…nction for benchmarks, rollback changes to tests.

Report size for egraph/addexpr.

a783920

Bugfix.

c776370

Triple-check benchmarking parameters and use same parameters as in eg…

641212a

…g-benchmarks.

gkronber merged commit 74f9d5b into ale/3.0 Sep 9, 2025
4 checks passed

gkronber deleted the report_egraph_size_for_benchmarks branch March 9, 2026 10:49

gkronber merged 9 commits into ale/3.0 from report_egraph_size_for_benchmarks
