Ale/3.0 ematch hashes by 0x0f0f0f · Pull Request #261 · JuliaSymbolics/Metatheory.jl

0x0f0f0f · 2025-02-02T13:16:47Z

@gkronber I managed to simplify the logic behind e-matching a lot yesterday by tinkering around:

The ematch buffer now uses UInt64. There is no more pairing and there are no more delimiters. The structure of a match in the buffer is as follows. The first element gives the id of the e-class that matched, and the second gives the rule index in the theory (positive or negative depending on direction). The third element contains isliteral_bitvec, a UInt64 bitmask in which the bit in position i is 1 if and only if the pattern variable with De Bruijn index i in the match contains a literal.
The number of variable entries that follow in a match group is given by length(theory[rule_idx].patvars). No need for magic number delimiters (although it may be good to have them to double check during development).

This isliteral_bitvec limits the number of pattern variables to 64 per rule (we could easily extend this to 128 by using two elements of the buffer for isliteral_bitvec).
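A rough sketch of the two-element extension mentioned here, assuming 1-based variable positions, a low word for positions 1 to 64 and a high word for positions 65 to 128. The names `setliteral` and `isliteral` are made up for illustration and are not part of Metatheory.jl.

```julia
# Hypothetical 128-variable literal mask split over two UInt64 words.
# Assumption: position i (1-based) lives in `lo` for 1..64 and in `hi` for 65..128.
function setliteral(lo::UInt64, hi::UInt64, i::Int)
  1 <= i <= 128 || throw(ArgumentError("pattern variable position must be in 1:128"))
  if i <= 64
    return (lo | (UInt64(1) << (i - 1)), hi)
  else
    return (lo, hi | (UInt64(1) << (i - 65)))
  end
end

# True when the bit for pattern variable i is set in the two-word mask.
function isliteral(lo::UInt64, hi::UInt64, i::Int)
  word, shift = i <= 64 ? (lo, i - 1) : (hi, i - 65)
  return ((word >> shift) & UInt64(1)) == UInt64(1)
end
```

The cost would be one extra buffer element per match, so the 64-variable limit looks like a reasonable default until a rule actually needs more.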

Example. If we have @rule f(~x, ~y::Int, ~z) --> g(~x, ~z) * ~y, then pattern matching will append to the buffer a sequence of the form:

EClass ID that matched root
1 (rule index, positive for left to right); from it we can infer that there is 1 remaining element for the bitflags and 3 remaining buffer elements for variables
isliteral_bitvec = 0x0000000000000002 because the pattern variable at position 2 (y) matches a literal
EClass ID that matched x
Hash of literal integer that matched y
EClass ID that matched z

instantiate_enode! will know which pattern variables are literals by receiving isliteral_bitvec and checking against it.
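The layout above can be sketched as follows. This is only an illustration under assumptions: `push_match!` and `isliteral` are hypothetical helpers, the signed rule index is stored by reinterpreting an Int64 as UInt64, and bit positions are counted from 1 to agree with the 0x2 mask in the example. The actual Metatheory.jl code may differ.

```julia
# Hypothetical sketch of one match group in the buffer; not the Metatheory.jl code.
# Layout: [root e-class id, signed rule index, isliteral_bitvec, bindings...]
function push_match!(buf::Vector{UInt64}, root::UInt64, rule_idx::Int,
                     isliteral_bitvec::UInt64, bindings::Vector{UInt64})
  push!(buf, root)
  # Assumption: the sign (direction) survives by reinterpreting the Int64 bits.
  push!(buf, reinterpret(UInt64, Int64(rule_idx)))
  push!(buf, isliteral_bitvec)
  append!(buf, bindings)
  return buf
end

# Bit i (counting from 1) is set when pattern variable i is bound to a literal.
isliteral(bitvec::UInt64, i::Int) = ((bitvec >> (i - 1)) & UInt64(1)) == UInt64(1)

# The example rule: x and z bound to e-classes 7 and 9, y bound to the hash of 42.
buf = UInt64[]
push_match!(buf, UInt64(5), 1, 0x0000000000000002, UInt64[7, hash(42), 9])
```

With this encoding, a consumer can tell that the second binding is a literal hash rather than an e-class id by checking `isliteral(buf[3], 2)`.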

I've also simplified the logic of consuming the buffer in eqsat_apply!. It now reads the buffer from position 0 to the end (as a queue). Before, it read the buffer in reverse (as a stack of matches).
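A minimal sketch of reading such a buffer front to back, assuming each match group is laid out as root id, signed rule index, bitmask, then one entry per pattern variable. `foreach_match` and the `npatvars` vector (standing in for length(theory[rule_idx].patvars)) are hypothetical, and it is an assumption here that a negative index refers to the same rule as its absolute value.

```julia
# Hypothetical queue-style reader; not the actual eqsat_apply! implementation.
# `npatvars[k]` stands in for length(theory[k].patvars).
function foreach_match(f, buf::Vector{UInt64}, npatvars::Vector{Int})
  pos = 1
  while pos <= length(buf)
    root = buf[pos]
    rule_idx = reinterpret(Int64, buf[pos + 1])  # sign encodes direction
    bitvec = buf[pos + 2]
    n = npatvars[abs(rule_idx)]                   # no delimiter needed
    bindings = view(buf, (pos + 3):(pos + 2 + n))
    f(root, rule_idx, bitvec, bindings)
    pos += 3 + n                                  # jump to the next match group
  end
  return nothing
end
```

Because the group length is derived from the rule index, matches are visited in the order they were found, which is the reverse of the previous stack-style traversal.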

This should use asymptotically less memory and in theory should be a lot faster. In practice, it turned out to be slightly slower, and some tests are broken.

Could you help me take a look at the broken tests?

gkronber · 2025-02-11T15:06:06Z

test/tutorials/calculational_logic.jl

+  freges = :((p ⟹ (q ⟹ r)) ⟹ ((p ⟹ q) ⟹ (p ⟹ r)))   # Frege's theorem
  params = SaturationParams(timeout = 12, eclasslimit = 10000, schedulerparams = (match_limit = 6000, ban_length = 5))
-  @test prove(calculational_logic_theory, ex, 2, 10, params)
+  @test_broken true == prove(calculational_logic_theory, freges, 2, 10, params)


This now fails because saturation reaches the enodelimit before the theorem can be proven. Increasing the enodelimit fixes the test.

The question is why it produces more enodes.

The reason seems to be the different order of applying rules (vector instead of stack).
In fact, this test succeeds:

@test true == prove(reverse(calculational_logic_theory), freges, 2, 10, params)

I would argue that the order of applying rules should not matter for EqSat.

I'm not 100% sure why reversing the order has an effect on the number of enodes after x iterations.

The issue that the order of rules affects the results is probably not caused by the changes in this PR but already exists in ale/3.0.

Thanks a lot @gkronber

> I'm not 100% sure why reversing the order has an effect on the number of enodes after x iterations.

Me neither. It's probably related to the union!-ing of e-classes, which does this:

# Make sure class 2 has fewer parents
if length(g.classes[id_1].parents) < length(g.classes[id_2].parents)
  id_1, id_2 = id_2, id_1
end
union!(g.uf, id_1.val, id_2.val)

So the IDs likely change? I'm not sure why we take the e-class with fewer parents; it was taken from egg.
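To illustrate the suspicion that the surviving ID depends on the order of merges, here is a toy sketch. It assumes a union-find in which the first argument becomes the root, and it uses a plain Dict of parent counts. `ToyUnionFind`, `find_root`, `merge_roots!` and `merge_classes!` are hypothetical names and only mimic the swap quoted above.

```julia
# Toy union-find; a hypothetical stand-in for g.uf, not the Metatheory.jl type.
struct ToyUnionFind
  parents::Vector{Int}
end
ToyUnionFind(n::Int) = ToyUnionFind(collect(1:n))

function find_root(uf::ToyUnionFind, i::Int)
  while uf.parents[i] != i
    i = uf.parents[i]
  end
  return i
end

# Assumption: the first argument becomes the root of the merged set.
function merge_roots!(uf::ToyUnionFind, a::Int, b::Int)
  uf.parents[b] = a
  return a
end

# Mimics the swap above: the class with more parents keeps its id.
function merge_classes!(uf::ToyUnionFind, nparents::Dict{Int,Int}, id_1::Int, id_2::Int)
  id_1, id_2 = find_root(uf, id_1), find_root(uf, id_2)
  id_1 == id_2 && return id_1
  if nparents[id_1] < nparents[id_2]
    id_1, id_2 = id_2, id_1
  end
  nparents[id_1] += nparents[id_2]
  return merge_roots!(uf, id_1, id_2)
end
```

In this toy model the canonical id is decided by the parent counts at merge time, and by argument order when the counts are equal, so applying matches in a different order can leave different ids as roots even though the resulting partition is the same.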

We should keep this. I suspect another issue in the rebuilding phase (where we clean up duplicate enodes).

To solve the problem in this PR, should we just increase the enodelimit for the failing tests? Another way would be to reverse the order of applying the matches again (iterating from back to front). Then the behaviour should be the same as before.

@gkronber could it be something related to how analysis_pending and rebuilding are processed?

Adding this

function rebuild_memo!(g::EGraph)
  new_memo = Dict{VecExpr,Id}()
  for eclass in values(g.classes)
    for node in eclass.nodes
      canonicalize!(g, node)
      new_memo[node] = eclass.id
    end
  end
  g.memo = new_memo
end

after every rebuild didn't seem to solve the issue at all, so it looks like the memo is alright.

I think we should not concern ourselves with rebuilding in this PR. The changes in this PR are ok and we can fix the failing test by increasing the enodelimit. I have a branch ready with several improvements to the rebuild procedure.

…lics/Metatheory.jl into ale/3.0-ematch-hashes

codecov-commenter · 2025-02-15T14:00:44Z

Codecov Report

Attention: Patch coverage is 76.81159% with 16 lines in your changes missing coverage. Please review.

Project coverage is 81.42%. Comparing base (0cf46fd) to head (da4816f).

Files with missing lines	Patch %	Lines
src/utils.jl	0.00%	15 Missing ⚠️
src/vecexpr.jl	66.66%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           ale/3.0     #261      +/-   ##
===========================================
- Coverage    81.42%   81.42%   -0.01%     
===========================================
  Files           19       19              
  Lines         1497     1491       -6     
===========================================
- Hits          1219     1214       -5     
+ Misses         278      277       -1

src/EGraphs/egraph.jl

…ses to wrapped vector v.

…lics/Metatheory.jl into ale/3.0-ematch-hashes

src/optbuffer.jl

gkronber and others added 5 commits January 23, 2025 09:19

Add a test set that raises the bug.

8a022f8

Bugfix when matching literals in dynamic rules.

54be8da

experiment

8252ba6

mark failures

2cc911b

Merge branch 'ale/3.0' into ale/3.0-ematch-hashes

b8a8874