Ale/3.0 ematch hashes #261

Merged
0x0f0f0f merged 16 commits into ale/3.0 from ale/3.0-ematch-hashes
Feb 16, 2025

Conversation

0x0f0f0f (Member) commented Feb 2, 2025

@gkronber I managed to simplify the logic behind e-matching a lot yesterday by tinkering around:

The ematch buffer now uses UInt64. There is no pairing anymore, and there are no delimiters. A match in the buffer is laid out as follows: the first element is the id of the e-class that matched; the second is the rule index in the theory (positive or negative depending on direction); the third is isliteral_bitvec, a UInt64 bitmask in which the bit at position i is 1 if and only if the pattern variable with De Bruijn index i in the match is bound to a literal.
The length of a match group is given by length(theory[rule_idx].patvars), so there is no need for magic-number delimiters (although they may be good to have as a double check during development).

This isliteral_bitvec limits the number of pattern variables to 64 per rule (we can easily extend this to 128 by using two buffer elements for isliteral_bitvec).
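
A minimal sketch of what the 128-variable extension could look like, assuming the two bitmask words are kept as a tuple and that variable i (1-based) maps to bit (i - 1) % 64 of word (i - 1) ÷ 64 + 1. The names isliteral128 and setliteral128 are hypothetical and not part of the package:

```julia
# Hypothetical helpers: a 128-bit literal mask stored as two UInt64 words.
# Assumption: variable i (1-based) lives in word (i - 1) ÷ 64 + 1, bit (i - 1) % 64.
function isliteral128(words::NTuple{2,UInt64}, i::Int)
    w, b = divrem(i - 1, 64)
    return ((words[w + 1] >> b) & UInt64(1)) == UInt64(1)
end

# Return a new pair of words with the bit for variable i set.
function setliteral128(words::NTuple{2,UInt64}, i::Int)
    w, b = divrem(i - 1, 64)
    mask = UInt64(1) << b
    return w == 0 ? (words[1] | mask, words[2]) : (words[1], words[2] | mask)
end

# Mark variable 70 as a literal: it lands in the second word.
words = setliteral128((UInt64(0), UInt64(0)), 70)
```

In the buffer itself, the two words would simply occupy two consecutive elements after the rule index.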

Example. If we have @rule f(~x, ~y::Int, ~z) --> g(~x, ~z) * ~y then during pattern matching it will append to the buffer a vector of the form

  • EClass ID that matched root
  • 1 (the rule index; positive means left to right), from which the reader can infer that 1 buffer element of bit flags and 3 buffer elements of variable bindings follow
  • isliteral_bitvec = 0x0000000000000002, because the second pattern variable (~y) matches a literal
  • EClass ID that matched x
  • Hash of literal integer that matched y
  • EClass ID that matched z

instantiate_enode! will know what pattern variable is a literal by receiving and checking against isliteral_bitvec.
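
The layout in the example above can be sketched roughly as follows. This is an illustration only: push_match! and isliteral are hypothetical helpers, the e-class ids are made up, and the mapping of variable i (1-based) to bit i - 1 is an assumption chosen so that the second variable yields 0x2 as in the example.

```julia
# Hypothetical helpers that only illustrate the buffer layout; not the package's API.

# Assumption: pattern variable i (1-based) maps to bit i - 1.
isliteral(bitvec::UInt64, i::Int) = ((bitvec >> (i - 1)) & UInt64(1)) == UInt64(1)

function push_match!(buf::Vector{UInt64}, root::UInt64, rule_idx::Int,
                     bindings::Vector{UInt64}, literal_positions)
    bitvec = UInt64(0)
    for i in literal_positions
        bitvec |= UInt64(1) << (i - 1)
    end
    push!(buf, root)
    # Reinterpreting the signed index keeps its sign (the direction) in a UInt64 slot.
    push!(buf, reinterpret(UInt64, Int64(rule_idx)))
    push!(buf, bitvec)
    append!(buf, bindings)
    return buf
end

buf = UInt64[]
# f(~x, ~y::Int, ~z): made-up root e-class 7, x bound to e-class 3,
# y bound to the hash of the literal 42, z bound to e-class 5.
push_match!(buf, UInt64(7), 1, UInt64[3, hash(42), 5], [2])
```

A consumer such as instantiate_enode! would then call something like isliteral(buf[3], 2) to decide whether the second binding is a literal hash or an e-class id.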

I've also simplified the logic of consuming the buffer in eqsat_apply!. It now reads the buffer from the first position to the end (as a queue). Before, it read the buffer in reverse (as a stack of matches).
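
A rough sketch of reading such a delimiter-free buffer front to back. It assumes the layout described above (root id, signed rule index, bitmask, then one element per pattern variable); read_matches and the npatvars vector are hypothetical stand-ins for eqsat_apply! and length(theory[rule_idx].patvars):

```julia
# Hypothetical reader; npatvars[r] stands in for length(theory[r].patvars).
function read_matches(buf::Vector{UInt64}, npatvars::Vector{Int})
    matches = Tuple{UInt64,Int64,UInt64,Vector{UInt64}}[]
    k = 1
    while k <= length(buf)
        root = buf[k]
        rule_idx = reinterpret(Int64, buf[k + 1])  # the sign encodes the direction
        bitvec = buf[k + 2]
        n = npatvars[abs(rule_idx)]                # number of bindings that follow
        bindings = buf[(k + 3):(k + 2 + n)]
        push!(matches, (root, rule_idx, bitvec, bindings))
        k += 3 + n                                 # jump to the next match
    end
    return matches
end

# Two matches back to back: rule 1 (3 variables) and rule 2 applied in reverse (1 variable).
buf = UInt64[7, 1, 0x2, 3, 99, 5, 8, reinterpret(UInt64, Int64(-2)), 0x0, 4]
ms = read_matches(buf, [3, 1])
```

Because each match's length is recovered from its rule index, the reader never needs a delimiter, but it can only walk the buffer in the forward direction.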

This should use asymptotically less memory and in theory should be a lot faster. In practice, it turned out to be slightly slower, and some tests are broken.

Could you help me take a look at the broken tests?

freges = :((p ⟹ (q ⟹ r)) ⟹ ((p ⟹ q) ⟹ (p ⟹ r))) # Frege's theorem
params = SaturationParams(timeout = 12, eclasslimit = 10000, schedulerparams = (match_limit = 6000, ban_length = 5))
@test prove(calculational_logic_theory, ex, 2, 10, params)
@test_broken true == prove(calculational_logic_theory, freges, 2, 10, params)
Collaborator

This fails now because saturation reaches the enodelimit before the theorem can be proven. Increasing the enodelimit fixes the test.

The question is why it produces more enodes.

The reason seems to be the different order of applying rules (vector instead of stack).
In fact, this test succeeds:

  @test true == prove(reverse(calculational_logic_theory), freges, 2, 10, params)

Collaborator

I would argue that the order of applying rules should not matter for EqSat.

I'm not 100% sure why reversing the order has an effect on the number of enodes after x iterations.

Collaborator

The issue that the order of rules affects the results is probably not caused by the changes in this PR but already exists in ale/3.0.

Member Author

Thanks a lot @gkronber

> I'm not 100% sure why reversing the order has an effect on the number of enodes after x iterations.

Me neither. It's probably because of the union!-ing of e-classes, which may swap the two ids first:

  # Make sure class 2 has fewer parents
  if length(g.classes[id_1].parents) < length(g.classes[id_2].parents)
    id_1, id_2 = id_2, id_1
  end

  union!(g.uf, id_1.val, id_2.val)

So the IDs likely change? I'm not sure why we take the e-class with fewer parents; it was taken from egg.
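
To illustrate how the surviving id can depend on merge order, here is a toy sketch. It is not the package's union-find: ToyUF, toy_union!, merge_classes! and final_root are hypothetical, nparents stands in for length(g.classes[id].parents), and it is an assumption that the first argument of the union stays the root.

```julia
# Toy union-find, for illustration only.
struct ToyUF
    parents::Vector{Int}
end
ToyUF(n::Int) = ToyUF(collect(1:n))

function find(uf::ToyUF, i::Int)
    while uf.parents[i] != i
        i = uf.parents[i]
    end
    return i
end

# Assumption: the first argument stays the root.
function toy_union!(uf::ToyUF, a::Int, b::Int)
    ra, rb = find(uf, a), find(uf, b)
    uf.parents[rb] = ra
    return ra
end

# Same swap as in the snippet above: make sure class b has fewer parents.
function merge_classes!(uf::ToyUF, nparents::Vector{Int}, a::Int, b::Int)
    if nparents[a] < nparents[b]
        a, b = b, a
    end
    root = toy_union!(uf, a, b)
    nparents[root] = nparents[a] + nparents[b]
    return root
end

# Apply the merges in the given order and report the canonical id of class 1.
function final_root(merges)
    uf = ToyUF(3)
    nparents = [1, 2, 2]
    for (a, b) in merges
        merge_classes!(uf, nparents, find(uf, a), find(uf, b))
    end
    return find(uf, 1)
end

root_a = final_root([(1, 2), (1, 3)])
root_b = final_root([(1, 3), (1, 2)])
```

Both orders end with all three classes merged, but root_a and root_b differ, so anything keyed on canonical ids can be visited in a different order. Whether this accounts for the extra enodes is not established in this thread.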

gkronber (Collaborator), Feb 11, 2025

We should keep this. I suspect another issue in the rebuilding phase (where we clean up duplicate enodes).

To solve the problem in this PR, should we just increase the enodelimit for the failing tests? Another way would be to reverse the order of applying the matches again (iterating from back to front). Then the behaviour should be the same as before.

Member Author

@gkronber could it be something related to analysis_pending and how rebuilding is processed?

I tried adding this:

# Rebuild the memo from scratch from the current e-classes.
function rebuild_memo!(g::EGraph)
  new_memo = Dict{VecExpr,Id}()
  for eclass in values(g.classes)
    for node in eclass.nodes
      # Canonicalize the node in place before using it as a key.
      canonicalize!(g, node)
      new_memo[node] = eclass.id
    end
  end
  g.memo = new_memo
end

Calling it after every rebuild didn't seem to solve the issue at all, so it looks like the memo is alright.

Collaborator

I think we should not concern ourselves with rebuilding in this PR. The changes in this PR are ok and we can fix the failing test by increasing the enodelimit. I have a branch ready with several improvements to the rebuild procedure.

codecov-commenter commented Feb 15, 2025

Codecov Report

Attention: Patch coverage is 76.81159% with 16 lines in your changes missing coverage. Please review.

Project coverage is 81.42%. Comparing base (0cf46fd) to head (da4816f).

Files with missing lines | Patch % | Lines
src/utils.jl | 0.00% | 15 Missing ⚠️
src/vecexpr.jl | 66.66% | 1 Missing ⚠️

Additional details and impacted files
@@             Coverage Diff             @@
##           ale/3.0     #261      +/-   ##
===========================================
- Coverage    81.42%   81.42%   -0.01%     
===========================================
  Files           19       19              
  Lines         1497     1491       -6     
===========================================
- Hits          1219     1214       -5     
+ Misses         278      277       -1     

@0x0f0f0f 0x0f0f0f merged commit 32786c0 into ale/3.0 Feb 16, 2025
3 of 4 checks passed
@gkronber gkronber deleted the ale/3.0-ematch-hashes branch March 9, 2026 10:49