Fix ast/base.py hash issues #296

Closed
wants to merge 9 commits into from

Member

zwimer commented Oct 6, 2022

This fixes multiple issues:

  1. Non-integer (and non-str) ._hash values for ASTs. Notice how
    h = (op, kwargs.get('length', None), a_args)
    can propagate to
    self._hash = h

    Fundamentally, this isn't an issue if ._hash is never used as the hash of the AST: if only hash(my_ast) is used, it is fine because that function converts the value on demand. The problem is that ._hash itself is used as the hash.
  2. Differing behavior depending on whether annotations exist. At this if statement
    elif op in {'BVS', 'BVV', 'BoolS', 'BoolV', 'FPS', 'FPV'} and not annotations:
    notice that we special-case floats, for example to work around hash(0.0) == hash(-0.0) here:
    h = (op, kwargs.get('length', None), ("-",) + a_args)
    But this only happens when there are no annotations.
  3. Anything that uses _calc_hash to determine which key to look up in a hash cache for existing ASTs is broken, since in some cases our cache is keyed by the tuples made via our fast path here:
    elif op in {'BVS', 'BVV', 'BoolS', 'BoolV', 'FPS', 'FPV'} and not annotations:
    Ex:
    self = cache.get(h, None)
  4. Reduce code duplication between Base.__new__ and Base.__init_with_annotations__.
  5. Fix a bug in Base.__init_with_annotations__ where we didn't use the leaf cache at all and instead were looking for leaves in the non-leaf cache:
    cache = cls._hash_cache
  6. Non-reproducible hashes, since self._hash was not always calculated via _calc_hash but was sometimes a tuple (see point 1).
  7. Deserializing via base.py's _d function
    def _d(h, cls, state):
    bypasses the leaf cache and incorrectly stores the leaves in the non-leaf cache.
  8. Serialization is incomplete; it loses information such as +0.0 vs -0.0, so deserialization may be incorrect.
  9. (Unknown if applicable) If _d always takes an integer (or str) as a hash, deserialization will lead to ASTs with corrupt _hash values because of point 1.
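
As a rough illustration of points 1, 2, 6, and 8 above, the following sketch shows why a tuple key built from float arguments cannot tell 0.0 from -0.0, and one way a sign-aware key and a process-independent integer hash could be derived. This is not claripy's implementation: naive_key, sign_aware_key, and stable_int_hash are hypothetical names invented for this example, and the use of repr plus MD5 is an assumption made only to keep the sketch short.

```python
import hashlib
import math


def naive_key(op, length, args):
    """Tuple key with the same shape as the fast-path tuple quoted above."""
    return (op, length, args)


def sign_aware_key(op, length, args):
    """Hypothetical variant: also record the sign bit of every float argument."""
    signs = tuple(
        math.copysign(1.0, a) < 0 if isinstance(a, float) else None
        for a in args
    )
    return (op, length, args, signs)


def stable_int_hash(key):
    """Hypothetical helper: derive an integer that does not depend on the process.

    Python's built-in hash() of a str is randomized per process unless
    PYTHONHASHSEED is fixed, so a tuple containing op names hashes differently
    across runs. Hashing the repr() with a digest avoids that for keys made of
    str, int, float, bool, None, and tuples of those.
    """
    digest = hashlib.md5(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# 0.0 == -0.0 and hash(0.0) == hash(-0.0), so the naive keys collide in a dict.
pos = naive_key("FPV", None, (0.0,))
neg = naive_key("FPV", None, (-0.0,))
cache = {pos: "AST for +0.0"}
collides = cache.get(neg) is not None

# The sign-aware keys differ, and so do their integer hashes.
distinct = sign_aware_key("FPV", None, (0.0,)) != sign_aware_key("FPV", None, (-0.0,))
```

In this sketch, collides is true because the dict lookup treats both zeros as the same key, which is the kind of cache mix-up described in the list. Whichever scheme is used, the key point from the list is that it has to be applied on every path (with or without annotations, and during deserialization), and the stored ._hash should always be the integer rather than the tuple.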

Before merging:

  • Add fastpath for annotation-free children (this was removed temporarily to get hashing working first)

Linked: angr/binaries#101

@zwimer zwimer marked this pull request as draft October 6, 2022 03:57
Contributor

github-actions bot commented Oct 6, 2022

Unit Test Results

     94 files  +     84       94 suites  +84   1h 49m 17s ⏱️ + 1h 48m 49s
1 436 tests +1 130  1 342 ✔️ +1 096  90 💤 +30  0 ±0  4 🔥 +4 
1 442 runs  +1 136  1 348 ✔️ +1 102  90 💤 +30  0 ±0  4 🔥 +4 

For more details on these errors, see this check.

Results for commit 17f9151. ± Comparison against base commit adbf300.

♻️ This comment has been updated with latest results.

else:
    h = Base._calc_hash(op, a_args, kwargs) if hash is None else hash
@classmethod
def __init_with_annotations__(cls, op, a_args, depth=None, uneliminatable_annotations=None,
Member

__new_with_annotations__?

Member Author

That is a better name for this function, in my opinion. If we want to change it, I'll need to see where else it is used and make a synchronized change.

Member Author

zwimer commented Oct 25, 2022

Ping @ltfish: please regenerate all caches.

Member

ltfish commented Oct 27, 2022

The reason why you couldn't regenerate caches locally: https://github.com/angr/claripy/blob/fix/hash-fail/claripy/ast/base.py#L250

Can you fix it (by removing the breakpoint() call) and then try generating caches across all projects again?

@zwimer zwimer force-pushed the fix/hash-fail branch 2 times, most recently from 5269a85 to 58981c9 Compare November 5, 2022 02:39
Member

ltfish commented Jan 2, 2023

ROP caches updated. You will want to rebase this branch on master to re-trigger the CI.

Member Author

zwimer commented Jan 9, 2023

@ltfish The same issues I was seeing before still seem to occur in the CI.

Member

ltfish commented Jan 9, 2023

@zwimer what issue?

Member

ltfish commented Jan 9, 2023

The four decompiler test cases fail because the fix/hash-fail branch in angr/binaries misses some new binaries that I pushed over the weekend. So they are totally expected.

Member

ltfish commented Jan 9, 2023

You should only pay attention to the "push" tests since "PR" tests do not synchronize repos based on the same branch name.

Member Author

zwimer commented Jan 9, 2023

Member

ltfish commented Jan 9, 2023

> @ltfish The push CI is failing too: https://github.com/angr/claripy/actions/runs/3841788720/jobs/6576714321

That's exactly what my reply was about.

4 participants