[ty] Implement Duboc's TDD optimization for unions of constraint sets#23881
[ty] Implement Duboc's TDD optimization for unions of constraint sets#23881
Conversation
Typing conformance resultsNo changes detected ✅Current numbersThe percentage of diagnostics emitted that were expected errors held steady at 85.29%. The percentage of expected errors that received a diagnostic held steady at 78.13%. The number of fully passing files held steady at 64/132. |
|
Memory usage reportMemory usage unchanged ✅ |
5eb95c3 to
d50b3f1
Compare
|
| Lint rule | Added | Removed | Changed |
|---|---|---|---|
invalid-await |
40 | 0 | 0 |
invalid-return-type |
1 | 0 | 0 |
| Total | 41 | 0 | 0 |
|
Ah good, this only changes performance characteristics, not semantics (as expected). The ecosystem diagnostic changes are all noise in the flaky projects. There are some ecosystem projects reporting longer processing times, but for the most part they are all small-ish projects. |
A pretty rosy picture given by our Codspeed benchmarks too! None of them appear quite big enough to trigger Codspeed's PR comment, but nice speedups across the board, it looks like: https://codspeed.io/astral-sh/ruff/branches/dcreager%2Fmore-tdd?utm_source=github&utm_medium=check&utm_content=button |
| // iff(a, b) = (a ∧ b) ∨ (¬a ∧ ¬b) | ||
| let a_and_b = self.and_inner(builder, other, other_offset); | ||
| let not_a_and_not_b = | ||
| self.negate(builder) | ||
| .and_inner(builder, other.negate(builder), other_offset); | ||
| a_and_b.or(builder, not_a_and_not_b) |
There was a problem hiding this comment.
I didn't want to derive TDD construction rules for iff, and I don't think this will be any less performant, especially since all of the desugared operations are cached.
| ) | ||
| .unwrap_or(ALWAYS_FALSE); | ||
| if_true.or(builder, if_false) | ||
| let if_uncertain = path |
There was a problem hiding this comment.
This is just a copy of the existing recursion rules for the true and false branches.
| self, | ||
| db: &'db dyn Db, | ||
| builder: &ConstraintSetBuilder<'db>, | ||
| negated: bool, |
There was a problem hiding this comment.
This negated-or-not logic really belongs to ConstraintAssignment, not to Constraint, so I've moved most of this down below.
|
This is so cool. |
| //! Our TDD operations follow Duboc's algorithms (union, intersection) with one correction: the | ||
| //! `n1 > n2` case for difference uses the original Frisch formulation, since Duboc's restructuring | ||
| //! of that case is incorrect. Negation is defined as `1 \ T` (difference from the universe), and |
There was a problem hiding this comment.
If Duboc's reformulations are more efficient, can we find the proper solution for that one case (instead of using the inefficient version)? Or is that too complex? If the n2 > n1 case for difference is correct, is there some symmetry/duality that would allow us to write down the n1 > n2 solution by applying simple replacements?
There was a problem hiding this comment.
Ah I can correct this comment — the mistake in the thesis is just a typo, and as you say, it's easy to construct the right more efficient rule from the n1 < n2 case, and just swapping the 1s and 2s. And that's what we've implemented.
There was a problem hiding this comment.
I actually just removed this snippet. We don't implement difference directly; it's only used because of how we now implement negate(C) as 1 \ C. And the comment we have down below for negate describes the result without having to mention that it's expanded from the definition of difference. That makes the typo irrelevant to us.
There was a problem hiding this comment.
Fantastic work!
I won't have enough time for a full review today, but I'd like to ask some high level (and very beginner) questions:
- I understand that this TDD structure optimizes for cases of large unions by making union operations "lazy". What is the drawback of doing that (if any)? Do we sacrifice any of the "good" properties of BDDs by making these operations lazy?
- There's usually a duality between unions and intersections, and it looks like this TDD structure "violates" that by treating unions in a special way. Are there "dual" examples where we construct large intersections that are still problematic? Or even become worse by doing this?
The performance results on ecosystem projects look great, but it might still be worth writing a microbenchmark for this if possible?
| // The loaded constraint set should NOT be never satisfied (it's a valid union) | ||
| assert!(!loaded.is_never_satisfied(&db)); | ||
|
|
||
| // The loaded constraint set should NOT be always satisfied (it requires specific types) | ||
| assert!(!loaded.is_always_satisfied(&db)); | ||
|
|
||
| // Verify semantic equivalence: loaded ∧ ¬loaded should be never satisfied | ||
| let negated = loaded.negate(&db, &builder2); | ||
| assert!( | ||
| loaded | ||
| .and(&db, &builder2, || negated) | ||
| .is_never_satisfied(&db) | ||
| ); | ||
|
|
||
| // loaded ∨ ¬loaded should be always satisfied | ||
| assert!( | ||
| loaded | ||
| .or(&db, &builder2, || negated) | ||
| .is_always_satisfied(&db) | ||
| ); | ||
|
|
||
| // Also verify iff(loaded, loaded) is always satisfied | ||
| assert!(loaded.iff(&db, &builder2, loaded).is_always_satisfied(&db)); |
There was a problem hiding this comment.
It looks like these properties would be fulfilled by almost all (non-trivial) constraint sets. I understand that we can't directly compare to the owned constraint set above, but could we instead maybe make this a snapshot test that asserts on the pretty-printed structure of the BDD? Or is the structure too complicated to be verified by inspection (to be semantically equivalent to owned)?
Similarly, I would have hoped that we could do something like that for a basic union/intersection/difference operation (like in the tests above) to see if the structure looks like we expect it to?
There was a problem hiding this comment.
Ah, good idea re snapshot tests! That will let us replace all of the has_uncertain_branch tests, too, since we'll see the uncertain branch in the graph output.
It looks like these properties would be fulfilled by almost all (non-trivial) constraint sets.
This also suggests that we could use some property tests here. I've added a TODO comment for that.
The main drawbacks are the increase in complexity of the construction and graph walking rules; and the increase memory usage, since every internal node now has an additional outgoing edge. It's only one extra Otherwise it should be a pure win. The operations that we delay through the laziness are the same ones we would have performed eagerly before. And a TDD with
One way to look at it is that intersections already had a special treatment, in that the constraints along a path through a BDD are AND-ed together. So big intersections tend not to blow up in size the same way, because they naturally collapse to a small compact BDD, with a small number of paths in it. (A union of intersections might blow up, but that's because of the union, not because of the intersection, and the efficiencies added by this representation should help there too.)
Can do 👍 |
|
This PR appears to cut around 50% off the execution time for the snippet in astral-sh/ty#3039 (if you have |
| interior | ||
| .if_true | ||
| .or(builder, interior.if_uncertain) | ||
| .negate(builder), |
There was a problem hiding this comment.
Not sure if that would be beneficial in some way, but I guess we could distribute the negation over the OR here via de Morgan while turning the OR into an AND.
There was a problem hiding this comment.
That will let us share the negation of the if_uncertain branch. Done
| // This is from Frisch's original description of TDDs. If self < other, we check self | ||
| // first. Instead of distributing other into the if_true and if_false branches, we | ||
| // "park" it in the if_uncertain branch. That causes us to only evaluate other "lazily" | ||
| // when needed. |
There was a problem hiding this comment.
I'm still trying to understand if there are potential drawbacks here. I would imagine that there are cases where we would have previously collapsed a Boolean expression purely based on structure. For example, maybe (A = str) OR always-satisfied would have collapsed to always-satisfied (?), but with this TDD structure, we now build the graph
<0> (A = str) 1/1
┡━₁ never
├─? always
└─₀ never
which probably means that we still (need to) evaluate the A = str constraint?
There was a problem hiding this comment.
I would imagine that there are cases where we would have previously collapsed a Boolean expression purely based on structure. For example, maybe
(A = str) OR always-satisfiedwould have collapsed toalways-satisfied(?)
That would be true with fully reduced BDDs, but with quasi-reduced we didn't collapse like that. (A = str) ∨ true would end up producing
(A = str)
┡━₁ always
└─₀ always
So the graph you mention is the correct quasi-reduced TDD.
which probably means that we still (need to) evaluate the
A = strconstraint?
This is purposeful: the quasi-reduction is needed so that the BDD structure includes all of the constraints that were used to create the constraint set, so that we can always make sure to include them in the solutions that we create. I'm investigating tracking that separately, which would allow us to go back to fully reduced BDDs and then we'd simplify as you suggest. (It would simplify to always in both BDDs and TDDs.)
There was a problem hiding this comment.
Is this TDD optimization still applicable if we go back to fully reduced BDDs?
There was a problem hiding this comment.
It is! Duboc describes it in terms of fully reduced BDDs
There was a problem hiding this comment.
FWIW, we explicitly check for this in Elixir's type system:
<0> (A = str) 1/1
┡━₁ never
├─? always
└─₀ never
Is equivalent to ? always and therefore we simply return ? always:
This important to remove nodes from the BDD so you don't end-up reordering "dead nodes".
093682a to
237d499
Compare
| setup_micro_case_with_dependencies( | ||
| "pandas_tdd", | ||
| &["pandas-stubs"], |
There was a problem hiding this comment.
At first I tried to minimize the example down to a single file, with all of the necessary pandas and numpy bits copied in. That would have let me embed that file as a micro benchmark. However, the result was having to pull in rather large parts of those libraries. So instead I added the ability to provide a list of dependencies for a micro benchmark.
There was a problem hiding this comment.
(I also confirmed that this benchmark shows the same ~2× speed reduction as astral-sh/ty#3039)
* main: Pass through ParamSpec relation check for non-overloaded signatures (#23927) [ty] Narrow keyword arguments when unpacking dictionary instances (#23436) [ty] Implement Duboc's TDD optimization for unions of constraint sets (#23881) Remove the repository security policy in favor of the organization one (#24008) Remove the repository code of conduct in favor of the organization one (#24007)
| other_interior.if_uncertain, | ||
| other_offset, | ||
| ), | ||
| other_offset, |
There was a problem hiding this comment.
Btw, we found there is a lot to gain from eagerly checking for bottom/top here. For example, in this case:
(C1 ∧ (C2 ∨ U2))
If C1 is empty, there is no need to compute the union and we got measurable benefits from it. So we encapsulated these checks here:
YMMV but I thought I'd mention!
For awhile we've known that our constraint sets can balloon in size when they involve large unions, and especially intersections of large unions. And we've had ecosystem runs (typically involving projects that depend on numpy) that trigger this pathological behavior. #23848 is the latest example, with a 25× performance regression for the
mesonbuildproject.Guillaume Duboc just defended his PhD thesis in January, and in §11.2, he calls out an optimization first introduced by Frisch for handling these kinds of unions more efficiently. The approach is also described in a post on the Elixir blog. (Frisch and Duboc are both using these BDDs to represent types, whereas we're using them to represent constraints on types, but the same ideas apply.)
Frisch describes the basic idea, which is to add an "uncertain" branch to each BDD node, turning them into ternary decision diagrams (TDDs). Frisch also provides TDD construction rules for union (OR), intersection (AND), and difference. Duboc takes this further and derives more efficient rules for intersection and difference.
This PR implements TDDs and Frisch's and Duboc's construction rules. I've confirmed that this completely eliminates the performance regression for
mesonbuildon #23848.More details on how this works, and why we get these savings:
The key benefit is that they let us represent unions more efficiently. As a simple example, with our quasi-reduced BDDs from before,
a ∨ bbecomes:With TDDs, the rhs of
a ∨ bis moved into the new "uncertain" branch:We already have some savings, since the TDD representation "allows unions to be kept lazy, postponing expansion until needed for intersection or difference operations". We "park" the rhs as-is into the uncertain branch, so we only need one (existing) copy of it. In the BDD case, we had to fold the rhs into the
a = truecase, creating an entire (modified) copy of its subgraph. That means we only need 2 nodes for the TDD instead of 3 for the BDD. (With only two variables, this might not seem like a lot, but we've actually gone from O(n²) nodes to O(n).)We get even more savings when with more complex formulas, like
(a ∨ b) ∧ (c ∨ d). With BDDs, we get:With TDDs, we get:
That's 7 nodes for the BDDs, and 4 for TDDs — still linear in the total number of variables, even though our BDDs are only quasi-reduced. And also note that we never had to modify the TDD that represented the rhs of the AND (
t3 = c ∨ d).