Skip to content

[ty] Implement Duboc's TDD optimization for unions of constraint sets#23881

Merged
dcreager merged 19 commits intomainfrom
dcreager/more-tdd
Mar 17, 2026
Merged

[ty] Implement Duboc's TDD optimization for unions of constraint sets#23881
dcreager merged 19 commits intomainfrom
dcreager/more-tdd

Conversation

@dcreager
Copy link
Member

@dcreager dcreager commented Mar 10, 2026

For awhile we've known that our constraint sets can balloon in size when they involve large unions, and especially intersections of large unions. And we've had ecosystem runs (typically involving projects that depend on numpy) that trigger this pathological behavior. #23848 is the latest example, with a 25× performance regression for the mesonbuild project.

Guillaume Duboc just defended his PhD thesis in January, and in §11.2, he calls out an optimization first introduced by Frisch for handling these kinds of unions more efficiently. The approach is also described in a post on the Elixir blog. (Frisch and Duboc are both using these BDDs to represent types, whereas we're using them to represent constraints on types, but the same ideas apply.)

Frisch describes the basic idea, which is to add an "uncertain" branch to each BDD node, turning them into ternary decision diagrams (TDDs). Frisch also provides TDD construction rules for union (OR), intersection (AND), and difference. Duboc takes this further and derives more efficient rules for intersection and difference.

This PR implements TDDs and Frisch's and Duboc's construction rules. I've confirmed that this completely eliminates the performance regression for mesonbuild on #23848.


More details on how this works, and why we get these savings:

The key benefit is that they let us represent unions more efficiently. As a simple example, with our quasi-reduced BDDs from before, a ∨ b becomes:

<n1> a
┡━₁ <n2> b
│   ┡━₁ true
│   └─₀ true
└─₀ <n3> b
    ┡━₁ true
    └─₀ false

With TDDs, the rhs of a ∨ b is moved into the new "uncertain" branch:

<t1> a
┡━₁ true
├─? <t2> b
│   ┡━₁ true
│   ├─? false
│   └─₀ false
└─₀ false

We already have some savings, since the TDD representation "allows unions to be kept lazy, postponing expansion until needed for intersection or difference operations". We "park" the rhs as-is into the uncertain branch, so we only need one (existing) copy of it. In the BDD case, we had to fold the rhs into the a = true case, creating an entire (modified) copy of its subgraph. That means we only need 2 nodes for the TDD instead of 3 for the BDD. (With only two variables, this might not seem like a lot, but we've actually gone from O(n²) nodes to O(n).)

We get even more savings when with more complex formulas, like (a ∨ b) ∧ (c ∨ d). With BDDs, we get:

<n1> a                      <n7> a
┡━₁ <n2> b                  ┡━₁ <n8> b
│   ┡━₁ true                │   ┡━₁ <n4> c
│   └─₀ true                │   │   ┡━₁ <n5> d
└─₀ <n3> b                  │   │   │   ┡━₁ true
    ┡━₁ true                │   │   │   └─₀ true
    └─₀ false               │   │   └─₀ <n6> d
                            │   │       ┡━₁ true
<n4> c                      │   │       └─₀ false
┡━₁ <n5> d                  │   └─₀ <n4> SHARED
│   ┡━₁ true                └─₀ <n9> b
│   └─₀ true                    ┡━₁ <n4> SHARED
└─₀ <n6> d                      └─₀ false
    ┡━₁ true
    └─₀ false

With TDDs, we get:

<t1> a                      <t5> a
┡━₁ true                    ┡━₁ <t3> c
├─? <t2> b                  │   ┡━₁ true
│   ┡━₁ true                │   ├─? <t4> d
│   ├─? false               │   │   ┡━₁ true
│   └─₀ false               │   │   ├─? false
└─₀ false                   │   │   └─₀ false
                            │   └─₀ false
<t3> c                      ├─? <t6> b
┡━₁ true                    │   ┡━₁ <t3> SHARED
├─? <t4> d                  │   ├─? false
│   ┡━₁ true                │   └─₀ false
│   ├─? false               └─₀ false
│   └─₀ false
└─₀ false

That's 7 nodes for the BDDs, and 4 for TDDs — still linear in the total number of variables, even though our BDDs are only quasi-reduced. And also note that we never had to modify the TDD that represented the rhs of the AND (t3 = c ∨ d).

@dcreager dcreager added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Mar 10, 2026
@astral-sh-bot
Copy link

astral-sh-bot bot commented Mar 10, 2026

Typing conformance results

No changes detected ✅

Current numbers
The percentage of diagnostics emitted that were expected errors held steady at 85.29%. The percentage of expected errors that received a diagnostic held steady at 78.13%. The number of fully passing files held steady at 64/132.

@astral-sh-bot
Copy link

astral-sh-bot bot commented Mar 10, 2026

mypy_primer results

Changes were detected when running on open source projects
scikit-build-core (https://github.com/scikit-build/scikit-build-core)
- src/scikit_build_core/build/wheel.py:99:20: error[no-matching-overload] No overload of bound method `__init__` matches arguments
- Found 60 diagnostics
+ Found 59 diagnostics

@astral-sh-bot
Copy link

astral-sh-bot bot commented Mar 10, 2026

Memory usage report

Memory usage unchanged ✅

@dcreager dcreager force-pushed the dcreager/more-tdd branch from 5eb95c3 to d50b3f1 Compare March 10, 2026 23:05
@astral-sh-bot
Copy link

astral-sh-bot bot commented Mar 11, 2026

ecosystem-analyzer results

Lint rule Added Removed Changed
invalid-await 40 0 0
invalid-return-type 1 0 0
Total 41 0 0

Full report with detailed diff (timing results)

@dcreager
Copy link
Member Author

dcreager commented Mar 11, 2026

Ah good, this only changes performance characteristics, not semantics (as expected). The ecosystem diagnostic changes are all noise in the flaky projects. There are some ecosystem projects reporting longer processing times, but for the most part they are all small-ish projects. scikit-learn is the only beefy project with longer times — 5 seconds vs 4.4. No other large projects have performance regressions. freqtrade and pandas-stubs both have ~2× improvements.

@AlexWaygood
Copy link
Member

Ah good, this only changes performance characteristics, not semantics (as expected). The ecosystem diagnostic changes are all noise in the flaky projects. There are some ecosystem projects reporting longer processing times, but for the most part they are all small-ish projects. scikit-learn is the only beefy project with longer times — 5 seconds vs 4.4. No other large projects have performance regressions. freqtrade and pandas-stubs both have ~2× improvements.

A pretty rosy picture given by our Codspeed benchmarks too! None of them appear quite big enough to trigger Codspeed's PR comment, but nice speedups across the board, it looks like: https://codspeed.io/astral-sh/ruff/branches/dcreager%2Fmore-tdd?utm_source=github&utm_medium=check&utm_content=button

Comment on lines +1992 to +1997
// iff(a, b) = (a ∧ b) ∨ (¬a ∧ ¬b)
let a_and_b = self.and_inner(builder, other, other_offset);
let not_a_and_not_b =
self.negate(builder)
.and_inner(builder, other.negate(builder), other_offset);
a_and_b.or(builder, not_a_and_not_b)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to derive TDD construction rules for iff, and I don't think this will be any less performant, especially since all of the desugared operations are cached.

)
.unwrap_or(ALWAYS_FALSE);
if_true.or(builder, if_false)
let if_uncertain = path
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a copy of the existing recursion rules for the true and false branches.

self,
db: &'db dyn Db,
builder: &ConstraintSetBuilder<'db>,
negated: bool,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This negated-or-not logic really belongs to ConstraintAssignment, not to Constraint, so I've moved most of this down below.

@dcreager dcreager marked this pull request as ready for review March 11, 2026 15:03
@carljm
Copy link
Contributor

carljm commented Mar 11, 2026

This is so cool.

Comment on lines +47 to +49
//! Our TDD operations follow Duboc's algorithms (union, intersection) with one correction: the
//! `n1 > n2` case for difference uses the original Frisch formulation, since Duboc's restructuring
//! of that case is incorrect. Negation is defined as `1 \ T` (difference from the universe), and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Duboc's reformulations are more efficient, can we find the proper solution for that one case (instead of using the inefficient version)? Or is that too complex? If the n2 > n1 case for difference is correct, is there some symmetry/duality that would allow us to write down the n1 > n2 solution by applying simple replacements?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I can correct this comment — the mistake in the thesis is just a typo, and as you say, it's easy to construct the right more efficient rule from the n1 < n2 case, and just swapping the 1s and 2s. And that's what we've implemented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually just removed this snippet. We don't implement difference directly; it's only used because of how we now implement negate(C) as 1 \ C. And the comment we have down below for negate describes the result without having to mention that it's expanded from the definition of difference. That makes the typo irrelevant to us.

Copy link
Contributor

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic work!

I won't have enough time for a full review today, but I'd like to ask some high level (and very beginner) questions:

  • I understand that this TDD structure optimizes for cases of large unions by making union operations "lazy". What is the drawback of doing that (if any)? Do we sacrifice any of the "good" properties of BDDs by making these operations lazy?
  • There's usually a duality between unions and intersections, and it looks like this TDD structure "violates" that by treating unions in a special way. Are there "dual" examples where we construct large intersections that are still problematic? Or even become worse by doing this?

The performance results on ecosystem projects look great, but it might still be worth writing a microbenchmark for this if possible?

Comment on lines +5598 to +5620
// The loaded constraint set should NOT be never satisfied (it's a valid union)
assert!(!loaded.is_never_satisfied(&db));

// The loaded constraint set should NOT be always satisfied (it requires specific types)
assert!(!loaded.is_always_satisfied(&db));

// Verify semantic equivalence: loaded ∧ ¬loaded should be never satisfied
let negated = loaded.negate(&db, &builder2);
assert!(
loaded
.and(&db, &builder2, || negated)
.is_never_satisfied(&db)
);

// loaded ∨ ¬loaded should be always satisfied
assert!(
loaded
.or(&db, &builder2, || negated)
.is_always_satisfied(&db)
);

// Also verify iff(loaded, loaded) is always satisfied
assert!(loaded.iff(&db, &builder2, loaded).is_always_satisfied(&db));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these properties would be fulfilled by almost all (non-trivial) constraint sets. I understand that we can't directly compare to the owned constraint set above, but could we instead maybe make this a snapshot test that asserts on the pretty-printed structure of the BDD? Or is the structure too complicated to be verified by inspection (to be semantically equivalent to owned)?

Similarly, I would have hoped that we could do something like that for a basic union/intersection/difference operation (like in the tests above) to see if the structure looks like we expect it to?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good idea re snapshot tests! That will let us replace all of the has_uncertain_branch tests, too, since we'll see the uncertain branch in the graph output.

It looks like these properties would be fulfilled by almost all (non-trivial) constraint sets.

This also suggests that we could use some property tests here. I've added a TODO comment for that.

@dcreager
Copy link
Member Author

  • I understand that this TDD structure optimizes for cases of large unions by making union operations "lazy". What is the drawback of doing that (if any)? Do we sacrifice any of the "good" properties of BDDs by making these operations lazy?

The main drawbacks are the increase in complexity of the construction and graph walking rules; and the increase memory usage, since every internal node now has an additional outgoing edge. It's only one extra u32, but not nothing.

Otherwise it should be a pure win. The operations that we delay through the laziness are the same ones we would have performed eagerly before. And a TDD with false (null/nothing) for its if_uncertain branch is exactly equivalent to the previous BDD.

  • There's usually a duality between unions and intersections, and it looks like this TDD structure "violates" that by treating unions in a special way. Are there "dual" examples where we construct large intersections that are still problematic? Or even become worse by doing this?

One way to look at it is that intersections already had a special treatment, in that the constraints along a path through a BDD are AND-ed together. So big intersections tend not to blow up in size the same way, because they naturally collapse to a small compact BDD, with a small number of paths in it. (A union of intersections might blow up, but that's because of the union, not because of the intersection, and the efficiencies added by this representation should help there too.)

The performance results on ecosystem projects look great, but it might still be worth writing a microbenchmark for this if possible?

Can do 👍

@AlexWaygood
Copy link
Member

This PR appears to cut around 50% off the execution time for the snippet in astral-sh/ty#3039 (if you have pandas-stubs installed), so that might be helpful when it comes to writing a microbenchmark!

Copy link
Contributor

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Comment on lines +2628 to +2631
interior
.if_true
.or(builder, interior.if_uncertain)
.negate(builder),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that would be beneficial in some way, but I guess we could distribute the negation over the OR here via de Morgan while turning the OR into an AND.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will let us share the negation of the if_uncertain branch. Done

Comment on lines +2673 to +2676
// This is from Frisch's original description of TDDs. If self < other, we check self
// first. Instead of distributing other into the if_true and if_false branches, we
// "park" it in the if_uncertain branch. That causes us to only evaluate other "lazily"
// when needed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still trying to understand if there are potential drawbacks here. I would imagine that there are cases where we would have previously collapsed a Boolean expression purely based on structure. For example, maybe (A = str) OR always-satisfied would have collapsed to always-satisfied (?), but with this TDD structure, we now build the graph

<0> (A = str) 1/1
┡━₁ never
├─? always
└─₀ never

which probably means that we still (need to) evaluate the A = str constraint?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would imagine that there are cases where we would have previously collapsed a Boolean expression purely based on structure. For example, maybe (A = str) OR always-satisfied would have collapsed to always-satisfied (?)

That would be true with fully reduced BDDs, but with quasi-reduced we didn't collapse like that. (A = str) ∨ true would end up producing

(A = str)
┡━₁ always
└─₀ always

So the graph you mention is the correct quasi-reduced TDD.

which probably means that we still (need to) evaluate the A = str constraint?

This is purposeful: the quasi-reduction is needed so that the BDD structure includes all of the constraints that were used to create the constraint set, so that we can always make sure to include them in the solutions that we create. I'm investigating tracking that separately, which would allow us to go back to fully reduced BDDs and then we'd simplify as you suggest. (It would simplify to always in both BDDs and TDDs.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TDD optimization still applicable if we go back to fully reduced BDDs?

Copy link
Member Author

@dcreager dcreager Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is! Duboc describes it in terms of fully reduced BDDs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, we explicitly check for this in Elixir's type system:

<0> (A = str) 1/1
┡━₁ never
├─? always
└─₀ never

Is equivalent to ? always and therefore we simply return ? always:

https://github.com/elixir-lang/elixir/blob/e17069e559331c89cd38c212811be3b359b07d97/lib/elixir/lib/module/types/descr.ex#L5865-L5868

This important to remove nodes from the BDD so you don't end-up reordering "dead nodes".

@dcreager dcreager force-pushed the dcreager/more-tdd branch from 093682a to 237d499 Compare March 17, 2026 01:42
Comment on lines +822 to +824
setup_micro_case_with_dependencies(
"pandas_tdd",
&["pandas-stubs"],
Copy link
Member Author

@dcreager dcreager Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I tried to minimize the example down to a single file, with all of the necessary pandas and numpy bits copied in. That would have let me embed that file as a micro benchmark. However, the result was having to pull in rather large parts of those libraries. So instead I added the ability to provide a list of dependencies for a micro benchmark.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I also confirmed that this benchmark shows the same ~2× speed reduction as astral-sh/ty#3039)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@dcreager dcreager merged commit c71e169 into main Mar 17, 2026
51 checks passed
@dcreager dcreager deleted the dcreager/more-tdd branch March 17, 2026 01:58
carljm added a commit that referenced this pull request Mar 17, 2026
* main:
  Pass through ParamSpec relation check for non-overloaded signatures (#23927)
  [ty] Narrow keyword arguments when unpacking dictionary instances (#23436)
  [ty] Implement Duboc's TDD optimization for unions of constraint sets (#23881)
  Remove the repository security policy in favor of the organization one (#24008)
  Remove the repository code of conduct in favor of the organization one (#24007)
other_interior.if_uncertain,
other_offset,
),
other_offset,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, we found there is a lot to gain from eagerly checking for bottom/top here. For example, in this case:

(C1 ∧ (C2 ∨ U2))

If C1 is empty, there is no need to compute the union and we got measurable benefits from it. So we encapsulated these checks here:

https://github.com/elixir-lang/elixir/blob/e17069e559331c89cd38c212811be3b359b07d97/lib/elixir/lib/module/types/descr.ex#L6050-L6069

YMMV but I thought I'd mention!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ecosystem-analyzer internal An internal refactor or improvement ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants