[ty] Add constraint set implementation #19997

Merged
dcreager merged 44 commits into main from dcreager/dummy-constraint-sets
Aug 29, 2025

Conversation

@dcreager dcreager commented Aug 19, 2025

This PR adds an implementation of constraint sets.

An individual constraint restricts the specialization of a single typevar to be within a particular lower and upper bound: the typevar can only specialize to types that are a supertype of the lower bound, and a subtype of the upper bound. (Note that lower and upper bounds are fully static; we take the bottom and top materializations of the bounds to remove any gradual forms if needed.) Either bound can be “closed” (where the bound is a valid specialization), or “open” (where it is not).
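As a rough sketch (not ty's actual definitions: the field shapes are guesses, and `Ty`/`TypeVarId` are stand-ins), an individual constraint could look something like this:

```rust
// Stand-ins for the real type and typevar representations.
#[derive(Clone, Copy)]
struct Ty;
#[derive(Clone, Copy)]
struct TypeVarId(u32);

// Either end of a constraint's range can include or exclude the bound itself.
#[derive(Clone, Copy)]
enum Bound {
    Closed(Ty), // the bound is itself a valid specialization
    Open(Ty),   // the bound is excluded
}

// Restricts one typevar to specialize only to (fully static) types that are
// a supertype of `lower` and a subtype of `upper`.
#[derive(Clone, Copy)]
struct AtomicConstraint {
    typevar: TypeVarId,
    lower: Bound,
    upper: Bound,
}
```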

You can then build up more complex constraint sets using union, intersection, and negation operations. We use a disjunctive normal form (DNF) representation, just like we do for types: a _constraint set_ is the union of zero or more _clauses_, each of which is the intersection of zero or more individual constraints. Note that the constraint set that contains no clauses is never satisfiable (`⋃ {} = 0`); and the constraint set that contains a single clause, which contains no constraints, is always satisfiable (`⋃ {⋂ {}} = 1`).
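Continuing the sketch above, the DNF shape and its two boundary cases might look like this (the constructor names mirror the `C::always_satisfiable`/`C::unsatisfiable` operations mentioned below; everything else is illustrative):

```rust
// A clause is the intersection (⋂) of individual constraints.
struct Clause {
    constraints: Vec<AtomicConstraint>,
}

// A constraint set is the union (⋃) of clauses.
struct ConstraintSet {
    clauses: Vec<Clause>,
}

impl ConstraintSet {
    // ⋃ {} = 0: no clauses, never satisfiable.
    fn unsatisfiable() -> Self {
        ConstraintSet { clauses: Vec::new() }
    }

    // ⋃ {⋂ {}} = 1: a single empty clause, always satisfiable.
    fn always_satisfiable() -> Self {
        ConstraintSet {
            clauses: vec![Clause {
                constraints: Vec::new(),
            }],
        }
    }
}
```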

One thing to note is that this PR does not change the logic of the actual assignability checks, and in particular, we still aren't ever trying to create an "individual constraint" that constrains a typevar. Technically we're still operating only on `bool`s, since we only ever instantiate `C::always_satisfiable` (i.e., `true`) and `C::unsatisfiable` (i.e., `false`) in the `has_relation_to` methods. So if you thought that #19838 introduced an unnecessarily complex stand-in for `bool`, well here you go, this one is worse! (But still seemingly not yielding a performance regression!) The next PR in this series, #20093, is where we will actually create some non-trivial constraint sets and use them in anger.

That said, the PR does go ahead and update the assignability checks to use the new `ConstraintSet` type instead of `bool`. That part is fairly straightforward since we had already updated the assignability checks to use the `Constraints` trait; we just have to actively choose a different impl type. (For the `is_whatever` variants, which still return a `bool`, we have to convert the constraint set, but the explicit `is_always_satisfiable` calls serve as nice documentation of our intent.)
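A hedged sketch of that conversion pattern, continuing the toy definitions above (the free-function signature of `has_relation_to` here is invented; of these names, only `has_relation_to` and `is_always_satisfiable` appear in the PR):

```rust
impl ConstraintSet {
    // Toy check: a simplified always-satisfiable set contains an empty
    // clause (⋂ {} = 1). The real implementation is more involved.
    fn is_always_satisfiable(&self) -> bool {
        self.clauses.iter().any(|clause| clause.constraints.is_empty())
    }
}

// Stand-in: per the description above, the current checks only ever produce
// the trivial sets (0 and 1), never a real per-typevar constraint.
fn has_relation_to(_sub: Ty, _sup: Ty) -> ConstraintSet {
    ConstraintSet::always_satisfiable()
}

// An `is_whatever` variant keeps its `bool` signature by collapsing the
// constraint set explicitly.
fn is_subtype_of(sub: Ty, sup: Ty) -> bool {
    has_relation_to(sub, sup).is_always_satisfiable()
}
```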

@dcreager (Member, Author) commented:

I'm excited to see how much slower this is than bool... 😬

github-actions bot commented Aug 19, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

github-actions bot commented Aug 19, 2025

mypy_primer results

No ecosystem changes detected ✅
No memory usage changes detected ✅

@AlexWaygood AlexWaygood added the ty Multi-file analysis & type inference label Aug 19, 2025
@dcreager dcreager force-pushed the dcreager/dummy-constraint-sets branch from 594112e to cba78ca on August 21, 2025 01:44
codspeed-hq bot commented Aug 21, 2025

CodSpeed WallTime Performance Report

Merging #19997 will not alter performance

Comparing dcreager/dummy-constraint-sets (5257624) with main (5c2d4d8)

Summary

✅ 8 untouched benchmarks

Base automatically changed from dcreager/relation-with-constraints to main on August 21, 2025 13:30
@dcreager dcreager force-pushed the dcreager/dummy-constraint-sets branch 4 times, most recently from e34afe1 to d5c49ba on August 26, 2025 02:11
@dcreager dcreager force-pushed the dcreager/dummy-constraint-sets branch 2 times, most recently from c1441d2 to c70e3bc on August 27, 2025 01:59
@dcreager dcreager force-pushed the dcreager/dummy-constraint-sets branch from c70e3bc to f06c0f6 on August 27, 2025 19:15
@dcreager (Member, Author) commented:

This is ready for review! #20093 is the real proof that this representation works well. In some ways, this PR is just a setup for that, even though we're introducing a pretty complex new data structure here.

@AlexWaygood (Member) left a comment:

This looks cool! I haven't done a deep review of the code for correctness -- this is mainly a docs review :-)

@AlexWaygood (Member) commented:

Should this PR still have "WIP" in its title? 😄

@dcreager dcreager changed the title from "[ty] WIP: Add constraint set implementation" to "[ty] Add constraint set implementation" on Aug 28, 2025
@dcreager (Member, Author) commented:

> Should this PR still have "WIP" in its title? 😄

Nope! Removed

dcreager and others added 11 commits August 28, 2025 09:07
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
* main:
  Fix mdtest ignore python code blocks (#20139)
  [ty] add support for cyclic legacy generic protocols (#20125)
  [ty] add cycle detection for find_legacy_typevars (#20124)
  Use new diff rendering format in tests (#20101)
  [ty] Fix 'too many cycle iterations' for unions of literals (#20137)
  [ty] No boundness analysis for implicit instance attributes (#20128)
  Bump 0.12.11 (#20136)
  [ty] Benchmarks for problematic implicit instance attributes cases (#20133)
  [`pyflakes`] Fix `allowed-unused-imports` matching for top-level modules (`F401`) (#20115)
  Move GitLab output rendering to `ruff_db` (#20117)
  [ty] Evaluate reachability of non-definitely-bound to Ambiguous (#19579)
  [ty] Introduce a representation for the top/bottom materialization of an invariant generic (#20076)
  [`flake8-async`] Implement `blocking-http-call-httpx` (`ASYNC212`) (#20091)
  [ty] print diagnostics with fully qualified name to disambiguate some cases (#19850)
  [`ruff`] Preserve relative whitespace in multi-line expressions (`RUF033`) (#19647)
@carljm (Contributor) left a comment:

This is really clean and elegant!

A few thoughts, none of them blocking this PR:

  1. There will be a lot of has_relation_to checks where we collect constraints but never evaluate them for anything other than always-satisfied or never-satisfied. Will there be opportunities to improve performance on those checks if we know up-front that all we care about is always, or all we care about is never? It seems like it could potentially allow us to short-circuit a lot of work. (Something to explore in a future PR, not now.)
  2. My recursion spidey-sense tingles a bit about the fact that we use a ConstraintSet in evaluating has_relation_to, and building a ConstraintSet involves a lot of is_subtype_of checks on upper and lower bound types. Is there potential for stack overflow here? Do we need anything additional to prevent that? Can we get into a situation where evaluating a subtype relation causes us to build a constraint set that requires evaluating the original subtype relation? If so, our CycleDetector on has_relation_to wouldn't help, because it would be separate is_subtype_of checks.
  3. Possibly related to (2): the spec says typevar bounds/constraints cannot be generic, but there's been recent discussion of lifting that requirement, and it sounds like Pyrefly will experiment with that. It seems to me that we're well-positioned for that as well (you'd just end up adding constraints on the nested typevar, too), but maybe something to consider.

Comment on lines +285 to +289
```rust
// If two clauses cancel out to 0, that does NOT cause the entire set to become
// 0. We need to keep whatever clauses have already been added to the result,
// and also need to copy over any later clauses that we hadn't processed yet.
self.clauses.extend(existing_clauses);
return;
```
@carljm (Contributor) commented:

Correctness here depends on the invariant that a single new clause can only ever simplify-to-never with one existing clause (i.e. it can't cancel out two different existing clauses.) How do we know that to be the case here? Below with the Simplified case, in contrast, we explicitly handle the possibility that the new clause may simplify with a later clause.

@dcreager (Member, Author) replied:

Any existing clauses must already be simplified relative to each other. So I think that for a new clause to cancel out more than one existing clause, it would have to do it in multiple steps, in a confluent way. So the new clause would "partially" simplify against the first existing clause that we encounter (i.e. simplify a bit but not all the way to 0). (That would trigger the Simplified branch below, where we carry the simplified result over to check against later existing clauses.) Then that partially simplified clause would simplify the "rest of the way" to 0 when we encounter the second (relevant) existing clause. And the "confluent" part means that it would need to happen regardless of the order that the two existing clauses appear in the original result.

I have not done a proof that ☝️ holds, but that's my intuition for why it should™ work.

Comment on lines +540 to +543
```rust
// # `1 ∪ (T ≤ int)`
// # simplifies via saturation to
// # `T ≤ int`
// x: A[U] | A[V]
```
@carljm (Contributor) commented:

I could be missing something here: I can see how we can abstractly say that these constraints apply here, but concretely I don't think this code would ever result in us creating a ConstraintSet at all? There is no assignment here (where we'd create a ConstraintSet ephemerally in has_relation_to_impl, just in order to check whether it's always satisfiable), nor is there a call to a generic function or constructor, where we'd create a ConstraintSet across multiple assignability checks (for each argument) and then solve it in order to generate a specialization.

I think to the extent that there is value in having Python examples (I'm not convinced that it's useful in code at this level of abstraction), it should ideally be examples where we would actually have to exercise the code in question in order to arrive at a correct type-checking answer in the Python example. I'm not quite seeing that in these examples; they are more like re-stating the set theory with a different syntax.

That said, I also don't think we should spend more time right now on improving these examples, so I'm fine leaving them as-is; this is more of a thought for future.

Comment on lines +577 to +582
```rust
if self.subsumes_via_intersection(db, &other) {
    return Simplifiable::Simplified(other);
}
if other.subsumes_via_intersection(db, &self) {
    return Simplifiable::Simplified(self);
}
```
@carljm (Contributor) commented:

We do this via two merged iterations, but I think it can be easily done with a single iteration and a tri-valued return?

Maybe doesn't matter; it depends on how hot this path ends up being in practice, and how many multi-constraint clauses we see.
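A hedged sketch of the single-pass, tri-valued shape being suggested here; integer intervals stand in for the paired constraints, and containment stands in for the real subsumption test:

```rust
#[derive(Debug, PartialEq)]
enum Subsumes {
    Left,    // the left clause subsumes the right
    Right,   // the right clause subsumes the left
    Neither,
}

// One merged walk over the paired constraints: each pair can rule out one
// direction, and we bail out as soon as both directions are ruled out.
fn compare(pairs: &[((i32, i32), (i32, i32))]) -> Subsumes {
    let (mut left_ok, mut right_ok) = (true, true);
    for &((l_lo, l_hi), (r_lo, r_hi)) in pairs {
        left_ok &= l_lo <= r_lo && r_hi <= l_hi; // left contains right
        right_ok &= r_lo <= l_lo && l_hi <= r_hi; // right contains left
        if !left_ok && !right_ok {
            return Subsumes::Neither;
        }
    }
    match (left_ok, right_ok) {
        (true, _) => Subsumes::Left,
        (_, true) => Subsumes::Right,
        _ => Subsumes::Neither,
    }
}
```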

@dcreager (Member, Author) replied:

Added a TODO to consider this

@dcreager (Member, Author) left a comment:

> There will be a lot of has_relation_to checks where we collect constraints but never evaluate them for anything other than always-satisfied or never-satisfied. Will there be opportunities to improve performance on those checks if we know up-front that all we care about is always, or all we care about is never? It seems like it could potentially allow us to short-circuit a lot of work. (Something to explore in a future PR, not now.)

I don't think this would give correct results. This is related to my comment from last week #19838 (comment), and you can see it in the draft of #20093. In that PR I've moved around the non-inferrable typevar match arms in has_relation_to, because we no longer have to be careful about doing some typevar checks before we handle the connectives, and others after. We can rely on how we combine the constraints from the recursive calls to let partially satisfiable recursive constraint sets either (a) "build up" towards 1, or (b) "cancel out" towards 0. Doing so requires having the full constraint sets available, so that we can look at their structure to see what they do when unioned or intersected together. Doing that on bool loses that detail, leading to wrong answers.
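As a toy illustration of that last point (not an example from the PR; it assumes the union operation can simplify a set and its negation away, in the spirit of the saturation rules mentioned above):

```
(T ≤ int) ∨ ¬(T ≤ int)  =  1       -- union of the full constraint sets
false     ∨ false       =  false   -- each branch collapsed to bool first
```

Neither branch on its own is always satisfiable, so collapsing each to a bool before combining yields the wrong answer, while the full sets union to 1.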

> My recursion spidey-sense tingles a bit about the fact that we use a ConstraintSet in evaluating has_relation_to, and building a ConstraintSet involves a lot of is_subtype_of checks on upper and lower bound types. Is there potential for stack overflow here? Do we need anything additional to prevent that? Can we get into a situation where evaluating a subtype relation causes us to build a constraint set that requires evaluating the original subtype relation? If so, our CycleDetector on has_relation_to wouldn't help, because it would be separate is_subtype_of checks.

If the bounds of a constraint don't contain any typevars (a "concrete" type), then I think we're okay, since calculating subtyping of two concrete types can only produce true, false, and combinations of those. (If there are no typevars in the type, then there's nothing to create an AtomicConstraint for.) And so we never hit any of the new logic for combining and simplifying constraints.

If there are bounds that do contain typevars, we do have to worry about this — and the way POPL15 etc solve this is by introducing an ordering on typevars, and saying that typevar bounds can only reference other typevars that are smaller according to that ordering. That ensures that you don't get cycles in the "bounds graph". I figure we'll just use Salsa IDs as our ordering when we get to that part.
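A minimal sketch of that guard, assuming the ordering comes from something like Salsa IDs (all names here are hypothetical):

```rust
// Stand-in for an ID with a total order (e.g. a Salsa ID).
#[derive(PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
struct TypeVarId(u64);

// A typevar's bound may only mention strictly smaller typevars, which keeps
// the "bounds graph" acyclic so recursive bound checks terminate.
fn bound_may_mention(bounded: TypeVarId, mentioned: TypeVarId) -> bool {
    mentioned < bounded
}
```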

> Possibly related to (2): the spec says typevar bounds/constraints cannot be generic, but there's been recent discussion of lifting that requirement, and it sounds like Pyrefly will experiment with that. It seems to me that we're well-positioned for that as well (you'd just end up adding constraints on the nested typevar, too), but maybe something to consider.

I think we will already have to support typevars that have constraints involving other typevars, to handle things like calling a generic function (and inferring its specialization) from inside another (such that the constraints of the calling function are needed to figure out the valid specializations of the called function). So at that point it should be no problem to have typevar bounds mention other typevars, since that would just translate into a constraint that can already contain other typevars. (Modulo the bit above about using an artificial ordering to keep the bounds graph acyclic.)

@dcreager dcreager merged commit a8039f8 into main Aug 29, 2025
38 checks passed
@dcreager dcreager deleted the dcreager/dummy-constraint-sets branch August 29, 2025 00:04
second-ed pushed a commit to second-ed/ruff that referenced this pull request Sep 9, 2025
dcreager added a commit that referenced this pull request Sep 9, 2025
…#20306)

The constraint representation that we added in #19997 was subtly wrong,
in that it didn't correctly model that type assignability is a _partial_
order — it's possible for two types to be incomparable, with neither a
subtype of the other. That means the negation of a constraint like `T ≤
t` (typevar `T` must be a subtype of `t`) is **_not_** `t < T`, but
rather `t < T ∨ T ≁ t` (using ≁ to mean "not comparable to").

That means we need to update our constraint representation to be an
enum, so that we can track both _range_ constraints (upper/lower bound
on the typevar), and these new _incomparable_ constraints.

Since we need an enum now, that also lets us simplify how we were
modeling range constraints. Before, we let the lower/upper bounds be
either open (<) or closed (≤). Now, range constraints are always closed,
and we add a third kind of constraint for _not equivalent_ (≠). We can
translate an open upper bound `T < t` into `T ≤ t ∧ T ≠ t`.
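A sketch of the enum described here (the variant shapes are hypothetical; `Ty` stands in for the real type representation):

```rust
struct Ty;

enum Constraint {
    // lower ≤ T ≤ upper, with both bounds now always closed.
    Range { lower: Ty, upper: Ty },
    // T ≠ t: together with a closed Range, this recovers what an open
    // bound (T < t) used to express.
    NotEquivalent(Ty),
    // T ≁ t: neither T ≤ t nor t ≤ T, which must be representable because
    // assignability is only a partial order.
    Incomparable(Ty),
}
```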

We already had the logic for adding _clauses_ to a _set_ by doing
a pairwise simplification. We copy that over to where we add
_constraints_ to a _clause_. To calculate the intersection or union of
two constraints, the new enum representation makes it easy to break down
all of the possibilities into a small number of cases: intersect range
with range, intersect range with not-equivalent, etc. I've done the math
[here](https://dcreager.net/theory/constraints/) to show that the
simplifications for each of these cases is correct.