Skip to content

Conversation

@dcreager
Copy link
Member

@dcreager dcreager commented Oct 21, 2025

This PR adds another useful simplification when rendering constraint sets: T = int instead of T = int ∧ T ≠ str. (The "smaller" constraint T = int implies the "larger" constraint T ≠ str. Constraint set clauses are intersections, and if one constraint in a clause implies another, we can throw away the "larger" constraint.)

While we're here, we also normalize the bounds of a constraint, so that we equate e.g. T ≤ int | str with T ≤ str | int, and change the ordering of BDD variables so that all constraints with the same typevar are ordered adjacent to each other.

Lastly, we also add a new display_graph helper method that prints out the full graph structure of a BDD.

@dcreager
Copy link
Member Author

Pushing this up for initial CI checks and to checkpoint my work. Will add some mdtests highlighting the rendering change tomorrow

@dcreager dcreager added internal An internal refactor or improvement ty Multi-file analysis & type inference labels Oct 21, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Oct 21, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅

@github-actions
Copy link
Contributor

github-actions bot commented Oct 21, 2025

mypy_primer results

No ecosystem changes detected ✅
No memory usage changes detected ✅

@dcreager dcreager force-pushed the dcreager/constraint-display branch from 4c937b5 to 671582a Compare October 21, 2025 13:56
reveal_type(t1 | t2)
# revealed: ty_extensions.ConstraintSet[(U@_ = bool) ∨ (U@_ = str)]
reveal_type(u1 | u2)
# revealed: ty_extensions.ConstraintSet[((T@_ = bool) ∧ (U@_ = bool)) ∨ ((T@_ = bool) ∧ (U@_ = str)) ∨ ((T@_ = str) ∧ (U@_ = bool)) ∨ ((T@_ = str) ∧ (U@_ = str))]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, this rendered as

((T@_ = bool) ∧ (U@_ = bool) ∧ (U@_ ≠ str)) ∨ ((T@_ = bool) ∧ (U@_ = str)) ∨ ((T@_ = str) ∧ (T@_ ≠ bool) ∧ (U@_ = bool) ∧ (U@_ ≠ str)) ∨ (T@_ = str)

and the display_graph rendering for this BDD is

  (T@_ = str)
  ┡━₁ (U@_ = str)
  │   ┡━₁ always
  │   └─₀ (U@_ = bool)
  │       ┡━₁ always
  │       └─₀ never
  └─₀ (T@_ = bool)
      ┡━₁ (U@_ = str)
      │   ┡━₁ always
      │   └─₀ (U@_ = bool)
      │       ┡━₁ always
      │       └─₀ never
      └─₀ never

@dcreager dcreager marked this pull request as ready for review October 21, 2025 14:47
Comment on lines +1240 to +1248
// For a positive and negative constraint, the ranges have to be disjoint, and the
// positive range implies the negative range.
//
// |---------------|...self...|---|
// ..|---other---|................|
(
ConstraintAssignment::Positive(self_constraint),
ConstraintAssignment::Negative(other_constraint),
) => self_constraint.intersect(db, other_constraint).is_none(),
Copy link
Member Author

@dcreager dcreager Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the big one that lets us simplify T = int ∧ T ≠ str to T = int when displaying.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!


## Other simplifications

When rendering a constraint set, we transform the internal BDD representation into a DNF formula
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might have stumbled across this terminology in one of your PRs before 😅

When you say "render" here, are you just talking about "display rendering"? I.e., this "rendering" should have no impact on semantics?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean "render" in the sense of "create a visual representation of", like you'd render an image. It doesn't change the semantics at all, just how the constraint sets are displayed. I can change this to "display"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sort of have the same question as Micha in #21009 (review) in that case. If we do this simplification only when displaying these constraints, isn't there a risk that we'll be seeing something different to what is "actually happening behind the scenes" when we try to debug these constraints? The simplification logic isn't trivial! It feels like it would be safer to always apply these simplifications, so that we can be sure that what we see when we debug these constraints is actually what's being used in our subtyping/assignability logic by ty in production

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that we're talking about two completely different representations, that are used for different purposes.

As an example, one of the display simplifications that this PR adds would simplify T = int ∧ T ≠ str into T = int. If you know that T is an int, it doesn't add any information to say that it is also not a str.

We could create a BDD for T = int ∧ T ≠ str. Depending on what ordering Salsa assigns IDs, the BDD structure you get is either

(T = int)
  ┡━₁ (T = str)
  │   ┡━₁ never
  │   └─₀ always
  └─₀ never

or

(T = str)
  ┡━₁ never
  └─₀ (T = int)
      ┡━₁ always
      └─₀ never

The equivalent transformation that you'd perform on the BDD would be something like "for any path through the BDD where T = int is true, you don't need to check T = str, because it must always be false".

We actually do already perform that transformation! But it's not necessarily going to simplify the BDD. It might actually make it more complex! For instance, if the BDD is checking other stuff as well, there might be other reasons we need a node that checks T = str. Since every node must have two outgoing edges, it might not be possible to remove it even if we wanted to.

(And going further into the weeds as an aside, in the BDD structure, we actually don't really ever want to throw information away, since it makes it easier to compare two BDDs without having to know in advance which particular constraints they check. The better analogue of the transformation we perform is "it's impossible for T = int to be true and T = str to be false, so when comparing two BDDs ignore any path with those assignments".)

So even though we have performed these transformations on the BDD already, we might still end up with constraints in the rendered formula that are redundant. And so I wanted to be able to print out the simplest formula that faithfully captures the BDD.

that we can be sure that what we see when we debug these constraints is actually what's being used in our subtyping/assignability logic by ty in production

The other possibility here is that we would avoid performing any display substitutions at all. Then the formula that we print out would be a more direct representation of the BDD structure. But I was finding that very confusing, because even simple cases end up looking more complex than they need to. (A prime example is that half of the time x ∨ y will display as x ∨ y, and half the time as (x ∧ ¬y) ∨ y.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. That makes sense to me :-)

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit out of my comfort zone here, so please forgive any dumb questions 😅

This PR adds another useful simplification when rendering constraint sets: T = int instead of T = int ∧ T ≠ str. (The "smaller" constraint T = int implies the "larger" constraint T ≠ str. Constraint set clauses are intersections, and if one constraint in a clause implies another, we can throw away the "larger" constraint.)

What's the reason why we only do this during rendering and don't apply this as a general simplification?

/// for a single typevar are guaranteed to be adjacent in the BDD structure. There are several
/// simplifications that we perform that operate on constraints with the same typevar, and this
/// ensures that we can find all candidate simplifications more easily.
fn ordering(self, db: &'db dyn Db) -> impl Ord {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever :)

@github-actions
Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dcreager dcreager merged commit 766ed5b into main Oct 22, 2025
41 checks passed
@dcreager dcreager deleted the dcreager/constraint-display branch October 22, 2025 17:39
dcreager added a commit that referenced this pull request Oct 22, 2025
* main: (65 commits)
  [ty] Some more simplifications when rendering constraint sets (#21009)
  [ty] Make `attributes.md` mdtests faster (#21030)
  [ty] Set `INSTA_FORCE_PASS` and `INSTA_OUTPUT` environment variables from mdtest.py (#21029)
  [ty] Fall back to `Divergent` for deeply nested specializations (#20988)
  [`ruff`] Autogenerate TypeParam nodes (#21028)
  [ty] Add assertions to ensure that we never call `KnownClass::Tuple.to_instance()` or similar (#21027)
  [`ruff`] Auto generate ast Pattern nodes (#21024)
  [`flake8-simplify`] Skip `SIM911` when unknown arguments are present (#20697)
  Render a diagnostic for syntax errors introduced in formatter tests (#21021)
  [ty] Support goto-definition on vendored typeshed stubs (#21020)
  [ty] Implement go-to for binary and unary operators (#21001)
  [ty] Avoid ever-growing default types (#20991)
  [syntax-errors] Name is parameter and global (#20426)
  [ty] Disable panicking mdtest (#21016)
  [ty] Fix completions at end of file (#20993)
  [ty] Fix out-of-order semantic token for function with regular argument after kwargs (#21013)
  [ty] Fix auto import for files with `from __future__` import (#20987)
  [`fastapi`] Handle ellipsis defaults in FAST002 autofix (`FAST002`) (#20810)
  [`ruff`] Skip autofix for keyword and `__debug__` path params (#20960)
  [`flake8-bugbear`] Skip `B905` and `B912` if <2 iterables and no starred arguments (#20998)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

internal An internal refactor or improvement ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants