
[ty] Build constraint set sequent maps lazily#22577

Merged
dcreager merged 7 commits into main from dcreager/lazy-sequent-map
Jan 20, 2026

Conversation

@dcreager
Member

@dcreager dcreager commented Jan 14, 2026

Before, when building a SequentMap for a constraint set, we would immediately iterate through all of the constraints in the set, and compare each pair of them looking for intersection/implication relationships. It turns out that we often don't need to examine every pair when walking the BDD tree of a constraint set. Instead, we can visit each constraint as we encounter it for the first time in our BDD walk. We do still need to collect all of the constraints in the BDD to ensure that they remain ordered in a consistent way, but we can track that separately and without having to immediately build up the actual sequents.
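As a rough illustration of the difference between the two strategies (all names here are invented for illustration; the real types live in ty's constraint-set code, and plain integers stand in for constraints):

```rust
// Stand-in sketch of eager vs. lazy sequent-map construction.
// Names and types are invented; integers stand in for constraints.

#[derive(Default)]
struct SequentMap {
    constraints: Vec<u32>,      // all constraints, in a fixed order
    visited: Vec<u32>,          // constraints the BDD walk has reached
    compared: Vec<(u32, u32)>,  // pairs actually compared
}

impl SequentMap {
    // Eager: compare every pair up front, O(n^2) comparisons.
    fn eager(constraints: &[u32]) -> Self {
        let mut map = Self { constraints: constraints.to_vec(), ..Self::default() };
        for (i, &a) in constraints.iter().enumerate() {
            for &b in &constraints[i + 1..] {
                map.compared.push((a, b));
            }
        }
        map
    }

    // Lazy: record the constraints (fixing their order) but defer all
    // comparisons until the BDD walk actually encounters a constraint.
    fn lazy(constraints: &[u32]) -> Self {
        Self { constraints: constraints.to_vec(), ..Self::default() }
    }

    // Called the first time the walk reaches `constraint`: compare it
    // only against constraints that have already been visited.
    fn visit(&mut self, constraint: u32) {
        for &seen in &self.visited {
            self.compared.push((seen, constraint));
        }
        self.visited.push(constraint);
    }
}

fn main() {
    let eager = SequentMap::eager(&[1, 2, 3, 4]);
    let mut lazy = SequentMap::lazy(&[1, 2, 3, 4]);
    // The walk only ever touches two of the four constraints...
    lazy.visit(1);
    lazy.visit(3);
    // ...so the lazy map does 1 comparison instead of 6.
    assert_eq!(eager.compared.len(), 6);
    assert_eq!(lazy.compared.len(), 1);
}
```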

@astral-sh-bot

astral-sh-bot bot commented Jan 14, 2026

Typing conformance results

No changes detected ✅

@astral-sh-bot

astral-sh-bot bot commented Jan 14, 2026

mypy_primer results

Changes were detected when running on open source projects
scikit-build-core (https://github.com/scikit-build/scikit-build-core)
- src/scikit_build_core/build/wheel.py:99:20: error[no-matching-overload] No overload of bound method `__init__` matches arguments
- Found 47 diagnostics
+ Found 46 diagnostics

static-frame (https://github.com/static-frame/static-frame)
- static_frame/core/index.py:580:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[TVContainer_co@loc, TVDtype@Index]`, found `InterGetItemLocReduces[Bottom[Series[Any, Any]] | Any, TVDtype@Index]`
+ static_frame/core/index.py:580:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[TVContainer_co@loc, TVDtype@Index]`, found `InterGetItemLocReduces[Any | Bottom[Series[Any, Any]], TVDtype@Index]`
- static_frame/core/node_selector.py:526:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[TVContainer_co@InterfaceSelectQuartet, Any]`, found `InterGetItemLocReduces[Unknown | Bottom[Series[Any, Any]], Any]`
+ static_frame/core/node_selector.py:526:16: error[invalid-return-type] Return type does not match returned value: expected `InterGetItemLocReduces[TVContainer_co@InterfaceSelectQuartet, Any]`, found `InterGetItemLocReduces[Bottom[Series[Any, Any]] | Unknown, Any]`

Memory usage changes were detected when running on open source projects
trio (https://github.com/python-trio/trio)
-     struct fields = ~11MB
+     struct fields = ~12MB

@AlexWaygood AlexWaygood added the ty Multi-file analysis & type inference label Jan 14, 2026
@dcreager dcreager added the internal An internal refactor or improvement label Jan 14, 2026
@codspeed-hq

codspeed-hq bot commented Jan 14, 2026

Merging this PR will improve performance by 5.14%

⚡ 1 improved benchmark
✅ 22 untouched benchmarks
⏩ 30 skipped benchmarks [1]

Performance Changes

Mode Benchmark BASE HEAD Efficiency
WallTime pydantic 8 s 7.6 s +5.14%

Comparing dcreager/lazy-sequent-map (30edce9) with main (6a2cc89)

Open in CodSpeed

Footnotes

  1. 30 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@dcreager dcreager force-pushed the dcreager/lazy-sequent-map branch from 1987266 to df741d7 Compare January 15, 2026 14:16
Comment on lines 3300 to 3304
impl PartialEq for SequentMap<'_> {
fn eq(&self, _other: &Self) -> bool {
false
}
}
Member

@MichaReiser MichaReiser Jan 15, 2026

I think this is the same as setting `#[no_eq]` on the query (Salsa will not do any backdating, meaning all queries reading the sequent_map of a particular interior node will re-run even if it creates the exact same SequentMap). Are there any other fields that we could base Eq on (e.g., the ones that don't change :))?

Looks very straightforward otherwise :)

Member Author

I did confirm that this is equivalent to setting #[no_eq] on the query method. And if I do that, I can remove the PartialEq impl entirely.

But does that mean we would get a separate SequentMap each time we call the tracked query? My intent is that there will be one created for each interior node. (And the updated performance numbers suggest that's what's happening.) I'm okay with a different SequentMap being created for that interior node if it appears again in a later revision, since I think it's correct to invalidate that cache then.

Although maybe we do want to reuse the cache in later revisions? The interior node should entirely determine the contents of the BDD, and walking the BDD later on should yield the same results. Okay I think you've convinced me. (Assuming I understand your suggestion correctly.) To do this I can add the InteriorNode as a field of SequentMap, to record which node the sequent map belongs to, and then have that be the only field that PartialEq checks.
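A minimal sketch of that suggestion (types simplified; `InteriorNode` is a stand-in for the real BDD node type, and the real SequentMap carries more state):

```rust
// Sketch of basing equality on the owning interior node only.
// All names are simplified stand-ins for illustration.

#[derive(Clone, Copy, PartialEq, Eq)]
struct InteriorNode(u32);

struct SequentMap {
    // The node this map was built for. The node fully determines the
    // BDD walk, so it can stand in for the map's contents in Eq.
    node: InteriorNode,
    // Lazily built sequents, deliberately ignored by PartialEq.
    sequents: Vec<(u32, u32)>,
}

impl PartialEq for SequentMap {
    fn eq(&self, other: &Self) -> bool {
        self.node == other.node
    }
}

fn main() {
    let a = SequentMap { node: InteriorNode(7), sequents: vec![(1, 2)] };
    let b = SequentMap { node: InteriorNode(7), sequents: vec![] };
    // Equal nodes compare equal even though the lazy state differs,
    // which lets Salsa backdate instead of always invalidating.
    assert!(a == b);
}
```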

Member

But does that mean we would get a separate SequentMap each time we call the tracked query? My intent is that there will be one created for each interior node. (And the updated performance numbers suggests that's what's happening.) I'm okay with a different SequentMap being created for that interior node if it appears again in a later revision, since I think it's correct to invalidate that cache then.

No, you get the same instance within the same revision and the instance is cached for as long as no data read by the ::sequent_map() query changes.

Taking exported_names as an example here because it's easier to explain the concept. Salsa re-executes the exported_names query every time the file's AST changes. When Salsa's done, it compares the result from running exported_names the last time with the newly computed result. If the two results are equal, then the query didn't change (even though the AST changed). This allows Salsa to reuse the cached result for a query that only depends on exported_names (or only depends on queries that all haven't changed).

If you set no_eq, then you opt out of this optimization and Salsa will always assume that the result changed when any of the query's inputs changed. Which is probably fine in your case.

The one thing we need to be careful about is that the interior mutability code doesn't access db, because a query reading a cached result wouldn't see all its dependencies, breaking Salsa's cache invalidation.
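The backdating behavior described here can be modeled with a toy memoization layer (not real Salsa; all names are invented for illustration):

```rust
// Toy model of Salsa-style backdating: a query re-runs when its input
// changes, but downstream work only happens if the query's *result*
// actually changed. Not real Salsa; names are invented.

struct Memo {
    input: String,
    cached_names: Option<Vec<String>>,
    name_runs: u32,     // how often the exported_names-style query ran
    consumer_runs: u32, // how often downstream work re-ran
}

impl Memo {
    fn new(input: &str) -> Self {
        Self { input: input.to_string(), cached_names: None, name_runs: 0, consumer_runs: 0 }
    }

    fn set_input(&mut self, input: &str) {
        self.input = input.to_string();
    }

    // Stand-in for exported_names: split the input into words, and
    // report whether the result differs from the previous run.
    fn exported_names(&mut self) -> (Vec<String>, bool) {
        self.name_runs += 1;
        let names: Vec<String> = self.input.split_whitespace().map(str::to_string).collect();
        let changed = self.cached_names.as_ref() != Some(&names);
        self.cached_names = Some(names.clone());
        (names, changed)
    }

    // Downstream consumer: only redoes its work if the names changed.
    fn consumer(&mut self) -> usize {
        let (names, changed) = self.exported_names();
        if changed {
            self.consumer_runs += 1;
        }
        names.len()
    }
}

fn main() {
    let mut memo = Memo::new("a b");
    memo.consumer();
    // The "AST" changes, but the exported names stay the same...
    memo.set_input("a  b");
    memo.consumer();
    // ...so the names query re-ran, but the consumer was backdated.
    assert_eq!(memo.name_runs, 2);
    assert_eq!(memo.consumer_runs, 1);
}
```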

Member Author

The one thing we need to be careful is that the internal mutability code doesn't access db because a query reading a cached result wouldn't see all its dependencies, breaking Salsa's cache invalidation.

Is this part true in general? That might be a deal-breaker for this approach, because the interior mutability code will definitely need to access the db.

Member

@MichaReiser MichaReiser Jan 15, 2026

Is this part true in general? That might be a deal-breaker for this approach, because the interior mutability code will definitely need to access the db.

It depends on what you read, but how Salsa tracks dependencies is something I'd consider internal to Salsa (or at least something that requires a lot of documentation).

I don't think we add read dependencies for interned structs, but we used to (CC: @ibraheemdev). But calling any Salsa query, reading a tracked field of a tracked struct, or reading any input makes this approach unsound.

Member

@MichaReiser MichaReiser Jan 15, 2026

Creating any new interned values I think would be unsound.

I guess so is reading, because reading an interned value with low durability must propagate to the outer query. So it's not just about the dependencies; it's also about the query's metadata that needs to be reflected accordingly.

Member Author

Okay that tells me I need to rethink this...

Member

Yeah, interior mutability will not play well with Salsa here. If the interior mutability code creates an interned value without the sequent_map query having a dependency on that interned value, the interned value may be garbage collected, and later calls to sequent_map will read stale data.

Member Author

I found a different way to do this that keeps the performance win but doesn't require interior mutability

@MichaReiser
Member

Cool to see that avoiding the interior mutability is still good for performance :)

@astral-sh-bot

astral-sh-bot bot commented Jan 15, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dcreager dcreager marked this pull request as ready for review January 15, 2026 15:19
@carljm carljm removed their request for review January 16, 2026 00:49
@dcreager dcreager force-pushed the dcreager/lazy-sequent-map branch from 5b240fb to 6555491 Compare January 17, 2026 17:53
@dcreager dcreager force-pushed the dcreager/lazy-sequent-map branch from 122bdf4 to 2d07e1b Compare January 17, 2026 19:57
"create sequent map",
);

fn path_assignments(self, db: &'db dyn Db) -> PathAssignments<'db> {
Member

Nit: Maybe for a separate PR: Would it make sense to use SmallVec here? (What's a "typical" size of `constraints`?)

Member Author

Good idea, done! (I don't have specific numbers, but it's definitely true that most constraint sets will have a smallish number of constraints. I chose 8 more or less at random.)

cycle_initial=sequent_map_cycle_initial,
heap_size=ruff_memory_usage::heap_size,
)]
fn sequent_map(self, db: &'db dyn Db) -> SequentMap<'db> {
Member

The old sequent_map query had cycle handling, but not all queries calling path_assignments have. Was it only the SequentMap::add call that could result in cycles? If so, are there any queries where we need to add cycle handling, now that the cycle is no longer "contained" by the sequent_map query?

Member Author

Was it only the SequentMap::add call that could result in cycles?

Yes, because of the subtype checks that we have to perform to compare the lower/upper bounds of each constraint. That could cause a cycle if we had to create a sequent map while we were in the middle of inferring the lower/upper bound type.

I was using the mdtests + ecosystem tests to verify that the cycle handler isn't needed with this change. My intuition for why is that (a) the lazy processing delays the add calls enough that we're no longer in the middle of inferring the types of the lower/upper bounds, and (b) we might not have to analyze certain constraints at all anymore, since we only look at a constraint once we actually encounter it when walking a BDD tree.

@MichaReiser
Member

Nice. I don't have a lot of context on the BDD work, but the caching makes sense to me. Probably something that would also benefit from within-same-revision LRU caching, to cap the memory usage (see prefect).

@dcreager dcreager merged commit 3b5d0d5 into main Jan 20, 2026
49 checks passed
@dcreager dcreager deleted the dcreager/lazy-sequent-map branch January 20, 2026 21:15