
[ty] Use distributed versions of AND and OR on constraint sets #22614

Merged
dcreager merged 8 commits into main from dcreager/distributed-ops on Jan 23, 2026

Conversation

@dcreager
Member

@dcreager dcreager commented Jan 16, 2026

There are some pathological examples where we create a constraint set that is the AND or OR of several smaller constraint sets. For example, when calling a function with many overloads where the argument is a typevar, we create an OR of constraints, one per overload, each requiring the typevar to specialize to a type compatible with that overload's parameter.

Most functions have a small number of overloads. But there are some examples of methods with 15-20 overloads (pydantic, numpy, our own auto-generated __getitem__ for large tuple literals). For those cases, it is helpful to be more clever about how we construct the final result.

Before, we would just step through the Iterator of elements and accumulate them into a result constraint set. That results in O(n) calls to the underlying AND or OR operator, each of which might have to construct a large temporary BDD tree.

AND and OR are both associative, so we can do better! We now invoke the operator in a "tree" shape (described in more detail in the doc comment). We still have to perform the same number of calls, but more of the calls operate on smaller BDDs, resulting in a much smaller amount of overall work.
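
As a rough illustration of that tree shape (a sketch under invented names, not the actual ty code), an associative operator can be applied in a balanced order by keeping a small stack of (intermediate result, subtree depth) pairs and combining the top two entries whenever their depths match, much like carrying in a binary counter:

```rust
/// Minimal sketch (invented names, not the real implementation) of reducing an
/// iterator with an associative operator in a balanced "tree" shape instead of
/// a linear left fold. For constraint sets, `op` would be the BDD AND or OR.
fn tree_reduce<T>(items: impl IntoIterator<Item = T>, op: impl Fn(T, T) -> T) -> Option<T> {
    // Stack of (intermediate result, depth of the subtree it represents).
    // At most O(log n) entries are live at any time.
    let mut stack: Vec<(T, u32)> = Vec::new();
    for item in items {
        stack.push((item, 0));
        // Whenever the top two entries represent subtrees of equal depth,
        // combine them, like propagating a carry in a binary counter.
        while stack.len() >= 2 && stack[stack.len() - 1].1 == stack[stack.len() - 2].1 {
            let (rhs, depth) = stack.pop().unwrap();
            let (lhs, _) = stack.pop().unwrap();
            stack.push((op(lhs, rhs), depth + 1));
        }
    }
    // Fold the leftover partial results (at most one per depth), combining the
    // smallest subtrees first.
    stack.into_iter().map(|(value, _)| value).rev().reduce(|acc, value| op(value, acc))
}
```

With a linear fold, the left operand keeps growing with every call; with this shape, most calls see two operands of similar, smaller size.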

@dcreager dcreager added the internal (An internal refactor or improvement), performance (Potential performance improvement), and ty (Multi-file analysis & type inference) labels on Jan 16, 2026
@astral-sh-bot

astral-sh-bot bot commented Jan 16, 2026

Typing conformance results

No changes detected ✅

@astral-sh-bot

astral-sh-bot bot commented Jan 16, 2026

mypy_primer results

Changes were detected when running on open source projects
pydantic (https://github.com/pydantic/pydantic)
- pydantic/fields.py:949:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:949:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:989:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:989:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1032:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1032:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1072:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1072:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1115:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1115:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1154:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1154:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1194:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`
+ pydantic/fields.py:1194:5: error[invalid-parameter-default] Default value of type `PydanticUndefinedType` is not assignable to annotated parameter type `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`
- pydantic/fields.py:1573:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, Divergent] | ((dict[str, Divergent], /) -> None) | None`, found `Top[dict[Unknown, Unknown]] | (((dict[str, Divergent], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`
+ pydantic/fields.py:1573:13: error[invalid-argument-type] Argument is incorrect: Expected `dict[str, int | float | str | ... omitted 3 union elements] | ((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) | None`, found `Top[dict[Unknown, Unknown]] | (((dict[str, int | float | str | ... omitted 3 union elements], /) -> None) & ~Top[dict[Unknown, Unknown]]) | None`

discord.py (https://github.com/Rapptz/discord.py)
- discord/app_commands/checks.py:390:42: error[invalid-assignment] Object of type `Coroutine[Any, Any, Cooldown | None] | Cooldown | None` is not assignable to `Cooldown | None`
- Found 539 diagnostics
+ Found 538 diagnostics

prefect (https://github.com/PrefectHQ/prefect)
- src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:461:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, None | Unknown]` is not awaitable
+ src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:461:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, Unknown | None]` is not awaitable
- src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:535:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, None | Unknown]` is not awaitable
+ src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:535:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, Unknown | None]` is not awaitable
- src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:610:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, None | Unknown]` is not awaitable
+ src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:610:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, Unknown | None]` is not awaitable
- src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:685:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, None | Unknown]` is not awaitable
+ src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:685:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, Unknown | None]` is not awaitable
- src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:760:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, None | Unknown]` is not awaitable
+ src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:760:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, Unknown | None]` is not awaitable
- src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:835:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, None | Unknown]` is not awaitable
+ src/integrations/prefect-dbt/prefect_dbt/cli/commands.py:835:21: error[invalid-await] `Unknown | None | Coroutine[Any, Any, Unknown | None]` is not awaitable
- src/integrations/prefect-dbt/prefect_dbt/core/settings.py:94:28: error[invalid-assignment] Object of type `T@resolve_block_document_references | dict[str, Any]` is not assignable to `dict[str, Any]`
+ src/integrations/prefect-dbt/prefect_dbt/core/settings.py:94:28: error[invalid-assignment] Object of type `T@resolve_block_document_references | int | dict[str, Any] | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/integrations/prefect-dbt/prefect_dbt/core/settings.py:99:28: error[invalid-assignment] Object of type `T@resolve_variables | dict[str, Any]` is not assignable to `dict[str, Any]`
+ src/integrations/prefect-dbt/prefect_dbt/core/settings.py:99:28: error[invalid-assignment] Object of type `int | T@resolve_variables | float | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/prefect/_internal/concurrency/api.py:83:29: error[invalid-argument-type] Argument to function `cast_to_call` is incorrect: Expected `() -> Awaitable[T@call_soon_in_new_thread]`, found `() -> T@call_soon_in_new_thread | Awaitable[T@call_soon_in_new_thread]`
- src/prefect/_internal/concurrency/api.py:87:16: error[invalid-return-type] Return type does not match returned value: expected `Call[T@call_soon_in_new_thread]`, found `Call[Awaitable[T@call_soon_in_new_thread]]`
- src/prefect/_internal/concurrency/api.py:99:29: error[invalid-argument-type] Argument to function `cast_to_call` is incorrect: Expected `() -> Awaitable[T@call_soon_in_loop_thread]`, found `() -> T@call_soon_in_loop_thread | Awaitable[T@call_soon_in_loop_thread]`
- src/prefect/_internal/concurrency/api.py:103:16: error[invalid-return-type] Return type does not match returned value: expected `Call[T@call_soon_in_loop_thread]`, found `Call[Awaitable[T@call_soon_in_loop_thread]]`
- src/prefect/_internal/concurrency/api.py:137:29: error[invalid-argument-type] Argument to function `cast_to_call` is incorrect: Expected `() -> Awaitable[T@wait_for_call_in_loop_thread]`, found `(() -> Awaitable[T@wait_for_call_in_loop_thread]) | Call[T@wait_for_call_in_loop_thread]`
- src/prefect/_internal/concurrency/api.py:146:20: error[invalid-return-type] Return type does not match returned value: expected `T@wait_for_call_in_loop_thread`, found `Awaitable[T@wait_for_call_in_loop_thread]`
- src/prefect/_internal/concurrency/api.py:154:29: error[invalid-argument-type] Argument to function `cast_to_call` is incorrect: Expected `() -> Awaitable[T@wait_for_call_in_new_thread]`, found `(() -> T@wait_for_call_in_new_thread) | Call[T@wait_for_call_in_new_thread]`
- src/prefect/_internal/concurrency/api.py:160:16: error[invalid-return-type] Return type does not match returned value: expected `T@wait_for_call_in_new_thread`, found `Awaitable[T@wait_for_call_in_new_thread]`
- src/prefect/_internal/concurrency/api.py:166:46: error[invalid-argument-type] Argument to function `call_soon_in_new_thread` is incorrect: Expected `() -> Awaitable[T@call_in_new_thread]`, found `(() -> T@call_in_new_thread) | Call[T@call_in_new_thread]`
- src/prefect/_internal/concurrency/api.py:174:47: error[invalid-argument-type] Argument to function `call_soon_in_loop_thread` is incorrect: Expected `() -> Awaitable[T@call_in_loop_thread]`, found `(() -> Awaitable[T@call_in_loop_thread]) | Call[T@call_in_loop_thread]`
- src/prefect/_internal/concurrency/api.py:189:29: error[invalid-argument-type] Argument to function `cast_to_call` is incorrect: Expected `() -> Awaitable[T@wait_for_call_in_loop_thread]`, found `(() -> Awaitable[T@wait_for_call_in_loop_thread]) | Call[T@wait_for_call_in_loop_thread]`
- src/prefect/_internal/concurrency/api.py:198:20: error[invalid-return-type] Return type does not match returned value: expected `T@wait_for_call_in_loop_thread`, found `Awaitable[T@wait_for_call_in_loop_thread]`
- src/prefect/_internal/concurrency/api.py:206:29: error[invalid-argument-type] Argument to function `cast_to_call` is incorrect: Expected `() -> Awaitable[T@wait_for_call_in_new_thread]`, found `(() -> T@wait_for_call_in_new_thread) | Call[T@wait_for_call_in_new_thread]`
- src/prefect/_internal/concurrency/api.py:212:16: error[invalid-return-type] Return type does not match returned value: expected `T@wait_for_call_in_new_thread`, found `Awaitable[T@wait_for_call_in_new_thread]`
- src/prefect/_internal/concurrency/api.py:219:46: error[invalid-argument-type] Argument to function `call_soon_in_new_thread` is incorrect: Expected `() -> Awaitable[T@call_in_new_thread]`, found `() -> T@call_in_new_thread | Awaitable[T@call_in_new_thread]`
- src/prefect/_internal/concurrency/api.py:220:16: error[invalid-return-type] Return type does not match returned value: expected `T@call_in_new_thread`, found `Awaitable[T@call_in_new_thread]`
- src/prefect/_internal/concurrency/api.py:230:33: error[invalid-argument-type] Argument to function `cast_to_call` is incorrect: Expected `() -> Awaitable[T@call_in_loop_thread]`, found `() -> T@call_in_loop_thread | Awaitable[T@call_in_loop_thread]`
- src/prefect/_internal/concurrency/api.py:233:47: error[invalid-argument-type] Argument to function `call_soon_in_loop_thread` is incorrect: Expected `() -> Awaitable[T@call_in_loop_thread]`, found `() -> T@call_in_loop_thread | Awaitable[T@call_in_loop_thread]`
- src/prefect/cli/deploy/_core.py:86:21: error[invalid-assignment] Object of type `T@resolve_block_document_references | dict[str, Any]` is not assignable to `dict[str, Any]`
+ src/prefect/cli/deploy/_core.py:86:21: error[invalid-assignment] Object of type `T@resolve_block_document_references | int | dict[str, Any] | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/prefect/cli/deploy/_core.py:87:21: error[invalid-assignment] Object of type `T@resolve_variables` is not assignable to `dict[str, Any]`
+ src/prefect/cli/deploy/_core.py:87:21: error[invalid-assignment] Object of type `int | T@resolve_variables | float | ... omitted 4 union elements` is not assignable to `dict[str, Any]`
- src/prefect/concurrency/_leases.py:89:53: error[invalid-argument-type] Argument to bound method `add_done_callback` is incorrect: Expected `(Future[CoroutineType[Any, Any, None]], /) -> object`, found `def handle_lease_renewal_failure(future: Future[None]) -> Unknown`
- src/prefect/deployments/runner.py:795:70: warning[possibly-missing-attribute] Attribute `__name__` may be missing on object of type `Unknown | (((...) -> Any) & ((*args: object, **kwargs: object) -> object))`
+ src/prefect/deployments/runner.py:795:70: warning[possibly-missing-attribute] Attribute `__name__` may be missing on object of type `Unknown | ((...) -> Any)`
- src/prefect/deployments/steps/core.py:137:38: error[invalid-argument-type] Argument is incorrect: Expected `T@resolve_variables`, found `T@resolve_block_document_references | dict[str, Any]`
+ src/prefect/deployments/steps/core.py:137:38: error[invalid-argument-type] Argument is incorrect: Expected `T@resolve_variables`, found `T@resolve_block_document_references | int | dict[str, Any] | ... omitted 4 union elements`
+ src/prefect/flow_engine.py:812:32: error[invalid-await] `Unknown | R@FlowRunEngine | Coroutine[Any, Any, R@FlowRunEngine]` is not awaitable
+ src/prefect/flow_engine.py:1401:24: error[invalid-await] `Unknown | R@AsyncFlowRunEngine | Coroutine[Any, Any, R@AsyncFlowRunEngine]` is not awaitable
+ src/prefect/flow_engine.py:1482:43: error[invalid-argument-type] Argument to function `next` is incorrect: Expected `SupportsNext[Unknown]`, found `Unknown | R@run_generator_flow_sync`
+ src/prefect/flow_engine.py:1490:21: warning[possibly-missing-attribute] Attribute `throw` may be missing on object of type `Unknown | R@run_generator_flow_sync`
+ src/prefect/flow_engine.py:1524:44: warning[possibly-missing-attribute] Attribute `__anext__` may be missing on object of type `Unknown | R@run_generator_flow_async`
+ src/prefect/flow_engine.py:1531:25: warning[possibly-missing-attribute] Attribute `throw` may be missing on object of type `Unknown | R@run_generator_flow_async`
- src/prefect/flows.py:286:34: error[unresolved-attribute] Object of type `((**P@Flow) -> R@Flow) & ((*args: object, **kwargs: object) -> object)` has no attribute `__name__`
+ src/prefect/flows.py:286:34: error[unresolved-attribute] Object of type `(**P@Flow) -> R@Flow` has no attribute `__name__`
- src/prefect/flows.py:404:68: error[unresolved-attribute] Object of type `((**P@Flow) -> R@Flow) & ((*args: object, **kwargs: object) -> object)` has no attribute `__name__`
+ src/prefect/flows.py:404:68: error[unresolved-attribute] Object of type `(**P@Flow) -> R@Flow` has no attribute `__name__`
- src/prefect/flows.py:1750:53: warning[unused-ignore-comment] Unused blanket `type: ignore` directive
- src/prefect/utilities/asyncutils.py:198:16: error[invalid-return-type] Return type does not match returned value: expected `R@run_coro_as_sync | None`, found `CoroutineType[Any, Any, R@run_coro_as_sync | None]`
- src/prefect/utilities/asyncutils.py:207:20: error[invalid-return-type] Return type does not match returned value: expected `R@run_coro_as_sync | None`, found `CoroutineType[Any, Any, R@run_coro_as_sync | None]`
- src/prefect/utilities/templating.py:320:13: error[invalid-assignment] Invalid subscript assignment with key of type `object` and value of type `T@resolve_block_document_references | dict[str, Any]` on object of type `dict[str, Any]`
+ src/prefect/utilities/templating.py:320:13: error[invalid-assignment] Invalid subscript assignment with key of type `object` and value of type `T@resolve_block_document_references | int | dict[str, Any] | ... omitted 4 union elements` on object of type `dict[str, Any]`
- src/prefect/utilities/templating.py:323:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_block_document_references | dict[str, Any]`, found `list[Unknown | T@resolve_block_document_references | dict[str, Any]]`
+ src/prefect/utilities/templating.py:323:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_block_document_references | dict[str, Any]`, found `list[Unknown | T@resolve_block_document_references | int | ... omitted 5 union elements]`
- src/prefect/utilities/templating.py:437:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `dict[object, Unknown | T@resolve_variables]`
+ src/prefect/utilities/templating.py:437:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `dict[object, Unknown | int | T@resolve_variables | ... omitted 5 union elements]`
- src/prefect/utilities/templating.py:442:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `list[Unknown | T@resolve_variables]`
+ src/prefect/utilities/templating.py:442:16: error[invalid-return-type] Return type does not match returned value: expected `T@resolve_variables`, found `list[Unknown | int | T@resolve_variables | ... omitted 5 union elements]`
- src/prefect/workers/base.py:232:13: error[invalid-argument-type] Argument is incorrect: Expected `T@resolve_variables`, found `T@resolve_block_document_references | dict[str, Any]`
+ src/prefect/workers/base.py:232:13: error[invalid-argument-type] Argument is incorrect: Expected `T@resolve_variables`, found `T@resolve_block_document_references | int | dict[str, Any] | ... omitted 4 union elements`
- src/prefect/workers/base.py:234:20: error[invalid-argument-type] Argument expression after ** must be a mapping type: Found `T@resolve_variables`
+ src/prefect/workers/base.py:234:20: error[invalid-argument-type] Argument expression after ** must be a mapping type: Found `int | T@resolve_variables | float | ... omitted 4 union elements`
- Found 5407 diagnostics
+ Found 5391 diagnostics

core (https://github.com/home-assistant/core)
- homeassistant/util/variance.py:47:12: error[invalid-return-type] Return type does not match returned value: expected `(**_P@ignore_variance) -> _R@ignore_variance`, found `_Wrapped[_P@ignore_variance, int | _R@ignore_variance | float | datetime, _P@ignore_variance, _R@ignore_variance | int | float | datetime]`
- Found 14470 diagnostics
+ Found 14469 diagnostics

No memory usage changes detected ✅

@dcreager dcreager force-pushed the dcreager/distributed-ops branch from 2fc3eac to 13ad151 on January 16, 2026 10:06
Comment on lines 721 to 723
/// In particular, we use the IDs that salsa assigns to each constraint as it is created. This
/// tends to ensure that constraints that are close to each other in the source are also close
/// to each other in the BDD structure.
Member Author

While we're here, I'm updating the BDD variable ordering to be less clever. For the pathological example from #21902 this has a huge benefit.

Member

Should we do this in a separate PR so that we better understand where the performance improvements are coming from?

Member Author

Done: #22777

(I have not addressed the other comments below yet; did this first to see what the performance looks like before considering a fallback for smaller vecs/etc)

Comment on lines +4027 to +4028
└─₀ <5> (U = str) 1/4
┡━₁ <2> SHARED
Member Author

This also updates the tree display to make it more obvious where we're sharing tree structure.

@dcreager dcreager marked this pull request as ready for review January 16, 2026 10:09
@codspeed-hq

codspeed-hq bot commented Jan 16, 2026

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing dcreager/distributed-ops (003ea9b) with main (f516d47)

Summary

✅ 23 untouched benchmarks
⏩ 30 skipped benchmarks [1]

Footnotes

  1. 30 benchmarks were skipped, so the baseline results were used instead.

@MichaReiser
Member

From the summary, I expect this to improve performance and reduce memory usage. Neither seems to be the case. Do you have an understanding of where the performance regression comes from? Could we keep using the "old approach" when not dealing with many overloads?

}
}
result
let inputs: Vec<_> = self.map(|element| f(element).node).collect();
Member

@MichaReiser MichaReiser Jan 16, 2026

Could it be that the collect calls are expensive? Given that distributed_or and distributed_and are very small methods, would it make sense to pass the f through and apply the mapping in distributed_or/and?

Another alternative is to use a SmallVec instead. But I wonder if part of the perf regression simply comes from writing all the constraints to a vec.

Member Author

I've pushed up a new version that doesn't collect into a vec first. I want to see how that affects the perf numbers; if it works well, I plan to add some better documentation comments describing how it works.

Member Author

This seems to work well! Pushed up some documentation of the approach.

@dcreager
Member Author

From the summary, I expect this to improve performance and reduce memory usage. Neither seems to be the case. Do you have an understanding of where the performance regression comes from? Could we keep using the "old approach" when not dealing with many overloads?

Me too! Maybe the collecting is the culprit, as you suggest? Or maybe it's because we're not short-circuiting anymore? I have an idea that might help with both.

@AlexWaygood
Member

AlexWaygood commented Jan 16, 2026

Wow, nice job getting this from a 5x perf regression to a 4% perf improvement!!

@dcreager dcreager force-pushed the dcreager/distributed-ops branch from 02ef331 to 7cf9fe2 on January 16, 2026 14:44
Member

@MichaReiser MichaReiser left a comment

Nice.

It might make sense to specialize distribute_or and distribute_and (or when_any) for &[T] and ExactSizeIterator, as we see a perf regression on many projects (albeit a small one), unless the perf regression is related to the ordering change. I suggest splitting that change into its own PR so that we have a better understanding of where the regression is coming from.

}
}
result
let node = Node::distributed_or(db, self.map(|element| f(element).node));
Member

This PR does improve the performance by a fair bit but it also regresses performance by about 1-2 on most other projects.

It would be lovely if we could use specialization to specialize when_any and when_all for ExactSizeIterator, so that we could use the old implementation if there are only very few items. But that's unlikely to be an option any time soon unless we migrate to nightly Rust.

I went through some when_any usages and:

  • We could implement when_any for &[T] and &FxOrderSet
  • We could add a when_any_exact method (or rename when_any to when_any_iter to advertise the ExactSizeIterator version)

Member Author

An ExactSizeIterator is required to return the exact size from size_hint, too, so I added a fallback that checks if the max size hint is <= 4, and uses the old implementation if so.

Member Author

(That way I didn't have to worry about specialization or adding a new method for exact-sized things)
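
As a sketch of that fallback (hypothetical names, reusing the `tree_reduce` sketch from the PR description; the real code operates on constraint-set nodes): check the upper bound reported by `size_hint`, and only switch to the tree-shaped reduction when the input might be large.

```rust
/// Hypothetical sketch of the size_hint-based fallback described above (not
/// the actual ty code). An `ExactSizeIterator` reports `(len, Some(len))`
/// from `size_hint`, so this also covers the exact-size case.
fn reduce_with_fallback<T>(
    iter: impl Iterator<Item = T>,
    op: impl Fn(T, T) -> T,
) -> Option<T> {
    const SMALL: usize = 4;
    match iter.size_hint() {
        // Small inputs: plain left-to-right fold, matching the old linear behavior.
        (_, Some(upper)) if upper <= SMALL => iter.reduce(|acc, item| op(acc, item)),
        // Possibly large or unknown-size inputs: balanced tree reduction.
        _ => tree_reduce(iter, op),
    }
}
```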


@dcreager dcreager force-pushed the dcreager/distributed-ops branch 2 times, most recently from afc3e00 to ffd9920 on January 20, 2026 22:06
@ibraheemdev
Member

Looks like the last commit regressed performance again?

///
/// ```text
/// linear: (((((a ∨ b) ∨ c) ∨ d) ∨ e) ∨ f) ∨ g
/// tree: ((a ∨ b) ∨ (c ∨ d)) ∨ ((e ∨ f) ∨ g)
Member

@ibraheemdev ibraheemdev Jan 21, 2026

It's unclear to me why this is algorithmically cheaper. If the constraint sets are all disjoint, this is equivalent (modulo node ordering, which we can really control here). The worst case is also equivalent. In any case, both BDD constructions operate on the same set of pairs of nodes — so we end up ORing two medium sized constraint sets instead of a large constraint set with a small one, which should end up being equivalent? I may be missing something here, given that the benchmarks agree this is an improvement.

Member Author

My analysis is admittedly a bit hand-wavy:

If the constraints are all disjoint, then the size of each constraint set (i.e. the number of internal nodes) grows linearly: |a| = 1, |ab| = 2, |abc| = 3, etc.

Each time we invoke a BDD operator, it costs something like O(m + n) time.

So with the linear shape (before this PR), you'd end up with a total cost of

(((((a ∨ b) ∨ c) ∨ d) ∨ e) ∨ f) ∨ g

ab        2
ab c      3
abc d     4
abcd e    5
abcde f   6
abcdef g  7
         27

and with the tree shape, you get

((a ∨ b) ∨ (c ∨ d)) ∨ ((e ∨ f) ∨ g)

ab        2
cd        2
ab cd     4
ef        2
ef g      3
abcd efg  7
         20

I think that works out to an overall cost of O(n^2) for the linear shape, and O(n log n) for the tree shape.
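
Spelling that estimate out a bit (still assuming disjoint constraints, so the intermediate result built from k inputs has about k nodes): the linear shape pays roughly k for its k-th operation, while level i of the tree shape performs n/2^i operations on operands of size about 2^(i-1), so each level costs O(n):

$$\text{linear: } \sum_{k=2}^{n} O(k) = O(n^2), \qquad \text{tree: } \sum_{i=1}^{\log_2 n} \frac{n}{2^i} \cdot O(2^i) = O(n \log n)$$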

Member

@ibraheemdev ibraheemdev Jan 21, 2026

Hmm. I suspect there is something more nuanced here, which is that if a, b, c, d, ... are in source order (i.e., a is the lowest-order variable in the BDD ordering), then applying (((((a ∨ b) ∨ c) ∨ d) ∨ e) ∨ f) ∨ g is O(1) at each step (after #22777 and assuming disjointness). Even assuming an arbitrary order, the probability that the final step has to traverse the entire LHS is the probability that g is the lowest-order variable, while in the distributed approach, you are required to at least traverse one side of the tree entirely, and even the best case has quite low probability (that all the nodes on the LHS are higher-order than the RHS, or vice versa).

That being said, I'm happy to defer to the benchmark results here.

// number of elements seen so far. To do that, whenever the last two elements of the vector
// have the same depth, we apply the operator once to combine those two elements, adding
// the result back to the vector with an incremented depth. (That might let us combine the
// result with the _next_ intermediate result in the vector, and so on.)
Member

Doesn't this assume the input constraint sets are of equal or similar size (though I suppose that is mostly true currently)? Should we try sorting by size here?

Member Author

Assuming my analysis above is correct, I think you're right that sorting would give us a better guarantee of always doing the optimally cheapest ordering of operations. But my worry is that the cost of tracking/calculating the node size, and then doing the sort, would counteract any gain that we'd get.

@dcreager
Member Author

Looks like the last commit regressed performance again?

When I try to reproduce the timing numbers, I don't get the same results as codspeed:

          static_frame   colour_science
main             998.6            3036.
always           982.4            2956.
smart2           998.2            3371.
smart4           1012.            3650.

"main" is the current main branch. "always" is this PR as of commit ffd9920 — that is, always applying the new tree-shaped traversal, and not falling back on the old logic for small iterator sizes. "smart2" and "smart4" have the fallback logic for iterators of length <= 4 and <= 2, respectively.

From my testing, "always" seems like a clear winner, at least for these two repos. I'm going to repush the PR to that state to see if maybe there was some weird temporary inconsistency in the codspeed results?

@dcreager dcreager force-pushed the dcreager/distributed-ops branch from 18f5b2a to 7c30063 on January 21, 2026 18:16
@dcreager dcreager force-pushed the dcreager/distributed-ops branch from 7c30063 to 003ea9b on January 21, 2026 18:48
@dcreager dcreager merged commit 2643fb0 into main Jan 23, 2026
49 checks passed
@dcreager dcreager deleted the dcreager/distributed-ops branch January 23, 2026 18:42
carljm added a commit that referenced this pull request Jan 30, 2026
* main: (62 commits)
  [`refurb`] Do not add `abc.ABC` if already present (`FURB180`) (#22234)
  [ty] Add a new `assert-type-unspellable-subtype` diagnostic (#22815)
  [ty] Avoid duplicate syntax errors for `await` outside functions (#22826)
  [ty] Fix unary operator false-positive for constrained TypeVars (#22783)
  [ty] Fix binary operator false-positive for constrained TypeVars (#22782)
  [ty] Fix false-positive `unsupported-operator` for "symmetric" TypeVars (#22756)
  [`pydocstyle`] Clarify which quote styles are allowed (`D300`) (#22825)
  [ty] Use distributed versions of AND and OR on constraint sets (#22614)
  [ty] Add support for dict literals and dict() calls as default values for parameters with TypedDict types (#22161)
  Document `-` stdin convention in CLI help text (#22817)
  [ty] Make `infer_subscript_expression_types` a method on `Type` (#22731)
  [ty] Simplify `OverloadLiteral::spans` and `OverloadLiteral::parameter_span` (#22823)
  [ty] Require both `*args` and `**kwargs` when calling a `ParamSpec` callable (#22820)
  [ty] Handle tagged errors in conformance (#22746)
  Add `--color` cli option to force colored output (#22806)
  Identify notebooks by LSP didOpen instead of `.ipynb` file extension (#22810)
  [ty] Fix docstring rendering for literal blocks after doctests (#22676)
  [ty] Update salsa to fix out-of-order query validation (#22498)
  [ty] Inline cycle initial and recovery functions (#22814)
  [ty] Pass the generic context through the decorator (#22544)
  ...