[ty] Optimize TDD atom ordering#20098

Merged
sharkdp merged 2 commits into main from david/fix-1091
Aug 27, 2025

@sharkdp sharkdp commented Aug 26, 2025

Summary

While looking at some logging output that I added to ReachabilityConstraintBuilder::add_and_constraint in order to debug astral-sh/ty#1091, I noticed that it seemed to suggest that the TDD was built in an imbalanced way for code like the following, where we have a sequence of non-nested if conditions:

def f(t1, t2, t3, t4, …):
    x = 0
    if t1:
        x = 1
    if t2:
        x = 2
    if t3:
        x = 3
    if t4:
        x = 4

To understand this better, I added some code to the ReachabilityConstraintBuilder to render the resulting TDD. On main, we get a tree that looks like the following: a pattern of N sub-trees, each of which grows linearly with N (the number of if statements). The overall tree structure therefore has on the order of N² nodes (see graph below):

normal order

If we zoom in to one of these subgraphs, we can see what the problem is. When we add new constraints that represent combinations like t1 AND ~t2 AND ~t3 AND t4 AND …, they start with the evaluation of "early" conditions (t1, t2, …). This means that we have to create new subgraphs for each new if condition because there is little sharing with the previous structure. We evaluate the Boolean condition in a right-associative way: t1 AND (~t2 AND (~t3 AND t4)).

If we change the ordering of TDD atoms, we can change that to a left-associative evaluation: (((t1 AND ~t2) AND ~t3) AND t4) …. This means that we can re-use previous subgraphs (t1 AND ~t2), which results in a much more compact graph structure overall (note how "late" conditions are now at the top, and "early" conditions are further down in the graph):

reverse order
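
The sharing argument above can be reproduced with a toy model. The following is a minimal sketch, not ty's implementation: it uses a plain binary decision diagram with hash-consing as a simplified stand-in for the ternary diagrams, and the names `ToyBDD` and `count_nodes` are hypothetical. It builds the growing conjunction t1, t1 AND ~t2, t1 AND ~t2 AND ~t3, … one step at a time and counts every node created, once with "early" atoms near the root and once with "late" atoms near the root.

```python
class ToyBDD:
    """Minimal reduced, ordered BDD with hash-consing (a simplified stand-in for a TDD)."""

    FALSE, TRUE = 0, 1

    def __init__(self, rank):
        self.rank = rank      # atom -> position in the order; smaller is closer to the root
        self.ids = {}         # (atom, low, high) -> node id, so equal nodes are shared
        self.nodes = {}       # node id -> (atom, low, high)
        self.and_cache = {}

    def mk(self, atom, low, high):
        if low == high:       # both branches agree, so testing the atom is redundant
            return low
        key = (atom, low, high)
        if key not in self.ids:
            node_id = len(self.ids) + 2  # ids 0 and 1 are reserved for the terminals
            self.ids[key] = node_id
            self.nodes[node_id] = key
        return self.ids[key]

    def literal(self, atom, positive=True):
        # low is the branch taken when the atom is false, high when it is true
        if positive:
            return self.mk(atom, self.FALSE, self.TRUE)
        return self.mk(atom, self.TRUE, self.FALSE)

    def and_(self, u, v):
        if u == self.FALSE or v == self.FALSE:
            return self.FALSE
        if u == self.TRUE:
            return v
        if v == self.TRUE or u == v:
            return u
        key = (min(u, v), max(u, v))
        if key not in self.and_cache:
            (au, lu, hu), (av, lv, hv) = self.nodes[u], self.nodes[v]
            ru, rv = self.rank(au), self.rank(av)
            if ru < rv:    # u's atom comes first in the order: split on it
                result = self.mk(au, self.and_(lu, v), self.and_(hu, v))
            elif rv < ru:  # v's atom comes first in the order: split on it
                result = self.mk(av, self.and_(u, lv), self.and_(u, hv))
            else:          # same atom: combine the matching branches
                result = self.mk(au, self.and_(lu, lv), self.and_(hu, hv))
            self.and_cache[key] = result
        return self.and_cache[key]


def count_nodes(n, rank):
    """Build t1, t1 AND ~t2, t1 AND ~t2 AND ~t3, ... and count all nodes created."""
    bdd = ToyBDD(rank)
    constraint = bdd.literal(1)
    for k in range(2, n + 1):
        constraint = bdd.and_(constraint, bdd.literal(k, positive=False))
    return len(bdd.nodes)


for n in (10, 20, 40):
    early_on_top = count_nodes(n, rank=lambda atom: atom)   # t1 at the root
    late_on_top = count_nodes(n, rank=lambda atom: -atom)   # newest atom at the root
    print(n, early_on_top, late_on_top)
```

In this toy model, the "early on top" order creates N(N+1)/2 nodes because every step rebuilds the whole chain, while the "late on top" order creates 2N−1 nodes because each step adds one literal and one node that points at the previous constraint. The real builder tracks more than a single conjunction, so this sketch illustrates the sharing effect only and is not expected to match the measured node counts.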

If we count the number of TDD nodes for a growing number of if statements, we can see that this change results in slower growth. It's worth noting that the growth is still superlinear, though:

plot
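
Inputs for a measurement like this can be generated programmatically. The helper below is a sketch under assumptions: `make_sequential_ifs` is a hypothetical name, and the script only produces the Python source with N sequential `if` statements. Counting TDD nodes itself relies on the local instrumentation mentioned in this description, which is not shown here.

```python
def make_sequential_ifs(n: int) -> str:
    """Return the source of a function with n consecutive, non-nested if statements."""
    params = ", ".join(f"t{i}" for i in range(1, n + 1))
    lines = [f"def f({params}):", "    x = 0"]
    for i in range(1, n + 1):
        lines += [f"    if t{i}:", f"        x = {i}"]
    return "\n".join(lines) + "\n"


source = make_sequential_ifs(4)
compile(source, "<generated>", "exec")  # raises SyntaxError if the output is malformed
print(source)
```

Writing the output for several values of `n` to separate files gives a series of inputs of growing size that can be checked one after another.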

On the actual code from the referenced ticket (the t_main.py file reduced to its main function, with the main function limited to 2000 lines instead of 11000 to allow the version on main to run to completion), the effect is much more dramatic. Instead of 26 million TDD nodes (main), we now only create 250 thousand (this branch), which is slightly less than 1%.
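
As a quick check of the "slightly less than 1%" figure, using the rounded node counts quoted above (the variable names are only for illustration):

```python
nodes_main = 26_000_000   # approximate TDD node count on main, as quoted above
nodes_branch = 250_000    # approximate TDD node count on this branch, as quoted above

ratio = nodes_branch / nodes_main
print(f"{ratio:.2%} of the nodes, a {nodes_main / nodes_branch:.0f}x reduction")
# prints "0.96% of the nodes, a 104x reduction"
```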

The change in this PR allows us to build the semantic index and type-check the problematic t_main.py file in astral-sh/ty#1091 in 9 seconds. This is still not great, but an obvious improvement compared to running out of memory after minutes of execution.

An open question remains whether this change is beneficial for all kinds of code patterns, or just this linear sequence of if statements. It does not seem unreasonable to think that referring to "earlier" conditions is generally a good idea, but I learned from Doug that it's generally not possible to find a TDD-construction heuristic that is non-pathological for all kinds of inputs. Fortunately, it seems like this change here results in performance improvements across all of our benchmarks, which should increase the confidence in this change:

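| Benchmark           | Improvement |
|---------------------|-------------|
| hydra-zen           | +13%        |
| DateType            | +5%         |
| sympy (walltime)    | +4%         |
| attrs               | +4%         |
| pydantic (walltime) | +2%         |
| pandas (walltime)   | +2%         |
| altair (walltime)   | +2%         |
| static-frame        | +2%         |
| anyio               | +1%         |
| freqtrade           | +1%         |
| colour-science      | +1%         |
| tanjun              | +1%         |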

closes astral-sh/ty#1091

@sharkdp sharkdp added the ty Multi-file analysis & type inference label Aug 26, 2025

github-actions bot commented Aug 26, 2025

Diagnostic diff on typing conformance tests

No changes detected when running ty on typing conformance tests ✅


github-actions bot commented Aug 26, 2025

mypy_primer results

No ecosystem changes detected ✅

Memory usage changes were detected when running on open source projects
sphinx (https://github.com/sphinx-doc/sphinx)
- TOTAL MEMORY USAGE: ~273MB
+ TOTAL MEMORY USAGE: ~260MB

codspeed-hq bot commented Aug 26, 2025

CodSpeed Instrumentation Performance Report

Merging #20098 will improve performance by 13.81%

Comparing david/fix-1091 (8e988ae) with main (5663426)

Summary

⚡ 2 improvements
✅ 40 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE     | HEAD     | Change  |
|-----------|----------|----------|---------|
| attrs     | 375.1 ms | 360.4 ms | +4.09%  |
| hydra-zen | 803.6 ms | 706.1 ms | +13.81% |

@sharkdp sharkdp changed the title from "[ty] Reverse TDD node ordering" to "[ty] Optimize TDD atom ordering" Aug 27, 2025
@sharkdp sharkdp marked this pull request as ready for review August 27, 2025 10:16
@AlexWaygood AlexWaygood added the performance Potential performance improvement label Aug 27, 2025

@carljm carljm left a comment


Awesome investigation and results!

@AlexWaygood AlexWaygood removed their request for review August 27, 2025 16:14
Co-authored-by: Douglas Creager <dcreager@dcreager.net>
@sharkdp sharkdp merged commit 4b80f5f into main Aug 27, 2025
38 checks passed
@sharkdp sharkdp deleted the david/fix-1091 branch August 27, 2025 18:42
carljm added a commit to leandrobbraga/ruff that referenced this pull request Aug 27, 2025
* main:
  [`ruff`] Preserve relative whitespace in multi-line expressions (`RUF033`) (astral-sh#19647)
  [ty] Optimize TDD atom ordering (astral-sh#20098)
  [`airflow`] Extend `AIR311` and `AIR312` rules (astral-sh#20082)
  [ty] Preserve qualifiers when accessing attributes on unions/intersections (astral-sh#20114)
  [ty] Fix the inferred interface of specialized generic protocols (astral-sh#19866)
  [ty] Infer slightly more precise types for comprehensions (astral-sh#20111)
  [ty] Add more tests for protocols (astral-sh#20095)
  [ty] don't eagerly unpack aliases in user-authored unions (astral-sh#20055)
  [`flake8-use-pathlib`] Update links to the table showing the correspondence between `os` and `pathlib` (astral-sh#20103)
  [`flake8-use-pathlib`] Make `PTH100` fix unsafe because it can change behavior (astral-sh#20100)
  [`flake8-use-pathlib`] Delete unused `Rule::OsSymlink` enabled check (astral-sh#20099)
  [ty] Add search paths info to unresolved import diagnostics (astral-sh#20040)
  [`flake8-logging-format`] Add auto-fix for f-string logging calls (`G004`) (astral-sh#19303)
  Add a `ScopeKind` for the `__class__` cell (astral-sh#20048)
  Fix incorrect D413 links in docstrings convention FAQ (astral-sh#20089)
  [ty] Refactor inlay hints structure to use separate parts (astral-sh#20052)
@zanieb zanieb added the great writeup A wonderful example of a quality contribution label Aug 28, 2025
second-ed pushed a commit to second-ed/ruff that referenced this pull request Sep 9, 2025

Labels

great writeup: A wonderful example of a quality contribution
performance: Potential performance improvement
ty: Multi-file analysis & type inference

Development

Successfully merging this pull request may close these issues.

OOM when trying to process Tauon's codebase

5 participants