1 change: 1 addition & 0 deletions crates/ty/Cargo.toml
@@ -57,6 +57,7 @@ toml = { workspace = true }

[features]
default = []
tdd-stats = ["ty_python_semantic/tdd-stats"]

[lints]
workspace = true
74 changes: 73 additions & 1 deletion crates/ty/docs/environment.md
@@ -30,6 +30,79 @@ If set to `"1"` or `"true"`, ty will enable flamegraph profiling.
This creates a `tracing.folded` file that can be used to generate flame graphs
for performance analysis.

### `TY_TDD_STATS_REPORT`

Controls reporting of TDD (ternary decision diagram) size statistics after `ty check`.

This is a developer-focused diagnostic mode and is only available when ty is built
with the `tdd-stats` cargo feature.
Without this feature, no TDD stats collection code is compiled into the binary.

Accepted values:

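- `0`: Disable TDD stats output (default when unset).
- `1` or `short`: Emit the project summary and per-file counts (for at most 20 files) through tracing target `ty.tdd_stats`.
  Includes both `reachability_*` and `narrowing_*` counters.
- `2` or `full`: Emit the `short` output for every file, plus per-scope summaries (including node histograms) and
  up to 20 hot-node diagnostics per scope through tracing target `ty.tdd_stats`.

The `short` and `full` names are case-insensitive, and surrounding whitespace is ignored.
Values greater than `2` are treated as `2` (`full`) and log a warning; any other unrecognized value disables the report and logs a warning.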

Example:

```bash
TY_TDD_STATS_REPORT=1 TY_LOG=ty.tdd_stats=info cargo run -p ty --features tdd-stats -- check path/to/project
```

```bash
TY_TDD_STATS_REPORT=2 TY_LOG=ty.tdd_stats=info cargo run -p ty --features tdd-stats -- check path/to/project
```

For tracing filter syntax and logging tips, see [Tracing](./tracing.md).
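
For reference, the value handling above boils down to the following mapping. This is a simplified standalone sketch, not ty's code: `Mode` and `parse_mode` are hypothetical names, and the warnings ty logs are only noted in comments.

```rust
/// Report verbosity (hypothetical stand-in for ty's internal mode enum).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum Mode {
    Off,
    Short,
    Full,
}

/// Maps the raw `TY_TDD_STATS_REPORT` value (`None` when unset) to a mode.
fn parse_mode(raw: Option<&str>) -> Mode {
    let Some(raw) = raw else {
        return Mode::Off; // unset: report disabled
    };
    let raw = raw.trim();
    if raw.eq_ignore_ascii_case("short") {
        return Mode::Short;
    }
    if raw.eq_ignore_ascii_case("full") {
        return Mode::Full;
    }
    match raw.parse::<u8>() {
        Ok(0) => Mode::Off,
        Ok(1) => Mode::Short,
        // `2`, and any larger value that fits in a `u8`, maps to full
        // (ty logs a warning for values above `2`).
        Ok(_) => Mode::Full,
        // Anything else disables the report (ty logs a warning here too).
        Err(_) => Mode::Off,
    }
}

fn main() {
    for raw in [None, Some("0"), Some("short"), Some(" FULL "), Some("7"), Some("yes")] {
        println!("{raw:?} -> {:?}", parse_mode(raw));
    }
}
```

Passing `None` models an unset variable.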

#### How to read `tdd_stats_summary` and `tdd_stats_file`

`short` and `full` both emit project-level and per-file summary lines on the `ty.tdd_stats` target:

```text
INFO tdd_stats_summary verbose=... files=... max_root_nodes=... tdd_pool_roots=... tdd_pool_nodes=... reachability_roots=... reachability_nodes=... reachability_max_depth=... narrowing_roots=... narrowing_nodes=... narrowing_max_depth=...
INFO tdd_stats_file file=... max_root_nodes=... tdd_pool_roots=... tdd_pool_nodes=... reachability_roots=... reachability_nodes=... reachability_max_depth=... narrowing_roots=... narrowing_nodes=... narrowing_max_depth=...
```

Field meanings:

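- `verbose`: Effective verbosity level (`1` for short, `2` for full; summary line only).
- `files`: Number of analyzed files with non-empty stats (summary line only).
- `max_root_nodes`: Largest interior-node count of any single root.
- `tdd_pool_roots` / `tdd_pool_nodes`: Root and node counts reported for the TDD pool.
- `reachability_roots` / `narrowing_roots`: Unique root-constraint ID counts, split by family.
- `reachability_nodes` / `narrowing_nodes`: Interior-node visits, split by family.
- `reachability_max_depth` / `narrowing_max_depth`: Maximum TDD depth observed in each family.

On the summary line, the `*_roots` and `*_nodes` counts are summed across files, while `max_root_nodes` and the `*_max_depth` fields are maxima across files.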
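
The summary line combines per-file values roughly as sketched below, mirroring the aggregation in `write_tdd_stats_report`: `*_roots` and `*_nodes` counts are summed across files, and `max_root_nodes` and the `*_max_depth` fields keep the maximum. `FileStats`, `Summary`, `summarize`, and `example_files` are hypothetical names, and the `tdd_pool_*` fields are left out for brevity.

```rust
/// Hypothetical per-file record with the counters printed on a
/// `tdd_stats_file` line (the `tdd_pool_*` fields are omitted).
struct FileStats {
    max_root_nodes: usize,
    reachability_roots: usize,
    reachability_nodes: usize,
    reachability_max_depth: usize,
    narrowing_roots: usize,
    narrowing_nodes: usize,
    narrowing_max_depth: usize,
}

/// Values as they would appear on the `tdd_stats_summary` line.
#[derive(Debug, Default, PartialEq)]
struct Summary {
    files: usize,
    max_root_nodes: usize,
    reachability_roots: usize,
    reachability_nodes: usize,
    reachability_max_depth: usize,
    narrowing_roots: usize,
    narrowing_nodes: usize,
    narrowing_max_depth: usize,
}

/// Counts are summed; `max_*` fields keep the maximum.
/// The input is assumed to contain only files with non-empty stats.
fn summarize(files: &[FileStats]) -> Summary {
    files.iter().fold(Summary::default(), |mut acc, f| {
        acc.files += 1;
        acc.max_root_nodes = acc.max_root_nodes.max(f.max_root_nodes);
        acc.reachability_roots += f.reachability_roots;
        acc.reachability_nodes += f.reachability_nodes;
        acc.reachability_max_depth = acc.reachability_max_depth.max(f.reachability_max_depth);
        acc.narrowing_roots += f.narrowing_roots;
        acc.narrowing_nodes += f.narrowing_nodes;
        acc.narrowing_max_depth = acc.narrowing_max_depth.max(f.narrowing_max_depth);
        acc
    })
}

/// Two made-up files used for illustration.
fn example_files() -> Vec<FileStats> {
    vec![
        FileStats {
            max_root_nodes: 12,
            reachability_roots: 3,
            reachability_nodes: 20,
            reachability_max_depth: 4,
            narrowing_roots: 2,
            narrowing_nodes: 5,
            narrowing_max_depth: 2,
        },
        FileStats {
            max_root_nodes: 30,
            reachability_roots: 1,
            reachability_nodes: 30,
            reachability_max_depth: 7,
            narrowing_roots: 4,
            narrowing_nodes: 8,
            narrowing_max_depth: 3,
        },
    ]
}

fn main() {
    println!("{:?}", summarize(&example_files()));
}
```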

#### How to read `tdd_stats_hot_node` (full mode)

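In `full` mode, ty emits one `tdd_stats_scope` line per scope (its `node_histogram` field lists space-separated `<interior_nodes>=><count>` bins), each followed by up to 20 `tdd_stats_hot_node` lines on the `ty.tdd_stats` target: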

```text
INFO tdd_stats_hot_node file=... scope_id=... kind=... subtree_nodes=... root_uses=... score=... roots=...
```

Field meanings:

- `kind`: Which root family this hotspot is attributed to (`reachability` or `narrowing`).
- `subtree_nodes`: Number of interior nodes reachable from this node (its subtree size).
- `root_uses`: Number of root constraints whose TDD includes this interior node.
- `score`: Hotness score, computed as `subtree_nodes * root_uses`.
- `roots`: Up to five sample roots that include this node, separated by ` | `. Each sample's location appears as:
  - `line:column` when the source location was resolved from an AST node.
  - `unknown` as a fallback when the source location could not be resolved.
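
To build intuition for how `subtree_nodes`, `root_uses`, and `score` relate, here is a small sketch over a hypothetical DAG of interior nodes. It assumes that a node's subtree counts the node itself and that `children[n]` holds only interior children. It is not how ty stores or walks TDDs, and it recomputes reachability per node, which is fine for a toy graph but quadratic in general.

```rust
use std::collections::HashSet;

/// Collects every interior node reachable from `start`, including `start`.
/// `children[n]` lists the interior children of node `n` (terminals omitted).
fn reachable(children: &[Vec<usize>], start: usize) -> HashSet<usize> {
    let mut seen = HashSet::new();
    let mut stack = vec![start];
    while let Some(node) = stack.pop() {
        // `insert` returns false for nodes already visited via a shared path.
        if seen.insert(node) {
            stack.extend(children[node].iter().copied());
        }
    }
    seen
}

#[derive(Debug, PartialEq)]
struct HotNode {
    node: usize,
    subtree_nodes: usize,
    root_uses: usize,
    score: usize,
}

/// Scores every interior node as `subtree_nodes * root_uses`, highest first.
fn hot_nodes(children: &[Vec<usize>], roots: &[usize]) -> Vec<HotNode> {
    // Reachable set of each root's TDD, computed once.
    let root_sets: Vec<HashSet<usize>> =
        roots.iter().map(|&root| reachable(children, root)).collect();
    let mut hot: Vec<HotNode> = (0..children.len())
        .map(|node| {
            let subtree_nodes = reachable(children, node).len();
            let root_uses = root_sets.iter().filter(|set| set.contains(&node)).count();
            HotNode { node, subtree_nodes, root_uses, score: subtree_nodes * root_uses }
        })
        .collect();
    // Highest score first; ties broken by node index for stable output.
    hot.sort_by(|a, b| b.score.cmp(&a.score).then(a.node.cmp(&b.node)));
    hot
}

fn main() {
    // Roots 0 and 1 share the subtree 2 -> 3; root 4 also points at 3.
    let children = vec![vec![2], vec![2], vec![3], vec![], vec![3]];
    let roots = [0, 1, 4];
    for hot in hot_nodes(&children, &roots) {
        println!("{hot:?}");
    }
}
```

In the example, node `2` ranks first: its subtree is small, but two of the three roots share it.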

Practical interpretation:

- Higher `score` means a larger subtree reused by many roots, hence a likely hotspot.
- If multiple top rows share very similar `roots`, they are often one clustered hotspot, not unrelated issues.
- Use `subtree_nodes` to spot deep/large structures and `root_uses` to spot broad fanout; both can dominate runtime.
- In `short` mode, compare `reachability_*` vs `narrowing_*` first to decide which family to investigate in `full` mode.
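
When scanning many `tdd_stats_hot_node` rows, one quick way to spot clustered hotspots is to group rows by their `roots` field. The sketch below is a hypothetical post-processing helper, not part of ty. It assumes the rows were already parsed and that sample roots are joined with ` | ` (as in `write_tdd_stats_report`), and it only merges rows whose sample-root sets are identical, so "very similar" sets still need a manual look. The root labels in `main` are made up.

```rust
use std::collections::BTreeMap;

/// The two `tdd_stats_hot_node` fields used here (hypothetical parsed form).
struct HotRow<'a> {
    score: u64,
    /// The `roots` field: sample roots joined with " | ".
    roots: &'a str,
}

/// Groups rows whose sample roots form the same set (order-insensitive).
/// Returns `(roots, row_count, best_score)`, highest best score first.
fn cluster_by_roots(rows: &[HotRow<'_>]) -> Vec<(Vec<String>, usize, u64)> {
    let mut clusters: BTreeMap<Vec<String>, (usize, u64)> = BTreeMap::new();
    for row in rows {
        // Normalize the sample list into a sorted, deduplicated key.
        let mut key: Vec<String> = row.roots.split(" | ").map(str::to_owned).collect();
        key.sort();
        key.dedup();
        let entry = clusters.entry(key).or_insert((0, 0));
        entry.0 += 1;
        entry.1 = entry.1.max(row.score);
    }
    let mut ranked: Vec<_> = clusters
        .into_iter()
        .map(|(roots, (count, best))| (roots, count, best))
        .collect();
    ranked.sort_by(|a, b| b.2.cmp(&a.2));
    ranked
}

fn main() {
    let rows = [
        HotRow { score: 120, roots: "12:5 | 30:1" },
        HotRow { score: 90, roots: "30:1 | 12:5" },
        HotRow { score: 40, roots: "7:3" },
    ];
    for (roots, count, best) in cluster_by_roots(&rows) {
        println!("{count} row(s), best score {best}: {roots:?}");
    }
}
```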

### `TY_MAX_PARALLELISM`

Specifies an upper limit for the number of tasks ty is allowed to run in parallel.
@@ -84,4 +157,3 @@ Path to user-level configuration directory on Unix systems.
### `_CONDA_ROOT`

Used to determine the root install path of Conda.

7 changes: 7 additions & 0 deletions crates/ty/docs/mypy_primer.md
@@ -33,6 +33,13 @@ mypy_primer \
This will show the diagnostics diff for the `black` project between the `main` branch and your `my/feature` branch. To run the
diff for all projects we currently enable in CI, use `--project-selector "/($(paste -s -d'|' crates/ty_python_semantic/resources/primer/good.txt))\$"`.

If you're investigating performance regressions, you can also enable TDD stats while running `mypy_primer`
by setting `TY_TDD_STATS_REPORT` and `TY_LOG=ty.tdd_stats=info` in the environment. This only has an effect when the ty binaries were built with the `tdd-stats` feature.
For baseline comparisons, `TY_TDD_STATS_REPORT=1` is usually easiest to diff.
Switch to `TY_TDD_STATS_REPORT=2` when you need scope-level histograms and hot-node details.
(`short`/`full` are aliases for `1`/`2`.)
See [`TY_TDD_STATS_REPORT`](./environment.md#ty_tdd_stats_report) and [Tracing](./tracing.md) for details.
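
To diff a baseline and a candidate run, it can help to compare the numeric fields of matching lines, such as the two `tdd_stats_summary` lines. A minimal sketch, assuming each field is printed as a whitespace-separated `key=value` token like in the `TY_TDD_STATS_REPORT` samples; the exact layout depends on your tracing output settings, and `numeric_fields` / `changed_fields` are hypothetical helpers rather than part of ty or `mypy_primer`. Fields present in only one of the lines are ignored.

```rust
use std::collections::BTreeMap;

/// Extracts `key=value` tokens with integer values from one log line.
/// Tokens without `=` (level, message name) and non-numeric values are skipped.
fn numeric_fields(line: &str) -> BTreeMap<&str, i64> {
    line.split_whitespace()
        .filter_map(|token| token.split_once('='))
        .filter_map(|(key, value)| value.parse::<i64>().ok().map(|v| (key, v)))
        .collect()
}

/// Lists `(field, old, new)` for numeric fields present in both lines whose values differ.
fn changed_fields<'a>(old: &'a str, new: &'a str) -> Vec<(&'a str, i64, i64)> {
    let old_fields = numeric_fields(old);
    let new_fields = numeric_fields(new);
    old_fields
        .iter()
        .filter_map(|(&key, &before)| {
            let after = *new_fields.get(key)?;
            (before != after).then_some((key, before, after))
        })
        .collect()
}

fn main() {
    let baseline = "INFO tdd_stats_summary verbose=1 files=10 reachability_nodes=500 narrowing_nodes=80";
    let candidate = "INFO tdd_stats_summary verbose=1 files=10 reachability_nodes=620 narrowing_nodes=80";
    for (field, before, after) in changed_fields(baseline, candidate) {
        println!("{field}: {before} -> {after} ({:+})", after - before);
    }
}
```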

You can also take a look at the [full list of ecosystem projects]. Note that some of them might still need a `ty_paths` configuration
option to work correctly.

21 changes: 21 additions & 0 deletions crates/ty/docs/tracing.md
@@ -75,6 +75,27 @@ whether one if its children has the file `x.py`.
**Note**: Salsa currently logs the entire memoized values. In our case, the source text and parsed AST.
This very quickly leads to extremely long outputs.

#### Show TDD stats traces

`tdd-stats` builds can emit TDD-size diagnostics on the `ty.tdd_stats` tracing target.
This is useful for analyzing TDD blowups and hot nodes.

```bash
TY_TDD_STATS_REPORT=2 TY_LOG=ty.tdd_stats=info cargo run -p ty --features tdd-stats -- check path/to/project
```

For quick regression checks, start with:

```bash
TY_TDD_STATS_REPORT=1 TY_LOG=ty.tdd_stats=info cargo run -p ty --features tdd-stats -- check path/to/project
```

`short` mode reports `reachability_*` and `narrowing_*` counters, which makes old/new diffs easier to scan.
Use `full` when you need per-scope histograms and `tdd_stats_hot_node` output.
(`short`/`full` are aliases for `1`/`2`.)

See [`TY_TDD_STATS_REPORT`](./environment.md#ty_tdd_stats_report) for modes and output interpretation.

## Tracing and Salsa

Be mindful about using `tracing` in Salsa queries, especially when using `warn` or `error` because it isn't guaranteed
230 changes: 230 additions & 0 deletions crates/ty/src/lib.rs
@@ -27,6 +27,8 @@ use ty_project::metadata::settings::TerminalSettings;
use ty_project::watch::ProjectWatcher;
use ty_project::{CollectReporter, Db, suppress_all_diagnostics, watch};
use ty_project::{ProjectDatabase, ProjectMetadata};
#[cfg(feature = "tdd-stats")]
use ty_python_semantic::semantic_index::tdd_stats_for_file;
use ty_server::run_server;
use ty_static::EnvVars;

@@ -179,6 +181,10 @@ fn run_check(args: CheckCommand) -> anyhow::Result<ExitStatus> {
}
Err(_) => {}
}
drop(stdout);

#[cfg(feature = "tdd-stats")]
write_tdd_stats_report(&db, printer);

std::mem::forget(db);

@@ -189,6 +195,230 @@ fn run_check(args: CheckCommand) -> anyhow::Result<ExitStatus> {
}
}

#[cfg(feature = "tdd-stats")]
#[derive(Copy, Clone, Debug, Eq, PartialEq)]
enum TddStatsReportMode {
    Off = 0,
    Short,
    Full,
}

#[cfg(feature = "tdd-stats")]
impl TddStatsReportMode {
    const MAX_LEVEL: u8 = TddStatsReportMode::Full as u8;

    const fn from_verbose(verbose: u8) -> Self {
        match verbose {
            0 => TddStatsReportMode::Off,
            1 => TddStatsReportMode::Short,
            _ => TddStatsReportMode::Full,
        }
    }

    const fn verbose(self) -> u8 {
        self as u8
    }
}

#[cfg(feature = "tdd-stats")]
fn tdd_stats_report_mode() -> TddStatsReportMode {
    match std::env::var(EnvVars::TY_TDD_STATS_REPORT) {
        Ok(raw) => {
            let raw = raw.trim();
            if raw.eq_ignore_ascii_case("short") {
                return TddStatsReportMode::Short;
            }

            if raw.eq_ignore_ascii_case("full") {
                return TddStatsReportMode::Full;
            }

            match raw.parse::<u8>() {
                Ok(verbose) if verbose <= TddStatsReportMode::MAX_LEVEL => {
                    TddStatsReportMode::from_verbose(verbose)
                }
                Ok(verbose) => {
                    tracing::warn!(
                        "Value for `TY_TDD_STATS_REPORT` is capped at {} (full), got `{verbose}`.",
                        TddStatsReportMode::MAX_LEVEL
                    );
                    TddStatsReportMode::Full
                }
                Err(_) => {
                    tracing::warn!(
                        "Unknown value for `TY_TDD_STATS_REPORT`: `{raw}`. Valid values are `0`, `1`, `2`, `short`, and `full`."
                    );
                    TddStatsReportMode::Off
                }
            }
        }
        Err(_) => TddStatsReportMode::Off,
    }
}

#[cfg(feature = "tdd-stats")]
fn write_tdd_stats_report(db: &ProjectDatabase, _printer: Printer) {
    use ty_python_semantic::semantic_index::tdd_stats::FileTddStatsSummary;

    enum FileSummaryIter<'a> {
        Full(std::slice::Iter<'a, FileTddStatsSummary>),
        Short(std::iter::Take<std::slice::Iter<'a, FileTddStatsSummary>>),
    }

    impl<'a> Iterator for FileSummaryIter<'a> {
        type Item = &'a FileTddStatsSummary;

        fn next(&mut self) -> Option<Self::Item> {
            match self {
                FileSummaryIter::Full(iter) => iter.next(),
                FileSummaryIter::Short(iter) => iter.next(),
            }
        }
    }

    let mode = tdd_stats_report_mode();
    if matches!(mode, TddStatsReportMode::Off) {
        return;
    }

    let project = db.project();
    let files = project.files(db);

    let mut summaries = Vec::new();
    for file in &files {
        if !project.should_check_file(db, file) {
            continue;
        }
        let summary = tdd_stats_for_file(db, file);
        if summary.total_roots > 0 {
            summaries.push(summary);
        }
    }

    if summaries.is_empty() {
        return;
    }

    summaries.sort();
    let max_root_nodes = summaries
        .iter()
        .map(|summary| summary.max_interior_nodes)
        .max()
        .unwrap_or(0);
    let tdd_pool_nodes: usize = summaries.iter().map(|summary| summary.tdd_pool_nodes).sum();
    let tdd_pool_roots: usize = summaries.iter().map(|summary| summary.tdd_pool_roots).sum();
    let reachability_roots: usize = summaries
        .iter()
        .map(|summary| summary.reachability_roots)
        .sum();
    let reachability_interior_nodes: usize = summaries
        .iter()
        .map(|summary| summary.reachability_interior_nodes)
        .sum();
    let reachability_max_depth = summaries
        .iter()
        .map(|summary| summary.reachability_max_depth)
        .max()
        .unwrap_or(0);
    let narrowing_roots: usize = summaries
        .iter()
        .map(|summary| summary.narrowing_roots)
        .sum();
    let narrowing_interior_nodes: usize = summaries
        .iter()
        .map(|summary| summary.narrowing_interior_nodes)
        .sum();
    let narrowing_max_depth = summaries
        .iter()
        .map(|summary| summary.narrowing_max_depth)
        .max()
        .unwrap_or(0);

    tracing::info!(
        target: "ty.tdd_stats",
        verbose = mode.verbose(),
        files = summaries.len(),
        max_root_nodes,
        tdd_pool_roots,
        tdd_pool_nodes,
        reachability_roots,
        reachability_nodes = reachability_interior_nodes,
        reachability_max_depth,
        narrowing_roots,
        narrowing_nodes = narrowing_interior_nodes,
        narrowing_max_depth,
        "tdd_stats_summary"
    );

    let is_full = mode.verbose() >= TddStatsReportMode::Full.verbose();
    let file_summaries = if is_full {
        FileSummaryIter::Full(summaries.iter())
    } else {
        FileSummaryIter::Short(summaries.iter().take(20))
    };

    for summary in file_summaries {
        tracing::info!(
            target: "ty.tdd_stats",
            file = %summary.file_path,
            max_root_nodes = summary.max_interior_nodes,
            tdd_pool_roots = summary.tdd_pool_roots,
            tdd_pool_nodes = summary.tdd_pool_nodes,
            reachability_roots = summary.reachability_roots,
            reachability_nodes = summary.reachability_interior_nodes,
            reachability_max_depth = summary.reachability_max_depth,
            narrowing_roots = summary.narrowing_roots,
            narrowing_nodes = summary.narrowing_interior_nodes,
            narrowing_max_depth = summary.narrowing_max_depth,
            "tdd_stats_file"
        );

        if is_full {
            let mut scopes = summary.scopes.clone();
            scopes.sort();
            for scope in &scopes {
                let mut histogram = String::new();
                for bin in &scope.histogram {
                    if !histogram.is_empty() {
                        histogram.push(' ');
                    }
                    let _ = write!(&mut histogram, "{}=>{}", bin.interior_nodes, bin.count);
                }
                tracing::info!(
                    target: "ty.tdd_stats",
                    file = %summary.file_path,
                    scope_id = scope.scope_id.as_u32(),
                    root_count = scope.root_count,
                    total_nodes = scope.total_interior_nodes,
                    max_root_nodes = scope.max_interior_nodes,
                    tdd_pool_roots = scope.tdd_pool_roots,
                    tdd_pool_nodes = scope.tdd_pool_nodes,
                    reachability_max_depth = scope.reachability_max_depth,
                    narrowing_max_depth = scope.narrowing_max_depth,
                    node_histogram = %histogram,
                    "tdd_stats_scope"
                );

                let mut hot_nodes = scope.hot_nodes.clone();
                hot_nodes.sort();
                for hot in hot_nodes.iter().take(20) {
                    tracing::info!(
                        target: "ty.tdd_stats",
                        file = %summary.file_path,
                        scope_id = scope.scope_id.as_u32(),
                        kind = hot.kind,
                        subtree_nodes = hot.subtree_interior_nodes,
                        root_uses = hot.root_uses,
                        score = hot.score,
                        roots = %hot.sample_roots.join(" | "),
                        "tdd_stats_hot_node"
                    );
                }
            }
        }
    }
}

#[derive(Copy, Clone)]
pub enum ExitStatus {
/// Checking was successful and there were no errors.
1 change: 1 addition & 0 deletions crates/ty_python_semantic/Cargo.toml
@@ -73,6 +73,7 @@ quickcheck_macros = { workspace = true }
schemars = ["dep:schemars", "dep:serde_json"]
serde = ["ruff_db/serde", "dep:serde", "ruff_python_ast/serde"]
testing = []
tdd-stats = []

[[test]]
name = "mdtest"