[ty] Handle tagged errors in conformance by WillDuke · Pull Request #22746 · astral-sh/ruff

WillDuke · 2026-01-19T21:38:32Z

Summary

This PR adds support for tagged errors in the conformance suite. A tag groups expected errors across several lines, and the presence of a "+" symbol in the tag determines whether errors are allowed on multiple lines of the group or on only one. Tags are collected from the expected diagnostics and attached to the ty diagnostics on the corresponding lines. Diagnostics are then compared as groups, keyed by tag if one is present and otherwise by line.

Diagnostics matching a tagged error are checked to ensure that errors were raised on the correct number of distinct lines.
As a result, the classification doesn't penalize ty for raising multiple diagnostics on the same line, even when ty returns duplicate diagnostics.
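
A rough sketch of the grouping and distinct-line check described above. This is an illustration only, not the code in scripts/conformance.py: the `Expected` and `Diagnostic` classes and the `group_diagnostics` and `lines_ok` helpers are hypothetical names, and the sketch assumes that a tag without "+" allows exactly one line while a tag with "+" allows one or more.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Expected:
    """An expected error parsed from a test file comment (hypothetical shape)."""

    line: int
    tag: str | None = None  # None for an untagged expected error
    multi: bool = False  # True when the tag carries a "+"


@dataclass(frozen=True)
class Diagnostic:
    """A diagnostic reported by the type checker (hypothetical shape)."""

    line: int
    name: str


def group_diagnostics(expected, diagnostics):
    """Group diagnostics by the tag expected on their line, else by line number."""
    tag_by_line = {e.line: e.tag for e in expected if e.tag is not None}
    groups = defaultdict(list)
    for diagnostic in diagnostics:
        key = tag_by_line.get(diagnostic.line, diagnostic.line)
        groups[key].append(diagnostic)
    return dict(groups)


def lines_ok(group, multi):
    """Compare distinct lines, so duplicates on one line are not penalized."""
    distinct_lines = {d.line for d in group}
    return len(distinct_lines) >= 1 if multi else len(distinct_lines) == 1
```

With two duplicate diagnostics on line 10 of a tagged group, `lines_ok(..., multi=False)` still passes because only one distinct line is involved.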

All diagnostics associated with a given tag are rendered together in the details table, but the statistics table counts diagnostics individually.

I've also updated the render step so that tagged diagnostics and diagnostics raised on the same line are now shown in the same cell in the table. The benefit here (I think) is that you'll be able to see all of the diagnostics removed when a line that raises multiple false positives is fixed.

Test Plan

I ran the following locally:

uv run --no-project scripts/conformance.py --tests-path ../typing/conformance/ --old-ty uvx ty@0.0.6

Details

Typing conformance results improved 🎉

The percentage of diagnostics emitted that were expected errors increased from 73.76% to 76.88%. The percentage of expected errors that received a diagnostic increased from 63.93% to 68.92%.

Summary

Metric	Old	New	Diff	Outcome
True Positives	686	745	+59	⏫ (✅)
False Positives	244	224	-20	⏬ (✅)
False Negatives	387	336	-51	⏬ (✅)
Total Diagnostics	930	969	+39	⏫
Precision	73.76%	76.88%	+3.12%	⏫ (✅)
Recall	63.93%	68.92%	+4.98%	⏫ (✅)
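
For reference, the percentages in the table can be reproduced from the raw counts. The sketch below assumes the usual definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), which match the summary sentence above; the function names are only illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Share of emitted diagnostics that were expected errors."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of expected errors that received a diagnostic."""
    return tp / (tp + fn)


old = {"tp": 686, "fp": 244, "fn": 387}
new = {"tp": 745, "fp": 224, "fn": 336}

for label, m in (("old", old), ("new", new)):
    print(
        f"{label}: precision={precision(m['tp'], m['fp']):.2%} "
        f"recall={recall(m['tp'], m['fn']):.2%}"
    )
# old: precision=73.76% recall=63.93%
# new: precision=76.88% recall=68.92%
```

The Diff column appears to be computed from the unrounded values, which is why recall shows +4.98% rather than the 4.99 you get by subtracting the rounded percentages.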

True positives added

Location	Name	Message
aliases_explicit.py:79:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:80:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:81:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:82:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:83:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:84:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:85:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:86:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:88:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:89:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:90:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_explicit.py:91:1	invalid-type-form	Invalid right-hand side for `typing.TypeAlias` assignment
aliases_newtype.py:50:38	invalid-newtype	invalid base for `typing.NewType`: A `NewType` base cannot be generic
annotations_forward_refs.py:41:10	invalid-type-form	Function calls are not allowed in type expressions
annotations_forward_refs.py:42:10	invalid-type-form	List literals are not allowed in this context in a type expression: Did you mean `tuple[int, str]`?
annotations_forward_refs.py:43:10	invalid-type-form	Tuple literals are not allowed in this context in a type expression: Did you mean `tuple[int, str]`?
annotations_forward_refs.py:44:10	invalid-type-form	List comprehensions are not allowed in type expressions
annotations_forward_refs.py:45:10	invalid-type-form	Dict literals are not allowed in type expressions
annotations_forward_refs.py:46:10	invalid-type-form	Function calls are not allowed in type expressions
annotations_forward_refs.py:48:10	invalid-type-form	`if` expressions are not allowed in type expressions
annotations_forward_refs.py:50:11	invalid-type-form	Boolean literals are not allowed in this context in a type expression
annotations_forward_refs.py:51:11	invalid-type-form	Int literals are not allowed in this context in a type expression
annotations_forward_refs.py:52:11	invalid-type-form	Unary operations are not allowed in type expressions
annotations_forward_refs.py:53:11	invalid-type-form	Boolean operations are not allowed in type expressions
namedtuples_define_functional.py:16:8	missing-argument	No argument provided for required parameter `y`
namedtuples_define_functional.py:21:8	missing-argument	No arguments provided for required parameters `x`, `y`
namedtuples_define_functional.py:26:21	too-many-positional-arguments	Too many positional arguments: expected 3, got 4
namedtuples_define_functional.py:31:8; namedtuples_define_functional.py:31:18	missing-argument; unknown-argument	No argument provided for required parameter `y`; Argument `z` does not match any known parameter
namedtuples_define_functional.py:36:18	invalid-argument-type	Argument is incorrect: Expected `int`, found `Literal["1"]`
namedtuples_define_functional.py:37:21	too-many-positional-arguments	Too many positional arguments: expected 3, got 4
namedtuples_define_functional.py:42:18	invalid-argument-type	Argument is incorrect: Expected `int`, found `Literal["1"]`
namedtuples_define_functional.py:43:15	invalid-argument-type	Argument is incorrect: Expected `int`, found `float`
namedtuples_define_functional.py:69:1	missing-argument	No argument provided for required parameter `a`
namedtuples_usage.py:43:5	not-subscriptable	Cannot delete subscript on object of type `Point` with no `__delitem__` method
narrowing_typeguard.py:102:23	invalid-type-guard-definition	`TypeGuard` function must have a parameter to narrow
narrowing_typeguard.py:107:22	invalid-type-guard-definition	`TypeGuard` function must have a parameter to narrow
narrowing_typeguard.py:128:20	invalid-argument-type	Argument to function `takes_callable_str` is incorrect: Expected `(object, /) -> str`, found `def simple_typeguard(val: object) -> TypeGuard[int]`
narrowing_typeguard.py:148:26	invalid-argument-type	Argument to function `takes_callable_str_proto` is incorrect: Expected `CallableStrProto`, found `def simple_typeguard(val: object) -> TypeGuard[int]`
narrowing_typeis.py:105:23	invalid-type-guard-definition	`TypeIs` function must have a parameter to narrow
narrowing_typeis.py:110:22	invalid-type-guard-definition	`TypeIs` function must have a parameter to narrow
narrowing_typeis.py:169:17	invalid-argument-type	Argument to function `takes_typeguard` is incorrect: Expected `(object, /) -> TypeGuard[int]`, found `def is_int_typeis(val: object) -> TypeIs[int]`
narrowing_typeis.py:170:14	invalid-argument-type	Argument to function `takes_typeis` is incorrect: Expected `(object, /) -> TypeIs[int]`, found `def is_int_typeguard(val: object) -> TypeGuard[int]`
narrowing_typeis.py:195:27	invalid-type-guard-definition	Narrowed type `str` is not assignable to the declared parameter type `int`
narrowing_typeis.py:199:45	invalid-type-guard-definition	Narrowed type `list[int]` is not assignable to the declared parameter type `list[object]`
qualifiers_final_annotation.py:134:1; qualifiers_final_annotation.py:134:3	missing-argument; unknown-argument	No arguments provided for required parameters `x`, `y`; Argument `a` does not match any known parameter
qualifiers_final_annotation.py:135:3; qualifiers_final_annotation.py:135:9	invalid-argument-type; invalid-argument-type	Argument is incorrect: Expected `int`, found `Literal[""]`; Argument is incorrect: Expected `int`, found `Literal[""]`
typeddicts_class_syntax.py:29:5	invalid-typed-dict-statement	TypedDict class cannot have methods
typeddicts_class_syntax.py:33:5	invalid-typed-dict-statement	TypedDict class cannot have methods
typeddicts_class_syntax.py:38:5	invalid-typed-dict-statement	TypedDict class cannot have methods
typeddicts_extra_items.py:128:15	invalid-argument-type	Cannot delete required key "name" from TypedDict `MovieEI`
typeddicts_operations.py:49:11	invalid-argument-type	Cannot delete required key "name" from TypedDict `Movie`

False positives removed

Location	Name	Message
constructors_call_init.py:25:1	type-assertion-failure	Argument does not have asserted type `Class1[int | float]`
constructors_call_init.py:75:1	type-assertion-failure	Argument does not have asserted type `Class5[int | float]`
constructors_call_new.py:24:1	type-assertion-failure	Argument does not have asserted type `Class1[int | float]`
namedtuples_define_class.py:121:1	type-assertion-failure	Argument does not have asserted type `Property[int | float]`
namedtuples_define_class.py:122:1	type-assertion-failure	Argument does not have asserted type `int | float`
namedtuples_define_class.py:123:1	type-assertion-failure	Argument does not have asserted type `int | float`
narrowing_typeguard.py:17:9	type-assertion-failure	Argument does not have asserted type `tuple[str, str]`
narrowing_typeguard.py:32:9	type-assertion-failure	Argument does not have asserted type `set[int]`
narrowing_typeguard.py:69:9	type-assertion-failure	Argument does not have asserted type `int`
narrowing_typeguard.py:73:9	type-assertion-failure	Argument does not have asserted type `int`
narrowing_typeguard.py:77:9	type-assertion-failure	Argument does not have asserted type `int`
narrowing_typeguard.py:81:9	type-assertion-failure	Argument does not have asserted type `int`
narrowing_typeguard.py:85:9	type-assertion-failure	Argument does not have asserted type `int`
narrowing_typeguard.py:89:9	type-assertion-failure	Argument does not have asserted type `B`
narrowing_typeguard.py:93:9	type-assertion-failure	Argument does not have asserted type `B`
narrowing_typeis.py:72:9	type-assertion-failure	Argument does not have asserted type `int`
narrowing_typeis.py:76:9	type-assertion-failure	Argument does not have asserted type `int`
narrowing_typeis.py:80:9	type-assertion-failure	Argument does not have asserted type `int`
narrowing_typeis.py:92:9	type-assertion-failure	Argument does not have asserted type `B`
narrowing_typeis.py:96:9	type-assertion-failure	Argument does not have asserted type `B`

Optional Diagnostics Added

Location	Name	Message
namedtuples_define_functional.py:52:25	invalid-named-tuple	Duplicate field name `a` in `namedtuple()`: Field `a` already defined; will raise `ValueError` at runtime
namedtuples_define_functional.py:53:25	invalid-named-tuple	Field name `def` in `namedtuple()` cannot be a Python keyword: Will raise `ValueError` at runtime
namedtuples_define_functional.py:54:25	invalid-named-tuple	Field name `def` in `namedtuple()` cannot be a Python keyword: Will raise `ValueError` at runtime
namedtuples_define_functional.py:55:25	invalid-named-tuple	Field name `_d` in `namedtuple()` cannot start with an underscore: Will raise `ValueError` at runtime

astral-sh-bot · 2026-01-19T21:40:18Z

Typing conformance results

No changes detected ✅

astral-sh-bot · 2026-01-19T21:50:26Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

WillDuke · 2026-01-21T11:38:44Z

@AlexWaygood @MichaReiser My first pass at this was a little buggy and overcomplicated, but I think that it is in a bit better shape now!

MichaReiser

Thank you. This overall makes sense to me. I have a few small nit comments.

MichaReiser

Amazing, thank you

MichaReiser · 2026-01-22T08:05:55Z

scripts/conformance.py

+            case _:
+                raise ValueError(f"Invalid source: {source}")


I'm surprised that this last case is needed here? Does ty complain without it?

WillDuke

Just defensive programming while I was writing this!

WillDuke · 2026-01-22T21:40:15Z

It occurred to me that we can count the diagnostics properly and still render the results together so that you can get the full context for a tagged classification. If a tagged error group has diagnostics on more lines than allowed, I count the line with the most diagnostics towards the true positives and label the rest as false positives. Now the true positives and false positives add up to the total diagnostics, and the summary sentence is more coherent.
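
The counting rule above can be sketched roughly like this. It is a simplified illustration, not the code from the PR: `split_tagged_group` is a hypothetical helper, and it assumes a tag without "+" allows a single line while a tag with "+" allows any number of lines.

```python
from collections import Counter


def split_tagged_group(lines: list[int], allow_multiple_lines: bool) -> tuple[int, int]:
    """Return (true_positives, false_positives) for one tagged group.

    `lines` holds the line number of every diagnostic in the group.
    """
    per_line = Counter(lines)
    if not per_line:
        return 0, 0
    if allow_multiple_lines:
        # Every line in the group is allowed, so every diagnostic counts.
        return len(lines), 0
    # Only one line is allowed: the line with the most diagnostics
    # counts as true positives, everything else as false positives.
    _, best = per_line.most_common(1)[0]
    return best, len(lines) - best


print(split_tagged_group([10, 10, 10, 12, 14], allow_multiple_lines=False))  # (3, 2)
print(split_tagged_group([10, 10, 10, 12, 14], allow_multiple_lines=True))   # (5, 0)
```

Either way the two counts sum to the number of diagnostics in the group, which is what makes the totals in the summary table add up.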

WillDuke · 2026-01-22T21:54:05Z

I've also updated the table to count optional diagnostics as true positives where present and true negatives where absent. Happily, the total diagnostics in the summary table is now the same as the length of the JSON array output from ty after filtering out warnings.

MichaReiser · 2026-01-23T14:43:12Z

scripts/conformance.py



+@dataclass(kw_only=True, slots=True)
+class Evaluation:


This class feels pretty heavy only to support the case where one group has both true and false positives.

I was wondering if we could change classify to return an iterable of (Classification, int) instead. Most groups return exactly one, with the exception of the many case where ty emits too many diagnostics, in which case we return two.

WillDuke

With the last set of changes, we now count the diagnostics individually. So if ty emits 5 diagnostics on the same line where a "# E" is present, we're counting them all as true positives. Similarly, if ty raises 3 diagnostics on one line of a tagged group (no '+') and 1 on each of the other lines, we count the 3 diagnostics as true positives and the remainder as false positives.

Happy to keep iterating on it though if this doesn't make sense.
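
One possible shape for the suggestion above, where `classify` yields `(Classification, int)` pairs instead of building an `Evaluation` object. This is only a sketch under assumptions: the enum members, the `classify` signature, and counting a missed expected error as a single false negative are all hypothetical and not taken from scripts/conformance.py.

```python
from collections import Counter
from collections.abc import Iterator
from enum import Enum, auto


class Classification(Enum):
    TRUE_POSITIVE = auto()
    FALSE_POSITIVE = auto()
    FALSE_NEGATIVE = auto()


def classify(
    expected: bool, lines: list[int], single_line_tag: bool = False
) -> Iterator[tuple[Classification, int]]:
    """Yield (classification, count) pairs for one group of diagnostics."""
    if not lines:
        if expected:
            yield Classification.FALSE_NEGATIVE, 1
        return
    if not expected:
        yield Classification.FALSE_POSITIVE, len(lines)
        return
    if single_line_tag and len(set(lines)) > 1:
        # The only case that yields two pairs: too many lines for the tag.
        best = Counter(lines).most_common(1)[0][1]
        yield Classification.TRUE_POSITIVE, best
        yield Classification.FALSE_POSITIVE, len(lines) - best
        return
    yield Classification.TRUE_POSITIVE, len(lines)


totals = Counter()
for pairs in (
    classify(True, [7] * 5),  # 5 diagnostics on one "# E" line
    classify(True, [10, 10, 10, 12, 14], single_line_tag=True),
):
    for classification, count in pairs:
        totals[classification] += count
```

For the two examples from the comment above, this gives 8 true positives (5 + 3) and 2 false positives.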

* main: (62 commits)
  [`refurb`] Do not add `abc.ABC` if already present (`FURB180`) (#22234)
  [ty] Add a new `assert-type-unspellable-subtype` diagnostic (#22815)
  [ty] Avoid duplicate syntax errors for `await` outside functions (#22826)
  [ty] Fix unary operator false-positive for constrained TypeVars (#22783)
  [ty] Fix binary operator false-positive for constrained TypeVars (#22782)
  [ty] Fix false-positive `unsupported-operator` for "symmetric" TypeVars (#22756)
  [`pydocstyle`] Clarify which quote styles are allowed (`D300`) (#22825)
  [ty] Use distributed versions of AND and OR on constraint sets (#22614)
  [ty] Add support for dict literals and dict() calls as default values for parameters with TypedDict types (#22161)
  Document `-` stdin convention in CLI help text (#22817)
  [ty] Make `infer_subscript_expression_types` a method on `Type` (#22731)
  [ty] Simplify `OverloadLiteral::spans` and `OverloadLiteral::parameter_span` (#22823)
  [ty] Require both `*args` and `**kwargs` when calling a `ParamSpec` callable (#22820)
  [ty] Handle tagged errors in conformance (#22746)
  Add `--color` cli option to force colored output (#22806)
  Identify notebooks by LSP didOpen instead of `.ipynb` file extension (#22810)
  [ty] Fix docstring rendering for literal blocks after doctests (#22676)
  [ty] Update salsa to fix out-of-order query validation (#22498)
  [ty] Inline cycle initial and recovery functions (#22814)
  [ty] Pass the generic context through the decorator (#22544)
  ...

WillDuke added 2 commits January 18, 2026 12:43

wip

e8777ca

[ty] Handle tagged errors in conformance

8efd65d

AlexWaygood added the `ci` (Related to internal CI tooling) and `ty` (Multi-file analysis & type inference) labels Jan 19, 2026

AlexWaygood requested review from AlexWaygood and MichaReiser and removed request for MichaReiser January 20, 2026 12:46

WillDuke marked this pull request as draft January 20, 2026 20:20

clean up and fix false positives bug

6e31fc8

WillDuke marked this pull request as ready for review January 21, 2026 09:21

WillDuke added 2 commits January 21, 2026 09:42

use distinct lines rather than the number of diagnostics

5933ec1

use new classification

8145bb3

MichaReiser approved these changes Jan 21, 2026

WillDuke and others added 6 commits January 21, 2026 17:19

format nested if-else expression

7d8f966

Co-authored-by: Micha Reiser <micha@reiser.io>

iterate through group once

36a59ea

remove unused code in compute_stats

06b9da5

remove None from diagnostic types

5755034

use a match in compute_stats

5fdaa6b

improve readability of classify

1e9d33d

MichaReiser approved these changes Jan 22, 2026

count diagnostics properly

b158965

count optional diagnostics

6e88957

MichaReiser reviewed Jan 23, 2026

MichaReiser merged commit 58bffa4 into astral-sh:main Jan 23, 2026
42 checks passed
