[ty] Handle tagged errors in conformance #22746

Merged
MichaReiser merged 13 commits into astral-sh:main from WillDuke:wld/handle-tagged-errors-conformance
Jan 23, 2026

Conversation

@WillDuke (Contributor) commented Jan 19, 2026

Summary

This PR adds support for tagged errors in the conformance suite. Lines that share a tag form a group, and the presence of a "+" symbol in the tag determines whether errors are allowed on multiple lines of the group or on only one line. Tags are collected from expected diagnostics and attached to ty diagnostics on the corresponding lines. Diagnostics are then compared as groups, keyed by tag if present and by line otherwise.
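As a rough illustration of the tag collection step, the sketch below pulls expected-error markers out of a conformance file. It assumes the markers look like `# E`, `# E?`, `# E[tag]`, and `# E[tag+]`; the regex, the `Expected` dataclass, and `parse_expected` are hypothetical names for this example and are not taken from scripts/conformance.py.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed marker shapes: "# E", "# E?", "# E[tag]", "# E[tag+]",
# optionally followed by ": message". The lookahead rejects words
# such as "# Example".
EXPECTED_ERROR = re.compile(
    r"#\s*E(?P<optional>\?)?(?:\[(?P<tag>[^\]+]+)(?P<multi>\+)?\])?(?=[:\s]|$)"
)


@dataclass(frozen=True)
class Expected:
    """One expected-error marker (hypothetical structure)."""

    line: int          # 1-based line number of the marker
    tag: str | None    # tag name without the trailing "+", if any
    multi: bool        # True when the tag ends in "+"
    optional: bool     # True for "# E?"


def parse_expected(source: str) -> list[Expected]:
    """Collect expected-error markers from the text of one test file."""
    found = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        match = EXPECTED_ERROR.search(text)
        if match:
            found.append(
                Expected(
                    line=lineno,
                    tag=match["tag"],
                    multi=match["multi"] is not None,
                    optional=match["optional"] is not None,
                )
            )
    return found
```

Markers that share a `tag` value can then be grouped, and the `multi` flag records whether the group allows diagnostics on more than one line.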

Diagnostics matching tagged errors are checked to ensure that errors were raised on the correct number of distinct lines. This means that the classification doesn't penalize ty for raising multiple diagnostics on the same line, even in cases where ty returns duplicate diagnostics.
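A minimal sketch of the distinct-line check, assuming a tag without "+" allows exactly one line and a tag with "+" allows one or more. The function name and signature are hypothetical and only illustrate the idea of comparing lines rather than individual diagnostics.

```python
def lines_ok(diagnostic_lines: list[int], allow_multiple: bool) -> bool:
    """Check one tagged group by the distinct lines that received diagnostics.

    `diagnostic_lines` holds one entry per diagnostic, so duplicates on
    the same line collapse in the set and are not penalized.
    """
    distinct = set(diagnostic_lines)
    if not distinct:
        # No diagnostic anywhere in the group: the expected error was missed.
        return False
    return allow_multiple or len(distinct) == 1
```

For example, two diagnostics on line 31 of a group without "+" pass the check, while one diagnostic on each of lines 31 and 32 do not.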

All diagnostics associated with a given tag are rendered together in the details table, but the statistics table counts diagnostics individually.

I've also updated the render step so that tagged diagnostics and diagnostics raised on the same line are now shown in the same cell in the table. The benefit here (I think) is that you'll be able to see all of the diagnostics removed when a line that raises multiple false positives is fixed.

Test Plan

I ran the following locally:

uv run --no-project scripts/conformance.py --tests-path ../typing/conformance/ --old-ty uvx ty@0.0.6
Details

Typing conformance results improved 🎉

The percentage of diagnostics emitted that were expected errors increased from 73.76% to 76.88%. The percentage of expected errors that received a diagnostic increased from 63.93% to 68.92%.

Summary

| Metric | Old | New | Diff | Outcome |
|---|---|---|---|---|
| True Positives | 686 | 745 | +59 | ⏫ (✅) |
| False Positives | 244 | 224 | -20 | ⏬ (✅) |
| False Negatives | 387 | 336 | -51 | ⏬ (✅) |
| Total Diagnostics | 930 | 969 | +39 | |
| Precision | 73.76% | 76.88% | +3.12% | ⏫ (✅) |
| Recall | 63.93% | 68.92% | +4.98% | ⏫ (✅) |
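For reference, the precision and recall rows follow from the counts above using the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)). The helper names below are made up for this sketch; only the numbers come from the table.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of emitted diagnostics that were expected errors."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Share of expected errors that received a diagnostic."""
    return true_pos / (true_pos + false_neg)


old_p, new_p = precision(686, 244), precision(745, 224)
old_r, new_r = recall(686, 387), recall(745, 336)

print(f"Precision {old_p:.2%} -> {new_p:.2%} ({(new_p - old_p) * 100:+.2f}%)")
# Precision 73.76% -> 76.88% (+3.12%)
print(f"Recall    {old_r:.2%} -> {new_r:.2%} ({(new_r - old_r) * 100:+.2f}%)")
# Recall    63.93% -> 68.92% (+4.98%)
```

The recall difference prints as +4.98 rather than +4.99 because it is computed from the unrounded ratios.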

True positives added

Location Name Message
aliases_explicit.py:79:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:80:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:81:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:82:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:83:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:84:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:85:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:86:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:88:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:89:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:90:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_explicit.py:91:1 invalid-type-form Invalid right-hand side for typing.TypeAlias assignment
aliases_newtype.py:50:38 invalid-newtype invalid base for typing.NewType: A NewType base cannot be generic
annotations_forward_refs.py:41:10 invalid-type-form Function calls are not allowed in type expressions
annotations_forward_refs.py:42:10 invalid-type-form List literals are not allowed in this context in a type expression: Did you mean tuple[int, str]?
annotations_forward_refs.py:43:10 invalid-type-form Tuple literals are not allowed in this context in a type expression: Did you mean tuple[int, str]?
annotations_forward_refs.py:44:10 invalid-type-form List comprehensions are not allowed in type expressions
annotations_forward_refs.py:45:10 invalid-type-form Dict literals are not allowed in type expressions
annotations_forward_refs.py:46:10 invalid-type-form Function calls are not allowed in type expressions
annotations_forward_refs.py:48:10 invalid-type-form if expressions are not allowed in type expressions
annotations_forward_refs.py:50:11 invalid-type-form Boolean literals are not allowed in this context in a type expression
annotations_forward_refs.py:51:11 invalid-type-form Int literals are not allowed in this context in a type expression
annotations_forward_refs.py:52:11 invalid-type-form Unary operations are not allowed in type expressions
annotations_forward_refs.py:53:11 invalid-type-form Boolean operations are not allowed in type expressions
namedtuples_define_functional.py:16:8 missing-argument No argument provided for required parameter y
namedtuples_define_functional.py:21:8 missing-argument No arguments provided for required parameters x, y
namedtuples_define_functional.py:26:21 too-many-positional-arguments Too many positional arguments: expected 3, got 4
namedtuples_define_functional.py:31:8 missing-argument No argument provided for required parameter y
namedtuples_define_functional.py:31:18 unknown-argument Argument z does not match any known parameter
namedtuples_define_functional.py:36:18 invalid-argument-type Argument is incorrect: Expected int, found Literal["1"]
namedtuples_define_functional.py:37:21 too-many-positional-arguments Too many positional arguments: expected 3, got 4
namedtuples_define_functional.py:42:18 invalid-argument-type Argument is incorrect: Expected int, found Literal["1"]
namedtuples_define_functional.py:43:15 invalid-argument-type Argument is incorrect: Expected int, found float
namedtuples_define_functional.py:69:1 missing-argument No argument provided for required parameter a
namedtuples_usage.py:43:5 not-subscriptable Cannot delete subscript on object of type Point with no __delitem__ method
narrowing_typeguard.py:102:23 invalid-type-guard-definition TypeGuard function must have a parameter to narrow
narrowing_typeguard.py:107:22 invalid-type-guard-definition TypeGuard function must have a parameter to narrow
narrowing_typeguard.py:128:20 invalid-argument-type Argument to function takes_callable_str is incorrect: Expected (object, /) -> str, found def simple_typeguard(val: object) -> TypeGuard[int]
narrowing_typeguard.py:148:26 invalid-argument-type Argument to function takes_callable_str_proto is incorrect: Expected CallableStrProto, found def simple_typeguard(val: object) -> TypeGuard[int]
narrowing_typeis.py:105:23 invalid-type-guard-definition TypeIs function must have a parameter to narrow
narrowing_typeis.py:110:22 invalid-type-guard-definition TypeIs function must have a parameter to narrow
narrowing_typeis.py:169:17 invalid-argument-type Argument to function takes_typeguard is incorrect: Expected (object, /) -> TypeGuard[int], found def is_int_typeis(val: object) -> TypeIs[int]
narrowing_typeis.py:170:14 invalid-argument-type Argument to function takes_typeis is incorrect: Expected (object, /) -> TypeIs[int], found def is_int_typeguard(val: object) -> TypeGuard[int]
narrowing_typeis.py:195:27 invalid-type-guard-definition Narrowed type str is not assignable to the declared parameter type int
narrowing_typeis.py:199:45 invalid-type-guard-definition Narrowed type list[int] is not assignable to the declared parameter type list[object]
qualifiers_final_annotation.py:134:1 missing-argument No arguments provided for required parameters x, y
qualifiers_final_annotation.py:134:3 unknown-argument Argument a does not match any known parameter
qualifiers_final_annotation.py:135:3 invalid-argument-type Argument is incorrect: Expected int, found Literal[""]
qualifiers_final_annotation.py:135:9 invalid-argument-type Argument is incorrect: Expected int, found Literal[""]
typeddicts_class_syntax.py:29:5 invalid-typed-dict-statement TypedDict class cannot have methods
typeddicts_class_syntax.py:33:5 invalid-typed-dict-statement TypedDict class cannot have methods
typeddicts_class_syntax.py:38:5 invalid-typed-dict-statement TypedDict class cannot have methods
typeddicts_extra_items.py:128:15 invalid-argument-type Cannot delete required key "name" from TypedDict MovieEI
typeddicts_operations.py:49:11 invalid-argument-type Cannot delete required key "name" from TypedDict Movie

False positives removed

Location Name Message
constructors_call_init.py:25:1 type-assertion-failure Argument does not have asserted type Class1[int | float]
constructors_call_init.py:75:1 type-assertion-failure Argument does not have asserted type Class5[int | float]
constructors_call_new.py:24:1 type-assertion-failure Argument does not have asserted type Class1[int | float]
namedtuples_define_class.py:121:1 type-assertion-failure Argument does not have asserted type Property[int | float]
namedtuples_define_class.py:122:1 type-assertion-failure Argument does not have asserted type int | float
namedtuples_define_class.py:123:1 type-assertion-failure Argument does not have asserted type int | float
narrowing_typeguard.py:17:9 type-assertion-failure Argument does not have asserted type tuple[str, str]
narrowing_typeguard.py:32:9 type-assertion-failure Argument does not have asserted type set[int]
narrowing_typeguard.py:69:9 type-assertion-failure Argument does not have asserted type int
narrowing_typeguard.py:73:9 type-assertion-failure Argument does not have asserted type int
narrowing_typeguard.py:77:9 type-assertion-failure Argument does not have asserted type int
narrowing_typeguard.py:81:9 type-assertion-failure Argument does not have asserted type int
narrowing_typeguard.py:85:9 type-assertion-failure Argument does not have asserted type int
narrowing_typeguard.py:89:9 type-assertion-failure Argument does not have asserted type B
narrowing_typeguard.py:93:9 type-assertion-failure Argument does not have asserted type B
narrowing_typeis.py:72:9 type-assertion-failure Argument does not have asserted type int
narrowing_typeis.py:76:9 type-assertion-failure Argument does not have asserted type int
narrowing_typeis.py:80:9 type-assertion-failure Argument does not have asserted type int
narrowing_typeis.py:92:9 type-assertion-failure Argument does not have asserted type B
narrowing_typeis.py:96:9 type-assertion-failure Argument does not have asserted type B

Optional Diagnostics Added

Location Name Message
namedtuples_define_functional.py:52:25 invalid-named-tuple Duplicate field name a in namedtuple(): Field a already defined; will raise ValueError at runtime
namedtuples_define_functional.py:53:25 invalid-named-tuple Field name def in namedtuple() cannot be a Python keyword: Will raise ValueError at runtime
namedtuples_define_functional.py:54:25 invalid-named-tuple Field name def in namedtuple() cannot be a Python keyword: Will raise ValueError at runtime
namedtuples_define_functional.py:55:25 invalid-named-tuple Field name _d in namedtuple() cannot start with an underscore: Will raise ValueError at runtime

@astral-sh-bot (bot) commented Jan 19, 2026

Typing conformance results

No changes detected ✅

@astral-sh-bot (bot) commented Jan 19, 2026

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@AlexWaygood added the ci (Related to internal CI tooling) and ty (Multi-file analysis & type inference) labels Jan 19, 2026
@AlexWaygood requested review from AlexWaygood and MichaReiser and removed request for MichaReiser January 20, 2026 12:46
@WillDuke marked this pull request as draft January 20, 2026 20:20
@WillDuke marked this pull request as ready for review January 21, 2026 09:21
@WillDuke (Contributor, Author) commented

@AlexWaygood @MichaReiser My first pass at this was a little buggy and overcomplicated, but I think that it is in a bit better shape now!

@MichaReiser (Member) left a comment

Thank you. This overall makes sense to me. I have a few small nit comments.

@MichaReiser (Member) left a comment

Amazing, thank you

Comment on lines 259 to 260
    case _:
        raise ValueError(f"Invalid source: {source}")
Member

I'm surprised that this last case is needed here? Does ty complain without it?

Contributor Author

Just defensive programming while I was writing this!

@WillDuke (Contributor, Author) commented Jan 22, 2026

It occurred to me that we can count the diagnostics properly and still render the results together so that you can get the full context for a tagged classification. If a tagged error group has diagnostics on more lines than allowed, I count the line with the most diagnostics towards the true positives and label the rest as false positives. Now the true positives and false positives add up to the total diagnostics, and the summary sentence is more coherent.
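The counting rule can be sketched roughly as follows, assuming each diagnostic in a tagged group is represented only by its line number. `split_tagged_group` is a hypothetical helper for illustration, not the function in scripts/conformance.py, and it leaves the handling of empty groups (false negatives) to the caller.

```python
from collections import Counter


def split_tagged_group(
    diagnostic_lines: list[int], allow_multiple: bool
) -> tuple[int, int]:
    """Return (true_positives, false_positives) for one tagged group.

    `allow_multiple` is True when the tag carries a "+".
    """
    per_line = Counter(diagnostic_lines)
    if not per_line:
        return 0, 0
    if allow_multiple or len(per_line) == 1:
        # Every diagnostic sits on an allowed line.
        return len(diagnostic_lines), 0
    # Too many lines: credit the line with the most diagnostics,
    # and count everything else as false positives.
    best = max(per_line.values())
    return best, len(diagnostic_lines) - best
```

With three diagnostics on one line and one on each of two other lines in a group without "+", this yields 3 true positives and 2 false positives, so the two counts still sum to the five diagnostics emitted.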

@WillDuke (Contributor, Author) commented Jan 22, 2026

I've also updated the table to count optional diagnostics as true positives where present and true negatives where absent. Happily, the total diagnostics in the summary table is now the same as the length of the JSON array output from ty after filtering out warnings.



@dataclass(kw_only=True, slots=True)
class Evaluation:
Member

This class feels pretty heavy only to support the case where one group has both true and false positives.

I was wondering if we could change classify to return an iterable of (Classification, int) instead. Most groups would return exactly one pair; the exception is the case where ty emits too many diagnostics, in which case we would return two.

Contributor Author

With the last set of changes, we now count the diagnostics individually. So if ty emits 5 diagnostics on the same line where a "# E" is present, we're counting them all as true positives. Similarly, if ty raises 3 diagnostics on one line of a tagged group (no '+') and 1 on each of the other lines, we count the 3 diagnostics as true positives and the remainder as false positives.

Happy to keep iterating on it though if this doesn't make sense.

@MichaReiser merged commit 58bffa4 into astral-sh:main Jan 23, 2026
42 checks passed
carljm added a commit that referenced this pull request Jan 30, 2026
* main: (62 commits)
  [`refurb`] Do not add `abc.ABC` if already present (`FURB180`) (#22234)
  [ty] Add a new `assert-type-unspellable-subtype` diagnostic (#22815)
  [ty] Avoid duplicate syntax errors for `await` outside functions (#22826)
  [ty] Fix unary operator false-positive for constrained TypeVars (#22783)
  [ty] Fix binary operator false-positive for constrained TypeVars (#22782)
  [ty] Fix false-positive `unsupported-operator` for "symmetric" TypeVars (#22756)
  [`pydocstyle`] Clarify which quote styles are allowed (`D300`) (#22825)
  [ty] Use distributed versions of AND and OR on constraint sets (#22614)
  [ty] Add support for dict literals and dict() calls as default values for parameters with TypedDict types (#22161)
  Document `-` stdin convention in CLI help text (#22817)
  [ty] Make `infer_subscript_expression_types` a method on `Type` (#22731)
  [ty] Simplify `OverloadLiteral::spans` and `OverloadLiteral::parameter_span` (#22823)
  [ty] Require both `*args` and `**kwargs` when calling a `ParamSpec` callable (#22820)
  [ty] Handle tagged errors in conformance (#22746)
  Add `--color` cli option to force colored output (#22806)
  Identify notebooks by LSP didOpen instead of `.ipynb` file extension (#22810)
  [ty] Fix docstring rendering for literal blocks after doctests (#22676)
  [ty] Update salsa to fix out-of-order query validation (#22498)
  [ty] Inline cycle initial and recovery functions (#22814)
  [ty] Pass the generic context through the decorator (#22544)
  ...
Labels

ci (Related to internal CI tooling), ty (Multi-file analysis & type inference)

3 participants