[ty] Handle tagged errors in conformance #22746
Typing conformance results: No changes detected ✅
@AlexWaygood @MichaReiser My first pass at this was a little buggy and overcomplicated, but I think it's in much better shape now!
MichaReiser left a comment
Thank you. This makes sense to me overall. I have a few small nits.
Co-authored-by: Micha Reiser <micha@reiser.io>
scripts/conformance.py (outdated)
    case _:
        raise ValueError(f"Invalid source: {source}")
I'm surprised that this last case is needed here. Does ty complain without it?
Just defensive programming while I was writing this!
It occurred to me that we can count the diagnostics properly and still render the results together, so that you get the full context for a tagged classification. If a tagged error group has diagnostics on more lines than allowed, I count the diagnostics on the line with the most diagnostics as true positives and label the rest as false positives. Now the true positives and false positives add up to the total number of diagnostics, and the summary sentence is more coherent.
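A minimal sketch of that counting rule, assuming each tagged group is given as the list of line numbers of its ty diagnostics (one entry per diagnostic) and that a tag ending in `+` lets any number of the tagged lines be flagged. `split_tagged_group` and its data shapes are hypothetical, not the actual code in `scripts/conformance.py`:

```python
from collections import Counter


def split_tagged_group(
    diagnostic_lines: list[int], allow_multiple: bool
) -> tuple[int, int]:
    """Return (true_positives, false_positives) for one tagged error group.

    `diagnostic_lines` holds the line number of every ty diagnostic that
    landed on a line carrying the tag, repeated once per diagnostic.
    `allow_multiple` is True for a `+` tag (hypothetical flag).
    """
    if not diagnostic_lines:
        return 0, 0
    if allow_multiple:
        # `+` tag: diagnostics on any number of the tagged lines are fine.
        return len(diagnostic_lines), 0
    per_line = Counter(diagnostic_lines)
    # Plain tag: only one line may be flagged, so keep the busiest line...
    _, busiest = per_line.most_common(1)[0]
    # ...and count every diagnostic on the other lines as a false positive.
    return busiest, len(diagnostic_lines) - busiest
```

With 3 diagnostics on one line and 1 on each of two other lines, a plain-tag group yields 3 true positives and 2 false positives, so the two counts still sum to the 5 diagnostics emitted.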
I've also updated the table to count optional diagnostics as true positives where present and as true negatives where absent. Happily, the
    @dataclass(kw_only=True, slots=True)
    class Evaluation:
This class feels pretty heavy only to support the case where one group has both true and false positives.
I was wondering if we could change `classify` to return an iterable of `(Classification, int)` instead. Most groups would return exactly one pair; the exception is the "many" case, where ty emits too many diagnostics and we return two.
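A rough sketch of that suggestion, assuming a group is described by the maximum number of lines allowed to carry diagnostics (for example 1 for a plain tag, the group size for a `+` tag) plus the diagnostic count on each flagged line. `classify` and `tally` here are hypothetical stand-ins, not the functions in the PR, and a missed error is counted as one expected error:

```python
from collections import Counter
from collections.abc import Iterable, Iterator
from enum import Enum


class Classification(Enum):
    TRUE_POSITIVE = "true positive"
    FALSE_POSITIVE = "false positive"
    FALSE_NEGATIVE = "false negative"


def classify(
    max_lines: int, diagnostics_per_line: list[int]
) -> Iterator[tuple[Classification, int]]:
    """Yield (classification, count) pairs for one expected-error group."""
    if not diagnostics_per_line:
        # Nothing was reported: count the single missed expected error.
        yield Classification.FALSE_NEGATIVE, 1
        return
    # Keep the busiest allowed lines as true positives.
    ranked = sorted(diagnostics_per_line, reverse=True)
    allowed, extra = ranked[:max_lines], ranked[max_lines:]
    yield Classification.TRUE_POSITIVE, sum(allowed)
    if extra:
        # Only the "too many diagnostics" case yields a second pair.
        yield Classification.FALSE_POSITIVE, sum(extra)


def tally(
    groups: Iterable[Iterable[tuple[Classification, int]]],
) -> Counter[Classification]:
    """Sum the counts per classification across all groups."""
    totals: Counter[Classification] = Counter()
    for pairs in groups:
        for classification, count in pairs:
            totals[classification] += count
    return totals
```

Keeping the pairs flat means the statistics could be built with a plain sum, without a dedicated class to hold a group that is partly right and partly wrong.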
With the last set of changes, we now count diagnostics individually. So if ty emits 5 diagnostics on a line where a `# E` is present, we count all of them as true positives. Similarly, if ty raises 3 diagnostics on one line of a tagged group (without a `+`) and 1 on each of the other lines, we count the 3 diagnostics as true positives and the remainder as false positives.
Happy to keep iterating on it though if this doesn't make sense.
Summary
This PR adds support for tagged errors in the conformance suite. Depending on whether the error tag contains a "+" symbol, a tagged group may have errors on several of its lines or on only one of them. Tags are collected from the expected diagnostics and added to `ty` diagnostics on the corresponding lines. Diagnostics are compared in groups: by tag if one is present, otherwise by line. Diagnostics matching tagged errors are checked to ensure that errors were raised on the correct number of distinct lines.
This means that the classification doesn't penalize `ty` for raising multiple diagnostics on the same line, even in cases where `ty` returns duplicate diagnostics. All diagnostics associated with a given tag are rendered together in the details table, but the statistics table counts diagnostics individually.
I've also updated the render step so that tagged diagnostics and diagnostics raised on the same line are now shown in the same cell in the table. The benefit here (I think) is that you'll be able to see all of the diagnostics removed when a line that raises multiple false positives is fixed.
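A sketch of that rendering idea, assuming each diagnostic has already been reduced to a `(group_key, message)` pair, where the key is the tag when present and the file:line location otherwise. `render_table` is a hypothetical helper that emits one Markdown row per group:

```python
from collections import defaultdict


def render_table(diagnostics: list[tuple[str, str]]) -> str:
    """Render one Markdown table row per group of diagnostics."""
    cells: defaultdict[str, list[str]] = defaultdict(list)
    for key, message in diagnostics:
        # Escape pipes (e.g. in `int | float`) so a message cannot split the cell.
        cells[key].append(message.replace("|", "\\|"))
    rows = ["| Location | Diagnostics |", "| --- | --- |"]
    for key, messages in cells.items():
        # `<br>` keeps every diagnostic of the group inside a single cell.
        rows.append(f"| {key} | {'<br>'.join(messages)} |")
    return "\n".join(rows)
```

Because the groups keep their first-seen order, fixing a line that raised several false positives would remove one row containing all of them.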
Test Plan
I ran the following locally:
Typing conformance results improved 🎉
The percentage of diagnostics emitted that were expected errors increased from 73.76% to 76.88%. The percentage of expected errors that received a diagnostic increased from 63.93% to 68.92%.
Summary
True positives added

- `typing.TypeAlias` assignment (12 entries)
- `typing.NewType`: A `NewType` base cannot be generic
- `tuple[int, str]`? (2 entries)
- `if` expressions are not allowed in type expressions
- yx,y
- namedtuples_define_functional.py:31:18 `unknown-argument`: `y`; Argument `z` does not match any known parameter
- `int`, found `Literal["1"]` (2 entries)
- `int`, found `float`
- `a`
- `Point` with no `__delitem__` method
- `TypeGuard` function must have a parameter to narrow (2 entries)
- `takes_callable_str` is incorrect: Expected `(object, /) -> str`, found `def simple_typeguard(val: object) -> TypeGuard[int]`
- `takes_callable_str_proto` is incorrect: Expected `CallableStrProto`, found `def simple_typeguard(val: object) -> TypeGuard[int]`
- `TypeIs` function must have a parameter to narrow (2 entries)
- `takes_typeguard` is incorrect: Expected `(object, /) -> TypeGuard[int]`, found `def is_int_typeis(val: object) -> TypeIs[int]`
- `takes_typeis` is incorrect: Expected `(object, /) -> TypeIs[int]`, found `def is_int_typeguard(val: object) -> TypeGuard[int]`
- `str` is not assignable to the declared parameter type `int`
- `list[int]` is not assignable to the declared parameter type `list[object]`
- qualifiers_final_annotation.py:134:3 `unknown-argument`: `x`, `y`; Argument `a` does not match any known parameter
- qualifiers_final_annotation.py:135:9 `invalid-argument-type`: `int`, found `Literal[""]`; Argument is incorrect: Expected `int`, found `Literal[""]`
- MovieEIMovie

False positives removed

- `Class1[int | float]`
- `Class5[int | float]`
- `Class1[int | float]`
- `Property[int | float]`
- `int | float` (2 entries)
- `tuple[str, str]`
- `set[int]`
- `int` (5 entries)
- `B` (2 entries)
- `int` (3 entries)
- `B` (2 entries)

Optional Diagnostics Added

- `a` in `namedtuple()`: Field `a` already defined; will raise `ValueError` at runtime
- `def` in `namedtuple()` cannot be a Python keyword: Will raise `ValueError` at runtime (2 entries)
- `_d` in `namedtuple()` cannot start with an underscore: Will raise `ValueError` at runtime