[red-knot] Add diagnostic for class-object access to pure instance variables #16036

mishamsk · 2025-02-08T04:31:32Z

Summary

This is at best a band-aid solution for #15963 . See my first comment below for discussion.

As of time of writing:

Fixes unbound class attribute access. Doesn't return a bound symbol anymore and thus causes a [unresolved-attribute] diagnostic
Intentionally as a separate commit:
- ChangesType::ClassLiteral representation to type[ClassName] matching mypy (this was already the case for Type::SubclassOf)
- Changes representation of a union of 2+ types to types.UnionType[TypeOne, TypeTwo, ...] also matching mypy

I didn't not add a separate new lint, despite the issue description. I am not sure why [unresolved-attribute] is not appropriate. Unbound attributes indeed do not exist at runtime, so the language seems perfectly appropriate.

Test Plan

cargo nextest run -p red_knot_python_semantic --no-fail-fast

mishamsk · 2025-02-08T05:00:00Z

@sharkdp first, sorry for taking so long, I got distracted by work ;-)

second, sorry for a long read for such a small thing. However, since we can't just talk it over - I'll dump why I think it is a band aid and PR marked as draft.

Disclaimer: by no means I am implying there is anything wrong with knot code. I totally understand that you are moving crazy fast and this is one huge undertaking, so I totally get it when something is not "perfect" or a temporary solution.

The reason it took me multiple hours to understand what's happening is that I got confused by the fact that knot uses a notion of "Symbol" for both real symbols and unbound declarations, which are not symbols at runtime. They only end up in annotations:

In [1]: class C:
   ...:     not_a_symbol: int
   ...:

In [2]: dir(C)
Out[2]:
['__annotations__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__']

In [3]: C.__annotations__
Out[3]: {'not_a_symbol': int}

Even though I quickly found the passage rgd Declaration that is used for both types of statements, the fact that they both create a symbol wasn't readily apparent.

Further, confusing the matter was the fact that red_knot_python_semantic::symbol::Symbol (not entirely helpfully matching the red_knot_python_semantic::semantic_index::symbol::Symbol) returns the enum Boundness for both types of "symbols". So pure (unbound) declarations will be Symbol::Type(ty, Boundness::Bound).

Ideas

Symbol::Type

Tbh and imho symbols and declarations should be split. At least in this context. But I felt like doing so would cause too many changes in too many places. Something I would only do if you confirm such an idea. There are multiple ways this can be done:

Very invasive - create something separate from symbols for declarations and match them to bindings / use sites when necessary. Probably too much of a change and disruption
Extend Boundness and add two more values: Declared and PossiblyDeclared
In the same vain - extend Symbol::Type tuple by adding some enum with the sub-kind of the symbol. This won't solve the confusing "Boundness::Bound unbound declared symbol" thing though
Maybe there is even better way

Split `symbol` function

red_knot_python_semantic::types::symbol function is used in multiple places to get the symbol type & boundness. But not all of them, e.g. Class::own_instance_member does it's own but similar thing. In the same vain, it was confusing me a bit. I'd split it or at least extract a more narrower common part out of it.

E.g. why would we need to check for TYPE_CHECKING on every invocation, if the only place where it may be triggered from is known_module_symbol (which calls symbol through global_symbol) and inversely why looking up from global_symbol would need to check for dunder __slots__.

Again, really sorry for a long dump, especially if there are very good reasons for all of this that I just failed to identify.

mishamsk · 2025-02-08T05:09:53Z

well. here is one false-positive from this change: https://github.com/astral-sh/ruff/actions/runs/13212332102/job/36887424507?pr=16036#step:7:94

ClassVar in stubs/ABCs needs to be handled as bound... I'll look into it later

sharkdp · 2025-02-10T14:56:42Z

As mentioned on Discord, I'll split my reply into a few comments

Intentionally as a separate commit:

ChangesType::ClassLiteral representation to type[ClassName] matching mypy (this was already the case for Type::SubclassOf)

Changes representation of a union of 2+ types to types.UnionType[TypeOne, TypeTwo, ...] also matching mypy

Commenting on this first. This should definitely be discussed elsewhere (ideally as a ticket first), but I don't think that we want either of these changes.

Unlike mypy, we deliberately distinguish Literal[C] from type[C], as they denote different types. We don't know yet if we really want to use Literal[C] as the spelling of that type, but I don't think we want to mix up Literal[C] and type[C].

The difference between the two is:

Literal[C] is the type of the class object C, for a class C: …. It is a singleton type with one single inhabitant: the class object C.
type[C] is the type of all class objects S, where S is a subclass of C, or C itself.

More concretely, both C and D are of type type[C], but only C is of type Literal[C]:

class C: ...
class D(C): ...

The union-type representation change would be more stylistic, but I don't see any argument for choosing types.UnionType[TypeOne, TypeTwo] over TypeOne | TypeTwo?

mishamsk · 2025-02-10T15:20:30Z

but I don't think that we want either of these changes

Yeah, that's why put them in a separate commit. I had a hunch ;-) I just saw comments like "we want to improve the message". I'll drop this commit, no prob.

The difference between the two

yes, I do understand that and I saw that you use type[] only for Type::SubclassOf. I just thought that Literal is a bit unconventional coming from mypy/pyright. If you are planning to stray far from conventions, maybe changing the outer message would be an interesting approach. Instead of

Type `Literal[A]` has no attribute `non_existent`

generate

Class `A` has no attribute `non_existent`

The union-type representation change would be more stylistic, but I don't see any argument for choosing types.UnionType[TypeOne, TypeTwo] over TypeOne | TypeTwo?

same here. I went "conventional" way (mypy). If you keep Literal, it would be Literal[TypeOne, TypeTwo] I guess

sharkdp · 2025-02-10T15:44:37Z

I just saw comments like "we want to improve the message".

Class A has no attribute non_existent

I should have described this better in the TODO comment:

ruff/crates/red_knot_python_semantic/resources/mdtest/attributes.md

Lines 53 to 56 in f7819e5

    
           # TODO: we already show an error here but the message might be improved? 
        
           # mypy shows no error here, but pyright raises "reportAttributeAccessIssue" 
        
           # error: [unresolved-attribute] "Type `Literal[C]` has no attribute `inferred_from_value`" 
        
           reveal_type(C.inferred_from_value)  # revealed: Unknown

I didn't specifically want to talk about the Literal[C] part here, but I agree that this could also be improved. I was rather thinking about the following: This specific diagnostic here is raised if someone tries to access a pure instance variable on the class object (not for a non-existent attribute). So I was thinking of giving them a hint and explaining that in more detail: "The attribute inferred_from_value can only be accessed on instances of class C, but not on the class object itself."

sharkdp · 2025-02-10T15:52:22Z

Disclaimer: by no means I am implying there is anything wrong with knot code. I totally understand that you are moving crazy fast and this is one huge undertaking, so I totally get it when something is not "perfect" or a temporary solution.

The reason it took me multiple hours to understand what's happening is that I got confused by the fact that knot uses a notion of "Symbol" for both real symbols and unbound declarations, which are not symbols at runtime. They only end up in annotations:

Further, confusing the matter was the fact that red_knot_python_semantic::symbol::Symbol (not entirely helpfully matching the red_knot_python_semantic::semantic_index::symbol::Symbol) returns the enum Boundness for both types of "symbols". So pure (unbound) declarations will be Symbol::Type(ty, Boundness::Bound).

Even though I quickly found the passage rgd Declaration that is used for both types of statements, the fact that they both create a symbol wasn't readily apparent.

It's completely fine to call this out. We know that the Symbol API can be improved, and we had a long discussion about the name — and we are aware that the clash with the term "symbol" in the semantic index builder is unfortunate. We also know that our handling of "boundness" and "declaredness" needs to be improved. For more details, see this issue and the explanations and comments in (this test suite](https://github.com/astral-sh/ruff/blob/main/crates/red_knot_python_semantic/resources/mdtest/boundness_declaredness/public.md).

That said, I would appreciate if we could move this discussion into a new ticket, to discuss it separately from the change in question here.

sharkdp · 2025-02-10T19:24:44Z

I now had a more detailed look at the ideas you proposed and they do look very interesting to me. They definitely go in the right direction. What I would prefer is if we could decouple any refactoring from this functional change here. Your "bandaid" solution doesn't look too bad to me, given our current API restrictions.

It would be great if you could remove the second commit from this branch, then we could focus on the functional change here and try to understand the errors in stub files better.

codspeed-hq · 2025-02-11T00:21:29Z

CodSpeed Performance Report

Merging #16036 will not alter performance

_{Comparing mishamsk:15963-fix-diagnostic-for-pure-instance-variables-access-from-class (005fd6d) with main (e7a6c19)}

Summary

✅ 32 untouched benchmarks

mishamsk · 2025-02-11T00:24:41Z

@sharkdp first rdg the PR:

reverted class literal display code
added ABC, ABCMeta to known classes
added handling of stubs & classvars to own_class_member with some tests
- had to extract the code from symbol as I needed to get qualifiers, not just type... this is possibly a performance hit, since symbol used salsa query internally... something had to give (either big changes across the board, or this). To my defense own_instance_member is also re-creating the logic instead of using a shared query

Am I right, that you also want me to create a separate diagnostic for this specific access patter with the message you suggested? Didn't have time for this yet.

Also, I didn't do extensive validation that ABC's are properly caught in all possible scenarios. But at least based on two tests they do.

Now back to the general design. Or should I move this to astral-sh/ty#229?

I saw astral-sh/ty#229 and related comments, but not https://github.com/astral-sh/ruff/blob/main/crates/red_knot_python_semantic/resources/mdtest/boundness_declaredness/public.md - thanks for the link. I think the table is awesome, it literally suggest the perfect way to model this.

I'd refactor symbol to:

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) enum BindingStatus<'db> {
    Bound(Type<'db>),
    PossiblyUnbound(Type<'db>),
    Unbound,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) enum DeclarationStatus<'db> {
    Declared(TypeAndQualifiers<'db>),
    PossiblyDeclared(TypeAndQualifiers<'db>),
    Undeclared,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub(crate) enum Symbol<'db> {
    Known(BindingStatus<'db>, DeclarationStatus<'db>),
    /// Or Make Symbol a tuple since Unknown == BindingStatus::Unbound & DeclarationStatus::Undeclared
    Unknown,
}

return this from symbol query and then let the caller apply whatever logic is appropriate. Symbol can have methods that resolve types between declarations & bindings based on your table from that file too.

crates/red_knot_python_semantic/src/types.rs

mishamsk · 2025-02-13T04:55:57Z

remove ABC detection
remove special casing for class member access in stubs
improve error message for invalid pure instance attr access from class in load context
added error message for invalid pure instance attr access from class in store context
switched the symbol lookup implementation inside class member function into salsa query, similar to the not-so-shared symbol function

AlexWaygood

Nice! I haven't done a full review, but I spotted a few things looking over quickly

crates/red_knot_python_semantic/src/types.rs

crates/red_knot_python_semantic/src/types/infer.rs

carljm

Thank you!

crates/red_knot_python_semantic/resources/mdtest/attributes.md

crates/red_knot_python_semantic/src/types.rs

crates/red_knot_python_semantic/resources/mdtest/attributes.md

crates/red_knot_python_semantic/resources/mdtest/expression/attribute.md

crates/red_knot_python_semantic/src/types.rs

crates/red_knot_python_semantic/src/types/infer.rs

carljm · 2025-02-13T23:36:52Z

I suspect the performance regression on the incremental benchmark here is not avoidable. We just need more cached query results now that we distinguish more ways to resolve symbols, and more cached query results means more re-validation cost in Salsa.

AlexWaygood · 2025-02-13T23:43:44Z

I suspect the performance regression on the incremental benchmark here is not avoidable. We just need more cached query results now that we distinguish more ways to resolve symbols, and more cached query results means more re-validation cost in Salsa.

Note that we haven't got up to date benchmark numbers on this PR right now, as the assertion in the benchmark regarding the number of diagnostics has been failing for the last few commits, so codspeed has not been running

carljm · 2025-02-13T23:52:09Z

I suspect the performance regression on the incremental benchmark here is not avoidable. We just need more cached query results now that we distinguish more ways to resolve symbols, and more cached query results means more re-validation cost in Salsa.

I'm also questioning this conclusion -- I think if we were able to make symbol return a SymbolAndQualifiers and that were enough to let us check for classvars externally, that might help address the perf impact here, since it would mean we could do this without increasing the number of cached memos (instead we'd just be slightly increasing the data stored in each one, which is not as expensive in Salsa incremental re-validation).

MichaReiser · 2025-02-14T15:00:55Z

I haven't reviewed this PR yet because there are already a lot of folks involved. Feel free to ping me if or when a review from my side would be good.

carljm

This looks pretty good to me! I see your point about how it's sub-optimal until we address bound vs declared in Symbol, left an inline comment about that.

I think we need to get the benchmark running so we can see the perf impact of this, and evaluate the impact of tracking or not tracking symbol_by_id. But otherwise this looks good.

Sorry about the rebase conflicts; this was a particularly fast-moving week! Thanks for working on this!

crates/red_knot_python_semantic/src/symbol.rs

crates/red_knot_python_semantic/src/types/infer.rs

crates/red_knot_python_semantic/src/symbol.rs

fixes astral-sh#15963

github-actions · 2025-02-21T20:39:32Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

carljm · 2025-02-21T20:46:50Z

CodSpeed says this is a 1% cold and 3% incremental regression. That's improved from 5.5% incremental regression in the prior version. I think the current regression is acceptable and within the reasonable range for the added functionality here. It may be that we get some improvement from resolving the TODO around re-resolving bindings.

crates/red_knot_python_semantic/src/symbol.rs

AlexWaygood

thanks again, and sorry we keep creating merge conflicts for you here!

crates/red_knot_python_semantic/resources/mdtest/attributes.md

crates/red_knot_python_semantic/resources/mdtest/expression/attribute.md

crates/red_knot_python_semantic/src/types/infer.rs

sharkdp · 2025-02-24T13:55:32Z

@mishamsk I can look into making the required changes to get this merged — thank you very much for your work!

…for-pure-instance-variables-access-from-class

mishamsk · 2025-02-24T14:07:24Z

@mishamsk I can look into making the required changes to get this merged — thank you very much for your work!

thanks. I was going to say that I am happy to apply the requested changes myself, but looks like it is too late ;-)

sharkdp

Thank you very much. The final benchmarks show a -1% regression on the incremental setup.

mishamsk · 2025-02-24T17:35:55Z

@sharkdp thanks for pushing this over the edge. YaY

AlexWaygood added the ty Multi-file analysis & type inference label Feb 8, 2025

sharkdp reviewed Feb 11, 2025

View reviewed changes

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved

mishamsk marked this pull request as ready for review February 13, 2025 04:56

mishamsk requested review from AlexWaygood, MichaReiser and carljm as code owners February 13, 2025 04:56

AlexWaygood reviewed Feb 13, 2025

View reviewed changes

carljm reviewed Feb 13, 2025

View reviewed changes

carljm mentioned this pull request Feb 14, 2025

[red-knot] accessing an instance member can cause cross-module AST dependency #16172

Closed

AlexWaygood changed the title ~~[red-knot] Add diagnostic for class-object access to pure instance variables and class literal repr in diagnostics~~ [red-knot] Add diagnostic for class-object access to pure instance variables Feb 16, 2025

carljm reviewed Feb 21, 2025

View reviewed changes

carljm mentioned this pull request Feb 21, 2025

[red-knot] annotated assignments without RHS in stubs should still be bindings #16264

Closed

Add diagnostic for class-object access to pure instance variables

5f0bff7

fixes astral-sh#15963

mishamsk force-pushed the 15963-fix-diagnostic-for-pure-instance-variables-access-from-class branch from ef20b41 to 5f0bff7 Compare February 21, 2025 20:32

try re-adding salsa tracking to symbol_by_id

e4d5ea8

sharkdp reviewed Feb 24, 2025

View reviewed changes

crates/red_knot_python_semantic/src/symbol.rs Show resolved Hide resolved

AlexWaygood reviewed Feb 24, 2025

View reviewed changes

sharkdp added 3 commits February 24, 2025 14:55

Merge remote-tracking branch 'origin/main' into 15963-fix-diagnostic-…

a19fe20

…for-pure-instance-variables-access-from-class

Various minor review comments

c043b1f

Remove SymbolAndQualifiers::is_final

005fd6d

sharkdp approved these changes Feb 24, 2025

View reviewed changes

sharkdp merged commit 68991d0 into astral-sh:main Feb 24, 2025
21 checks passed

mishamsk deleted the 15963-fix-diagnostic-for-pure-instance-variables-access-from-class branch February 27, 2025 01:25

[red-knot] Add diagnostic for class-object access to pure instance variables #16036

[red-knot] Add diagnostic for class-object access to pure instance variables #16036

Uh oh!

Conversation

mishamsk commented Feb 8, 2025

Summary

Test Plan

Uh oh!

mishamsk commented Feb 8, 2025

Ideas

Symbol::Type

Split symbol function

Uh oh!

mishamsk commented Feb 8, 2025

Uh oh!

sharkdp commented Feb 10, 2025

Uh oh!

mishamsk commented Feb 10, 2025

Uh oh!

sharkdp commented Feb 10, 2025

Uh oh!

sharkdp commented Feb 10, 2025

Uh oh!

sharkdp commented Feb 10, 2025

Uh oh!

codspeed-hq bot commented Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #16036 will not alter performance

Summary

Uh oh!

mishamsk commented Feb 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

mishamsk commented Feb 13, 2025

Uh oh!

AlexWaygood left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

carljm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

carljm commented Feb 13, 2025

Uh oh!

AlexWaygood commented Feb 13, 2025

Uh oh!

carljm commented Feb 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MichaReiser commented Feb 14, 2025

Uh oh!

carljm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Feb 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Split `symbol` function

codspeed-hq bot commented Feb 11, 2025 •

edited

Loading

mishamsk commented Feb 11, 2025 •

edited

Loading

carljm commented Feb 13, 2025 •

edited

Loading

github-actions bot commented Feb 21, 2025 •

edited

Loading

`ruff-ecosystem` results