Skip to content

Conversation

@mishamsk
Copy link
Contributor

@mishamsk mishamsk commented Feb 8, 2025

Summary

This is at best a band-aid solution for #15963 . See my first comment below for discussion.

As of time of writing:

  • Fixes unbound class attribute access. Doesn't return a bound symbol anymore and thus causes a [unresolved-attribute] diagnostic
  • Intentionally as a separate commit:
    • ChangesType::ClassLiteral representation to type[ClassName] matching mypy (this was already the case for Type::SubclassOf)
    • Changes representation of a union of 2+ types to types.UnionType[TypeOne, TypeTwo, ...] also matching mypy

I didn't not add a separate new lint, despite the issue description. I am not sure why [unresolved-attribute] is not appropriate. Unbound attributes indeed do not exist at runtime, so the language seems perfectly appropriate.

Test Plan

cargo nextest run -p red_knot_python_semantic --no-fail-fast

@mishamsk
Copy link
Contributor Author

mishamsk commented Feb 8, 2025

@sharkdp first, sorry for taking so long, I got distracted by work ;-)

second, sorry for a long read for such a small thing. However, since we can't just talk it over - I'll dump why I think it is a band aid and PR marked as draft.

Disclaimer: by no means I am implying there is anything wrong with knot code. I totally understand that you are moving crazy fast and this is one huge undertaking, so I totally get it when something is not "perfect" or a temporary solution.

The reason it took me multiple hours to understand what's happening is that I got confused by the fact that knot uses a notion of "Symbol" for both real symbols and unbound declarations, which are not symbols at runtime. They only end up in annotations:

In [1]: class C:
   ...:     not_a_symbol: int
   ...:

In [2]: dir(C)
Out[2]:
['__annotations__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__']

In [3]: C.__annotations__
Out[3]: {'not_a_symbol': int}

Even though I quickly found the passage rgd Declaration that is used for both types of statements, the fact that they both create a symbol wasn't readily apparent.

Further, confusing the matter was the fact that red_knot_python_semantic::symbol::Symbol (not entirely helpfully matching the red_knot_python_semantic::semantic_index::symbol::Symbol) returns the enum Boundness for both types of "symbols". So pure (unbound) declarations will be Symbol::Type(ty, Boundness::Bound).

Ideas

Symbol::Type

Tbh and imho symbols and declarations should be split. At least in this context. But I felt like doing so would cause too many changes in too many places. Something I would only do if you confirm such an idea. There are multiple ways this can be done:

  • Very invasive - create something separate from symbols for declarations and match them to bindings / use sites when necessary. Probably too much of a change and disruption
  • Extend Boundness and add two more values: Declared and PossiblyDeclared
  • In the same vain - extend Symbol::Type tuple by adding some enum with the sub-kind of the symbol. This won't solve the confusing "Boundness::Bound unbound declared symbol" thing though
  • Maybe there is even better way

Split symbol function

red_knot_python_semantic::types::symbol function is used in multiple places to get the symbol type & boundness. But not all of them, e.g. Class::own_instance_member does it's own but similar thing. In the same vain, it was confusing me a bit. I'd split it or at least extract a more narrower common part out of it.

E.g. why would we need to check for TYPE_CHECKING on every invocation, if the only place where it may be triggered from is known_module_symbol (which calls symbol through global_symbol) and inversely why looking up from global_symbol would need to check for dunder __slots__.

Again, really sorry for a long dump, especially if there are very good reasons for all of this that I just failed to identify.

@mishamsk
Copy link
Contributor Author

mishamsk commented Feb 8, 2025

well. here is one false-positive from this change: https://github.com/astral-sh/ruff/actions/runs/13212332102/job/36887424507?pr=16036#step:7:94

ClassVar in stubs/ABCs needs to be handled as bound... I'll look into it later

@AlexWaygood AlexWaygood added the ty Multi-file analysis & type inference label Feb 8, 2025
@sharkdp
Copy link
Contributor

sharkdp commented Feb 10, 2025

As mentioned on Discord, I'll split my reply into a few comments

  • Intentionally as a separate commit:
    • ChangesType::ClassLiteral representation to type[ClassName] matching mypy (this was already the case for Type::SubclassOf)
    • Changes representation of a union of 2+ types to types.UnionType[TypeOne, TypeTwo, ...] also matching mypy

Commenting on this first. This should definitely be discussed elsewhere (ideally as a ticket first), but I don't think that we want either of these changes.

Unlike mypy, we deliberately distinguish Literal[C] from type[C], as they denote different types. We don't know yet if we really want to use Literal[C] as the spelling of that type, but I don't think we want to mix up Literal[C] and type[C].

The difference between the two is:

  • Literal[C] is the type of the class object C, for a class C: …. It is a singleton type with one single inhabitant: the class object C.
  • type[C] is the type of all class objects S, where S is a subclass of C, or C itself.

More concretely, both C and D are of type type[C], but only C is of type Literal[C]:

class C: ...
class D(C): ...

The union-type representation change would be more stylistic, but I don't see any argument for choosing types.UnionType[TypeOne, TypeTwo] over TypeOne | TypeTwo?

@mishamsk
Copy link
Contributor Author

but I don't think that we want either of these changes

Yeah, that's why put them in a separate commit. I had a hunch ;-) I just saw comments like "we want to improve the message". I'll drop this commit, no prob.

The difference between the two

yes, I do understand that and I saw that you use type[] only for Type::SubclassOf. I just thought that Literal is a bit unconventional coming from mypy/pyright. If you are planning to stray far from conventions, maybe changing the outer message would be an interesting approach. Instead of

Type `Literal[A]` has no attribute `non_existent`

generate

Class `A` has no attribute `non_existent`

The union-type representation change would be more stylistic, but I don't see any argument for choosing types.UnionType[TypeOne, TypeTwo] over TypeOne | TypeTwo?

same here. I went "conventional" way (mypy). If you keep Literal, it would be Literal[TypeOne, TypeTwo] I guess

@sharkdp
Copy link
Contributor

sharkdp commented Feb 10, 2025

I just saw comments like "we want to improve the message".

Class A has no attribute non_existent

I should have described this better in the TODO comment:

# TODO: we already show an error here but the message might be improved?
# mypy shows no error here, but pyright raises "reportAttributeAccessIssue"
# error: [unresolved-attribute] "Type `Literal[C]` has no attribute `inferred_from_value`"
reveal_type(C.inferred_from_value) # revealed: Unknown

I didn't specifically want to talk about the Literal[C] part here, but I agree that this could also be improved. I was rather thinking about the following: This specific diagnostic here is raised if someone tries to access a pure instance variable on the class object (not for a non-existent attribute). So I was thinking of giving them a hint and explaining that in more detail: "The attribute inferred_from_value can only be accessed on instances of class C, but not on the class object itself."

@sharkdp
Copy link
Contributor

sharkdp commented Feb 10, 2025

Disclaimer: by no means I am implying there is anything wrong with knot code. I totally understand that you are moving crazy fast and this is one huge undertaking, so I totally get it when something is not "perfect" or a temporary solution.

The reason it took me multiple hours to understand what's happening is that I got confused by the fact that knot uses a notion of "Symbol" for both real symbols and unbound declarations, which are not symbols at runtime. They only end up in annotations:

Further, confusing the matter was the fact that red_knot_python_semantic::symbol::Symbol (not entirely helpfully matching the red_knot_python_semantic::semantic_index::symbol::Symbol) returns the enum Boundness for both types of "symbols". So pure (unbound) declarations will be Symbol::Type(ty, Boundness::Bound).

Even though I quickly found the passage rgd Declaration that is used for both types of statements, the fact that they both create a symbol wasn't readily apparent.

It's completely fine to call this out. We know that the Symbol API can be improved, and we had a long discussion about the name — and we are aware that the clash with the term "symbol" in the semantic index builder is unfortunate. We also know that our handling of "boundness" and "declaredness" needs to be improved. For more details, see this issue and the explanations and comments in (this test suite](https://github.com/astral-sh/ruff/blob/main/crates/red_knot_python_semantic/resources/mdtest/boundness_declaredness/public.md).

That said, I would appreciate if we could move this discussion into a new ticket, to discuss it separately from the change in question here.

@sharkdp
Copy link
Contributor

sharkdp commented Feb 10, 2025

I now had a more detailed look at the ideas you proposed and they do look very interesting to me. They definitely go in the right direction. What I would prefer is if we could decouple any refactoring from this functional change here. Your "bandaid" solution doesn't look too bad to me, given our current API restrictions.

It would be great if you could remove the second commit from this branch, then we could focus on the functional change here and try to understand the errors in stub files better.

@codspeed-hq
Copy link

codspeed-hq bot commented Feb 11, 2025

CodSpeed Performance Report

Merging #16036 will not alter performance

Comparing mishamsk:15963-fix-diagnostic-for-pure-instance-variables-access-from-class (005fd6d) with main (e7a6c19)

Summary

✅ 32 untouched benchmarks

@mishamsk
Copy link
Contributor Author

mishamsk commented Feb 11, 2025

@sharkdp first rdg the PR:

  • reverted class literal display code
  • added ABC, ABCMeta to known classes
  • added handling of stubs & classvars to own_class_member with some tests
    • had to extract the code from symbol as I needed to get qualifiers, not just type... this is possibly a performance hit, since symbol used salsa query internally... something had to give (either big changes across the board, or this). To my defense own_instance_member is also re-creating the logic instead of using a shared query

Am I right, that you also want me to create a separate diagnostic for this specific access patter with the message you suggested? Didn't have time for this yet.

Also, I didn't do extensive validation that ABC's are properly caught in all possible scenarios. But at least based on two tests they do.

Now back to the general design. Or should I move this to astral-sh/ty#229?

I saw astral-sh/ty#229 and related comments, but not https://github.com/astral-sh/ruff/blob/main/crates/red_knot_python_semantic/resources/mdtest/boundness_declaredness/public.md - thanks for the link. I think the table is awesome, it literally suggest the perfect way to model this.

I'd refactor symbol to:

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) enum BindingStatus<'db> {
    Bound(Type<'db>),
    PossiblyUnbound(Type<'db>),
    Unbound,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) enum DeclarationStatus<'db> {
    Declared(TypeAndQualifiers<'db>),
    PossiblyDeclared(TypeAndQualifiers<'db>),
    Undeclared,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub(crate) enum Symbol<'db> {
    Known(BindingStatus<'db>, DeclarationStatus<'db>),
    /// Or Make Symbol a tuple since Unknown == BindingStatus::Unbound & DeclarationStatus::Undeclared
    Unknown,
}

return this from symbol query and then let the caller apply whatever logic is appropriate. Symbol can have methods that resolve types between declarations & bindings based on your table from that file too.

@mishamsk
Copy link
Contributor Author

  • remove ABC detection
  • remove special casing for class member access in stubs
  • improve error message for invalid pure instance attr access from class in load context
  • added error message for invalid pure instance attr access from class in store context
  • switched the symbol lookup implementation inside class member function into salsa query, similar to the not-so-shared symbol function

@mishamsk mishamsk marked this pull request as ready for review February 13, 2025 04:56
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I haven't done a full review, but I spotted a few things looking over quickly

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@carljm
Copy link
Contributor

carljm commented Feb 13, 2025

I suspect the performance regression on the incremental benchmark here is not avoidable. We just need more cached query results now that we distinguish more ways to resolve symbols, and more cached query results means more re-validation cost in Salsa.

@AlexWaygood
Copy link
Member

I suspect the performance regression on the incremental benchmark here is not avoidable. We just need more cached query results now that we distinguish more ways to resolve symbols, and more cached query results means more re-validation cost in Salsa.

Note that we haven't got up to date benchmark numbers on this PR right now, as the assertion in the benchmark regarding the number of diagnostics has been failing for the last few commits, so codspeed has not been running

@carljm
Copy link
Contributor

carljm commented Feb 13, 2025

I suspect the performance regression on the incremental benchmark here is not avoidable. We just need more cached query results now that we distinguish more ways to resolve symbols, and more cached query results means more re-validation cost in Salsa.

I'm also questioning this conclusion -- I think if we were able to make symbol return a SymbolAndQualifiers and that were enough to let us check for classvars externally, that might help address the perf impact here, since it would mean we could do this without increasing the number of cached memos (instead we'd just be slightly increasing the data stored in each one, which is not as expensive in Salsa incremental re-validation).

@MichaReiser
Copy link
Member

I haven't reviewed this PR yet because there are already a lot of folks involved. Feel free to ping me if or when a review from my side would be good.

@AlexWaygood AlexWaygood changed the title [red-knot] Add diagnostic for class-object access to pure instance variables and class literal repr in diagnostics [red-knot] Add diagnostic for class-object access to pure instance variables Feb 16, 2025
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good to me! I see your point about how it's sub-optimal until we address bound vs declared in Symbol, left an inline comment about that.

I think we need to get the benchmark running so we can see the perf impact of this, and evaluate the impact of tracking or not tracking symbol_by_id. But otherwise this looks good.

Sorry about the rebase conflicts; this was a particularly fast-moving week! Thanks for working on this!

@mishamsk mishamsk force-pushed the 15963-fix-diagnostic-for-pure-instance-variables-access-from-class branch from ef20b41 to 5f0bff7 Compare February 21, 2025 20:32
@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@carljm
Copy link
Contributor

carljm commented Feb 21, 2025

CodSpeed says this is a 1% cold and 3% incremental regression. That's improved from 5.5% incremental regression in the prior version. I think the current regression is acceptable and within the reasonable range for the added functionality here. It may be that we get some improvement from resolving the TODO around re-resolving bindings.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks again, and sorry we keep creating merge conflicts for you here!

@sharkdp
Copy link
Contributor

sharkdp commented Feb 24, 2025

@mishamsk I can look into making the required changes to get this merged — thank you very much for your work!

@mishamsk
Copy link
Contributor Author

@mishamsk I can look into making the required changes to get this merged — thank you very much for your work!

thanks. I was going to say that I am happy to apply the requested changes myself, but looks like it is too late ;-)

Copy link
Contributor

@sharkdp sharkdp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much. The final benchmarks show a -1% regression on the incremental setup.

@sharkdp sharkdp merged commit 68991d0 into astral-sh:main Feb 24, 2025
21 checks passed
@mishamsk
Copy link
Contributor Author

@sharkdp thanks for pushing this over the edge. YaY

@mishamsk mishamsk deleted the 15963-fix-diagnostic-for-pure-instance-variables-access-from-class branch February 27, 2025 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants