Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: use binder (or similar) to track which attributes are defined. #4019

Open
ilevkivskyi opened this issue Sep 27, 2017 · 24 comments
Open
Labels

Comments

@ilevkivskyi
Copy link
Member

ilevkivskyi commented Sep 27, 2017

I couldn't find an issue for this, so here is an idea. Currently this passes mypy:

class C:
    x: int
c = C()
c.x

y: str
y

but obviously fails at runtime. Binder successfully tracks execution frames and type narrowing by assignments, it looks like it can also track each Var whether it is still just "declared" or already defined. So that we can show errors like Variable 'var' may be uninitialized here.

EDIT: updated first example

@elazarg
Copy link
Contributor

elazarg commented Sep 27, 2017

I can see how the second example can work, but how would the first one? It requires kind of type state or something sophisticated like that

@ilevkivskyi
Copy link
Member Author

but how would the first one? It requires kind of type state or something sophisticated like that

I think it will be not more complex than the second one. Every attribute has its Var in the class names symbol table. The state will be tracked per Var. IIUC currently binder tracks expressions, not Vars, but it seems not too hard to also track Vars.

@elazarg
Copy link
Contributor

elazarg commented Sep 27, 2017

I still don't get it. Maybe I misunderstood the example. How do you know that the field is not initialized?

@ilevkivskyi
Copy link
Member Author

Ah, sorry, the first example is indeed bad, since CallExpr is not bindable. It should look like this:

class C:
    x: int

c = C() # from analyzing the class body we know that 'C' instances are created
        # with field 'x' uninitialized. It is not always 100% clear, in uncertain cases we
        # assume it is initialized, but in this case we know it is not.

c.x  # this is an error therefore
c.x = 1  # now binder records that name 'x' of NameExpr('c') is initialized

@elazarg
Copy link
Contributor

elazarg commented Sep 27, 2017

It is almost always not 100% clear - method calls, any transfer of control, metaclasses a-la NamedTuple, etc.

@ilevkivskyi
Copy link
Member Author

It is almost always not 100% clear - method calls, any transfer of control, metaclasses a-la NamedTuple, etc.

My experience is quite opposite. Many classes have trivial __init__s (just look at mypy code, currently however many things are actually "initialized" to None because we can't use PEP 526, but this doesn't count, I propose to treat them as uninitialized), also we can special-case known popular synthetic types like NamedTuple and (in future) dataclasses. Also for method calls we can record which names are set by each method, so that we can track the following:

class C:
    x: int
    def start(self) -> None:
        self.x = 0

c = C()
if condition:
    c.start()
    # we know that 'c.x' is initialized here

# but not here, unless condition is know as always True statically.

We also will need to track function calls, this is true, but it is not super hard. Note, that at the end of the above example c.x may be initialized or may not, depending on condition, this is why the error says: "c.x" may be uninitialized here. The point is that if c.x is attempted to be read there, then it is not always safe.

@elazarg
Copy link
Contributor

elazarg commented Sep 27, 2017

I think that this is a volatile path to take - tracking effects type-state using inferred procedure summarization (you will need syntax in order to check libraries; this will be similar to type guards); we don't infer return types, even though it will be more useful and will involve less specialized machinery.

To be clear, I do think it will be an interesting thing to try; it's just that the high level of added complexity (to parts that are already too complicated) and effort does not seem to be worth the relatively weak guarantees, the number of false positives, and the "action from distance" involved in tracking function calls.

In other words, tracking field initialization means adding typestate-checking to mypy. It looks like a big leap from what we have right now. If we choose to do it, I'd vote for adding the more general mechanism, so it can be instantiated with other analyzes such as resource handling.

@JukkaL
Copy link
Collaborator

JukkaL commented Sep 28, 2017

Fine-grained tracking of whether attributes are defined goes against the philosophy of mypy. In particular mypy tries hard not to do non-local static analysis. I agree with @elazarg's reasoning, and it also makes incremental type checking harder. Doing this for regular variables is another thing entirely and doesn't have the same problems.

A more pragmatic approach would be to allow declaring that certain attributes (or all of them) must always be explicitly defined in __init__. Strawman example:

class A:
    x: int
    y: str

    def __init__(self) -> None:
        self.x = 0  # Complain about y not being defined here

    def f(self) -> None:
        self.z = 1  # Also complain about an attribute defined here

a = A()
del a.x  # Not allowed

We could probably implement this in a way that we'd have reasonable guarantees that attribute access can't fail for instances of A (usual caveats about introspection, Any types, etc. apply of course). Not sure how we'd enable this -- maybe through a config file option. I might want something like this in not-too-distant future.

@ilevkivskyi
Copy link
Member Author

OK, let us do this step by step. We can start from tracking global/local variables, then enable a flag to force all definitions in __init__, and then we will see.

@ethanhs
Copy link
Collaborator

ethanhs commented Sep 28, 2017

Requiring definitions in __init__ is going to be behind an option, yes? I don't like the idea of it making it required by default.

@ilevkivskyi
Copy link
Member Author

@ethanhs Yes

@ned
Copy link

ned commented Nov 4, 2017

I think it should be required by default. From PEP 526 (emphasis mine):

In particular, the value-less notation a: int allows one to annotate instance variables that should be initialized in __init__ or __new__.

So the PEP clearly intends that the annotated instance variables are initialised inside __init__ (or __new__). And I guess since this is fairly new syntax there wouldn't be much code in the wild that would break by requiring this by default.
The other thing, which I mention in the closed #4206, is that adding PEP 526 style instance variable annotations to an existing class can actually cause mypy to be silent where it would otherwise correctly throw a warning, making the annotations more dangerous than useful.

@ethanhs
Copy link
Collaborator

ethanhs commented Nov 4, 2017

It would be much better to check if any instance variables are left uninitialized than to check if all are initialized in __init__ or __new__, as to me the PEP snippet reads as a recommendation.

@ned
Copy link

ned commented Nov 4, 2017

Fair enough, that makes sense in the spirit of not wanting to warn about correct programs.

@ilevkivskyi
Copy link
Member Author

See #7756 for an important use case (problematic __init__).

@dpinol
Copy link

dpinol commented May 21, 2020

this would be a very cool feature for mypy

@JelleZijlstra
Copy link
Member

Related: #686

@ilevkivskyi
Copy link
Member Author

Updated the title to refer specifically to attributes, since variables are mostly supported now.

@markedwards
Copy link

Is this an example of this limitation:

def poc(flag: bool) -> str:
    if flag:
        foo = "abc"
    return foo.upper()

This passes with mypy 1.3.0.

@JukkaL
Copy link
Collaborator

JukkaL commented May 17, 2023

The variable check is not enabled by default. Try --enable-error-code possibly-undefined.

@markedwards
Copy link

markedwards commented May 17, 2023

Thanks, that fixes it. I'm surprised this is not included in strict = True. Is this error code documented? I can't find it.

@JukkaL
Copy link
Collaborator

JukkaL commented May 17, 2023

It still has a few known issues so it must be explicitly enabled.

@markedwards
Copy link

Is this the right place to post any issues (like false positives) encountered using possibly-undefined?

@JukkaL
Copy link
Collaborator

JukkaL commented May 18, 2023

You can create new issues. Already reported issues can be seen here: https://github.com/python/mypy/issues?q=is%3Aopen+is%3Aissue+label%3Atopic-partially-defined

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants