Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TYP: use overload to refine return type of set_axis #40197

Merged
merged 8 commits into from
Mar 14, 2021

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Mar 3, 2021

Trying to use overload so that mypy knows that the return type of .set_axis is depending on the value of inplace.

This seems to work - if I have

# t.py

import pandas as pd

reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], axis=0, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], axis=0, inplace=False))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], inplace=True))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], inplace=False))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], axis=0))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4]))

then I get

$ mypy t.py 
t.py:3: note: Revealed type is 'None'
t.py:4: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:5: note: Revealed type is 'None'
t.py:6: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:7: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:8: note: Revealed type is 'pandas.core.frame.DataFrame'

which seems correct.

However, running mypy pandas throws some errors, which I've ignored (similarly to how this was done in #37137, though this time there's an extra error) and put the errors as comments. @simonjayhawkins @Dr-Irv sorry to ask for your help here, but do you know how to address this?

Reading through python/mypy#6580 , my understanding is that we would need to provide exponentially many overloads for this to work without errors. However, by placing the : Literal[False] = ... overload first, then ties will be broken with that one and so the correct return type will be inferred (see the above t.py experiment)

@MarcoGorelli MarcoGorelli changed the title try typing set_axis TYP: use overload to refine return type of set_axis Mar 3, 2021
@Dr-Irv
Copy link
Contributor

Dr-Irv commented Mar 3, 2021

@MarcoGorelli I haven't looked at what you did, but did you take a look at the pandas type stubs created by Microsoft. See https://github.com/microsoft/python-type-stubs/blob/main/pandas/core/frame.pyi and https://github.com/microsoft/python-type-stubs/blob/main/pandas/core/generic.pyi and maybe that helps?

If those ideas don't work, then I can take a closer look.

@MarcoGorelli
Copy link
Member Author

Thanks @Dr-Irv , that's really useful! OK, they've gone down the "lots of overloads" approach, which is probably better

@MarcoGorelli MarcoGorelli marked this pull request as draft March 3, 2021 14:25
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 3, 2021 17:04
@MarcoGorelli MarcoGorelli marked this pull request as draft March 3, 2021 17:05
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 3, 2021 17:35
@gfyoung gfyoung added the Typing type annotations, mypy/pyright type checking label Mar 4, 2021
@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Mar 4, 2021

Updating with the bool case:

import pandas as pd

reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], axis=0, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], axis=0, inplace=False))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], inplace=True))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], inplace=False))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4], axis=0))
reveal_type(pd.DataFrame([1,2,3]).set_axis([2,3,4]))
$ mypy t.py 
t.py:5: note: Revealed type is 'None'
t.py:6: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:7: note: Revealed type is 'None'
t.py:8: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:9: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:10: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:11: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:12: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'

EDIT

More thorough:

# t.py

import pandas as pd

inplace: bool

reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0))
reveal_type(pd.DataFrame([1,2,3]).reset_index(inplace=False))
reveal_type(pd.DataFrame([1,2,3]).reset_index(drop=True, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, drop=False, inplace=True))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, inplace=True, col_fill='a'))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, col_fill='a'))
reveal_type(pd.DataFrame([1,2,3]).reset_index(inplace=True))
reveal_type(pd.DataFrame([1,2,3]).reset_index(inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).reset_index(drop=True, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, drop=False, inplace=inplace))
reveal_type(pd.DataFrame([1,2,3]).reset_index(level=0, inplace=inplace, col_fill='a'))
reveal_type(pd.DataFrame([1,2,3]).reset_index(inplace=inplace))

output:

$ mypy t.py 
t.py:5: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:6: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:7: note: Revealed type is 'None'
t.py:8: note: Revealed type is 'None'
t.py:9: note: Revealed type is 'None'
t.py:10: note: Revealed type is 'None'
t.py:11: note: Revealed type is 'pandas.core.frame.DataFrame'
t.py:12: note: Revealed type is 'None'
t.py:13: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:14: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:15: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:16: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:17: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'
t.py:18: note: Revealed type is 'Union[pandas.core.frame.DataFrame, None]'

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Mar 9, 2021

Thanks @Dr-Irv , that's really useful! OK, they've gone down the "lots of overloads" approach, which is probably better

This frustration is compounded with the methods being in frame.py and series.py in order to have specific docstrings

maybe we could add the methods 'dynamically' as attributes calling the decorator functions instead of using them as function decorators to get the specific docstrings for the DataFrame and Series methods and then I think mypy would fall back to generic.py for the type definitions.

Thanks @MarcoGorelli for working on this. can you also update to reinstate the function signature overloads and lose the duplicated overload (as discussed on other PR)

edit: may need to use setattr to avoid "Cannot assign to a method" see #40265

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Mar 9, 2021

can you also update to reinstate the function signature overloads and lose the duplicated overload (as discussed on other PR)

Hey @simonjayhawkins - I've reinstated the function signature types, not sure which overload you think is duplicated though (doesn't the discussion in the other PR show that in fact that overload is necessary?)


here the last overload is even more necessary because in frame.py and in series.py there is return super().set_axis. Commenting out the last overload, that line will complain it doesn't have a valid overload variant:

pandas/core/series.py:4346: error: No overload variant of "set_axis" of "NDFrame" matches argument types "Any", "Union[str, int]", "bool"  [call-overload]
pandas/core/series.py:4346: note: Possible overload variants:
pandas/core/series.py:4346: note:     def set_axis(self, labels: Any, axis: Union[str, int] = ..., inplace: Literal[False] = ...) -> Series
pandas/core/series.py:4346: note:     def set_axis(self, labels: Any, axis: Union[str, int], inplace: Literal[True]) -> None
pandas/core/series.py:4346: note:     <1 more non-matching overload not shown>
pandas/core/frame.py:4470: error: No overload variant of "set_axis" of "NDFrame" matches argument types "Any", "Union[str, int]", "bool"  [call-overload]
pandas/core/frame.py:4470: note: Possible overload variants:
pandas/core/frame.py:4470: note:     def set_axis(self, labels: Any, axis: Union[str, int] = ..., inplace: Literal[False] = ...) -> DataFrame
pandas/core/frame.py:4470: note:     def set_axis(self, labels: Any, axis: Union[str, int], inplace: Literal[True]) -> None
pandas/core/frame.py:4470: note:     <1 more non-matching overload not shown>
Found 2 errors in 2 files (checked 1244 source files)

@simonjayhawkins
Copy link
Member

Hey @simonjayhawkins - I've reinstated the function signature types, not sure which overload you think is duplicated though (doesn't the discussion in the other PR show that in fact that overload is necessary?)

sure. that comment is now outdated.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MarcoGorelli There doesn't seem much enthusiasm for #40202 so not sure how we are going to lock down these improvements or simplify the review process for overloads.

so lets get this in as is. merge when ready pending green

@MarcoGorelli
Copy link
Member Author

merge when ready pending green

Sure, thanks @simonjayhawkins !

@MarcoGorelli MarcoGorelli merged commit 015c0c0 into pandas-dev:master Mar 14, 2021
@MarcoGorelli MarcoGorelli deleted the type-set_axis branch March 14, 2021 15:47
sthagen added a commit to sthagen/pandas-dev-pandas that referenced this pull request Mar 14, 2021
 TYP: use overload to refine return type of set_axis (pandas-dev#40197)
@jbrockmendel jbrockmendel mentioned this pull request Apr 5, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
* try typing set_axis

* fixup overloads

* remove return type from base definition

* searching in the sun for another overload

* bool

* allow defaults in bool case

* type non-overloaded arguments

Co-authored-by: Simon Hawkins <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants