GH-32609: [Python] Add type annotations to PyArrow #47609
Hey @rok, I come bearing unsolicited suggestions 😉
A lot of this is from 2 recent PRs that have had me battling the current stubs more
python/pyarrow-stubs/compute.pyi
Outdated
def field(*name_or_index: str | tuple[str, ...] | int) -> Expression: ...


def scalar(value: bool | float | str) -> Expression: ...
Based on
arrow/python/pyarrow/_compute.pyx
Lines 2859 to 2869 in 13c2615
@staticmethod
def _scalar(value):
    cdef:
        Scalar scalar
    if isinstance(value, Scalar):
        scalar = value
    else:
        scalar = lib.scalar(value)
    return Expression.wrap(CMakeScalarExpression(scalar.unwrap()))
The Expression version (pc.scalar) should accept the same types as pa.scalar, right?
Ran into it the other day here where I needed to add a cast
I'm not sure what you are suggesting. Do you mean:
diff --git i/python/pyarrow-stubs/compute.pyi w/python/pyarrow-stubs/compute.pyi
index df660e0c0c..f005c5f552 100644
--- i/python/pyarrow-stubs/compute.pyi
+++ w/python/pyarrow-stubs/compute.pyi
@@ -84,7 +84,7 @@ _R = TypeVar("_R")
def field(*name_or_index: str | tuple[str, ...] | int) -> Expression: ...
-def scalar(value: bool | float | str) -> Expression: ...
+def scalar(value: Any) -> Expression: ...
Hmm, yeah I guess Any is what you have there so that could work.
But I think it would be more helpful to use something like this to start:
https://github.com/rok/arrow/blob/6a310149ed305d7e2606066f5d0915e9c23310f4/python/pyarrow-stubs/_stubs_typing.pyi#L50
PyScalar: TypeAlias = (bool | int | float | Decimal | str | bytes |
dt.date | dt.datetime | dt.time | dt.timedelta)
Then the snippet from (#47609 (comment)) seems to imply pa.Scalar is valid as well.
So maybe this would document it more clearly?
def scalar(value: PyScalar | lib.Scalar[Any] | None) -> Expression: ...
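To make the difference between the candidate annotations concrete, here is a minimal sketch that rebuilds the PyScalar alias quoted above with typing.Union and checks values against it at runtime. The helper is_py_scalar is hypothetical and exists only for illustration; it is not part of pyarrow or pyarrow-stubs, and it deliberately ignores pa.Scalar instances, which the Cython snippet also accepts.

```python
import datetime as dt
from decimal import Decimal
from typing import Union, get_args

# Same members as the PyScalar alias quoted in the comment above,
# spelled with Union so the sketch also runs on older interpreters.
PyScalar = Union[
    bool, int, float, Decimal, str, bytes,
    dt.date, dt.datetime, dt.time, dt.timedelta,
]


def is_py_scalar(value: object) -> bool:
    """Hypothetical helper: True if value is one of the PyScalar members."""
    return isinstance(value, get_args(PyScalar))


# Values the narrower annotation `bool | float | str` would reject
# under a type checker, but which PyScalar admits.
samples = [Decimal("1.5"), b"raw", dt.date(2024, 1, 1), dt.timedelta(days=1)]
accepted = [is_py_scalar(v) for v in samples]
```

A list or None is not a member, which is why the proposed signature spells out `| None` separately.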
python/pyarrow-stubs/_compute.pyi
Outdated
def name(self) -> str: ...
@property
def num_kernels(self) -> int: ...
I wonder if the overloads can be generated instead of written out and maintained manually.
Took me a while to discover this without it being in the stubs 😅
@property
def kernels(self) -> list[ScalarKernel | VectorKernel | ScalarAggregateKernel | HashAggregateKernel]:
I know this isn't accurate for Function itself, but it's the type returned by FunctionRegistry.get_function
If you wanted to be a bit fancier, maybe add some Generics into the mix?
look at extracting compute kernel signatures from C++ (valid input types are explicitly stated at registration time).
That would probably be more useful than the route I was going for here.
In Python there's only the repr to work with, but there is quite a lot of information encoded in it
>>> import pyarrow.compute as pc
>>> pc.get_function("array_take").kernels[:10]
[VectorKernel<(primitive, integer) -> computed>,
VectorKernel<(binary-like, integer) -> computed>,
VectorKernel<(large-binary-like, integer) -> computed>,
VectorKernel<(fixed-size-binary-like, integer) -> computed>,
VectorKernel<(null, integer) -> computed>,
VectorKernel<(Type::DICTIONARY, integer) -> computed>,
VectorKernel<(Type::EXTENSION, integer) -> computed>,
VectorKernel<(Type::LIST, integer) -> computed>,
VectorKernel<(Type::LARGE_LIST, integer) -> computed>,
VectorKernel<(Type::LIST_VIEW, integer) -> computed>]
>>> pc.get_function("min_element_wise").kernels[:10]
[ScalarKernel<varargs[uint8*] -> uint8>,
ScalarKernel<varargs[uint16*] -> uint16>,
ScalarKernel<varargs[uint32*] -> uint32>,
ScalarKernel<varargs[uint64*] -> uint64>,
ScalarKernel<varargs[int8*] -> int8>,
ScalarKernel<varargs[int16*] -> int16>,
ScalarKernel<varargs[int32*] -> int32>,
ScalarKernel<varargs[int64*] -> int64>,
ScalarKernel<varargs[float*] -> float>,
ScalarKernel<varargs[double*] -> double>]
>>> pc.get_function("approximate_median").kernels
[ScalarAggregateKernel<(any) -> double>]
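As a rough illustration of how much can be recovered from those reprs, the sketch below parses the strings shown above into a small record. Everything here is an assumption for illustration: parse_kernel_repr and KernelSignature are hypothetical names, the regular expression is fitted only to the repr shapes in this comment, and it works on plain strings rather than calling pyarrow.

```python
import re
from typing import NamedTuple


class KernelSignature(NamedTuple):
    """Hypothetical record for one parsed kernel repr."""
    kind: str
    inputs: tuple[str, ...]
    is_varargs: bool
    output: str


# Matches e.g. "VectorKernel<(primitive, integer) -> computed>".
_KERNEL_RE = re.compile(r"^(?P<kind>\w+)<(?P<inputs>.+) -> (?P<output>.+)>$")


def parse_kernel_repr(text: str) -> KernelSignature:
    match = _KERNEL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised kernel repr: {text!r}")
    raw = match["inputs"]
    is_varargs = raw.startswith("varargs[")
    if is_varargs:
        # "varargs[uint8*]" -> "uint8"
        inner = raw[len("varargs["):-1].rstrip("*")
    else:
        # "(primitive, integer)" -> "primitive, integer"
        inner = raw.strip("()")
    inputs = tuple(part.strip() for part in inner.split(",") if part.strip())
    return KernelSignature(match["kind"], inputs, is_varargs, match["output"])


take = parse_kernel_repr("VectorKernel<(primitive, integer) -> computed>")
minimum = parse_kernel_repr("ScalarKernel<varargs[uint8*] -> uint8>")
median = parse_kernel_repr("ScalarAggregateKernel<(any) -> double>")
```

Extracting the signatures from C++ at registration time, as suggested in the thread, would avoid depending on a repr format that is not a stable interface.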
Oh awesome! Thank you @dangotbanned I love unsolicited suggestions like these! I am at pydata Paris right now so I probably can't reply properly until Monday, but given your experience I'm sure these will be very useful!
Just a mental note: @pitrou suggested to look at extracting compute kernel signatures from C++ (valid input types are explicitly stated at registration time).
Co-authored-by: Dan Redding <[email protected]>
mypy_path = "$MYPY_CONFIG_FILE_DIR/pyarrow-stubs"

[tool.pyright]
include = ["pyarrow"]
Was just reminded of this issue in pyarrow-stubs (narwhals-dev/narwhals#3203 (comment))
-include = ["pyarrow"]
+pythonPlatform = "All"
+pythonVersion = "3.9"
+include = ["pyarrow"]
The problem with not addressing it at the source is that the errors propagate downstream to anyone who enables the setting.
I'm counting only 4 of these not being pyarrow-stubs 😳
Show 113 errors
narwhals/_arrow/dataframe.py
narwhals/_arrow/dataframe.py:447:20 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/dataframe.py:448:38 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/dataframe.py:464:20 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/dataframe.py:465:38 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/namespace.py
narwhals/_arrow/namespace.py:160:17 - error: Argument of type "Overload[() -> _Scalar_CoT@max_element_wise, () -> Expression]" cannot be assigned to parameter "function" of type "(_T@reduce, _S@reduce) -> _T@reduce" in function "reduce"
No overloaded function matches type "(ChunkedArrayAny, ChunkedArrayAny) -> ChunkedArrayAny" (reportArgumentType)
narwhals/_arrow/namespace.py:184:29 - error: Argument of type "ChunkedArrayAny | _Scalar_CoT@max_element_wise" cannot be assigned to parameter "native_series" of type "ChunkedArrayAny" in function "__init__"
Type "ChunkedArrayAny | Scalar[Unknown]*" is not assignable to type "ChunkedArrayAny"
"Scalar[Unknown]*" is not assignable to "ChunkedArray[Any]" (reportArgumentType)
narwhals/_arrow/series.py
narwhals/_arrow/series.py:224:47 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:228:51 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:232:45 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:236:48 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:240:42 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:267:46 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:274:46 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:306:27 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:306:44 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:312:27 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:312:46 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:355:47 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:358:47 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:361:50 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:365:50 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:403:25 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:415:25 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:418:25 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:531:20 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:545:35 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:546:26 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:547:33 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:549:29 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:550:32 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:551:33 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:553:29 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:554:26 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:555:33 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:557:35 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:558:32 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:559:33 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:568:44 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:664:39 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:763:46 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:765:43 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:795:16 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/series.py:863:38 - error: No overloads for "<anonymous function>" match the provided arguments
Argument types: (ChunkedArrayAny, ChunkedArrayAny | ScalarAny | None) (reportCallIssue)
narwhals/_arrow/series.py:867:33 - error: No overloads for "<anonymous function>" match the provided arguments
Argument types: (ChunkedArrayAny, ChunkedArrayAny | ScalarAny) (reportCallIssue)
narwhals/_arrow/series.py:894:31 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:896:36 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:902:31 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:904:36 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:910:32 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:912:37 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:1030:20 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/series.py:1161:24 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:1178:34 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series.py:1178:68 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py
narwhals/_arrow/series_dt.py:51:9 - error: Type "dict[tuple[Literal['ns'], Literal['us']] | tuple[Literal['ns'], Literal['ms']] | tuple[Literal['us'], Literal['ns']] | tuple[Literal['us'], Literal['ms']] | tuple[Literal['ms'], Literal['ns']] | tuple[Literal['ms'], Literal['us']] | tuple[Literal['s'], Literal['ns']] | tuple[Literal['s'], Literal['us']] | tuple[Literal['s'], Literal['ms']], tuple[(left: ArrayOrScalar, right: ArrayOrScalar, /) -> Any, Literal[1000]] | tuple[(left: ArrayOrScalar, right: ArrayOrScalar, /) -> Any, Literal[1000000]] | tuple[() -> Unknown, Literal[1000]] | tuple[() -> Unknown, Literal[1000000]] | tuple[() -> Unknown, Literal[1000000000]]]" is not assignable to declared type "Mapping[tuple[UnitCurrent, UnitTarget], tuple[BinOpBroadcast, IntoRhs]]"
Type "() -> Unknown" is not assignable to type "BinOpBroadcast"
Function accepts too many positional parameters; expected 0 but received 2
Type "() -> Unknown" is not assignable to type "BinOpBroadcast"
Function accepts too many positional parameters; expected 0 but received 2
Type "() -> Unknown" is not assignable to type "BinOpBroadcast"
Function accepts too many positional parameters; expected 0 but received 2
Type "() -> Unknown" is not assignable to type "BinOpBroadcast"
Function accepts too many positional parameters; expected 0 but received 2 (reportAssignmentType)
narwhals/_arrow/series_dt.py:105:34 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:107:49 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:115:41 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:118:42 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:127:43 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:130:43 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:133:48 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:137:37 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:137:52 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:137:85 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:142:25 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:142:78 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:147:48 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:181:49 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:193:49 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:204:45 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:209:31 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:223:22 - error: No overloads for "assume_timezone" match the provided arguments (reportCallIssue)
narwhals/_arrow/series_dt.py:223:41 - error: Argument of type "TimestampArray | DurationScalar[Any]" cannot be assigned to parameter "timestamps" of type "Expression" in function "assume_timezone"
Type "TimestampArray | DurationScalar[Any]" is not assignable to type "Expression"
"TimestampArray" is not assignable to "Expression" (reportArgumentType)
narwhals/_arrow/series_str.py
narwhals/_arrow/series_str.py:24:22 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_str.py:43:26 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_str.py:56:44 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_str.py:77:47 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_str.py:80:47 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_str.py:83:47 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_str.py:95:35 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_str.py:106:47 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_str.py:116:26 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py
narwhals/_arrow/utils.py:278:30 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py:284:42 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py:291:42 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py:291:54 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py:293:17 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py:293:33 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py:297:29 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py:305:27 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py:387:43 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/utils.py:388:43 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/utils.py:396:29 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/utils.py:416:29 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_pandas_like/series_dt.py
narwhals/_pandas_like/series_dt.py:81:29 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_pandas_like/series_dt.py:81:44 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_pandas_like/series_dt.py:81:78 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_pandas_like/series_dt.py:227:48 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/expr.py
narwhals/_plan/arrow/expr.py:137:37 - error: Argument of type "() -> Unknown" cannot be assigned to parameter "fn_native" of type "(Any) -> Any" in function "_unary_function"
Type "() -> Unknown" is not assignable to type "(Any) -> Any"
Function accepts too many positional parameters; expected 0 but received 1 (reportArgumentType)
narwhals/_plan/arrow/expr.py:264:43 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/expr.py:269:57 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/expr.py:309:40 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/expr.py:321:40 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/functions.py
narwhals/_plan/arrow/functions.py:91:16 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/functions.py:91:30 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/functions.py:184:19 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/namespace.py
narwhals/_plan/arrow/namespace.py:123:42 - error: Argument of type "Overload[() -> _Scalar_CoT@max_element_wise, () -> Expression]" cannot be assigned to parameter "fn_native" of type "(Any, Any) -> Any" in function "_horizontal_function"
No overloaded function matches type "(Any, Any) -> Any" (reportArgumentType)
narwhals/_plan/arrow/typing.py
narwhals/_plan/arrow/typing.py:52:39 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_polars/dataframe.py
narwhals/_polars/dataframe.py:474:34 - error: No overloads for "__getitem__" match the provided arguments (reportCallIssue)
narwhals/_polars/dataframe.py:474:34 - error: Argument of type "tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" cannot be assigned to parameter "key" of type "MultiIndexSelector | MultiColSelector | SingleIndexSelector | tuple[SingleIndexSelector, MultiColSelector] | tuple[MultiIndexSelector, MultiColSelector]" in function "__getitem__"
Type "tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to type "MultiIndexSelector | MultiColSelector | SingleIndexSelector | tuple[SingleIndexSelector, MultiColSelector] | tuple[MultiIndexSelector, MultiColSelector]"
"tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to "int"
"tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to "slice[Any, Any, Any]"
"tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to "range"
"tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to "Sequence[int]"
Type parameter "_T_co@Sequence" is covariant, but "slice[None, None, None] | int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]" is not a subtype of "int"
Type "slice[None, None, None] | int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]" is not assignable to type "int"
"CompliantSeries[Any]" is not assignable to "int" (reportArgumentType)
narwhals/_utils.py
narwhals/_utils.py:1034:18 - error: No overloads for "__new__" match the provided arguments (reportCallIssue)
narwhals/functions.py
narwhals/functions.py:579:40 - error: Cannot access attribute "name" for class "Distribution"
Attribute "name" is unknown (reportAttributeAccessIssue)
113 errors, 0 warnings, 0 informations
The fixes are usually pretty simple and specified here
Also I used "3.9" here, but whatever minimum Python version that pyarrow is targeting would work as well (e.g. if you're bumping to 3.10 soon, use that)
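One common shape such a fix takes is a version-guarded import, which pyright can evaluate against the configured pythonVersion. The sketch below assumes the failing pattern is an import of TypeAlias from typing on a target version where it does not exist; whether that matches each reported site in the stubs is not confirmed here, and the alias name IntoScalar is hypothetical.

```python
import sys

# typing.TypeAlias was added in Python 3.10. Guarding on sys.version_info
# lets a type checker pick the right branch for the configured target
# version instead of treating TypeAlias as an unknown variable.
if sys.version_info >= (3, 10):
    from typing import TypeAlias
else:
    # Assumption: typing_extensions is available on older interpreters.
    from typing_extensions import TypeAlias

# Hypothetical alias, written as a string so it also parses on 3.9.
IntoScalar: TypeAlias = "bool | int | float | str"
```

If the minimum supported version is bumped to 3.10, the guard becomes unnecessary and the plain import from typing is enough.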
Since we already dropped 3.9 (#47478), we may as well bump to 3.10 here.
Ah nice one, thanks @rok
Looking into the pyright errors from (#47609 (comment)) some more, it seems like 3 of the most common ones are explicitly disabled in your config:
Also, reportAssignmentType appears to be disabled as a result of reportGeneralTypeIssues = "none".
As that setting disables the majority of pyright rules, is this a temporary thing to work around a high number of errors?
Yes, this is for noise reduction at the moment. Please do advise if some of these could be kept disabled. :)
Please do advise if some of these could be kept disabled. :)
Sure thing, here's our current config
[tool.pyright]
pythonPlatform = "All"
# NOTE (`pyarrow-stubs` do unsafe `TypeAlias` and `TypeVar` imports)
# pythonVersion = "3.9"
reportMissingTypeArgument = "error"
reportIncompatibleMethodOverride = "error"
reportMissingImports = "none"
reportMissingModuleSource = "none"
reportPrivateImportUsage = "none"
reportUnusedExpression = "none" # handled by (https://docs.astral.sh/ruff/rules/unused-variable/)
typeCheckingMode = "basic"
include = ["narwhals", "tests"]
ignore = [
"../.venv/",
"../../../**/Lib", # stdlib
"../../../**/typeshed*" # typeshed-fallback
]
And some other rules I was thinking of disabling to allow us to switch typeCheckingMode = "basic" -> "strict"
I guess my advice would be:
- Try changing your config to only typeCheckingMode = "basic"
- Set some threshold for the minimum number of times each rule can be triggered before fixing them all becomes unbearable
- Fix everything below that threshold
- Bump up the threshold, rinse/repeat
That kind of workflow means you usually fix similar issues at the same time, and don't need to spend as long trying to understand what the error even means 😂
You can also gradually enable/disable subpackages with the include, ignore, exclude filters if you need more control
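The threshold workflow above can be sketched as a small script that tallies how often each rule fires in pyright's text output and lists the rules cheap enough to fix first. This is only an illustration: count_rules and rules_at_or_below are hypothetical helpers, and the parsing assumes the rule name appears in parentheses at the end of a line, as in the error log earlier in this thread.

```python
import re
from collections import Counter

# A rule name such as "(reportCallIssue)" closes each reported error.
_RULE_RE = re.compile(r"\((report[A-Za-z]+)\)\s*$")


def count_rules(pyright_output: str) -> Counter:
    """Count how many errors each pyright rule produced."""
    counts: Counter = Counter()
    for line in pyright_output.splitlines():
        found = _RULE_RE.search(line)
        if found:
            counts[found.group(1)] += 1
    return counts


def rules_at_or_below(counts: Counter, threshold: int) -> list[str]:
    """Rules triggered at most `threshold` times, i.e. the next batch to fix."""
    return sorted(rule for rule, n in counts.items() if n <= threshold)


sample = """\
a.py:1:1 - error: Expected 0 positional arguments (reportCallIssue)
a.py:2:1 - error: Expected 0 positional arguments (reportCallIssue)
a.py:3:1 - error: Variable not allowed in type expression (reportInvalidTypeForm)
a.py:4:1 - error: Argument of type "X" cannot be assigned to parameter "y"
    "X" is not assignable to "Y" (reportArgumentType)
"""
counts = count_rules(sample)
next_batch = rules_at_or_below(counts, threshold=1)
```

Raising the threshold after each pass reproduces the "rinse/repeat" step, and grouping by rule keeps similar fixes together.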
assert isinstance(chunked_struct_array.type, pa.StructType)
# Cast to the proper type for type checker
struct_chunked_array = cast(pa.ChunkedArray[pa.StructScalar], chunked_struct_array)
result = pa.Table.from_struct_array(struct_chunked_array)
I wonder if it is worth linking the cause of this issue?
- A new home for pyarrow-stubs? #45919 (reply in thread)
- A new home for pyarrow-stubs? #45919 (reply in thread)
I think it is due to incomplete overloads for all of:
Sorry for the nitpick btw
I'm trying to put myself in the shoes of a maintainer who is skeptical about typing.
They might see needing to adjust (working) tests to satisfy a type checker as a negative against typing itself - rather than the (unfortunate) outcome of a compromise 🙂
This proposes adding type annotations to pyarrow by adopting pyarrow-stubs into pyarrow. To do so we copy pyarrow-stubs's stubfiles into arrow/python/pyarrow-stubs/. We remove docstrings from annotations and provide a script to include them into stubfiles at wheel-build-time. We also remove overloads from annotations to simplify this PR. We then add annotation checks for stubfiles and some test files. We make sure mypy and pyright annotation checks pass on stubfiles. Annotation checks should be expanded until all (or most) project files are covered.
PR introduces:
arrow/python/pyarrow/