@rok rok commented Sep 20, 2025

This PR proposes adding type annotations to pyarrow by adopting pyarrow-stubs into pyarrow. To do so, we copy the pyarrow-stubs stub files into arrow/python/pyarrow-stubs/. We remove docstrings from the annotations and provide a script to insert them into the stub files at wheel-build time. We also remove overloads from the annotations to simplify this PR. We then add annotation checks for the stub files and some test files, and make sure the mypy and pyright checks pass on the stub files. Annotation checks should be expanded until all (or most) project files are covered.

This PR:

  1. adds pyarrow-stubs into arrow/python/pyarrow-stubs/
  2. fixes pyarrow-stubs so that it passes the mypy and pyright checks
  3. adds mypy and pyright checks to CI (crudely)
  4. adds a tool (update_stub_docstrings.py) to insert annotation docstrings into the stub files

@github-actions github-actions bot added the awaiting committer review label Sep 20, 2025
@rok rok changed the title [Python] Add type annotations to PyArrow GH-32609: [Python] Add type annotations to PyArrow Sep 20, 2025
@apache apache deleted a comment from github-actions bot Sep 20, 2025
@apache apache deleted a comment from github-actions bot Sep 20, 2025
@rok rok requested review from pitrou and raulcd September 22, 2025 10:30
@rok rok force-pushed the pyarrow-stubs-2 branch 3 times, most recently from 4591f24 to 7ed3e70 Compare September 22, 2025 23:19

@dangotbanned dangotbanned left a comment

Hey @rok, I come bearing unsolicited suggestions 😉

A lot of this is from 2 recent PRs that have had me battling the current stubs more

def field(*name_or_index: str | tuple[str, ...] | int) -> Expression: ...


def scalar(value: bool | float | str) -> Expression: ...

Based on

@staticmethod
def _scalar(value):
    cdef:
        Scalar scalar
    if isinstance(value, Scalar):
        scalar = value
    else:
        scalar = lib.scalar(value)
    return Expression.wrap(CMakeScalarExpression(scalar.unwrap()))

The Expression version (pc.scalar) should accept the same types as pa.scalar, right?

Ran into it the other day here where I needed to add a cast

Member Author

I'm not sure what you are suggesting. Do you mean:

diff --git i/python/pyarrow-stubs/compute.pyi w/python/pyarrow-stubs/compute.pyi
index df660e0c0c..f005c5f552 100644
--- i/python/pyarrow-stubs/compute.pyi
+++ w/python/pyarrow-stubs/compute.pyi
@@ -84,7 +84,7 @@ _R = TypeVar("_R")
 def field(*name_or_index: str | tuple[str, ...] | int) -> Expression: ...


-def scalar(value: bool | float | str) -> Expression: ...
+def scalar(value: Any) -> Expression: ...

Hmm, yeah I guess Any is what you have there so that could work.

But I think it would be more helpful to use something like this to start:
https://github.com/rok/arrow/blob/6a310149ed305d7e2606066f5d0915e9c23310f4/python/pyarrow-stubs/_stubs_typing.pyi#L50

PyScalar: TypeAlias = (bool | int | float | Decimal | str | bytes |
                       dt.date | dt.datetime | dt.time | dt.timedelta)

Then the snippet from (#47609 (comment)) seems to imply pa.Scalar is valid as well.
So maybe this would document it more clearly?

def scalar(value: PyScalar | lib.Scalar[Any] | None) -> Expression: ...
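
To make the suggested signature concrete, here is a minimal sketch that runs without pyarrow installed. `PyScalar` mirrors the alias quoted above; `StandInScalar` and `accepts_scalar_input` are hypothetical names used only for illustration and are not part of pyarrow or pyarrow-stubs.

```python
import datetime as dt
from decimal import Decimal
from typing import Union, get_args

# Mirrors the PyScalar alias quoted above, spelled with Union so that it
# also evaluates at runtime on older Python versions.
PyScalar = Union[bool, int, float, Decimal, str, bytes,
                 dt.date, dt.datetime, dt.time, dt.timedelta]


class StandInScalar:
    """Hypothetical stand-in for pa.Scalar so the sketch needs no pyarrow."""


def accepts_scalar_input(value: Union[PyScalar, StandInScalar, None]) -> bool:
    """Return True if value matches the proposed parameter annotation."""
    if value is None or isinstance(value, StandInScalar):
        return True
    # get_args() unpacks the Union into a tuple of classes for isinstance().
    return isinstance(value, get_args(PyScalar))
```

The runtime check is only there to show which values the annotation is meant to cover; in a stub file the annotation alone would carry that information.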

def name(self) -> str: ...
@property
def num_kernels(self) -> int: ...

#45919 (reply in thread)

I wonder if the overloads can be generated instead of written out and maintained manually.

Took me a while to discover this without it being in the stubs 😅

Suggested change
@property
def kernels(self) -> list[ScalarKernel | VectorKernel | ScalarAggregateKernel | HashAggregateKernel]:

I know this isn't accurate for Function itself, but it's the type returned by FunctionRegistry.get_function

If you wanted to be a bit fancier, maybe add some Generics into the mix?
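
One way the Generics idea could look, as a rough sketch only: the base class is parameterized by its kernel type, and each subclass pins that parameter. All classes below are stand-ins defined locally so the snippet runs without pyarrow; the real stubs may need a different shape.

```python
from typing import Generic, List, TypeVar


class ScalarKernel:
    """Local stand-in, not the pyarrow class."""


class VectorKernel:
    """Local stand-in, not the pyarrow class."""


KernelT = TypeVar("KernelT")


class Function(Generic[KernelT]):
    """Hypothetical generic base: each subclass pins its kernel type."""

    def __init__(self, name: str, kernels: List[KernelT]) -> None:
        self._name = name
        self._kernels = list(kernels)

    @property
    def name(self) -> str:
        return self._name

    @property
    def kernels(self) -> List[KernelT]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._kernels)


class ScalarFunction(Function[ScalarKernel]):
    """A type checker sees kernels as a list of ScalarKernel here."""


class VectorFunction(Function[VectorKernel]):
    """A type checker sees kernels as a list of VectorKernel here."""
```

With this shape, the union in the suggested change would only be needed where the concrete function type is unknown, for example on the return type of a registry lookup.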

@rok

"look at extracting compute kernel signatures from C++ (valid input types are explicitly stated at registration time)."

That would probably be more useful than the route I was going for here.

In Python there's only the repr to work with, but there is quite a lot of information encoded in it

>>> import pyarrow.compute as pc
>>> pc.get_function("array_take").kernels[:10]
[VectorKernel<(primitive, integer) -> computed>,
 VectorKernel<(binary-like, integer) -> computed>,
 VectorKernel<(large-binary-like, integer) -> computed>,
 VectorKernel<(fixed-size-binary-like, integer) -> computed>,
 VectorKernel<(null, integer) -> computed>,
 VectorKernel<(Type::DICTIONARY, integer) -> computed>,
 VectorKernel<(Type::EXTENSION, integer) -> computed>,
 VectorKernel<(Type::LIST, integer) -> computed>,
 VectorKernel<(Type::LARGE_LIST, integer) -> computed>,
 VectorKernel<(Type::LIST_VIEW, integer) -> computed>]
>>> pc.get_function("min_element_wise").kernels[:10]
[ScalarKernel<varargs[uint8*] -> uint8>,
 ScalarKernel<varargs[uint16*] -> uint16>,
 ScalarKernel<varargs[uint32*] -> uint32>,
 ScalarKernel<varargs[uint64*] -> uint64>,
 ScalarKernel<varargs[int8*] -> int8>,
 ScalarKernel<varargs[int16*] -> int16>,
 ScalarKernel<varargs[int32*] -> int32>,
 ScalarKernel<varargs[int64*] -> int64>,
 ScalarKernel<varargs[float*] -> float>,
 ScalarKernel<varargs[double*] -> double>]
>>> pc.get_function("approximate_median").kernels
[ScalarAggregateKernel<(any) -> double>]
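
As a rough illustration of how much can be recovered from those reprs, here is a small parser sketch. It assumes the repr format is exactly what the output above shows, which is not a documented or stable API; `KernelSignature` and `parse_kernel_repr` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Matches e.g. "VectorKernel<(primitive, integer) -> computed>".
_KERNEL_RE = re.compile(r"^(?P<kind>\w+)<(?P<params>.+) -> (?P<ret>.+)>$")


@dataclass(frozen=True)
class KernelSignature:
    kind: str                 # e.g. "VectorKernel"
    inputs: Tuple[str, ...]   # input type descriptions, kept verbatim
    output: str               # e.g. "computed" or "uint8"
    is_varargs: bool


def parse_kernel_repr(text: str) -> KernelSignature:
    """Split a kernel repr string into kind, inputs, and output."""
    match = _KERNEL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized kernel repr: {text!r}")
    params = match["params"]
    is_varargs = params.startswith("varargs[") and params.endswith("]")
    if is_varargs:
        # "varargs[uint8*]" -> ("uint8*",); the marker is kept as printed.
        inputs = (params[len("varargs["):-1],)
    else:
        # "(primitive, integer)" -> ("primitive", "integer")
        inputs = tuple(p.strip() for p in params.strip("()").split(","))
    return KernelSignature(match["kind"], inputs, match["ret"], is_varargs)
```

Feeding it `str(kernel)` for each entry of `pc.get_function(name).kernels` would give a structured view of the registered signatures, though extracting them from C++ at registration time would not depend on a string format.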

@rok
Member Author

rok commented Sep 30, 2025

Oh awesome! Thank you @dangotbanned, I love unsolicited suggestions like these! I am at PyData Paris right now so I probably can't reply properly until Monday, but given your experience I'm sure these will be very useful!

@rok
Member Author

rok commented Oct 2, 2025

Just a mental note: @pitrou suggested looking at extracting compute kernel signatures from C++ (valid input types are explicitly stated at registration time).

@rok rok force-pushed the pyarrow-stubs-2 branch from d46c7ef to 2c06c94 Compare October 10, 2025 17:52
mypy_path = "$MYPY_CONFIG_FILE_DIR/pyarrow-stubs"

[tool.pyright]
include = ["pyarrow"]

@dangotbanned dangotbanned Oct 13, 2025

Was just reminded of this issue in pyarrow-stubs (narwhals-dev/narwhals#3203 (comment))

Suggested change
-include = ["pyarrow"]
+pythonPlatform = "All"
+pythonVersion = "3.9"
+include = ["pyarrow"]

The problem with not addressing it at the source is that the errors propagate downstream to anyone who enables the setting

By my count, only 4 of these errors are not caused by pyarrow-stubs 😳

Show 113 errors

narwhals/_arrow/dataframe.py
  narwhals/_arrow/dataframe.py:447:20 - error: Variable not allowed in type expression (reportInvalidTypeForm)
  narwhals/_arrow/dataframe.py:448:38 - error: Variable not allowed in type expression (reportInvalidTypeForm)
  narwhals/_arrow/dataframe.py:464:20 - error: Variable not allowed in type expression (reportInvalidTypeForm)
  narwhals/_arrow/dataframe.py:465:38 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_arrow/namespace.py
  narwhals/_arrow/namespace.py:160:17 - error: Argument of type "Overload[() -> _Scalar_CoT@max_element_wise, () -> Expression]" cannot be assigned to parameter "function" of type "(_T@reduce, _S@reduce) -> _T@reduce" in function "reduce"
    No overloaded function matches type "(ChunkedArrayAny, ChunkedArrayAny) -> ChunkedArrayAny" (reportArgumentType)
  narwhals/_arrow/namespace.py:184:29 - error: Argument of type "ChunkedArrayAny | _Scalar_CoT@max_element_wise" cannot be assigned to parameter "native_series" of type "ChunkedArrayAny" in function "__init__"
    Type "ChunkedArrayAny | Scalar[Unknown]*" is not assignable to type "ChunkedArrayAny"
      "Scalar[Unknown]*" is not assignable to "ChunkedArray[Any]" (reportArgumentType)
narwhals/_arrow/series.py
  narwhals/_arrow/series.py:224:47 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:228:51 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:232:45 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:236:48 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:240:42 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:267:46 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:274:46 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:306:27 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:306:44 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:312:27 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:312:46 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:355:47 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:358:47 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:361:50 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:365:50 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:403:25 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:415:25 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:418:25 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:531:20 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:545:35 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:546:26 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:547:33 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:549:29 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:550:32 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:551:33 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:553:29 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:554:26 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:555:33 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:557:35 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:558:32 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:559:33 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:568:44 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:664:39 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:763:46 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:765:43 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:795:16 - error: Variable not allowed in type expression (reportInvalidTypeForm)
  narwhals/_arrow/series.py:863:38 - error: No overloads for "<anonymous function>" match the provided arguments
    Argument types: (ChunkedArrayAny, ChunkedArrayAny | ScalarAny | None) (reportCallIssue)
  narwhals/_arrow/series.py:867:33 - error: No overloads for "<anonymous function>" match the provided arguments
    Argument types: (ChunkedArrayAny, ChunkedArrayAny | ScalarAny) (reportCallIssue)
  narwhals/_arrow/series.py:894:31 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:896:36 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:902:31 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:904:36 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:910:32 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:912:37 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:1030:20 - error: Variable not allowed in type expression (reportInvalidTypeForm)
  narwhals/_arrow/series.py:1161:24 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:1178:34 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series.py:1178:68 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/series_dt.py
  narwhals/_arrow/series_dt.py:51:9 - error: Type "dict[tuple[Literal['ns'], Literal['us']] | tuple[Literal['ns'], Literal['ms']] | tuple[Literal['us'], Literal['ns']] | tuple[Literal['us'], Literal['ms']] | tuple[Literal['ms'], Literal['ns']] | tuple[Literal['ms'], Literal['us']] | tuple[Literal['s'], Literal['ns']] | tuple[Literal['s'], Literal['us']] | tuple[Literal['s'], Literal['ms']], tuple[(left: ArrayOrScalar, right: ArrayOrScalar, /) -> Any, Literal[1000]] | tuple[(left: ArrayOrScalar, right: ArrayOrScalar, /) -> Any, Literal[1000000]] | tuple[() -> Unknown, Literal[1000]] | tuple[() -> Unknown, Literal[1000000]] | tuple[() -> Unknown, Literal[1000000000]]]" is not assignable to declared type "Mapping[tuple[UnitCurrent, UnitTarget], tuple[BinOpBroadcast, IntoRhs]]"
    Type "() -> Unknown" is not assignable to type "BinOpBroadcast"
      Function accepts too many positional parameters; expected 0 but received 2
    Type "() -> Unknown" is not assignable to type "BinOpBroadcast"
      Function accepts too many positional parameters; expected 0 but received 2
    Type "() -> Unknown" is not assignable to type "BinOpBroadcast"
      Function accepts too many positional parameters; expected 0 but received 2
    Type "() -> Unknown" is not assignable to type "BinOpBroadcast"
      Function accepts too many positional parameters; expected 0 but received 2 (reportAssignmentType)
  narwhals/_arrow/series_dt.py:105:34 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:107:49 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:115:41 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:118:42 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:127:43 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:130:43 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:133:48 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:137:37 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:137:52 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:137:85 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:142:25 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:142:78 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:147:48 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:181:49 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:193:49 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:204:45 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:209:31 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:223:22 - error: No overloads for "assume_timezone" match the provided arguments (reportCallIssue)
  narwhals/_arrow/series_dt.py:223:41 - error: Argument of type "TimestampArray | DurationScalar[Any]" cannot be assigned to parameter "timestamps" of type "Expression" in function "assume_timezone"
    Type "TimestampArray | DurationScalar[Any]" is not assignable to type "Expression"
      "TimestampArray" is not assignable to "Expression" (reportArgumentType)
narwhals/_arrow/series_str.py
  narwhals/_arrow/series_str.py:24:22 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_str.py:43:26 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_str.py:56:44 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_str.py:77:47 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_str.py:80:47 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_str.py:83:47 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_str.py:95:35 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_str.py:106:47 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/series_str.py:116:26 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_arrow/utils.py
  narwhals/_arrow/utils.py:278:30 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/utils.py:284:42 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/utils.py:291:42 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/utils.py:291:54 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/utils.py:293:17 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/utils.py:293:33 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/utils.py:297:29 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/utils.py:305:27 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_arrow/utils.py:387:43 - error: Variable not allowed in type expression (reportInvalidTypeForm)
  narwhals/_arrow/utils.py:388:43 - error: Variable not allowed in type expression (reportInvalidTypeForm)
  narwhals/_arrow/utils.py:396:29 - error: Variable not allowed in type expression (reportInvalidTypeForm)
  narwhals/_arrow/utils.py:416:29 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_pandas_like/series_dt.py
  narwhals/_pandas_like/series_dt.py:81:29 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_pandas_like/series_dt.py:81:44 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_pandas_like/series_dt.py:81:78 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_pandas_like/series_dt.py:227:48 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/expr.py
  narwhals/_plan/arrow/expr.py:137:37 - error: Argument of type "() -> Unknown" cannot be assigned to parameter "fn_native" of type "(Any) -> Any" in function "_unary_function"
    Type "() -> Unknown" is not assignable to type "(Any) -> Any"
      Function accepts too many positional parameters; expected 0 but received 1 (reportArgumentType)
  narwhals/_plan/arrow/expr.py:264:43 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_plan/arrow/expr.py:269:57 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_plan/arrow/expr.py:309:40 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_plan/arrow/expr.py:321:40 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/functions.py
  narwhals/_plan/arrow/functions.py:91:16 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_plan/arrow/functions.py:91:30 - error: Expected 0 positional arguments (reportCallIssue)
  narwhals/_plan/arrow/functions.py:184:19 - error: Expected 0 positional arguments (reportCallIssue)
narwhals/_plan/arrow/namespace.py
  narwhals/_plan/arrow/namespace.py:123:42 - error: Argument of type "Overload[() -> _Scalar_CoT@max_element_wise, () -> Expression]" cannot be assigned to parameter "fn_native" of type "(Any, Any) -> Any" in function "_horizontal_function"        
    No overloaded function matches type "(Any, Any) -> Any" (reportArgumentType)
narwhals/_plan/arrow/typing.py
  narwhals/_plan/arrow/typing.py:52:39 - error: Variable not allowed in type expression (reportInvalidTypeForm)
narwhals/_polars/dataframe.py
  narwhals/_polars/dataframe.py:474:34 - error: No overloads for "__getitem__" match the provided arguments (reportCallIssue)
  narwhals/_polars/dataframe.py:474:34 - error: Argument of type "tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" cannot be assigned to parameter "key" of type "MultiIndexSelector | MultiColSelector | SingleIndexSelector | tuple[SingleIndexSelector, MultiColSelector] | tuple[MultiIndexSelector, MultiColSelector]" in function "__getitem__"
    Type "tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to type "MultiIndexSelector | MultiColSelector | SingleIndexSelector | tuple[SingleIndexSelector, MultiColSelector] | tuple[MultiIndexSelector, MultiColSelector]"
      "tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to "int"
      "tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to "slice[Any, Any, Any]"
      "tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to "range"
      "tuple[slice[None, None, None], int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]]" is not assignable to "Sequence[int]"
        Type parameter "_T_co@Sequence" is covariant, but "slice[None, None, None] | int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]" is not a subtype of "int"
          Type "slice[None, None, None] | int | Sequence[int] | Series[Any] | CompliantSeries[Any] | ndarray[tuple[int], dtype[integer[Any]]]" is not assignable to type "int"
            "CompliantSeries[Any]" is not assignable to "int" (reportArgumentType)
narwhals/_utils.py
  narwhals/_utils.py:1034:18 - error: No overloads for "__new__" match the provided arguments (reportCallIssue)
narwhals/functions.py
  narwhals/functions.py:579:40 - error: Cannot access attribute "name" for class "Distribution"
    Attribute "name" is unknown (reportAttributeAccessIssue)
113 errors, 0 warnings, 0 informations

The fixes are usually pretty simple and specified here

Also, I used "3.9" here, but whatever minimum Python version pyarrow is targeting would work as well (e.g. if you're bumping to 3.10 soon, use that)
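
For illustration, one common way to keep a `TypeAlias` import valid for every configured `pythonVersion` is a `sys.version_info` guard, which type checkers evaluate statically. This is a sketch under that assumption and not necessarily the fix referred to above; `Number` and `double` are hypothetical names, and the fallback branch assumes `typing_extensions` is installed on older interpreters.

```python
import sys
from typing import Union

if sys.version_info >= (3, 10):
    # TypeAlias was added to typing in Python 3.10.
    from typing import TypeAlias
else:
    # Assumption: typing_extensions is available on older interpreters.
    from typing_extensions import TypeAlias

# Hypothetical alias, declared explicitly so checkers treat it as a type.
Number: TypeAlias = Union[int, float]


def double(value: Number) -> Number:
    """Return twice the input; only here to exercise the alias."""
    return value * 2
```

With an unguarded `from typing import TypeAlias`, a checker configured for 3.9 cannot resolve the name, so every annotation built from such aliases is rejected downstream.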

Member Author

@rok rok Oct 13, 2025

Since we already dropped 3.9 (#47478), we may as well bump to 3.10 here

Ah nice one thanks @rok

Looking into the pyright errors from (#47609 (comment)) some more, it seems like 3 of the most common ones are explicitly disabled in your config:

Also reportAssignmentType appears to be disabled as a result of reportGeneralTypeIssues = "none".

As that setting disables the majority of pyright rules, is this a temporary measure to work around a high number of errors?

Member Author

Yes, this is for noise reduction at the moment. Please do advise if some of these could be kept disabled. :)

"Please do advise if some of these could be kept disabled. :)"

Sure thing, here's our current config

Details

[tool.pyright]
pythonPlatform = "All"
# NOTE (`pyarrow-stubs` do unsafe `TypeAlias` and `TypeVar` imports)
# pythonVersion = "3.9"
reportMissingTypeArgument = "error"
reportIncompatibleMethodOverride = "error"
reportMissingImports = "none"
reportMissingModuleSource = "none"
reportPrivateImportUsage = "none"
reportUnusedExpression = "none"    # handled by (https://docs.astral.sh/ruff/rules/unused-variable/)
typeCheckingMode = "basic"
include = ["narwhals", "tests"]
ignore = [
  "../.venv/",
  "../../../**/Lib",      # stdlib
  "../../../**/typeshed*" # typeshed-fallback
]

And some other rules I was thinking of disabling to allow us to switch typeCheckingMode = "basic" -> "strict"

I guess my advice would be:

  1. Try changing your config to only typeCheckingMode = "basic"
  2. Set a threshold for how many times a rule can be triggered before fixing every occurrence becomes unbearable
  3. Fix everything below that threshold
  4. Bump up the threshold, rinse/repeat

That kind of workflow means you usually fix similar issues at the same time, and don't need to spend as long trying to understand what the error even means 😂

You can also gradually enable/disable subpackages with the include, ignore, exclude filters if you need more control
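
The threshold workflow above can be sketched as a small script that tallies rule names from pyright's text output. It assumes the output format shown earlier in this thread, where the rule name appears in parentheses at the end of each error; `count_rules` and `rules_below` are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable, List

# pyright prints the rule name in parentheses at the end of each error,
# possibly on the last continuation line of a multi-line message.
_RULE_RE = re.compile(r"\((report[A-Za-z]+)\)\s*$")


def count_rules(lines: Iterable[str]) -> Counter:
    """Count how often each pyright rule appears in the given output lines."""
    counts: Counter = Counter()
    for line in lines:
        match = _RULE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def rules_below(counts: Counter, threshold: int) -> List[str]:
    """Rules triggered fewer than `threshold` times: fix these first."""
    return sorted(rule for rule, n in counts.items() if n < threshold)


sample = [
    "narwhals/_arrow/series.py:224:47 - error: Expected 0 positional arguments (reportCallIssue)",
    "narwhals/_arrow/series.py:228:51 - error: Expected 0 positional arguments (reportCallIssue)",
    "narwhals/_arrow/series.py:232:45 - error: Expected 0 positional arguments (reportCallIssue)",
    "narwhals/_arrow/series.py:795:16 - error: Variable not allowed in type expression (reportInvalidTypeForm)",
    'narwhals/functions.py:579:40 - error: Cannot access attribute "name" for class "Distribution"',
    '    Attribute "name" is unknown (reportAttributeAccessIssue)',
]
counts = count_rules(sample)
print(rules_below(counts, threshold=2))
```

Raising `threshold` on each pass reproduces the "bump up, rinse/repeat" step: the rarely triggered rules are cleared first, and the noisy ones are left for later.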

@github-actions github-actions bot added awaiting changes and removed awaiting change review labels Oct 13, 2025
@github-actions github-actions bot added awaiting change review, awaiting changes and removed awaiting changes, awaiting change review labels Oct 13, 2025
@github-actions github-actions bot added awaiting change review and removed awaiting changes labels Oct 13, 2025
Comment on lines +950 to +953
assert isinstance(chunked_struct_array.type, pa.StructType)
# Cast to the proper type for type checker
struct_chunked_array = cast(pa.ChunkedArray[pa.StructScalar], chunked_struct_array)
result = pa.Table.from_struct_array(struct_chunked_array)

I wonder if it is worth linking the cause of this issue?

I think it is due to incomplete overloads for all of:

Sorry for the nitpick btw

I'm trying to put myself in the shoes of a maintainer who is skeptical about typing.

They might see needing to adjust (working) tests to satisfy a type checker as a negative against typing itself - rather than the (unfortunate) outcome of a compromise 🙂
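
For context on how invasive such a change is: `typing.cast` only informs the type checker and returns its argument unchanged at runtime, so a test behaves the same with or without it. A minimal sketch with hypothetical stand-in classes (no pyarrow needed):

```python
from typing import List, cast


class Base:
    """Stand-in for a broadly typed value."""


class Struct(Base):
    """Stand-in for the narrower type the checker cannot infer."""


def load() -> List[Base]:
    # Declared as List[Base], although every element is a Struct.
    return [Struct(), Struct()]


values = load()
# cast() performs no conversion and no runtime check; it returns `values`.
narrowed = cast(List[Struct], values)
print(narrowed is values)
```

Because the cast exists only to compensate for incomplete overloads, a short comment next to it naming that cause would make clear that it can be removed once the stubs improve.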
