fix(typing): Resolve all mypy & pyright errors for _arrow #2007

Merged
dangotbanned merged 63 commits into main from typing-major-fixing-1
Feb 17, 2025

Conversation

@dangotbanned
Member

@dangotbanned dangotbanned commented Feb 13, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@dangotbanned
Member Author

@MarcoGorelli Almost out of the rabbit hole on this!

I've found some more places where (#1657 (comment)) would be pretty helpful:

    def _from_native_series(
        self: ArrowSeries[Any],
        series: pa.Array[_ScalarT_co]
        | pa.ChunkedArray[Any]
        | pa.ChunkedArray[_ScalarT_co]
        | pa.Array[Any],
    ) -> ArrowSeries[_ScalarT_co]:
        return ArrowSeries(
            chunked_array(series),
            name=self._name,
            backend_version=self._backend_version,
            version=self._version,
        )

    @classmethod
    def _from_iterable(
        cls: type[Self],
        data: Iterable[_ScalarT_co],
        name: str,
        *,
        backend_version: tuple[int, ...],
        version: Version,
    ) -> ArrowSeries[_ScalarT_co]:
        return cls(
            chunked_array([data]),
            name=name,
            backend_version=backend_version,
            version=version,
        )

    def __narwhals_namespace__(self: Self) -> ArrowNamespace:
        from narwhals._arrow.namespace import ArrowNamespace

        return ArrowNamespace(
            backend_version=self._backend_version, version=self._version
        )

    def cast(self: Self, dtype: DType) -> ArrowSeries[Any]:
        ser = self._native_series
        data_type = narwhals_to_native_dtype(dtype, self._version)
        return self._from_native_series(pc.cast(ser, data_type))

Essentially, anywhere that a narwhals object would "change" its TypeVar, the current self.__class__(...) route breaks the typing.

So for ArrowSeries[T1], you can't go to ArrowSeries[T2] without a @classmethod that removes T1 from scope.
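
A minimal sketch of the scoping point, not the narwhals code: `Series`, `_rewrap`, `_from_values`, and `to_strings` are hypothetical names, and the comments describe how a checker such as mypy or pyright is expected to read the signatures. `T` belongs to the class, so any route through `self.__class__` stays tied to it, while `U` belongs only to the classmethod and is solved from the argument on each call.

```python
from __future__ import annotations

from typing import Any, Generic, TypeVar

T = TypeVar("T")  # class-scoped: fixed once an instance exists
U = TypeVar("U")  # method-scoped: solved anew on every call


class Series(Generic[T]):
    """Hypothetical stand-in for a generic wrapper such as ArrowSeries."""

    def __init__(self, values: list[T], *, name: str) -> None:
        self._values = values
        self._name = name

    def _rewrap(self, values: list[Any]) -> Series[T]:
        # Going through self.__class__ keeps T in scope, so the result is
        # still described as Series[T], whatever `values` contains.
        return self.__class__(values, name=self._name)

    @classmethod
    def _from_values(cls, values: list[U], *, name: str) -> Series[U]:
        # Only U appears in this signature, so the return type follows
        # the argument rather than the instance's T.
        return Series(values, name=name)

    def to_strings(self) -> Series[str]:
        strings = [str(v) for v in self._values]
        return self._from_values(strings, name=self._name)


ints = Series([1, 2, 3], name="a")
strs = ints.to_strings()
print(strs._values)  # ['1', '2', '3']
```

At runtime both routes build the same object; the difference is only in what the annotations allow a checker to conclude about the element type.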


My brain has fully melted working on this, hope the above made sense 🫠
If not, see the scoping rules for type variables in the typing spec (https://typing.readthedocs.io/en/latest/spec/generics.html#scoping-rules-for-type-variables)

@MarcoGorelli
Member

thanks for working on this

is it necessary to make ArrowSeries generic in the PyArrow type? would it work to just keep that out for now?

@dangotbanned
Member Author

dangotbanned commented Feb 13, 2025

> thanks for working on this
>
> is it necessary to make ArrowSeries generic in the PyArrow type? would it work to just keep that out for now?

@MarcoGorelli 100% needed to resolve the issues I'm afraid 😔

Without the TypeVar, most of the @overloads end up matching Expression.
This was the main source of errors. Since we don't appear to use Expression anywhere, it just introduces a huge amount of noise.


Having read through a lot of the code (but not using pyarrow much personally) I am curious as to why we've not used Expression much/at all?

It seems to be available in our min version (https://arrow.apache.org/docs/11.0/python/generated/pyarrow.dataset.Expression.html)


For some context on the kinds of errors, see (#1961 (comment))

@dangotbanned
Member Author

dangotbanned commented Feb 13, 2025

Comment on lines +37 to +48
Incomplete: TypeAlias = Any # pragma: no cover
"""
Marker for working code that fails on the stubs.

Common issues:
- Annotated for `Array`, but not `ChunkedArray`
- Relies on typing information that the stubs don't provide statically
- Missing attributes
- Incorrect return types
- Inconsistent use of generic/concrete types
- `_clone_signature` used on signatures that are not identical
"""
Member Author

I've been sprinkling these in with a comment when all else fails, e.g:

    def diff(self: ArrowSeries[_NumericOrTemporalT]) -> ArrowSeries[_NumericOrTemporalT]:
        # NOTE: stub only permits `ChunkedArray[TemporalScalar]`
        # (https://github.com/zen-xu/pyarrow-stubs/blob/d97063876720e6a5edda7eb15f4efe07c31b8296/pyarrow-stubs/compute.pyi#L145-L148)
        diff: Incomplete = pc.pairwise_diff
        return self._from_native_series(diff(self._native_series.combine_chunks()))

If the stub issues get resolved in the future, this will be a lot easier to fix than just using Any directly

`pyright` doesn't need this, `mypy` infers this as `str` - which is too wide

> narwhals/_arrow/namespace.py:372: error: No overload variant of "binary_join_element_wise" matches argument types "Generator[ChunkedArray[StringScalar], None, None]", "str"  [call-overload]
> narwhals/_arrow/namespace.py:372: note: Possible overload variants:
backend_version: tuple[int, ...],
) -> Any:
length: int, other: ArrowSeries, backend_version: tuple[int, ...]
) -> pa.BooleanArray:
Member

🤔 this doesn't have to be boolean, right? plenty of binary operations don't return booleans?

@dangotbanned
Member Author

@MarcoGorelli I'm gonna leave you to it for now to avoid conflicts, but could you give me a shout when everything is ready please?

I'm getting warnings from pyright for some of these changes

@MarcoGorelli
Member

sure, I think those pyright issues should be addressed now (and it would be really good to get pyright running in CI too)

downstream test failures should all be unrelated to this PR

Member

@MarcoGorelli MarcoGorelli left a comment

happy to get this in anyway, thanks @dangotbanned , really appreciate it!

@dangotbanned
Member Author

dangotbanned commented Feb 17, 2025

> happy to get this in anyway, thanks @dangotbanned, really appreciate it!

Thanks @MarcoGorelli, both mypy and pyright seem to be good with _arrow now 🎉

Edit

double cat was unintentional but I like it

@dangotbanned
Member Author

> sure, I think those pyright issues should be addressed now (and it would be really good to get pyright running in CI too)
>
> downstream test failures should all be unrelated to this PR

@MarcoGorelli IIRC marimo has pyright in their CI - which could be worth a look

Comment on lines 196 to 202
def total_milliseconds(self: Self) -> ArrowSeries:
ser: ArrowSeries = self._compliant_series
arr = ser._native_series
unit = ser._type.unit
unit = ser._type.unit # type: ignore[attr-defined]
unit_to_milli_factor = {
"s": 1e3, # seconds
"ms": 1, # milli
Member Author

@dangotbanned dangotbanned Feb 17, 2025

A bit disappointed that this change reintroduced the need for the `# type: ignore`s that were fixed

Note

Just an observation

Member Author

@dangotbanned dangotbanned Feb 17, 2025

#2007 (comment)

For future reference, this commit has a solution using the "public" API 44560d1

Comment on lines +214 to +216
def is_pyarrow_chunked_array(
ser: Any | ArrowChunkedArray,
) -> TypeIs[ArrowChunkedArray]:
Member Author

Could you revert this to keeping the TypeVar?

I don't see a need to erase it, if it can be known here - especially since this is public
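
A hedged sketch of what keeping the TypeVar in a guard could look like; this is not the original narwhals signature. `Chunked` and `is_chunked` are hypothetical names, and `TypeIs` is imported only for the type checker, assuming `typing_extensions` is available in the checking environment.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any, Generic, TypeVar

if TYPE_CHECKING:
    # Needed only by the type checker; never imported at runtime.
    from typing_extensions import TypeIs

T = TypeVar("T")


class Chunked(Generic[T]):
    """Hypothetical stand-in for a generic container such as pa.ChunkedArray."""

    def __init__(self, items: list[T]) -> None:
        self.items = items


def is_chunked(obj: Any | Chunked[T]) -> TypeIs[Chunked[T]]:
    # At runtime this is a plain isinstance check. The T in the signature is
    # there so that a checker can keep the element type after narrowing,
    # when the argument's type already carries it.
    return isinstance(obj, Chunked)


print(is_chunked(Chunked([1, 2])), is_chunked([1, 2]))  # True False
```

The guard behaves identically at runtime with or without the TypeVar; the parameter only affects what callers can learn statically.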

Member

ok, sure

@MarcoGorelli
Member

Going to go ahead and merge this as I think it's a clear improvement, and we can address some smaller items like #2007 (comment) and #2007 (comment) as follow-ups

Thanks again Dan!

@dangotbanned dangotbanned merged commit 2b58ee7 into main Feb 17, 2025
24 of 28 checks passed
@dangotbanned dangotbanned deleted the typing-major-fixing-1 branch February 17, 2025 12:51
dangotbanned added a commit that referenced this pull request Feb 17, 2025
Pretty sure `pyarrow-stubs` is going to need an upstream fix for the imports

#2007 (comment)
@dangotbanned dangotbanned mentioned this pull request Feb 17, 2025
10 tasks
dangotbanned added a commit to dangotbanned/pyarrow-stubs that referenced this pull request Feb 19, 2025
Discovered during narwhals-dev/narwhals#2007

I worked around it with the following, but the fix seemed simple enough to upstream:
```py
def _type(self: pa.ChunkedArray[pa.Scalar[_DataType_CoT]]) -> _DataType_CoT:
    if TYPE_CHECKING:
        return self[0].type
    return self.type
```
dangotbanned added a commit that referenced this pull request Feb 22, 2025
I noted this in #2007 (comment)
Seems `preview` detects this, as I was expecting when I disabled the `pyright` diagnostic
@dangotbanned dangotbanned mentioned this pull request Apr 16, 2025
10 tasks

Development

Successfully merging this pull request may close these issues.

CI: get mypy passing with pyarrow-stubs installed

2 participants