Skip to content

typing: restore NativeSeriesT to EagerDataFrame and EagerNamespace, avoid unnecessary broadcast in eager when/then/otherwise#2693

Merged
MarcoGorelli merged 17 commits intonarwhals-dev:mainfrom
MarcoGorelli:try-fixup-nativeseriest
Jun 20, 2025
Merged

typing: restore NativeSeriesT to EagerDataFrame and EagerNamespace, avoid unnecessary broadcast in eager when/then/otherwise#2693
MarcoGorelli merged 17 commits intonarwhals-dev:mainfrom
MarcoGorelli:try-fixup-nativeseriest

Conversation

@MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Jun 17, 2025

This addresses part of #2666

Related

Description

EagerNamespace leaves me in covariance/contravariance hell, I'm trying to put together a smaller repro to isolate the issue i'm facing

OK, I've finally fully (I think) made sense of the issues with EagerNamespace and why I wasn't able to put NativeSeriesT there too

  • If you define a protocol P generic in T and T doesn't appear in any of P's method's arguments, then type checkers expect P to be covariant in T
  • If you define a method foo(self, a: T) -> None, then type checkers expect P to be contravariant in T. But, if you instead define it to be foo(self, a: T | Any) -> None, then that turns off the type checking for a and so type checkers expect P to be covariant in T.
  • If you then overload foo and one of the overloads matches just T, then you'll run into a bad place, as type checkers simultaneously expect P to be contravariant in T and also covariant

In this case, EagerNamespace.from_native accepts NativeSeriesT | Any, and it's the | Any that's the really problematic part. It might as well just be Any, at which point we don't need EagerNamespace to be generic in NativeSeriesT if it's not used anywhere

Having said that, there might also be a bug in MyPy? https://gist.github.com/mypy-play/021d9f12a5e02e1e837767f0e6fd0e4a

I think possible solutions on that side are:

  • keep EagerNamespace non-generic in NativeSeriesT (i.e. status-quo)
  • make EagerNamespace generic in NativeSeriesT_contra, and then type: ignore a couple of lines in _from_native_impl when ModinSeries can't be assigned to NativeSeries

@MarcoGorelli MarcoGorelli changed the title typing: restore NativeSeriesT_contra to EagerDataFrame typing: restore NativeSeriesT to EagerDataFrame and EagerNamespace Jun 17, 2025
@MarcoGorelli MarcoGorelli changed the title typing: restore NativeSeriesT to EagerDataFrame and EagerNamespace typing: restore NativeSeriesT to EagerDataFrame Jun 17, 2025
@MarcoGorelli MarcoGorelli force-pushed the try-fixup-nativeseriest branch from f630740 to 155f5c8 Compare June 17, 2025 16:48
@MarcoGorelli MarcoGorelli marked this pull request as ready for review June 17, 2025 17:28
@dangotbanned dangotbanned self-requested a review June 17, 2025 20:21
@dangotbanned
Copy link
Member

Thanks @MarcoGorelli - love to see all this detail in the description! 😍

Will do my best to review tomorrow, just wanted to say I really appreciate this

@dangotbanned
Copy link
Member

dangotbanned commented Jun 17, 2025

I haven't tried these out, but two ideas I had

Modin

Change these defs

class _ModinDataFrame(Protocol):
_pandas_class: type[pd.DataFrame]
class _ModinSeries(Protocol):
_pandas_class: type[pd.Series[Any]]

To this, which should make them compatible with the Native* protocols

    class _ModinDataFrame(NativeFrame, Protocol):
        _pandas_class: type[pd.DataFrame]

    class _ModinSeries(NativeSeries, Protocol):
        _pandas_class: type[pd.Series[Any]]

Variance

For the TypeVar(s) or Protocol(s) that are problematic, trying defining the TypeVar(s) using infer_variance

from typing import Protocol
from narwhals._typing_compat import TypeVar

class CompliantWhat(Protocol):
    def hello(self) -> str: ...

CompliantWhatT = TypeVar("CompliantWhatT", bound=CompliantWhat, infer_variance=True)

@MarcoGorelli
Copy link
Member Author

thanks - i tried the first one but it didn't resolve the issue

@dangotbanned dangotbanned marked this pull request as draft June 18, 2025 13:37
@dangotbanned
Copy link
Member

thanks - i tried the first one but it didn't resolve the issue

No worries!

I'm just moving this to draft while I'm experimenting (#2693 (comment))

I really should've followed up this TODO I left (e19d5e1#r2139770243)
A large chunk of the issue is that

Comment on lines +152 to +159
Protocol38[EagerDataFrameT, EagerSeriesT, EagerExprT, NativeSeriesT],
):
def _temp_invariant(self, _: NativeSeriesT, /) -> NativeSeriesT:
"""**DO NOT MERGE**.

Using as a placeholder until there's real usage of `NativeSeriesT`.
"""
return _
Copy link
Member

@dangotbanned dangotbanned Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm planning to do is de-deduplicate the broadcasting semantics that we have in these two, that was introduced in (#2662)

if otherwise is None:
when, then = align_series_full_broadcast(when, then)
res_native = then.native.where(when.native)
else:
when, then, otherwise = align_series_full_broadcast(when, then, otherwise)
res_native = then.native.where(when.native, otherwise.native)
return then._with_native(res_native)

if otherwise is None:
when, then = align_series_full_broadcast(when, then)
res_native = pc.if_else(
when.native, then.native, pa.nulls(len(when.native), then.native.type)
)
else:
when, then, otherwise = align_series_full_broadcast(when, then, otherwise)
res_native = pc.if_else(when.native, then.native, otherwise.native)
return then._with_native(res_native)

AFAICT, all we needed to do in (#2662) was replace the use of _extract_comparand - with some generic version of align_series_full_broadcast

def _extract_comparand(self, other: EagerSeriesT, /) -> Any:
"""Extract native Series, broadcasting to `len(self)` if necessary."""
...

Before we had something like this:

        otherwise = pa.nulls(len(when), then.type) if otherwise is None else otherwise
        return pc.if_else(when, then, otherwise)

So the step can just happen in the implemented EagerWhen.__call__ and depend on the uninplemented Eager*._align_series_full_broadcast

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving the coverage to fail, since I really don't want to forget about not merging that method 😅

https://github.com/narwhals-dev/narwhals/actions/runs/15735541100/job/44347151400?pr=2693

Hopefully will be able to dip back in today

This comment was marked as resolved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went down a bit of a rabbit hole, but hopefully we've now got:

  • More functional typing
  • Better docs on broadcasting
  • More shared code
  • Hopefully more performant broadcasting (by using (np|pa).repeat)

@dangotbanned dangotbanned marked this pull request as ready for review June 18, 2025 18:09
@MarcoGorelli
Copy link
Member Author

thanks @dangotbanned , awesome additions and rewrites!

this helped notice that there was a further chance to improve by avoiding an unnecessary broadcast if otherwise_value is a scalar

Copy link
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @MarcoGorelli

Comment on lines +177 to +180
result = self._if_then_else(when.native, then.native, otherwise.native)
elif self._otherwise_value is not None:
otherwise = when._from_scalar(self._otherwise_value)
otherwise._broadcast = True
when, then, otherwise = align(when, then, otherwise)
result = self._if_then_else(when.native, then.native, otherwise.native)
else:
when, then = align(when, then)
result = self._if_then_else(when.native, then.native, None)
result = self._if_then_else(when.native, then.native, self._otherwise_value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes please 😍

@dangotbanned dangotbanned linked an issue Jun 19, 2025 that may be closed by this pull request
@dangotbanned
Copy link
Member

The PR probably needs a rename, but other than that 🚢 🚢 🚢

)
names = [s.name for s in new_series]
reshaped = align_series_full_broadcast(*new_series)
align = new_series[0]._align_full_broadcast
Copy link
Member

@dangotbanned dangotbanned Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this kind of aliasing on almost every use.
Haven't come up with a nice short name, that still conveys the message well

In this PR we make this change, which I like because the series part is embedded in the type you call it on

utils.align_series_full_broadcast
EagerSeries._align_full_broadcast

I dunno maybe something like one of these could work:

EagerSeries.align_all
EagerSeries().align_with  # <-- instance method
EagerSeries.broadcast

I wonder if the rust impl of polars has any similar utils with a name we can steal? 😄

@MarcoGorelli MarcoGorelli changed the title typing: restore NativeSeriesT to EagerDataFrame typing: restore NativeSeriesT to EagerDataFrame and EagerNamespace, avoid unnecessary broadcast in eager when/then/otherwise Jun 20, 2025
@MarcoGorelli
Copy link
Member Author

thanks Dan!

@MarcoGorelli MarcoGorelli merged commit ac74d8a into narwhals-dev:main Jun 20, 2025
33 checks passed
dangotbanned added a commit that referenced this pull request Jun 21, 2025
- Somewhat of a resurrection of #2227
- But this time building on #2693
MarcoGorelli pushed a commit that referenced this pull request Jun 26, 2025
* chore(typing): Kinda type `pandas_like.utils.`select_columns_by_name`

- Somewhat of a resurrection of #2227
- But this time building on #2693

* chore(typing): "Fix" `pandas_like.utils.(set_index|rename)`

* chore(typing): `NativeSeriesT` in `calculate_timestamp_date`

Confused why this (seemingly unrelated) change triggered the `[has-type]` in `concat_str`

* chore: `NativeSeriesT` in `calculate_timestamp_datetime`

* fix(typing): Flip operands, use `ndarray.all`

Resolves #2714 (comment)
Did this before in https://github.com/narwhals-dev/narwhals/blob/0f37267b3d05b8f5d7a37ef8f0b43647f4afec48/narwhals/_pandas_like/utils.py#L767
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

typ: restore (if possible) NativeSeriesT to EagerNamespace

2 participants