typing: restore NativeSeriesT to EagerDataFrame and EagerNamespace, avoid unnecessary broadcast in eager when/then/otherwise #2693
Branch updated from f630740 to 155f5c8.
Thanks @MarcoGorelli, love to see all this detail in the description! 😍 I'll do my best to review tomorrow; just wanted to say I really appreciate this.
I haven't tried these out, but here are two ideas I had.

**Modin**

Change these defs (narwhals/narwhals/_namespace.py, lines 113 to 117 in f7d9451) to this, which should make them compatible with the …

    class _ModinDataFrame(NativeFrame, Protocol):
        _pandas_class: type[pd.DataFrame]


    class _ModinSeries(NativeSeries, Protocol):
        _pandas_class: type[pd.Series[Any]]

**Variance**

For the …

    from typing import Protocol

    from narwhals._typing_compat import TypeVar


    class CompliantWhat(Protocol):
        def hello(self) -> str: ...


    CompliantWhatT = TypeVar("CompliantWhatT", bound=CompliantWhat, infer_variance=True)
Thanks, I tried the first one but it didn't resolve the issue.

No worries! I'm just moving this to draft while I experiment (#2693 (comment)). I really should've followed up on this TODO I left (e19d5e1#r2139770243).
narwhals/_compliant/when_then.py (outdated)

        Protocol38[EagerDataFrameT, EagerSeriesT, EagerExprT, NativeSeriesT],
    ):
        def _temp_invariant(self, _: NativeSeriesT, /) -> NativeSeriesT:
            """**DO NOT MERGE**.

            Using as a placeholder until there's real usage of `NativeSeriesT`.
            """
            return _
What I'm planning to do is deduplicate the broadcasting semantics that we have in these two places, which were introduced in #2662:

- narwhals/narwhals/_pandas_like/namespace.py, lines 321 to 327 in f7d9451
- narwhals/narwhals/_arrow/namespace.py, lines 274 to 282 in f7d9451

AFAICT, all we needed to do in #2662 was replace the use of `_extract_comparand` with some generic version of `align_series_full_broadcast`:

- narwhals/narwhals/_compliant/dataframe.py, lines 432 to 434 in f7d9451
Before, we had something like this:

    otherwise = pa.nulls(len(when), then.type) if otherwise is None else otherwise
    return pc.if_else(when, then, otherwise)

So the step can just happen in the implemented `EagerWhen.__call__` and depend on the unimplemented `Eager*._align_series_full_broadcast`.
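The shared step described above can be sketched in plain Python, assuming a toy list-backed series rather than real pandas or PyArrow objects. `ToySeries`, its `_broadcast` flag, `align_full_broadcast` and `if_then_else` are hypothetical stand-ins, not the narwhals API: list repetition plays the part of `np.repeat`/`pa.repeat`, and the `None`-filled fallback mirrors the `pa.nulls(...)` default in the snippet.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ToySeries:
    """Hypothetical list-backed series; `_broadcast` marks a length-1 scalar-like value."""

    values: list[Any]
    _broadcast: bool = False

    def __len__(self) -> int:
        return len(self.values)


def align_full_broadcast(*series: ToySeries) -> list[ToySeries]:
    """Repeat each broadcastable series up to the length shared by the others."""
    lengths = {len(s) for s in series if not s._broadcast}
    if len(lengths) > 1:
        raise ValueError(f"cannot align series of lengths {sorted(lengths)}")
    target = lengths.pop() if lengths else 1
    # List repetition stands in for np.repeat / pa.repeat on a real backend.
    return [ToySeries(s.values * target) if s._broadcast else s for s in series]


def if_then_else(
    when: ToySeries, then: ToySeries, otherwise: ToySeries | None = None
) -> list[Any]:
    """Shared when/then/otherwise step: align first, then select row by row."""
    if otherwise is None:
        when, then = align_full_broadcast(when, then)
        fallback = [None] * len(when)  # plays the role of pa.nulls(len(when), ...)
    else:
        when, then, otherwise = align_full_broadcast(when, then, otherwise)
        fallback = otherwise.values
    return [t if w else o for w, t, o in zip(when.values, then.values, fallback)]


mask = ToySeries([True, False, True])
print(if_then_else(mask, ToySeries([1, 2, 3]), ToySeries([0], _broadcast=True)))
# [1, 0, 3]
print(if_then_else(mask, ToySeries([10], _broadcast=True)))
# [10, None, 10]
```

Only alignment would differ per backend in this layout; the selection logic lives in one place.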
I'm leaving the coverage check failing, since I really don't want to forget not to merge that method 😅
Hopefully I'll be able to dip back in today.
Went down a bit of a rabbit hole, but hopefully we've now got:
- More functional typing
- Better docs on broadcasting
- More shared code
- Hopefully more performant broadcasting (by using `(np|pa).repeat`)

Maybe this breaks on an older version?
Thanks @dangotbanned, awesome additions and rewrites! This helped me notice a further chance to improve things by avoiding an unnecessary broadcast if …
narwhals/_compliant/when_then.py (outdated)

            result = self._if_then_else(when.native, then.native, otherwise.native)
        elif self._otherwise_value is not None:
            otherwise = when._from_scalar(self._otherwise_value)
            otherwise._broadcast = True
            when, then, otherwise = align(when, then, otherwise)
            result = self._if_then_else(when.native, then.native, otherwise.native)
        else:
            when, then = align(when, then)
            result = self._if_then_else(when.native, then.native, None)

            result = self._if_then_else(when.native, then.native, self._otherwise_value)
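Passing `self._otherwise_value` straight through presumably relies on the backend select accepting a scalar for the "else" branch, as `numpy.where` and `pyarrow.compute.if_else` do, so nothing has to be expanded to full length first. A minimal pure-Python sketch of that contract; `if_else_native` is a hypothetical stand-in for `_if_then_else`, and `when`/`then` are assumed to be aligned already.

```python
from __future__ import annotations

from typing import Any, Sequence


def if_else_native(
    when: Sequence[bool], then: Sequence[Any], otherwise: Any = None
) -> list[Any]:
    """Hypothetical stand-in for `_if_then_else` on already-aligned inputs.

    `otherwise` may be a sequence of the same length as `when`, or a scalar
    (including None) that is reused for every row without building a column.
    """
    if isinstance(otherwise, (list, tuple)):
        if len(otherwise) != len(when):
            raise ValueError("`otherwise` must have the same length as `when`")
        return [t if w else o for w, t, o in zip(when, then, otherwise)]
    # Scalar branch: no broadcast copy of `otherwise` is ever materialized.
    return [t if w else otherwise for w, t in zip(when, then)]


print(if_else_native([True, False, True], [1, 2, 3], 0))  # [1, 0, 3]
print(if_else_native([True, False, True], [1, 2, 3]))  # [1, None, 3]
print(if_else_native([True, False, True], [1, 2, 3], [7, 8, 9]))  # [1, 8, 3]
```

With this contract, the `None` and scalar cases collapse into a single call, which is what removes the separate `elif`/`else` alignment branches.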
The PR probably needs a rename, but other than that 🚢 🚢 🚢
| ) | ||
| names = [s.name for s in new_series] | ||
| reshaped = align_series_full_broadcast(*new_series) | ||
| align = new_series[0]._align_full_broadcast |
I did this kind of aliasing on almost every use.
I haven't come up with a nice short name that still conveys the message well.
In this PR we make this change, which I like because the "series" part is embedded in the type you call it on:

    utils.align_series_full_broadcast
    EagerSeries._align_full_broadcast

I dunno, maybe something like one of these could work:

    EagerSeries.align_all
    EagerSeries().align_with  # <-- instance method
    EagerSeries.broadcast

I wonder if the Rust implementation of Polars has any similar utils with a name we can steal? 😄
Thanks Dan!
* chore(typing): Kinda type `pandas_like.utils.select_columns_by_name`
  - Somewhat of a resurrection of #2227
  - But this time building on #2693
* chore(typing): "Fix" `pandas_like.utils.(set_index|rename)`
* chore(typing): `NativeSeriesT` in `calculate_timestamp_date`
  Confused why this (seemingly unrelated) change triggered the `[has-type]` in `concat_str`
* chore: `NativeSeriesT` in `calculate_timestamp_datetime`
* fix(typing): Flip operands, use `ndarray.all`
  Resolves #2714 (comment)
  Did this before in https://github.com/narwhals-dev/narwhals/blob/0f37267b3d05b8f5d7a37ef8f0b43647f4afec48/narwhals/_pandas_like/utils.py#L767
This addresses part of #2666
Related
- `NativeSeriesT` to `EagerNamespace` #2666

Description
… `EagerNamespace` leaves me in covariance/contravariance hell; I'm trying to put together a smaller repro to isolate the issue I'm facing.

OK, I've finally (I think) fully made sense of the issues with `EagerNamespace` and why I wasn't able to put `NativeSeriesT` there too:

- If `P` is generic in `T` and `T` doesn't appear in any of `P`'s methods' arguments, then type checkers expect `P` to be covariant in `T`.
- If `P` has a method `foo(self, a: T) -> None`, then type checkers expect `P` to be contravariant in `T`. But if you instead define it as `foo(self, a: T | Any) -> None`, then that turns off the type checking for `a`, and so type checkers expect `P` to be covariant in `T`.
- If `foo` is overloaded and one of the overloads matches just `T`, then you'll run into a bad place, as type checkers simultaneously expect `P` to be contravariant in `T` and also covariant.

In this case, `EagerNamespace.from_native` accepts `NativeSeriesT | Any`, and it's the `| Any` that's the really problematic part. It might as well just be `Any`, at which point we don't need `EagerNamespace` to be generic in `NativeSeriesT` if it's not used anywhere.

Having said that, there might also be a bug in mypy? https://gist.github.com/mypy-play/021d9f12a5e02e1e837767f0e6fd0e4a
I think possible solutions on that side are:
- `EagerNamespace` non-generic in `NativeSeriesT` (i.e. status quo)
- `EagerNamespace` generic in `NativeSeriesT_contra`, and then `type: ignore` a couple of lines in `_from_native_impl` when `ModinSeries` can't be assigned to `NativeSeries`
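The basic variance rules discussed here can be reproduced with stdlib `typing` alone. This is a sketch with hypothetical protocols `Source` and `Sink`: variance is declared explicitly with `covariant=True`/`contravariant=True` (what `infer_variance=True` would have a checker work out from usage) and is enforced only by a static type checker, so running the code just exercises the declarations. A `Sink`-style contravariant parameter is roughly the shape the `NativeSeriesT_contra` option would take.

```python
from __future__ import annotations

from typing import Protocol, TypeVar

T_co = TypeVar("T_co", covariant=True)
T_contra = TypeVar("T_contra", contravariant=True)


class Source(Protocol[T_co]):
    # T_co appears only in return position, so declaring it covariant is safe.
    def get(self) -> T_co: ...


class Sink(Protocol[T_contra]):
    # T_contra appears as an argument, so the protocol must be contravariant.
    def put(self, value: T_contra, /) -> None: ...


def widen(source: Source[bool]) -> Source[int]:
    # Type-checks because Source is covariant: Source[bool] <: Source[int].
    # With an invariant type variable a checker would reject this return.
    return source


def narrow(sink: Sink[int]) -> Sink[bool]:
    # Type-checks because Sink is contravariant: Sink[int] <: Sink[bool].
    return sink


class BoolSource:
    # Hypothetical concrete class; structurally a Source[bool].
    def get(self) -> bool:
        return True


class ListSink:
    # Hypothetical concrete class; structurally a Sink[int].
    def __init__(self) -> None:
        self.seen: list[int] = []

    def put(self, value: int, /) -> None:
        self.seen.append(value)


# Per the thread, widening a parameter to `T | Any` hides it from variance
# inference, and an overload taking just `T` brings the conflict back; that
# is a static-analysis effect which this runtime sketch cannot show.
print(widen(BoolSource()).get())  # True
sink = ListSink()
narrow(sink).put(False)
print(sink.seen)  # [False]
```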