-
Notifications
You must be signed in to change notification settings - Fork 176
Description
Important
This issue is not proposing changes to nw.DataFrame
Original comment
https://github.com/narwhals-dev/narwhals/pull/2393/files#r2052510250
narwhals/narwhals/dataframe.py
Lines 934 to 938 in f47ad1f
| if rows is None: | |
| return self._with_compliant(compliant[:, columns]) | |
| if columns is None: | |
| return self._with_compliant(compliant[rows, :]) | |
| return self._with_compliant(compliant[rows, columns]) |
@MarcoGorelli how would you feel about removing
CompliantDataFrame.__getitem__(and children) entirely?At this stage - I think - we should have a fairly good idea of what method to call in
EagerDataFrame.But when we pass it down to
EagerDataFrame.__getitem__- all it knows is we have atuple- then requiring more narrowing than I think we strictly have to 🤔
Description
I forgot about that particular comment.
But kept thinking about what an alternative might look like after discovering polars._utils.slice.
An easy way to avoid falling back into the patterns prior to (#2390) is to simply remove __getitem__ from the internal classes.
If we were to put slicing behind a namespace (e.g. CompliantSlice) then these methods could be entirely separate from the class(es) they're on now.
narwhals/narwhals/_compliant/dataframe.py
Lines 401 to 417 in f47ad1f
| def _gather(self, rows: SizedMultiIndexSelector[NativeSeriesT]) -> Self: ... | |
| def _gather_slice(self, rows: _SliceIndex | range) -> Self: ... | |
| def _select_multi_index( | |
| self, columns: SizedMultiIndexSelector[NativeSeriesT] | |
| ) -> Self: ... | |
| def _select_multi_name( | |
| self, columns: SizedMultiNameSelector[NativeSeriesT] | |
| ) -> Self: ... | |
| def _select_slice_index(self, columns: _SliceIndex | range) -> Self: ... | |
| def _select_slice_name(self, columns: _SliceName) -> Self: ... | |
| def __getitem__( | |
| self, | |
| item: tuple[ | |
| SingleIndexSelector | MultiIndexSelector[EagerSeriesT], | |
| MultiIndexSelector[EagerSeriesT] | MultiColSelector[EagerSeriesT], | |
| ], | |
| ) -> Self: |
So calls might look like this:
compliant._slice.gather(...)Where _slice is a property returning a type like
narwhals/narwhals/_arrow/namespace.py
Lines 41 to 43 in f47ad1f
| @property | |
| def _dataframe(self) -> type[ArrowDataFrame]: | |
| return ArrowDataFrame |
Or the property initializes a class like:
narwhals/narwhals/_compliant/expr.py
Lines 849 to 851 in f47ad1f
| @property | |
| def cat(self) -> EagerExprCatNamespace[Self]: | |
| return EagerExprCatNamespace(self) |
Benefits
- All internal code is forced to use more explicit selection methods
- The method names can be much shorter, since they're specific to the
*Sliceclass - There's an easy way to share code between
DataFrame,Series(feat: ImproveDataFrame.__getitem__consistency #2393 (comment)) - Opting-out of slicing can be done in a granular way or full-hog
- Depending on where you place a
not_implemented😏
- Depending on where you place a