Skip to content

Alternative to CompliantDataFrame.__getitem__ #2423

@dangotbanned

Description

@dangotbanned

Important

This issue is not proposing changes to nw.DataFrame

Original comment

https://github.com/narwhals-dev/narwhals/pull/2393/files#r2052510250

if rows is None:
return self._with_compliant(compliant[:, columns])
if columns is None:
return self._with_compliant(compliant[rows, :])
return self._with_compliant(compliant[rows, columns])

@MarcoGorelli how would you feel about removing CompliantDataFrame.__getitem__ (and children) entirely?

At this stage - I think - we should have a fairly good idea of what method to call in EagerDataFrame.

But when we pass it down to EagerDataFrame.__getitem__ - all it knows is we have a tuple - then requiring more narrowing than I think we strictly have to 🤔

Description

I forgot about that particular comment.

But kept thinking about what an alternative might look like after discovering polars._utils.slice.
An easy way to avoid falling back into the patterns prior to (#2390) is to simply remove __getitem__ from the internal classes.

If we were to put slicing behind a namespace (e.g. CompliantSlice) then these methods could be entirely separate from the class(es) they're on now.

def _gather(self, rows: SizedMultiIndexSelector[NativeSeriesT]) -> Self: ...
def _gather_slice(self, rows: _SliceIndex | range) -> Self: ...
def _select_multi_index(
self, columns: SizedMultiIndexSelector[NativeSeriesT]
) -> Self: ...
def _select_multi_name(
self, columns: SizedMultiNameSelector[NativeSeriesT]
) -> Self: ...
def _select_slice_index(self, columns: _SliceIndex | range) -> Self: ...
def _select_slice_name(self, columns: _SliceName) -> Self: ...
def __getitem__(
self,
item: tuple[
SingleIndexSelector | MultiIndexSelector[EagerSeriesT],
MultiIndexSelector[EagerSeriesT] | MultiColSelector[EagerSeriesT],
],
) -> Self:

So calls might look like this:

compliant._slice.gather(...)

Where _slice is a property returning a type like

@property
def _dataframe(self) -> type[ArrowDataFrame]:
return ArrowDataFrame

Or the property initializes a class like:

@property
def cat(self) -> EagerExprCatNamespace[Self]:
return EagerExprCatNamespace(self)

Benefits

  • All internal code is forced to use more explicit selection methods
  • The method names can be much shorter, since they're specific to the *Slice class
  • There's an easy way to share code between DataFrame, Series (feat: Improve DataFrame.__getitem__ consistency #2393 (comment))
  • Opting-out of slicing can be done in a granular way or full-hog
    • Depending on where you place a not_implemented 😏

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions