refactor: Generic CompliantSelector#2064
Conversation
Less ambiguous, thinking `iter_columns` will be a better name to reserve for https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.iter_columns.html
Column names and selecting series is the core part of `selectors`
Will investigate later
Experimenting with `pandas` only to start
All tests are passing locally, hoping to do `_arrow` next and see if this holds up
Well that went smoothly
Possible via #2060
narwhals/_selectors.py
Outdated
| def __init__(self: Self, context: _FullContext, /) -> None: | ||
| self._implementation = context._implementation | ||
| self._backend_version = context._backend_version | ||
| self._version = context._version |
There was a problem hiding this comment.
There was a problem hiding this comment.
This part was resolved in (5a36c81)
Seems 3.8 has a different issue
https://github.com/narwhals-dev/narwhals/actions/runs/13462903656/job/37622190616?pr=2064
There was a problem hiding this comment.
Important
I'm ignoring 3.8 issues for the time being.
Absolute worst-case - I'll just use regular classes.
There was a problem hiding this comment.
This could be the way to go:
Will be attempting something like (python/typing#572 (comment))
There was a problem hiding this comment.
Well it certainly is a solution (#2064 (comment))
😞
Now is always empty #2059
|
@MarcoGorelli I'm looking at the lazy backends atm Are narwhals/narwhals/_dask/selectors.py Lines 41 to 51 in f0a5fbb I'm still getting to know these backends, so maybe this is cheap to do? |
Lazy-support needs to be able to override them
- Working locally - `pyright` is very unhappy with `dx.Series` being used
- Haven't got a local install - Expecting to work the same as `_dask` in CI
| EvalSeries: TypeAlias = Callable[[FrameT], Sequence[SeriesT]] | ||
| EvalNames: TypeAlias = Callable[[FrameT], Sequence[str]] |
There was a problem hiding this comment.
I'd probably move these to nw.typing and add docs to them.
There's lots of other places they could be used to simplify these two types:
narwhals/narwhals/_pandas_like/expr.py
Line 53 in 9387ecd
narwhals/narwhals/_pandas_like/expr.py
Line 57 in 9387ecd
IMO this isn't as noisy:
narwhals/narwhals/_pandas_like/selectors.py
Lines 26 to 27 in ec5dd0c
|
love this, will do a final check later |
| def __repr__(self: Self) -> str: # pragma: no cover | ||
| s = f"depth={self._depth}, " if is_tracks_depth(self._implementation) else "" | ||
| return f"{type(self).__name__}({s}function_name={self._function_name})" |
There was a problem hiding this comment.
I was originally planning to use utils.is_tracks_depth more than just in __repr__
The idea I had was to move most of the implementation of these methods up into the protocols.
You'd only need to declare the types (e.g. ArrowSelector & ArrowExpr):
But I stopped short of that - because it should probably live in *Expr.
You'd have an even greater reduction in repetition, as it would account for all the arg mapping in:
narwhals/narwhals/_arrow/expr.py
Lines 34 to 151 in 140833c
There was a problem hiding this comment.
This PR revealed some patterns that I think could be better encoded into the Protocol definitions:
Single or multi-Implementation
- Single defines on the class
- Multi defines on the instance, and passes it around to keep track
- Currently the
*Like*classes - Although I think that should be revisted
- Currently the
Tracks depth
- As mentioned, we need that for the
daskspecial-case - All other lazy backends don't track depth
(Lazy|Eager)*
- I've made a start with that for
*Selector - Planning the same for
*Exprin
Reuse*
- I mentioned in this PR that I was thinking about renaming to
EagerOnly* - I think it would be worth having the distinction between those cases and
Polars*
There was a problem hiding this comment.
@MarcoGorelli feel free to move this into another issue - just trying to connect the dots between a lot of the stuff I've been working on 🙂
There was a problem hiding this comment.
thanks - i'm not really sure about having _depth long-term, i think we'll need a better way to do this in the future, so i'd suggest not spending too long on that specifically
There was a problem hiding this comment.
thanks @dangotbanned - just a comment about moving a comment from just being the PR to putting in the code (a practice I'd encourage in general), then feel free to merge
- Required some compatibility aliasing for `pandas`, `pyarrow` - They're faux-lazy #2064 (comment)
Thanks for the review @MarcoGorelli |
- Long-term, should probably be defined in `nw.typing` - Or just generally used in parts of each backend impl #2064 (comment)

Tracking
3.8#2084pyspark#2051CompliantDataFrame#2115)CompliantSelector#2064 (comment)CompliantSelector#2064 (comment)CompliantSelector#2064 (comment)CompliantSelector#2064 (comment)CompliantSelector#2064 (comment)DataFrame.iter_columns#2101What type of PR is this? (check all applicable)
Related issues
CompliantExpr._version#2060kwargsfromselectorfunction #2058*.selectors#2057TypeVarused in(SparkLike|DuckDB)Expr#2044Checklist
If you have comments or can explain your changes, please do so below
Expecting quite a nice reduction in copy/pasted code
_arrow_dask_duckdb_pandas_like_spark_likeInvestigatefix3.8protocol bug