feat: Adds DataFrame.iter_columns #2104
- Will add examples after tests
- Adapted from https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.iter_columns.html

Will close #2101
    def test_iter_columns(constructor_eager: ConstructorEager) -> None:
        df = nw.from_native(constructor_eager(data), eager_only=True)
        expected = df.to_dict(as_series=True)
        result = {series.name: series for series in df.iter_columns()}
        assert result == expected
I put the test here, following the lead of iter_rows:
narwhals/tests/frame/rows_test.py
Lines 32 to 44 in b362e46
Not sure if we want anything more complex?
Maybe some consideration for a roundtrip through pyarrow and make sure the name is preserved?
Resolves #2104 (comment)
        return self._native_frame.to_dict(orient="records")

    def iter_columns(self) -> Iterator[PandasLikeSeries]:
        for _name, series in self._native_frame.items():  # noqa: PERF102
See (#2064 (comment)) regarding false-positive (PERF102)
        """
        return self._compliant_frame.rows(named=named)  # type: ignore[no-any-return]

    def iter_columns(self: Self) -> Iterator[Series[Any]]:
iter_columns and all other methods on DataFrame that include Series[Any] in their return type (e.g. get_column) could benefit from a change I have in #2064
Lines 72 to 86 in 1abc05a
On the public side, it would mean adding:

    # narwhals.typing.py
    SeriesT_co = TypeVar("SeriesT_co", bound="Series[Any]", covariant=True)

And making changes like this:

    # narwhals.dataframe.py
    from narwhals.typing import SeriesT_co
    class DataFrame(BaseFrame[DataFrameT], Generic[SeriesT_co]):
        def get_column(self: Self, name: str) -> SeriesT_co: ...
        def iter_columns(self: Self) -> Iterator[SeriesT_co]: ...
        ...

So then you'd have things like:

    DataFrame[pd.DataFrame, pd.Series]
    DataFrame[pl.DataFrame, pl.Series]
    DataFrame[pa.Table, pa.ChunkedArray]

The link between DataFrame and Series is something I've explored some more in #2055 also
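To make the proposal above concrete, here is a minimal, self-contained sketch of a frame class that is generic over its series type. It is not the narwhals implementation: `Series`, `StableSeries`, and this `DataFrame` are hypothetical stand-ins, and the `BaseFrame[DataFrameT]` parameter is left out for brevity. It only illustrates how a covariant `TypeVar` lets `get_column` and `iter_columns` return the specific series type rather than `Series[Any]`.

```python
from __future__ import annotations

from typing import Any, Generic, Iterator, TypeVar


class Series:
    """Hypothetical stand-in for a narwhals Series."""

    def __init__(self, name: str, values: list[Any]) -> None:
        self.name = name
        self.values = values


class StableSeries(Series):
    """Hypothetical subclass, standing in for a stable-API Series."""


# Covariant because the type variable only appears in return positions.
SeriesT_co = TypeVar("SeriesT_co", bound=Series, covariant=True)


class DataFrame(Generic[SeriesT_co]):
    """Toy frame; the real class would also be generic over the native frame."""

    def __init__(self, columns: dict[str, SeriesT_co]) -> None:
        self._columns = columns

    def get_column(self, name: str) -> SeriesT_co:
        return self._columns[name]

    def iter_columns(self) -> Iterator[SeriesT_co]:
        # Dicts preserve insertion order, so columns come back in frame order.
        yield from self._columns.values()


df: DataFrame[StableSeries] = DataFrame(
    {"a": StableSeries("a", [1, 2]), "b": StableSeries("b", [3, 4])}
)
names = [s.name for s in df.iter_columns()]
```

With this shape, a type checker can infer `StableSeries` for `df.get_column("a")` and for each item yielded by `df.iter_columns()`, with no casts at the call site.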
If the v1 backport merges - it might make this process easier.
We wouldn't have the complexity of stable/unstable Series not having a TypeVar free
I would ❤️ love ❤️ to have this
MarcoGorelli
left a comment
thanks!
I wonder - should we just implement this at the narwhals/dataframe.py level and just return
yield from (self[col] for col in self.columns)?
This is only defined for eager frames anyway so we don't need to worry about invoking self.columns (and Polars does that anyway for this method)
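As a rough illustration of the frame-level approach suggested here, the sketch below uses hypothetical `ToyFrame` and `ToySeries` classes (not narwhals types) to show `iter_columns` built purely on `columns` and `__getitem__`, so no backend-specific method is needed.

```python
from __future__ import annotations

from typing import Any, Iterator


class ToySeries:
    """Hypothetical named column."""

    def __init__(self, name: str, values: list[Any]) -> None:
        self.name = name
        self.values = values


class ToyFrame:
    """Hypothetical eager frame backed by a dict of lists."""

    def __init__(self, data: dict[str, list[Any]]) -> None:
        self._data = data

    @property
    def columns(self) -> list[str]:
        return list(self._data)

    def __getitem__(self, name: str) -> ToySeries:
        return ToySeries(name, self._data[name])

    def iter_columns(self) -> Iterator[ToySeries]:
        # Frame-level implementation: relies only on `columns` and `__getitem__`.
        yield from (self[col] for col in self.columns)


frame = ToyFrame({"a": [1, 2], "b": [3.0, 4.0]})
result = {s.name: s.values for s in frame.iter_columns()}
```

The trade-off is that every backend goes through the same generic path, whereas a compliant-level method lets each backend use its native column iteration.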
ah wait sorry, just remembered that for this feature you worked up to it the other way round (starting at the compliant level, then deciding to expose it). Sure, happy to add it like this then 👍
- Originally in #2064
- Will eventually support #2104 (comment)
* feat(typing): Make `CompliantDataFrame` generic
  - Originally in #2064
  - Will eventually support #2104 (comment)
* chore(typing): Update eager backends
* fix: Make `PolarsSeries` compliant
  - Now for `PolarsDataFrame` to be compliant, `PolarsSeries.alias` needs to be present
  - Since `PolarsDataFrame` can be returned in all of these places, they all required the update to `_polars`
* fix(typing): Resolve new `mypy` errors

Originally c11dc95
Resolves: (#2149 (comment))

Doing this properly surfaced lots of issues
- `Polars*` classes are *similar* to `Compliant*`
- Not close enough to support typing
- Lots of stuff is behind `__getattr__`
- We don't need `PolarsExpr` to do a lot of the work `CompliantExpr` does
- `nw.Series._compliant_series` is still a big PR away
- Repeat of (#2119)
- Last big hurdle to get (#2104 (comment))

What type of PR is this? (check all applicable)
Related issues
DataFrame.iter_columns #2101
Checklist
If you have comments or can explain your changes, please do so below
TODO
- CompliantDataFrame typing update (DataFrame.iter_columns #2104 (comment))