feat: Added struct namespace with field method. #2146
MarcoGorelli merged 44 commits into narwhals-dev:main
def field(self: Self, name: str) -> PandasLikeSeries:
    return self._compliant_series._from_native_series(
        self._compliant_series._native_series.apply(lambda x: x[name]).rename(name),
Instead of using apply can we use https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.Series.struct.field.html and only support this for pyarrow-backed dtypes?
I feel the majority of pandas users don't leverage pyarrow-backed dtypes. I added a switch that uses the struct namespace if it is available and falls back on apply if it is not.
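The switch described in this reply can be pictured as a capability check. The sketch below uses toy stand-ins: `ToySeries`, `ToyArrowSeries`, `ToyStructAccessor`, and `struct_field` are hypothetical names for illustration, not pandas or narwhals classes. It only shows the shape of the dispatch, assuming the native object exposes a `struct` accessor when its dtype supports one.

```python
from __future__ import annotations

from typing import Any, Callable


class ToySeries:
    """Hypothetical stand-in for an object-dtype series (not a pandas class)."""

    def __init__(self, values: list[Any], name: str) -> None:
        self.values = values
        self.name = name

    def apply(self, func: Callable[[Any], Any]) -> ToySeries:
        # Element-wise Python call: works for any dtype, but row by row.
        return ToySeries([func(v) for v in self.values], self.name)

    def rename(self, name: str) -> ToySeries:
        return ToySeries(self.values, name)


class ToyStructAccessor:
    """Hypothetical stand-in for a native `struct` accessor."""

    def __init__(self, series: ToySeries) -> None:
        self._series = series

    def field(self, name: str) -> ToySeries:
        return ToySeries([row[name] for row in self._series.values], name)


class ToyArrowSeries(ToySeries):
    """Stand-in for a series whose dtype exposes a `struct` accessor."""

    @property
    def struct(self) -> ToyStructAccessor:
        return ToyStructAccessor(self)


def struct_field(native: ToySeries, name: str) -> tuple[ToySeries, str]:
    """Return the extracted field and the code path that produced it."""
    if hasattr(native, "struct"):
        # Preferred path: delegate to the native accessor.
        return native.struct.field(name), "accessor"
    # Fallback path: extract the field row by row.
    return native.apply(lambda row: row[name]).rename(name), "apply"


rows = [{"name": "Jack", "age": 30}, {"name": "Ann", "age": 41}]
plain, plain_path = struct_field(ToySeries(rows, "user"), "name")
arrow, arrow_path = struct_field(ToyArrowSeries(rows, "user"), "name")
print(plain_path, plain.values)
print(arrow_path, arrow.values)
```

Both paths return the same values under the same output name; the difference is only in how the field is extracted, which is why the fallback can stay invisible to the caller.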
EdAbati left a comment
This looks great! Thank you for doing this 🙌🏼
I've added a couple of comments 'on the go' (I'll check the rest later)
>>> df.with_columns(name=nw.col("user").struct.field("name"))
"""
The output is missing, and the doctest check would fail without it.
Does the output have to be generated manually? Is there no command I can use to generate it?
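For context on why the missing output matters: doctest compares what an example prints against the text that follows it, so an example with no expected text is treated as "prints nothing". The sketch below uses only the standard-library `doctest` module. The helper `count_failures` and the two sample docstrings are illustrative and not part of the PR. One common way to obtain the expected text is to run the example in an interpreter and paste the result.

```python
import doctest


def count_failures(docstring: str) -> int:
    """Run the examples in `docstring` and return how many failed."""
    test = doctest.DocTestParser().get_doctest(
        docstring, globs={}, name="sketch", filename=None, lineno=0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # `out` swallows the failure report so only the count is returned.
    results = runner.run(test, out=lambda text: None)
    return results.failed


WITHOUT_OUTPUT = """
>>> user = {"name": "Jack", "age": 30}
>>> user["name"]
"""

WITH_OUTPUT = """
>>> user = {"name": "Jack", "age": 30}
>>> user["name"]
'Jack'
"""

failures_without_output = count_failures(WITHOUT_OUTPUT)
failures_with_output = count_failures(WITH_OUTPUT)
print(failures_without_output, failures_with_output)
```

The first docstring fails because the lookup prints `'Jack'` while no output was declared; the second passes once the expected text is present.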
Co-authored-by: Edoardo Abati <29585319+EdAbati@users.noreply.github.com>
…/struct_namespace
# Conflicts: # narwhals/_duckdb/expr.py # narwhals/_spark_like/expr.py
Are we good to merge? Anything else to add?
MarcoGorelli left a comment
thanks @osoucy !
i'll check the evaluate_output_names / alias_output_names more closely, I'm not 100% sure about those
narwhals/_arrow/expr_struct.py
Outdated
"struct",
"field",
name=name,
evaluate_output_names=lambda _col: [name],
For all implementations, instead of overwriting evaluate_output_names, can we just use .alias?
A good test would be nw.col('a').struct.field('b').name.keep(). I think for polars the resulting column name would be 'a'; we should check that we do the same.
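To illustrate why the two approaches can diverge under `.name.keep()`, here is a toy model. It is a sketch under stated assumptions, not narwhals' internals: `ToyExpr`, `field_via_alias`, and `field_via_root_override` are hypothetical, and `keep` is assumed to discard any renaming and fall back to the root names.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class ToyExpr:
    """Toy expression: root names plus an optional renaming step."""

    root_names: tuple[str, ...]
    alias_fn: Optional[Callable[[str], str]] = None

    def output_names(self) -> list[str]:
        if self.alias_fn is None:
            return list(self.root_names)
        return [self.alias_fn(root) for root in self.root_names]

    def alias(self, name: str) -> ToyExpr:
        return replace(self, alias_fn=lambda _root: name)

    def keep(self) -> ToyExpr:
        # Modelled on `.name.keep()`: drop any renaming, keep the root names.
        return replace(self, alias_fn=None)


def field_via_alias(expr: ToyExpr, name: str) -> ToyExpr:
    # Renames on top of the root names, so the root name survives.
    return expr.alias(name)


def field_via_root_override(expr: ToyExpr, name: str) -> ToyExpr:
    # Overwrites the root names themselves, so the original name is lost.
    return replace(expr, root_names=(name,))


a = ToyExpr(root_names=("a",))
aliased = field_via_alias(a, "b").output_names()
aliased_kept = field_via_alias(a, "b").keep().output_names()
overridden_kept = field_via_root_override(a, "b").keep().output_names()
print(aliased, aliased_kept, overridden_kept)
```

In this model the alias-based version yields `'b'` normally and `'a'` after `keep`, while the override-based version yields `'b'` in both cases, which is the discrepancy the suggested test would catch.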
It's odd. That's what I tried initially, but it caused some errors when evaluating (mismatching expected and actual something). I must have forgotten something. I removed the changes made to the _from_call and reuse_series_namespace_implementation functions.
narwhals/_arrow/series_struct.py
Outdated
@@ -17,5 +17,5 @@ def __init__(self: Self, series: ArrowSeries) -> None:
    def field(self: Self, name: str) -> ArrowSeries:
        self._compliant_series._name = name
you can't mutate self._compliant_series, you'll need alias here too
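A small sketch of the concern, using a toy dataclass rather than the real `ArrowSeries` (`ToySeries`, `field_mutating`, and `field_with_alias` are hypothetical): assigning to the name in place also renames the caller's object, whereas an `alias`-style copy leaves it alone.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any


@dataclass
class ToySeries:
    """Hypothetical stand-in for a compliant series."""

    values: list[Any]
    name: str

    def alias(self, name: str) -> ToySeries:
        # Returns a renamed copy; the original object is left untouched.
        return replace(self, name=name)


def field_mutating(series: ToySeries, name: str) -> ToySeries:
    series.name = name  # side effect: the caller's series is renamed too
    return ToySeries([row[name] for row in series.values], name)


def field_with_alias(series: ToySeries, name: str) -> ToySeries:
    renamed = series.alias(name)  # works on a copy
    return ToySeries([row[name] for row in renamed.values], renamed.name)


user = ToySeries([{"name": "Jack"}, {"name": "Ann"}], "user")

field_with_alias(user, "name")
name_after_alias = user.name

field_mutating(user, "name")
name_after_mutation = user.name

print(name_after_alias, name_after_mutation)
```

After the alias-based call the input is still named `user`; after the mutating call it has silently become `name`, which would surprise any code still holding a reference to it.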
For the Arrow* stuff I'd suggest using .compliant and .native
They weren't available when you started the PR @osoucy
this is fine as a follow-up
At least the current implementation
    def field(self: Self, name: str) -> ArrowSeries:
        return self._compliant_series._from_native_series(
            pc.struct_field(self._compliant_series.alias(name)._native_series, name),
        )
avoids mutating the compliant series. Maybe I can look into @dangotbanned's suggestion in a future PR?
So we are good to merge? :)
This option is not available because the fork was created inside my organization instead of my personal account. I added you as a contributor to the organization. You should be able to push to the branch now.
narwhals/_pandas_like/series_list.py
Outdated
class PandasLikeSeriesListNamespace:
    def __init__(self: Self, series: PandasLikeSeries) -> None:
        if not hasattr(series._native_series, "list"):
            msg = "Series must be of PyArrow List type to support struct namespace."
narwhals/_expression_parsing.py
Outdated
evaluate_output_names: Output names function.
alias_output_names: Alias output names function.
MarcoGorelli left a comment
thanks @osoucy ! great feature 🙌
Awesome! Thanks for the guidance and review!



What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
Introduction of the struct namespace for applicable expressions and series, with an initial
field function that returns a field of a struct.