From 8186f0ff6d6a23cc00c880aec072db4ac1aee7a8 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Sat, 5 Apr 2025 15:25:52 +0100 Subject: [PATCH 01/14] feat: unify exception for nested ``over`` statements --- docs/how_it_works.md | 111 +++++++++++++++++-- narwhals/_expression_parsing.py | 63 +++++++---- narwhals/dataframe.py | 2 +- narwhals/expr.py | 181 ++++++++++++++----------------- narwhals/functions.py | 10 +- tests/expression_parsing_test.py | 45 ++++++-- tests/group_by_test.py | 2 +- 7 files changed, 271 insertions(+), 143 deletions(-) diff --git a/docs/how_it_works.md b/docs/how_it_works.md index 64e071fa88..3e2819f631 100644 --- a/docs/how_it_works.md +++ b/docs/how_it_works.md @@ -272,7 +272,107 @@ print((pn.col("a") + 1).mean()) For simple aggregations, Narwhals can just look at `_depth` and `function_name` and figure out which (efficient) elementary operation this corresponds to in pandas. -## Broadcasting +## Expression Metadata + +Let's try printing out a few expressions to the console to see what they show us: + +```python exec="1" result="python" session="metadata" source="above" +import narwhals as nw + +print(nw.col("a")) +print(nw.col("a").mean()) +print(nw.col("a").mean().over("b")) +``` + +Note how they tell us something but their metadata. This section is all about +making sense of what that all means, what the rules are, and what it enables. + +### Expression kinds + +Each Narwhals expression can be of one of the following kinds: + +- `LITERAL`: expressions which correspond to literal values, such as the `3` in `nw.col('a')+3`. +- `AGGREGATION`: expressions which reduce a column to a single value (e.g. `nw.col('a').mean()`). +- `TRANSFORM`: expressions which don't change length (e.g. `nw.col('a').abs()`). +- `WINDOW`: like `TRANSFORM`, but the last operation is a (row-order-dependent) + window function (`rolling_*`, `cum_*`, `diff`, `shift`, `is_*_distinct`). + For example: + + - `nw.col('a')` is not order-dependent, so it's `TRANSFORM`. + - `nw.col('a').abs()` is not order-dependent, so it's a `TRANSFORM`. + - `nw.col('a').cum_sum()`'s last operation is `cum_sum`, so it's `WINDOW`. + - `nw.col('a').cum_sum()+1`'s last operation is `__add__`, and it preserves + the input dataframe's length, so it's a `TRANSFORM`. +- `FILTRATION`: expressions which change length but don't + aggregate (e.g. `nw.col('a').drop_nulls()`). + +How these change depends on the operation. + +#### Chaining + +Say we have `expr.expr_method()`. How does `expr`'s `ExprMetadata` change? +This depends on `expr_method`. + +- Element-wise expressions such `abs`, `alias`, `cast`, `__invert__`, and + many more, preserve the input kind (unless `expr` is a `WINDOW`, in + which case it becomes a `TRANSFORM`. This is because for an expression + to be `WINDOW`, the last expression needs to be the order-dependent one). +- `rolling_*`, `cum_*`, `diff`, `shift`, `ewm_mean`, and `is_*_distinct` + are window functions and result in `WINDOW`. +- `mean`, `std`, `median`, and other aggregations result in `AGGREGATION`, + and can only be applied to `TRANSFORM` and `WINDOW`. +- `drop_nulls` and `filter` result in `FILTRATION`, and can only be applied + to `TRANSFORM` and `WINDOW`. +- `over` always results in `TRANSFORM`. This is a bit more complicated, + so we elaborate on it in the "You open a window" section below. + +#### Binary operations (e.g. `nw.col('a') + nw.col('b')`) + +How do expression kinds change under binary operations? For example, +if we do `expr1 + expr2`, then what can we say about the output kind? +The rules are: + +- If both are `LITERAL`, then the output is `LITERAL`. +- If one is a `FILTRATION`, then: + + - if the other is `LITERAL` or `AGGREGATION`, then the output is `FILTRATION`. + - else, we raise an error. +- If one is `TRANSFORM` or `WINDOW` and the other is not `FILTRATION`, + then the output is `TRANSFORM`. +- If one is `AGGREGATION` and the other is `LITERAL` or `AGGREGATION`, + the output is `AGGREGATION`. + +For n-ary operations such as `nw.sum_horizontal`, the above logic is +extended across inputs. For example, `nw.sum_horizontal(expr1, expr2, expr3)` +is `LITERAL` if all of `expr1`, `expr2`, and `expr3` are. + +### "You open a window to another window to another window to another window" + +When we print out an expression, in addition to the expression kind, +we also see `n_opened_windows` and `n_closed_windows`. + +An open window is a non-elementwise operation which depends on row order. +For example, if we have `nw.col('a').cum_sum()`, then in which order +should the cumulative sum be computed? It's not specified. By introducing +an operation like this one, we say that we've opened a window. An open +window can only be closed if it's followed by `over` with `order_by` +specified. + +A closed window is a non-elementwise operation which doesn't depend on row +order. For example: + +- `nw.col('a').mean().over('b')`. +- `nw.col('a').cum_sum().over(order_by='i')`. + +When working with `DataFrame`s, row order is well-defined, as the dataframes +are assumed to be eager and in-memory. Therefore, it's allowed to work +with open windows. + +When working with `LazyFrame`s, on the other hand, row order is undefined. +Therefore, it's only allowed to work with expressions where `n_opened_windows` +is exactly equal to `n_closed_windows`. + +### Broadcasting When performing comparisons between columns and aggregations or scalars, we operate as if the aggregation or scalar was broadcasted to the length of the whole column. For example, if we @@ -282,14 +382,7 @@ with values `[-1, 0, 1]`. Different libraries do broadcasting differently. SQL-like libraries require an empty window function for expressions (e.g. `a - sum(a) over ()`), Polars does its own broadcasting of -length-1 Series, and pandas does its own broadcasting of scalars. Narwhals keeps track of -when to trigger a broadcast by tracking the `ExprKind` of each expression. `ExprKind` is an -`Enum` with four variants: - -- `TRANSFORM`: expressions which don't change length (e.g. `nw.col('a').abs()`). -- `AGGREGATION`: expressions which reduce a column to a single value (e.g. `nw.col('a').mean()`). -- `CHANGE_LENGTH`: expressions which change length but don't necessarily aggregate (e.g. `nw.col('a').drop_nulls()`). -- `LITERAL`: expressions which correspond to literal values, such as the `3` in `nw.col('a')+3`. +length-1 Series, and pandas does its own broadcasting of scalars. Narwhals triggers a broadcast in these situations: diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index f82ef55839..192811108f 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -121,7 +121,7 @@ class ExprKind(Enum): - LITERAL vs LITERAL -> LITERAL - FILTRATION vs (LITERAL | AGGREGATION) -> FILTRATION - FILTRATION vs (FILTRATION | TRANSFORM | WINDOW) -> raise - - (TRANSFORM | WINDOW) vs (LITERAL | AGGREGATION) -> TRANSFORM + - (TRANSFORM | WINDOW) vs (anything) -> TRANSFORM - AGGREGATION vs (LITERAL | AGGREGATION) -> AGGREGATION """ @@ -192,13 +192,20 @@ def is_multi_output( class ExprMetadata: - __slots__ = ("_expansion_kind", "_kind", "_n_open_windows") + __slots__ = ("_expansion_kind", "_kind", "_n_closed_windows", "_n_opened_windows") def __init__( - self, kind: ExprKind, /, *, n_open_windows: int, expansion_kind: ExpansionKind + self, + kind: ExprKind, + /, + *, + n_opened_windows: int, + n_closed_windows: int, + expansion_kind: ExpansionKind, ) -> None: self._kind: ExprKind = kind - self._n_open_windows = n_open_windows + self._n_opened_windows = n_opened_windows + self._n_closed_windows = n_closed_windows self._expansion_kind = expansion_kind def __init_subclass__(cls, /, *args: Any, **kwds: Any) -> Never: # pragma: no cover @@ -206,15 +213,19 @@ def __init_subclass__(cls, /, *args: Any, **kwds: Any) -> Never: # pragma: no c raise TypeError(msg) def __repr__(self) -> str: - return f"ExprMetadata(kind: {self._kind}, n_open_windows: {self._n_open_windows}, expansion_kind: {self._expansion_kind})" + return f"ExprMetadata(kind: {self._kind}, n_opened_windows: {self._n_opened_windows}, n_closed_windows: {self._n_closed_windows}, expansion_kind: {self._expansion_kind})" @property def kind(self) -> ExprKind: return self._kind @property - def n_open_windows(self) -> int: - return self._n_open_windows + def n_opened_windows(self) -> int: + return self._n_opened_windows + + @property + def n_closed_windows(self) -> int: + return self._n_closed_windows @property def expansion_kind(self) -> ExpansionKind: @@ -223,22 +234,27 @@ def expansion_kind(self) -> ExpansionKind: def with_kind(self, kind: ExprKind, /) -> ExprMetadata: """Change metadata kind, leaving all other attributes the same.""" return ExprMetadata( - kind, n_open_windows=self._n_open_windows, expansion_kind=self._expansion_kind + kind, + n_opened_windows=self._n_opened_windows, + n_closed_windows=self._n_closed_windows, + expansion_kind=self._expansion_kind, ) def with_extra_open_window(self) -> ExprMetadata: - """Increment `n_open_windows` leaving other attributes the same.""" + """Increment `n_opened_windows` leaving other attributes the same.""" return ExprMetadata( self.kind, - n_open_windows=self._n_open_windows + 1, + n_opened_windows=self._n_opened_windows + 1, expansion_kind=self._expansion_kind, + n_closed_windows=self._n_closed_windows, ) def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: - """Change metadata kind and increment `n_open_windows`.""" + """Change metadata kind and increment `n_opened_windows`.""" return ExprMetadata( kind, - n_open_windows=self._n_open_windows + 1, + n_opened_windows=self._n_opened_windows + 1, + n_closed_windows=self._n_closed_windows, expansion_kind=self._expansion_kind, ) @@ -246,14 +262,20 @@ def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: def simple_selector() -> ExprMetadata: # e.g. `nw.col('a')`, `nw.nth(0)` return ExprMetadata( - ExprKind.TRANSFORM, n_open_windows=0, expansion_kind=ExpansionKind.SINGLE + ExprKind.TRANSFORM, + n_opened_windows=0, + n_closed_windows=0, + expansion_kind=ExpansionKind.SINGLE, ) @staticmethod def multi_output_selector_named() -> ExprMetadata: # e.g. `nw.col('a', 'b')` return ExprMetadata( - ExprKind.TRANSFORM, n_open_windows=0, expansion_kind=ExpansionKind.MULTI_NAMED + ExprKind.TRANSFORM, + n_opened_windows=0, + n_closed_windows=0, + expansion_kind=ExpansionKind.MULTI_NAMED, ) @staticmethod @@ -261,7 +283,8 @@ def multi_output_selector_unnamed() -> ExprMetadata: # e.g. `nw.all()` return ExprMetadata( ExprKind.TRANSFORM, - n_open_windows=0, + n_opened_windows=0, + n_closed_windows=0, expansion_kind=ExpansionKind.MULTI_UNNAMED, ) @@ -285,7 +308,8 @@ def combine_metadata( has_transforms_or_windows = False has_aggregations = False has_literals = False - result_n_open_windows = 0 + result_n_opened_windows = 0 + result_n_closed_windows = 0 result_expansion_kind = ExpansionKind.SINGLE for i, arg in enumerate(args): @@ -307,8 +331,8 @@ def combine_metadata( result_expansion_kind = resolve_expansion_kind( result_expansion_kind, arg._metadata.expansion_kind ) - if arg._metadata.n_open_windows: - result_n_open_windows += 1 + result_n_opened_windows += arg._metadata.n_opened_windows + result_n_closed_windows += arg._metadata.n_closed_windows kind = arg._metadata.kind if kind is ExprKind.AGGREGATION: has_aggregations = True @@ -344,7 +368,8 @@ def combine_metadata( return ExprMetadata( result_kind, - n_open_windows=result_n_open_windows, + n_opened_windows=result_n_opened_windows, + n_closed_windows=result_n_closed_windows, expansion_kind=result_expansion_kind, ) diff --git a/narwhals/dataframe.py b/narwhals/dataframe.py index f7574a95f6..be1bf9354b 100644 --- a/narwhals/dataframe.py +++ b/narwhals/dataframe.py @@ -2150,7 +2150,7 @@ def _extract_compliant(self: Self, arg: Any) -> Any: plx = self.__narwhals_namespace__() return plx.col(arg) if isinstance(arg, Expr): - if arg._metadata.n_open_windows > 0: + if arg._metadata.n_opened_windows > arg._metadata._n_closed_windows: msg = ( "Order-dependent expressions are not supported for use in LazyFrame.\n\n" "Hints:\n" diff --git a/narwhals/expr.py b/narwhals/expr.py index 1e15fe4924..74384d8289 100644 --- a/narwhals/expr.py +++ b/narwhals/expr.py @@ -62,18 +62,49 @@ def func(plx: CompliantNamespace[Any, Any]) -> CompliantExpr[Any, Any]: self._metadata = metadata def _with_callable(self, to_compliant_expr: Callable[[Any], Any]) -> Self: - # Instantiate new Expr keeping metadata unchanged. + # Instantiate new Expr keeping metadata unchanged, unless + # it's a WINDOW, in which case make it a TRANSFORM. + if self._metadata.kind.is_window(): + return self.__class__( + to_compliant_expr, self._metadata.with_kind(ExprKind.TRANSFORM) + ) return self.__class__(to_compliant_expr, self._metadata) + def _with_aggregation(self, to_compliant_expr: Callable[[Any], Any]) -> Self: + if self._metadata.kind.is_scalar_like(): + msg = "Aggregations can't be applied to scalar-like expressions." + raise InvalidOperationError(msg) + return self.__class__( + to_compliant_expr, self._metadata.with_kind(ExprKind.AGGREGATION) + ) + + def _with_order_dependent_aggregation( + self, to_compliant_expr: Callable[[Any], Any] + ) -> Self: + if self._metadata.kind.is_scalar_like(): + msg = "Aggregations can't be applied to scalar-like expressions." + raise InvalidOperationError(msg) + return self.__class__( + to_compliant_expr, + self._metadata.with_kind_and_extra_open_window(ExprKind.AGGREGATION), + ) + + def _with_filtration(self, to_compliant_expr: Callable[[Any], Any]) -> Self: + if self._metadata.kind.is_scalar_like(): + msg = "Length-changing can't be applied to scalar-like expressions." + raise InvalidOperationError(msg) + return self.__class__( + to_compliant_expr, self._metadata.with_kind(ExprKind.FILTRATION) + ) + def __repr__(self: Self) -> str: return f"Narwhals Expr\nmetadata: {self._metadata}\n" def _taxicab_norm(self: Self) -> Self: # This is just used to test out the stable api feature in a realistic-ish way. # It's not intended to be used. - return self.__class__( - lambda plx: self._to_compliant_expr(plx).abs().sum(), - self._metadata.with_kind(ExprKind.AGGREGATION), + return self._with_aggregation( + lambda plx: self._to_compliant_expr(plx).abs().sum() ) # --- convert --- @@ -379,10 +410,7 @@ def any(self: Self) -> Self: | 0 True True | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).any(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).any()) def all(self: Self) -> Self: """Return whether all values in the column are `True`. @@ -403,10 +431,7 @@ def all(self: Self) -> Self: | 0 False True | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).all(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).all()) def ewm_mean( self: Self, @@ -529,10 +554,7 @@ def mean(self: Self) -> Self: | 0 0.0 4.0 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).mean(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).mean()) def median(self: Self) -> Self: """Get median value. @@ -556,10 +578,7 @@ def median(self: Self) -> Self: | 0 3.0 4.0 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).median(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).median()) def std(self: Self, *, ddof: int = 1) -> Self: """Get standard deviation. @@ -584,9 +603,8 @@ def std(self: Self, *, ddof: int = 1) -> Self: |0 17.79513 1.265789| └─────────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).std(ddof=ddof), - self._metadata.with_kind(ExprKind.AGGREGATION), + return self._with_aggregation( + lambda plx: self._to_compliant_expr(plx).std(ddof=ddof) ) def var(self: Self, *, ddof: int = 1) -> Self: @@ -612,9 +630,8 @@ def var(self: Self, *, ddof: int = 1) -> Self: |0 316.666667 1.602222| └───────────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).var(ddof=ddof), - self._metadata.with_kind(ExprKind.AGGREGATION), + return self._with_aggregation( + lambda plx: self._to_compliant_expr(plx).var(ddof=ddof) ) def map_batches( @@ -683,10 +700,7 @@ def skew(self: Self) -> Self: | 0 0.0 1.472427 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).skew(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).skew()) def sum(self: Self) -> Expr: """Return the sum value. @@ -711,10 +725,7 @@ def sum(self: Self) -> Expr: |└────────┴────────┘| └───────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).sum(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).sum()) def min(self: Self) -> Self: """Returns the minimum value(s) from a column(s). @@ -735,10 +746,7 @@ def min(self: Self) -> Self: | 0 1 3 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).min(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).min()) def max(self: Self) -> Self: """Returns the maximum value(s) from a column(s). @@ -759,10 +767,7 @@ def max(self: Self) -> Self: | 0 20 100 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).max(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).max()) def arg_min(self: Self) -> Self: """Returns the index of the minimum value. @@ -783,9 +788,8 @@ def arg_min(self: Self) -> Self: |0 0 1| └───────────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).arg_min(), - self._metadata.with_kind_and_extra_open_window(ExprKind.AGGREGATION), + return self._with_order_dependent_aggregation( + lambda plx: self._to_compliant_expr(plx).arg_min() ) def arg_max(self: Self) -> Self: @@ -807,9 +811,8 @@ def arg_max(self: Self) -> Self: |0 1 0| └───────────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).arg_max(), - self._metadata.with_kind_and_extra_open_window(ExprKind.AGGREGATION), + return self._with_order_dependent_aggregation( + lambda plx: self._to_compliant_expr(plx).arg_max() ) def count(self: Self) -> Self: @@ -831,10 +834,7 @@ def count(self: Self) -> Self: | 0 3 2 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).count(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).count()) def n_unique(self: Self) -> Self: """Returns count of unique values. @@ -855,10 +855,7 @@ def n_unique(self: Self) -> Self: | 0 5 3 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).n_unique(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).n_unique()) def unique(self: Self) -> Self: """Return unique values of this expression. @@ -879,10 +876,7 @@ def unique(self: Self) -> Self: | 0 9 12 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).unique(), - self._metadata.with_kind(ExprKind.FILTRATION), - ) + return self._with_filtration(lambda plx: self._to_compliant_expr(plx).unique()) def abs(self: Self) -> Self: """Return absolute value of each element. @@ -1355,10 +1349,7 @@ def arg_true(self: Self) -> Self: "See https://narwhals-dev.github.io/narwhals/backcompat/ for more information.\n" ) issue_deprecation_warning(msg, _version="1.23.0") - return self.__class__( - lambda plx: self._to_compliant_expr(plx).arg_true(), - self._metadata.with_kind_and_extra_open_window(ExprKind.FILTRATION), - ) + return self._with_filtration(lambda plx: self._to_compliant_expr(plx).arg_true()) def fill_null( self: Self, @@ -1488,9 +1479,8 @@ def drop_nulls(self: Self) -> Self: | └─────┘ | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).drop_nulls(), - self._metadata.with_kind(ExprKind.FILTRATION), + return self._with_filtration( + lambda plx: self._to_compliant_expr(plx).drop_nulls() ) def sample( @@ -1527,11 +1517,10 @@ def sample( "See https://narwhals-dev.github.io/narwhals/backcompat/ for more information.\n" ) issue_deprecation_warning(msg, _version="1.23.0") - return self.__class__( + return self._with_filtration( lambda plx: self._to_compliant_expr(plx).sample( n, fraction=fraction, with_replacement=with_replacement, seed=seed - ), - self._metadata.with_kind(ExprKind.FILTRATION), + ) ) def over( @@ -1591,16 +1580,25 @@ def over( raise ValueError(msg) kind = ExprKind.TRANSFORM - n_open_windows = self._metadata.n_open_windows + n_opened_windows = self._metadata.n_opened_windows + n_closed_windows = self._metadata.n_closed_windows + if n_closed_windows: + msg = "Nested `over` statements are not allowed." + raise InvalidOperationError(msg) if flat_order_by is not None and self._metadata.kind.is_window(): - n_open_windows -= 1 - elif flat_order_by is not None and not n_open_windows: + assert n_opened_windows # noqa: S101 + n_closed_windows += 1 + elif flat_order_by is not None and not n_opened_windows: msg = "Cannot use `order_by` in `over` on expression which isn't order-dependent." raise InvalidOperationError(msg) + else: + n_opened_windows += 1 + n_closed_windows += 1 current_meta = self._metadata next_meta = ExprMetadata( kind, - n_open_windows=n_open_windows, + n_opened_windows=n_opened_windows, + n_closed_windows=n_closed_windows, expansion_kind=current_meta.expansion_kind, ) @@ -1685,9 +1683,8 @@ def null_count(self: Self) -> Self: | 0 1 2 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).null_count(), - self._metadata.with_kind(ExprKind.AGGREGATION), + return self._with_aggregation( + lambda plx: self._to_compliant_expr(plx).null_count() ) def is_first_distinct(self: Self) -> Self: @@ -1792,9 +1789,8 @@ def quantile( | 0 24.5 74.5 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).quantile(quantile, interpolation), - self._metadata.with_kind(ExprKind.AGGREGATION), + return self._with_aggregation( + lambda plx: self._to_compliant_expr(plx).quantile(quantile, interpolation) ) def head(self: Self, n: int = 10) -> Self: @@ -1820,10 +1816,7 @@ def head(self: Self, n: int = 10) -> Self: "See https://narwhals-dev.github.io/narwhals/backcompat/ for more information.\n" ) issue_deprecation_warning(msg, _version="1.23.0") - return self.__class__( - lambda plx: self._to_compliant_expr(plx).head(n), - self._metadata.with_kind_and_extra_open_window(ExprKind.FILTRATION), - ) + return self._with_filtration(lambda plx: self._to_compliant_expr(plx).head(n)) def tail(self: Self, n: int = 10) -> Self: r"""Get the last `n` rows. @@ -1848,10 +1841,7 @@ def tail(self: Self, n: int = 10) -> Self: "See https://narwhals-dev.github.io/narwhals/backcompat/ for more information.\n" ) issue_deprecation_warning(msg, _version="1.23.0") - return self.__class__( - lambda plx: self._to_compliant_expr(plx).tail(n), - self._metadata.with_kind_and_extra_open_window(ExprKind.FILTRATION), - ) + return self._with_filtration(lambda plx: self._to_compliant_expr(plx).tail(n)) def round(self: Self, decimals: int = 0) -> Self: r"""Round underlying floating point data by `decimals` digits. @@ -1914,10 +1904,7 @@ def len(self: Self) -> Self: | 0 2 1 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).len(), - self._metadata.with_kind(ExprKind.AGGREGATION), - ) + return self._with_aggregation(lambda plx: self._to_compliant_expr(plx).len()) def gather_every(self: Self, n: int, offset: int = 0) -> Self: r"""Take every nth value in the Series and return as new Series. @@ -1943,9 +1930,8 @@ def gather_every(self: Self, n: int, offset: int = 0) -> Self: "See https://narwhals-dev.github.io/narwhals/backcompat/ for more information.\n" ) issue_deprecation_warning(msg, _version="1.23.0") - return self.__class__( - lambda plx: self._to_compliant_expr(plx).gather_every(n=n, offset=offset), - self._metadata.with_kind_and_extra_open_window(ExprKind.FILTRATION), + return self._with_filtration( + lambda plx: self._to_compliant_expr(plx).gather_every(n=n, offset=offset) ) # need to allow numeric typing @@ -2022,10 +2008,7 @@ def mode(self: Self) -> Self: | 0 1 | └──────────────────┘ """ - return self.__class__( - lambda plx: self._to_compliant_expr(plx).mode(), - self._metadata.with_kind(ExprKind.FILTRATION), - ) + return self._with_filtration(lambda plx: self._to_compliant_expr(plx).mode()) def is_finite(self: Self) -> Self: """Returns boolean values indicating which original values are finite. diff --git a/narwhals/functions.py b/narwhals/functions.py index b5f10927b6..deb8581994 100644 --- a/narwhals/functions.py +++ b/narwhals/functions.py @@ -1183,7 +1183,10 @@ def func(plx: Any) -> Any: return Expr( func, ExprMetadata( - ExprKind.AGGREGATION, n_open_windows=0, expansion_kind=ExpansionKind.SINGLE + ExprKind.AGGREGATION, + n_opened_windows=0, + n_closed_windows=0, + expansion_kind=ExpansionKind.SINGLE, ), ) @@ -1655,7 +1658,10 @@ def lit(value: Any, dtype: DType | type[DType] | None = None) -> Expr: return Expr( lambda plx: plx.lit(value, dtype), ExprMetadata( - ExprKind.LITERAL, n_open_windows=0, expansion_kind=ExpansionKind.SINGLE + ExprKind.LITERAL, + n_opened_windows=0, + n_closed_windows=0, + expansion_kind=ExpansionKind.SINGLE, ), ) diff --git a/tests/expression_parsing_test.py b/tests/expression_parsing_test.py index 0f7e30674d..4608f4e93e 100644 --- a/tests/expression_parsing_test.py +++ b/tests/expression_parsing_test.py @@ -7,21 +7,25 @@ @pytest.mark.parametrize( - ("expr", "expected"), + ("expr", "expected_open", "expected_closed"), [ - (nw.col("a"), 0), - (nw.col("a").mean(), 0), - (nw.col("a").cum_sum(), 1), - (nw.col("a").cum_sum().over(order_by="id"), 0), - ((nw.col("a").cum_sum() + 1).over(order_by="id"), 1), - (nw.col("a").cum_sum().cum_sum().over(order_by="id"), 1), - (nw.col("a").cum_sum().cum_sum(), 2), - (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()), 1), - (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()).over("a"), 1), + (nw.col("a"), 0, 0), + (nw.col("a").mean(), 0, 0), + (nw.col("a").cum_sum(), 1, 0), + (nw.col("a").cum_sum().over(order_by="id"), 1, 1), + (nw.col("a").cum_sum().abs().over(order_by="id"), 2, 1), + ((nw.col("a").cum_sum() + 1).over(order_by="id"), 2, 1), + (nw.col("a").cum_sum().cum_sum().over(order_by="id"), 2, 1), + (nw.col("a").cum_sum().cum_sum(), 2, 0), + (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()), 1, 0), + (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()).over("a"), 2, 1), ], ) -def test_has_open_windows(expr: nw.Expr, expected: int) -> None: - assert expr._metadata.n_open_windows == expected +def test_has_open_windows( + expr: nw.Expr, expected_open: int, expected_closed: int +) -> None: + assert expr._metadata.n_opened_windows == expected_open + assert expr._metadata.n_closed_windows == expected_closed def test_misleading_order_by() -> None: @@ -29,3 +33,20 @@ def test_misleading_order_by() -> None: nw.col("a").mean().over(order_by="b") with pytest.raises(InvalidOperationError): nw.col("a").rank().over(order_by="b") + + +def test_double_over() -> None: + with pytest.raises(InvalidOperationError): + nw.col("a").mean().over("b").over("c") + + +def test_double_agg() -> None: + with pytest.raises(InvalidOperationError): + nw.col("a").mean().mean() + with pytest.raises(InvalidOperationError): + nw.col("a").mean().arg_max() + + +def test_filter_aggregation() -> None: + with pytest.raises(InvalidOperationError): + nw.col("a").mean().drop_nulls() diff --git a/tests/group_by_test.py b/tests/group_by_test.py index d73a77dd17..27dde67ae0 100644 --- a/tests/group_by_test.py +++ b/tests/group_by_test.py @@ -55,7 +55,7 @@ def test_invalid_group_by_dask() -> None: df_dask = dd.from_pandas(df_pandas) with pytest.raises(ValueError, match=r"Non-trivial complex aggregation found"): - nw.from_native(df_dask).group_by("a").agg(nw.col("b").mean().min()) + nw.from_native(df_dask).group_by("a").agg(nw.col("b").abs().min()) def test_group_by_iter(constructor_eager: ConstructorEager) -> None: From ac74e15fe1d6583653feca100e874bdf70577f03 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Sun, 6 Apr 2025 10:09:42 +0100 Subject: [PATCH 02/14] rename exprmetadata selector staticmethods --- narwhals/_expression_parsing.py | 6 +++--- narwhals/functions.py | 12 ++++++------ narwhals/selectors.py | 16 ++++++++-------- utils/check_api_reference.py | 8 ++++---- 4 files changed, 21 insertions(+), 21 deletions(-) diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index 192811108f..0f5e71c58e 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -259,7 +259,7 @@ def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: ) @staticmethod - def simple_selector() -> ExprMetadata: + def selector_simple() -> ExprMetadata: # e.g. `nw.col('a')`, `nw.nth(0)` return ExprMetadata( ExprKind.TRANSFORM, @@ -269,7 +269,7 @@ def simple_selector() -> ExprMetadata: ) @staticmethod - def multi_output_selector_named() -> ExprMetadata: + def selector_multi_named() -> ExprMetadata: # e.g. `nw.col('a', 'b')` return ExprMetadata( ExprKind.TRANSFORM, @@ -279,7 +279,7 @@ def multi_output_selector_named() -> ExprMetadata: ) @staticmethod - def multi_output_selector_unnamed() -> ExprMetadata: + def selector_multi_unnamed() -> ExprMetadata: # e.g. `nw.all()` return ExprMetadata( ExprKind.TRANSFORM, diff --git a/narwhals/functions.py b/narwhals/functions.py index deb8581994..b36236e7b2 100644 --- a/narwhals/functions.py +++ b/narwhals/functions.py @@ -1040,9 +1040,9 @@ def func(plx: Any) -> Any: return Expr( func, - ExprMetadata.simple_selector() + ExprMetadata.selector_simple() if len(flat_names) == 1 - else ExprMetadata.multi_output_selector_named(), + else ExprMetadata.selector_multi_named(), ) @@ -1080,7 +1080,7 @@ def exclude(*names: str | Iterable[str]) -> Expr: def func(plx: Any) -> Any: return plx.exclude(exclude_names) - return Expr(func, ExprMetadata.multi_output_selector_unnamed()) + return Expr(func, ExprMetadata.selector_multi_unnamed()) def nth(*indices: int | Sequence[int]) -> Expr: @@ -1120,9 +1120,9 @@ def func(plx: Any) -> Any: return Expr( func, - ExprMetadata.simple_selector() + ExprMetadata.selector_simple() if len(flat_indices) == 1 - else ExprMetadata.multi_output_selector_unnamed(), + else ExprMetadata.selector_multi_unnamed(), ) @@ -1147,7 +1147,7 @@ def all_() -> Expr: | 1 4 0.246 | └──────────────────┘ """ - return Expr(lambda plx: plx.all(), ExprMetadata.multi_output_selector_unnamed()) + return Expr(lambda plx: plx.all(), ExprMetadata.selector_multi_unnamed()) # Add underscore so it doesn't conflict with builtin `len` diff --git a/narwhals/selectors.py b/narwhals/selectors.py index d8603fe3c7..020e58fdbe 100644 --- a/narwhals/selectors.py +++ b/narwhals/selectors.py @@ -96,7 +96,7 @@ def by_dtype(*dtypes: DType | type[DType] | Iterable[DType | type[DType]]) -> Se flattened = flatten(dtypes) return Selector( lambda plx: plx.selectors.by_dtype(flattened), - ExprMetadata.multi_output_selector_unnamed(), + ExprMetadata.selector_multi_unnamed(), ) @@ -131,7 +131,7 @@ def matches(pattern: str) -> Selector: """ return Selector( lambda plx: plx.selectors.matches(pattern), - ExprMetadata.multi_output_selector_unnamed(), + ExprMetadata.selector_multi_unnamed(), ) @@ -162,7 +162,7 @@ def numeric() -> Selector: └─────┴─────┘ """ return Selector( - lambda plx: plx.selectors.numeric(), ExprMetadata.multi_output_selector_unnamed() + lambda plx: plx.selectors.numeric(), ExprMetadata.selector_multi_unnamed() ) @@ -197,7 +197,7 @@ def boolean() -> Selector: └──────────────────┘ """ return Selector( - lambda plx: plx.selectors.boolean(), ExprMetadata.multi_output_selector_unnamed() + lambda plx: plx.selectors.boolean(), ExprMetadata.selector_multi_unnamed() ) @@ -228,7 +228,7 @@ def string() -> Selector: └─────┘ """ return Selector( - lambda plx: plx.selectors.string(), ExprMetadata.multi_output_selector_unnamed() + lambda plx: plx.selectors.string(), ExprMetadata.selector_multi_unnamed() ) @@ -262,7 +262,7 @@ def categorical() -> Selector: """ return Selector( lambda plx: plx.selectors.categorical(), - ExprMetadata.multi_output_selector_unnamed(), + ExprMetadata.selector_multi_unnamed(), ) @@ -287,7 +287,7 @@ def all() -> Selector: 1 2 y True """ return Selector( - lambda plx: plx.selectors.all(), ExprMetadata.multi_output_selector_unnamed() + lambda plx: plx.selectors.all(), ExprMetadata.selector_multi_unnamed() ) @@ -348,7 +348,7 @@ def datetime( """ return Selector( lambda plx: plx.selectors.datetime(time_unit=time_unit, time_zone=time_zone), - ExprMetadata.multi_output_selector_unnamed(), + ExprMetadata.selector_multi_unnamed(), ) diff --git a/utils/check_api_reference.py b/utils/check_api_reference.py index 3a9b9624a2..a9bb2cf1cf 100644 --- a/utils/check_api_reference.py +++ b/utils/check_api_reference.py @@ -163,7 +163,7 @@ # Expr methods expr_methods = [ i - for i in nw.Expr(lambda: 0, ExprMetadata.simple_selector()).__dir__() + for i in nw.Expr(lambda: 0, ExprMetadata.selector_simple()).__dir__() if not i[0].isupper() and i[0] != "_" ] with open("docs/api-reference/expr.md") as fd: @@ -187,7 +187,7 @@ expr_methods = [ i for i in getattr( - nw.Expr(lambda: 0, ExprMetadata.simple_selector()), + nw.Expr(lambda: 0, ExprMetadata.selector_simple()), namespace, ).__dir__() if not i[0].isupper() and i[0] != "_" @@ -231,7 +231,7 @@ # Check Expr vs Series expr = [ i - for i in nw.Expr(lambda: 0, ExprMetadata.simple_selector()).__dir__() + for i in nw.Expr(lambda: 0, ExprMetadata.selector_simple()).__dir__() if not i[0].isupper() and i[0] != "_" ] series = [ @@ -253,7 +253,7 @@ expr_internal = [ i for i in getattr( - nw.Expr(lambda: 0, ExprMetadata.simple_selector()), + nw.Expr(lambda: 0, ExprMetadata.selector_simple()), namespace, ).__dir__() if not i[0].isupper() and i[0] != "_" From f1bcaa461911d729496b5429c1fcb3f1dc5dc511 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Sun, 6 Apr 2025 11:20:15 +0100 Subject: [PATCH 03/14] docs fixups --- docs/how_it_works.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/how_it_works.md b/docs/how_it_works.md index 3e2819f631..819032d0fd 100644 --- a/docs/how_it_works.md +++ b/docs/how_it_works.md @@ -296,15 +296,16 @@ Each Narwhals expression can be of one of the following kinds: - `TRANSFORM`: expressions which don't change length (e.g. `nw.col('a').abs()`). - `WINDOW`: like `TRANSFORM`, but the last operation is a (row-order-dependent) window function (`rolling_*`, `cum_*`, `diff`, `shift`, `is_*_distinct`). - For example: + aggregate (e.g. `nw.col('a').drop_nulls()`). +- `FILTRATION`: expressions which change length but don't + +For example: - `nw.col('a')` is not order-dependent, so it's `TRANSFORM`. - `nw.col('a').abs()` is not order-dependent, so it's a `TRANSFORM`. - `nw.col('a').cum_sum()`'s last operation is `cum_sum`, so it's `WINDOW`. - `nw.col('a').cum_sum()+1`'s last operation is `__add__`, and it preserves the input dataframe's length, so it's a `TRANSFORM`. -- `FILTRATION`: expressions which change length but don't - aggregate (e.g. `nw.col('a').drop_nulls()`). How these change depends on the operation. @@ -335,8 +336,9 @@ The rules are: - If both are `LITERAL`, then the output is `LITERAL`. - If one is a `FILTRATION`, then: - - if the other is `LITERAL` or `AGGREGATION`, then the output is `FILTRATION`. - - else, we raise an error. + - if the other is `LITERAL` or `AGGREGATION`, then the output is `FILTRATION`. + - else, we raise an error. + - If one is `TRANSFORM` or `WINDOW` and the other is not `FILTRATION`, then the output is `TRANSFORM`. - If one is `AGGREGATION` and the other is `LITERAL` or `AGGREGATION`, From 5b05a9a7ae1ff7296bb1da214ab65b9f48bcf7c5 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Sun, 6 Apr 2025 11:21:40 +0100 Subject: [PATCH 04/14] simple -> single --- narwhals/_expression_parsing.py | 2 +- narwhals/functions.py | 4 ++-- utils/check_api_reference.py | 8 ++++---- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index 0f5e71c58e..3410763a9a 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -259,7 +259,7 @@ def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: ) @staticmethod - def selector_simple() -> ExprMetadata: + def selector_single() -> ExprMetadata: # e.g. `nw.col('a')`, `nw.nth(0)` return ExprMetadata( ExprKind.TRANSFORM, diff --git a/narwhals/functions.py b/narwhals/functions.py index b36236e7b2..8daa8e0cf1 100644 --- a/narwhals/functions.py +++ b/narwhals/functions.py @@ -1040,7 +1040,7 @@ def func(plx: Any) -> Any: return Expr( func, - ExprMetadata.selector_simple() + ExprMetadata.selector_single() if len(flat_names) == 1 else ExprMetadata.selector_multi_named(), ) @@ -1120,7 +1120,7 @@ def func(plx: Any) -> Any: return Expr( func, - ExprMetadata.selector_simple() + ExprMetadata.selector_single() if len(flat_indices) == 1 else ExprMetadata.selector_multi_unnamed(), ) diff --git a/utils/check_api_reference.py b/utils/check_api_reference.py index a9bb2cf1cf..d8e2e1c406 100644 --- a/utils/check_api_reference.py +++ b/utils/check_api_reference.py @@ -163,7 +163,7 @@ # Expr methods expr_methods = [ i - for i in nw.Expr(lambda: 0, ExprMetadata.selector_simple()).__dir__() + for i in nw.Expr(lambda: 0, ExprMetadata.selector_single()).__dir__() if not i[0].isupper() and i[0] != "_" ] with open("docs/api-reference/expr.md") as fd: @@ -187,7 +187,7 @@ expr_methods = [ i for i in getattr( - nw.Expr(lambda: 0, ExprMetadata.selector_simple()), + nw.Expr(lambda: 0, ExprMetadata.selector_single()), namespace, ).__dir__() if not i[0].isupper() and i[0] != "_" @@ -231,7 +231,7 @@ # Check Expr vs Series expr = [ i - for i in nw.Expr(lambda: 0, ExprMetadata.selector_simple()).__dir__() + for i in nw.Expr(lambda: 0, ExprMetadata.selector_single()).__dir__() if not i[0].isupper() and i[0] != "_" ] series = [ @@ -253,7 +253,7 @@ expr_internal = [ i for i in getattr( - nw.Expr(lambda: 0, ExprMetadata.selector_simple()), + nw.Expr(lambda: 0, ExprMetadata.selector_single()), namespace, ).__dir__() if not i[0].isupper() and i[0] != "_" From 902a5908bbb1a5a3306c51371d68bb5366eba2d1 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Sun, 6 Apr 2025 20:11:03 +0100 Subject: [PATCH 05/14] n_closed_windows => has_windows --- docs/how_it_works.md | 6 ++-- narwhals/_expression_parsing.py | 60 ++++++++++++++++---------------- narwhals/dataframe.py | 2 +- narwhals/expr.py | 23 ++++++------ narwhals/functions.py | 8 ++--- tests/expression_parsing_test.py | 4 +-- 6 files changed, 52 insertions(+), 51 deletions(-) diff --git a/docs/how_it_works.md b/docs/how_it_works.md index 819032d0fd..fc31872550 100644 --- a/docs/how_it_works.md +++ b/docs/how_it_works.md @@ -351,7 +351,7 @@ is `LITERAL` if all of `expr1`, `expr2`, and `expr3` are. ### "You open a window to another window to another window to another window" When we print out an expression, in addition to the expression kind, -we also see `n_opened_windows` and `n_closed_windows`. +we also see `n_open_windows` and `has_windows`. An open window is a non-elementwise operation which depends on row order. For example, if we have `nw.col('a').cum_sum()`, then in which order @@ -371,8 +371,8 @@ are assumed to be eager and in-memory. Therefore, it's allowed to work with open windows. When working with `LazyFrame`s, on the other hand, row order is undefined. -Therefore, it's only allowed to work with expressions where `n_opened_windows` -is exactly equal to `n_closed_windows`. +Therefore, it's only allowed to work with expressions where `n_open_windows` +is exactly equal to `has_windows`. ### Broadcasting diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index 3410763a9a..913b5f2a0d 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -192,20 +192,20 @@ def is_multi_output( class ExprMetadata: - __slots__ = ("_expansion_kind", "_kind", "_n_closed_windows", "_n_opened_windows") + __slots__ = ("_expansion_kind", "_has_windows", "_kind", "_n_open_windows") def __init__( self, kind: ExprKind, /, *, - n_opened_windows: int, - n_closed_windows: int, + n_open_windows: int, + has_windows: bool, expansion_kind: ExpansionKind, ) -> None: self._kind: ExprKind = kind - self._n_opened_windows = n_opened_windows - self._n_closed_windows = n_closed_windows + self._n_open_windows = n_open_windows + self._has_windows = has_windows self._expansion_kind = expansion_kind def __init_subclass__(cls, /, *args: Any, **kwds: Any) -> Never: # pragma: no cover @@ -213,19 +213,19 @@ def __init_subclass__(cls, /, *args: Any, **kwds: Any) -> Never: # pragma: no c raise TypeError(msg) def __repr__(self) -> str: - return f"ExprMetadata(kind: {self._kind}, n_opened_windows: {self._n_opened_windows}, n_closed_windows: {self._n_closed_windows}, expansion_kind: {self._expansion_kind})" + return f"ExprMetadata(kind: {self._kind}, n_open_windows: {self._n_open_windows}, has_windows: {self._has_windows}, expansion_kind: {self._expansion_kind})" @property def kind(self) -> ExprKind: return self._kind @property - def n_opened_windows(self) -> int: - return self._n_opened_windows + def n_open_windows(self) -> int: + return self._n_open_windows @property - def n_closed_windows(self) -> int: - return self._n_closed_windows + def has_windows(self) -> bool: + return self._has_windows @property def expansion_kind(self) -> ExpansionKind: @@ -235,26 +235,26 @@ def with_kind(self, kind: ExprKind, /) -> ExprMetadata: """Change metadata kind, leaving all other attributes the same.""" return ExprMetadata( kind, - n_opened_windows=self._n_opened_windows, - n_closed_windows=self._n_closed_windows, + n_open_windows=self._n_open_windows, + has_windows=self._has_windows, expansion_kind=self._expansion_kind, ) def with_extra_open_window(self) -> ExprMetadata: - """Increment `n_opened_windows` leaving other attributes the same.""" + """Increment `n_open_windows` leaving other attributes the same.""" return ExprMetadata( self.kind, - n_opened_windows=self._n_opened_windows + 1, + n_open_windows=self._n_open_windows + 1, expansion_kind=self._expansion_kind, - n_closed_windows=self._n_closed_windows, + has_windows=self._has_windows, ) def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: - """Change metadata kind and increment `n_opened_windows`.""" + """Change metadata kind and increment `n_open_windows`.""" return ExprMetadata( kind, - n_opened_windows=self._n_opened_windows + 1, - n_closed_windows=self._n_closed_windows, + n_open_windows=self._n_open_windows + 1, + has_windows=self._has_windows, expansion_kind=self._expansion_kind, ) @@ -263,8 +263,8 @@ def selector_single() -> ExprMetadata: # e.g. `nw.col('a')`, `nw.nth(0)` return ExprMetadata( ExprKind.TRANSFORM, - n_opened_windows=0, - n_closed_windows=0, + n_open_windows=0, + has_windows=False, expansion_kind=ExpansionKind.SINGLE, ) @@ -273,8 +273,8 @@ def selector_multi_named() -> ExprMetadata: # e.g. `nw.col('a', 'b')` return ExprMetadata( ExprKind.TRANSFORM, - n_opened_windows=0, - n_closed_windows=0, + n_open_windows=0, + has_windows=False, expansion_kind=ExpansionKind.MULTI_NAMED, ) @@ -283,8 +283,8 @@ def selector_multi_unnamed() -> ExprMetadata: # e.g. `nw.all()` return ExprMetadata( ExprKind.TRANSFORM, - n_opened_windows=0, - n_closed_windows=0, + n_open_windows=0, + has_windows=False, expansion_kind=ExpansionKind.MULTI_UNNAMED, ) @@ -308,8 +308,8 @@ def combine_metadata( has_transforms_or_windows = False has_aggregations = False has_literals = False - result_n_opened_windows = 0 - result_n_closed_windows = 0 + result_n_open_windows = 0 + result_has_windows = False result_expansion_kind = ExpansionKind.SINGLE for i, arg in enumerate(args): @@ -331,8 +331,8 @@ def combine_metadata( result_expansion_kind = resolve_expansion_kind( result_expansion_kind, arg._metadata.expansion_kind ) - result_n_opened_windows += arg._metadata.n_opened_windows - result_n_closed_windows += arg._metadata.n_closed_windows + result_n_open_windows += arg._metadata.n_open_windows + result_has_windows |= arg._metadata.has_windows kind = arg._metadata.kind if kind is ExprKind.AGGREGATION: has_aggregations = True @@ -368,8 +368,8 @@ def combine_metadata( return ExprMetadata( result_kind, - n_opened_windows=result_n_opened_windows, - n_closed_windows=result_n_closed_windows, + n_open_windows=result_n_open_windows, + has_windows=result_has_windows, expansion_kind=result_expansion_kind, ) diff --git a/narwhals/dataframe.py b/narwhals/dataframe.py index be1bf9354b..5168f5c333 100644 --- a/narwhals/dataframe.py +++ b/narwhals/dataframe.py @@ -2150,7 +2150,7 @@ def _extract_compliant(self: Self, arg: Any) -> Any: plx = self.__narwhals_namespace__() return plx.col(arg) if isinstance(arg, Expr): - if arg._metadata.n_opened_windows > arg._metadata._n_closed_windows: + if arg._metadata.n_open_windows: msg = ( "Order-dependent expressions are not supported for use in LazyFrame.\n\n" "Hints:\n" diff --git a/narwhals/expr.py b/narwhals/expr.py index 74384d8289..1e309191c1 100644 --- a/narwhals/expr.py +++ b/narwhals/expr.py @@ -1580,25 +1580,26 @@ def over( raise ValueError(msg) kind = ExprKind.TRANSFORM - n_opened_windows = self._metadata.n_opened_windows - n_closed_windows = self._metadata.n_closed_windows - if n_closed_windows: + n_open_windows = self._metadata.n_open_windows + has_windows = self._metadata.has_windows + if has_windows: msg = "Nested `over` statements are not allowed." raise InvalidOperationError(msg) if flat_order_by is not None and self._metadata.kind.is_window(): - assert n_opened_windows # noqa: S101 - n_closed_windows += 1 - elif flat_order_by is not None and not n_opened_windows: + # debug assertion, `n_open_windows` should already have been incremented + # by the window function. If it's immediately followed by `over`, then the + # window gets closed, so we decrement `n_open_windows`. + assert n_open_windows # noqa: S101 + n_open_windows -= 1 + elif flat_order_by is not None and not n_open_windows: msg = "Cannot use `order_by` in `over` on expression which isn't order-dependent." raise InvalidOperationError(msg) - else: - n_opened_windows += 1 - n_closed_windows += 1 + has_windows = True current_meta = self._metadata next_meta = ExprMetadata( kind, - n_opened_windows=n_opened_windows, - n_closed_windows=n_closed_windows, + n_open_windows=n_open_windows, + has_windows=has_windows, expansion_kind=current_meta.expansion_kind, ) diff --git a/narwhals/functions.py b/narwhals/functions.py index 8daa8e0cf1..65d5499081 100644 --- a/narwhals/functions.py +++ b/narwhals/functions.py @@ -1184,8 +1184,8 @@ def func(plx: Any) -> Any: func, ExprMetadata( ExprKind.AGGREGATION, - n_opened_windows=0, - n_closed_windows=0, + n_open_windows=0, + has_windows=False, expansion_kind=ExpansionKind.SINGLE, ), ) @@ -1659,8 +1659,8 @@ def lit(value: Any, dtype: DType | type[DType] | None = None) -> Expr: lambda plx: plx.lit(value, dtype), ExprMetadata( ExprKind.LITERAL, - n_opened_windows=0, - n_closed_windows=0, + n_open_windows=0, + has_windows=False, expansion_kind=ExpansionKind.SINGLE, ), ) diff --git a/tests/expression_parsing_test.py b/tests/expression_parsing_test.py index 4608f4e93e..b95dd0405f 100644 --- a/tests/expression_parsing_test.py +++ b/tests/expression_parsing_test.py @@ -24,8 +24,8 @@ def test_has_open_windows( expr: nw.Expr, expected_open: int, expected_closed: int ) -> None: - assert expr._metadata.n_opened_windows == expected_open - assert expr._metadata.n_closed_windows == expected_closed + assert expr._metadata.n_open_windows == expected_open + assert expr._metadata.has_windows == expected_closed def test_misleading_order_by() -> None: From d44e7c5bcf82d74cbeaba6a2fd6ea81d2cda7e6e Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Sun, 6 Apr 2025 20:13:13 +0100 Subject: [PATCH 06/14] docs --- docs/how_it_works.md | 4 ++-- narwhals/_expression_parsing.py | 30 +++++++++++++++--------------- narwhals/expr.py | 8 ++++---- narwhals/functions.py | 4 ++-- tests/expression_parsing_test.py | 2 +- 5 files changed, 24 insertions(+), 24 deletions(-) diff --git a/docs/how_it_works.md b/docs/how_it_works.md index fc31872550..99e925f78a 100644 --- a/docs/how_it_works.md +++ b/docs/how_it_works.md @@ -351,7 +351,7 @@ is `LITERAL` if all of `expr1`, `expr2`, and `expr3` are. ### "You open a window to another window to another window to another window" When we print out an expression, in addition to the expression kind, -we also see `n_open_windows` and `has_windows`. +we also see `n_open_windows` and `has_closed_windows`. An open window is a non-elementwise operation which depends on row order. For example, if we have `nw.col('a').cum_sum()`, then in which order @@ -372,7 +372,7 @@ with open windows. When working with `LazyFrame`s, on the other hand, row order is undefined. Therefore, it's only allowed to work with expressions where `n_open_windows` -is exactly equal to `has_windows`. +is exactly equal to zero. ### Broadcasting diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index 913b5f2a0d..99eabae942 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -192,7 +192,7 @@ def is_multi_output( class ExprMetadata: - __slots__ = ("_expansion_kind", "_has_windows", "_kind", "_n_open_windows") + __slots__ = ("_expansion_kind", "_has_closed_windows", "_kind", "_n_open_windows") def __init__( self, @@ -200,12 +200,12 @@ def __init__( /, *, n_open_windows: int, - has_windows: bool, + has_closed_windows: bool, expansion_kind: ExpansionKind, ) -> None: self._kind: ExprKind = kind self._n_open_windows = n_open_windows - self._has_windows = has_windows + self._has_closed_windows = has_closed_windows self._expansion_kind = expansion_kind def __init_subclass__(cls, /, *args: Any, **kwds: Any) -> Never: # pragma: no cover @@ -213,7 +213,7 @@ def __init_subclass__(cls, /, *args: Any, **kwds: Any) -> Never: # pragma: no c raise TypeError(msg) def __repr__(self) -> str: - return f"ExprMetadata(kind: {self._kind}, n_open_windows: {self._n_open_windows}, has_windows: {self._has_windows}, expansion_kind: {self._expansion_kind})" + return f"ExprMetadata(kind: {self._kind}, n_open_windows: {self._n_open_windows}, has_closed_windows: {self._has_closed_windows}, expansion_kind: {self._expansion_kind})" @property def kind(self) -> ExprKind: @@ -224,8 +224,8 @@ def n_open_windows(self) -> int: return self._n_open_windows @property - def has_windows(self) -> bool: - return self._has_windows + def has_closed_windows(self) -> bool: + return self._has_closed_windows @property def expansion_kind(self) -> ExpansionKind: @@ -236,7 +236,7 @@ def with_kind(self, kind: ExprKind, /) -> ExprMetadata: return ExprMetadata( kind, n_open_windows=self._n_open_windows, - has_windows=self._has_windows, + has_closed_windows=self._has_closed_windows, expansion_kind=self._expansion_kind, ) @@ -246,7 +246,7 @@ def with_extra_open_window(self) -> ExprMetadata: self.kind, n_open_windows=self._n_open_windows + 1, expansion_kind=self._expansion_kind, - has_windows=self._has_windows, + has_closed_windows=self._has_closed_windows, ) def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: @@ -254,7 +254,7 @@ def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: return ExprMetadata( kind, n_open_windows=self._n_open_windows + 1, - has_windows=self._has_windows, + has_closed_windows=self._has_closed_windows, expansion_kind=self._expansion_kind, ) @@ -264,7 +264,7 @@ def selector_single() -> ExprMetadata: return ExprMetadata( ExprKind.TRANSFORM, n_open_windows=0, - has_windows=False, + has_closed_windows=False, expansion_kind=ExpansionKind.SINGLE, ) @@ -274,7 +274,7 @@ def selector_multi_named() -> ExprMetadata: return ExprMetadata( ExprKind.TRANSFORM, n_open_windows=0, - has_windows=False, + has_closed_windows=False, expansion_kind=ExpansionKind.MULTI_NAMED, ) @@ -284,7 +284,7 @@ def selector_multi_unnamed() -> ExprMetadata: return ExprMetadata( ExprKind.TRANSFORM, n_open_windows=0, - has_windows=False, + has_closed_windows=False, expansion_kind=ExpansionKind.MULTI_UNNAMED, ) @@ -309,7 +309,7 @@ def combine_metadata( has_aggregations = False has_literals = False result_n_open_windows = 0 - result_has_windows = False + result_has_closed_windows = False result_expansion_kind = ExpansionKind.SINGLE for i, arg in enumerate(args): @@ -332,7 +332,7 @@ def combine_metadata( result_expansion_kind, arg._metadata.expansion_kind ) result_n_open_windows += arg._metadata.n_open_windows - result_has_windows |= arg._metadata.has_windows + result_has_closed_windows |= arg._metadata.has_closed_windows kind = arg._metadata.kind if kind is ExprKind.AGGREGATION: has_aggregations = True @@ -369,7 +369,7 @@ def combine_metadata( return ExprMetadata( result_kind, n_open_windows=result_n_open_windows, - has_windows=result_has_windows, + has_closed_windows=result_has_closed_windows, expansion_kind=result_expansion_kind, ) diff --git a/narwhals/expr.py b/narwhals/expr.py index 1e309191c1..f659b0a7eb 100644 --- a/narwhals/expr.py +++ b/narwhals/expr.py @@ -1581,8 +1581,8 @@ def over( kind = ExprKind.TRANSFORM n_open_windows = self._metadata.n_open_windows - has_windows = self._metadata.has_windows - if has_windows: + has_closed_windows = self._metadata.has_closed_windows + if has_closed_windows: msg = "Nested `over` statements are not allowed." raise InvalidOperationError(msg) if flat_order_by is not None and self._metadata.kind.is_window(): @@ -1594,12 +1594,12 @@ def over( elif flat_order_by is not None and not n_open_windows: msg = "Cannot use `order_by` in `over` on expression which isn't order-dependent." raise InvalidOperationError(msg) - has_windows = True + has_closed_windows = True current_meta = self._metadata next_meta = ExprMetadata( kind, n_open_windows=n_open_windows, - has_windows=has_windows, + has_closed_windows=has_closed_windows, expansion_kind=current_meta.expansion_kind, ) diff --git a/narwhals/functions.py b/narwhals/functions.py index 65d5499081..1cb3b4f9cb 100644 --- a/narwhals/functions.py +++ b/narwhals/functions.py @@ -1185,7 +1185,7 @@ def func(plx: Any) -> Any: ExprMetadata( ExprKind.AGGREGATION, n_open_windows=0, - has_windows=False, + has_closed_windows=False, expansion_kind=ExpansionKind.SINGLE, ), ) @@ -1660,7 +1660,7 @@ def lit(value: Any, dtype: DType | type[DType] | None = None) -> Expr: ExprMetadata( ExprKind.LITERAL, n_open_windows=0, - has_windows=False, + has_closed_windows=False, expansion_kind=ExpansionKind.SINGLE, ), ) diff --git a/tests/expression_parsing_test.py b/tests/expression_parsing_test.py index b95dd0405f..c6243a6f1b 100644 --- a/tests/expression_parsing_test.py +++ b/tests/expression_parsing_test.py @@ -25,7 +25,7 @@ def test_has_open_windows( expr: nw.Expr, expected_open: int, expected_closed: int ) -> None: assert expr._metadata.n_open_windows == expected_open - assert expr._metadata.has_windows == expected_closed + assert expr._metadata.has_closed_windows == expected_closed def test_misleading_order_by() -> None: From fb11483ac2985ceb2db7d16607cc605e0e1b1502 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Sun, 6 Apr 2025 20:32:40 +0100 Subject: [PATCH 07/14] fixup --- tests/expression_parsing_test.py | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/tests/expression_parsing_test.py b/tests/expression_parsing_test.py index c6243a6f1b..49ae78631b 100644 --- a/tests/expression_parsing_test.py +++ b/tests/expression_parsing_test.py @@ -9,23 +9,23 @@ @pytest.mark.parametrize( ("expr", "expected_open", "expected_closed"), [ - (nw.col("a"), 0, 0), - (nw.col("a").mean(), 0, 0), - (nw.col("a").cum_sum(), 1, 0), - (nw.col("a").cum_sum().over(order_by="id"), 1, 1), - (nw.col("a").cum_sum().abs().over(order_by="id"), 2, 1), - ((nw.col("a").cum_sum() + 1).over(order_by="id"), 2, 1), - (nw.col("a").cum_sum().cum_sum().over(order_by="id"), 2, 1), - (nw.col("a").cum_sum().cum_sum(), 2, 0), - (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()), 1, 0), - (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()).over("a"), 2, 1), + (nw.col("a"), 0, False), + (nw.col("a").mean(), 0, False), + (nw.col("a").cum_sum(), 1, False), + (nw.col("a").cum_sum().over(order_by="id"), 0, True), + (nw.col("a").cum_sum().abs().over(order_by="id"), 1, True), + ((nw.col("a").cum_sum() + 1).over(order_by="id"), 1, True), + (nw.col("a").cum_sum().cum_sum().over(order_by="id"), 1, True), + (nw.col("a").cum_sum().cum_sum(), 2, False), + (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()), 1, False), + (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()).over("a"), 1, True), ], ) def test_has_open_windows( - expr: nw.Expr, expected_open: int, expected_closed: int + expr: nw.Expr, expected_open: int, *, expected_closed: bool ) -> None: assert expr._metadata.n_open_windows == expected_open - assert expr._metadata.has_closed_windows == expected_closed + assert expr._metadata.has_closed_windows is expected_closed def test_misleading_order_by() -> None: From a45284efdb7ec5198e0532f179defa431330f1a3 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Sun, 6 Apr 2025 20:53:17 +0100 Subject: [PATCH 08/14] simplify --- narwhals/expr.py | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/narwhals/expr.py b/narwhals/expr.py index f659b0a7eb..30badf993c 100644 --- a/narwhals/expr.py +++ b/narwhals/expr.py @@ -1581,8 +1581,7 @@ def over( kind = ExprKind.TRANSFORM n_open_windows = self._metadata.n_open_windows - has_closed_windows = self._metadata.has_closed_windows - if has_closed_windows: + if self._metadata.has_closed_windows: msg = "Nested `over` statements are not allowed." raise InvalidOperationError(msg) if flat_order_by is not None and self._metadata.kind.is_window(): @@ -1594,12 +1593,11 @@ def over( elif flat_order_by is not None and not n_open_windows: msg = "Cannot use `order_by` in `over` on expression which isn't order-dependent." raise InvalidOperationError(msg) - has_closed_windows = True current_meta = self._metadata next_meta = ExprMetadata( kind, n_open_windows=n_open_windows, - has_closed_windows=has_closed_windows, + has_closed_windows=True, expansion_kind=current_meta.expansion_kind, ) From 74b0de1fa884b17825d031e0ef148e0f4203a3c0 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Mon, 7 Apr 2025 11:03:11 +0100 Subject: [PATCH 09/14] it can always be done simpler! --- docs/how_it_works.md | 27 ++++---- narwhals/_expression_parsing.py | 107 +++++++++++++++++++++---------- narwhals/dataframe.py | 2 +- narwhals/expr.py | 20 +++--- narwhals/functions.py | 7 +- tests/expression_parsing_test.py | 49 +++++++++----- 6 files changed, 134 insertions(+), 78 deletions(-) diff --git a/docs/how_it_works.md b/docs/how_it_works.md index 99e925f78a..008a6cd1c4 100644 --- a/docs/how_it_works.md +++ b/docs/how_it_works.md @@ -351,28 +351,23 @@ is `LITERAL` if all of `expr1`, `expr2`, and `expr3` are. ### "You open a window to another window to another window to another window" When we print out an expression, in addition to the expression kind, -we also see `n_open_windows` and `has_closed_windows`. +we also see `window_kind`. These are four window kinds: -An open window is a non-elementwise operation which depends on row order. -For example, if we have `nw.col('a').cum_sum()`, then in which order -should the cumulative sum be computed? It's not specified. By introducing -an operation like this one, we say that we've opened a window. An open -window can only be closed if it's followed by `over` with `order_by` -specified. - -A closed window is a non-elementwise operation which doesn't depend on row -order. For example: - -- `nw.col('a').mean().over('b')`. -- `nw.col('a').cum_sum().over(order_by='i')`. +- `NONE`: non-order-dependent operations, like `.abs()` or `.mean()`. +- `CLOSEABLE`: expression where the last operation is order-dependent. For + example, `nw.col('a').diff()`. +- `UNCLOSEABLE`: expression where some operation is order-dependent but + the order-dependent operation wasn't the last one. For example, + `nw.col('a').diff().abs()`. +- `CLOSED`: expression contains `over` at some point, and any order-dependent + operation was immediately followed by `over(order_by=...)`. When working with `DataFrame`s, row order is well-defined, as the dataframes are assumed to be eager and in-memory. Therefore, it's allowed to work -with open windows. +with all window kinds. When working with `LazyFrame`s, on the other hand, row order is undefined. -Therefore, it's only allowed to work with expressions where `n_open_windows` -is exactly equal to zero. +Therefore, window kinds must either be `NONE` or `CLOSED`. ### Broadcasting diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index 99eabae942..abf339fe72 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -15,6 +15,7 @@ from narwhals.dependencies import is_narwhals_series from narwhals.dependencies import is_numpy_array +from narwhals.exceptions import InvalidOperationError from narwhals.exceptions import LengthChangingExprError from narwhals.exceptions import MultiOutputExpressionError from narwhals.exceptions import ShapeError @@ -191,21 +192,45 @@ def is_multi_output( return expansion_kind in {ExpansionKind.MULTI_NAMED, ExpansionKind.MULTI_UNNAMED} +class WindowKind(Enum): + """Describe what kind of window the expression contains.""" + + NONE = auto() + """e.g. `nw.col('a').abs()`, no windows.""" + + CLOSEABLE = auto() + """e.g. `nw.col('a').cum_sum()` - can be closed if immediately followed by `over(order_by=...)`.""" + + UNCLOSEABLE = auto() + """e.g. `nw.col('a').cum_sum().abs()` - the window function (`cum_sum`) wasn't immediately followed by + `over(order_by=...)`, and so the window is uncloseable.""" + + CLOSED = auto() + """e.g. `nw.col('a').cum_sum().over(order_by='i')`.""" + + def is_open(self) -> bool: + return self in {WindowKind.UNCLOSEABLE, WindowKind.CLOSEABLE} + + def is_closed(self) -> bool: + return self is WindowKind.CLOSED + + def is_uncloseable(self) -> bool: + return self is WindowKind.UNCLOSEABLE + + class ExprMetadata: - __slots__ = ("_expansion_kind", "_has_closed_windows", "_kind", "_n_open_windows") + __slots__ = ("_expansion_kind", "_kind", "_window_kind") def __init__( self, kind: ExprKind, /, *, - n_open_windows: int, - has_closed_windows: bool, + window_kind: WindowKind, expansion_kind: ExpansionKind, ) -> None: self._kind: ExprKind = kind - self._n_open_windows = n_open_windows - self._has_closed_windows = has_closed_windows + self._window_kind = window_kind self._expansion_kind = expansion_kind def __init_subclass__(cls, /, *args: Any, **kwds: Any) -> Never: # pragma: no cover @@ -213,19 +238,15 @@ def __init_subclass__(cls, /, *args: Any, **kwds: Any) -> Never: # pragma: no c raise TypeError(msg) def __repr__(self) -> str: - return f"ExprMetadata(kind: {self._kind}, n_open_windows: {self._n_open_windows}, has_closed_windows: {self._has_closed_windows}, expansion_kind: {self._expansion_kind})" + return f"ExprMetadata(kind: {self._kind}, window_kind: {self._window_kind}, expansion_kind: {self._expansion_kind})" @property def kind(self) -> ExprKind: return self._kind @property - def n_open_windows(self) -> int: - return self._n_open_windows - - @property - def has_closed_windows(self) -> bool: - return self._has_closed_windows + def window_kind(self) -> WindowKind: + return self._window_kind @property def expansion_kind(self) -> ExpansionKind: @@ -235,8 +256,7 @@ def with_kind(self, kind: ExprKind, /) -> ExprMetadata: """Change metadata kind, leaving all other attributes the same.""" return ExprMetadata( kind, - n_open_windows=self._n_open_windows, - has_closed_windows=self._has_closed_windows, + window_kind=self._window_kind, expansion_kind=self._expansion_kind, ) @@ -244,17 +264,30 @@ def with_extra_open_window(self) -> ExprMetadata: """Increment `n_open_windows` leaving other attributes the same.""" return ExprMetadata( self.kind, - n_open_windows=self._n_open_windows + 1, + window_kind=self._window_kind, expansion_kind=self._expansion_kind, - has_closed_windows=self._has_closed_windows, ) def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: """Change metadata kind and increment `n_open_windows`.""" + if self._window_kind is WindowKind.NONE: + window_kind = WindowKind.CLOSEABLE + elif self._window_kind is WindowKind.CLOSED: + msg = "Cannot chain `over` expressions." + raise InvalidOperationError(msg) + else: + window_kind = WindowKind.UNCLOSEABLE return ExprMetadata( kind, - n_open_windows=self._n_open_windows + 1, - has_closed_windows=self._has_closed_windows, + window_kind=window_kind, + expansion_kind=self._expansion_kind, + ) + + def with_kind_and_uncloseable_window(self, kind: ExprKind, /) -> ExprMetadata: + """Change metadata kind and set window kind to uncloseable.""" + return ExprMetadata( + kind, + window_kind=WindowKind.UNCLOSEABLE, expansion_kind=self._expansion_kind, ) @@ -263,8 +296,7 @@ def selector_single() -> ExprMetadata: # e.g. `nw.col('a')`, `nw.nth(0)` return ExprMetadata( ExprKind.TRANSFORM, - n_open_windows=0, - has_closed_windows=False, + window_kind=WindowKind.NONE, expansion_kind=ExpansionKind.SINGLE, ) @@ -273,8 +305,7 @@ def selector_multi_named() -> ExprMetadata: # e.g. `nw.col('a', 'b')` return ExprMetadata( ExprKind.TRANSFORM, - n_open_windows=0, - has_closed_windows=False, + window_kind=WindowKind.NONE, expansion_kind=ExpansionKind.MULTI_NAMED, ) @@ -283,13 +314,12 @@ def selector_multi_unnamed() -> ExprMetadata: # e.g. `nw.all()` return ExprMetadata( ExprKind.TRANSFORM, - n_open_windows=0, - has_closed_windows=False, + window_kind=WindowKind.NONE, expansion_kind=ExpansionKind.MULTI_UNNAMED, ) -def combine_metadata( +def combine_metadata( # noqa: PLR0915 *args: IntoExpr | object | None, str_as_lit: bool, allow_multi_output: bool, @@ -308,9 +338,10 @@ def combine_metadata( has_transforms_or_windows = False has_aggregations = False has_literals = False - result_n_open_windows = 0 - result_has_closed_windows = False result_expansion_kind = ExpansionKind.SINGLE + has_closeable_windows = False + has_uncloseable_windows = False + has_closed_windows = False for i, arg in enumerate(args): if isinstance(arg, str) and not str_as_lit: @@ -331,8 +362,6 @@ def combine_metadata( result_expansion_kind = resolve_expansion_kind( result_expansion_kind, arg._metadata.expansion_kind ) - result_n_open_windows += arg._metadata.n_open_windows - result_has_closed_windows |= arg._metadata.has_closed_windows kind = arg._metadata.kind if kind is ExprKind.AGGREGATION: has_aggregations = True @@ -346,6 +375,14 @@ def combine_metadata( msg = "unreachable code" raise AssertionError(msg) + window_kind = arg._metadata.window_kind + if window_kind is WindowKind.UNCLOSEABLE: + has_uncloseable_windows = True + elif window_kind is WindowKind.CLOSEABLE: + has_closeable_windows = True + elif window_kind is WindowKind.CLOSED: + has_closed_windows = True + if ( has_literals and not has_aggregations @@ -366,11 +403,15 @@ def combine_metadata( else: result_kind = ExprKind.AGGREGATION + if has_uncloseable_windows or has_closeable_windows: + result_window_kind = WindowKind.UNCLOSEABLE + elif has_closed_windows: + result_window_kind = WindowKind.CLOSED + else: + result_window_kind = WindowKind.NONE + return ExprMetadata( - result_kind, - n_open_windows=result_n_open_windows, - has_closed_windows=result_has_closed_windows, - expansion_kind=result_expansion_kind, + result_kind, window_kind=result_window_kind, expansion_kind=result_expansion_kind ) diff --git a/narwhals/dataframe.py b/narwhals/dataframe.py index 5168f5c333..ec19142ab1 100644 --- a/narwhals/dataframe.py +++ b/narwhals/dataframe.py @@ -2150,7 +2150,7 @@ def _extract_compliant(self: Self, arg: Any) -> Any: plx = self.__narwhals_namespace__() return plx.col(arg) if isinstance(arg, Expr): - if arg._metadata.n_open_windows: + if arg._metadata._window_kind.is_open(): msg = ( "Order-dependent expressions are not supported for use in LazyFrame.\n\n" "Hints:\n" diff --git a/narwhals/expr.py b/narwhals/expr.py index 30badf993c..f4f213b199 100644 --- a/narwhals/expr.py +++ b/narwhals/expr.py @@ -10,6 +10,7 @@ from narwhals._expression_parsing import ExprKind from narwhals._expression_parsing import ExprMetadata +from narwhals._expression_parsing import WindowKind from narwhals._expression_parsing import apply_n_ary_operation from narwhals._expression_parsing import combine_metadata from narwhals._expression_parsing import combine_metadata_binary_op @@ -65,8 +66,11 @@ def _with_callable(self, to_compliant_expr: Callable[[Any], Any]) -> Self: # Instantiate new Expr keeping metadata unchanged, unless # it's a WINDOW, in which case make it a TRANSFORM. if self._metadata.kind.is_window(): + # We had a window function, but it wasn't immediately followed by + # `over(order_by=...)` - it missed its chance, it's now forever uncloseable. return self.__class__( - to_compliant_expr, self._metadata.with_kind(ExprKind.TRANSFORM) + to_compliant_expr, + self._metadata.with_kind_and_uncloseable_window(ExprKind.TRANSFORM), ) return self.__class__(to_compliant_expr, self._metadata) @@ -1580,24 +1584,24 @@ def over( raise ValueError(msg) kind = ExprKind.TRANSFORM - n_open_windows = self._metadata.n_open_windows - if self._metadata.has_closed_windows: + window_kind = self._metadata.window_kind + if window_kind.is_closed(): msg = "Nested `over` statements are not allowed." raise InvalidOperationError(msg) if flat_order_by is not None and self._metadata.kind.is_window(): # debug assertion, `n_open_windows` should already have been incremented # by the window function. If it's immediately followed by `over`, then the # window gets closed, so we decrement `n_open_windows`. - assert n_open_windows # noqa: S101 - n_open_windows -= 1 - elif flat_order_by is not None and not n_open_windows: + assert window_kind.is_open() # noqa: S101 + elif flat_order_by is not None and not window_kind.is_open(): msg = "Cannot use `order_by` in `over` on expression which isn't order-dependent." raise InvalidOperationError(msg) current_meta = self._metadata next_meta = ExprMetadata( kind, - n_open_windows=n_open_windows, - has_closed_windows=True, + window_kind=WindowKind.UNCLOSEABLE + if window_kind.is_uncloseable() + else WindowKind.CLOSED, expansion_kind=current_meta.expansion_kind, ) diff --git a/narwhals/functions.py b/narwhals/functions.py index 1cb3b4f9cb..438fa76b7c 100644 --- a/narwhals/functions.py +++ b/narwhals/functions.py @@ -15,6 +15,7 @@ from narwhals._expression_parsing import ExpansionKind from narwhals._expression_parsing import ExprKind from narwhals._expression_parsing import ExprMetadata +from narwhals._expression_parsing import WindowKind from narwhals._expression_parsing import apply_n_ary_operation from narwhals._expression_parsing import check_expressions_preserve_length from narwhals._expression_parsing import combine_metadata @@ -1184,8 +1185,7 @@ def func(plx: Any) -> Any: func, ExprMetadata( ExprKind.AGGREGATION, - n_open_windows=0, - has_closed_windows=False, + window_kind=WindowKind.NONE, expansion_kind=ExpansionKind.SINGLE, ), ) @@ -1659,8 +1659,7 @@ def lit(value: Any, dtype: DType | type[DType] | None = None) -> Expr: lambda plx: plx.lit(value, dtype), ExprMetadata( ExprKind.LITERAL, - n_open_windows=0, - has_closed_windows=False, + window_kind=WindowKind.NONE, expansion_kind=ExpansionKind.SINGLE, ), ) diff --git a/tests/expression_parsing_test.py b/tests/expression_parsing_test.py index 49ae78631b..b5dd408324 100644 --- a/tests/expression_parsing_test.py +++ b/tests/expression_parsing_test.py @@ -3,29 +3,46 @@ import pytest import narwhals as nw +from narwhals._expression_parsing import WindowKind from narwhals.exceptions import InvalidOperationError @pytest.mark.parametrize( - ("expr", "expected_open", "expected_closed"), + ("expr", "expected"), [ - (nw.col("a"), 0, False), - (nw.col("a").mean(), 0, False), - (nw.col("a").cum_sum(), 1, False), - (nw.col("a").cum_sum().over(order_by="id"), 0, True), - (nw.col("a").cum_sum().abs().over(order_by="id"), 1, True), - ((nw.col("a").cum_sum() + 1).over(order_by="id"), 1, True), - (nw.col("a").cum_sum().cum_sum().over(order_by="id"), 1, True), - (nw.col("a").cum_sum().cum_sum(), 2, False), - (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()), 1, False), - (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()).over("a"), 1, True), + (nw.col("a"), WindowKind.NONE), + (nw.col("a").mean(), WindowKind.NONE), + (nw.col("a").cum_sum(), WindowKind.CLOSEABLE), + (nw.col("a").cum_sum().over(order_by="id"), WindowKind.CLOSED), + (nw.col("a").cum_sum().abs().over(order_by="id"), WindowKind.UNCLOSEABLE), + ((nw.col("a").cum_sum() + 1).over(order_by="id"), WindowKind.UNCLOSEABLE), + (nw.col("a").cum_sum().cum_sum().over(order_by="id"), WindowKind.UNCLOSEABLE), + (nw.col("a").cum_sum().cum_sum(), WindowKind.UNCLOSEABLE), + (nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()), WindowKind.UNCLOSEABLE), + ( + nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum()).over("a"), + WindowKind.UNCLOSEABLE, + ), + ( + nw.sum_horizontal(nw.col("a"), nw.col("a").cum_sum().over(order_by="i")), + WindowKind.CLOSED, + ), + ( + nw.sum_horizontal( + nw.col("a").diff(), nw.col("a").cum_sum().over(order_by="i") + ), + WindowKind.UNCLOSEABLE, + ), + ( + nw.sum_horizontal(nw.col("a").diff(), nw.col("a").cum_sum()).over( + order_by="i" + ), + WindowKind.UNCLOSEABLE, + ), ], ) -def test_has_open_windows( - expr: nw.Expr, expected_open: int, *, expected_closed: bool -) -> None: - assert expr._metadata.n_open_windows == expected_open - assert expr._metadata.has_closed_windows is expected_closed +def test_window_kind(expr: nw.Expr, expected: WindowKind) -> None: + assert expr._metadata.window_kind is expected def test_misleading_order_by() -> None: From 6a138c6819169ac79c58677a288b272126c92dbb Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Mon, 7 Apr 2025 11:42:38 +0100 Subject: [PATCH 10/14] coverage --- narwhals/_expression_parsing.py | 7 +++---- tests/expression_parsing_test.py | 6 ++++++ 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index abf339fe72..a31f3121fe 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -15,7 +15,6 @@ from narwhals.dependencies import is_narwhals_series from narwhals.dependencies import is_numpy_array -from narwhals.exceptions import InvalidOperationError from narwhals.exceptions import LengthChangingExprError from narwhals.exceptions import MultiOutputExpressionError from narwhals.exceptions import ShapeError @@ -272,9 +271,9 @@ def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: """Change metadata kind and increment `n_open_windows`.""" if self._window_kind is WindowKind.NONE: window_kind = WindowKind.CLOSEABLE - elif self._window_kind is WindowKind.CLOSED: - msg = "Cannot chain `over` expressions." - raise InvalidOperationError(msg) + elif self._window_kind is WindowKind.CLOSED: # pragma: no cover + msg = "Unreachable code, please report a bug." + raise AssertionError(msg) else: window_kind = WindowKind.UNCLOSEABLE return ExprMetadata( diff --git a/tests/expression_parsing_test.py b/tests/expression_parsing_test.py index b5dd408324..3644fbf7f7 100644 --- a/tests/expression_parsing_test.py +++ b/tests/expression_parsing_test.py @@ -39,6 +39,12 @@ ), WindowKind.UNCLOSEABLE, ), + ( + nw.sum_horizontal(nw.col("a").diff().abs(), nw.col("a").cum_sum()).over( + order_by="i" + ), + WindowKind.UNCLOSEABLE, + ), ], ) def test_window_kind(expr: nw.Expr, expected: WindowKind) -> None: From bc87e6b8a27c0130046ccae1e5906d729800498a Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Mon, 7 Apr 2025 20:14:09 +0100 Subject: [PATCH 11/14] update outdated "open window" refs --- narwhals/_expression_parsing.py | 16 +++++++++++----- narwhals/expr.py | 32 ++++++++++++++++---------------- narwhals/stable/v1/__init__.py | 10 +++++----- 3 files changed, 32 insertions(+), 26 deletions(-) diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index a31f3121fe..22175d27fb 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -259,16 +259,22 @@ def with_kind(self, kind: ExprKind, /) -> ExprMetadata: expansion_kind=self._expansion_kind, ) - def with_extra_open_window(self) -> ExprMetadata: - """Increment `n_open_windows` leaving other attributes the same.""" + def with_uncloseable_window(self) -> ExprMetadata: + """Add uncloseable window, leaving other attributes the same.""" + if self._window_kind is WindowKind.CLOSED: # pragma: no cover + msg = "Unreachable code, please report a bug." + raise AssertionError(msg) return ExprMetadata( self.kind, - window_kind=self._window_kind, + window_kind=WindowKind.UNCLOSEABLE, expansion_kind=self._expansion_kind, ) - def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata: - """Change metadata kind and increment `n_open_windows`.""" + def with_kind_and_closeable_window(self, kind: ExprKind, /) -> ExprMetadata: + """Change metadata kind and add closeable window. + + If we already have an uncloseable window, the window stays uncloseable. + """ if self._window_kind is WindowKind.NONE: window_kind = WindowKind.CLOSEABLE elif self._window_kind is WindowKind.CLOSED: # pragma: no cover diff --git a/narwhals/expr.py b/narwhals/expr.py index f4f213b199..859219b690 100644 --- a/narwhals/expr.py +++ b/narwhals/expr.py @@ -90,7 +90,7 @@ def _with_order_dependent_aggregation( raise InvalidOperationError(msg) return self.__class__( to_compliant_expr, - self._metadata.with_kind_and_extra_open_window(ExprKind.AGGREGATION), + self._metadata.with_kind_and_closeable_window(ExprKind.AGGREGATION), ) def _with_filtration(self, to_compliant_expr: Callable[[Any], Any]) -> Self: @@ -682,7 +682,7 @@ def map_batches( function=function, return_dtype=return_dtype ), # safest assumptions - self._metadata.with_kind_and_extra_open_window(ExprKind.FILTRATION), + self._metadata.with_kind_and_closeable_window(ExprKind.FILTRATION), ) def skew(self: Self) -> Self: @@ -936,7 +936,7 @@ def cum_sum(self: Self, *, reverse: bool = False) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).cum_sum(reverse=reverse), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def diff(self: Self) -> Self: @@ -983,7 +983,7 @@ def diff(self: Self) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).diff(), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def shift(self: Self, n: int) -> Self: @@ -1033,7 +1033,7 @@ def shift(self: Self, n: int) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).shift(n), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def replace_strict( @@ -1123,7 +1123,7 @@ def sort(self: Self, *, descending: bool = False, nulls_last: bool = False) -> S lambda plx: self._to_compliant_expr(plx).sort( descending=descending, nulls_last=nulls_last ), - self._metadata.with_extra_open_window(), + self._metadata.with_uncloseable_window(), ) # --- transform --- @@ -1720,7 +1720,7 @@ def is_first_distinct(self: Self) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).is_first_distinct(), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def is_last_distinct(self: Self) -> Self: @@ -1753,7 +1753,7 @@ def is_last_distinct(self: Self) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).is_last_distinct(), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def quantile( @@ -2085,7 +2085,7 @@ def cum_count(self: Self, *, reverse: bool = False) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).cum_count(reverse=reverse), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def cum_min(self: Self, *, reverse: bool = False) -> Self: @@ -2122,7 +2122,7 @@ def cum_min(self: Self, *, reverse: bool = False) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).cum_min(reverse=reverse), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def cum_max(self: Self, *, reverse: bool = False) -> Self: @@ -2159,7 +2159,7 @@ def cum_max(self: Self, *, reverse: bool = False) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).cum_max(reverse=reverse), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def cum_prod(self: Self, *, reverse: bool = False) -> Self: @@ -2196,7 +2196,7 @@ def cum_prod(self: Self, *, reverse: bool = False) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).cum_prod(reverse=reverse), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def rolling_sum( @@ -2262,7 +2262,7 @@ def rolling_sum( min_samples=min_samples_int, center=center, ), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def rolling_mean( @@ -2328,7 +2328,7 @@ def rolling_mean( min_samples=min_samples, center=center, ), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def rolling_var( @@ -2394,7 +2394,7 @@ def rolling_var( lambda plx: self._to_compliant_expr(plx).rolling_var( window_size=window_size, min_samples=min_samples, center=center, ddof=ddof ), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def rolling_std( @@ -2463,7 +2463,7 @@ def rolling_std( center=center, ddof=ddof, ), - self._metadata.with_kind_and_extra_open_window(ExprKind.WINDOW), + self._metadata.with_kind_and_closeable_window(ExprKind.WINDOW), ) def rank( diff --git a/narwhals/stable/v1/__init__.py b/narwhals/stable/v1/__init__.py index 79712278f4..7e053646d5 100644 --- a/narwhals/stable/v1/__init__.py +++ b/narwhals/stable/v1/__init__.py @@ -995,7 +995,7 @@ def head(self: Self, n: int = 10) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).head(n), - self._metadata.with_kind_and_extra_open_window(ExprKind.FILTRATION), + self._metadata.with_kind_and_closeable_window(ExprKind.FILTRATION), ) def tail(self: Self, n: int = 10) -> Self: @@ -1009,7 +1009,7 @@ def tail(self: Self, n: int = 10) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).tail(n), - self._metadata.with_kind_and_extra_open_window(ExprKind.FILTRATION), + self._metadata.with_kind_and_closeable_window(ExprKind.FILTRATION), ) def gather_every(self: Self, n: int, offset: int = 0) -> Self: @@ -1024,7 +1024,7 @@ def gather_every(self: Self, n: int, offset: int = 0) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).gather_every(n=n, offset=offset), - self._metadata.with_kind_and_extra_open_window(ExprKind.FILTRATION), + self._metadata.with_kind_and_closeable_window(ExprKind.FILTRATION), ) def unique(self: Self, *, maintain_order: bool | None = None) -> Self: @@ -1063,7 +1063,7 @@ def sort(self: Self, *, descending: bool = False, nulls_last: bool = False) -> S lambda plx: self._to_compliant_expr(plx).sort( descending=descending, nulls_last=nulls_last ), - self._metadata.with_extra_open_window(), + self._metadata.with_uncloseable_window(), ) def arg_true(self: Self) -> Self: @@ -1074,7 +1074,7 @@ def arg_true(self: Self) -> Self: """ return self.__class__( lambda plx: self._to_compliant_expr(plx).arg_true(), - self._metadata.with_kind_and_extra_open_window(ExprKind.FILTRATION), + self._metadata.with_kind_and_closeable_window(ExprKind.FILTRATION), ) def sample( From 770490a50dd06eda26ba3585ddbaeac76a3e8d15 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Mon, 7 Apr 2025 20:15:42 +0100 Subject: [PATCH 12/14] update outdated "open window" refs --- narwhals/expr.py | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/narwhals/expr.py b/narwhals/expr.py index 859219b690..eb3d5d76c2 100644 --- a/narwhals/expr.py +++ b/narwhals/expr.py @@ -1589,19 +1589,20 @@ def over( msg = "Nested `over` statements are not allowed." raise InvalidOperationError(msg) if flat_order_by is not None and self._metadata.kind.is_window(): - # debug assertion, `n_open_windows` should already have been incremented + # debug assertion, an open window should already have been set # by the window function. If it's immediately followed by `over`, then the - # window gets closed, so we decrement `n_open_windows`. + # window gets closed. assert window_kind.is_open() # noqa: S101 elif flat_order_by is not None and not window_kind.is_open(): msg = "Cannot use `order_by` in `over` on expression which isn't order-dependent." raise InvalidOperationError(msg) current_meta = self._metadata + next_window_kind = ( + WindowKind.UNCLOSEABLE if window_kind.is_uncloseable() else WindowKind.CLOSED + ) next_meta = ExprMetadata( kind, - window_kind=WindowKind.UNCLOSEABLE - if window_kind.is_uncloseable() - else WindowKind.CLOSED, + window_kind=next_window_kind, expansion_kind=current_meta.expansion_kind, ) From f3bf2b909210009cbf4b89d4b11b29c9ea3364fa Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Tue, 8 Apr 2025 17:08:34 +0100 Subject: [PATCH 13/14] document effect of UNCLOSEABLE window --- narwhals/_expression_parsing.py | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index 22175d27fb..a7c58a6c5f 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -202,7 +202,10 @@ class WindowKind(Enum): UNCLOSEABLE = auto() """e.g. `nw.col('a').cum_sum().abs()` - the window function (`cum_sum`) wasn't immediately followed by - `over(order_by=...)`, and so the window is uncloseable.""" + `over(order_by=...)`, and so the window is uncloseable. + + Uncloseable windows can be used freely in `nw.DataFrame`, but not in `nw.LazyFrame` where + row-order is undefined.""" CLOSED = auto() """e.g. `nw.col('a').cum_sum().over(order_by='i')`.""" From 5481f9a29604fd1de3ad8ca68524a0cc3bd63506 Mon Sep 17 00:00:00 2001 From: Marco Gorelli <33491632+MarcoGorelli@users.noreply.github.com> Date: Wed, 9 Apr 2025 16:07:56 +0100 Subject: [PATCH 14/14] typos (thanks Dan!) --- docs/how_it_works.md | 10 +++++----- narwhals/_expression_parsing.py | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/how_it_works.md b/docs/how_it_works.md index 008a6cd1c4..2030a2db3c 100644 --- a/docs/how_it_works.md +++ b/docs/how_it_works.md @@ -284,7 +284,7 @@ print(nw.col("a").mean()) print(nw.col("a").mean().over("b")) ``` -Note how they tell us something but their metadata. This section is all about +Note how they tell us something about their metadata. This section is all about making sense of what that all means, what the rules are, and what it enables. ### Expression kinds @@ -296,15 +296,15 @@ Each Narwhals expression can be of one of the following kinds: - `TRANSFORM`: expressions which don't change length (e.g. `nw.col('a').abs()`). - `WINDOW`: like `TRANSFORM`, but the last operation is a (row-order-dependent) window function (`rolling_*`, `cum_*`, `diff`, `shift`, `is_*_distinct`). - aggregate (e.g. `nw.col('a').drop_nulls()`). - `FILTRATION`: expressions which change length but don't + aggregate (e.g. `nw.col('a').drop_nulls()`). For example: - `nw.col('a')` is not order-dependent, so it's `TRANSFORM`. - `nw.col('a').abs()` is not order-dependent, so it's a `TRANSFORM`. - `nw.col('a').cum_sum()`'s last operation is `cum_sum`, so it's `WINDOW`. - - `nw.col('a').cum_sum()+1`'s last operation is `__add__`, and it preserves + - `nw.col('a').cum_sum() + 1`'s last operation is `__add__`, and it preserves the input dataframe's length, so it's a `TRANSFORM`. How these change depends on the operation. @@ -325,7 +325,7 @@ This depends on `expr_method`. - `drop_nulls` and `filter` result in `FILTRATION`, and can only be applied to `TRANSFORM` and `WINDOW`. - `over` always results in `TRANSFORM`. This is a bit more complicated, - so we elaborate on it in the "You open a window" section below. + so we elaborate on it in the ["You open a window ..."](#you-open-a-window-to-another-window-to-another-window-to-another-window). #### Binary operations (e.g. `nw.col('a') + nw.col('b')`) @@ -351,7 +351,7 @@ is `LITERAL` if all of `expr1`, `expr2`, and `expr3` are. ### "You open a window to another window to another window to another window" When we print out an expression, in addition to the expression kind, -we also see `window_kind`. These are four window kinds: +we also see `window_kind`. There are four window kinds: - `NONE`: non-order-dependent operations, like `.abs()` or `.mean()`. - `CLOSEABLE`: expression where the last operation is order-dependent. For diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py index a7c58a6c5f..168c125b06 100644 --- a/narwhals/_expression_parsing.py +++ b/narwhals/_expression_parsing.py @@ -121,7 +121,7 @@ class ExprKind(Enum): - LITERAL vs LITERAL -> LITERAL - FILTRATION vs (LITERAL | AGGREGATION) -> FILTRATION - FILTRATION vs (FILTRATION | TRANSFORM | WINDOW) -> raise - - (TRANSFORM | WINDOW) vs (anything) -> TRANSFORM + - (TRANSFORM | WINDOW) vs (...) -> TRANSFORM - AGGREGATION vs (LITERAL | AGGREGATION) -> AGGREGATION """