narwhals-dev · MarcoGorelli · Apr 9, 2025 · Apr 5, 2025 · Apr 6, 2025 · Apr 6, 2025
diff --git a/docs/how_it_works.md b/docs/how_it_works.md
@@ -272,7 +272,104 @@ print((pn.col("a") + 1).mean())
 For simple aggregations, Narwhals can just look at `_depth` and `function_name` and figure out
 which (efficient) elementary operation this corresponds to in pandas.
 
-## Broadcasting
+## Expression Metadata
+
+Let's try printing out a few expressions to the console to see what they show us:
+
+```python exec="1" result="python" session="metadata" source="above"
+import narwhals as nw
+
+print(nw.col("a"))
+print(nw.col("a").mean())
+print(nw.col("a").mean().over("b"))
+```
+
+Note how each printed expression reports its metadata. This section explains
+what that metadata means, which rules govern it, and what it enables.
+
+### Expression kinds
+
+Each Narwhals expression can be of one of the following kinds:
+
+- `LITERAL`: expressions which correspond to literal values, such as the `3` in `nw.col('a')+3`.
+- `AGGREGATION`: expressions which reduce a column to a single value (e.g. `nw.col('a').mean()`).
+- `TRANSFORM`: expressions which don't change length (e.g. `nw.col('a').abs()`).
+- `WINDOW`: like `TRANSFORM`, but the last operation is a (row-order-dependent)
+   window function (`rolling_*`, `cum_*`, `diff`, `shift`, `ewm_mean`, `is_*_distinct`).
+- `FILTRATION`: expressions which change length but don't
+   aggregate (e.g. `nw.col('a').drop_nulls()`).
+
+For example:
+
+  - `nw.col('a')` is not order-dependent, so it's `TRANSFORM`.
+  - `nw.col('a').abs()` is not order-dependent, so it's a `TRANSFORM`.
+  - `nw.col('a').cum_sum()`'s last operation is `cum_sum`, so it's `WINDOW`.
+  - `nw.col('a').cum_sum() + 1`'s last operation is `__add__`, and it preserves
+     the input dataframe's length, so it's a `TRANSFORM`.
+
+How an expression's kind changes depends on the operation applied to it.
+
+#### Chaining
+
+Say we have `expr.expr_method()`. How does `expr`'s `ExprMetadata` change?
+This depends on `expr_method`.
+
+- Element-wise expressions such as `abs`, `alias`, `cast`, `__invert__`, and
+  many more, preserve the input kind, unless `expr` is a `WINDOW`, in which
+  case it becomes a `TRANSFORM`. This is because, for an expression to be
+  `WINDOW`, its last operation needs to be the order-dependent one.
+- `rolling_*`, `cum_*`, `diff`, `shift`, `ewm_mean`, and `is_*_distinct`
+  are window functions and result in `WINDOW`.
+- `mean`, `std`, `median`, and other aggregations result in `AGGREGATION`,
+  and can only be applied to `TRANSFORM` and `WINDOW`.
+- `drop_nulls` and `filter` result in `FILTRATION`, and can only be applied
+  to `TRANSFORM` and `WINDOW`.
+- `over` always results in `TRANSFORM`. This is a bit more complicated,
+  so we elaborate on it in the ["You open a window ..."](#you-open-a-window-to-another-window-to-another-window-to-another-window) section.
+
+#### Binary operations (e.g. `nw.col('a') + nw.col('b')`)
+
+How do expression kinds change under binary operations? For example,
+if we do `expr1 + expr2`, then what can we say about the output kind?
+The rules are:
+
+- If both are `LITERAL`, then the output is `LITERAL`.
+- If one is a `FILTRATION`, then:
+
+    - if the other is `LITERAL` or `AGGREGATION`, then the output is `FILTRATION`.
+    - else, we raise an error.
+
+- If one is `TRANSFORM` or `WINDOW` and the other is not `FILTRATION`,
+  then the output is `TRANSFORM`.
+- If one is `AGGREGATION` and the other is `LITERAL` or `AGGREGATION`,
+  the output is `AGGREGATION`.
+
+For n-ary operations such as `nw.sum_horizontal`, the above logic is
+extended across inputs. For example, `nw.sum_horizontal(expr1, expr2, expr3)`
+is `LITERAL` if all of `expr1`, `expr2`, and `expr3` are.
+
+### "You open a window to another window to another window to another window"
+
+When we print out an expression, in addition to the expression kind,
+we also see `window_kind`. There are four window kinds:
+
+- `NONE`: non-order-dependent operations, like `.abs()` or `.mean()`.
+- `CLOSEABLE`: expression where the last operation is order-dependent. For
+  example, `nw.col('a').diff()`.
+- `UNCLOSEABLE`: expression where some operation is order-dependent but
+  the order-dependent operation wasn't the last one. For example,
+  `nw.col('a').diff().abs()`.
+- `CLOSED`: expression contains `over` at some point, and any order-dependent
+  operation was immediately followed by `over(order_by=...)`.
+
+When working with `DataFrame`s, row order is well-defined, as the dataframes
+are assumed to be eager and in-memory. Therefore, all window kinds are
+allowed.
+
+When working with `LazyFrame`s, on the other hand, row order is undefined.
+Therefore, window kinds must either be `NONE` or `CLOSED`.
+
+### Broadcasting
 
 When performing comparisons between columns and aggregations or scalars, we operate as if the
 aggregation or scalar was broadcasted to the length of the whole column. For example, if we
@@ -282,14 +379,7 @@ with values `[-1, 0, 1]`.
 
 Different libraries do broadcasting differently. SQL-like libraries require an empty window
 function for expressions (e.g. `a - sum(a) over ()`), Polars does its own broadcasting of
-length-1 Series, and pandas does its own broadcasting of scalars. Narwhals keeps track of
-when to trigger a broadcast by tracking the `ExprKind` of each expression. `ExprKind` is an
-`Enum` with four variants:
-
-- `TRANSFORM`: expressions which don't change length (e.g. `nw.col('a').abs()`).
-- `AGGREGATION`: expressions which reduce a column to a single value (e.g. `nw.col('a').mean()`).
-- `CHANGE_LENGTH`: expressions which change length but don't necessarily aggregate (e.g. `nw.col('a').drop_nulls()`).
-- `LITERAL`: expressions which correspond to literal values, such as the `3` in `nw.col('a')+3`.
+length-1 Series, and pandas does its own broadcasting of scalars.
 
 Narwhals triggers a broadcast in these situations:
 

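The binary-operation rules documented above can be sketched as a small standalone function. This is a minimal illustration and not the code in `narwhals/_expression_parsing.py`: the `ExprKind` enum here is a local stand-in, and `combine_kinds` is a hypothetical helper name chosen for this example.

```python
from enum import Enum, auto


class ExprKind(Enum):
    # Local stand-in for the enum described in the docs above.
    LITERAL = auto()
    AGGREGATION = auto()
    TRANSFORM = auto()
    WINDOW = auto()
    FILTRATION = auto()


def combine_kinds(*kinds: ExprKind) -> ExprKind:
    """Hypothetical helper: output kind of a binary or n-ary operation."""
    if not kinds:
        raise ValueError("expected at least one kind")
    present = set(kinds)
    length_preserving = {ExprKind.TRANSFORM, ExprKind.WINDOW}
    if ExprKind.FILTRATION in present:
        # FILTRATION only combines with LITERAL or AGGREGATION.
        if kinds.count(ExprKind.FILTRATION) > 1 or present & length_preserving:
            raise ValueError("FILTRATION cannot be combined with these kinds")
        return ExprKind.FILTRATION
    if present & length_preserving:
        # TRANSFORM or WINDOW with anything except FILTRATION gives TRANSFORM.
        return ExprKind.TRANSFORM
    if ExprKind.AGGREGATION in present:
        # Only AGGREGATION and LITERAL remain at this point.
        return ExprKind.AGGREGATION
    return ExprKind.LITERAL
```

Under these assumptions, `combine_kinds(ExprKind.WINDOW, ExprKind.AGGREGATION)` yields `TRANSFORM`, while `combine_kinds(ExprKind.FILTRATION, ExprKind.TRANSFORM)` raises.
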
diff --git a/narwhals/_expression_parsing.py b/narwhals/_expression_parsing.py
@@ -121,7 +121,7 @@ class ExprKind(Enum):
     - LITERAL vs LITERAL -> LITERAL
     - FILTRATION vs (LITERAL | AGGREGATION) -> FILTRATION
     - FILTRATION vs (FILTRATION | TRANSFORM | WINDOW) -> raise
-    - (TRANSFORM | WINDOW) vs (LITERAL | AGGREGATION) -> TRANSFORM
+    - (TRANSFORM | WINDOW) vs (LITERAL | AGGREGATION | TRANSFORM | WINDOW) -> TRANSFORM
     - AGGREGATION vs (LITERAL | AGGREGATION) -> AGGREGATION
     """
 
@@ -191,30 +191,64 @@ def is_multi_output(
     return expansion_kind in {ExpansionKind.MULTI_NAMED, ExpansionKind.MULTI_UNNAMED}
 
 
+class WindowKind(Enum):
+    """Describe what kind of window the expression contains."""
+
+    NONE = auto()
+    """e.g. `nw.col('a').abs()`, no windows."""
+
+    CLOSEABLE = auto()
+    """e.g. `nw.col('a').cum_sum()` - can be closed if immediately followed by `over(order_by=...)`."""
+
+    UNCLOSEABLE = auto()
+    """e.g. `nw.col('a').cum_sum().abs()` - the window function (`cum_sum`) wasn't immediately followed by
+    `over(order_by=...)`, and so the window is uncloseable.
+
+    Uncloseable windows can be used freely in `nw.DataFrame`, but not in `nw.LazyFrame` where
+    row-order is undefined."""
+
+    CLOSED = auto()
+    """e.g. `nw.col('a').cum_sum().over(order_by='i')`."""
+
+    def is_open(self) -> bool:
+        return self in {WindowKind.UNCLOSEABLE, WindowKind.CLOSEABLE}
+
+    def is_closed(self) -> bool:
+        return self is WindowKind.CLOSED
+
+    def is_uncloseable(self) -> bool:
+        return self is WindowKind.UNCLOSEABLE
+
+
 class ExprMetadata:
-    __slots__ = ("_expansion_kind", "_kind", "_n_open_windows")
+    __slots__ = ("_expansion_kind", "_kind", "_window_kind")
 
     def __init__(
-        self, kind: ExprKind, /, *, n_open_windows: int, expansion_kind: ExpansionKind
+        self,
+        kind: ExprKind,
+        /,
+        *,
+        window_kind: WindowKind,
+        expansion_kind: ExpansionKind,
     ) -> None:
         self._kind: ExprKind = kind
-        self._n_open_windows = n_open_windows
+        self._window_kind = window_kind
         self._expansion_kind = expansion_kind
 
     def __init_subclass__(cls, /, *args: Any, **kwds: Any) -> Never:  # pragma: no cover
         msg = f"Cannot subclass {cls.__name__!r}"
         raise TypeError(msg)
 
     def __repr__(self) -> str:
-        return f"ExprMetadata(kind: {self._kind}, n_open_windows: {self._n_open_windows}, expansion_kind: {self._expansion_kind})"
+        return f"ExprMetadata(kind: {self._kind}, window_kind: {self._window_kind}, expansion_kind: {self._expansion_kind})"
 
     @property
     def kind(self) -> ExprKind:
         return self._kind
 
     @property
-    def n_open_windows(self) -> int:
-        return self._n_open_windows
+    def window_kind(self) -> WindowKind:
+        return self._window_kind
 
     @property
     def expansion_kind(self) -> ExpansionKind:
@@ -223,50 +257,77 @@ def expansion_kind(self) -> ExpansionKind:
     def with_kind(self, kind: ExprKind, /) -> ExprMetadata:
         """Change metadata kind, leaving all other attributes the same."""
         return ExprMetadata(
-            kind, n_open_windows=self._n_open_windows, expansion_kind=self._expansion_kind
+            kind,
+            window_kind=self._window_kind,
+            expansion_kind=self._expansion_kind,
         )
 
-    def with_extra_open_window(self) -> ExprMetadata:
-        """Increment `n_open_windows` leaving other attributes the same."""
+    def with_uncloseable_window(self) -> ExprMetadata:
+        """Add uncloseable window, leaving other attributes the same."""
+        if self._window_kind is WindowKind.CLOSED:  # pragma: no cover
+            msg = "Unreachable code, please report a bug."
+            raise AssertionError(msg)
         return ExprMetadata(
             self.kind,
-            n_open_windows=self._n_open_windows + 1,
+            window_kind=WindowKind.UNCLOSEABLE,
+            expansion_kind=self._expansion_kind,
+        )
+
+    def with_kind_and_closeable_window(self, kind: ExprKind, /) -> ExprMetadata:
+        """Change metadata kind and add closeable window.
+
+        If a window is already open (closeable or not), the result is uncloseable.
+        """
+        if self._window_kind is WindowKind.NONE:
+            window_kind = WindowKind.CLOSEABLE
+        elif self._window_kind is WindowKind.CLOSED:  # pragma: no cover
+            msg = "Unreachable code, please report a bug."
+            raise AssertionError(msg)
+        else:
+            window_kind = WindowKind.UNCLOSEABLE
+        return ExprMetadata(
+            kind,
+            window_kind=window_kind,
             expansion_kind=self._expansion_kind,
         )
 
-    def with_kind_and_extra_open_window(self, kind: ExprKind, /) -> ExprMetadata:
-        """Change metadata kind and increment `n_open_windows`."""
+    def with_kind_and_uncloseable_window(self, kind: ExprKind, /) -> ExprMetadata:
+        """Change metadata kind and set window kind to uncloseable."""
         return ExprMetadata(
             kind,
-            n_open_windows=self._n_open_windows + 1,
+            window_kind=WindowKind.UNCLOSEABLE,
             expansion_kind=self._expansion_kind,
         )
 
     @staticmethod
-    def simple_selector() -> ExprMetadata:
+    def selector_single() -> ExprMetadata:
         # e.g. `nw.col('a')`, `nw.nth(0)`
         return ExprMetadata(
-            ExprKind.TRANSFORM, n_open_windows=0, expansion_kind=ExpansionKind.SINGLE
+            ExprKind.TRANSFORM,
+            window_kind=WindowKind.NONE,
+            expansion_kind=ExpansionKind.SINGLE,
         )
 
     @staticmethod
-    def multi_output_selector_named() -> ExprMetadata:
+    def selector_multi_named() -> ExprMetadata:
         # e.g. `nw.col('a', 'b')`
         return ExprMetadata(
-            ExprKind.TRANSFORM, n_open_windows=0, expansion_kind=ExpansionKind.MULTI_NAMED
+            ExprKind.TRANSFORM,
+            window_kind=WindowKind.NONE,
+            expansion_kind=ExpansionKind.MULTI_NAMED,
         )
 
     @staticmethod
-    def multi_output_selector_unnamed() -> ExprMetadata:
+    def selector_multi_unnamed() -> ExprMetadata:
         # e.g. `nw.all()`
         return ExprMetadata(
             ExprKind.TRANSFORM,
-            n_open_windows=0,
+            window_kind=WindowKind.NONE,
             expansion_kind=ExpansionKind.MULTI_UNNAMED,
         )
 
 
-def combine_metadata(
+def combine_metadata(  # noqa: PLR0915
     *args: IntoExpr | object | None,
     str_as_lit: bool,
     allow_multi_output: bool,
@@ -285,8 +346,10 @@ def combine_metadata(
     has_transforms_or_windows = False
     has_aggregations = False
     has_literals = False
-    result_n_open_windows = 0
     result_expansion_kind = ExpansionKind.SINGLE
+    has_closeable_windows = False
+    has_uncloseable_windows = False
+    has_closed_windows = False
 
     for i, arg in enumerate(args):
         if isinstance(arg, str) and not str_as_lit:
@@ -307,8 +370,6 @@ def combine_metadata(
                         result_expansion_kind = resolve_expansion_kind(
                             result_expansion_kind, arg._metadata.expansion_kind
                         )
-            if arg._metadata.n_open_windows:
-                result_n_open_windows += 1
             kind = arg._metadata.kind
             if kind is ExprKind.AGGREGATION:
                 has_aggregations = True
@@ -322,6 +383,14 @@ def combine_metadata(
                 msg = "unreachable code"
                 raise AssertionError(msg)
 
+            window_kind = arg._metadata.window_kind
+            if window_kind is WindowKind.UNCLOSEABLE:
+                has_uncloseable_windows = True
+            elif window_kind is WindowKind.CLOSEABLE:
+                has_closeable_windows = True
+            elif window_kind is WindowKind.CLOSED:
+                has_closed_windows = True
+
     if (
         has_literals
         and not has_aggregations
@@ -342,10 +411,15 @@ def combine_metadata(
     else:
         result_kind = ExprKind.AGGREGATION
 
+    if has_uncloseable_windows or has_closeable_windows:
+        result_window_kind = WindowKind.UNCLOSEABLE
+    elif has_closed_windows:
+        result_window_kind = WindowKind.CLOSED
+    else:
+        result_window_kind = WindowKind.NONE
+
     return ExprMetadata(
-        result_kind,
-        n_open_windows=result_n_open_windows,
-        expansion_kind=result_expansion_kind,
+        result_kind, window_kind=result_window_kind, expansion_kind=result_expansion_kind
     )
 
 

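The window-kind transitions in the diff above can be condensed into a standalone sketch. It mirrors the logic of `with_kind_and_closeable_window`, the window-kind resolution at the end of `combine_metadata`, and the `is_open()` check used for `LazyFrame`. The function names `after_window_function`, `combine_window_kinds`, and `allowed_in_lazyframe` are hypothetical and exist only for this illustration. How `over(order_by=...)` moves a `CLOSEABLE` window to `CLOSED` is not shown in this diff, so it is left out here.

```python
from enum import Enum, auto


class WindowKind(Enum):
    # Local copy of the four variants introduced in the diff.
    NONE = auto()
    CLOSEABLE = auto()
    UNCLOSEABLE = auto()
    CLOSED = auto()

    def is_open(self) -> bool:
        return self in {WindowKind.CLOSEABLE, WindowKind.UNCLOSEABLE}


def after_window_function(current: WindowKind) -> WindowKind:
    """Window kind after applying e.g. `cum_sum` (hypothetical helper)."""
    if current is WindowKind.NONE:
        return WindowKind.CLOSEABLE
    if current is WindowKind.CLOSED:
        # The diff treats this branch as unreachable.
        raise AssertionError("Unreachable code, please report a bug.")
    # A window was already open, so it can no longer be closed.
    return WindowKind.UNCLOSEABLE


def combine_window_kinds(*kinds: WindowKind) -> WindowKind:
    """Window kind of a combined expression (hypothetical helper)."""
    if any(kind.is_open() for kind in kinds):
        # Any open input makes the combined result uncloseable.
        return WindowKind.UNCLOSEABLE
    if WindowKind.CLOSED in kinds:
        return WindowKind.CLOSED
    return WindowKind.NONE


def allowed_in_lazyframe(kind: WindowKind) -> bool:
    """LazyFrame accepts only `NONE` or `CLOSED` windows."""
    return not kind.is_open()
```

Note that combining a `CLOSEABLE` input with anything yields `UNCLOSEABLE`: once the window function is no longer the last operation, `over(order_by=...)` cannot immediately follow it.
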
diff --git a/narwhals/dataframe.py b/narwhals/dataframe.py
@@ -2152,7 +2152,7 @@ def _extract_compliant(self: Self, arg: Any) -> Any:
             plx = self.__narwhals_namespace__()
             return plx.col(arg)
         if isinstance(arg, Expr):
-            if arg._metadata.n_open_windows > 0:
+            if arg._metadata.window_kind.is_open():
                 msg = (
                     "Order-dependent expressions are not supported for use in LazyFrame.\n\n"
                     "Hints:\n"