Skip to content

feat: unify exception for nested over statements, nested aggregations, filtrations on aggregations, document expression metadata better#2351

Merged
MarcoGorelli merged 16 commits intonarwhals-dev:mainfrom
MarcoGorelli:no-double-over
Apr 9, 2025
Merged

feat: unify exception for nested over statements, nested aggregations, filtrations on aggregations, document expression metadata better#2351
MarcoGorelli merged 16 commits intonarwhals-dev:mainfrom
MarcoGorelli:no-double-over

Conversation

@MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Apr 5, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@MarcoGorelli MarcoGorelli marked this pull request as ready for review April 5, 2025 21:39
@MarcoGorelli MarcoGorelli added documentation Improvements or additions to documentation enhancement New feature or request error reporting labels Apr 6, 2025
expr_methods = [
i
for i in nw.Expr(lambda: 0, ExprMetadata.simple_selector()).__dir__()
for i in nw.Expr(lambda: 0, ExprMetadata.selector_single()).__dir__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the lambda: 0, part is for here - but pyright is shouting at all 4 of these

Argument of type "() -> Literal[0]" cannot be assigned to parameter "to_compliant_expr" of type "_ToCompliant" in function "__init__"
  Type "() -> Literal[0]" is not assignable to type "_ToCompliant"
    Function accepts too many positional parameters; expected 0 but received 1
      Function return type "Literal[0]" is incompatible with type "CompliantExpr[Any, Any]"
        "Literal[0]" is incompatible with protocol "CompliantExpr[Any, Any]"
          "_implementation" is not present
          "_backend_version" is not present
          "_version" is not present
          "_evaluate_output_names" is not present
  ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just to get the list of methods which exist of Expr, it's an internal script

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Apr 7, 2025

This can be improved, marking as draft for now done

@MarcoGorelli MarcoGorelli marked this pull request as draft April 7, 2025 07:46
@MarcoGorelli MarcoGorelli marked this pull request as ready for review April 7, 2025 10:55
@MarcoGorelli MarcoGorelli marked this pull request as draft April 7, 2025 21:59
@dangotbanned
Copy link
Member

@MarcoGorelli @FBruzzesi

Is there overlap with the rules defined here and what we plan to allow in #2325?

I was thinking about (#2225 (comment)) again and just wanted to double-check things are aligned (if they need to be)

@MarcoGorelli
Copy link
Member Author

Is there overlap with the rules defined here and what we plan to allow in #2325?

somewhat - in #2325, we should only allow TRANSFORM expressions where window_kind is NONE as arguments to group_by

@MarcoGorelli MarcoGorelli marked this pull request as ready for review April 8, 2025 16:41
@dangotbanned dangotbanned self-requested a review April 8, 2025 17:10
Copy link
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great stuff @MarcoGorelli 🎉

Left a few notes/suggestions - also maybe consider renaming the PR to something shorter 😉

print(nw.col("a").mean().over("b"))
```

Note how they tell us something but their metadata. This section is all about
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note how they tell us something but their metadata

Is this a typo?

Comment on lines 299 to 300
aggregate (e.g. `nw.col('a').drop_nulls()`).
- `FILTRATION`: expressions which change length but don't
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
aggregate (e.g. `nw.col('a').drop_nulls()`).
- `FILTRATION`: expressions which change length but don't
- `FILTRATION`: expressions which change length but don't
aggregate (e.g. `nw.col('a').drop_nulls()`).

- `nw.col('a')` is not order-dependent, so it's `TRANSFORM`.
- `nw.col('a').abs()` is not order-dependent, so it's a `TRANSFORM`.
- `nw.col('a').cum_sum()`'s last operation is `cum_sum`, so it's `WINDOW`.
- `nw.col('a').cum_sum()+1`'s last operation is `__add__`, and it preserves
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `nw.col('a').cum_sum()+1`'s last operation is `__add__`, and it preserves
- `nw.col('a').cum_sum() + 1`'s last operation is `__add__`, and it preserves

Comment on lines +310 to +315
How these change depends on the operation.

#### Chaining

Say we have `expr.expr_method()`. How does `expr`'s `ExprMetadata` change?
This depends on `expr_method`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what to suggest, but some notes

  • Maybe the sentence before Chaining is redundant?
  • After Chaining I think there's a few too many expr/Expr 😵

- `drop_nulls` and `filter` result in `FILTRATION`, and can only be applied
to `TRANSFORM` and `WINDOW`.
- `over` always results in `TRANSFORM`. This is a bit more complicated,
so we elaborate on it in the "You open a window" section below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
so we elaborate on it in the "You open a window" section below.
so we elaborate on it in ["You open a window ..."](#you-open-a-window-to-another-window-to-another-window-to-another-window).

Comment on lines +330 to +349
#### Binary operations (e.g. `nw.col('a') + nw.col('b')`)

How do expression kinds change under binary operations? For example,
if we do `expr1 + expr2`, then what can we say about the output kind?
The rules are:

- If both are `LITERAL`, then the output is `LITERAL`.
- If one is a `FILTRATION`, then:

- if the other is `LITERAL` or `AGGREGATION`, then the output is `FILTRATION`.
- else, we raise an error.

- If one is `TRANSFORM` or `WINDOW` and the other is not `FILTRATION`,
then the output is `TRANSFORM`.
- If one is `AGGREGATION` and the other is `LITERAL` or `AGGREGATION`,
the output is `AGGREGATION`.

For n-ary operations such as `nw.sum_horizontal`, the above logic is
extended across inputs. For example, `nw.sum_horizontal(expr1, expr2, expr3)`
is `LITERAL` if all of `expr1`, `expr2`, and `expr3` are.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about it for this PR, but I feel like this could really benefit from some kind of visual aid.

Not sure what format would make the most sense - just wanted to note it as tough to keep track of in my head 😔

### "You open a window to another window to another window to another window"

When we print out an expression, in addition to the expression kind,
we also see `window_kind`. These are four window kinds:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
we also see `window_kind`. These are four window kinds:
we also see `window_kind`. There are four window kinds:

- FILTRATION vs (LITERAL | AGGREGATION) -> FILTRATION
- FILTRATION vs (FILTRATION | TRANSFORM | WINDOW) -> raise
- (TRANSFORM | WINDOW) vs (LITERAL | AGGREGATION) -> TRANSFORM
- (TRANSFORM | WINDOW) vs (anything) -> TRANSFORM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- (TRANSFORM | WINDOW) vs (anything) -> TRANSFORM
- (TRANSFORM | WINDOW) vs (...) -> TRANSFORM

@MarcoGorelli
Copy link
Member Author

thanks for your review! really appreciate it

@MarcoGorelli MarcoGorelli merged commit 16b4527 into narwhals-dev:main Apr 9, 2025
29 checks passed
dangotbanned added a commit that referenced this pull request Apr 23, 2025
These have been popping up and bugging me for 2 weeks 🙈
#2351 (comment)
MarcoGorelli pushed a commit that referenced this pull request Apr 26, 2025
* chore(typing): fix `check_docstrings` `Path` warning

`pyright` did a good job spotting that one

* chore: Use `dir(...)` instead of `.__dir__()`

Fixes 2/6 errors and shrinks the script

* chore(typing): Fix 2 more errors

These have been popping up and bugging me for 2 weeks 🙈
#2351 (comment)

* chore(typing): Fix the last 2 as well

* chore: Simplify the rest

* chore(typing): Ignore remaining in `utils`

* chore(typing): Ignore everything in `tpch`

These are all the same issue, can be fixed with a constrained `TypeVar`

* fix: pre-`3.13` compat?

Following where `__name__` used to be sourced from
- python/cpython#101876

* chore(typing): Fix some of the new `tpch` ignores

The rest seem pretty unavoidable

* fix: don't clobber `typing` 🤦‍♂️

https://github.com/narwhals-dev/narwhals/actions/runs/14624570842/job/41032838053?pr=2426
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request error reporting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants