
feat: add rank for Lazy backends#2310

Merged
MarcoGorelli merged 7 commits into narwhals-dev:main from raisadz:feat/rank_lazy_backends
Mar 29, 2025

Conversation

@raisadz
Contributor

@raisadz raisadz commented Mar 28, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Only min and dense methods are implemented for DuckDB and PySpark without .over().
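For reference, the semantics of the two implemented methods can be sketched in plain Python (a hypothetical helper for illustration, not the narwhals implementation):

```python
def rank(values, method="min"):
    """Rank non-null values, leaving nulls as None.

    'min'   : tied values all get the lowest position among the ties.
    'dense' : ranks increase by exactly 1 between distinct values.
    """
    non_null = [v for v in values if v is not None]
    distinct = sorted(set(non_null))
    if method == "dense":
        # dense: rank is 1 + number of distinct smaller values
        pos = {v: i + 1 for i, v in enumerate(distinct)}
    else:
        # min: rank is 1 + number of smaller values (ties included)
        srt = sorted(non_null)
        pos = {v: srt.index(v) + 1 for v in distinct}
    return [pos[v] if v is not None else None for v in values]

print(rank([3, 1, 1, None, 2], method="min"))    # [4, 1, 1, None, 3]
print(rank([3, 1, 1, None, 2], method="dense"))  # [3, 1, 1, None, 2]
```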

@raisadz raisadz marked this pull request as ready for review March 28, 2025 19:29
@FBruzzesi
Member

Hey @raisadz thanks for the PR. I didn't have the time to look at this, I just wanted to drop a comment and ping @marvinl803 who also took a look at rank and maybe can share some insights for the remaining methods 🙌🏼

Member

@MarcoGorelli MarcoGorelli left a comment


thanks for working on this 🙌 !

it's occurred to me that we class rank as ExprKind.WINDOW and that, after narwhals.stable.v1, it won't be supported to use rank if not followed by over(order_by=...)

Here it works because the import is import narwhals.stable.v1 as nw, but would raise for import narwhals as nw

🤔 gonna have to think about a solution here, not super-sure

maybe, just maybe, we may want to introduce nw.rank instead of Expr.rank, because then that would compose more naturally with over. And for eager backends, nw.col('a').rank() does the same thing as nw.rank().over('a')

🤔 i'm not totally sure, will do a bit more thinking on this one, but i think we can come up with something

@MarcoGorelli
Member

Having slept on this, I'd suggest:

nw.rank().over(order_by='a')  # equivalent to pl.col('a').rank()

An alternative could be nw.rank('a', partition_by='b'), but then what happens if a user writes nw.rank('a', partition_by='b').over('b')? I think in general, any expression that uses over internally is problematic

The only solution I can currently think of is: nw.rank().over(order_by='a'), and deprecate Expr.rank (but keep it around in stable.v1 - this is especially important here because, IIRC, we already have downstream libraries using nw.Expr.rank)

@dangotbanned
Member

#2310 (comment)

@MarcoGorelli I haven't looked into this more than your two comments, so apologies in advance if this seems too simplified of a view.

IMO, the current narwhals internal implementation of the polars API shouldn't lead the decision on the narwhals public API.
If that is the case here, the general concept (if you're unfamiliar) is called a leaky abstraction.


I'm not sure how likely this would be, but another concern might be - what if polars adds a pl.rank in the future that has different behaviour?
The list of sugar functions is still growing

@MarcoGorelli
Member

agree, i'm also not keen on implementation constraints determining the api, ideally it should be the opposite...

will think about this more, but maybe we could get to:

✅ allowed:

  • nw.col('a').rank()
  • nw.col('a').rank().over('b')

❌ disallowed:

  • nw.col('a').rank().over('b', order_by='c')

Then there's also:

  • nw.col('a').rank().abs().over('b', order_by='c')

that requires a decision

This involves more refactors, but...it's probably worth it
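As a rough illustration of what the allowed `nw.col('a').rank().over('b')` form computes, here is a pure-Python sketch of per-partition min-rank (not narwhals code, just the semantics):

```python
from collections import defaultdict

def rank_over(values, partition_by):
    """Min-rank `values` separately within each partition key."""
    groups = defaultdict(list)
    for v, key in zip(values, partition_by):
        groups[key].append(v)
    # sorted copy of each partition; index of first occurrence = min rank - 1
    sorted_groups = {k: sorted(vs) for k, vs in groups.items()}
    return [sorted_groups[key].index(v) + 1
            for v, key in zip(values, partition_by)]

# rank column 'a' within groups of column 'b'
print(rank_over([5, 3, 4, 1], ["x", "x", "y", "y"]))  # [2, 1, 2, 1]
```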

@dangotbanned
Member

#2310 (comment)

@MarcoGorelli that seems consistent with the pl.Expr.rank docs 👍

I don't know if this already plays a factor in the narwhals API.
Do you think about aiming to cover sequences of expressions that are demonstrated in the polars docs?
That seems like a good compass to me - even if we can't always get there

@MarcoGorelli
Member

MarcoGorelli commented Mar 29, 2025

Yeah i think it's good to aim for as much expressivity as Polars offers, with the restriction that for LazyFrame we only allow relational expressions - so, lf.select(nw.col('a').drop_nulls(), nw.col('b').drop_nulls()) will never be allowed, as it may result in tuples which were never part of the original relation.
The rest should in theory just be limited by how clear our mental model and implementation is
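The `drop_nulls` restriction can be made concrete with plain Python lists (a sketch, not narwhals code): dropping nulls independently per column can leave the columns with different lengths, so the result is no longer a relation.

```python
a = [1, None, 3]
b = [None, 2, None]

a_dropped = [v for v in a if v is not None]  # [1, 3]
b_dropped = [v for v in b if v is not None]  # [2]

# The two "columns" now have different lengths, so there is no
# well-defined set of rows any more - selecting both at once
# cannot produce a valid relation.
print(len(a_dropped), len(b_dropped))  # 2 1
```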

Member

@MarcoGorelli MarcoGorelli left a comment


thanks @raisadz - I think we can ship this, we'll sort out .over behaviour later, just left a minor comment

Comment on lines +715 to +718
sql = (
    f"CASE WHEN {_input} IS NULL THEN NULL "
    f"ELSE {func_name}() OVER ({order_by_sql}) END"
)
Member


can we flip this round (case when not-null), like you did for the spark-like case?
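A minimal sketch of the flipped form the comment asks for (variable values are made up for illustration; this is not the merged code):

```python
# Stand-in values for the variables used in the snippet above.
_input = "a"
func_name = "RANK"
order_by_sql = "ORDER BY a"

# Flipped: test for NOT NULL first and rely on the implicit ELSE NULL
# of SQL's CASE expression, mirroring the spark-like branch.
sql = (
    f"CASE WHEN {_input} IS NOT NULL "
    f"THEN {func_name}() OVER ({order_by_sql}) END"
)
print(sql)
# CASE WHEN a IS NOT NULL THEN RANK() OVER (ORDER BY a) END
```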

@MarcoGorelli
Member

i'll address #2310 (comment) now, gonna start by getting this in though

@MarcoGorelli MarcoGorelli merged commit 8a79617 into narwhals-dev:main Mar 29, 2025
28 checks passed
dangotbanned added a commit that referenced this pull request Mar 30, 2025
@MarcoGorelli MarcoGorelli added the enhancement New feature or request label Mar 31, 2025
MarcoGorelli added a commit that referenced this pull request Apr 4, 2025
* feat(typing): Add `FromNative` protocol

* chore(typing): Add `FromNative` to `CompliantSeries`

- Adding `._is_native` made `TypeVar` invariant
- Realistically, it always was, but underspecified

* feat: Implement for `(Arrow|PandasLike)Series`

* feat: Implement for `PolarsSeries`

+ get some coverage

* chore: `ArrowSeries` coverage

* chore: `PandasLikeSeries` partial coverage

* ignore coverage for now ...

* feat(typing): Add `CompliantDataFrame.from_native`

* feat: Implement for `ArrowDataFrame`

Also coverage for `ArrowSeries`

* feat: Implement for `PandasLikeDataFrame`

Loads of coverage for both `PandasLike`

* feat: Implement for `PolarsDataFrame`

* refactor: Found one more

* chore(typing): Fix missing `SQLExpression` ignore

Was missed in (#2310)

* feat: Implement `EagerNamespace.from_native`

- `Polars*` will also need to handle `LazyFrame`
- `Lazy*` has other constraints

* feat: Add `Polars(Namespace|LazyFrame).from_native`

Probably need to add a `LazyNamespace` protocol for `LazyOnly`

* chore: Ignore coverage `PandasLikeDataFrame._is_native`

Nowhere to use it yet, current stuff uses the more precise `self.native.__class__`

* feat: Add all `CompliantLazyFrame.from_native`

* feat: Add `LazyNamespace.from_native`

* refactor: Get some lazy coverage

https://github.com/narwhals-dev/narwhals/actions/runs/14157697512/job/39659084059?pr=2315

* refactor: More `polars` coverage

https://github.com/narwhals-dev/narwhals/actions/runs/14158424987/job/39660662342?pr=2315

* refactor: reuse `is_spark_like_dataframe`

* Update narwhals/_compliant/namespace.py

Co-authored-by: Dan Redding <125183946+dangotbanned@users.noreply.github.com>

---------

Co-authored-by: Marco Edward Gorelli <marcogorelli@protonmail.com>

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

implement remaining order-dependent lazy operations

4 participants