feat: Adds `{Expr,Series}.{first,last}` #2528

+        Examples:
+            >>> import polars as pl
+            >>> import narwhals as nw
+            >>>
+            >>> s_native = pl.Series([1, 2, 3])
+            >>> s_nw = nw.from_native(s_native, series_only=True)
+            >>> s_nw.first()
+            1
+            >>> s_nw.filter(s_nw > 5).first() is None
+            True


I don't like the None example, but this was the only way I saw to get a repr 😞

I think it's important to have an example for that case though - since pandas and pyarrow would raise an index error normally
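
For readers following the empty-input case, here is a minimal pure-Python sketch of the behaviour the docstring example shows. `first_or_none` is a hypothetical helper, not narwhals code. It only models "return None instead of raising" for an empty input, under the assumption that a plain sequence stands in for a Series.

```python
from __future__ import annotations

from typing import Sequence, TypeVar

T = TypeVar("T")


def first_or_none(values: Sequence[T]) -> T | None:
    """Return the first element, or None when `values` is empty.

    Hypothetical helper for illustration only: plain indexing (`values[0]`)
    raises IndexError on an empty sequence, which is the case being avoided.
    """
    return values[0] if len(values) > 0 else None


print(first_or_none([1, 2, 3]))
print(first_or_none([x for x in [1, 2, 3] if x > 5]) is None)
```

The second call mirrors `s_nw.filter(s_nw > 5).first()`: the filter leaves nothing, so the result is None rather than an exception.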

The description is exactly https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.first.html

#2528 (comment)

Still need to add `dask`, `duckdb` equivalent of (bd4ab89)

tests/expr_and_series/first_test.py

MarcoGorelli · 2025-10-02T22:30:14Z

narwhals/_arrow/expr.py

+                results = self(df.drop([token], strict=True))
+                if meta is not None and meta.last_node is ExprKind.ORDERABLE_AGGREGATION:
+                    # Orderable aggregations require `order_by` columns and results in a
+                    # scalar output (well actually in a length 1 series).
+                    # Therefore we need to broadcast the results to the original size, since
+                    # `over` is not a length changing operation.
+                    size = len(df)
+                    return [s._with_native(pa.repeat(s.item(), size)) for s in results]
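
The broadcasting step in the hunk above can be pictured with plain lists. This is only a sketch under stated assumptions: `broadcast_scalar` is a hypothetical name, and a Python list stands in for the native series that the real code rebuilds with `pa.repeat`.

```python
def broadcast_scalar(result: list, size: int) -> list:
    """Repeat a length-1 aggregation result so it matches the frame height."""
    if len(result) != 1:
        raise ValueError(f"expected a length-1 result, got length {len(result)}")
    return result * size


aggregated = [2.5]  # stands in for the length-1 output of an aggregation
print(broadcast_scalar(aggregated, 4))
```

The length check reflects the comment in the diff: the aggregation yields a length-1 series, and `over` must return something as long as the original frame.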


ok before this PR, we need to support

df = nw.from_native(
    pl.DataFrame({'a': [1,2,3,4,None,None,2,None,2], 'b': [1,1,1,1,1,1,2,2,2]})
).lazy('duckdb').collect('pandas')
print(df.with_columns(
    nw.col('a').diff().mean().over(order_by='b')
))

which currently raises for both pandas and pyarrow

What is the relation between that and this PR?

it requires the same kind of solution

the fact that it's orderable shouldn't be relevant, and it's not enough to just look at the last node

I've just tried that example out natively in polars

I'm getting the same result from both of these:

pl.col("a").diff().mean()
pl.col("a").diff().mean().over(order_by="b")

If I change the input data in either "a" or "b", the result of "a" is always the mean broadcast to length

Note
Update: I didn't test it here, but over does have an impact if you use .over(order_by="a")
But the result is still broadcast

Show repro

import polars as pl
import narwhals as nw

data_orig = {"a": [1, 2, 3, 4, None, None, 2, None, 2], "b": [1, 1, 1, 1, 1, 1, 2, 2, 2]}
data_b_non_asc = {
    "a": [1, 2, 3, 4, None, None, 2, None, 2],
    "b": [1, 5, 1, 1, 1, 1, 2, 2, 2],
}
data_a_varied = {
    "a": [1, 2, 5, 4, None, None, 2, 12, 2],
    "b": [1, 1, 1, 1, 3, 1, 3, 2, 2],
}
datasets = {
    "Original": data_orig,
    "`b` non-ascending": data_b_non_asc,
    "`a` varied": data_a_varied,
}

diff = pl.col("a").diff()
diff_mean = diff.mean()
diff_mean_order_b = diff_mean.over(order_by="b")
native = pl.LazyFrame(data_orig)

with pl.Config(tbl_hide_dataframe_shape=True):
    for name, data in datasets.items():
        native = pl.LazyFrame(data)
        underline = "-" * len(name)
        print(name, underline, sep="\n")
        print(diff, native.with_columns(diff).collect(), sep="\n")
        print(diff_mean, native.with_columns(diff_mean).collect(), sep="\n")
        print(
            diff_mean_order_b, native.with_columns(diff_mean_order_b).collect(), sep="\n"
        )

Show output

Original
--------
col("a").diff([dyn int: 1])
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ i64  ┆ i64 │
╞══════╪═════╡
│ null ┆ 1   │
│ 1    ┆ 1   │
│ 1    ┆ 1   │
│ 1    ┆ 1   │
│ null ┆ 1   │
│ null ┆ 1   │
│ null ┆ 2   │
│ null ┆ 2   │
│ null ┆ 2   │
└──────┴─────┘
col("a").diff([dyn int: 1]).mean()
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 2   │
│ 1.0 ┆ 2   │
│ 1.0 ┆ 2   │
└─────┴─────┘
col("a").diff([dyn int: 1]).mean().over(partition_by: [dyn int: 1], order_by: col("b"))
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 2   │
│ 1.0 ┆ 2   │
│ 1.0 ┆ 2   │
└─────┴─────┘
`b` non-ascending
-----------------
col("a").diff([dyn int: 1])
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ i64  ┆ i64 │
╞══════╪═════╡
│ null ┆ 1   │
│ 1    ┆ 5   │
│ 1    ┆ 1   │
│ 1    ┆ 1   │
│ null ┆ 1   │
│ null ┆ 1   │
│ null ┆ 2   │
│ null ┆ 2   │
│ null ┆ 2   │
└──────┴─────┘
col("a").diff([dyn int: 1]).mean()
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 1.0 ┆ 1   │
│ 1.0 ┆ 5   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 2   │
│ 1.0 ┆ 2   │
│ 1.0 ┆ 2   │
└─────┴─────┘
col("a").diff([dyn int: 1]).mean().over(partition_by: [dyn int: 1], order_by: col("b"))
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 1.0 ┆ 1   │
│ 1.0 ┆ 5   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 1   │
│ 1.0 ┆ 2   │
│ 1.0 ┆ 2   │
│ 1.0 ┆ 2   │
└─────┴─────┘
`a` varied
----------
col("a").diff([dyn int: 1])
┌──────┬─────┐
│ a    ┆ b   │
│ ---  ┆ --- │
│ i64  ┆ i64 │
╞══════╪═════╡
│ null ┆ 1   │
│ 1    ┆ 1   │
│ 3    ┆ 1   │
│ -1   ┆ 1   │
│ null ┆ 3   │
│ null ┆ 1   │
│ null ┆ 3   │
│ 10   ┆ 2   │
│ -10  ┆ 2   │
└──────┴─────┘
col("a").diff([dyn int: 1]).mean()
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ f64 ┆ i64 │
╞═════╪═════╡
│ 0.6 ┆ 1   │
│ 0.6 ┆ 1   │
│ 0.6 ┆ 1   │
│ 0.6 ┆ 1   │
│ 0.6 ┆ 3   │
│ 0.6 ┆ 1   │
│ 0.6 ┆ 3   │
│ 0.6 ┆ 2   │
│ 0.6 ┆ 2   │
└─────┴─────┘
col("a").diff([dyn int: 1]).mean().over(partition_by: [dyn int: 1], order_by: col("b"))
┌───────┬─────┐
│ a     ┆ b   │
│ ---   ┆ --- │
│ f64   ┆ i64 │
╞═══════╪═════╡
│ -1.75 ┆ 1   │
│ -1.75 ┆ 1   │
│ -1.75 ┆ 1   │
│ -1.75 ┆ 1   │
│ -1.75 ┆ 3   │
│ -1.75 ┆ 1   │
│ -1.75 ┆ 3   │
│ -1.75 ┆ 2   │
│ -1.75 ┆ 2   │
└───────┴─────┘

nw.col('a').diff().mean().over(order_by='b')

@MarcoGorelli was this based on something you've used in polars before?

I've had a look through this recent PR:

fix: More precisely model expression ordering requirements pola-rs/polars#24437

I was surprised that diff doesn't seem to have any ordering requirements 🤔

Some select bits from it though:

Function

Window

ExprOutputOrder

All the rules on aggregations

Note-worthy: first, last and implode (#2660)

simple example where it makes a difference:

In [13]: df = pl.DataFrame({'a': [1, 2, 3], 'b': [0, 2, 1]})

In [14]: df.with_columns(c=pl.col('a').diff().mean())
Out[14]:
shape: (3, 3)
┌─────┬─────┬──────────────────────────┐
│ a   ┆ b   ┆ c                        │
│ --- ┆ --- ┆ ---                      │
│ i64 ┆ i64 ┆ f64                      │
╞═════╪═════╪══════════════════════════╡
│ 1   ┆ 0   ┆ 1.00000000000000000000e0 │
│ 2   ┆ 2   ┆ 1.00000000000000000000e0 │
│ 3   ┆ 1   ┆ 1.00000000000000000000e0 │
└─────┴─────┴──────────────────────────┘

In [15]: df.with_columns(c=pl.col('a').diff().mean().over(order_by='b'))
Out[15]:
shape: (3, 3)
┌─────┬─────┬───────────────────────────┐
│ a   ┆ b   ┆ c                         │
│ --- ┆ --- ┆ ---                       │
│ i64 ┆ i64 ┆ f64                       │
╞═════╪═════╪═══════════════════════════╡
│ 1   ┆ 0   ┆ 5.00000000000000000000e-1 │
│ 2   ┆ 2   ┆ 5.00000000000000000000e-1 │
│ 3   ┆ 1   ┆ 5.00000000000000000000e-1 │
└─────┴─────┴───────────────────────────┘
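
Why the two results differ can be checked by hand with a small pure-Python model. This is a sketch, not how polars or narwhals evaluate the expression: `diff_mean` and `diff_mean_over` are hypothetical names, nulls are modelled as None, and `order_by` is assumed to mean "stable-sort the rows by that column before aggregating, then broadcast".

```python
from __future__ import annotations


def diff_mean(values: list[int | None]) -> float | None:
    """Mean of consecutive differences, skipping pairs that involve a null."""
    diffs = [
        cur - prev
        for prev, cur in zip(values, values[1:])
        if prev is not None and cur is not None
    ]
    return sum(diffs) / len(diffs) if diffs else None


def diff_mean_over(values, order_by=None):
    """Model `diff().mean().over(order_by=...)`: reorder, aggregate, broadcast."""
    if order_by is not None:
        # Stable sort of row positions by the ordering column.
        order = sorted(range(len(values)), key=lambda i: order_by[i])
        values = [values[i] for i in order]
    # The aggregate is a single value, repeated to the original length.
    return [diff_mean(values)] * len(values)


a, b = [1, 2, 3], [0, 2, 1]
print(diff_mean_over(a))              # [1.0, 1.0, 1.0]
print(diff_mean_over(a, order_by=b))  # [0.5, 0.5, 0.5]
```

With `order_by=b` the values are visited as 1, 3, 2, so the differences are 2 and -1 and their mean is 0.5. The same model gives 0.6 and -1.75 for the "`a` varied" dataset in the earlier repro.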

1.00000000000000000000e0

What were you up to needing this much precision? 😄

MarcoGorelli

cool, i think i'm finally happy to ship this

thanks both for having got this started!

dangotbanned · 2025-10-03T20:36:56Z

@MarcoGorelli I've just been trying out some tests from before I removed the initial lazy support (4618d01)

I think the docstring for first, last needs to be clearer on what is/isn't allowed

import narwhals as nw

data = {"a": [1, 1, 2, 2], "b": ["foo", None, None, "baz"]}
df = nw.from_dict(data, backend="polars")

This is fine

>>> df.select(nw.col("a", "b").first())

┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|  shape: (1, 2)   |
|  ┌─────┬─────┐   |
|  │ a   ┆ b   │   |
|  │ --- ┆ --- │   |
|  │ i64 ┆ str │   |
|  ╞═════╪═════╡   |
|  │ 1   ┆ foo │   |
|  └─────┴─────┘   |
└──────────────────┘

This isn't allowed, and the error message points you in the wrong direction:

>>> df.lazy().select(nw.col("a", "b").first()).collect()
InvalidOperationError: Order-dependent expressions are not supported for use in LazyFrame.

Hint: To make the expression valid, use `.over` with `order_by` specified.

For example, if you wrote `nw.col('price').cum_sum()` and you have a column
`'date'` which orders your data, then replace:

   nw.col('price').cum_sum()

 with:

   nw.col('price').cum_sum().over(order_by='date')
                            ^^^^^^^^^^^^^^^^^^^^^^

See https://narwhals-dev.github.io/narwhals/concepts/order_dependence/.

>>> df.with_row_index("i").lazy().select(
    nw.col("a", "b").first().over(order_by="i")
).collect()

ShapeError: Series b, length 1 doesn't match the DataFrame height of 4

If you want expression: col("b").first().over(partition_by: [1], order_by: col("i")) to be broadcasted, ensure it is a scalar (for instance by adding '.first()').

So we have this circular thing where we want order_by, but polars wants first 😂

I do understand you've rejected sort_by (#2534 (comment)), but it does solve this use-case (if we ever venture down that road again)

data = {"a": [1, 1, 2, 2], "b": ["foo", None, None, "baz"]}
df = pl.DataFrame(data).with_row_index("i").sort("i", descending=True)
>>> df.lazy().select(pl.col("a", "b").sort_by("i").first()).collect()

shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ foo │
└─────┴─────┘
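
The `sort_by("i").first()` result above can be modelled without polars. This is a sketch only: `first_sorted_by` is a hypothetical helper, and the frame literal is written out by hand to match the rows after `.sort("i", descending=True)`.

```python
def first_sorted_by(column: list, key: list):
    """Model `col.sort_by(key).first()`: the value at the smallest key, or None if empty."""
    if not column:
        return None
    position = min(range(len(column)), key=lambda k: key[k])
    return column[position]


# Rows as they sit after `.sort("i", descending=True)` in the example above.
frame = {
    "i": [3, 2, 1, 0],
    "a": [2, 2, 1, 1],
    "b": ["baz", None, None, "foo"],
}
result = {name: first_sorted_by(frame[name], frame["i"]) for name in ("a", "b")}
print(result)  # {'a': 1, 'b': 'foo'}
```

The point of the model is that the answer depends only on the ordering key, not on the physical row order, which is why the reversed frame still yields `1` and `foo`.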

MarcoGorelli · 2025-10-03T20:43:38Z

I think this might be a bug in Polars, will check, but thanks for spotting it

dangotbanned · 2025-10-03T20:54:28Z

I think this might be a bug in Polars, will check, but thanks for spotting it

Thanks marco!

Just to be 100% clear, I'm not trying to relitigate sort_by
(#2528 (comment)) is just highlighting a UX issue

I am quite keen to see a proposed API for {min,max}_by and maybe discuss in another issue (#2526 (comment)) 🙂

MarcoGorelli · 2025-10-03T21:43:37Z

Reported here: pola-rs/polars#24756

It's a fairly simple workaround, fortunately (just use pl.repeat(1, pl.len()) instead of pl.lit(1) in over)

dangotbanned added 10 commits May 10, 2025 20:34

chore: Add CompliantExpr.first

ff661ae

Towards (#2526)

feat: "Implement" PolarsExpr.First

1b77bd7

feat: Add EagerExpr.first

e84cba3

chore: Repeat for *Series

25ef241

feat: Add (Arrow|PandasLike)Series.first()

78822aa

chore: Mark LazyExpr.first as not_implemented for now

4075c50

See #2526 (comment)

feat: Add SparkLikeExpr.first

45f24b9

feat: Add DuckDBExpr.first

4041dd1

https://duckdb.org/docs/stable/sql/functions/aggregates#firstarg

feat: Add DaskExpr.first

bb9912d

- Less sure about this one - `head(1)` also seemed like an option

revert: 4075c50

6a53aa1

All have *an* implementation now

dangotbanned added the enhancement New feature or request label May 10, 2025

dangotbanned changed the title ~~feat(DRAFT): Adds Expr.first()~~ feat(DRAFT): Adds (Expr|Series).first() May 10, 2025

MarcoGorelli reviewed May 10, 2025

narwhals/_duckdb/expr.py (outdated)

dangotbanned added 7 commits May 10, 2025 22:26

feat: Add nw.Series.first

4efc939

test: Add Series.first tests

fc149c1

fix: I guess the stubs were wrong then?

7489e61

fix: Handle the out-of-bounds case

d2719a4

https://docs.pola.rs/api/python/stable/reference/series/api/polars.Series.first.html

fix: polars backcompat

0af11db

pola-rs/polars#19093

docs: Add Series.first

afe20f0

lol version typo

6c0bd6f

https://github.com/narwhals-dev/narwhals/actions/runs/14949511597/job/41997113260?pr=2528

dangotbanned commented May 10, 2025

narwhals/_compliant/series.py (outdated)

cov

e0fdf78

https://github.com/narwhals-dev/narwhals/actions/runs/14949533546/job/41997163953?pr=2528

dangotbanned commented May 10, 2025

dangotbanned added 5 commits May 11, 2025 12:11

chore: Add nw.Expr.first

aa7c510

The description is exactly https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.first.html

Merge remote-tracking branch 'upstream/main' into expr-first

4fdc0aa

feat: Maybe SparkLike requires order_by?

bd4ab89

#2528 (comment)

test: Try out eager backends

9f7f5a9

Still need to add `dask`, `duckdb` equivalent of (bd4ab89)

Merge branch 'main' into expr-first

ddb50d2

dangotbanned commented May 11, 2025

tests/expr_and_series/first_test.py (outdated)

MarcoGorelli added 8 commits October 2, 2025 19:58

sort out ibis

c87935d

dask

0393dfe

add note to docs

466c922

remove unnecessary code

4266e4b

pyarrow

555098b

fixup

36e38e0

typing

42d2cd6

dask

63f012a

MarcoGorelli reviewed Oct 2, 2025

MarcoGorelli added 12 commits October 3, 2025 15:26

test and support diff().sum().over(order_by=...)

c4ac043

cross-pandas version compat

8739b6a

make test more unusual

ff22604

fix another pyarrow issue

d9c4a1b

catch more warnings for modin

03b7969

factor out sql_expression, link to feature request

d01a398

combine first and last blocks

18c0861

remove more unneeded

948d96d

less special-casing

8810d03

simplify further

843549f

Merge remote-tracking branch 'upstream/main' into expr-first

d7be792

typing

363490d

MarcoGorelli approved these changes Oct 3, 2025

use repeat_by instead of lit for polars

c25d649

MarcoGorelli removed the eager-only label Oct 3, 2025

MarcoGorelli merged commit 053390d into main Oct 4, 2025
31 of 33 checks passed

MarcoGorelli deleted the expr-first branch October 4, 2025 10:33
