Merged
156 commits
ff661ae
chore: Add `CompliantExpr.first`
dangotbanned May 10, 2025
1b77bd7
feat: "Implement" `PolarsExpr.First`
dangotbanned May 10, 2025
e84cba3
feat: Add `EagerExpr.first`
dangotbanned May 10, 2025
25ef241
chore: Repeat for `*Series`
dangotbanned May 10, 2025
78822aa
feat: Add `(Arrow|PandasLike)Series.first()`
dangotbanned May 10, 2025
4075c50
chore: Mark `LazyExpr.first` as `not_implemented` for now
dangotbanned May 10, 2025
45f24b9
feat: Add `SparkLikeExpr.first`
dangotbanned May 10, 2025
4041dd1
feat: Add `DuckDBExpr.first`
dangotbanned May 10, 2025
bb9912d
feat: Add `DaskExpr.first`
dangotbanned May 10, 2025
6a53aa1
revert: 4075c50f2496ab9908b25dc15e240650bc686dc0
dangotbanned May 10, 2025
4efc939
feat: Add `nw.Series.first`
dangotbanned May 10, 2025
fc149c1
test: Add `Series.first` tests
dangotbanned May 10, 2025
7489e61
fix: I guess the stubs were wrong then?
dangotbanned May 10, 2025
d2719a4
fix: Handle the out-of-bounds case
dangotbanned May 10, 2025
0af11db
fix: `polars` backcompat
dangotbanned May 10, 2025
afe20f0
docs: Add `Series.first`
dangotbanned May 10, 2025
6c0bd6f
lol version typo
dangotbanned May 10, 2025
e0fdf78
cov
dangotbanned May 10, 2025
aa7c510
chore: Add `nw.Expr.first`
dangotbanned May 11, 2025
4fdc0aa
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned May 11, 2025
bd4ab89
feat: Maybe `SparkLike` requires `order_by`?
dangotbanned May 11, 2025
9f7f5a9
test: Try out eager backends
dangotbanned May 11, 2025
ddb50d2
Merge branch 'main' into expr-first
dangotbanned May 11, 2025
7146f60
test: Add mostly broken lazy tests 😒
dangotbanned May 11, 2025
8c24e6e
feat: `duckdb` support?
dangotbanned May 11, 2025
54a4cb4
test: Update xfails
dangotbanned May 11, 2025
63e0459
fix: Use `head(1)` in `DaskExpr`
dangotbanned May 11, 2025
9493aad
ignore cov
dangotbanned May 11, 2025
88535a4
Apply suggestion
dangotbanned May 11, 2025
77ae9c0
test: Remove dask `xfail`
dangotbanned May 11, 2025
c1a6173
revert: Remove `dask` implementation
dangotbanned May 11, 2025
3c4ff9b
refactor(typing): Use `PythonLiteral` for `Series` return
dangotbanned May 11, 2025
696e35d
Merge branch 'main' into expr-first
dangotbanned May 12, 2025
b2866d2
Merge branch 'main' into expr-first
dangotbanned May 12, 2025
cd002f3
test: Add `test_group_by_agg_first`
dangotbanned May 12, 2025
1458530
feat(DRAFT): Start trying `pyarrow` `agg(first())`
dangotbanned May 12, 2025
962ebcd
fix: Maybe `pyarrow` support?
dangotbanned May 12, 2025
5d310bc
refactor: Add `ArrowGroupBy._configure_agg`
dangotbanned May 12, 2025
a417341
fix: Add `pyarrow` compat for `first`
dangotbanned May 12, 2025
354da1a
fix: Don't support below `14` ever
dangotbanned May 12, 2025
0cea41b
test: Add some `None` cases
dangotbanned May 12, 2025
5229096
feat(DRAFT): Partial support for `pandas`
dangotbanned May 12, 2025
8d3aaec
docs: Tidy error and comments
dangotbanned May 12, 2025
a62e3ef
Merge branch 'main' into expr-first
dangotbanned May 12, 2025
9c36285
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned May 13, 2025
ad8e3f7
test: xfail `ibis`
dangotbanned May 13, 2025
628f71e
feat: Add `IbisExpr.first`
dangotbanned May 13, 2025
deacc71
test: Don't xfail for `pandas<1.0.0`
dangotbanned May 13, 2025
5c52ee4
Merge branch 'main' into expr-first
dangotbanned May 14, 2025
eec2a4f
Merge branch 'main' into expr-first
dangotbanned May 16, 2025
e003bab
Merge branch 'main' into expr-first
dangotbanned May 16, 2025
fb2dc1c
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned May 18, 2025
211673b
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jun 3, 2025
652615f
fix: Use reverted `partition_by`, `_sort`
dangotbanned Jun 13, 2025
68fdfe8
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jun 13, 2025
ecaca9a
fix: Update `DuckDBExpr.first`
dangotbanned Jun 15, 2025
ea30f26
fix: Update `IbisExpr.first`
dangotbanned Jun 15, 2025
12987ee
fix: Update `SparkLikeExpr.first`
dangotbanned Jun 15, 2025
7d70a42
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jun 15, 2025
5446095
test: Update `pandas` xfail
dangotbanned Jun 15, 2025
b927340
Merge branch 'main' into expr-first
dangotbanned Jun 20, 2025
f62c085
test: Don't xfail for pandas `1.1.3<=...<1.1.5`
dangotbanned Jun 20, 2025
45d20c8
Merge branch 'main' into expr-first
dangotbanned Jun 21, 2025
72ab185
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jun 29, 2025
e72b115
fix: Upgrade `DuckDBExpr.first` again
dangotbanned Jun 29, 2025
fae137c
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jul 19, 2025
cb363be
test(DRAFT): Let's start trying to fix pandas
dangotbanned Jul 19, 2025
bc80a5f
try `pandas>=2.2.1` path
dangotbanned Jul 19, 2025
14051fa
allow very old pandas that worked?
dangotbanned Jul 19, 2025
3d42dcf
test: xfail `pandas[pyarrow]`, `modin[pyarrow]`
dangotbanned Jul 19, 2025
934d09e
Apply suggestion narwhals/_polars/series.py
dangotbanned Jul 20, 2025
3fbf6f2
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jul 20, 2025
801a7a8
docs: Be more explicit on WIP `pandas`
dangotbanned Jul 20, 2025
47bfaba
docs: Link to long explanation
dangotbanned Jul 20, 2025
4618d01
revert: remove lazy support
dangotbanned Jul 20, 2025
1998ad2
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jul 20, 2025
570cdaf
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jul 20, 2025
d561027
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jul 21, 2025
b77d2b3
try `nth` for `>=1.1.5; <2.0.0`
dangotbanned Jul 21, 2025
2b0bc16
Is this fixed?
dangotbanned Jul 21, 2025
abbb4b7
cov
dangotbanned Jul 21, 2025
ccfe532
feat: Add `(Expr|Series).last`
dangotbanned Jul 21, 2025
dd1f89e
test: Add `last_test.py`
dangotbanned Jul 22, 2025
54b3188
test: Add `test_group_by_agg_last`
dangotbanned Jul 22, 2025
5f9ff6f
fix: Add missing `PandasLikeGroupBy._REMAP_AGGS` entry
dangotbanned Jul 22, 2025
4000b25
test: Repeat `@single_cases` pattern for `first`
dangotbanned Jul 22, 2025
1c62ce2
docs: Examples for `Expr.(first|last)`
dangotbanned Jul 22, 2025
64fdf10
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jul 22, 2025
063e5d0
Remove `modin` todo
dangotbanned Jul 22, 2025
2e4f260
Merge branch 'main' into expr-first
dangotbanned Jul 23, 2025
65e6804
clean up and doc `pandas`
dangotbanned Jul 23, 2025
22fae20
feat: Warn on new pandas apply path
dangotbanned Jul 23, 2025
60624b9
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jul 23, 2025
d66fddc
cov
dangotbanned Jul 23, 2025
5e444a5
always use `apply` for `cudf` 😒
dangotbanned Jul 24, 2025
e1a9bc3
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jul 25, 2025
0cbe33d
Merge branch 'main' into expr-first
dangotbanned Jul 26, 2025
2960736
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Jul 28, 2025
4ede6b2
merge main
FBruzzesi Jul 29, 2025
2ae4245
special path for orderable_aggregation in over
FBruzzesi Jul 29, 2025
b8066c4
expand on comments
FBruzzesi Jul 29, 2025
2dae6ef
assign metadata in arrow
FBruzzesi Jul 29, 2025
3aa52dc
Merge branch 'main' into expr-first
FBruzzesi Aug 2, 2025
7c578c7
Merge branch 'main' into expr-first
dangotbanned Aug 5, 2025
30bad0e
Merge branch 'main' into expr-first
dangotbanned Aug 7, 2025
d269d56
Merge branch 'main' into expr-first
dangotbanned Aug 7, 2025
c0e37aa
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 8, 2025
6f5c05b
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 12, 2025
20be193
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 13, 2025
abd027a
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 14, 2025
1fd9fd3
Merge branch 'main' into expr-first
dangotbanned Aug 15, 2025
94d6b19
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 17, 2025
849a6d9
Merge branch 'main' into expr-first
dangotbanned Aug 18, 2025
476c63e
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 18, 2025
c169104
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 19, 2025
d77fcd1
Merge branch 'main' into expr-first
dangotbanned Aug 19, 2025
1f38bde
Merge branch 'main' into expr-first
dangotbanned Aug 19, 2025
3c63726
Merge branch 'main' into expr-first
dangotbanned Aug 20, 2025
6d7b09b
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 21, 2025
3b6301f
Merge branch 'main' into expr-first
dangotbanned Aug 23, 2025
bfc55c7
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 23, 2025
f47ef14
docs: Remove *Returns* from `Expr` version
dangotbanned Aug 23, 2025
b32db75
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 25, 2025
f22a497
Merge branch 'main' into expr-first
dangotbanned Aug 25, 2025
b5fe1ba
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Aug 28, 2025
0f301e7
Merge branch 'main' into expr-first
dangotbanned Aug 30, 2025
16a2762
Merge branch 'main' into expr-first
dangotbanned Sep 3, 2025
09dca76
Merge branch 'main' into expr-first
dangotbanned Sep 5, 2025
ffe7e24
Merge remote-tracking branch 'upstream/main' into expr-first
dangotbanned Sep 13, 2025
0fb0455
chore(typing): fix incompatible override
dangotbanned Sep 13, 2025
6d63ea6
simplify grouped first/last#
MarcoGorelli Oct 2, 2025
7b00310
simplify test, remove unnecessary over(order_by)
MarcoGorelli Oct 2, 2025
016abc9
combine tests
MarcoGorelli Oct 2, 2025
29d6cb7
combine tests
MarcoGorelli Oct 2, 2025
3b91e23
duckdb fix#
MarcoGorelli Oct 2, 2025
c87935d
sort out ibis
MarcoGorelli Oct 2, 2025
0393dfe
dask
MarcoGorelli Oct 2, 2025
466c922
add note to docs
MarcoGorelli Oct 2, 2025
4266e4b
remove unnecessary code
MarcoGorelli Oct 2, 2025
555098b
pyarrow
MarcoGorelli Oct 2, 2025
36e38e0
fixup
MarcoGorelli Oct 2, 2025
42d2cd6
typing
MarcoGorelli Oct 2, 2025
63f012a
dask
MarcoGorelli Oct 2, 2025
c4ac043
test and support `diff().sum().over(order_by=...)`
MarcoGorelli Oct 3, 2025
8739b6a
cross-pandas version compat
MarcoGorelli Oct 3, 2025
ff22604
make test more unusual
MarcoGorelli Oct 3, 2025
d9c4a1b
fix another pyarrow issue
MarcoGorelli Oct 3, 2025
03b7969
catch more warnings for modin
MarcoGorelli Oct 3, 2025
d01a398
factor out sql_expression, link to feature request
MarcoGorelli Oct 3, 2025
18c0861
combine first and last blocks
MarcoGorelli Oct 3, 2025
948d96d
remove more unneeded
MarcoGorelli Oct 3, 2025
8810d03
less special-casing
MarcoGorelli Oct 3, 2025
843549f
simplify further
MarcoGorelli Oct 3, 2025
d7be792
Merge remote-tracking branch 'upstream/main' into expr-first
MarcoGorelli Oct 3, 2025
363490d
typing
MarcoGorelli Oct 3, 2025
c25d649
use repeat_by instead of lit for polars
MarcoGorelli Oct 3, 2025
2 changes: 2 additions & 0 deletions docs/api-reference/expr.md
@@ -23,6 +23,7 @@
- fill_nan
- fill_null
- filter
- first
- is_between
- is_close
- is_duplicated
@@ -34,6 +35,7 @@
- is_null
- is_unique
- kurtosis
- last
- len
- log
- map_batches
2 changes: 2 additions & 0 deletions docs/api-reference/series.md
@@ -30,6 +30,7 @@
- fill_nan
- fill_null
- filter
- first
- from_iterable
- from_numpy
- gather_every
@@ -50,6 +51,7 @@
- is_unique
- item
- kurtosis
- last
- len
- log
- max
21 changes: 14 additions & 7 deletions narwhals/_arrow/expr.py
@@ -2,6 +2,7 @@

from typing import TYPE_CHECKING, Any

import pyarrow as pa
import pyarrow.compute as pc

from narwhals._arrow.series import ArrowSeries
@@ -111,11 +112,8 @@ def _reuse_series_extra_kwargs(
return {"_return_py_scalar": False} if returns_scalar else {}

def over(self, partition_by: Sequence[str], order_by: Sequence[str]) -> Self:
if (
partition_by
and self._metadata is not None
and not self._metadata.is_scalar_like
):
meta = self._metadata
if partition_by and meta is not None and not meta.is_scalar_like:
msg = "Only aggregation or literal operations are supported in grouped `over` context for PyArrow."
raise NotImplementedError(msg)

@@ -129,15 +127,24 @@ def func(df: ArrowDataFrame) -> Sequence[ArrowSeries]:
df = df.with_row_index(token, order_by=None).sort(
*order_by, descending=False, nulls_last=False
)
result = self(df.drop([token], strict=True))
results = self(df.drop([token], strict=True))
if meta is not None and meta.is_scalar_like:
# We need to broadcast the results to the original size, since
# `over` is a length-preserving operation.
size = len(df)
return [s._with_native(pa.repeat(s.item(), size)) for s in results]

# TODO(marco): is there a way to do this efficiently without
# doing 2 sorts? Here we're sorting the dataframe and then
# again calling `sort_indices`. `ArrowSeries.scatter` would also sort.
sorting_indices = pc.sort_indices(df.get_column(token).native)
return [s._with_native(s.native.take(sorting_indices)) for s in result]
return [s._with_native(s.native.take(sorting_indices)) for s in results]
else:

def func(df: ArrowDataFrame) -> Sequence[ArrowSeries]:
if order_by:
df = df.sort(*order_by, descending=False, nulls_last=False)

output_names, aliases = evaluate_output_names_and_aliases(self, df, [])
if overlap := set(output_names).intersection(partition_by):
# E.g. `df.select(nw.all().sum().over('a'))`. This is well-defined,
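The ordered `over` path in `ArrowExpr.over` above follows a token/sort/restore pattern: tag each row with its index, sort by `order_by` (ascending, nulls first), evaluate, then either broadcast a scalar result such as `first()` or undo the sort. The sketch below models that flow on plain Python lists; `over_order_by` and `running_total` are hypothetical names for illustration, not narwhals or pyarrow API.

```python
from collections.abc import Callable, Sequence
from typing import Any


def over_order_by(
    values: Sequence[Any],
    order_key: Sequence[Any],
    func: Callable[[list[Any]], Any],
    *,
    returns_scalar: bool,
) -> list[Any]:
    """Hypothetical list-based model of `ArrowExpr.over(order_by=...)`."""
    # Row-index "token" so the original order can be recovered afterwards.
    token = list(range(len(values)))
    # Stable ascending sort, nulls (None) first, like nulls_last=False.
    order = sorted(token, key=lambda i: (order_key[i] is not None, order_key[i]))
    result = func([values[i] for i in order])
    if returns_scalar:
        # `over` is length-preserving, so broadcast the single value.
        return [result] * len(values)
    # order[j] is the original row now at sorted position j; inverting that
    # permutation (an argsort of the token column) restores input order.
    restore = sorted(range(len(order)), key=lambda j: order[j])
    return [result[j] for j in restore]


def running_total(xs: list[Any]) -> list[Any]:
    """Example length-preserving function: cumulative sum."""
    return [sum(xs[: i + 1]) for i in range(len(xs))]


prices, days = [10, 20, 30], [3, 1, 2]
print(over_order_by(prices, days, lambda xs: xs[0], returns_scalar=True))
# [20, 20, 20]
print(over_order_by(prices, days, running_total, returns_scalar=False))
# [60, 20, 50]
```

The scalar branch never needs the restore step, which is why the real code can return early with `pa.repeat(...)` before computing `sort_indices`.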
76 changes: 60 additions & 16 deletions narwhals/_arrow/group_by.py
@@ -9,7 +9,7 @@
from narwhals._arrow.utils import cast_to_comparable_string_types, extract_py_scalar
from narwhals._compliant import EagerGroupBy
from narwhals._expression_parsing import evaluate_output_names_and_aliases
from narwhals._utils import generate_temporary_column_name
from narwhals._utils import generate_temporary_column_name, requires

if TYPE_CHECKING:
from collections.abc import Iterator, Mapping, Sequence
@@ -39,12 +39,23 @@ class ArrowGroupBy(EagerGroupBy["ArrowDataFrame", "ArrowExpr", "Aggregation"]):
"count": "count",
"all": "all",
"any": "any",
"first": "first",
"last": "last",
}
_REMAP_UNIQUE: ClassVar[Mapping[UniqueKeepStrategy, Aggregation]] = {
"any": "min",
"first": "min",
"last": "max",
}
_OPTION_COUNT_ALL: ClassVar[frozenset[NarwhalsAggregation]] = frozenset(
("len", "n_unique")
)
_OPTION_COUNT_VALID: ClassVar[frozenset[NarwhalsAggregation]] = frozenset(("count",))
_OPTION_ORDERED: ClassVar[frozenset[NarwhalsAggregation]] = frozenset(
("first", "last")
)
_OPTION_VARIANCE: ClassVar[frozenset[NarwhalsAggregation]] = frozenset(("std", "var"))
_OPTION_SCALAR: ClassVar[frozenset[NarwhalsAggregation]] = frozenset(("any", "all"))

def __init__(
self,
@@ -60,12 +71,58 @@ def __init__(
self._grouped = pa.TableGroupBy(self.compliant.native, self._keys)
self._drop_null_keys = drop_null_keys

def _configure_agg(
self, grouped: pa.TableGroupBy, expr: ArrowExpr, /
) -> tuple[pa.TableGroupBy, Aggregation, AggregateOptions | None]:
option: AggregateOptions | None = None
function_name = self._leaf_name(expr)
if function_name in self._OPTION_VARIANCE:
ddof = expr._scalar_kwargs.get("ddof", 1)
option = pc.VarianceOptions(ddof=ddof)
elif function_name in self._OPTION_COUNT_ALL:
option = pc.CountOptions(mode="all")
elif function_name in self._OPTION_COUNT_VALID:
option = pc.CountOptions(mode="only_valid")
elif function_name in self._OPTION_SCALAR:
option = pc.ScalarAggregateOptions(min_count=0)
elif function_name in self._OPTION_ORDERED:
grouped, option = self._ordered_agg(grouped, function_name)
return grouped, self._remap_expr_name(function_name), option
Comment on lines +74 to +90

dangotbanned (Member, Author) Jul 22, 2025

Possible follow-up

Do another pass on this, since it was written before the pandas refactor and solves a similar problem in a different way.

dangotbanned (Member, Author)

I should be able to upstream some version of that to main (or to this PR, if it's blocking) if there's interest.

It works on all versions of pyarrow that we support and doesn't require touching pyarrow internals like this does.

def _ordered_agg(
self, grouped: pa.TableGroupBy, name: NarwhalsAggregation, /
) -> tuple[pa.TableGroupBy, AggregateOptions]:
"""The default behavior of `pyarrow` raises when `first` or `last` are used.

You'd see an error like:

ArrowNotImplementedError: Using ordered aggregator in multiple threaded execution is not supported

We need to **disable** multi-threading to use them, but that option
wasn't available before `14.0.0` ([pyarrow-36709])

[pyarrow-36709]: https://github.com/apache/arrow/issues/36709
"""
backend_version = self.compliant._backend_version
if backend_version >= (14, 0) and grouped._use_threads:
native = self.compliant.native
grouped = pa.TableGroupBy(native, grouped.keys, use_threads=False)
elif backend_version < (14, 0): # pragma: no cover
msg = (
f"Using `{name}()` in a `group_by().agg(...)` context is only available in 'pyarrow>=14.0.0', "
f"found version {requires._unparse_version(backend_version)!r}.\n\n"
f"See https://github.com/apache/arrow/issues/36709"
)
raise NotImplementedError(msg)
return grouped, pc.ScalarAggregateOptions(skip_nulls=False)

def agg(self, *exprs: ArrowExpr) -> ArrowDataFrame:
self._ensure_all_simple(exprs)
aggs: list[tuple[str, Aggregation, AggregateOptions | None]] = []
expected_pyarrow_column_names: list[str] = self._keys.copy()
new_column_names: list[str] = self._keys.copy()
exclude = (*self._keys, *self._output_key_names)
grouped = self._grouped

for expr in exprs:
output_names, aliases = evaluate_output_names_and_aliases(
@@ -83,20 +140,7 @@ def agg(self, *exprs: ArrowExpr) -> ArrowDataFrame:
aggs.append((self._keys[0], "count", pc.CountOptions(mode="all")))
continue

function_name = self._leaf_name(expr)
if function_name in {"std", "var"}:
assert "ddof" in expr._scalar_kwargs # noqa: S101
option: Any = pc.VarianceOptions(ddof=expr._scalar_kwargs["ddof"])
elif function_name in {"len", "n_unique"}:
option = pc.CountOptions(mode="all")
elif function_name == "count":
option = pc.CountOptions(mode="only_valid")
elif function_name in {"all", "any"}:
option = pc.ScalarAggregateOptions(min_count=0)
else:
option = None

function_name = self._remap_expr_name(function_name)
grouped, function_name, option = self._configure_agg(grouped, expr)
new_column_names.extend(aliases)
expected_pyarrow_column_names.extend(
[f"{output_name}_{function_name}" for output_name in output_names]
@@ -105,7 +149,7 @@ def agg(self, *exprs: ArrowExpr) -> ArrowDataFrame:
[(output_name, function_name, option) for output_name in output_names]
)

result_simple = self._grouped.aggregate(aggs)
result_simple = grouped.aggregate(aggs)

# Rename columns, being very careful
expected_old_names_indices: dict[str, list[int]] = collections.defaultdict(list)
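`_ordered_agg` rebuilds the `pa.TableGroupBy` with `use_threads=False` and pairs `first`/`last` with `pc.ScalarAggregateOptions(skip_nulls=False)`. The behaviour this aims for can be modelled roughly in plain Python: one sequential pass that keeps the first and the most recent value per key, treating `None` as a real value. `group_first_last` is a hypothetical illustration, not part of narwhals or pyarrow.

```python
from collections.abc import Hashable, Sequence
from typing import Any


def group_first_last(
    keys: Sequence[Hashable], values: Sequence[Any]
) -> dict[Hashable, tuple[Any, Any]]:
    """Hypothetical model: per-group (first, last) in row order, nulls kept."""
    out: dict[Hashable, tuple[Any, Any]] = {}
    # A single sequential scan: the answer depends on row order, which a
    # multi-threaded aggregation cannot guarantee.
    for key, value in zip(keys, values):
        if key in out:
            # Keep the group's first value, overwrite its last value.
            out[key] = (out[key][0], value)
        else:
            # None is stored like any other value (cf. skip_nulls=False).
            out[key] = (value, value)
    return out


print(group_first_last(["a", "b", "a", "b"], [None, 1, 2, None]))
# {'a': (None, 2), 'b': (1, None)}
```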
9 changes: 9 additions & 0 deletions narwhals/_arrow/series.py
@@ -330,6 +330,15 @@ def filter(self, predicate: ArrowSeries | list[bool | None]) -> Self:
other_native = predicate
return self._with_native(self.native.filter(other_native))

def first(self, *, _return_py_scalar: bool = True) -> PythonLiteral:
result = self.native[0] if len(self.native) else None
return maybe_extract_py_scalar(result, _return_py_scalar)

def last(self, *, _return_py_scalar: bool = True) -> PythonLiteral:
ca = self.native
result = ca[height - 1] if (height := len(ca)) else None
return maybe_extract_py_scalar(result, _return_py_scalar)

def mean(self, *, _return_py_scalar: bool = True) -> float:
return maybe_extract_py_scalar(pc.mean(self.native), _return_py_scalar)

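`ArrowSeries.first` and `ArrowSeries.last` index the native array directly and fall back to `None` for an empty series. A list-based sketch of the same logic (the function names are illustrative only):

```python
from collections.abc import Sequence
from typing import Any


def series_first(values: Sequence[Any]) -> Any:
    """First element, or None when `values` is empty."""
    return values[0] if len(values) else None


def series_last(values: Sequence[Any]) -> Any:
    """Last element, or None when `values` is empty."""
    # The condition is evaluated first, so `height` is bound before indexing.
    return values[height - 1] if (height := len(values)) else None
```

One consequence of this design: an empty input and an input whose first (or last) value is null both return `None`, so the two cases cannot be told apart from the return value alone.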
8 changes: 8 additions & 0 deletions narwhals/_compliant/expr.py
@@ -124,6 +124,8 @@ def max(self) -> Self: ...
def mean(self) -> Self: ...
def sum(self) -> Self: ...
def median(self) -> Self: ...
def first(self) -> Self: ...
def last(self) -> Self: ...
def skew(self) -> Self: ...
def kurtosis(self) -> Self: ...
def std(self, *, ddof: int) -> Self: ...
@@ -867,6 +869,12 @@ def is_close(
nans_equal=nans_equal,
)

def first(self) -> Self:
return self._reuse_series("first", returns_scalar=True)

def last(self) -> Self:
return self._reuse_series("last", returns_scalar=True)

@property
def cat(self) -> EagerExprCatNamespace[Self]:
return EagerExprCatNamespace(self)
3 changes: 3 additions & 0 deletions narwhals/_compliant/series.py
@@ -45,6 +45,7 @@
Into1DArray,
IntoDType,
MultiIndexSelector,
PythonLiteral,
RollingInterpolationMethod,
SizedMultiIndexSelector,
_1DArray,
@@ -131,6 +132,8 @@ def arg_min(self) -> int: ...
def arg_true(self) -> Self: ...
def count(self) -> int: ...
def filter(self, predicate: Any) -> Self: ...
def first(self) -> PythonLiteral: ...
def last(self) -> PythonLiteral: ...
def gather_every(self, n: int, offset: int) -> Self: ...
def head(self, n: int) -> Self: ...
def is_empty(self) -> bool:
2 changes: 2 additions & 0 deletions narwhals/_compliant/typing.py
@@ -194,6 +194,8 @@ class ScalarKwargs(TypedDict, total=False):
"quantile",
"all",
"any",
"first",
"last",
]
"""`Expr` methods we aim to support in `DepthTrackingGroupBy`.

2 changes: 2 additions & 0 deletions narwhals/_dask/expr.py
@@ -729,6 +729,8 @@ def dt(self) -> DaskExprDateTimeNamespace:
return DaskExprDateTimeNamespace(self)

rank = not_implemented()
first = not_implemented()
last = not_implemented()

# namespaces
list: not_implemented = not_implemented() # type: ignore[assignment]
20 changes: 20 additions & 0 deletions narwhals/_duckdb/expr.py
@@ -13,8 +13,10 @@
DeferredTimeZone,
F,
col,
generate_order_by_sql,
lit,
narwhals_to_native_dtype,
sql_expression,
when,
window_expression,
)
@@ -93,6 +95,24 @@ def _window_expression(
nulls_last=nulls_last,
)

def _first(self, expr: Expression, *order_by: str) -> Expression:
# https://github.com/duckdb/duckdb/discussions/19252
order_by_sql = generate_order_by_sql(
*order_by,
descending=[False] * len(order_by),
nulls_last=[False] * len(order_by),
)
return sql_expression(f"first({expr} {order_by_sql})")

def _last(self, expr: Expression, *order_by: str) -> Expression:
# https://github.com/duckdb/duckdb/discussions/19252
order_by_sql = generate_order_by_sql(
*order_by,
descending=[False] * len(order_by),
nulls_last=[False] * len(order_by),
)
return sql_expression(f"last({expr} {order_by_sql})")

def __narwhals_namespace__(self) -> DuckDBNamespace: # pragma: no cover
from narwhals._duckdb.namespace import DuckDBNamespace

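`DuckDBExpr._first`/`_last` splice an `ORDER BY` clause into the aggregate call itself (DuckDB accepts ordered aggregates such as `first(x ORDER BY y)`) and hand the raw SQL to `sql_expression`. The sketch below shows the kind of string that results, using ascending order with nulls first to match the `descending=[False]` and `nulls_last=[False]` arguments. `ordered_aggregate_sql` is a hypothetical stand-in for `generate_order_by_sql` plus the f-string; the real helper's quoting and whitespace may differ.

```python
def _quote(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ordered_aggregate_sql(func: str, expr: str, *order_by: str) -> str:
    """Hypothetical builder for e.g. 'first("x" order by "a" asc nulls first)'."""
    if not order_by:
        # No ordering requested: a plain aggregate call.
        return f"{func}({expr})"
    # One "asc nulls first" term per key, mirroring descending=False and
    # nulls_last=False in `_first`/`_last`.
    terms = ", ".join(f"{_quote(name)} asc nulls first" for name in order_by)
    return f"{func}({expr} order by {terms})"


print(ordered_aggregate_sql("first", '"price"', "day"))
# first("price" order by "day" asc nulls first)
```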
16 changes: 10 additions & 6 deletions narwhals/_duckdb/utils.py
@@ -324,11 +324,6 @@ def window_expression(
) -> Expression:
# TODO(unassigned): Replace with `duckdb.WindowExpression` when they release it.
# https://github.com/duckdb/duckdb/discussions/14725#discussioncomment-11200348
try:
from duckdb import SQLExpression
except ModuleNotFoundError as exc: # pragma: no cover
msg = f"DuckDB>=1.3.0 is required for this operation. Found: DuckDB {duckdb.__version__}"
raise NotImplementedError(msg) from exc
pb = generate_partition_by_sql(*partition_by)
descending = descending or [False] * len(order_by)
nulls_last = nulls_last or [False] * len(order_by)
@@ -344,7 +339,7 @@ def window_expression(
rows = ""

func = f"{str(expr).removesuffix(')')} ignore nulls)" if ignore_nulls else str(expr)
return SQLExpression(f"{func} over ({pb} {ob} {rows})")
return sql_expression(f"{func} over ({pb} {ob} {rows})")


def catch_duckdb_exception(
@@ -375,3 +370,12 @@ def function(name: str, *args: Expression) -> Expression:
raise NotImplementedError(msg) from exc
return SQLExpression(f"count(distinct {args[0]})")
return F(name, *args)


def sql_expression(expr: str) -> Expression:
try:
from duckdb import SQLExpression
except ImportError as exc: # pragma: no cover
msg = f"DuckDB>=1.3.0 is required for this operation. Found: DuckDB {duckdb.__version__}"
raise NotImplementedError(msg) from exc
return SQLExpression(expr)
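`sql_expression` defers `from duckdb import SQLExpression` to call time so that a missing class becomes a readable `NotImplementedError`. Which exception arrives matters: when the `duckdb` package is installed but too old to provide `SQLExpression`, that statement raises a plain `ImportError`, while `ModuleNotFoundError` (a subclass of `ImportError`) is raised only when the module itself cannot be found. The stdlib-only check below demonstrates the difference; `import_error_class` is a hypothetical helper for illustration.

```python
from __future__ import annotations


def import_error_class(statement: str) -> type[BaseException] | None:
    """Run an import statement and report which ImportError class it raised."""
    try:
        exec(statement, {})
    except ImportError as exc:  # also catches ModuleNotFoundError
        return type(exc)
    return None


# Module present, name missing: plain ImportError (the "DuckDB too old" case).
print(import_error_class("from os import does_not_exist"))
# <class 'ImportError'>
# Module missing entirely: ModuleNotFoundError.
print(import_error_class("import module_that_does_not_exist_123"))
# <class 'ModuleNotFoundError'>
```

Catching `ImportError` therefore covers both situations, since `ModuleNotFoundError` subclasses it.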
10 changes: 10 additions & 0 deletions narwhals/_ibis/expr.py
@@ -112,6 +112,16 @@ def _window_expression(
)
return expr.over(window)

def _first(self, expr: ir.Value, *order_by: str) -> ir.Value:
return cast("ir.Column", expr).first(
order_by=self._sort(*order_by), include_null=True
)

def _last(self, expr: ir.Value, *order_by: str) -> ir.Value:
return cast("ir.Column", expr).last(
order_by=self._sort(*order_by), include_null=True
)

def __narwhals_namespace__(self) -> IbisNamespace: # pragma: no cover
from narwhals._ibis.namespace import IbisNamespace

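`IbisExpr._first`/`_last` pass `include_null=True`, so a null at the start (or end) of the ordered column is returned as null rather than skipped. This lines up with the `skip_nulls=False` option on the pyarrow group-by path and with the positional indexing in `ArrowSeries.first`/`last`. The two behaviours can be contrasted in plain Python; `first_value` and `last_value` are hypothetical helpers, not Ibis API.

```python
from collections.abc import Sequence
from typing import Any


def first_value(values: Sequence[Any], *, include_null: bool = True) -> Any:
    """First element; if include_null is False, the first non-None element."""
    for value in values:
        if include_null or value is not None:
            return value
    # Empty input, or only nulls while skipping them.
    return None


def last_value(values: Sequence[Any], *, include_null: bool = True) -> Any:
    """Same as first_value, scanning from the end."""
    return first_value(list(reversed(values)), include_null=include_null)


column = [None, 2, 3, None]
print(first_value(column), last_value(column))  # None None
print(first_value(column, include_null=False), last_value(column, include_null=False))  # 2 3
```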