Conversation
- Very barebones to start with - Mostly just defining the current impl in a generic way - There are some quirks I'd like to iron out before merging
narwhals/_compliant/when_then.py
Outdated
| def otherwise( | ||
| self, value: CompliantExprT | CompliantSeriesOrNativeExprT_co | _Scalar, / | ||
| ) -> CompliantExprT: | ||
| self._call._otherwise_value = value | ||
| self._function_name = "whenotherwise" | ||
| return cast("CompliantExprT", self) |
There was a problem hiding this comment.
I've been thinking about this for a while.
@MarcoGorelli would you be able to help me understand what is going on here?
For reference, I'm just mirroring what is in (Arrow|Pandas)Then
narwhals/narwhals/_arrow/namespace.py
Lines 388 to 391 in 29c0f6c
Side note for DuckDBThen
I think that is typing issue would be fixed by annotating DuckDB._call: DuckDBWhen
narwhals/narwhals/_duckdb/namespace.py
Lines 320 to 326 in 29c0f6c
1. Mutability
*Then.otherwise mutates both *When and *Then.
I expected that would cause an issue for something like:
Code block
import pandas as pd
import polars as pl
import narwhals as nw
df = pl.DataFrame({"a": ["A", "B", "A"], "b": [1, 2, 3], "c": [9, 2, 4], "d": [8, 7, 8]})
nw_pd = nw.from_native(nw.from_native(df).to_pandas())
when = nw.when(nw.col("a") == "B")
when_then = when.then("b")
when_then_otherwise = when.then("b").otherwise("c")
nw_pd.with_columns(
when_then.alias("e"),
when_then_otherwise.alias("f"),
when_then.otherwise("d").alias("g"),
when.then(nw.col("c") * 10).otherwise(nw.col("c") * 100).alias("h"),
)But it all worked 🎉!
Output
┌─────────────────────────────┐
| Narwhals DataFrame |
|-----------------------------|
| a b c d e f g h|
|0 A 1 9 8 NaN 9 8 900|
|1 B 2 2 7 2.0 2 2 20|
|2 A 3 4 8 NaN 4 8 400|
└─────────────────────────────┘However, AFAICT we're depending on the implementation details of nw.(When|Then) to make this safe.
My best guess is we get new function(s) (and therefore Compliant* objects) on every narwhals-level method call?
Suggestions
If we change nothing
It might be helpful to add some tests for this kind of thing to avoid unintentionally regressing.
The tests don't appear to cover it
The example code I've given was arbitrary, but the pattern of reusing parts of a when-then-otherwise is something I've done pretty frequently in altair:
- https://altair-viz.github.io/gallery/dot_dash_plot.html
- https://altair-viz.github.io/gallery/multiline_tooltip_standard.html
- https://altair-viz.github.io/gallery/scatter_point_paths_hover.html
.otherwise(...)
If we returned a new object - I think that would be more consistent with the rest of the Compliant* API
2. function_name
Is this solely to align with the CompliantExpr._function_name requirement?
Every backend defines this, but I haven't found anywhere that utilizes the distinction between the two:
class SomeWhen:
def then(self, value):
return SomeThen(..., function_name="whenthen", ...)
class SomeThen:
def otherwise(self, value):
...
self._function_name = "whenotherwise"
return self3. otherwise() -> CompliantExpr
I'm assuming the intent is to "erase" the fact that this returns CompliantThen?
At this stage we have a Callable[[Frame], Sequence[Series]].
So, we could go a step further and actually wrap it in the parent class?
Note
All of this will need to be rewritten either way, if anyone picks up chaining again (#669)
There was a problem hiding this comment.
so we can get rid of tracking
function_nameat the compliant level
@MarcoGorelli we're getting a bit too in sync lol
There was a problem hiding this comment.
sure, sounds good to returns a new object and avoid mutating
I've held off allowing chaining of when-then statements until the current implementation is better, so definitely happy to make steps in the right direction
There was a problem hiding this comment.
sure, sounds good to returns a new object and avoid mutating
I've held off allowing chaining of when-then statements until the current implementation is better, so definitely happy to make steps in the right direction
@MarcoGorelli lovely
Should be quite easy to tweak after this PR 😎
- Skipping `__init__` to internally handle a generic `CompliantExpr` - Should mean we only need: - `CompliantWhen.__call__(...)` - Stub `CompliantThen` as a mixin with the main `*Expr` class - Return that type from `CompliantWhen._then`
| self._call._otherwise_value = value | ||
| self._function_name = "whenotherwise" | ||
| return self | ||
| class ArrowThen(CompliantThen[ArrowDataFrame, ArrowSeries, ArrowExpr], ArrowExpr): ... |
There was a problem hiding this comment.
Before anyone asks - yes this is a fully working implementation 😄
narwhals/_compliant/when_then.py
Outdated
| def __call__( | ||
| self, compliant_frame: CompliantFrameT, / | ||
| ) -> Sequence[CompliantSeriesOrNativeExprT]: ... |
There was a problem hiding this comment.
TODO
Try to find common parts and split up their implementation
- Ideally be able to define more as private methods here
- Less work for subclasses
There was a problem hiding this comment.
This will allow most of (Arrow|Pandas)Then.__call__ to be the same
There was a problem hiding this comment.
Wasn't expecting I'd be able to reduce it down this much ...
There was a problem hiding this comment.
Note
Mentioned in (b0b1a05) that I'm tentative on LazyWhen
Happy to merge as-is, but its just the least-bad idea I had so far
- Reserve `_when`, `_then` for types - Always use `_*_value` for expression fragments stored on an instance - Positional-only, non-prefixed args for the method/function they correspond to #2261 (comment)
- The annotations in `when_then` will be much easier to read by just defining the `TypeVar`(s) internally - Also means one less place to keep the type params in sync
First step towards (#2261 (comment))
Using `Incomplete` for now, instead of adding more `TypeVar`s
- Need to sleep on this one - It works, but seems like quite a battle to unify them
Compliant(When|Then)Compliant(When|Then)
| self._call._otherwise_value = value | ||
| self._function_name = "whenotherwise" | ||
| return self | ||
| class ArrowThen(CompliantThen[ArrowDataFrame, ArrowSeries, ArrowExpr], ArrowExpr): ... |
Thanks @MarcoGorelli 😍 Wasn't sure if I'd gone off the deep end - but I'm really happy with how clean each of the backends are looking now |
| def maybe_evaluate_expr( | ||
| df: DuckDBLazyFrame, obj: DuckDBExpr | object | ||
| ) -> duckdb.Expression: | ||
| from narwhals._duckdb.expr import DuckDBExpr | ||
|
|
||
| if isinstance(obj, DuckDBExpr): | ||
| column_results = obj._call(df) | ||
| assert len(column_results) == 1 # debug assertion # noqa: S101 | ||
| return column_results[0] | ||
| return lit(obj) |
| @classmethod | ||
| def from_when( | ||
| cls, | ||
| when: CompliantWhen[FrameT, SeriesT, ExprT], | ||
| then: IntoExpr[SeriesT, ExprT], | ||
| /, | ||
| ) -> Self: | ||
| when._then_value = then | ||
| obj = cls.__new__(cls) | ||
| obj._call = when | ||
| obj._when_value = when | ||
| obj._depth = 0 | ||
| obj._function_name = "whenthen" | ||
| obj._evaluate_output_names = getattr( | ||
| then, "_evaluate_output_names", lambda _df: ["literal"] | ||
| ) | ||
| obj._alias_output_names = getattr(then, "_alias_output_names", None) | ||
| obj._implementation = when._implementation | ||
| obj._backend_version = when._backend_version | ||
| obj._version = when._version | ||
| obj._call_kwargs = {} | ||
| return obj |
There was a problem hiding this comment.
Important
Probably going to try something like this for _compliant.selectors.py
Will wait for (#2266) to land first though

What type of PR is this? (check all applicable)
Related issues
Compliant*protocols #2230CompliantNamespaceprotocol #2202 (comment)Checklist
Tasks
CompliantWhenCompliantThenArrow(When|Then)Dask(When|Then)DuckDB(When|Then)Pandas(When|Then)SparkLike(When|Then)CompliantWhen.__call__(See thread)EagerWhen(DuckDB|SparkLike)WhenDaskWhenDepthTracking...idea (chore: TrackExpansionKindinExprMetadata#2266 (comment))