Skip to content

chore: remove more kwargs from dask, dont treat ddof as expressifiable arg#2059

Merged
MarcoGorelli merged 5 commits intonarwhals-dev:mainfrom
MarcoGorelli:more-kwargs
Feb 21, 2025
Merged

chore: remove more kwargs from dask, dont treat ddof as expressifiable arg#2059
MarcoGorelli merged 5 commits intonarwhals-dev:mainfrom
MarcoGorelli:more-kwargs

Conversation

@MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Feb 20, 2025

I don't like how currently we're muddying the waters between "expressifiable arguments" and "kwargs", which leads us to having to pass ddof as an expressifiable argument in std and var

I'd like to start by separating the concepts, and not passing ddof as an expressifiable argument

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

  • Related issue #<issue number>
  • Closes #<issue number>

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@MarcoGorelli MarcoGorelli marked this pull request as ready for review February 20, 2025 18:57
# First argument to `call` should be `dx.Series`
call: Callable[..., dx.Series],
expr_name: str,
kwargs: dict[str, Any] | None = None,
Copy link
Collaborator

@EdAbati EdAbati Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we call this call_kwargs i.e. kwargs for the call function ?

(Just to clarify these kwargs are not passed to _from_call itself)

Copy link
Member

@dangotbanned dangotbanned Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EdAbati Couldn't you just pass call as a partial in the cases where this is needed?


  • Avoids needing this parameter entirely
  • "binding" of arguments is done once, instead of on each use
  • Also supports positional arguments

So instead of call, call_kwargs - you'd have call.func, call.keywords.
But the keywords themselves aren't exposed here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know that functools.partial stored keywords like that, thanks!

Copy link
Member

@dangotbanned dangotbanned Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know that functools.partial stored keywords like that, thanks!

You might need to use operator.methodcaller instead, if you don't know the class that would be passed.

E.g. partial would work if you could do partial(dx.Series.std, ddof=ddof).
But where it might be dx._groupby.GroupBy.std, using the method name would be more widely accepted

def std(ddof: int = 1) -> _AggFn:
return partial(_DaskGroupBy.std, ddof=ddof)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you don't know the class that would be passed.

Is that why you're storing function_name, in addition to the lambda?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that's right

Comment on lines 312 to 317
def std(self: Self, ddof: int) -> Self:
return self._from_call(
lambda _input, ddof: _input.std(ddof=ddof).to_series(), "std", ddof=ddof
lambda _input: _input.std(ddof=ddof).to_series(),
"std",
kwargs={"ddof": ddof},
)
Copy link
Member

@dangotbanned dangotbanned Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #2059 (comment)

and

https://github.com/MarcoGorelli/narwhals/blob/52b80b240f0fc0e7eb069a84f0ee001f218bedaf/narwhals/_dask/expr.py#L51-L53

        # Kwargs with metadata which we may need in group-by agg
        # (e.g. `ddof` for `std` and `var`).
        kwargs: dict[str, Any] | None = None,

I'm not following why kwargs is used at all here:

def std(ddof: int = 1) -> _AggFn:
return partial(_DaskGroupBy.std, ddof=ddof)

kwargs: dict[str, Any] = (
{"ddof": expr._kwargs["ddof"]} if function_name in {"std", "var"} else {} # type: ignore[attr-defined]
)

Am I reading this wrong, or are the same kwargs applied twice?

@MarcoGorelli
Copy link
Member Author

thanks both for comments!

i'd like to refactor this mechanism more broadly, so i'll hold off functools gymnastics for the time being (even though it does look really nice!), but definitely interested in revisiting in the future

Copy link
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good @MarcoGorelli

@MarcoGorelli
Copy link
Member Author

thanks all!

@MarcoGorelli MarcoGorelli merged commit f0a5fbb into narwhals-dev:main Feb 21, 2025
28 checks passed
dangotbanned added a commit that referenced this pull request Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants