feat: add `Series|Expr.rank` #1342

FBruzzesi · 2024-11-09T20:41:41Z

What type of PR is this? (check all applicable)

Related issues

Related issue [Enh]: Support polars.Expr.rank #1323

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below.

This PR provides initial support for the `rank` method. I am opening it as a draft because of several shortcomings:

pandas:
- there is a (nasty) trick to make it work with nulls/NaNs and nullable dtypes (related to "BUG: rank does not respect na_option='keep' for numpy nullable integer dtypes", which should be fixed in pandas 3). I will comment on the trick in the code.
- support in a group by context seems quite hard, or at least I could not find a clear way to achieve it, because there is no way to forward the rank arguments. The API to rank on a grouped object is `df.groupby(keys)[cols].rank(*args, **kwargs)`, and this does not even return an aggregated value. Maybe we could support it if that is the only expression passed in the context, yet we would still need to figure out how to pass the arguments along. This is relevant since `over` is implemented as a `group_by` under the hood.
pyarrow:
- does not support the polars default method (namely, `"average"`), so calling `rank` without specifying another method ends up raising an error
- does not implement rank in a group by context
- I am using `pyarrow.compute.rank`, which is available but not exposed/documented (?)
dask: it just does not support ranking
polars:
- the group by context always returns an aggregate, in this case a list of ranks, which is fairly useless since it is just an increasing/decreasing range up to the size of the group
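To illustrate the pandas group by point above, here is a minimal sketch of the grouped rank call. The frame and the column names `g` and `x` are made up for illustration. It only shows that the result has one value per input row rather than one value per group.

```python
import pandas as pd

# Hypothetical data: two groups, with a tie inside group "b".
df = pd.DataFrame({"g": ["a", "a", "b", "b", "b"], "x": [10, 30, 5, 5, 1]})

# The rank arguments go straight into the grouped `rank` call.
ranked = df.groupby("g")["x"].rank(method="min", ascending=True)

# The result is aligned with the original rows, so it is a transformation,
# not an aggregation: there is one rank per row, not one value per group.
print(ranked.tolist())
```

This is why a generic "aggregate per group" code path does not fit `rank` directly: the method and ordering arguments need to reach this call, and the output keeps the original length.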

FBruzzesi · 2024-11-09T20:45:33Z

narwhals/_pandas_like/series.py

+            # crazy workaround for the case of `na_option="keep"` and nullable
+            # integer dtypes. This should be supported in pandas > 3.0
+            # https://github.com/pandas-dev/pandas/issues/56976


Here is the workaround.

@MarcoGorelli I was not able to properly use the pandas-like util function `get_dtype_backend` to figure out the nullable backend. It should not really matter, since a non-nullable backend would not produce an integer dtype anyway if the series contains nulls.
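For readers who want to see the general idea of keeping nulls out of the ranking, here is a minimal sketch. It is not necessarily the workaround used in this PR: the helper name `rank_keep_nulls` is hypothetical, and it assumes the series has a unique index so that `reindex` can restore the original positions.

```python
import pandas as pd


def rank_keep_nulls(
    ser: pd.Series, method: str = "average", descending: bool = False
) -> pd.Series:
    """Rank only the non-null values and leave nulls as missing.

    Hypothetical helper; assumes `ser` has a unique index.
    """
    # Rank the non-null values on their own, so nulls cannot affect the ranks.
    ranked = ser.dropna().rank(method=method, ascending=not descending)
    # Put the ranks back in their original positions; the rows that were
    # dropped come back as missing values.
    return ranked.reindex(ser.index)


s = pd.Series([3, None, 1, 3], dtype="Int64")
out = rank_keep_nulls(s, method="min")
```

The resulting dtype may differ between pandas versions, so code relying on this sketch should cast explicitly if a specific dtype is required.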

FBruzzesi · 2024-11-09T20:47:00Z

tests/expr_and_series/rank_test.py

+    constructor: Constructor,
+    method: Literal["average", "min", "max", "dense", "ordinal"],
+) -> None:
+    if "polars" not in str(constructor):
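As a reference for the five `method` values in the test signature above, here is a small pure-Python sketch of how ties are usually resolved by each one. The function `rank_values` is hypothetical and is not part of narwhals or any backend. It ignores nulls and descending order.

```python
from typing import List, Sequence

METHODS = ("average", "min", "max", "dense", "ordinal")


def rank_values(values: Sequence[float], method: str = "average") -> List[float]:
    """Hypothetical reference implementation of ascending ranks without nulls."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    # Stable sort of the positions, so ties keep their original order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    dense = 0
    pos = 0
    while pos < len(order):
        # Find the end of the run of equal values starting at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        dense += 1
        lo, hi = pos + 1, end + 1  # 1-based first and last position of the run
        for offset, idx in enumerate(order[pos : end + 1]):
            if method == "average":
                ranks[idx] = (lo + hi) / 2
            elif method == "min":
                ranks[idx] = float(lo)
            elif method == "max":
                ranks[idx] = float(hi)
            elif method == "dense":
                ranks[idx] = float(dense)
            else:  # "ordinal": ties are broken by order of appearance
                ranks[idx] = float(lo + offset)
        pos = end + 1
    return ranks


data = [3, 1, 3, 2]
results = {m: rank_values(data, m) for m in METHODS}
```

For the tied value `3`, `"average"` gives both rows the mean of positions 3 and 4, while `"ordinal"` assigns 3 and 4 in order of appearance.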


FBruzzesi · 2024-11-09T20:56:41Z

Hey @adamblake, this is an initial implementation to support rank. In the description I tried to explain all the shortcomings and the challenges I am facing.

FBruzzesi added 6 commits November 6, 2024 16:01

WIP

1e0d4ae

WIP

ebf4321

WIPWIP

e60214d

merge main

ea13f0c

pandas int workaround

cbc13b5

comma?

8b492d5

github-actions bot added the enhancement New feature or request label Nov 9, 2024

FBruzzesi changed the title ~~feat: ass Series|Expr.rank~~ feat: add Series|Expr.rank Nov 10, 2024

FBruzzesi added 3 commits November 10, 2024 10:49

Merge branch 'main' into feat/expr-rank

cafed4b

merge main, test invalid method

4c8cc1b

old pyarrow

ec0f8a7
