feat: add DataFrame.top_k and LazyFrame.top_k #2977
MarcoGorelli merged 17 commits into narwhals-dev:main
def top_k(
    self, k: int, *, by: str | Iterable[str], reverse: bool | Sequence[bool] = False
) -> Self:
    flatten_by = flatten([by])
Can we add a check that raises an exception if reverse is a sequence and its length differs from that of flatten_by? This guarantees that zip(by, reverse) at the compliant level behaves the same as zip_strict.
From polars:
df = pl.DataFrame(
{
"a": ["a", "b", "a", "b", "b", "c"],
"b": [2, 1, 1, 3, 2, 1],
}
)
df.top_k(4, by=["b", "a"], reverse=[True])
ValueError: the length of `reverse` (1) does not match the length of `by` (2)
@raisadz I would still prefer to add a check at this level to also align the error with polars (notice that the output of flatten is a list anyway), but feel free to merge. We can follow up on it
i think there's some other places where this would be useful (like sort) so we could probably make a validation utility for this and use it in multiple places
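A shared validation utility along these lines could look roughly like the sketch below. The name `normalize_reverse` and its signature are hypothetical, not something from the narwhals codebase; the error text simply mirrors the polars message quoted earlier in the thread.

```python
from collections.abc import Sequence
from typing import Union


def normalize_reverse(
    by: Sequence[str], reverse: Union[bool, Sequence[bool]]
) -> list[bool]:
    """Hypothetical helper: return one `reverse` flag per `by` column."""
    if isinstance(reverse, bool):
        # A single flag applies to every column.
        return [reverse] * len(by)
    flags = list(reverse)
    if len(flags) != len(by):
        msg = (
            f"the length of `reverse` ({len(flags)}) "
            f"does not match the length of `by` ({len(by)})"
        )
        raise ValueError(msg)
    return flags
```

Both `top_k` and `sort` could call such a helper before handing `by` and `reverse` to the backend, so every backend sees lists of equal length.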
narwhals/_duckdb/dataframe.py
Outdated
    return self._with_native(self.native.sort(*it))

def top_k(self, k: int, *, by: Iterable[str], reverse: bool | Sequence[bool]) -> Self:
    df = self.native  # noqa: F841
If you prefix the variable name with an underscore (_df) you can avoid the # noqa: F841 flag. It's hacky I know
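A small sketch of the two spellings, assuming the linter is Ruff with its default `dummy-variable-rgx` setting (which treats underscore-prefixed names as intentionally unused). The functions and the SQL strings are illustrative only, on the assumption that the local variable is referenced by name inside a query string rather than in Python code.

```python
def query_with_noqa(native: object) -> str:
    # The local is never read in Python code, so F841 must be silenced explicitly.
    df = native  # noqa: F841
    return "select * from df"


def query_with_underscore(native: object) -> str:
    # An underscore-prefixed name matches Ruff's default dummy-variable pattern,
    # so no noqa comment is needed.
    _df = native
    return "select * from _df"
```

Note that any string referring to the variable by name has to be renamed along with it.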
Co-authored-by: Francesco Bruzzesi <42817048+FBruzzesi@users.noreply.github.com>
Thanks for the review @FBruzzesi! I addressed your comments and will add

merging then, i've opened #3026 for a follow-up, thanks all for comments!

Related issues
{DataFrame/LazyFrame}.top_k #2947