feat: Adds CompliantGroupBy#2252

Merged: MarcoGorelli merged 28 commits into `main` from `compliant-group-by` on Mar 20, 2025

@dangotbanned (Member) commented Mar 19, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

  • ArrowGroupBy
  • DaskLazyGroupBy
  • PandasLikeGroupBy
    • Partially in (9637662)
    • Looks a bit too fragile, for me to feel confident changing anything else
  • LazyGroupBy
    • DuckDBGroupBy
    • SparkLikeLazyGroupBy

Needed for `__init__` support
- Dropped `native` as it got too complex
- I've done this fake `Any` thing to make `mypy` understand in multiple places
- Having a hard time working out what is going on here
- All I've changed is what the refs are named
Much happier with this than the `pandas` one
- Shortened in each backend
- Added docs
- Accounts for `dask` deviation from `str`
- More performant to compile a single pattern and reuse everywhere
- Gives a name to a common op
- Backend code is shorter
Keeps this part of `pandas` in sync, despite it doing some extra name stuff
- `_duckdb` and `_spark_like` don't need these parts (only `_dask` does)
- Also avoids needing a `TypeVar` default, which caused some issues in https://github.com/narwhals-dev/narwhals/actions/runs/13970848097/job/39111959826?pr=2252
- These two are almost identical
- Trying to reduce them as much as possible, before moving the common parts to `nw._compliant.LazyGroupBy`
- Greatly simplifies what each backend needs to implement
- Avoids creating and combining intermediate lists
- Avoids performing a `not in exclude` check, where `exclude` is empty
- Identified and documented a new common method `CompliantExpr._is_multi_output_agg`
- Will need to make an (invariant) `CompliantExprT` to type the second part correctly
- Believe that is also causing this weird error

```
Argument of type "CompliantExprT_contra@CompliantDataFrame" cannot be assigned to
parameter "exprs" of type "CompliantExprT_contra@CompliantDataFrame" in function "select"
  Type
  "CompliantExprT_contra@CompliantDataFrame" is not assignable to type
  "CompliantExprT_contra@CompliantDataFrame"

Pylance(reportArgumentType)
```
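The invariant versus contravariant distinction mentioned above can be sketched with the standard `typing` module. This is only an illustration under stated assumptions: `Expr`, `Frame`, and `ExprBox` are hypothetical stand-ins rather than narwhals classes, and the sketch does not attempt to reproduce the Pylance error.

```python
from typing import Generic, TypeVar


class Expr:
    """Hypothetical stand-in for an expression type."""


# Contravariant: only sound in "input" positions such as method parameters.
ExprT_contra = TypeVar("ExprT_contra", bound=Expr, contravariant=True)
# Invariant (the default): may appear in both parameters and return types.
ExprT = TypeVar("ExprT", bound=Expr)


class Frame(Generic[ExprT_contra]):
    """Hypothetical frame that only consumes expressions."""

    def select(self, *exprs: ExprT_contra) -> int:
        return len(exprs)


class ExprBox(Generic[ExprT]):
    """Hypothetical container that both accepts and returns an expression."""

    def __init__(self, expr: ExprT) -> None:
        self._expr = expr

    def get(self) -> ExprT:
        return self._expr
```

A type checker would reject `ExprT_contra` as the return type of `get`, which is why a signature that both takes and returns the expression type needs the invariant variable.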
@dangotbanned dangotbanned changed the title feat(DRAFT): CompliantGroupBy feat: CompliantGroupBy Mar 20, 2025
@dangotbanned dangotbanned changed the title feat: CompliantGroupBy feat: Adds CompliantGroupBy Mar 20, 2025
@dangotbanned dangotbanned marked this pull request as ready for review March 20, 2025 20:55
@dangotbanned dangotbanned linked an issue Mar 20, 2025 that may be closed by this pull request
@dangotbanned dangotbanned requested review from EdAbati, FBruzzesi and MarcoGorelli and removed request for EdAbati, FBruzzesi and MarcoGorelli March 20, 2025 21:44
@MarcoGorelli (Member) left a comment

well done, love this

else output_names
)
native_exprs = expr(self.compliant)
if expr._is_multi_output_agg():
Member

https://github.com/narwhals-dev/narwhals/pull/2246/files kinda ties in here...i'll update that later, i like what you've done here anyway

Member Author

Ah nice we were pretty close to landing on the same name

Member Author

@MarcoGorelli another one that might be cleaner as a method:

is_elementary_expression -> CompliantExpr._is_elementary

Member Author

I can't tell if you mean the same thing as elementwise or just a fancy way of saying simple

Member

by 'elementwise' i mean something that operates on each row independently of all the other rows. something like Expr.abs

'elementary' i just meant that it only does a single operation - so, "fancy way of saying simple" seems accurate 😄
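
A minimal sketch of the "elementwise" definition above, using plain Python lists instead of any real backend. The function names are hypothetical; the point is only that an elementwise operation gives the same answer for a row regardless of which other rows are present, while a non-elementwise one does not.

```python
def elementwise_abs(rows):
    """Each output depends only on the matching input row (like Expr.abs)."""
    return [abs(value) for value in rows]


def minus_max(rows):
    """Not elementwise: every output depends on the maximum over all rows."""
    largest = max(rows)
    return [value - largest for value in rows]


rows = [-3, 1, -2]

# Dropping rows does not change the result for the rows that remain.
same_for_subset = elementwise_abs(rows[:1]) == elementwise_abs(rows)[:1]

# Here the first row's result changes once the other rows are removed.
differs_for_subset = minus_max(rows[:1]) != minus_max(rows)[:1]
```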

Comment on lines +30 to 33
agg_columns = list(chain(self._keys, self._evaluate_exprs(exprs)))
return self.compliant._from_native_frame(
self.compliant.native.aggregate(agg_columns) # type: ignore[arg-type]
)
Member

cool, this now looks super-simple!
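
The `chain` pattern in the snippet above can be sketched in isolation. This assumes the evaluation step is a generator; `evaluate_exprs` and `agg_columns` below are hypothetical stand-ins working on strings, not the backend code from this PR.

```python
from __future__ import annotations

from itertools import chain
from typing import Iterable, Iterator, Sequence


def evaluate_exprs(exprs: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in: lazily yield one aggregation per expression."""
    for expr in exprs:
        yield f"sum({expr})"


def agg_columns(keys: Sequence[str], exprs: Iterable[str]) -> list[str]:
    # chain() walks the keys and then the generator, so the only list built
    # is the final one; no intermediate lists are created and concatenated.
    return list(chain(keys, evaluate_exprs(exprs)))
```

For example, `agg_columns(["a"], ["b", "c"])` produces the key followed by one entry per expression.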

keys: list[str],
drop_null_keys: bool, # noqa: FBT001
df: SparkLikeLazyFrame,
keys: Sequence[str],
Member

maybe we should call these by, for consistency with polars? ok to do separately anyway

Member Author

Yeah, no objections from me

I just picked the name used in most of the implementations

@MarcoGorelli MarcoGorelli merged commit 29c0f6c into main Mar 20, 2025
31 of 32 checks passed
@MarcoGorelli MarcoGorelli deleted the compliant-group-by branch March 20, 2025 21:48
@dangotbanned (Member Author)

Thanks for such a quick review @MarcoGorelli ♥️

Development

Successfully merging this pull request may close these issues.

[surviving mutant] in groupby.agg (pyarrow)

2 participants