Skip to content

fix: GroupBy with mixed length exprs as keys#3004

Merged
FBruzzesi merged 2 commits intomainfrom
fix/group-by-mixed-expr
Aug 17, 2025
Merged

fix: GroupBy with mixed length exprs as keys#3004
FBruzzesi merged 2 commits intomainfrom
fix/group-by-mixed-expr

Conversation

@FBruzzesi
Copy link
Member

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Bug spotted and reported in #3003 (comment)

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

@FBruzzesi FBruzzesi added the fix label Aug 17, 2025

output_names = _evaluate_aliases(compliant_frame, keys)

output_names_grouped = [expr._evaluate_aliases(compliant_frame) for expr in keys]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone has a better variable name I am open to suggestions. Meaning is: this is not flattened

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone has a better variable name I am open to suggestions

aliases_(per|by)_expr?

I'm mostly just confused about using output_names for this

def _evaluate_aliases(
self: CompliantExpr[CompliantFrameT, Any], frame: CompliantFrameT, /
) -> Sequence[str]:
names = self._evaluate_output_names(frame)
return alias(names) if (alias := self._alias_output_names) else names

# otherwise it's single named and we can use Expr.alias
else key.alias(_temporary_name(new_name))
for key, new_name in zip(keys, output_names)
else key.alias(_temporary_name(new_name[0]))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's not multi output, new_name is guaranteed to have length 1

@FBruzzesi FBruzzesi added the high priority Your PR will be reviewed very quickly if you address this label Aug 17, 2025
@FBruzzesi FBruzzesi changed the title fix: GroupBy mixed length exprs keys fix: GroupBy with mixed length exprs as keys Aug 17, 2025
Comment on lines +117 to +118
else key.alias(_temporary_name(new_name[0]))
for key, new_name in zip(keys, output_names_grouped)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since they aren't flattened anymore, new_name makes it seem like we're renaming to the first character 😳

Suggested change
else key.alias(_temporary_name(new_name[0]))
for key, new_name in zip(keys, output_names_grouped)
else key.alias(_temporary_name(new_names[0]))
for key, new_names in zip(keys, output_names_grouped)

@dangotbanned dangotbanned added pyarrow Issue is related to pyarrow backend pandas-like Issue is related to pandas-like backends dask Issue is related to dask backend labels Aug 17, 2025
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks @FBruzzesi for the quick fix

@FBruzzesi FBruzzesi merged commit a4d8ebf into main Aug 17, 2025
31 checks passed
@FBruzzesi FBruzzesi deleted the fix/group-by-mixed-expr branch August 17, 2025 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dask Issue is related to dask backend fix high priority Your PR will be reviewed very quickly if you address this pandas-like Issue is related to pandas-like backends pyarrow Issue is related to pyarrow backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants