Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support UDAF to align Builtin aggregate function #10493

Merged
merged 2 commits into from
May 15, 2024

Conversation

jayzhan211
Copy link
Contributor

@jayzhan211 jayzhan211 commented May 14, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

Most of the function expect the same args or the same behaviour as builtin aggregate function for UDAF.
This PR change most of them I need for #10484

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions core Core DataFusion crate substrait labels May 14, 2024
@jayzhan211 jayzhan211 changed the title Minor: Support UDAF in many places Minor: Support UDAF in many function utils May 14, 2024
Signed-off-by: jayzhan211 <[email protected]>
@github-actions github-actions bot added the optimizer Optimizer rules label May 14, 2024
@jayzhan211 jayzhan211 changed the title Minor: Support UDAF in many function utils Support UDAF in many function utils May 14, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review May 14, 2024 02:21
@jayzhan211 jayzhan211 changed the title Support UDAF in many function utils Support UDAF to align Builtin aggregate function May 14, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite cool @jayzhan211 and the code is 👌 very nice. Thank you

One potential concern I have here is that I think this PR adds new feaures (e.g. support for filter/order by in user defined aggregates, I think) but not new tests.

However, given that the point of this exercise is to port built in functions to UDAF I think all the new features will be properly tested once that porting is complete

@@ -229,12 +229,15 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
)?;
let order_by = (!order_by.is_empty()).then_some(order_by);
let args = self.function_args_to_expr(args, schema, planner_context)?;
// TODO: Support filter and distinct for UDAFs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

aggregate_function::AggregateFunction::Count,
) if args.len() == 1 && is_wildcard(&args[0]) => true,
WindowFunctionDefinition::AggregateUDF(ref udaf)
if udaf.name() == "COUNT" && args.len() == 1 && is_wildcard(&args[0]) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should document the COUNT somewhere. Also should it do a case insensitive comparison?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name should follow what is defined in Count struct. case sensitive is not needed, it is strictly comparing with fn name().

&stats.num_rows,
agg_expr.as_any().downcast_ref::<expressions::Count>(),
) {
// TODO implementing Eq on PhysicalExpr would help a lot here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html# seems to imply it is possible to to compare PhysicalExprs (perhaps expr.eq(other_expr.as_any()) 🤔 )

I don't know PhysicalExpr doesn't implement PartialEq directly

pub trait PhysicalExpr: ... PartialEq<dyn Any> {`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not take a look carefully to the comment here, since they are not needed after removing builtin function

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented May 15, 2024

However, given that the point of this exercise is to port built in functions to UDAF I think all the new features will be properly tested once that porting is complete

Don't worry, the test should be covered after porting the related test, if not I will add it.

@jayzhan211
Copy link
Contributor Author

Thanks @alamb

@jayzhan211 jayzhan211 merged commit 1634602 into apache:main May 15, 2024
27 checks passed
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* align udaf and builtin

Signed-off-by: jayzhan211 <[email protected]>

* add more

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants