[SPARK-16391][SQL] KeyValueGroupedDataset.reduceGroups should support partial aggregation #14222
implicit val resultEncoder = ExpressionEncoder.tuple(kExprEnc, vExprEnc)
flatMapGroups(func)
def zero: (Int, V) = (0, null.asInstanceOf[V])
One problem with using Aggregator here is the zero value: a reduce function has no identity element to start the buffer from. This PR uses an Int (a Boolean would also work) to indicate whether the buffer has been initialized.
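A minimal sketch of the flag-based buffer described in this comment, assuming a simplified, Spark-free stand-in for Aggregator's zero/reduce/merge/finish methods (encoders omitted). SimpleAggregator and FlaggedReduceAggregator are hypothetical names, and the sketch uses a Boolean flag instead of the Int in the PR; it is not the PR's actual code.

```scala
// Hypothetical, Spark-free stand-in for the zero/reduce/merge/finish contract
// of org.apache.spark.sql.expressions.Aggregator (encoders are omitted).
trait SimpleAggregator[IN, BUF, OUT] {
  def zero: BUF
  def reduce(b: BUF, a: IN): BUF
  def merge(b1: BUF, b2: BUF): BUF
  def finish(b: BUF): OUT
}

// A reduce function has no identity element, so the buffer carries a flag
// recording whether it already holds a real value.
class FlaggedReduceAggregator[T](func: (T, T) => T)
    extends SimpleAggregator[T, (Boolean, T), T] {

  // Empty buffer: the placeholder is never read while the flag is false.
  def zero: (Boolean, T) = (false, null.asInstanceOf[T])

  // The first input simply initializes the buffer; later inputs are combined.
  def reduce(b: (Boolean, T), a: T): (Boolean, T) =
    if (b._1) (true, func(b._2, a)) else (true, a)

  // Merging with an empty buffer returns the other side unchanged.
  def merge(b1: (Boolean, T), b2: (Boolean, T)): (Boolean, T) =
    if (!b1._1) b2
    else if (!b2._1) b1
    else (true, func(b1._2, b2._2))

  // Reducing zero inputs has no meaningful result, so fail loudly.
  def finish(b: (Boolean, T)): T = {
    require(b._1, "reduce requires at least one input value")
    b._2
  }
}
```

For example, with new FlaggedReduceAggregator[Int](_ + _), folding Seq(1, 2, 3) and Seq(4, 5) separately from zero and then merging the two buffers yields 15, while merging an untouched zero buffer with either side leaves that side unchanged.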
Pull this out into a generic ReduceAggregator and add a unit test for it?
ok. I will update this.
Test build #62375 has finished for PR 14222 at commit
There is a usefulness to this.
4ba124c to 7e8d8c1
Test build #62454 has finished for PR 14222 at commit
Test build #62456 has finished for PR 14222 at commit
ping @rxin Is this change OK with you? Please review it. Thanks.
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.ReduceAggregator

class ReduceAggregatorSuite extends SparkFunSuite {
just put this in DatasetAggregatorSuite
Do I still need to update this, given that you want to take it over?
I updated it.
@viirya I'm going to take over the PR and play with the API a little bit.
OK.
Test build #62594 has finished for PR 14222 at commit
@rxin Is there anything I need to update for this? Thanks.
ping @rxin |
ping @rxin Any thoughts on this? It has been waiting for a while. Thanks.
I've created a PR here for discussion, based on my playing with the API: #14576
Closing this now since PR #14576 has been merged.
What changes were proposed in this pull request?
KeyValueGroupedDataset.reduceGroups is currently implemented via flatMapGroups, which does not support partial aggregation: every value for a key must be shuffled to a single task before it is reduced, so it is very inefficient. KeyValueGroupedDataset.reduceGroups should support partial aggregation. This PR implements it with Aggregator.
How was this patch tested?
Existing tests.
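To illustrate the inefficiency the description refers to, here is a minimal plain-Scala model (no Spark involved) that counts how many rows cross a simulated shuffle with and without map-side partial aggregation. PartialAggDemo, reduceWithoutPartial and reduceWithPartial are hypothetical names; this is a sketch of the idea, not how Spark's physical plan is implemented.

```scala
// Hypothetical model: each inner Seq is one partition of (key, value) rows.
object PartialAggDemo {
  type Partition = Seq[(String, Int)]

  // Reduce all values of each key with f.
  private def reduceByKey(rows: Seq[(String, Int)], f: (Int, Int) => Int): Map[String, Int] =
    rows.groupBy(_._1).map { case (k, kv) => k -> kv.map(_._2).reduce(f) }

  // flatMapGroups-style path: every row is shuffled to the task owning its key,
  // and reduction happens only after the shuffle. Returns (result, rows shuffled).
  def reduceWithoutPartial(parts: Seq[Partition], f: (Int, Int) => Int): (Map[String, Int], Int) = {
    val shuffled = parts.flatten
    (reduceByKey(shuffled, f), shuffled.size)
  }

  // Partial aggregation: each partition first reduces its own rows per key, so at
  // most one row per key per partition is shuffled; partial results are then merged.
  def reduceWithPartial(parts: Seq[Partition], f: (Int, Int) => Int): (Map[String, Int], Int) = {
    val shuffled = parts.flatMap(p => reduceByKey(p, f).toSeq)
    (reduceByKey(shuffled, f), shuffled.size)
  }
}
```

With two partitions Seq(("a", 1), ("a", 2), ("b", 3)) and Seq(("a", 4), ("b", 5), ("b", 6)) and f = _ + _, both functions produce Map("a" -> 7, "b" -> 14), but the first shuffles all 6 rows while the second shuffles only 4 partial results; the gap grows with the number of rows per key. Combining partial results in this way is only safe when the reduce function is associative and commutative, which the reduceGroups documentation already requires.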