Track result sizes of partial aggregate evaluation in HBO #21160
arhimondr merged 1 commit into prestodb:master.
Conversation
remove debug printf
Isn't the inputBytes here the same as the inputBytes above?
yes, I think you are right, the inputs are the same and only the outputs are different (one is the output of partial agg, and the other output of the final agg)
The input bytes is available by checking the output of the aggregation input, hence not necessary here?
actually these are different: in one case it's the input to the final agg (available from the child node), in the other case it's the input to the partial agg (which is a node several levels down the query tree: we'll cache it at the level of the final agg)
I think this is the output of the partial aggregation node? Maybe either rename this to partialAggregationOutputBytes or rename this class to PartialAggregationNodeStatsEstimate to make it clearer?
makes sense, will rename
This whole block can be moved to the beginning of the for loop?
The for loop can exit early once a match is found?
Is it possible to have two aggregations with the same grouping sets hence match a different one?
We need a better way to find which partial aggregate corresponds to which final one. I found cases where the grouping columns get renamed in the middle (perhaps by a different transformation), so we can no longer match the aggregation nodes based on grouping columns.
I'm thinking of matching them either by planNode.sourceLocation or by introducing another field on aggregation nodes, aggregationId, to record which two nodes came from the same original aggregate.
I introduced a new field aggregationId in AggregationNode: it's normally empty, but when an aggregate is split into partial and final, the new nodes get the same id so they can be matched later
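The matching idea described above can be sketched roughly as follows. This is a simplified illustration, not the actual Presto `AggregationNode` API: `AggNode` and `findPartialFor` are hypothetical names, and the real implementation lives inside the planner. The point is only that a shared, optional `aggregationId` lets a final aggregation find its partial counterpart without relying on grouping-column names that later transformations may rename.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-in for an aggregation plan node.
class AggNode {
    final String name;
    final boolean isPartial;
    // Normally empty; set to the same value on both halves when an
    // aggregate is split into partial and final.
    final Optional<String> aggregationId;

    AggNode(String name, boolean isPartial, Optional<String> aggregationId) {
        this.name = name;
        this.isPartial = isPartial;
        this.aggregationId = aggregationId;
    }

    // Find the partial counterpart of a final aggregation by shared id.
    static Optional<AggNode> findPartialFor(AggNode finalAgg, List<AggNode> nodes) {
        if (!finalAgg.aggregationId.isPresent()) {
            return Optional.empty();
        }
        for (AggNode node : nodes) {
            if (node.isPartial && node.aggregationId.equals(finalAgg.aggregationId)) {
                return Optional.of(node); // exit early once a match is found
            }
        }
        return Optional.empty();
    }
}
```

This also addresses the earlier review questions: the loop exits early on a match, and two unrelated aggregations with the same grouping sets cannot collide because ids are assigned per split, not derived from the grouping columns.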
nit: It's usually more convenient when config names and session property names are consistent. I also remember there used to be a rule of thumb to name boolean properties with an ...-enabled suffix.
What do you think about history_based_partial_aggregation_optimization_enabled (optimizer.history-based-partial-aggregation-optimization-enabled) (or something along those lines, to keep it close to how HISTORY_BASED_SCALED_WRITER is named)?
that makes sense.
The only caveat is that this optimization was already history-based; what changes is that we now use the statistics from the partial aggregation instead of the final aggregation. I added the flag to avoid possible regressions and to allow a gradual rollout.
If you have more name suggestions let me know :)
Makes sense. Let's keep the name. However, it may still be worth having a consistent session property name (use_partial_aggregation_history) and config property name (optimizer.use-partial-aggregation-history).
nit: maybe add a helper method isUnknown to avoid comparing the references (it can be potentially fragile)
Why is it named partialAggregationStatsInfo? My understanding is that aggregationNodeStats stores the information of the final aggregation node.
we cache historical partial stats execution at the level of the final aggregation because the partial agg is not part of the canonical plan
aggregationNodeStats is a helper structure where we accumulate information about the results of aggregation execution: the statistics come from the partial agg node but end up being tracked on the final agg node, because the partial agg is not part of the canonical plan.
oops, I misread your comment @feilong-liu
you are right, the variable should be named finalAggregationStatsInfo, will fix that in the code
planStatisticsFinalAgg comes from partialAggregationStatsInfo, this is confusing.
will fix the naming
why !outputVariables.isEmpty() here?
There was an assumption that if there is at least one output row, there must be at least one output byte.
That is not the case for partial aggs that don't project any new columns (for example "select count(*)"): there the output bytes were tracked as 0, and we ended up replacing that 0 with NaN, which raises an exception later on.
I'll add some comments.
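The guard discussed in this thread can be sketched like so. This is a hedged illustration, not the actual Presto code: `normalizeOutputBytes` is a hypothetical helper, and `OptionalDouble` stands in for Presto's `Estimate` type. The idea is that 0 output bytes is only treated as an unreliable measurement (unknown) when the node actually has output variables; for a partial agg that projects no new columns, 0 is legitimate.

```java
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical sketch of the zero-output-bytes guard.
class OutputBytesGuard {
    static OptionalDouble normalizeOutputBytes(
            double outputBytes, double outputRows, List<String> outputVariables) {
        if (outputRows > 0 && outputBytes == 0 && !outputVariables.isEmpty()) {
            // Rows were produced but no bytes were recorded for real output
            // columns: the measurement is unreliable, so report unknown
            // rather than 0 (and never a NaN that a constructor would reject).
            return OptionalDouble.empty();
        }
        // No output variables (e.g. "select count(*)" partial agg): 0 bytes
        // is a valid measurement and is kept as-is.
        return OptionalDouble.of(outputBytes);
    }
}
```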
@mlyublena I think this may be the culprit, we are directly using the output from this function to populate the aggregation stats, which can be a NaN.
should be equal check?
thanks, introduced this in the last refactoring
will fix!
I moved this check to the function partialAggregationNotUseful
The original PR, which introduced cost-based reasoning for partial aggs, was conservative and only allowed skipping partial aggs for single-column GROUP BYs because of possible estimation errors with multi-key aggs:
https://github.com/prestodb/presto/pull/16175/files#r657520172
If we are tracking partial history in HBO, we don't need to be that conservative and can allow more cases
Thanks for the clarification. I guess we will still keep this behaviour for CBO and only extend it for HBO, right?
Use partial aggregation stats when it's unknown?
This was a mistake on my part: the function was actually checking for isNotUnknown. I fixed both the naming and the semantics.
Why is single-key aggregation special?
see comment above: the original implementation of this optimizer had this to avoid mis-estimation with multi-key aggregates in CBO. I moved the check inside the function and relaxed it if we know we're using partial HBO stats
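The relaxed check described above can be sketched as follows. Everything here is illustrative: `partialAggregationNotUseful` echoes the function name mentioned in the thread, but the parameters, the `minReductionRatio` threshold, and the class are hypothetical stand-ins for the real PushPartialAggregationThroughExchange logic. The sketch captures the two behaviors under discussion: without partial-aggregation history, stay conservative and only consider single-key GROUP BYs; with trusted partial HBO stats, also consider multi-key aggregates, comparing row counts rather than byte sizes.

```java
// Hypothetical sketch of the "is the partial aggregation useful?" decision.
class PartialAggDecision {
    static boolean partialAggregationNotUseful(
            int groupingKeyCount,
            double inputRows,
            double outputRows,
            boolean usePartialAggregationHistory,
            double minReductionRatio) {
        if (!usePartialAggregationHistory && groupingKeyCount != 1) {
            // CBO path: multi-key cardinality estimates are unreliable,
            // so conservatively keep the partial aggregation.
            return false;
        }
        // The partial agg is not useful when it barely reduces the row count.
        return inputRows / outputRows < minReductionRatio;
    }
}
```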
feilong-liu left a comment:
Looks good. Remember to fill in the PR description field, release notes, etc.
Until now, HBO tracked only the input and output sizes of the final aggregate evaluation, but not the result of the partial agg evaluation.
This may lead to incorrect estimation of data-size reduction when the final aggregate reduces data size but the partial aggregate does not.
This changelist adds the ability to track the input and output sizes of the partial aggregate, to help make a better decision about when to split
an aggregate into partial and final.
This change adds a new field, aggregationId, to AggregationNode to track split partial/final aggregate pairs, and uses it during history tracking to record partial agg execution details.
The statistics estimate now contains the following for final aggregation nodes:
"partialAggregationStatsEstimate" : {
"inputBytes" : 0.0,
"outputBytes" : 23976.0,
"inputRowCount" : 5.6717574E7,
"outputRowCount" : 2664.0
}
This CL also changes the PushPartialAggregationThroughExchange optimizer to use the partial agg statistics when available.
In addition, the following modifications were made to the PushPartialAggregationThroughExchange optimization:
- When using partial aggregation statistics, apply the optimization to multi-key aggregates. The original optimization only triggers for single-key aggregate nodes.
- Use row counts instead of byte sizes when the use_partial_aggregation_history flag is on.
return new PartialAggregationStatistics(Estimate.of(partialAggregationInputBytes),
        Estimate.of(outputBytes),
        Estimate.of(childNodeStats.getPlanNodeOutputPositions()),
        Estimate.of(outputPositions));
@mlyublena In the following line, Estimate.of(nan) throws IllegalArgumentException in Presto-on-Spark. Can you please add the necessary handling?
Stack trace for reference:
java.lang.IllegalArgumentException: value is NaN
at com.facebook.presto.spi.statistics.Estimate.of(Estimate.java:54)
at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.constructAggregationNodeStatistics(HistoryBasedPlanStatisticsTracker.java:250)
at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.getQueryStats(HistoryBasedPlanStatisticsTracker.java:164)
at com.facebook.presto.event.QueryMonitor.queryCompletedEvent(QueryMonitor.java:267)
at com.facebook.presto.spark.execution.AbstractPrestoSparkQueryExecution.queryCompletedEvent(AbstractPrestoSparkQueryExecution.java:607)
at com.facebook.presto.spark.execution.AbstractPrestoSparkQueryExecution.execute(AbstractPrestoSparkQueryExecution.java:430)
at com.facebook.presto.spark.launcher.PrestoSparkRunner.execute(PrestoSparkRunner.java:181)
at com.facebook.presto.spark.launcher.PrestoSparkRunner.run(PrestoSparkRunner.java:125)
PlanNode childNode = planNode.getSources().get(0);
PlanNodeStats childNodeStats = planNodeStatsMap.get(childNode.getId());
if (childNodeStats != null) {
    double partialAggregationInputBytes = adjustedOutputBytes(childNode, childNodeStats);
@mlyublena This is where the adjustedOutputBytes value is used directly to populate the PartialAggregationStatistics.
Thanks @feilong-liu, that is indeed the fix. I wanted to investigate the case where this happens so I can test that it is fixed. The problem occurs when doing a GROUP BY on the partition key: the partition key is not materialized in the page sent through partial aggregation, so the byte count is 0 (and then becomes negative because we exclude the hash variables). So the problematic queries were of this pattern:
select ds from lineitem group by ds;
new PR here:
#21502
PlanNodeStats childNodeStats = planNodeStatsMap.get(childNode.getId());
if (childNodeStats != null) {
    double partialAggregationInputBytes = adjustedOutputBytes(childNode, childNodeStats);
    return new PartialAggregationStatistics(Estimate.of(partialAggregationInputBytes),
isNaN(partialAggregationInputBytes) ? Estimate.unknown() : Estimate.of(partialAggregationInputBytes)
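The suggested fix above amounts to mapping a NaN measurement to an "unknown" estimate instead of passing it to a constructor that rejects NaN. A minimal self-contained sketch of that pattern, using `OptionalDouble` as a stand-in for Presto's `Estimate` type (the real fix calls `Estimate.unknown()`):

```java
import java.util.OptionalDouble;

// Hypothetical helper: NaN-safe construction of an estimate value.
class NanSafeEstimate {
    static OptionalDouble estimateOf(double value) {
        // NaN means the measurement is missing or invalid; surface that as
        // "unknown" rather than throwing IllegalArgumentException downstream.
        return Double.isNaN(value) ? OptionalDouble.empty() : OptionalDouble.of(value);
    }
}
```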
This PR also adds a new flag, use_partial_aggregation_history, to enable using the new statistics during optimization.
Description
Most of the non-trivial changes are in the following classes:
Most of the other changes are due to changing the signature of the AggregationNode constructor to take an optional aggregationId argument.
Motivation and Context
Impact
Test Plan
Unit tests and verification on production queries
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.