Track result sizes of partial aggregate evaluation in HBO by mlyublena · Pull Request #21160 · prestodb/presto

mlyublena · 2023-10-16T20:26:43Z

Until now, HBO tracked only the final input and output sizes of evaluating the aggregate, but not the result of partial agg evaluation. This may lead to incorrect estimation of data size reduction when the final aggregate reduces data size, but the partial aggregate does not. This changelist adds the ability to track input and output sizes of the partial aggregate to help make a better decision about when to split an aggregate into partial and final.

Example partial agg stats:

      "partialAggregationStatsEstimate" : {
        "inputBytes" : 0.0,
        "outputBytes" : 23976.0,
        "inputRowCount" : 5.6717574E7,
        "outputRowCount" : 2664.0
      }

This PR also adds a new flag use_partial_aggregation_history to enable using the new statistics during optimization.
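
To make the example estimate above concrete, it can be turned into a row reduction ratio. This is a minimal sketch and not the code in this PR: the class name, the method names, and the 0.5 cutoff are hypothetical and only illustrate the arithmetic. It uses row counts because the example's inputBytes is 0.0, which would make a byte-based ratio meaningless.

```java
// Hypothetical sketch: the names and the threshold are illustrative, not Presto APIs.
final class PartialAggReduction
{
    // Assumed cutoff: treat the partial aggregation as useful only if at most
    // half of the input rows survive it.
    static final double ASSUMED_MAX_RATIO = 0.5;

    private PartialAggReduction() {}

    // Fraction of input rows that survive the partial aggregation (NaN if unknown).
    static double rowReductionRatio(double inputRowCount, double outputRowCount)
    {
        if (Double.isNaN(inputRowCount) || Double.isNaN(outputRowCount) || inputRowCount <= 0) {
            return Double.NaN;
        }
        return outputRowCount / inputRowCount;
    }

    static boolean partialAggregationLooksUseful(double inputRowCount, double outputRowCount)
    {
        double ratio = rowReductionRatio(inputRowCount, outputRowCount);
        // With unknown statistics, stay conservative and keep the partial aggregation.
        return Double.isNaN(ratio) || ratio <= ASSUMED_MAX_RATIO;
    }

    public static void main(String[] args)
    {
        // Row counts taken from the example estimate above.
        double ratio = rowReductionRatio(5.6717574E7, 2664.0);
        System.out.println("ratio = " + ratio);
        System.out.println("useful = " + partialAggregationLooksUseful(5.6717574E7, 2664.0));
    }
}
```

With the example row counts, only a tiny fraction of the input rows reach the final aggregate, so under these assumptions the partial aggregation would be kept.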

Description

Most of the non-trivial changes are in the following classes:

PartialAggregationStatistics, PlanStatistics, PartialAggregationStatsEstimate, PlanNodeStatsEstimate
- add partial aggregation statistics/estimate
PushPartialAggregationThroughExchange
- when splitting an aggregate into partial and final, make sure they are assigned the same aggregation id
- if available, use partial aggregation statistics from the stats estimate instead of the final node input/output bytes
HistoryBasedPlanStatisticsTracker.java
- implement logic for storing partial aggregation statistics inside the final aggregation node

Most of the other changes are due to changing the signature of the AggregationNode constructor to take an optional aggregationId argument.

Motivation and Context

Impact

Test Plan

Unit tests and verification on production queries

Contributor checklist

Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Extends HBO to track execution statistics of partial aggregation nodes
* Introduces a new flag use_partial_aggregation_history, which controls whether or not partial aggregation histories are used in deciding whether to split aggregates into partial and final
* Changes the PushPartialAggregationThroughExchange optimization to use partial aggregation statistics when available, and when use_partial_aggregation_history=true
* When partial aggregation statistics are used in PushPartialAggregationThroughExchange, the optimization also applies to multi-key aggregations (as opposed to single-key only when CBO is used)

mlyublena · 2023-10-17T23:22:19Z

...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java

remove debug printf

feilong-liu · 2023-10-18T18:50:20Z

...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java

Isn't the inputBytes here the same as the inputBytes above?

yes, I think you are right, the inputs are the same and only the outputs are different (one is the output of partial agg, and the other output of the final agg)

feilong-liu · 2023-10-18T18:53:57Z

presto-main/src/main/java/com/facebook/presto/cost/AggregationNodeStatsEstimate.java

The input bytes is available by checking the output of the aggregation input, hence not necessary here?

actually these are different: in one case it's the input to the final agg (available from the child node), in the other case it's the input to the partial agg (which is a node several levels down the query tree: we'll cache it at the level of the final agg)

feilong-liu · 2023-10-18T18:55:14Z

presto-main/src/main/java/com/facebook/presto/cost/AggregationNodeStatsEstimate.java

I think this is the output of the partial aggregation node? Maybe either rename this to partialAggregationOutputBytes or rename this class to PartialAggregationNodeStatsEstimate to make it clearer?

makes sense, will rename

feilong-liu · 2023-10-18T20:36:28Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

This whole block can be moved to the beginning of the for loop?

feilong-liu · 2023-10-18T20:37:30Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

The for loop can exit early once a match is found?

feilong-liu · 2023-10-18T20:50:09Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

Is it possible to have two aggregations with the same grouping sets hence match a different one?

We need a better way to find which partial aggregate corresponds to which final one - I found out that there are cases where the grouping columns get renamed in the middle (perhaps by a different transformation) and we no longer can match the aggregation nodes based on grouping columns.
I'm thinking matching them either by planNode.sourceLocation or introducing another variable in aggregation nodes aggregationId to know which 2 nodes came from the same original aggregate

I introduced a new field aggregationId in AggregationNode: it's normally empty, but when an aggregate is split into partial and final, the new nodes get the same id so they can be matched later
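
The matching idea described above can be sketched as follows. This is not the tracker code from this PR: AggNode and matchFinalToPartial are hypothetical stand-ins, and representing the id as Optional<Integer> is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-ins for plan nodes; this is not the real AggregationNode.
final class AggregationIdMatching
{
    enum Step { PARTIAL, FINAL }

    static final class AggNode
    {
        final String planNodeId;
        final Step step;
        // Empty unless the node was produced by splitting an aggregate (assumed type).
        final Optional<Integer> aggregationId;

        AggNode(String planNodeId, Step step, Optional<Integer> aggregationId)
        {
            this.planNodeId = planNodeId;
            this.step = step;
            this.aggregationId = aggregationId;
        }
    }

    private AggregationIdMatching() {}

    // Maps each FINAL node's plan node id to the plan node id of the PARTIAL node
    // that carries the same aggregationId. Nodes without an id are never matched.
    static Map<String, String> matchFinalToPartial(List<AggNode> nodes)
    {
        Map<Integer, String> partialById = new HashMap<>();
        for (AggNode node : nodes) {
            if (node.step == Step.PARTIAL && node.aggregationId.isPresent()) {
                partialById.put(node.aggregationId.get(), node.planNodeId);
            }
        }
        Map<String, String> finalToPartial = new HashMap<>();
        for (AggNode node : nodes) {
            if (node.step == Step.FINAL && node.aggregationId.isPresent()) {
                String partialNodeId = partialById.get(node.aggregationId.get());
                if (partialNodeId != null) {
                    finalToPartial.put(node.planNodeId, partialNodeId);
                }
            }
        }
        return finalToPartial;
    }
}
```

Unlike matching on grouping columns, this lookup is unaffected when a later transformation renames the grouping variables, because the shared id is assigned once at split time.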

mlyublena · 2023-10-25T18:27:42Z

presto-spi/src/main/java/com/facebook/presto/spi/plan/AggregationNode.java

arhimondr · 2023-11-10T20:13:33Z

presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java

nit: It's usually more convenient when config names and session property names are consistent. Also I remember there used to be a rule of thumb to call boolean properties as ...-enabled.

What do you think about history_based_partial_aggregation_optimization_enabled (optimizer.history-based-partial-aggregation-optimization-enabled) (or something along the line to keep it close to how HISTORY_BASED_SCALED_WRITER is called)?

that makes sense.
The only caveat here is that this optimization was already history-based, but now we use the statistics from the partial aggregation instead of the final aggregation. I added the flag to avoid possible regressions and be able to gradually deploy.
If you have more name suggestions let me know :)

Makes sense. Let's keep the name. However, it may still be worth having a consistent session property name (use_partial_aggregation_history) and config property name (optimizer.use-partial-aggregation-history)

arhimondr · 2023-11-10T20:25:27Z

presto-hive/src/test/java/com/facebook/presto/hive/TestHiveHistoryBasedStatsTracking.java

nit: @Test?

arhimondr · 2023-11-10T20:40:41Z

...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java

nit: remove

arhimondr · 2023-11-10T20:43:49Z

presto-main/src/main/java/com/facebook/presto/sql/planner/PlannerUtils.java

nit: reformat

arhimondr · 2023-11-10T21:02:57Z

...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java

nit: maybe add a helper method isUnknown to avoid comparing the references (it can be potentially fragile)
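
A value-based isUnknown helper of the kind suggested here might look like the sketch below. The class is a hypothetical stand-in whose fields mirror the JSON example in the description; representing "unknown" as NaN in every field is an assumption, not something confirmed by this PR.

```java
// Hypothetical stand-in for a partial aggregation stats estimate.
final class PartialAggStatsSketch
{
    // Shared instance used when no statistics are available (assumed convention: all NaN).
    private static final PartialAggStatsSketch UNKNOWN =
            new PartialAggStatsSketch(Double.NaN, Double.NaN, Double.NaN, Double.NaN);

    final double inputBytes;
    final double outputBytes;
    final double inputRowCount;
    final double outputRowCount;

    PartialAggStatsSketch(double inputBytes, double outputBytes, double inputRowCount, double outputRowCount)
    {
        this.inputBytes = inputBytes;
        this.outputBytes = outputBytes;
        this.inputRowCount = inputRowCount;
        this.outputRowCount = outputRowCount;
    }

    static PartialAggStatsSketch unknown()
    {
        return UNKNOWN;
    }

    // Value-based check: true for the shared UNKNOWN instance and for any other
    // instance whose fields are all NaN, for example a deserialized copy.
    static boolean isUnknown(PartialAggStatsSketch stats)
    {
        return Double.isNaN(stats.inputBytes)
                && Double.isNaN(stats.outputBytes)
                && Double.isNaN(stats.inputRowCount)
                && Double.isNaN(stats.outputRowCount);
    }
}
```

A reference comparison such as stats == unknown() returns false for an equal-valued copy, which is the fragility the helper avoids.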

feilong-liu · 2023-11-11T06:50:50Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

Why is it named partialAggregationStatsInfo? My understanding is that aggregationNodeStats stores the information of the final aggregation node.

we cache historical partial stats execution at the level of the final aggregation because the partial agg is not part of the canonical plan

aggregationNodeStats is a helper structure where we accumulate information about results of aggregation execution: the statistics come from the partial agg node but end up being tracked in the final agg node because the partial agg is not part of the canonical plan

oops, I misread your comment @feilong-liu
you are right, the variable should be named finalAggregationStatsInfo, will fix that in the code

feilong-liu · 2023-11-11T06:51:34Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

planStatisticsFinalAgg comes from partialAggregationStatsInfo, this is confusing.

will fix the naming

feilong-liu · 2023-11-11T06:55:35Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

why !outputVariables.isEmpty() here?

there was an assumption that if there is at least one output row, there must be at least one output byte.
This is not the case with partial aggs that don't project any new columns (for example "select count(*)"): in that case the output bytes were tracked as 0 and we ended up replacing it with NaN which raises an exception later on
I'll add some comments

@mlyublena I think this may be the culprit, we are directly using the output from this function to populate the aggregation stats, which can be a NaN.

feilong-liu · 2023-11-11T06:57:41Z

presto-main/src/main/java/com/facebook/presto/cost/PartialAggregationStatsEstimate.java

should be equal check?

thanks, introduced this in the last refactoring
will fix!

feilong-liu · 2023-11-11T07:01:02Z

...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java

why removing this?

I moved this check to the function partialAggregationNotUseful
The original PR which introduced cost-based reasoning for partial aggs was conservative and only allowed skipping partial aggs for single-column GROUP BY-s because of possible estimation errors with multi-key aggs:
https://github.com/prestodb/presto/pull/16175/files#r657520172

If we are tracking partial history in HBO, we don't need to be that conservative and can allow more cases

Thanks for clarification. I guess we will still keep this behaviour for CBO and only extend for HBO right?

feilong-liu · 2023-11-11T17:56:19Z

...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java

Use partial aggregation stats when it's unknown?

this was a mistake on my part: the function was actually checking for isNotUnknown: I fixed the naming and the semantics

feilong-liu · 2023-11-11T17:56:35Z

...va/com/facebook/presto/sql/planner/iterative/rule/PushPartialAggregationThroughExchange.java

Why is single-key aggregation special?

see comment above: the original implementation of this optimizer had this to avoid mis-estimation with multi-key aggregates in CBO. I moved the check inside the function and relaxed it if we know we're using partial HBO stats
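
The relaxed check described in this thread can be sketched roughly as below. This is an illustration of the stated behaviour only: the class, the method, and its parameters are hypothetical and do not reproduce partialAggregationNotUseful from the PR.

```java
// Hypothetical sketch of the single-key gate; not the PR's implementation.
final class PartialAggGate
{
    private PartialAggGate() {}

    // Returns true if the optimizer may consider dropping the partial aggregation at all.
    static boolean mayConsiderSkippingPartialAgg(
            int groupingKeyCount,
            boolean usePartialAggregationHistory,
            boolean partialAggregationStatsKnown)
    {
        if (usePartialAggregationHistory && partialAggregationStatsKnown) {
            // Partial HBO statistics are in use, so multi-key aggregations are allowed.
            return true;
        }
        // CBO path: stay conservative and only consider single-key aggregations.
        return groupingKeyCount == 1;
    }

    public static void main(String[] args)
    {
        System.out.println(mayConsiderSkippingPartialAgg(3, true, true));
        System.out.println(mayConsiderSkippingPartialAgg(3, false, true));
    }
}
```

Keeping the single-key restriction on the CBO path preserves the original guard against mis-estimation with multi-key aggregates.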

feilong-liu

Looks good. Remember to fill in the PR description field, release notes, etc.

arhimondr · 2023-11-15T19:41:55Z

presto-main/src/main/java/com/facebook/presto/SystemSessionProperties.java

Makes sense. Let's keep the name. However, it may still be worth having a consistent session property name (use_partial_aggregation_history) and config property name (optimizer.use-partial-aggregation-history)

Until now, HBO tracked only the final input and output sizes of evaluating the aggregate, but not the result of partial agg evaluation. This may lead to incorrect estimation of data size reduction when the final aggregate reduces data size, but the partial aggregate does not. This changelist adds the ability to track input and output sizes of the partial aggregate to help make a better decision about when to split an aggregate into partial and final.

This change adds a new field aggregationId to AggregationNode to track split partial/final aggregate pairs, and uses that during history tracking to record partial agg execution details. The new statistics estimate now contains the following for final aggregation nodes:

      "partialAggregationStatsEstimate" : {
        "inputBytes" : 0.0,
        "outputBytes" : 23976.0,
        "inputRowCount" : 5.6717574E7,
        "outputRowCount" : 2664.0
      }

This CL also changes the PushPartialAggregationThroughExchange optimizer to use the partial agg statistics when available. In addition, the following modifications to the PushPartialAggregationThroughExchange optimization were made:
- when using partial aggregation statistics, apply the optimization to multi-key aggregates. The original optimization only triggers for single-key aggregate nodes.
- use rows instead of bytes when the use_partial_aggregation_history flag is on

vermapratyush · 2023-11-20T18:00:05Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

+            return new PartialAggregationStatistics(Estimate.of(partialAggregationInputBytes),
+                    Estimate.of(outputBytes),
+                    Estimate.of(childNodeStats.getPlanNodeOutputPositions()),
+                    Estimate.of(outputPositions));


@mlyublena Here Estimate.of(NaN) throws IllegalArgumentException in Presto-on-Spark. Can you please add the necessary handling?

stack trace for reference

java.lang.IllegalArgumentException: value is NaN
    at com.facebook.presto.spi.statistics.Estimate.of(Estimate.java:54)
    at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.constructAggregationNodeStatistics(HistoryBasedPlanStatisticsTracker.java:250)
    at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.getQueryStats(HistoryBasedPlanStatisticsTracker.java:164)
    at com.facebook.presto.event.QueryMonitor.queryCompletedEvent(QueryMonitor.java:267)
    at com.facebook.presto.spark.execution.AbstractPrestoSparkQueryExecution.queryCompletedEvent(AbstractPrestoSparkQueryExecution.java:607)
    at com.facebook.presto.spark.execution.AbstractPrestoSparkQueryExecution.execute(AbstractPrestoSparkQueryExecution.java:430)
    at com.facebook.presto.spark.launcher.PrestoSparkRunner.execute(PrestoSparkRunner.java:181)
    at com.facebook.presto.spark.launcher.PrestoSparkRunner.run(PrestoSparkRunner.java:125)

feilong-liu · 2023-12-05T20:51:47Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

@mlyublena I think this may be the culprit, we are directly using the output from this function to populate the aggregation stats, which can be a NaN.

feilong-liu · 2023-12-05T20:52:30Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

+        PlanNode childNode = planNode.getSources().get(0);
+        PlanNodeStats childNodeStats = planNodeStatsMap.get(childNode.getId());
+        if (childNodeStats != null) {
+            double partialAggregationInputBytes = adjustedOutputBytes(childNode, childNodeStats);


@mlyublena where the adjustedOutputBytes are directly used to populate the PartialAggregationStatistics

thanks @feilong-liu , that is indeed the fix, I wanted to investigate the case where this happens so I can test that it is fixed. The problem was when doing a GROUP BY on the partition key, the partition key is not materialized in the page sent through partial aggregation, so the byte count is 0 (and then becomes negative because we exclude the hash variables). So the problematic queries were of this pattern:

select ds from lineitem group by ds;

new PR here:
#21502

feilong-liu · 2023-12-05T21:39:06Z

presto-main/src/main/java/com/facebook/presto/cost/HistoryBasedPlanStatisticsTracker.java

+        PlanNodeStats childNodeStats = planNodeStatsMap.get(childNode.getId());
+        if (childNodeStats != null) {
+            double partialAggregationInputBytes = adjustedOutputBytes(childNode, childNodeStats);
+            return new PartialAggregationStatistics(Estimate.of(partialAggregationInputBytes),


isNaN(partialAggregationInputBytes) ? Estimate.unknown() : Estimate.of(partialAggregationInputBytes)
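
The guard suggested above can be tried in isolation with the sketch below. The nested Estimate class is a minimal stand-in written for this example so that it compiles on its own; it only mimics the NaN check seen in the stack trace and is not the real com.facebook.presto.spi.statistics.Estimate. The helper name toEstimate is hypothetical.

```java
// Self-contained sketch of a NaN-safe conversion; Estimate here is a stand-in class.
final class NanSafeEstimates
{
    // Minimal stand-in that reproduces only the "value is NaN" rejection.
    static final class Estimate
    {
        private static final Estimate UNKNOWN = new Estimate(Double.NaN);
        private final double value;

        private Estimate(double value)
        {
            this.value = value;
        }

        static Estimate unknown()
        {
            return UNKNOWN;
        }

        static Estimate of(double value)
        {
            if (Double.isNaN(value)) {
                throw new IllegalArgumentException("value is NaN");
            }
            return new Estimate(value);
        }

        boolean isUnknown()
        {
            return Double.isNaN(value);
        }

        double getValue()
        {
            return value;
        }
    }

    private NanSafeEstimates() {}

    // Hypothetical helper: never throws on NaN, maps it to the unknown estimate instead.
    static Estimate toEstimate(double value)
    {
        return Double.isNaN(value) ? Estimate.unknown() : Estimate.of(value);
    }
}
```

Routing every measured value through one helper keeps the NaN handling in a single place instead of repeating the ternary at each call site.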

mlyublena force-pushed the hbo-partial-agg branch from cf87d90 to ddf42a9 Compare October 16, 2023 21:44

mlyublena requested review from feilong-liu and pranjalssh October 17, 2023 23:12

mlyublena marked this pull request as ready for review October 17, 2023 23:12

mlyublena requested review from a team and shrinidhijoshi as code owners October 17, 2023 23:12

mlyublena requested a review from presto-oss October 17, 2023 23:12

mlyublena commented Oct 17, 2023

feilong-liu reviewed Oct 18, 2023

mlyublena force-pushed the hbo-partial-agg branch 3 times, most recently from 81aa6ab to 434afbb Compare October 20, 2023 23:59

mlyublena commented Oct 25, 2023

presto-spi/src/main/java/com/facebook/presto/spi/plan/AggregationNode.java (outdated)

remove

mlyublena force-pushed the hbo-partial-agg branch from c0d5d75 to dcfb9aa Compare October 25, 2023 19:28

mlyublena requested a review from arhimondr October 27, 2023 20:33

mlyublena force-pushed the hbo-partial-agg branch 4 times, most recently from ad7e1f0 to e031a92 Compare November 1, 2023 20:59

mlyublena requested a review from feilong-liu November 1, 2023 21:02

mlyublena force-pushed the hbo-partial-agg branch 2 times, most recently from b7c15c8 to 5f477c9 Compare November 1, 2023 23:46

mlyublena requested a review from NikhilCollooru November 2, 2023 17:22

mlyublena force-pushed the hbo-partial-agg branch 4 times, most recently from fa85601 to 923a13d Compare November 8, 2023 22:03

mlyublena force-pushed the hbo-partial-agg branch from 1420b17 to 9979da6 Compare November 10, 2023 19:56

arhimondr reviewed Nov 10, 2023

mlyublena force-pushed the hbo-partial-agg branch from 9979da6 to 364b520 Compare November 10, 2023 22:14

feilong-liu reviewed Nov 12, 2023

mlyublena force-pushed the hbo-partial-agg branch from 364b520 to 6c4dfb5 Compare November 14, 2023 00:40

feilong-liu approved these changes Nov 14, 2023

mlyublena force-pushed the hbo-partial-agg branch from 6c4dfb5 to 684e653 Compare November 14, 2023 22:41

arhimondr approved these changes Nov 15, 2023

mlyublena force-pushed the hbo-partial-agg branch from 684e653 to 84203b8 Compare November 15, 2023 19:51

mlyublena force-pushed the hbo-partial-agg branch from 84203b8 to 9483dad Compare November 15, 2023 21:19

arhimondr approved these changes Nov 15, 2023

arhimondr merged commit 27c6d9e into prestodb:master Nov 15, 2023

vermapratyush reviewed Nov 20, 2023

mlyublena mentioned this pull request Nov 29, 2023

Add a flag to disable tracking of partial histories #21453

Merged

6 tasks

feilong-liu reviewed Dec 5, 2023

wanglinsong mentioned this pull request Dec 8, 2023

Add release notes for 0.285 #21500

Closed

26 tasks
