Skip to content

Fix bug in tracking partial aggregation stats tracking when the query is a group by on a partitioning key#21502

Merged
mlyublena merged 1 commit intoprestodb:masterfrom
mlyublena:hbo-partial-agg-bug-fix
Dec 13, 2023
Merged

Fix bug in tracking partial aggregation stats tracking when the query is a group by on a partitioning key#21502
mlyublena merged 1 commit intoprestodb:masterfrom
mlyublena:hbo-partial-agg-bug-fix

Conversation

@mlyublena
Copy link
Contributor

@mlyublena mlyublena commented Dec 8, 2023

Description

During tracking of histories, we adjust the output byte count to account for hash variables introduced during planning. In certain cases, the reported byte count is 0. For example in queries like the following, the grouping key is a partition key, its value does not contribute to the byte count:

select ds from lineitem group by ds;

After accounting for hash variables, the 0 becomes a negative number, which raises a NaN exception that looks like this:

java.lang.IllegalArgumentException: value is NaN
	at com.facebook.presto.spi.statistics.Estimate.of(Estimate.java:54)
	at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.constructAggregationNodeStatistics(HistoryBasedPlanStatisticsTracker.java:250)
	at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.getQueryStats(HistoryBasedPlanStatisticsTracker.java:164)

This PR fixes that by turning NaN values into unknown like it is done in a few other places of the HBO tracker code.

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ...
* ...

Hive Changes
* ...
* ...

If release note is NOT required, use:

== NO RELEASE NOTE ==

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this? Can we keep this test, and add a new test instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why partition key is special? For a non partition key, for example, select orderpriority from test_orders group by orderpriority, does it have the same problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it is special because it doesn't get materialized in the output until later (so doesn't contribute to the byte count). We make an incorrect assumption that since it is part of the output column list, its size has been accounted for in the byte count estimate for the input node

@mlyublena mlyublena force-pushed the hbo-partial-agg-bug-fix branch 2 times, most recently from 8c6f43b to 7b1aecd Compare December 12, 2023 21:25
@mlyublena mlyublena requested a review from pranjalssh December 12, 2023 22:49
Copy link
Contributor

@pranjalssh pranjalssh Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a method Estimate.estimateFromDouble which does this. Current naming is terrible as we have methods like Estimate.of as well. We can change these names to be more explicit of what they do and use right methods.

…up-by on the partitioning key

During tracking of histories, we adjust the output byte count to account for hash variables introduced during planning.
In certain cases, the reported byte count is 0 (for example when the grouping key is a partition key, its value does not contribute to the byte count). After accounting for hash variables, the 0 becomes a negative number,
which raises a NaN exception that looks like this:
java.lang.IllegalArgumentException: value is NaN
	at com.facebook.presto.spi.statistics.Estimate.of(Estimate.java:54)
	at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.constructAggregationNodeStatistics(HistoryBasedPlanStatisticsTracker.java:250)
	at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.getQueryStats(HistoryBasedPlanStatisticsTracker.java:164)

This PR fixes that by turning NaN values into unknown like it is done in a few other places of the HBO tracker code.
@mlyublena mlyublena force-pushed the hbo-partial-agg-bug-fix branch from 7b1aecd to e456ae9 Compare December 12, 2023 23:42
@mlyublena mlyublena merged commit fae3a7d into prestodb:master Dec 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants