Fix bug in partial aggregation stats tracking when the query is a group by on a partitioning key #21502
Why remove this? Can we keep this test, and add a new test instead?
Why is the partition key special? For a non-partition key, for example select orderpriority from test_orders group by orderpriority, does the same problem occur?
I believe it is special because it doesn't get materialized in the output until later, so it doesn't contribute to the byte count. We incorrectly assume that, since it is part of the output column list, its size has already been accounted for in the byte count estimate for the input node.
There is a method, Estimate.estimateFromDouble, which does this. The current naming is terrible, as we also have methods like Estimate.of. We can rename these methods to be more explicit about what they do, and use the right ones.
…up-by on the partitioning key

During tracking of histories, we adjust the output byte count to account for hash variables introduced during planning. In certain cases, the reported byte count is 0 (for example when the grouping key is a partition key, its value does not contribute to the byte count). After accounting for hash variables, the 0 becomes a negative number, which raises a NaN exception that looks like this:

java.lang.IllegalArgumentException: value is NaN
    at com.facebook.presto.spi.statistics.Estimate.of(Estimate.java:54)
    at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.constructAggregationNodeStatistics(HistoryBasedPlanStatisticsTracker.java:250)
    at com.facebook.presto.cost.HistoryBasedPlanStatisticsTracker.getQueryStats(HistoryBasedPlanStatisticsTracker.java:164)

This PR fixes that by turning NaN values into unknown like it is done in a few other places of the HBO tracker code.
Description
During tracking of histories, we adjust the output byte count to account for hash variables introduced during planning. In certain cases, the reported byte count is 0. For example, in queries like the following, the grouping key is a partition key, so its value does not contribute to the byte count:
After accounting for hash variables, the 0 becomes a negative number, which raises a NaN exception that looks like this:
This PR fixes that by turning NaN values into unknown, as is already done in a few other places in the HBO tracker code.
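The failure mode and the fix described above can be illustrated with a small, self-contained sketch. This is not the Presto code: EstimateSketch and HboTrackerSketch are hypothetical stand-ins, the 8-bytes-per-row cost of a hash variable is an assumption, and so is the step that maps a negative adjusted byte count to NaN (the description does not show how the negative value turns into NaN). The sketch only contrasts a strict factory that throws on NaN with a lenient one that returns unknown.

```java
// Hypothetical stand-in for an estimate type; NOT the real Presto Estimate class.
final class EstimateSketch
{
    private final double value; // NaN encodes "unknown"

    private EstimateSketch(double value)
    {
        this.value = value;
    }

    static EstimateSketch unknown()
    {
        return new EstimateSketch(Double.NaN);
    }

    // Strict factory: rejects NaN, mirroring the "value is NaN" exception in the PR.
    static EstimateSketch of(double value)
    {
        if (Double.isNaN(value)) {
            throw new IllegalArgumentException("value is NaN");
        }
        return new EstimateSketch(value);
    }

    // Lenient factory: maps NaN to unknown instead of throwing.
    static EstimateSketch fromDouble(double value)
    {
        return Double.isNaN(value) ? unknown() : of(value);
    }

    boolean isUnknown()
    {
        return Double.isNaN(value);
    }

    double getValue()
    {
        return value;
    }
}

final class HboTrackerSketch
{
    // Assumption: each hash variable costs Long.BYTES (8) bytes per output row.
    // Assumption: a negative adjusted size is treated as NaN (not meaningful).
    static double adjustOutputBytes(double reportedBytes, long outputRows, int hashVariableCount)
    {
        double adjusted = reportedBytes - (double) outputRows * hashVariableCount * Long.BYTES;
        return adjusted < 0 ? Double.NaN : adjusted;
    }

    // Uses the lenient factory, so a NaN size becomes unknown rather than an exception.
    static EstimateSketch trackOutputBytes(double reportedBytes, long outputRows, int hashVariableCount)
    {
        return EstimateSketch.fromDouble(adjustOutputBytes(reportedBytes, outputRows, hashVariableCount));
    }

    public static void main(String[] args)
    {
        // Reported byte count of 0 (e.g. group by on a partition key) with one hash variable.
        System.out.println(trackOutputBytes(0, 100, 1).isUnknown());
        // A regular case where the adjustment stays non-negative.
        System.out.println(trackOutputBytes(1000, 100, 1).getValue());
    }
}
```

With the strict factory, the first call in main would throw IllegalArgumentException; with the lenient one, the statistic is recorded as unknown and tracking continues.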
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes