fix(planner): Fix filter stats estimation corner cases#26812
Merged
aaneja merged 3 commits intoprestodb:masterfrom Feb 4, 2026
Merged
fix(planner): Fix filter stats estimation corner cases#26812aaneja merged 3 commits intoprestodb:masterfrom
aaneja merged 3 commits intoprestodb:masterfrom
Conversation
Contributor
Reviewer's GuideRefines filter statistics estimation for IN, NOT IN, and <> predicates, especially for low/unknown NDV cases, and adds regression tests to ensure filter factors remain within sane bounds and avoid zero/NaN estimates. Class diagram for updated filter statistics calculatorsclassDiagram
class ComparisonStatsCalculator {
+PlanNodeStatsEstimate estimateExpressionNotEqualToLiteral(expression, literal, expressionStatistics, inputStatistics, expressionVariable)
}
class FilterStatsCalculator {
+static double CIEL_IN_PREDICATE_UPPER_BOUND_COEFFICIENT
+static double UNKNOWN_FILTER_COEFFICIENT
+PlanNodeStatsEstimate visitInPredicate(node, context)
+PlanNodeStatsEstimate estimateIn(value, values, input, session)
}
class PlanNodeStatsEstimate {
+double outputRowCount
+PlanNodeStatsEstimate buildFrom(inputStatistics)
+PlanNodeStatsEstimate$Builder setOutputRowCount(rowCount)
+PlanNodeStatsEstimate$Builder addVariableStatistics(variable, variableStatsEstimate)
}
class VariableStatsEstimate {
+double nullsFraction
+double distinctValuesCount
+VariableStatsEstimate$Builder buildFrom(variableStatsEstimate)
+VariableStatsEstimate$Builder setNullsFraction(nullsFraction)
+VariableStatsEstimate$Builder setDistinctValuesCount(distinctValuesCount)
}
ComparisonStatsCalculator --> PlanNodeStatsEstimate : builds
ComparisonStatsCalculator --> VariableStatsEstimate : updates
FilterStatsCalculator --> PlanNodeStatsEstimate : builds
FilterStatsCalculator --> VariableStatsEstimate : reads
class StatisticRange {
+double low
+boolean lowInclusive
+double high
+boolean highInclusive
+double distinctValuesCount
}
ComparisonStatsCalculator --> StatisticRange : uses
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
aaneja
commented
Dec 16, 2025
presto-main-base/src/test/java/com/facebook/presto/cost/AbstractTestFilterStatsCalculator.java
Show resolved
Hide resolved
aaneja
commented
Dec 16, 2025
presto-main-base/src/test/java/com/facebook/presto/cost/AbstractTestFilterStatsCalculator.java
Outdated
Show resolved
Hide resolved
e55b354 to
272f4cb
Compare
Contributor
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new constant name
CIEL_IN_PREDICATE_UPPER_BOUND_COEFFICIENTlooks like a typo ofCEIL; consider renaming it (and adjusting the comment wording) to avoid confusion about its intent. - The test method
tesInPredicateWithoutNDVappears to be missing atintest; renaming it would keep naming consistent and clearer. - In
estimateExpressionNotEqualToLiteral, the NDV == 1 special case is handled in two places (filterFactor andnewNDV); consider centralizing this logic or adding a brief comment tying the two together so the coupling is explicit for future maintainers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new constant name `CIEL_IN_PREDICATE_UPPER_BOUND_COEFFICIENT` looks like a typo of `CEIL`; consider renaming it (and adjusting the comment wording) to avoid confusion about its intent.
- The test method `tesInPredicateWithoutNDV` appears to be missing a `t` in `test`; renaming it would keep naming consistent and clearer.
- In `estimateExpressionNotEqualToLiteral`, the NDV == 1 special case is handled in two places (filterFactor and `newNDV`); consider centralizing this logic or adding a brief comment tying the two together so the coupling is explicit for future maintainers.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/cost/ComparisonStatsCalculator.java:110-119` </location>
<code_context>
+ double expressionNDV = expressionStatistics.getDistinctValuesCount();
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Use a consistent comparison approach for `expressionNDV` to avoid subtle floating-point discrepancies.
Here you mix `Double.compare(expressionNDV, 1D) == 0` with `expressionNDV == 1`. For doubles coming from stats, this can diverge in edge cases (`-0.0`, rounding, NaN). Please use a single, consistent pattern (e.g., `Double.compare(expressionNDV, 1D) == 0` or an epsilon-based check) everywhere `expressionNDV` is compared to 1.
Suggested implementation:
```java
double filterFactor;
double expressionNDV = expressionStatistics.getDistinctValuesCount();
if (Double.compare(expressionNDV, 1D) == 0) {
// It's hard to make a meaningful estimate when we have only one distinct value
filterFactor = UNKNOWN_FILTER_COEFFICIENT;
}
else {
filterFactor = 1 - calculateFilterFactor(expressionStatistics, filterRange);
}
PlanNodeStatsEstimate.Builder estimate = PlanNodeStatsEstimate.buildFrom(inputStatistics);
estimate.setOutputRowCount(filterFactor * (1 - expressionStatistics.getNullsFraction()) * inputStatistics.getOutputRowCount());
```
```java
if (Double.compare(expressionNDV, 1D) == 0) {
```
If there are other comparisons in this file (or related cost calculators) using `expressionNDV == 1`, `expressionNDV == 1.0`, or similar, they should also be updated to `Double.compare(expressionNDV, 1D) == 0` for consistency with this change.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/cost/ComparisonStatsCalculator.java
Show resolved
Hide resolved
Contributor
|
Nit: when you can, edit the Release Notes section of the description to |
- Typo fixes - Fix IN estimate to a lower bound of 1.0 rows
646a4c7 to
8c76b14
Compare
Contributor
Author
|
@arhimondr Can you help review ? I see from git history that you've worked on the same classes |
tdcmeehan
approved these changes
Jan 16, 2026
Contributor
tdcmeehan
left a comment
There was a problem hiding this comment.
LGTM. @arhimondr please review if you have time.
This was referenced Mar 31, 2026
15 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fix for #26685
Fix for #26808
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: