fix(optimizer): Fix hash function for prefilter groupby limit#27033
Merged
feilong-liu merged 1 commit intoprestodb:masterfrom Jan 26, 2026
Merged
fix(optimizer): Fix hash function for prefilter groupby limit#27033feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu merged 1 commit intoprestodb:masterfrom
Conversation
Contributor
Reviewer's guide (collapsed on small PRs)Reviewer's GuideRefactors and centralizes the variable hashing logic into PlannerUtils and updates join and prefilter-for-aggregation optimizations to use the shared hash function with FunctionAndTypeManager, ensuring a consistent, working hash implementation in both Java and C++ paths. Class diagram for shared variable hash utility in plannerclassDiagram
class PlannerUtils {
+static RowExpression getVariableHash(inputVariables, functionAndTypeManager)
}
class JoinPrefilter {
}
class JoinPrefilterRewriter {
-boolean planChanged
+PlanNode visitJoin(node, context)
+boolean isPlanChanged()
..removed..
-RowExpression getVariableHash(inputVariables)
}
class PrefilterForLimitingAggregation {
+PlanNode addPrefilter(aggregationNode, count)
}
class FunctionAndTypeManager {
+FunctionAndTypeResolver getFunctionAndTypeResolver()
}
class FunctionAndTypeResolver {
+Operator XX_HASH_64
}
PlannerUtils ..> FunctionAndTypeManager : uses
PlannerUtils ..> FunctionAndTypeResolver : uses via getFunctionAndTypeResolver
JoinPrefilterRewriter ..> PlannerUtils : calls getVariableHash
PrefilterForLimitingAggregation ..> PlannerUtils : calls getVariableHash
JoinPrefilter *-- JoinPrefilterRewriter
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- In PlannerUtils.getVariableHash, consider explicitly validating that inputVariables is non-empty (e.g., via a Preconditions.checkArgument) to avoid opaque failures if this utility is ever called with an empty list.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In PlannerUtils.getVariableHash, consider explicitly validating that inputVariables is non-empty (e.g., via a Preconditions.checkArgument) to avoid opaque failures if this utility is ever called with an empty list.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlannerUtils.java:582-586` </location>
<code_context>
return new SpecialFormExpression(COALESCE, VARCHAR, ImmutableList.of(castToVarchar, concatExpression));
}
+
+ public static RowExpression getVariableHash(List<VariableReferenceExpression> inputVariables, FunctionAndTypeManager functionAndTypeManager)
+ {
+ List<CallExpression> hashExpressionList = inputVariables.stream().map(keyVariable ->
+ callOperator(functionAndTypeManager.getFunctionAndTypeResolver(), OperatorType.XX_HASH_64, BIGINT, keyVariable)).collect(toImmutableList());
+ RowExpression hashExpression = hashExpressionList.get(0);
+ if (hashExpressionList.size() > 1) {
+ hashExpression = orNullHashCode(hashExpression);
</code_context>
<issue_to_address>
**issue:** Guard against empty inputVariables now that this helper is public and reused
`inputVariables` may be empty, which will cause `IndexOutOfBoundsException` on `hashExpressionList.get(0)`. That assumption was safe when this lived inside `JoinPrefilter`, but as a shared utility it’s more likely to be called with an empty list. Please add an explicit precondition (e.g., `checkArgument(!inputVariables.isEmpty(), "...")`) so failures are clear and occur at the call site.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlannerUtils.java
Show resolved
Hide resolved
…it that worns both in java and cpp
3717ccc to
4ec96f5
Compare
feilong-liu
approved these changes
Jan 26, 2026
This was referenced Mar 31, 2026
15 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixup the hash code used for this optimization
Description
We were using hashcode which is avialble only in java. Changed it to use the same function that we use for join prefilter which works for both java and cpp.
If release note is NOT required, use:
Summary by Sourcery
Use a shared hash computation utility for join and aggregation prefilters to ensure consistent hashing across Java and C++ implementations.
Bug Fixes:
Enhancements: