fix(optimizer): Fix hash function for prefilter groupby limit #27033

Merged
feilong-liu merged 1 commit into prestodb:master from kaikalur:xxhash-for-groupby-limit-opt
Jan 26, 2026

Conversation

@kaikalur kaikalur commented Jan 26, 2026

Fix up the hash code used for this optimization.

Description

We were using hashcode, which is available only in Java. Changed it to use the same function that we use for join prefilter, which works for both Java and C++.

If release note is NOT required, use:

== NO RELEASE NOTE ==

Summary by Sourcery

Use a shared hash computation utility for join and aggregation prefilters to ensure consistent hashing across Java and C++ implementations.

Bug Fixes:

  • Fix incorrect or incompatible hash computation used for prefiltering group-by limit aggregations by switching to a working XX_HASH_64-based hash.

Enhancements:

  • Extract variable hash computation into a reusable PlannerUtils helper and reuse it in join and aggregation prefilter optimizations.

@kaikalur kaikalur requested review from a team, feilong-liu and jaystarshot as code owners January 26, 2026 19:56

sourcery-ai bot commented Jan 26, 2026

Reviewer's Guide

Refactors and centralizes the variable hashing logic into PlannerUtils and updates join and prefilter-for-aggregation optimizations to use the shared hash function with FunctionAndTypeManager, ensuring a consistent, working hash implementation in both Java and C++ paths.

Class diagram for shared variable hash utility in planner

classDiagram
    class PlannerUtils {
        +static RowExpression getVariableHash(inputVariables, functionAndTypeManager)
    }

    class JoinPrefilter {
    }

    class JoinPrefilterRewriter {
        -boolean planChanged
        +PlanNode visitJoin(node, context)
        +boolean isPlanChanged()
        ..removed..
        -RowExpression getVariableHash(inputVariables)
    }

    class PrefilterForLimitingAggregation {
        +PlanNode addPrefilter(aggregationNode, count)
    }

    class FunctionAndTypeManager {
        +FunctionAndTypeResolver getFunctionAndTypeResolver()
    }

    class FunctionAndTypeResolver {
        +Operator XX_HASH_64
    }

    PlannerUtils ..> FunctionAndTypeManager : uses
    PlannerUtils ..> FunctionAndTypeResolver : uses via getFunctionAndTypeResolver
    JoinPrefilterRewriter ..> PlannerUtils : calls getVariableHash
    PrefilterForLimitingAggregation ..> PlannerUtils : calls getVariableHash
    JoinPrefilter *-- JoinPrefilterRewriter

File-Level Changes

Change: Refactor variable hash computation into a shared utility and update join prefilter optimization to use it via FunctionAndTypeManager.
Details:
  • Replace local getVariableHash usage in JoinPrefilter with calls that pass FunctionAndTypeManager into the shared hashing helper
  • Remove the private getVariableHash implementation from JoinPrefilter since hashing is now provided by PlannerUtils
Files: presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/JoinPrefilter.java

Change: Introduce a reusable getVariableHash helper in PlannerUtils that builds xx_hash_64-based hash expressions (with combine_hash and orNullHashCode) for a list of variables.
Details:
  • Add public static getVariableHash(List, FunctionAndTypeManager) that constructs xx_hash_64 calls for each key
  • Combine multiple key hashes using orNullHashCode and the combine_hash function, returning a RowExpression representing the composite hash
Files: presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlannerUtils.java

Change: Switch prefilter-for-limiting-aggregation optimization to use the shared variable hash helper instead of its previous hash-expression builder.
Details:
  • Replace left and right hash expression construction with calls to PlannerUtils.getVariableHash using group-by keys and timed distinct limit node outputs
  • Ensure both sides of the prefilter join now use the same centralized hashing logic as other planner components
Files: presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/PrefilterForLimitingAggregation.java
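
As a rough illustration of the composite hash described in these changes, the plain-Java sketch below models how per-key hashes might be folded into one value. It is not the Presto planner code, which builds RowExpression trees rather than computing values. `VariableHashSketch` and its methods are hypothetical names, `keyHash` is a stand-in mixing function rather than xxHash64, and both the `31 * previous + next` combine step and the null-to-0 fallback are assumptions about what combine_hash and orNullHashCode compute.

```java
import java.util.List;

// Plain-Java model of the composite key hash; NOT the Presto implementation.
final class VariableHashSketch
{
    // Assumption: a null per-key hash is replaced by 0 before combining.
    static final long NULL_HASH = 0L;

    private VariableHashSketch() {}

    // Hypothetical stand-in for the per-key XX_HASH_64 operator.
    // This is a generic 64-bit bit mixer, not xxHash64. A null key hashes to null.
    static Long keyHash(Long key)
    {
        if (key == null) {
            return null;
        }
        long z = key;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Models orNullHashCode: fall back to NULL_HASH when the hash is null.
    static long orNullHash(Long hash)
    {
        return hash == null ? NULL_HASH : hash;
    }

    // Assumed combine step: 31 * previous + next (wraps on overflow).
    static long combineHash(long previous, long next)
    {
        return 31 * previous + next;
    }

    // Mirrors the shape of the helper in the review excerpt: a single key
    // returns its raw hash; several keys are null-guarded and folded left to right.
    static Long variableHash(List<Long> keys)
    {
        Long first = keyHash(keys.get(0)); // throws on an empty list, as in the excerpt
        if (keys.size() == 1) {
            return first;
        }
        long result = orNullHash(first);
        for (int i = 1; i < keys.size(); i++) {
            result = combineHash(result, orNullHash(keyHash(keys.get(i))));
        }
        return result;
    }
}
```

The point of the sketch is that both sides of the prefilter join call one function, so equal key tuples always produce equal hashes regardless of which side computes them.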

@kaikalur kaikalur changed the title fix(optimizer): Use a working hash function for prefilter groupby lim… fix(optimizer): Fix hash function for prefilter groupby limit Jan 26, 2026

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • In PlannerUtils.getVariableHash, consider explicitly validating that inputVariables is non-empty (e.g., via a Preconditions.checkArgument) to avoid opaque failures if this utility is ever called with an empty list.

## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlannerUtils.java:582-586` </location>
<code_context>
         return new SpecialFormExpression(COALESCE, VARCHAR, ImmutableList.of(castToVarchar, concatExpression));
     }
+
+    public static RowExpression getVariableHash(List<VariableReferenceExpression> inputVariables, FunctionAndTypeManager functionAndTypeManager)
+    {
+        List<CallExpression> hashExpressionList = inputVariables.stream().map(keyVariable ->
+                callOperator(functionAndTypeManager.getFunctionAndTypeResolver(), OperatorType.XX_HASH_64, BIGINT, keyVariable)).collect(toImmutableList());
+        RowExpression hashExpression = hashExpressionList.get(0);
+        if (hashExpressionList.size() > 1) {
+            hashExpression = orNullHashCode(hashExpression);
</code_context>

<issue_to_address>
**issue:** Guard against empty inputVariables now that this helper is public and reused

`inputVariables` may be empty, which will cause `IndexOutOfBoundsException` on `hashExpressionList.get(0)`. That assumption was safe when this lived inside `JoinPrefilter`, but as a shared utility it’s more likely to be called with an empty list. Please add an explicit precondition (e.g., `checkArgument(!inputVariables.isEmpty(), "...")`) so failures are clear and occur at the call site.
</issue_to_address>
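
A minimal sketch of the suggested guard, written in plain Java so it runs without Guava. The local `checkArgument` is a stand-in for Guava's `Preconditions.checkArgument`, and `NonEmptyGuardSketch`, `firstKey`, and the error message are hypothetical names chosen for illustration, not code from this PR.

```java
import java.util.List;

// Illustrates failing fast on an empty list instead of hitting get(0).
final class NonEmptyGuardSketch
{
    private NonEmptyGuardSketch() {}

    // Plain-Java stand-in for Guava's Preconditions.checkArgument(boolean, Object).
    static void checkArgument(boolean condition, String message)
    {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Hypothetical helper with the same first step as getVariableHash:
    // it reads element 0, so it validates the list before doing so.
    static <T> T firstKey(List<T> inputVariables)
    {
        checkArgument(!inputVariables.isEmpty(), "inputVariables is empty");
        return inputVariables.get(0);
    }
}
```

With the guard in place, an empty list raises an `IllegalArgumentException` with a descriptive message rather than an `IndexOutOfBoundsException` from `get(0)`.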

@kaikalur kaikalur force-pushed the xxhash-for-groupby-limit-opt branch from 3717ccc to 4ec96f5 Compare January 26, 2026 20:10
@feilong-liu feilong-liu merged commit b11d857 into prestodb:master Jan 26, 2026
86 of 88 checks passed
@ethanyzhang ethanyzhang added the from:Meta PR from Meta label Mar 25, 2026