Optimize hash code generation for exchanges #27657
raunaqmorarka merged 4 commits into trinodb:master
Conversation
Pull request overview
This PR optimizes hash code generation for data exchanges and aggregations by introducing bytecode generation and batched loop processing through a new NullSafeHashCompiler class. The optimization focuses on improving the performance of hash computation across multiple positions in a single operation rather than computing hashes one position at a time.
Key changes:
- Introduced `NullSafeHashCompiler` and `NullSafeHash` classes for bytecode-based hash generation with batched operations
- Enhanced `InterpretedHashGenerator` to use batched hash computation with specialized handling for RLE and dictionary blocks
- Added batched methods (`getBuckets`, `getPartitions`) to the `BucketFunction` and `PartitionFunction` interfaces with default implementations
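The batched-default pattern described in the last bullet can be sketched as follows. This is a simplified, self-contained illustration of the shape of such a default method; the names and the array-based types are hypothetical, since Trino's real `BucketFunction` and `PartitionFunction` operate on `Page`/`Block`, not raw arrays.

```java
// Hypothetical sketch of batching a per-position function into a bulk method.
// The default loops over the single-position method, so existing implementations
// keep working, while optimized implementations can override the batch variant.
interface PartitionFunctionSketch {
    int getPartition(long[] values, int position);

    // Batched default: fills partitions[0..length) for positions starting at
    // positionOffset, replacing a per-position virtual call in the caller's loop.
    default void getPartitions(int positionOffset, int length, long[] values, int[] partitions) {
        for (int i = 0; i < length; i++) {
            partitions[i] = getPartition(values, positionOffset + i);
        }
    }
}

public class BatchedPartitionDemo {
    public static void main(String[] args) {
        // Toy partition function: value modulo 4
        PartitionFunctionSketch fn = (values, position) -> (int) (values[position] % 4);
        long[] values = {10, 11, 12, 13};
        int[] partitions = new int[4];
        fn.getPartitions(0, 4, values, partitions);
        System.out.println(java.util.Arrays.toString(partitions)); // [2, 3, 0, 1]
    }
}
```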
Reviewed changes
Copilot reviewed 57 out of 57 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| core/trino-main/src/main/java/io/trino/operator/NullSafeHashCompiler.java | New compiler class that generates bytecode for efficient batched hash computation |
| core/trino-main/src/main/java/io/trino/operator/NullSafeHash.java | New interface defining batched hash methods for single blocks |
| core/trino-main/src/main/java/io/trino/operator/InterpretedHashGenerator.java | Refactored to use batched hash operations with optimized RLE and dictionary handling |
| core/trino-main/src/main/java/io/trino/operator/FlatHashStrategyCompiler.java | Updated to delegate batched hash operations to InterpretedHashGenerator |
| core/trino-spi/src/main/java/io/trino/spi/connector/BucketFunction.java | Added getBuckets method for batch bucket computation |
| core/trino-main/src/main/java/io/trino/operator/PartitionFunction.java | Added getPartitions method for batch partition computation |
| core/trino-main/src/main/java/io/trino/operator/HashGenerator.java | Added hash and getPartitions methods for batch operations |
| core/trino-main/src/main/java/io/trino/operator/output/PagePartitioner.java | Simplified to use batched partition computation, removing dictionary-specific optimization |
| core/trino-main/src/main/java/io/trino/sql/planner/SystemPartitioningHandle.java | Updated to use NullSafeHashCompiler instead of TypeOperators |
| plugin/trino-hive/src/main/java/io/trino/plugin/hive/HivePageSink.java | Updated to use batched bucket computation |
| Various test files | Updated test setup to provide NullSafeHashCompiler instances |
Started benchmark workflow for this PR with test type =
c7c6711 to 0518ffa
pettyjamesm left a comment
Mostly LGTM, some small comments and I do think there are some places where the temporary allocations might be ripe for further improvements.
```diff
 for (int position = 0; position < partitionPage.getPositionCount(); position++) {
-    int partition = partitionFunction.getPartition(partitionPage, position);
-    partitionAssignments[partition].add(position);
+    partitionAssignments[partitions[position]].add(position);
```
There's probably an improvement to be gained by fusing this loop with partitionAssignments.getPartitions, but we can worry about that as a follow up.
That is better explored as a follow-up, I expect that might need a different method in PartitionFunction
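The fusion being discussed could look roughly like the following hypothetical sketch, where the partition computation and the scatter into per-partition position lists happen in a single pass instead of materializing an intermediate `int[] partitions`. Names and array-based types are illustrative only; they are not Trino's API, and this is the follow-up idea, not what the PR implements.

```java
import java.util.ArrayList;
import java.util.List;

public class FusedPartitionDemo {
    // Hypothetical fused method: computes each position's partition and appends
    // the position to that partition's list in one loop, avoiding the separate
    // partitions[] array between the two steps.
    static void partitionPositions(int partitionCount, long[] values, List<List<Integer>> assignments) {
        for (int position = 0; position < values.length; position++) {
            int partition = (int) (values[position] % partitionCount); // toy partition function
            assignments.get(partition).add(position);
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> assignments = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            assignments.add(new ArrayList<>());
        }
        partitionPositions(2, new long[] {4, 5, 6, 7}, assignments);
        System.out.println(assignments); // [[0, 2], [1, 3]]
    }
}
```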
```java
default void getPartitions(int partitionCount, int positionOffset, Page page, int length, int[] partitions)
{
    long[] hashes = new long[length];
```
Reusing this array is likely to be beneficial for most operator use cases, since allocating a new instance on each invocation is non-trivial allocation pressure. Having this default implementation seems like a performance hazard
The allocation here is very short-lived and the JVM is pretty good at optimizing for that. Trying to reuse the array adds some complexity, as it would have to be passed down from the calling operator, where it potentially needs to be tracked as a retained memory allocation. Since we haven't observed a problem with this in production for a while, I'm inclined to keep it simple for now and explore reuse as a follow-up.
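For reference, the reuse pattern the reviewer is suggesting might look like this hypothetical sketch, where a caller-owned scratch array is grown on demand instead of being allocated on every invocation. The names are illustrative; this is not code from the PR.

```java
// Hypothetical scratch-array reuse pattern: the operator keeps one array and
// grows it only when a larger batch arrives. The trade-off discussed above is
// that the retained array must then be accounted for in the operator's memory.
public class ScratchArrayDemo {
    private long[] hashScratch = new long[0];

    long[] scratchFor(int length) {
        if (hashScratch.length < length) {
            hashScratch = new long[length]; // grow; now retained across invocations
        }
        return hashScratch;
    }

    public static void main(String[] args) {
        ScratchArrayDemo demo = new ScratchArrayDemo();
        long[] first = demo.scratchFor(8);
        long[] second = demo.scratchFor(4); // smaller request reuses the same array
        System.out.println(first == second && first.length == 8); // true
    }
}
```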
I also got a chance to run the micro benchmark that I had performed for my PR (#27610) on your PR, and the results align with what was stated on my PR. Thank you @raunaqmorarka.
Introduce batched implementations for hashing pages for exchange
0518ffa to ef51fdc
Use byte code generation to avoid megamorphic call sites in hash code generation
ef51fdc to 01014c5
Uses separate loop per type
01014c5 to 70c2e76
70c2e76 to 8009b5c
Started benchmark workflow for this PR with test type =


Description
Optimize InterpretedHashGenerator using byte code generation and batched loops.
This also brings optimized handling of dictionaries to FlatGroupByHash.
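The dictionary optimization mentioned in the description generally works by hashing each distinct dictionary entry once and then fanning the results out through the id mapping, so the expensive hash is computed per distinct value rather than per position. The following is a simplified, self-contained sketch under that assumption; the hash function and array-based types are illustrative and are not Trino's `DictionaryBlock` API.

```java
public class DictionaryHashDemo {
    // Illustrative mixing function, standing in for a real per-type hash.
    static long hashValue(long value) {
        return value * 0x9E3779B97F4A7C15L;
    }

    // Hash each distinct dictionary entry once, then map positions through ids.
    static long[] hashDictionary(long[] dictionary, int[] ids) {
        long[] dictionaryHashes = new long[dictionary.length];
        for (int i = 0; i < dictionary.length; i++) {
            dictionaryHashes[i] = hashValue(dictionary[i]); // one hash per distinct value
        }
        long[] hashes = new long[ids.length];
        for (int position = 0; position < ids.length; position++) {
            hashes[position] = dictionaryHashes[ids[position]]; // cheap lookup per position
        }
        return hashes;
    }

    public static void main(String[] args) {
        long[] dictionary = {100, 200};
        int[] ids = {0, 1, 0, 0, 1};
        long[] hashes = hashDictionary(dictionary, ids);
        // Positions sharing a dictionary id share a hash without re-hashing the value.
        System.out.println(hashes[0] == hashes[2] && hashes[1] == hashes[4]); // true
    }
}
```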
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: