feat: Add session property for dynamic merge join output batching #27086

Merged
tanjialiang merged 1 commit into prestodb:master from tanjialiang:export-D92302366
Feb 10, 2026

Conversation

@tanjialiang
Contributor

@tanjialiang tanjialiang commented Feb 5, 2026

Summary: Adds a session property, merge_join_output_batch_start_size, that enables dynamic adjustment of the MergeJoin output batch size based on previous output row sizes. When it is set to zero (the default), the batch size stays fixed. Dynamic sizing is intended to improve efficiency and memory usage on large datasets.

Differential Revision: D92302366

Summary by Sourcery

Introduce a configurable starting batch size for native MergeJoin output to enable optional dynamic output batching based on observed row sizes.

New Features:

  • Add a native session property to control the initial MergeJoin output batch size and enable or disable dynamic adjustment of output batches.

Documentation:

  • Document the new MergeJoin output batch start size session property in the Presto native execution session properties reference.

Tests:

  • Extend SessionProperties tests to cover the new MergeJoin output batch start size mapping.

@tanjialiang tanjialiang requested review from a team, elharo and steveburnett as code owners February 5, 2026 18:07
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Feb 5, 2026
@sourcery-ai
Contributor

sourcery-ai bot commented Feb 5, 2026

Reviewer's Guide

Adds a new session property to control the initial MergeJoin output batch size and wires it through native Presto C++ session properties, the Java native worker session property provider, and the C++/Velox config mapping, enabling dynamic MergeJoin output batch sizing to be configured per session.
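
The wiring described above amounts to a lookup from a session property name to the query config key it sets. Below is a minimal, self-contained sketch of that idea. `SessionPropertyMap` and `makeExampleMappings` are hypothetical names invented for illustration, and both string literals are assumptions rather than values confirmed by this PR; the real `SessionProperties` class in presto_cpp has a richer interface.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a session-property registry (not the presto_cpp class).
class SessionPropertyMap {
 public:
  // Registers a session property name with the query config key it maps to.
  void add(const std::string& sessionName, const std::string& configKey) {
    assert(!sessionName.empty() && !configKey.empty());
    mappings_[sessionName] = configKey;
  }

  // Returns the config key for a session property, or std::nullopt if unknown.
  std::optional<std::string> toConfigKey(const std::string& sessionName) const {
    auto it = mappings_.find(sessionName);
    if (it == mappings_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

 private:
  std::unordered_map<std::string, std::string> mappings_;
};

// Assumption: both names below are illustrative guesses at the property and
// config key strings; check SessionProperties.h and QueryConfig for the real ones.
inline SessionPropertyMap makeExampleMappings() {
  SessionPropertyMap map;
  map.add(
      "native_merge_join_output_batch_start_size",
      "merge_join_output_batch_start_size");
  return map;
}
```

A mapping test in the spirit of SessionPropertiesTest would then look up each registered session property and compare the result against the expected config key.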

Class diagram for updated Cpp SessionProperties with MergeJoin batch start size

classDiagram

class SessionProperties {
  +static const char* kMaxOutputBatchRows
  +static const char* kMergeJoinOutputBatchStartSize
  +SessionProperties()
  +void addSessionProperty(const char* name, const char* description, Type* type, bool isHidden, const std::string& configKey, const std::string& defaultValue)
}

Class diagram for updated NativeWorkerSessionPropertyProvider with MergeJoin batch start size

classDiagram

class NativeWorkerSessionPropertyProvider {
  +static final String NATIVE_USE_VELOX_GEOSPATIAL_JOIN
  +static final String NATIVE_AGGREGATION_COMPACTION_BYTES_THRESHOLD
  +static final String NATIVE_AGGREGATION_COMPACTION_UNUSED_MEMORY_RATIO
  +static final String NATIVE_MERGE_JOIN_OUTPUT_BATCH_START_SIZE
  -List<PropertyMetadata<?>> sessionProperties
  +NativeWorkerSessionPropertyProvider(FeaturesConfig featuresConfig)
}

File-Level Changes

Change: Introduce a native session property key and wiring for MergeJoin initial output batch size in C++ SessionProperties.
Details:
  • Define kMergeJoinOutputBatchStartSize constant with documentation in SessionProperties.h.
  • Register the merge_join_output_batch_start_size session property in SessionProperties.cpp with integer type, defaulting to the corresponding QueryConfig value.
  • Document the semantics as disabling dynamic adjustment when set to zero and using preferredOutputBatchRows as fixed batch size.
Files:
  • presto-native-execution/presto_cpp/main/SessionProperties.h
  • presto-native-execution/presto_cpp/main/SessionProperties.cpp

Change: Expose the MergeJoin output batch start size as a Java session property for native workers.
Details:
  • Add NATIVE_MERGE_JOIN_OUTPUT_BATCH_START_SIZE constant to NativeWorkerSessionPropertyProvider.
  • Register an integer session property with default 0 and a description matching the C++ property, gated by !nativeExecution.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sessionpropertyproviders/NativeWorkerSessionPropertyProvider.java

Change: Extend the session-to-QueryConfig mapping tests to cover the new MergeJoin output batch start size property.
Details:
  • Add the kMergeJoinOutputBatchStartSize to expectedMappings with core::QueryConfig::kMergeJoinOutputBatchStartSize in SessionPropertiesTest.
Files:
  • presto-native-execution/presto_cpp/main/tests/SessionPropertiesTest.cpp
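
The documented semantics (zero disables dynamic adjustment and falls back to preferredOutputBatchRows; a non-zero value is the starting batch size, later adjusted from observed row sizes) can be sketched roughly as follows. This is not the Velox implementation: `BatchSizingConfig`, `nextBatchRows`, and the `targetBatchBytes` byte budget are all hypothetical, and the adjustment rule (target bytes divided by the previous batch's average row size) is only one plausible reading of "based on previous output row sizes".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical inputs; field names are illustrative and not taken from Velox.
struct BatchSizingConfig {
  int32_t startSize;           // start size property; 0 disables dynamic sizing
  int32_t preferredBatchRows;  // fixed batch size used when startSize == 0
  int64_t targetBatchBytes;    // assumed byte budget per output batch
};

// Returns the row count to use for the next output batch.
// prevRows and prevBytes describe the previous output batch (0 if none yet).
inline int32_t nextBatchRows(
    const BatchSizingConfig& config,
    int64_t prevRows,
    int64_t prevBytes) {
  assert(config.startSize >= 0 && config.preferredBatchRows > 0);
  if (config.startSize == 0) {
    // Dynamic adjustment disabled: always use the fixed preferred size.
    return config.preferredBatchRows;
  }
  if (prevRows <= 0 || prevBytes <= 0) {
    // Nothing observed yet: begin with the configured start size.
    return config.startSize;
  }
  // Assumed rule: size the next batch so that it roughly fits the byte budget,
  // given the average row size seen in the previous batch.
  const int64_t avgRowBytes = std::max<int64_t>(1, prevBytes / prevRows);
  const int64_t rows =
      std::max<int64_t>(1, config.targetBatchBytes / avgRowBytes);
  return static_cast<int32_t>(
      std::min<int64_t>(rows, std::numeric_limits<int32_t>::max()));
}
```

Under these assumptions, wide rows shrink the next batch and narrow rows grow it, while a start size of zero leaves batching unchanged.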


Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've reviewed your changes and they look great!


Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!

@steveburnett
Contributor

Please add a release note to pass the (not required, but failing) CI check.

Member

@zacw7 zacw7 left a comment

LGTM. Thanks for making this configurable!

tanjialiang added a commit to tanjialiang/presto that referenced this pull request Feb 9, 2026
…estodb#27086)

Summary:

Dynamic Merge Join Output Batching introduces a session property, `merge_join_output_batch_start_size`, enabling dynamic adjustment of MergeJoin output batch size based on previous output row sizes. When set to zero (default), batching remains fixed. This improves efficiency and memory usage for large datasets.

Differential Revision: D92302366
@tanjialiang tanjialiang merged commit 277d03c into prestodb:master Feb 10, 2026
88 of 91 checks passed
@hantangwangd
Member

Hi @tanjialiang, thanks for this PR! As part of the release process — do you think this change warrants a release note? If so, would you like to add one? Happy to help if you'd prefer.

6 participants