feat: Add session property for dynamic merge join output batching #27086
tanjialiang merged 1 commit into prestodb:master
Reviewer's Guide
Adds a new session property to control the initial MergeJoin output batch size and wires it through the native Presto C++ session properties, the Java native worker session property provider, and the C++/Velox config mapping, so that dynamic MergeJoin output batch sizing can be configured per session.
Class diagram for updated C++ SessionProperties with MergeJoin batch start size
classDiagram
class SessionProperties {
+static const char* kMaxOutputBatchRows
+static const char* kMergeJoinOutputBatchStartSize
+SessionProperties()
+void addSessionProperty(const char* name, const char* description, Type* type, bool isHidden, const std::string& configKey, const std::string& defaultValue)
}
Class diagram for updated NativeWorkerSessionPropertyProvider with MergeJoin batch start size
classDiagram
class NativeWorkerSessionPropertyProvider {
+static final String NATIVE_USE_VELOX_GEOSPATIAL_JOIN
+static final String NATIVE_AGGREGATION_COMPACTION_BYTES_THRESHOLD
+static final String NATIVE_AGGREGATION_COMPACTION_UNUSED_MEMORY_RATIO
+static final String NATIVE_MERGE_JOIN_OUTPUT_BATCH_START_SIZE
-List<PropertyMetadata<?>> sessionProperties
+NativeWorkerSessionPropertyProvider(FeaturesConfig featuresConfig)
}
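The wiring described in the guide (a session property name mapped to an engine config key with a default) can be illustrated with a small sketch. This is not the Presto code: `SessionPropertyRegistry`, `PropertyInfo`, `makeExampleRegistry`, and the config key `example.merge_join_output_batch_start_size` are all hypothetical names chosen for illustration. Only the property name `merge_join_output_batch_start_size` and its default of zero come from the PR summary.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for a session property registry.
// It is NOT the real SessionProperties class from Presto C++.
struct PropertyInfo {
  std::string configKey;     // key the engine config would read (assumed name)
  std::string defaultValue;  // used when the session does not set the property
};

class SessionPropertyRegistry {
 public:
  // Registers one session property and the config key it maps to.
  void add(
      const std::string& name,
      const std::string& configKey,
      const std::string& defaultValue) {
    assert(!name.empty());
    properties_[name] = PropertyInfo{configKey, defaultValue};
  }

  // Translates per-session overrides into engine config entries.
  // Registered properties that the session did not set fall back to their
  // default; session entries that were never registered are ignored.
  std::map<std::string, std::string> toConfig(
      const std::map<std::string, std::string>& session) const {
    std::map<std::string, std::string> config;
    for (const auto& entry : properties_) {
      const auto it = session.find(entry.first);
      config[entry.second.configKey] =
          (it != session.end()) ? it->second : entry.second.defaultValue;
    }
    return config;
  }

 private:
  std::map<std::string, PropertyInfo> properties_;
};

// Builds a registry holding only the property discussed in this PR.
// The config key below is an illustrative assumption.
inline SessionPropertyRegistry makeExampleRegistry() {
  SessionPropertyRegistry registry;
  registry.add(
      "merge_join_output_batch_start_size",
      "example.merge_join_output_batch_start_size",
      "0");
  return registry;
}
```

With this sketch, a session that leaves the property unset yields the default `"0"` under the assumed config key, and a session that sets it passes its own value through.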
steveburnett
left a comment
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
Please add a release note to pass the (not required, but failing) CI check.
zacw7
left a comment
LGTM. thanks for making this configurable!
…estodb#27086) Summary: Dynamic Merge Join Output Batching introduces a session property, `merge_join_output_batch_start_size`, enabling dynamic adjustment of MergeJoin output batch size based on previous output row sizes. When set to zero (default), batching remains fixed. This improves efficiency and memory usage for large datasets. Differential Revision: D92302366
0100d32 to
e4b8aa1
Hi @tanjialiang, thanks for this PR! As part of the release process, do you think this change warrants a release note? If so, would you like to add one? Happy to help if you'd prefer.
Summary: Dynamic Merge Join Output Batching introduces a session property, `merge_join_output_batch_start_size`, enabling dynamic adjustment of MergeJoin output batch size based on previous output row sizes. When set to zero (default), batching remains fixed. This improves efficiency and memory usage for large datasets.
Differential Revision: D92302366
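One way to picture the behavior described in the summary is the sketch below. It is an illustration under stated assumptions, not the Velox implementation: the summary only says that the batch size adapts to previous output row sizes and that zero keeps batching fixed. The function name `nextOutputBatchRows`, the byte budget `targetBytes`, and the clamping to `maxRows` are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of dynamic output batch sizing.
//
// startSize   : value of merge_join_output_batch_start_size; 0 means the
//               batch size stays fixed (the default per the PR summary).
// maxRows     : fixed upper bound on rows per output batch (assumed).
// targetBytes : assumed byte budget per output batch (not from the PR).
// prevBytes   : total bytes of the previous output batch.
// prevRows    : row count of the previous output batch; 0 means no batch
//               has been produced yet.
inline uint32_t nextOutputBatchRows(
    uint32_t startSize,
    uint32_t maxRows,
    uint64_t targetBytes,
    uint64_t prevBytes,
    uint32_t prevRows) {
  assert(maxRows >= 1);

  // Dynamic sizing disabled: always use the fixed batch size.
  if (startSize == 0) {
    return maxRows;
  }

  // First batch: begin from the configured start size, capped by maxRows.
  if (prevRows == 0) {
    return std::min(startSize, maxRows);
  }

  // Later batches: estimate the average row size from the previous batch
  // and pick the row count that fits the byte budget.
  const uint64_t avgRowBytes = std::max<uint64_t>(1, prevBytes / prevRows);
  const uint64_t rows = targetBytes / avgRowBytes;

  // Keep the result within [1, maxRows].
  const uint64_t clamped =
      std::min<uint64_t>(std::max<uint64_t>(rows, 1), maxRows);
  return static_cast<uint32_t>(clamped);
}
```

Under these assumptions, a small start size limits the first batch before any row sizes have been observed, and later batches grow for narrow rows or shrink for wide rows.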
Summary by Sourcery
Introduce a configurable starting batch size for native MergeJoin output to enable optional dynamic output batching based on observed row sizes.
New Features:
Documentation:
Tests: