feat(native-pos): Add capability for parallel shuffle checksum#27078
Merged
shrinidhijoshi merged 1 commit intoprestodb:masterfrom Feb 11, 2026
Merged
feat(native-pos): Add capability for parallel shuffle checksum#27078shrinidhijoshi merged 1 commit intoprestodb:masterfrom
shrinidhijoshi merged 1 commit intoprestodb:masterfrom
Conversation
Contributor
Reviewer's guide (collapsed on small PRs)Reviewer's GuideAdds driver-aware row access in shuffle pages to support per-consumer (parallel) checksum tracking, while keeping the local shuffle implementation compatible via a defaulted driverId parameter. Sequence diagram for ShuffleRead getOutput with driver-aware rowssequenceDiagram
participant SR as ShuffleRead
participant OC as OperatorCtx
participant DC as DriverCtx
participant SP as ShuffleSerializedPage
participant LSP as LocalShuffleSerializedPage
SR->>OC: operatorCtx()
OC-->>SR: OperatorCtx*
SR->>OC: driverCtx()
OC-->>SR: DriverCtx*
SR->>DC: get driverId
DC-->>SR: driverId int32_t
loop for each page in currentPages_
SR->>SP: rows(driverId)
activate SP
SP-->>SR: std::vector std::string_view&
deactivate SP
end
note over SP,LSP: LocalShuffleSerializedPage implements rows(driverId = 0) and ignores driverId
Class diagram for driver-aware ShuffleSerializedPage rowsclassDiagram
class ShuffleSerializedPage {
<<abstract>>
+rows(driverId int32_t = 0) std::vector~std::string_view~&
}
class LocalShuffleSerializedPage {
-rows_ std::vector~std::string_view~
-buffer_ velox::BufferPtr
+LocalShuffleSerializedPage(rows std::vector~std::string_view~, buffer velox::BufferPtr)
+rows(driverId int32_t = 0) std::vector~std::string_view~&
}
class ShuffleRead {
-currentPages_ std::vector~std::unique_ptr~<ShuffleSerializedPage>~
-rows_ std::vector~std::string_view~
+getOutput() RowVectorPtr
}
class OperatorCtx {
+driverCtx() DriverCtx*
}
class DriverCtx {
+driverId int32_t
}
ShuffleSerializedPage <|-- LocalShuffleSerializedPage
ShuffleRead --> ShuffleSerializedPage : uses
ShuffleRead --> OperatorCtx : operatorCtx()
OperatorCtx --> DriverCtx : driverCtx()
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider avoiding a default argument on the virtual
ShuffleSerializedPage::rowsinterface, since it can hide call sites that should explicitly passdriverIdand makes future interface changes harder to track; a non-defaulted parameter would force all callers to be intentional about the driver context. - Instead of using the magic value
0as the legacydriverIdsentinel, consider introducing a named constant or enum to clarify its meaning and reduce the risk of misuse in future callers or implementations.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider avoiding a default argument on the virtual `ShuffleSerializedPage::rows` interface, since it can hide call sites that should explicitly pass `driverId` and makes future interface changes harder to track; a non-defaulted parameter would force all callers to be intentional about the driver context.
- Instead of using the magic value `0` as the legacy `driverId` sentinel, consider introducing a named constant or enum to clarify its meaning and reduce the risk of misuse in future callers or implementations.
## Individual Comments
### Comment 1
<location> `presto-native-execution/presto_cpp/main/operators/LocalShuffle.cpp:115` </location>
<code_context>
: rows_{std::move(rows)}, buffer_{std::move(buffer)} {}
- const std::vector<std::string_view>& rows() override {
+ const std::vector<std::string_view>& rows(int32_t /*driverId*/ = 0) override {
return rows_;
}
</code_context>
<issue_to_address>
**suggestion:** Consider simplifying the unused parameter handling and avoiding a redundant default in the override.
You can rely on the base-class default and just name the unused parameter, e.g. `rows(int32_t /*driverId*/)` or `rows(int32_t driverId [[maybe_unused]])`. This avoids duplicate defaults that could drift apart and keeps the override’s signature clearer.
```suggestion
const std::vector<std::string_view>& rows(int32_t /*driverId*/) override {
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-native-execution/presto_cpp/main/operators/LocalShuffle.cpp
Outdated
Show resolved
Hide resolved
shrinidhijoshi
added a commit
to shrinidhijoshi/presto
that referenced
this pull request
Feb 4, 2026
…odb#27078) Summary: Pull Request resolved: prestodb#27078 Differential Revision: D92208394
b793489 to
674b409
Compare
…odb#27078) Summary: Pull Request resolved: prestodb#27078 Differential Revision: D92208394
674b409 to
38bd240
Compare
xiaoxmeng
approved these changes
Feb 10, 2026
This was referenced Mar 31, 2026
15 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Differential Revision: D92208394
Introduce a driverId-aware rows() interface on shuffle serialized pages to enable per-consumer checksum tracking in parallel shuffle reads.