feat(native-pos): Add capability for parallel shuffle checksum #27078

Merged
shrinidhijoshi merged 1 commit into prestodb:master from shrinidhijoshi:export-D92208394
Feb 11, 2026

@shrinidhijoshi (Collaborator) commented Feb 3, 2026

Differential Revision: D92208394

Introduce a driverId-aware rows() interface on shuffle serialized pages to enable per-consumer checksum tracking in parallel shuffle reads.

== NO RELEASE NOTES ==
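
To make the per-consumer idea concrete, here is a minimal sketch of how a page implementation could keep one running checksum per driverId. `ChecksummedPage`, `checksum()`, and the choice of FNV-1a are assumptions made for illustration only; the PR does not show which page type tracks checksums or which hash it uses. The sketch is not synchronized, so it assumes a given page is read from one thread at a time.

```cpp
#include <cstdint>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical page type (not from the PR): every rows(driverId) call folds
// the page's rows into the checksum kept for that driver.
class ChecksummedPage {
 public:
  explicit ChecksummedPage(std::vector<std::string_view> rows)
      : rows_(std::move(rows)) {}

  // Mirrors the driverId-aware signature described above, default included.
  const std::vector<std::string_view>& rows(int32_t driverId = 0) {
    uint64_t& checksum = checksums_[driverId];
    for (std::string_view row : rows_) {
      checksum += fnv1a(row);
    }
    return rows_;
  }

  // Checksum accumulated so far for one consumer; 0 if it never read.
  uint64_t checksum(int32_t driverId) const {
    const auto it = checksums_.find(driverId);
    return it == checksums_.end() ? 0 : it->second;
  }

 private:
  // 64-bit FNV-1a, chosen only because it is short (assumption).
  static uint64_t fnv1a(std::string_view data) {
    uint64_t hash = 14695981039346656037ULL;
    for (const unsigned char byte : data) {
      hash ^= byte;
      hash *= 1099511628211ULL;
    }
    return hash;
  }

  std::vector<std::string_view> rows_;
  std::unordered_map<int32_t, uint64_t> checksums_;
};
```

With this shape, two drivers that read the same page end up with equal checksums, while a driver that never called `rows()` reports 0.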

@shrinidhijoshi requested review from a team as code owners on February 3, 2026, 23:04
@prestodb-ci added the from:Meta (PR from Meta) label on Feb 3, 2026
@sourcery-ai bot (Contributor) commented Feb 3, 2026

Reviewer's Guide

Adds driver-aware row access in shuffle pages to support per-consumer (parallel) checksum tracking, while keeping the local shuffle implementation compatible via a defaulted driverId parameter.

Sequence diagram for ShuffleRead getOutput with driver-aware rows

```mermaid
sequenceDiagram
  participant SR as ShuffleRead
  participant OC as OperatorCtx
  participant DC as DriverCtx
  participant SP as ShuffleSerializedPage
  participant LSP as LocalShuffleSerializedPage

  SR->>OC: operatorCtx()
  OC-->>SR: OperatorCtx*
  SR->>OC: driverCtx()
  OC-->>SR: DriverCtx*
  SR->>DC: get driverId
  DC-->>SR: driverId int32_t

  loop for each page in currentPages_
    SR->>SP: rows(driverId)
    activate SP
    SP-->>SR: std::vector<std::string_view>&
    deactivate SP
  end

  note over SP,LSP: LocalShuffleSerializedPage implements rows(driverId = 0) and ignores driverId
```
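
The read path in the sequence diagram can be sketched roughly as follows. `DriverCtx`, `OperatorCtx`, and the page interface here are simplified stand-ins rather than the Velox or Presto classes, and `RecordingPage` and `gatherRows` are hypothetical names introduced only so the flow can be exercised in isolation.

```cpp
#include <cstdint>
#include <memory>
#include <string_view>
#include <utility>
#include <vector>

// Simplified stand-ins for the contexts named in the diagram (assumptions).
struct DriverCtx {
  int32_t driverId;
};

class OperatorCtx {
 public:
  explicit OperatorCtx(DriverCtx* driverCtx) : driverCtx_(driverCtx) {}
  DriverCtx* driverCtx() const { return driverCtx_; }

 private:
  DriverCtx* driverCtx_;
};

class ShuffleSerializedPage {
 public:
  virtual ~ShuffleSerializedPage() = default;
  virtual const std::vector<std::string_view>& rows(int32_t driverId = 0) = 0;
};

// Hypothetical page that remembers which driver last asked for its rows.
class RecordingPage : public ShuffleSerializedPage {
 public:
  explicit RecordingPage(std::vector<std::string_view> rows)
      : rows_(std::move(rows)) {}

  const std::vector<std::string_view>& rows(int32_t driverId = 0) override {
    lastDriverId = driverId;
    return rows_;
  }

  int32_t lastDriverId{-1};

 private:
  std::vector<std::string_view> rows_;
};

// Looks up the driverId once, then passes it to every page, as in the loop
// shown in the diagram.
std::vector<std::string_view> gatherRows(
    const OperatorCtx& operatorCtx,
    const std::vector<std::unique_ptr<ShuffleSerializedPage>>& currentPages) {
  const int32_t driverId = operatorCtx.driverCtx()->driverId;
  std::vector<std::string_view> rows;
  for (const auto& page : currentPages) {
    const auto& pageRows = page->rows(driverId);
    rows.insert(rows.end(), pageRows.begin(), pageRows.end());
  }
  return rows;
}
```

Reading the driverId once before the loop keeps the per-page call cheap; whether the actual `ShuffleRead::getOutput` does the same is not visible in this thread.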

Class diagram for driver-aware ShuffleSerializedPage rows

```mermaid
classDiagram
  class ShuffleSerializedPage {
    <<abstract>>
    +rows(driverId int32_t = 0) std::vector~std::string_view~&
  }

  class LocalShuffleSerializedPage {
    -rows_ std::vector~std::string_view~
    -buffer_ velox::BufferPtr
    +LocalShuffleSerializedPage(rows std::vector~std::string_view~, buffer velox::BufferPtr)
    +rows(driverId int32_t = 0) std::vector~std::string_view~&
  }

  class ShuffleRead {
    -currentPages_ std::vector~std::unique_ptr~ShuffleSerializedPage~~
    -rows_ std::vector~std::string_view~
    +getOutput() RowVectorPtr
  }

  class OperatorCtx {
    +driverCtx() DriverCtx*
  }

  class DriverCtx {
    +driverId int32_t
  }

  ShuffleSerializedPage <|-- LocalShuffleSerializedPage
  ShuffleRead --> ShuffleSerializedPage : uses
  ShuffleRead --> OperatorCtx : operatorCtx()
  OperatorCtx --> DriverCtx : driverCtx()
```

File-Level Changes

Change: Make ShuffleSerializedPage row access driver-aware to support per-driver checksum tracking while preserving the existing single-consumer interface via a default parameter.

Details:
  • Extend the ShuffleSerializedPage::rows interface to accept an optional driverId parameter with default 0, updating the virtual method signature and documenting its semantics.
  • Update ShuffleRead to retrieve the current operator driverId from the DriverCtx and pass it down when accessing rows from ShuffleSerializedPage instances.
  • Adjust LocalShuffleSerializedPage to implement the new rows(int32_t driverId) interface, ignoring driverId for local shuffle while maintaining existing behavior.

Files:
  • presto-native-execution/presto_cpp/main/operators/ShuffleInterface.h
  • presto-native-execution/presto_cpp/main/operators/ShuffleRead.cpp
  • presto-native-execution/presto_cpp/main/operators/LocalShuffle.cpp
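
As a rough illustration of the interface change listed above, the sketch below pairs the extended `rows(int32_t driverId = 0)` signature with a local page that ignores the argument. It is a standalone approximation: `std::shared_ptr<const std::string>` stands in for `velox::BufferPtr`, and `makeLocalPage` with its fixed-width row splitting is a hypothetical helper, not something the PR contains.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

class ShuffleSerializedPage {
 public:
  virtual ~ShuffleSerializedPage() = default;
  // driverId identifies the consumer; 0 keeps single-consumer callers working.
  virtual const std::vector<std::string_view>& rows(int32_t driverId = 0) = 0;
};

class LocalShuffleSerializedPage : public ShuffleSerializedPage {
 public:
  LocalShuffleSerializedPage(
      std::vector<std::string_view> rows,
      std::shared_ptr<const std::string> buffer)  // stand-in for BufferPtr
      : rows_{std::move(rows)}, buffer_{std::move(buffer)} {}

  // Local shuffle has no per-driver state, so the argument is ignored.
  const std::vector<std::string_view>& rows(int32_t /*driverId*/ = 0) override {
    return rows_;
  }

 private:
  std::vector<std::string_view> rows_;
  std::shared_ptr<const std::string> buffer_;  // keeps the row bytes alive
};

// Hypothetical helper: cuts `data` into rows of `rowWidth` bytes.
std::unique_ptr<ShuffleSerializedPage> makeLocalPage(
    std::string data,
    std::size_t rowWidth) {
  std::shared_ptr<const std::string> buffer =
      std::make_shared<std::string>(std::move(data));
  std::vector<std::string_view> rows;
  if (rowWidth > 0) {
    const std::string_view bytes(*buffer);
    for (std::size_t offset = 0; offset < bytes.size(); offset += rowWidth) {
      rows.push_back(bytes.substr(offset, rowWidth));
    }
  }
  return std::make_unique<LocalShuffleSerializedPage>(
      std::move(rows), std::move(buffer));
}
```

Because the local page ignores driverId, `page->rows()`, `page->rows(1)`, and `page->rows(2)` all return a reference to the same vector.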

@sourcery-ai bot (Contributor) left a comment

Hey - I've found 1 issue and left some high-level feedback:

  • Consider avoiding a default argument on the virtual ShuffleSerializedPage::rows interface, since it can hide call sites that should explicitly pass driverId and makes future interface changes harder to track; a non-defaulted parameter would force all callers to be intentional about the driver context.
  • Instead of using the magic value 0 as the legacy driverId sentinel, consider introducing a named constant or enum to clarify its meaning and reduce the risk of misuse in future callers or implementations.
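
The concern about default arguments on a virtual function comes from a C++ rule: the default value is chosen from the static type of the expression, while the function body is chosen from the dynamic type. The sketch below shows the two drifting apart, together with a named constant in place of the bare 0. `Page`, `DriftedPage`, and `kDefaultDriverId` are hypothetical names, and `rows()` returns the driverId it received only to make the effect observable.

```cpp
#include <cstdint>

// Hypothetical named constant for the legacy "no specific driver" value.
constexpr int32_t kDefaultDriverId = 0;

struct Page {
  virtual ~Page() = default;
  // Simplified: returns the driverId it was called with.
  virtual int32_t rows(int32_t driverId = kDefaultDriverId) {
    return driverId;
  }
};

struct DriftedPage : Page {
  // The default here deliberately differs from the base to show the drift.
  int32_t rows(int32_t driverId = 7) override {
    return driverId;
  }
};

// Static type is Page: the base default (0) is used, although the
// DriftedPage override is the function that runs.
int32_t callThroughBase(Page& page) {
  return page.rows();
}

// Static type is DriftedPage: its own default (7) is used.
int32_t callThroughDerived(DriftedPage& page) {
  return page.rows();
}
```

For the same `DriftedPage` object, `callThroughBase` yields 0 and `callThroughDerived` yields 7, which is the kind of silent divergence a non-defaulted parameter would rule out.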

## Individual Comments

### Comment 1
<location> `presto-native-execution/presto_cpp/main/operators/LocalShuffle.cpp:115` </location>
<code_context>
       : rows_{std::move(rows)}, buffer_{std::move(buffer)} {}

-  const std::vector<std::string_view>& rows() override {
+  const std::vector<std::string_view>& rows(int32_t /*driverId*/ = 0) override {
     return rows_;
   }
</code_context>

<issue_to_address>
**suggestion:** Consider simplifying the unused parameter handling and avoiding a redundant default in the override.

You can rely on the base-class default and just name the unused parameter, e.g. `rows(int32_t /*driverId*/)` or `rows(int32_t driverId [[maybe_unused]])`. This avoids duplicate defaults that could drift apart and keeps the override’s signature clearer.

```suggestion
  const std::vector<std::string_view>& rows(int32_t /*driverId*/) override {
```
</issue_to_address>
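
One consequence of the suggested change is worth spelling out: default arguments are not inherited by an override, so once the override drops its default, only calls made through the base type can omit driverId. A minimal sketch, assuming a simplified page interface and a hypothetical `LocalPage` class:

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <utility>
#include <vector>

struct ShuffleSerializedPage {
  virtual ~ShuffleSerializedPage() = default;
  virtual const std::vector<std::string_view>& rows(int32_t driverId = 0) = 0;
};

// Hypothetical local page whose override carries no default of its own.
class LocalPage final : public ShuffleSerializedPage {
 public:
  explicit LocalPage(std::vector<std::string_view> rows)
      : rows_(std::move(rows)) {}

  const std::vector<std::string_view>& rows(
      [[maybe_unused]] int32_t driverId) override {
    return rows_;
  }

 private:
  std::vector<std::string_view> rows_;
};

// Through the base type the default from the interface still applies.
std::size_t countViaBase(ShuffleSerializedPage& page) {
  return page.rows().size();
}

// Through the derived type the argument must be explicit; `page.rows()`
// would not compile because LocalPage::rows declares no default.
std::size_t countViaDerived(LocalPage& page) {
  return page.rows(0).size();
}
```

If existing code calls `rows()` directly on the concrete local page type, those call sites would need an explicit argument after the change.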


shrinidhijoshi added a commit to shrinidhijoshi/presto that referenced this pull request on Feb 4, 2026
…odb#27078)

Summary: Pull Request resolved: prestodb#27078

Differential Revision: D92208394
@shrinidhijoshi merged commit d23b021 into prestodb:master on Feb 11, 2026
84 of 85 checks passed