Skip to content

Reindex Relocation: Prevent source task override task result#145947

Open
samxbr wants to merge 14 commits intoelastic:mainfrom
samxbr:reindex/store-source-result-prevent-override
Open

Reindex Relocation: Prevent source task override task result#145947
samxbr wants to merge 14 commits intoelastic:mainfrom
samxbr:reindex/store-source-result-prevent-override

Conversation

@samxbr
Copy link
Copy Markdown
Contributor

@samxbr samxbr commented Apr 8, 2026

This is a follow-up of #145488. This prevents the source reindex task overriding the result written by the destination task, effectively ensuring the destination always wins if there's a race. This is to ensure the relocation chain is preserved in the case where the source task observes another error (eg. network error if the resume response from destination to source is lost).

@samxbr samxbr added :Distributed/Reindex Issues relating to reindex that are not caused by issues further down >non-issue labels Apr 8, 2026
@samxbr samxbr marked this pull request as ready for review April 9, 2026 21:00
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team. label Apr 9, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens reindex relocation task-result persistence by ensuring the relocation destination’s write to .tasks cannot be overwritten by a racing source-node write, preserving the relocation chain even under partial failures (e.g., lost resume response).

Changes:

  • Add create-if-absent result storage (opType(CREATE)) for tasks that must not overwrite an existing .tasks document, treating version conflicts as success.
  • Wire task-specific “create semantics” into TaskManager and enable it for reindex source tasks once relocation handoff begins.
  • Expand unit + integration tests to cover race orderings (destination-first and source-first) and validate stored content/chain behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
server/src/main/java/org/elasticsearch/tasks/TaskResultsService.java Adds storeResultIfAbsent() using CREATE semantics and version-conflict-as-success handling.
server/src/main/java/org/elasticsearch/tasks/TaskManager.java Routes result persistence through new task-level create/overwrite decision.
server/src/main/java/org/elasticsearch/tasks/Task.java Introduces overridable hook to request create-if-absent result storage.
server/src/main/java/org/elasticsearch/index/reindex/BulkByScrollTask.java Tracks relocation handoff initiation and opts into create-if-absent result storage afterwards.
modules/reindex/src/main/java/org/elasticsearch/reindex/Reindexer.java Sets the handoff flag prior to sending resume request; adds assertion around when source-task results are stored.
server/src/main/java/org/elasticsearch/tasks/TaskResult.java Enforces invariant: cannot have both error and response bytes.
server/src/test/java/org/elasticsearch/tasks/TaskResultTests.java Adds a test validating the new TaskResult invariant.
server/src/main/java/org/elasticsearch/index/reindex/ResumeInfo.java Updates documentation around source task result propagation.
modules/reindex/src/test/java/org/elasticsearch/reindex/ReindexerTests.java Adds coverage ensuring relocation handoff flips the create-semantics flag; strengthens source-task-result assertions.
modules/reindex-management/src/test/java/org/elasticsearch/reindex/management/TransportGetReindexActionTests.java Adjusts relocation-chain test setup consistent with updated chain preservation semantics.
modules/reindex-management/src/internalClusterTest/java/org/elasticsearch/reindex/management/ReindexRelocationIT.java Adds IT coverage for destination-first/source-first write races; extends test plugin to block/defer .tasks writes and capture sources.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@szybia szybia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

only thing, copilot might be cooking with suggestion to use generic threadpool rather than stranded thread

samxbr and others added 2 commits April 10, 2026 13:09
…o.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Apr 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed/Reindex Issues relating to reindex that are not caused by issues further down >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Meta label for distributed team. v9.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants