Reindex relocation: store source TaskResult at destination node#145488
Reindex relocation: store source TaskResult at destination node#145488samxbr merged 10 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-distributed (Team:Distributed) |
modules/reindex/src/main/java/org/elasticsearch/reindex/TaskRelocatedException.java
Outdated
Show resolved
Hide resolved
|
just looked at overall approach, and spotted this: #145488 (comment) so asking before proceeding since might mean a rework, unless i'm missing something |
There was a problem hiding this comment.
Pull request overview
This PR improves reindex relocation resilience by ensuring the relocation destination node persists the relocation source task’s result into the .tasks system index, so reindex management APIs can still follow the relocation chain if the source node fails before writing its own result.
Changes:
- Plumbs
TaskResultsServiceinto reindex transport/actions andReindexerso the destination can write the source task result into.tasks. - Extends relocation resume payload (
ResumeInfo) to carry the sourceTaskResultacross to the destination node. - Updates tests to validate that the source task result is transmitted/stored and that management APIs continue to follow relocation chains.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/action/TransportEnrichReindexAction.java | Adds TaskResultsService wiring for enrich’s specialized reindex transport action. |
| server/src/main/java/org/elasticsearch/tasks/TaskResult.java | Adds a helper to produce a new TaskResult with a replaced error payload. |
| server/src/main/java/org/elasticsearch/index/reindex/ResumeInfo.java | Adds sourceTaskResult to relocation resume info and serializes it over the wire. |
| modules/reindex/src/main/java/org/elasticsearch/reindex/TransportReindexAction.java | Injects TaskResultsService and passes it into Reindexer. |
| modules/reindex/src/main/java/org/elasticsearch/reindex/Reindexer.java | Stores the source task result on the destination node before continuing execution; includes source result in resume request. |
| modules/reindex/src/main/java/org/elasticsearch/reindex/TaskRelocatedException.java | Introduces a constructor to populate relocation metadata directly (replaces setter-style API). |
| modules/reindex/src/test/java/org/elasticsearch/reindex/ReindexerTests.java | Adds unit tests for transmitting/storing the source task result and failure handling. |
| modules/reindex-management/src/test/java/org/elasticsearch/reindex/management/TransportGetReindexActionTests.java | Updates tests to use the new TaskRelocatedException constructor. |
| modules/reindex-management/src/internalClusterTest/java/org/elasticsearch/reindex/management/ReindexRelocationIT.java | Adds an IT intended to verify destination-side .tasks write preserves the relocation chain. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
modules/reindex/src/main/java/org/elasticsearch/reindex/TaskRelocatedException.java
Show resolved
Hide resolved
...t/src/internalClusterTest/java/org/elasticsearch/reindex/management/ReindexRelocationIT.java
Show resolved
Hide resolved
szybia
left a comment
There was a problem hiding this comment.
nothing blocking, LGTM!
i guess have a scan at comments and see whether you agree, can be done as follow-up PR too
...t/src/internalClusterTest/java/org/elasticsearch/reindex/management/ReindexRelocationIT.java
Show resolved
Hide resolved
...t/src/internalClusterTest/java/org/elasticsearch/reindex/management/ReindexRelocationIT.java
Show resolved
Hide resolved
...t/src/internalClusterTest/java/org/elasticsearch/reindex/management/ReindexRelocationIT.java
Show resolved
Hide resolved
...t/src/internalClusterTest/java/org/elasticsearch/reindex/management/ReindexRelocationIT.java
Show resolved
Hide resolved
...t/src/internalClusterTest/java/org/elasticsearch/reindex/management/ReindexRelocationIT.java
Show resolved
Hide resolved
...ement/src/test/java/org/elasticsearch/reindex/management/TransportGetReindexActionTests.java
Show resolved
Hide resolved
...t/src/internalClusterTest/java/org/elasticsearch/reindex/management/ReindexRelocationIT.java
Show resolved
Hide resolved
Thanks for the comments. I am inclined to merge this PR first since it's blocking my next PRs, in which I will also address theses comments. |
…cation * upstream/main: Reindex relocation: store source TaskResult at destination node (elastic#145488) Bump versions after 9.2.8 release [CI] DLMFrozenTransitionServiceTests testCheckForFrozenIndicesReturnsEarlyWhenCapacityExhausted failing [elastic#145778] (elastic#145906) Update branches.json for 9.2.8 release ESQL: Clarify inheriting from Attributes (elastic#145898) Bump versions after 9.3.3 release Update branches.json for 9.3.3 release
* upstream/main: Mute org.elasticsearch.xpack.esql.expression.function.aggregate.FirstDocIdGroupingAggregatorFunctionTests testSimple elastic#145923 Reindex relocation: store source TaskResult at destination node (elastic#145488) Bump versions after 9.2.8 release [CI] DLMFrozenTransitionServiceTests testCheckForFrozenIndicesReturnsEarlyWhenCapacityExhausted failing [elastic#145778] (elastic#145906) Update branches.json for 9.2.8 release ESQL: Clarify inheriting from Attributes (elastic#145898) Bump versions after 9.3.3 release Update branches.json for 9.3.3 release Prune changelogs after 8.19.14 release Bump versions after 8.19.14 release Update branches.json for 8.19.14 release [ML] Call old inference API (elastic#145690) ESQL: Unmute CsvIT sumWithOverflowRow (elastic#145893) Index a document when testing runtime fields shadowing dimensions & metrics (elastic#145882) [TEST] Fix version check in testSequenceNumbersDisabled (elastic#145879) [ESQL] Per-file filter pushdown awareness (elastic#145755) Unmute testGetReindexFollowsRelocation (elastic#145841) Correctly ignore system indices when validating dot-prefixed indices (elastic#128868) [Transform] Remove tests for deleted code (elastic#145685) ESQL: Add generative tests for LIMIT BY (elastic#144238)
I merged elastic#145803 but it conflicted with elastic#145488 merged yesterday
Follow up PR #145947 that includes improvements suggested in this PR. |
In order to make reindex management API resilient to failures, the relocation destination node needs to write the relocation source task result into
.tasksindex. This ensures the relocated task can be found by management APIs from following the relocation chain stored in.tasks, if source node has ungracefully failed before writing the task result into.tasks.We also need to ensure the source task does not override the result stored by destination, this will be handled in a separate PR.