Add helper to pick node for reindex relocation#139081
Add helper to pick node for reindex relocation#139081PeteGillinElastic merged 8 commits intoelastic:mainfrom
Conversation
This adds a helper class which can be used to pick a node to relocate a reindex task to on shutdown. The plugin exposes an implementation of the `ReindexRelocationNodePicker` interface as a component, so that in a future change it can be injected into the transport action, and from there passed into teh `Reindexer`, and from there used to actually relocate the task on shutdown. By default, it uses the `DefaultReindexRelocationNodePicker`. Another plugin may override this by extending the `reindex` plugin and making the implementation available via SPI.
24788c9 to
f86685f
Compare
|
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
| ); | ||
| return null; | ||
| } | ||
| List<String> eligibleDedicatedCoordinatingNodes = nodes.getCoordinatingOnlyNodes() |
There was a problem hiding this comment.
Question for @Tim-Brooks: Is this the correct way to find what the doc describes as "dedicated coordinating nodes"? The name sounds right, but the implementation just filters out data, master, and ingest nodes — so nodes with roles like ml or voting_only would be included.
There was a problem hiding this comment.
IMO - it is likely the case that this logic is bugged. A dedicated coordinating node should have no roles as outlined by the documentation: https://www.elastic.co/docs/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles#coordinating-only-node-role.
Looking at this logic I don't think you're introducing that issue. But you are changing this code (nodes.getCoordinatingOnlyNodes ) to be used in production opposed to just tests. So either I think the logic needs to be fixed here or you need to write your own function.
There was a problem hiding this comment.
I don't think it's actually 100% true that DiscoveryNodes.getCoordinatingOnlyNodes() is used only in tests now. I think there are ways it is exposed to the user, although the only ways I've found are somewhat obscure. I created #139587 for this. I agree with you that, on the face of it, this is a bug which we should fix. But since that would be a behavioural change, I think it deserves its own issue, so that we can make sure we're aligned on it, and so that it can go in the release notes.
In the meantime, I simply won't use the method in this PR.
| logger.debug("Chose dedicated coordinating node ID {} for relocating a reindex task from node {}", newNodeId, currentNodeId); | ||
| return newNodeId; | ||
| } | ||
| List<String> eligibleDataNodes = nodes.getDataNodes().keySet().stream().filter(id -> id.equals(currentNodeId) == false).toList(); |
There was a problem hiding this comment.
Question for @Tim-Brooks: The doc says "a valid node for index coordinating (data node, no dedicated master nodes)". Is this the correct interpretation of that? (I wasn't quite sure what "no dedicated master nodes" meant, as my understanding was that would mean nodes with only the master role, which therefore could never be a data node.)
There was a problem hiding this comment.
Yes. This looks fine to me (can contain data excluded dedicated master nodes).
Tim-Brooks
left a comment
There was a problem hiding this comment.
All the plugin infrastructure here looks good. But I do think we need to use the proper definition of coordinating only node which is nodes without any assigned roles.
| logger.debug("Chose dedicated coordinating node ID {} for relocating a reindex task from node {}", newNodeId, currentNodeId); | ||
| return newNodeId; | ||
| } | ||
| List<String> eligibleDataNodes = nodes.getDataNodes().keySet().stream().filter(id -> id.equals(currentNodeId) == false).toList(); |
There was a problem hiding this comment.
Yes. This looks fine to me (can contain data excluded dedicated master nodes).
| ); | ||
| return null; | ||
| } | ||
| List<String> eligibleDedicatedCoordinatingNodes = nodes.getCoordinatingOnlyNodes() |
There was a problem hiding this comment.
IMO - it is likely the case that this logic is bugged. A dedicated coordinating node should have no roles as outlined by the documentation: https://www.elastic.co/docs/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles#coordinating-only-node-role.
Looking at this logic I don't think you're introducing that issue. But you are changing this code (nodes.getCoordinatingOnlyNodes ) to be used in production opposed to just tests. So either I think the logic needs to be fixed here or you need to write your own function.
PeteGillinElastic
left a comment
There was a problem hiding this comment.
Thanks, PTAL.
| ); | ||
| return null; | ||
| } | ||
| List<String> eligibleDedicatedCoordinatingNodes = nodes.getCoordinatingOnlyNodes() |
There was a problem hiding this comment.
I don't think it's actually 100% true that DiscoveryNodes.getCoordinatingOnlyNodes() is used only in tests now. I think there are ways it is exposed to the user, although the only ways I've found are somewhat obscure. I created #139587 for this. I agree with you that, on the face of it, this is a bug which we should fix. But since that would be a behavioural change, I think it deserves its own issue, so that we can make sure we're aligned on it, and so that it can go in the release notes.
In the meantime, I simply won't use the method in this PR.
Just in case, repeating this reply which I left but might have been stranded 'cos the code it was attached to went away:
|
* upstream/main: (253 commits) Adds ST_SIMPLIFY geo spatial function (elastic#136309) Take control of max clause count verification in Lucene searcher (elastic#139752) [ML] Unmute Inference Test (elastic#139765) Parameterize the vector operation benchmark tests (elastic#139735) Fix node reduction pushdown tests for release tests (elastic#139548) Fix flakiness in TSDataGenerationHelper (elastic#139759) CPS: Copy existing resolved index expressions when constructing a new `SearchRequest` from an existing one (elastic#139596) Add release notes for v9.1.9 release (elastic#139674) Add lucene query for wildcards on high cardinality keyword fields. (elastic#139746) Suppress Tika entitlement warnings from AWT (elastic#139711) Check field storage when synthetic source is enabled, in tests (elastic#139715) Refactor VectorSimilarityType to know about its corresponding Function (elastic#139678) Merge fixes from patch branch back into main (elastic#139721) Define native bulk operations for vector square distance (elastic#139198) Use LongUpDownCounter for Linked Project Error Metrics (elastic#139657) ESQL: Add javadoc that explains version-aware planning (elastic#139706) Add helper to pick node for reindex relocation (elastic#139081) Fix auth serialization randomized version test (elastic#139182) ES|QL - Add parsing, preanalysis and analysis timing information to profile (elastic#139540) Mute org.elasticsearch.persistent.ClusterPersistentTasksCustomMetadataTests testMinVersionSerialization elastic#139741 ...
This adds an implmentation of the helper which picks a node to relocate a reindex task to on shutdown to be used in stateless. This changes the mechanism for choosing the correct implementation that was added in elastic#139081. The previous mechanism assumed that we'd want *serverless* to behave differently, and so it defined a default implementation and allowed for that to be swapped out via another plugin. But we actually want *stateless* to behave differently, so we can take the simpler approach of doing everything inside the `reindex` plugin and just pick the correct implementation based on settings. The implementation added previously has been renamed, because it is now explicitly used in stateful rather than just being used by default.
This adds an implmentation of the helper which picks a node to relocate a reindex task to on shutdown to be used in stateless. This changes the mechanism for choosing the correct implementation that was added in elastic#139081. The previous mechanism assumed that we'd want *serverless* to behave differently, and so it defined a default implementation and allowed for that to be swapped out via another plugin. But we actually want *stateless* to behave differently, so we can take the simpler approach of doing everything inside the `reindex` plugin and just pick the correct implementation based on settings. The implementation added previously has been renamed, because it is now explicitly used in stateful rather than just being used by default.
This adds an implmentation of the helper which picks a node to relocate a reindex task to on shutdown to be used in stateless. This changes the mechanism for choosing the correct implementation that was added in elastic#139081. The previous mechanism assumed that we'd want *serverless* to behave differently, and so it defined a default implementation and allowed for that to be swapped out via another plugin. But we actually want *stateless* to behave differently, so we can take the simpler approach of doing everything inside the `reindex` plugin and just pick the correct implementation based on settings. The implementation added previously has been renamed, because it is now explicitly used in stateful rather than just being used by default.
This adds an implmentation of the helper which picks a node to relocate a reindex task to on shutdown to be used in stateless. This changes the mechanism for choosing the correct implementation that was added in #139081. The previous mechanism assumed that we'd want *serverless* to behave differently, and so it defined a default implementation and allowed for that to be swapped out via another plugin. But we actually want *stateless* to behave differently, so we can take the simpler approach of doing everything inside the `reindex` plugin and just pick the correct implementation based on settings. The implementation added previously has been renamed, because it is now explicitly used in stateful rather than just being used by default. Co-authored-by: Szymon Bialkowski <szybia@tuta.io> Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This adds an implmentation of the helper which picks a node to relocate a reindex task to on shutdown to be used in stateless. This changes the mechanism for choosing the correct implementation that was added in elastic#139081. The previous mechanism assumed that we'd want *serverless* to behave differently, and so it defined a default implementation and allowed for that to be swapped out via another plugin. But we actually want *stateless* to behave differently, so we can take the simpler approach of doing everything inside the `reindex` plugin and just pick the correct implementation based on settings. The implementation added previously has been renamed, because it is now explicitly used in stateful rather than just being used by default. Co-authored-by: Szymon Bialkowski <szybia@tuta.io> Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
This adds a helper class which can be used to pick a node to relocate a reindex task to on shutdown.
The plugin exposes an implementation of the
ReindexRelocationNodePickerinterface as a component, so that in a future change it can be injected into the transport action, and from there passed into theReindexer, and from there used to actually relocate the task on shutdown.By default, it uses the
DefaultReindexRelocationNodePicker. Another plugin may override this by extending thereindexplugin and making the implementation available via SPI.Closes elastic/elasticsearch-team#2092