Unpromotables skip replication and peer recovery #93210

Merged

kingherc merged 25 commits into elastic:main from kingherc:feature/promotable-skip-recovery-replication on Jan 31, 2023

Conversation

@kingherc (Contributor)

For skipping replication:

  • ReplicationTracker and ReplicationGroup filter for shard copies that are promotable to primary (see the sketch below)
  • Unpromotable shards are removed from the in-sync allocations in metadata

Fixes ES-4861

For skipping peer recovery:

  • Unpromotable shards pass directly to STARTED, skipping intermediate peer recovery stages and messages

Fixes ES-5257
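
A minimal sketch of the replication-side filtering, assuming illustrative names (Copy stands in for org.elasticsearch.cluster.routing.ShardRouting; this is not the actual ReplicationTracker/ReplicationGroup code):

    import java.util.List;

    // Illustrative stand-in for ShardRouting.
    record Copy(String allocationId, boolean isPromotableToPrimary) {}

    class ReplicationGroupSketch {
        private final List<Copy> assignedCopies;

        ReplicationGroupSketch(List<Copy> assignedCopies) {
            this.assignedCopies = assignedCopies;
        }

        // Only copies that can be promoted to primary receive replicated
        // operations; unpromotable copies are skipped entirely.
        List<Copy> replicationTargets() {
            return assignedCopies.stream()
                .filter(Copy::isPromotableToPrimary)
                .toList();
        }
    }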

@kingherc kingherc added the >enhancement, :Distributed/Allocation, Team:Distributed, and v8.7.0 labels on Jan 24, 2023
@kingherc kingherc self-assigned this Jan 24, 2023
@kingherc kingherc requested a review from DaveCTurner January 24, 2023 16:53
@elasticsearchmachine (Collaborator)

Hi @kingherc, I've created a changelog YAML for you.

@kingherc (Contributor, Author)

Oh BTW, the source branch should read feature/unpromotable-skip-recovery-replication but I'll leave it as is.

@DaveCTurner DaveCTurner (Contributor) left a comment

Looks great, pretty much how I hoped. I left some comments, but nothing structural. I suspect this breaks the refresh API though; ideally we'd address that here (or beforehand) too.

@kingherc kingherc requested review from pxsalehi and removed request for original-brownbear January 25, 2023 13:21
@kingherc (Contributor, Author)

I have not yet handled the feedback, apart from the comment about Refresh. I just pushed a commit that re-enables Refresh for unpromotable shards. Feel free to review that as well if you'd like.

I will continue to handle the remaining feedback, and then invite you for a final review.

@DaveCTurner DaveCTurner (Contributor) left a comment

I left a handful of comments, mostly tiny stuff.

@pxsalehi pxsalehi (Member) left a comment

Thanks Iraklis. Just left some minor comments/questions.

@pxsalehi (Member) commented Jan 27, 2023

Regarding adding a new action: you'd need to add the new action to NON_OPERATOR_ACTIONS or OperatorOnlyRegistry. I think probably the former, since this is just an internal action.

@kingherc (Contributor, Author)

Regarding adding a new action: you'd need to add the new action to NON_OPERATOR_ACTIONS or OperatorOnlyRegistry. I think probably the former, since this is just an internal action.

Oh, thanks! I was totally unaware of this. I do see indices:admin/refresh and indices:admin/refresh[s] there, but no mention of [r]. I wonder if this is for client-driven actions rather than internally-generated ones (like the refresh [r] and [u] ones)?

@DaveCTurner might you know if I should add both [r] and [u] above?

@DaveCTurner (Contributor)

I believe that registering these actions is only necessary for actions that are exposed to a Client, which I think means those with a corresponding ActionType registered in ActionModule#setupActions. These subsidiary actions are not registered there; it doesn't make sense for a client to call them directly, so they don't need to be classified as operator-only or not.
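
As a toy model of the rule described here (plain Java, not Elasticsearch code; all names are made up for illustration), the operator-privileges check only ranges over client-exposed actions:

    import java.util.Set;

    class OperatorClassificationSketch {
        // Every client-exposed action must be classified as operator-only
        // or non-operator; internal subsidiary actions (such as the
        // refresh [r] and [u] variants) are not client-exposed and so
        // fall outside the check.
        static void assertAllClassified(Set<String> clientExposed, Set<String> operatorOnly, Set<String> nonOperator) {
            for (String action : clientExposed) {
                if (operatorOnly.contains(action) == false && nonOperator.contains(action) == false) {
                    throw new AssertionError("unclassified client-exposed action: " + action);
                }
            }
        }
    }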

@DaveCTurner (Contributor)

In fact, I think registering these transport actions is not just unnecessary, it's actively forbidden:

final Set<String> redundant = Sets.difference(labelledActions, allActions);
assertTrue(
    "Actions may no longer be valid: ["
        + redundant
        + "]. They should be removed from either the operator-only action registry in ["
        + OperatorOnlyRegistry.class.getName()
        + "] or the non-operator action list in ["
        + Constants.class.getName()
        + "]",
    redundant.isEmpty()
);

@pxsalehi (Member) commented Jan 27, 2023

In fact, I think registering these transport actions is not just unnecessary, it's actively forbidden:

I'm not sure I understand this. But I think that if the new internal action is not in NON_OPERATOR_ACTIONS, testEveryActionIsEitherOperatorOnlyOrNonOperator fails before it reaches the part you quoted.

@DaveCTurner (Contributor)

Probably simplest to see what happens if you try it:

./gradlew ':x-pack:plugin:security:qa:operator-privileges-tests:javaRestTest' --tests "org.elasticsearch.xpack.security.operator.OperatorPrivilegesIT.testEveryActionIsEitherOperatorOnlyOrNonOperator"

@kingherc kingherc (Contributor, Author) left a comment

Hi all, I handled your comments and this PR is now ready for review. Please feel free to review it, including the previous unresolved conversations.

shardRoutings.addAll(routingTable.assignedShards()); // include relocation targets

// include relocation targets
shardRoutings.addAll(routingTable.assignedShards().stream().filter(ShardRouting::isPromotableToPrimary).toList());
@kingherc (Contributor, Author):

let's just pull out the relocation targets in that loop instead of adding all the shards again.

I now realize that iterating the shards above already includes the relocation targets, right? I am not even sure why the old code added the assigned shards again for relocation targets.

So we do not need this line at all in the end; I will simply remove it. Please tell me if I'm assuming wrong here.
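
A toy model of the assumption in this comment (hypothetical types; the real IndexShardRoutingTable semantics should be verified against the source): if iteration already yields relocation targets, the extra addAll only duplicates entries:

    import java.util.ArrayList;
    import java.util.List;

    class RoutingTableSketch {
        List<String> allCopies() {
            return List.of("primary", "replica", "relocation-target");
        }

        List<String> assignedShards() {
            return allCopies(); // already includes the relocation target
        }

        public static void main(String[] args) {
            RoutingTableSketch table = new RoutingTableSketch();
            List<String> shardRoutings = new ArrayList<>(table.allCopies());
            shardRoutings.addAll(table.assignedShards()); // the redundant line
            System.out.println(shardRoutings); // every copy now appears twice
        }
    }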

@kingherc kingherc marked this pull request as ready for review January 27, 2023 17:48
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner (Contributor) left a comment

Looks great, there's just one more comment (an easy trap to fall into) and a few more nits.

@kingherc (Contributor, Author)

Hi all, I handled your remaining feedback. Feel free to check the latest commits. It'd be great if you could review today if possible! Thanks.

@kingherc kingherc requested review from DaveCTurner and tlrx January 31, 2023 07:54
@DaveCTurner DaveCTurner (Contributor) left a comment

LGTM

@kingherc kingherc merged commit cb966ef into elastic:main Jan 31, 2023
@kingherc kingherc deleted the feature/promotable-skip-recovery-replication branch January 31, 2023 09:31
mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this pull request Jan 31, 2023
For skipping replication:
* ReplicationTracker and Group filter shards that are promotable to primary
* Remove unpromotable shards from in sync allocations in metadata
* There is a new Refresh action for unpromotable replica shards

Fixes ES-4861

For skipping peer recovery:
* Unpromotable shards pass directly to STARTED skipping some intermediate peer recovery stages and messages

Fixes ES-5257
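
A hedged sketch of the peer-recovery short-circuit described in the commit message above; the stage names mirror RecoveryState.Stage, but the enum and class below are illustrative, not the actual recovery code path:

    enum Stage { INIT, INDEX, VERIFY_INDEX, TRANSLOG, FINALIZE, DONE }

    class RecoveryShortCircuitSketch {
        // Per the description above, unpromotable shards pass directly to
        // STARTED: they begin at DONE instead of walking INIT -> ... -> DONE.
        static Stage initialStage(boolean promotableToPrimary) {
            return promotableToPrimary ? Stage.INIT : Stage.DONE;
        }
    }
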
.stream()
.filter(Predicate.not(ShardRouting::isPromotableToPrimary))
.map(ShardRouting::currentNodeId)
.collect(Collectors.toUnmodifiableSet())
Contributor:

Do we need this intermediate collection? I think it could be replaced with distinct().

Contributor:

I thought we dropped this. We don't even need distinct(); we're looking at node IDs for the copies of a single shard, which are necessarily distinct anyway.

@kingherc (Contributor, Author):

I think we dropped it from another place, but not here. I can handle this in another PR.
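
For reference, a sketch of the simplification discussed in this thread, assuming Routing as a stand-in for ShardRouting: the stream is consumed directly, with no intermediate set and no distinct():

    import java.util.List;
    import java.util.function.Predicate;

    // Routing is an illustrative stand-in for ShardRouting.
    record Routing(String currentNodeId, boolean isPromotableToPrimary) {}

    class UnpromotableNodesSketch {
        // No intermediate Set and no distinct(): each copy of a single
        // shard lives on a different node, so the IDs are already unique.
        static List<String> unpromotableNodeIds(List<Routing> copiesOfOneShard) {
            return copiesOfOneShard.stream()
                .filter(Predicate.not(Routing::isPromotableToPrimary))
                .map(Routing::currentNodeId)
                .toList();
        }
    }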

breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026
This RecoveryPlannerService was a workaround to allow search shards to bootstrap from peers; it is now unused since peer recovery is fully skipped for search shards (elastic#93210).

Relates
breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026
…erations (elastic#149)

Since elastic#93210 no operations should be replicated to search shards. I think we can make stronger assertions in the SearchEngine if an index/delete/noop operation is executed.
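
A minimal sketch of the stronger assertion suggested there; SearchEngine internals are assumed rather than quoted, and the method shape is illustrative:

    class SearchEngineSketch {
        // Since elastic#93210, no index/delete/noop operation should reach
        // a search shard, so the engine can fail fast if one ever does.
        void index(Object operation) {
            assert false : "search shards must not receive replicated operations (elastic#93210)";
            throw new UnsupportedOperationException("indexing is not supported on search shards");
        }
    }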