Unpromotables skip replication and peer recovery #93210

Merged

kingherc merged 25 commits into elastic:main from kingherc:feature/promotable-skip-recovery-replication on Jan 31, 2023

Conversation

@kingherc (Contributor)

For skipping replication:

  • ReplicationTracker and ReplicationGroup filter for shard copies that are promotable to primary (see the sketch below)
  • Unpromotable shards are removed from the in-sync allocations in metadata

Fixes ES-4861

For skipping peer recovery:

  • Unpromotable shards pass directly to STARTED, skipping intermediate peer recovery stages and messages

Fixes ES-5257
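
A minimal sketch of the replication-side filtering, assuming illustrative names (Copy stands in for org.elasticsearch.cluster.routing.ShardRouting; this is not the actual ReplicationTracker/ReplicationGroup code):

    import java.util.List;

    // Illustrative stand-in for ShardRouting.
    record Copy(String allocationId, boolean isPromotableToPrimary) {}

    class ReplicationGroupSketch {
        private final List<Copy> assignedCopies;

        ReplicationGroupSketch(List<Copy> assignedCopies) {
            this.assignedCopies = assignedCopies;
        }

        // Only copies that can be promoted to primary receive replicated
        // operations; unpromotable copies are skipped entirely.
        List<Copy> replicationTargets() {
            return assignedCopies.stream()
                .filter(Copy::isPromotableToPrimary)
                .toList();
        }
    }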

@kingherc kingherc added the >enhancement, :Distributed/Allocation, Team:Distributed, and v8.7.0 labels on Jan 24, 2023
@kingherc kingherc self-assigned this Jan 24, 2023
@kingherc kingherc requested a review from DaveCTurner January 24, 2023 16:53
@elasticsearchmachine (Collaborator)

Hi @kingherc, I've created a changelog YAML for you.

@kingherc (Contributor, Author)

Oh BTW, the source branch should read feature/unpromotable-skip-recovery-replication but I'll leave it as is.

@DaveCTurner DaveCTurner (Contributor) left a comment

Looks great, pretty much how I hoped. I left some comments, but nothing structural. I suspect this breaks the refresh API though; ideally we'd address that here (or beforehand) too.

@kingherc kingherc requested review from pxsalehi and removed request for original-brownbear January 25, 2023 13:21
@kingherc (Contributor, Author)

I have not yet handled the feedback, apart from the comment about Refresh. I just pushed a commit that re-enables Refresh for unpromotable shards. Feel free to review that as well if you'd like.

I will continue to handle the remaining feedback, and then invite you for a final review.

@DaveCTurner DaveCTurner (Contributor) left a comment

I left a handful of comments, mostly tiny stuff.

@pxsalehi pxsalehi (Member) left a comment

Thanks Iraklis. Just left some minor comments/questions.

@pxsalehi (Member) commented Jan 27, 2023

Regarding adding a new action: you'd need to add the new action to NON_OPERATOR_ACTIONS or OperatorOnlyRegistry. I think probably the former, since this is just an internal action.

@kingherc (Contributor, Author)

Regarding adding a new action: you'd need to add the new action to NON_OPERATOR_ACTIONS or OperatorOnlyRegistry. I think probably the former, since this is just an internal action.

Oh, thanks! I was totally unaware of this. I do see indices:admin/refresh and indices:admin/refresh[s] there, but no mention of [r]. I wonder if this is for client-driven actions rather than internally-generated ones (like the refresh [r] and [u] ones)?

@DaveCTurner might you know if I should add both [r] and [u] above?

@DaveCTurner (Contributor)

I believe that registering these actions is only necessary for actions that are exposed to a Client, which I think means those with a corresponding ActionType registered in ActionModule#setupActions. These subsidiary actions are not registered there; it doesn't make sense for a client to call them directly, so they don't need to be classified as operator-only or not.
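
As a toy model of the rule described here (plain Java, not Elasticsearch code; all names are made up for illustration), the operator-privileges check only ranges over client-exposed actions:

    import java.util.Set;

    class OperatorClassificationSketch {
        // Every client-exposed action must be classified as operator-only
        // or non-operator; internal subsidiary actions (such as the
        // refresh [r] and [u] variants) are not client-exposed and so
        // fall outside the check.
        static void assertAllClassified(Set<String> clientExposed, Set<String> operatorOnly, Set<String> nonOperator) {
            for (String action : clientExposed) {
                if (operatorOnly.contains(action) == false && nonOperator.contains(action) == false) {
                    throw new AssertionError("unclassified client-exposed action: " + action);
                }
            }
        }
    }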

@DaveCTurner (Contributor)

In fact, I think registering these transport actions is not just unnecessary, it's actively forbidden:

final Set<String> redundant = Sets.difference(labelledActions, allActions);
assertTrue(
    "Actions may no longer be valid: ["
        + redundant
        + "]. They should be removed from either the operator-only action registry in ["
        + OperatorOnlyRegistry.class.getName()
        + "] or the non-operator action list in ["
        + Constants.class.getName()
        + "]",
    redundant.isEmpty()
);

@pxsalehi (Member) commented Jan 27, 2023

In fact, I think registering these transport actions is not just unnecessary, it's actively forbidden:

I'm not sure I understand this. But I think that if the new internal action is not in NON_OPERATOR_ACTIONS, testEveryActionIsEitherOperatorOnlyOrNonOperator fails before it reaches the part you quoted.

@DaveCTurner (Contributor)

Probably simplest to see what happens if you try it:

./gradlew ':x-pack:plugin:security:qa:operator-privileges-tests:javaRestTest' --tests "org.elasticsearch.xpack.security.operator.OperatorPrivilegesIT.testEveryActionIsEitherOperatorOnlyOrNonOperator"

@kingherc kingherc (Contributor, Author) left a comment

Hi all, I handled your comments and this PR is now ready for review. Please feel free to review it, including the previous unresolved conversations.

shardRoutings.addAll(routingTable.assignedShards()); // include relocation targets

// include relocation targets
shardRoutings.addAll(routingTable.assignedShards().stream().filter(ShardRouting::isPromotableToPrimary).toList());
@kingherc (Contributor, Author):

let's just pull out the relocation targets in that loop instead of adding all the shards again.

I now realize that iterating the shards above already includes the relocation targets, right? I am not even sure why the old code added the assigned shards again for relocation targets.

So we do not need this line at all in the end; I will simply remove it. Please tell me if I'm assuming wrong here.
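
A toy model of the assumption in this comment (hypothetical types; the real IndexShardRoutingTable semantics should be verified against the source): if iteration already yields relocation targets, the extra addAll only duplicates entries:

    import java.util.ArrayList;
    import java.util.List;

    class RoutingTableSketch {
        List<String> allCopies() {
            return List.of("primary", "replica", "relocation-target");
        }

        List<String> assignedShards() {
            return allCopies(); // already includes the relocation target
        }

        public static void main(String[] args) {
            RoutingTableSketch table = new RoutingTableSketch();
            List<String> shardRoutings = new ArrayList<>(table.allCopies());
            shardRoutings.addAll(table.assignedShards()); // the redundant line
            System.out.println(shardRoutings); // every copy now appears twice
        }
    }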

@kingherc kingherc marked this pull request as ready for review January 27, 2023 17:48
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner (Contributor) left a comment

Looks great, there's just one more comment (an easy trap to fall into) and a few more nits.

@kingherc (Contributor, Author)

Hi all, I handled your remaining feedback. Feel free to check the latest commits. It'd be great if you could review today if possible! Thanks.

@kingherc kingherc requested review from DaveCTurner and tlrx January 31, 2023 07:54
@DaveCTurner DaveCTurner (Contributor) left a comment

LGTM

@kingherc kingherc merged commit cb966ef into elastic:main Jan 31, 2023
@kingherc kingherc deleted the feature/promotable-skip-recovery-replication branch January 31, 2023 09:31
mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this pull request Jan 31, 2023
For skipping replication:
* ReplicationTracker and Group filter shards that are promotable to primary
* Remove unpromotable shards from in sync allocations in metadata
* There is a new Refresh action for unpromotable replica shards

Fixes ES-4861

For skipping peer recovery:
* Unpromotable shards pass directly to STARTED skipping some intermediate peer recovery stages and messages

Fixes ES-5257
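
A hedged sketch of the peer-recovery short-circuit described in the commit message above; the stage names mirror RecoveryState.Stage, but the enum and class below are illustrative, not the actual recovery code path:

    enum Stage { INIT, INDEX, VERIFY_INDEX, TRANSLOG, FINALIZE, DONE }

    class RecoveryShortCircuitSketch {
        // Per the description above, unpromotable shards pass directly to
        // STARTED: they begin at DONE instead of walking INIT -> ... -> DONE.
        static Stage initialStage(boolean promotableToPrimary) {
            return promotableToPrimary ? Stage.INIT : Stage.DONE;
        }
    }
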
.stream()
.filter(Predicate.not(ShardRouting::isPromotableToPrimary))
.map(ShardRouting::currentNodeId)
.collect(Collectors.toUnmodifiableSet())
Contributor:

Do we need this intermediate collection? I think it could be replaced with distinct().

Contributor:

I thought we dropped this. We don't even need distinct(); we're looking at node IDs for the copies of a single shard, which are necessarily distinct anyway.

@kingherc (Contributor, Author):

I think we dropped it from another place, but not here. I can handle this in another PR.
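
For reference, a sketch of the simplification discussed in this thread, assuming Routing as a stand-in for ShardRouting: the stream is consumed directly, with no intermediate set and no distinct():

    import java.util.List;
    import java.util.function.Predicate;

    // Routing is an illustrative stand-in for ShardRouting.
    record Routing(String currentNodeId, boolean isPromotableToPrimary) {}

    class UnpromotableNodesSketch {
        // No intermediate Set and no distinct(): each copy of a single
        // shard lives on a different node, so the IDs are already unique.
        static List<String> unpromotableNodeIds(List<Routing> copiesOfOneShard) {
            return copiesOfOneShard.stream()
                .filter(Predicate.not(Routing::isPromotableToPrimary))
                .map(Routing::currentNodeId)
                .toList();
        }
    }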

breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026
This RecoveryPlannerService was a workaround to allow search shards to bootstrap from peers; it is now unused since peer recovery is fully skipped for search shards (elastic#93210).

Relates
breskeby pushed a commit to breskeby/elasticsearch that referenced this pull request Feb 11, 2026
…erations (elastic#149)

Since elastic#93210 no operations should be replicated to search shards. I think we can make stronger assertions in the SearchEngine if an index/delete/noop operation is executed.
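
A minimal sketch of the stronger assertion suggested there; SearchEngine internals are assumed rather than quoted, and the method shape is illustrative:

    class SearchEngineSketch {
        // Since elastic#93210, no index/delete/noop operation should reach
        // a search shard, so the engine can fail fast if one ever does.
        void index(Object operation) {
            assert false : "search shards must not receive replicated operations (elastic#93210)";
            throw new UnsupportedOperationException("indexing is not supported on search shards");
        }
    }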