Allow relocation to NOT_PREFERRED node for evacuating shards#140197
Allow relocation to NOT_PREFERRED node for evacuating shards#140197nicktindall merged 8 commits intoelastic:mainfrom
Conversation
DiannaHohensee
left a comment
There was a problem hiding this comment.
didn't look at test, but I think we should fix the other call-site?
| final var shardRouting = storedShardMovement.shardRouting(); | ||
| final var index = projectIndex(shardRouting); | ||
| final var moveDecision = refreshDecisionIfRequired(index, storedShardMovement, shardMoved); | ||
| if (moveDecision.isDecisionTaken() && moveDecision.cannotRemainAndCanMove()) { |
There was a problem hiding this comment.
We've got to change this, too, I believe?
refreshDecisionIfRequired seems to wind down to the same decideMove method, and even if not-preferred didn't happen now, it could easily happen in future.
|
|
||
| if (moveDecision.isDecisionTaken() && moveDecision.cannotRemainAndCanMove()) { | ||
| if (moveDecision.isDecisionTaken() | ||
| && (moveDecision.cannotRemainAndCanMove() || moveDecision.cannotRemainAndNotPreferredMove())) { |
There was a problem hiding this comment.
cannotRemainAndCanMove caught my eye a while ago, and I wanted to make it something like cannotRemainAndCanMoveYes. However, now, I wonder if we should just keep cannotRemainAndCanMove and implement that single method as
cannotRemain() && (canMoveDecision == AllocationDecision.YES || canMoveDecision == AllocationDecision.NOT_PREFERRED); ?
Both production callers of cannotRemainAndCanMove now need to check both, I think.
There was a problem hiding this comment.
Yeah there will need to be some consideration of how to do this because it's used in other places too. I'm not sure what the tidiest approach is
There was a problem hiding this comment.
These two callers in BalancedShardsAllocator, MoveDecision printing. So it's pretty contained.
My preference would be to combine both under cannotRemainAndCanMove, or rename cannotRemainAndCanMove to include '-Yes-' someplace -- so it's too obvious to make this mistake again. But not a commit blocker to me if you strongly prefer something else.
There was a problem hiding this comment.
Yep good call I think it makes more sense that way too. Changed in cb607f1
DiannaHohensee
left a comment
There was a problem hiding this comment.
testing looks fine with a couple superficial nits.
...down/src/internalClusterTest/java/org/elasticsearch/xpack/shutdown/ShutdownEvacuationIT.java
Outdated
Show resolved
Hide resolved
...down/src/internalClusterTest/java/org/elasticsearch/xpack/shutdown/ShutdownEvacuationIT.java
Outdated
Show resolved
Hide resolved
DiannaHohensee
left a comment
There was a problem hiding this comment.
Approving, straightforward feedback. Pls fix the commit title before merging :)
|
Hi @nicktindall, I've created a changelog YAML for you. |
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
ywangd
left a comment
There was a problem hiding this comment.
LGTM
Nit: Maybe the PR title can more directly say something like "allow relocation to NOT_PREFERRED node for evacuating shards"?
|
@elasticsearchmachine test this please |
💚 Backport successful
|
A change in #137228 meant we sometimes return
canMove=NOT_PREFERREDfromBalancedShardsAllocator#decideMove, but we still would only execute a move whencanMove=YES, this meant when we were unable to relocate shards from a shutting down node to a not-preferred allocation.This PR fixes the logic to execute the move when
canMoveisYESorNOT_PREFERRED