Return NOT_PREFERRED decisions in allocation explain #137228
elasticsearchmachine merged 36 commits into elastic:main from
Adds numerous NOT_PREFERRED options to allocation decision / status types. Adds NOT_PREFERRED option to AllocationDecision (resolving ES-12729). Closes ES-12833, ES-13288, ES-12729
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
Adding @DaveCTurner as a reviewer in case he wants to take a high-level look for anything I'm missing; I saw in git-blame that he wrote the Explanations.java file a while ago. But it's optional.
I think I might be missing test coverage (if not functionality) of the allocate-unassigned code path. However, I've been hacking on the rebalancing code path for a while now to get this all to work sensibly, so, if that's agreeable, I might just file another ticket to explore that.
.../java/org/elasticsearch/cluster/routing/allocation/decider/WriteLoadConstraintDeciderIT.java
.../java/org/elasticsearch/cluster/routing/allocation/decider/WriteLoadConstraintDeciderIT.java
.../java/org/elasticsearch/action/admin/cluster/allocation/ClusterAllocationExplainRequest.java
}
if (allocationDecision.type().higherThan(bestDecision)) {
    assert allocationDecision.type() != Type.THROTTLE
        : "DesiredBalance computations run in a simulation mode and should not encounter throttling";
Sorry, how do we know we're simulating here?
I added this as extra protection because I changed the Decision.Type enum ordering, such that the higherThan gate above would now prefer (return true for) THROTTLE over NOT_PREFERRED.
Because this is simulation, we should never see THROTTLE.
I updated this. It tripped an explain test, because explain also uses the decideMove path.
I moved the THROTTLE check higher up, next to the canAllocate call, and added an || !isSimulating() escape so that it only applies during simulation.
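A minimal sketch of the placement described above, assuming hypothetical stand-in types (DecisionType, Allocation, Decider) rather than the real Elasticsearch classes: the decision is computed once next to the canAllocate call, and the no-THROTTLE assertion is relaxed so that it only bites while simulating. The non-simulating explain path can then legitimately observe THROTTLE.

```java
class ThrottleAssertionSketch {
    // Hypothetical stand-in for Decision.Type, declared from worst to best.
    enum DecisionType { NO, NOT_PREFERRED, THROTTLE, YES }

    // Hypothetical stand-in for the allocation context; only the simulation flag matters here.
    record Allocation(boolean isSimulating) {}

    // Hypothetical stand-in for a decider's canAllocate check.
    interface Decider {
        DecisionType canAllocate(String node, Allocation allocation);
    }

    static DecisionType decideForNode(Decider decider, String node, Allocation allocation) {
        // Evaluate the decider once and check the stored result.
        DecisionType decision = decider.canAllocate(node, allocation);
        // Simulation (DesiredBalance computation) should never throttle; outside simulation,
        // e.g. on the explain path, THROTTLE is tolerated.
        assert decision != DecisionType.THROTTLE || allocation.isSimulating() == false
            : "simulation should not encounter throttling on " + node;
        return decision;
    }
}
```

Placing the assertion next to the decider call, rather than inside the best-decision comparison, keeps it focused on what the decider returned rather than on how decisions are ranked afterwards.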
assert decider.canRebalance(shardRouting, allocation).type() != Decision.Type.THROTTLE
    : decider.getClass().getSimpleName() + " throttled unexpectedly in canRebalance";
return decider.canRebalance(shardRouting, allocation);
}, (decider, decision) -> Strings.format("Can not rebalance [%s]. [%s]: %s", shardRouting, decider, decision));
ConcurrentRebalanceAllocationDecider appears to return THROTTLE from canRebalance?
Nit: I'd prefer that we store the result of canRebalance and then run the assertion on it, rather than calling canRebalance twice.
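The nit (store the result, then assert on it) can be sketched with hypothetical stand-in types. CountingDecider exists only to show that canRebalance is evaluated exactly once. The THROTTLE assertion itself was reverted in this thread, since ConcurrentRebalanceAllocationDecider can throttle in canRebalance, so treat the invariant below as illustrative only.

```java
class RebalanceOnceSketch {
    // Hypothetical stand-in for Decision.Type (not the real class).
    enum DecisionType { NO, NOT_PREFERRED, THROTTLE, YES }

    // Hypothetical decider that records how many times canRebalance is invoked.
    static final class CountingDecider {
        int calls = 0;
        private final DecisionType result;

        CountingDecider(DecisionType result) {
            this.result = result;
        }

        DecisionType canRebalance(String shard) {
            calls++;
            return result;
        }
    }

    static DecisionType checkRebalance(CountingDecider decider, String shard) {
        // Call the decider once; the assertion and the return value share this result.
        DecisionType decision = decider.canRebalance(shard);
        // Illustrative invariant only: the thread found that some deciders do throttle here.
        assert decision != DecisionType.THROTTLE : "throttled unexpectedly in canRebalance for " + shard;
        return decision;
    }
}
```

Besides avoiding duplicate work, a single call guarantees that the value being asserted is the value being returned, even if a decider's answer could change between calls.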
You're right, reverted 👍
NOT_PREFERRED,
// Temporarily throttled is a better choice than choosing a not-preferred node,
// but NOT_PREFERRED and THROTTLED are generally not comparable.
THROTTLE,
I don't think we can do this, can we? It would mean that, in the presence of a THROTTLE and a NOT_PREFERRED, canAllocate returns NOT_PREFERRED, which I think is bad for non-desired-balance allocation, because there we should wait for a THROTTLE to become a YES before allocating to a NOT_PREFERRED node?
I don't believe there's currently any direct comparison between THROTTLE and NOT_PREFERRED. But we still need to specify an ordering here, so as not to break how each of them compares to YES and NO.
It'll mean in the presence of a THROTTLE and a NOT_PREFERRED, canAllocate will return NOT_PREFERRED
I think it would be the other way around: THROTTLE would be the higher value, closer to YES, and would be chosen over NOT_PREFERRED. So, IIUC, we both want it the current way.
I have a little blurb for the commit message:
"A significant change is to re-order comparison of Decision.Type enum
values, such that THROTTLE is chosen over NOT_PREFERRED. Functionally
this change should not matter because simulation (DesiredBalance
computation) does not throttle and reconciliation (real shard movement)
treats not-preferred essentially as a YES: they are not compared."
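The re-ordering described in the blurb can be sketched as follows. This assumes a simplified DecisionType enum whose declaration order runs from worst to best, with a higherThan comparison based on that order; the real Decision.Type may implement higherThan differently. Picking the best node with higherThan then prefers a THROTTLE candidate over a NOT_PREFERRED one, and settles for NOT_PREFERRED only when nothing better exists.

```java
import java.util.Map;

class DecisionOrderingSketch {
    // Hypothetical stand-in for Decision.Type with the re-ordered values:
    // declaration order runs from worst (NO) to best (YES).
    enum DecisionType {
        NO,
        NOT_PREFERRED,
        // Temporarily throttled is treated as better than a not-preferred node.
        THROTTLE,
        YES;

        // Assumption: "higher" means later in declaration order.
        boolean higherThan(DecisionType other) {
            return this.ordinal() > other.ordinal();
        }
    }

    // Returns the node with the best decision, keeping the first node seen on ties;
    // returns null if every node said NO.
    static String bestNode(Map<String, DecisionType> decisionsByNode) {
        String bestNode = null;
        DecisionType bestDecision = DecisionType.NO;
        for (Map.Entry<String, DecisionType> entry : decisionsByNode.entrySet()) {
            if (entry.getValue().higherThan(bestDecision)) {
                bestDecision = entry.getValue();
                bestNode = entry.getKey();
            }
        }
        return bestNode;
    }
}
```

For example, with node-a: NOT_PREFERRED, node-b: THROTTLE and node-c: NO in a LinkedHashMap, bestNode returns node-b; drop node-b and it falls back to node-a.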
Ah yeah, I think it all makes sense now.
Though in the reconciler, allocateUnassigned still doesn't handle NOT_PREFERRED; there's a ticket to fix that, and I don't think it's important.
...r/src/main/java/org/elasticsearch/cluster/routing/allocation/AllocateUnassignedDecision.java
server/src/main/java/org/elasticsearch/cluster/routing/allocation/Explanations.java
}
// TODO (ES-13482): clusterRebalanceDecision is set to the result of AllocationDecider#canRebalance, which does not return
// NOT_PREFERRED or THROTTLE. This switch statement, and how MoveDecision uses clusterRebalanceDecision, should be
// refactored.
I think it still does? See org/elasticsearch/cluster/routing/allocation/decider/ConcurrentRebalanceAllocationDecider.java:182
Looks like Simon committed some changes concurrently with my PR.
But I think I was wrong, anyway: I didn't realize the complexities of canRebalance.
Removed 👍 Thanks!
DiannaHohensee left a comment
Made a logic change / bug fix, per my comment. Your assertion suggestion got me rethinking.
server/src/main/java/org/elasticsearch/cluster/routing/allocation/MoveDecision.java
} else if (bestDecision == Type.NOT_PREFERRED) {
    assert remainDecision.type() != Type.NOT_PREFERRED;
    assert remainDecision.type() != Type.NOT_PREFERRED || allocation.isSimulating() == false;
    // If we don't ever find a YES decision, we'll settle for NOT_PREFERRED as preferable to NO.
// If we don't ever find a YES decision, we'll settle for NOT_PREFERRED as preferable to NO.
// If we don't ever find a YES/THROTTLE decision, we'll settle for NOT_PREFERRED as preferable to NO.
}
} else if (bestDecision == Type.NOT_PREFERRED) {
    assert remainDecision.type() != Type.NOT_PREFERRED;
    assert remainDecision.type() != Type.NOT_PREFERRED || allocation.isSimulating() == false;
This assertion is a bit confusing:
assert remainDecision.type() != Type.NOT_PREFERRED || allocation.isSimulating() == false;
Do we need that isSimulating() == false clause? It looks like we should never get here when
allocationDecision.type() == Type.NOT_PREFERRED && remainDecision.type() == Type.NOT_PREFERRED
Or have I missed something?
You're right, that scenario is short-circuited on L1049.
I forgot, and got a bit too !isSimulating-happy after adding the THROTTLE handling below for the explain path.
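A simplified sketch of why the extra clause is redundant, assuming hypothetical DecisionType and Branch types (this is not the real MoveDecision code): once the NOT_PREFERRED/NOT_PREFERRED pair is handled by an earlier branch, the later bestDecision == NOT_PREFERRED branch can assert remainDecision != NOT_PREFERRED unconditionally.

```java
class ShortCircuitSketch {
    // Hypothetical stand-in for Decision.Type.
    enum DecisionType { NO, NOT_PREFERRED, THROTTLE, YES }

    // Hypothetical labels recording which branch of the sketch handled the inputs.
    enum Branch { SHORT_CIRCUIT, NOT_PREFERRED_FALLBACK, OTHER }

    static Branch classify(DecisionType remainDecision, DecisionType bestDecision) {
        // Handled first: both the current node and the best target are NOT_PREFERRED.
        if (remainDecision == DecisionType.NOT_PREFERRED && bestDecision == DecisionType.NOT_PREFERRED) {
            return Branch.SHORT_CIRCUIT;
        }
        if (bestDecision == DecisionType.NOT_PREFERRED) {
            // The pair above has already returned, so this holds regardless of simulation;
            // an extra "|| isSimulating() == false" clause would add nothing.
            assert remainDecision != DecisionType.NOT_PREFERRED;
            return Branch.NOT_PREFERRED_FALLBACK;
        }
        return Branch.OTHER;
    }
}
```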
nicktindall left a comment
Changes LGTM; a couple of questions.