
Overall Decision for Deciders prioritizes THROTTLE #140237

Merged
nicktindall merged 12 commits into elastic:main from
DiannaHohensee:2026/01/06/return-original-type-ordering
Jan 7, 2026

Conversation

@DiannaHohensee
Contributor

Fixing a bug where AllocationDeciders could summarize
AllocationDecider responses as NOT_PREFERRED, which allows shard
movement, when an AllocationDecider responded THROTTLE.

Relates ES-13903

@DiannaHohensee DiannaHohensee self-assigned this Jan 6, 2026
@elasticsearchmachine elasticsearchmachine added v9.4.0 needs:triage Requires assignment of a team area label labels Jan 6, 2026
@DiannaHohensee DiannaHohensee added >bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination (obsolete) Meta label for Distributed Coordination team. Obsolete. Please do not use. and removed needs:triage Requires assignment of a team area label labels Jan 6, 2026
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Collaborator

Hi @DiannaHohensee, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Jan 6, 2026
@DiannaHohensee
Contributor Author

Henning pointed out that I've missed the Type#min comparison usage -- so I'll add a fix for that momentarily. I should add test coverage, too.

…ator/reconciler, as well as unit test the method directly.
);
}

public static class NotPreferredPlugin extends Plugin implements ClusterPlugin {
Contributor

Nit: seeing as this does throttle and not preferred, perhaps the name should be changed to reflect that?

if (newDecision.type().isWorseForTheSameNode(worstDecision.type())) {
worstDecision = newDecision;
if (worstDecision.type() == Decision.Type.NO) {
traceNoDecisions(decider, newDecision, logMessageCreator);
Contributor

In the interest of keeping the change to a minimum, can we leave the variable naming as-is? It just makes it easier to see what changed that way IMO. I don't think there is a significant difference in readability between "worst" and "mostNegative", or "decision" and "newDecision", to warrant the additional delta (thinking about people doing archaeology later).

Contributor

Done in 0434558

Contributor

And more in 3da6822

 public static Type min(Type a, Type b) {
-    return a.compareTo(b) < 0 ? a : b;
+    return a.isWorseForTheSameNode(b) ? a : b;
 }
Contributor

Perhaps we should get rid of min altogether. Both higherThan and min suggest there is a natural order, when clearly there isn't.

Contributor

Done in 0d30b5c

case YES -> {
yield true;
}
};
Contributor

Can we declare these orders like

sameNodeOrder = [NO, THROTTLE, NOT_PREFERRED, YES]

and use the indices to do the comparison?

Contributor

Attempted in 76c2b99

I understand this might be considered slightly harder to read, but it does follow the Comparable#compareTo convention so should be something Java people are familiar with. I think I prefer it to isWorseThan... and isBetterThan... but we could always add them as convenience methods if we think that makes it nicer?
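
The index-based comparison suggested above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `DecisionType` is a hypothetical stand-in for `Decision.Type`, `compareToBetweenNodes` is a made-up name, and the node-level order `[NO, NOT_PREFERRED, THROTTLE, YES]` is an assumption that this thread does not spell out. Only the between-decisions order `[NO, THROTTLE, NOT_PREFERRED, YES]` comes from the discussion.

```java
import java.util.List;

// Hypothetical stand-in for Decision.Type; not the Elasticsearch class.
enum DecisionType {
    NO, THROTTLE, NOT_PREFERRED, YES;

    // Order used when combining several deciders' answers for one node,
    // most negative first (taken from the review comment above).
    private static final List<DecisionType> BETWEEN_DECISIONS_ORDER =
        List.of(NO, THROTTLE, NOT_PREFERRED, YES);

    // ASSUMPTION: order used when ranking candidate nodes against each other.
    private static final List<DecisionType> BETWEEN_NODES_ORDER =
        List.of(NO, NOT_PREFERRED, THROTTLE, YES);

    // Follows the Comparable#compareTo convention: negative, zero, or positive.
    int compareToBetweenDecisions(DecisionType other) {
        return Integer.compare(BETWEEN_DECISIONS_ORDER.indexOf(this), BETWEEN_DECISIONS_ORDER.indexOf(other));
    }

    // Hypothetical name for the node-level comparison.
    int compareToBetweenNodes(DecisionType other) {
        return Integer.compare(BETWEEN_NODES_ORDER.indexOf(this), BETWEEN_NODES_ORDER.indexOf(other));
    }
}

class DecisionOrderDemo {
    public static void main(String[] args) {
        // NOT_PREFERRED sorts after THROTTLE when aggregating decisions for one node.
        System.out.println(DecisionType.NOT_PREFERRED.compareToBetweenDecisions(DecisionType.THROTTLE) > 0);
        // Under the assumed node-level order the relation is reversed.
        System.out.println(DecisionType.NOT_PREFERRED.compareToBetweenNodes(DecisionType.THROTTLE) < 0);
    }
}
```

Keeping each order as an explicit list makes it visible that the enum has no single natural ordering, which is the concern raised about `min` and `higherThan` earlier in the review.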

@nicktindall nicktindall requested a review from ywangd January 7, 2026 05:48
@nicktindall nicktindall added the auto-backport Automatically create backport pull requests when merged label Jan 7, 2026
Member

@ywangd ywangd left a comment

I have a question on the exact behaviour change of this PR compared to the existing code and the code before #137228. I may be missing something. Apologies if I am asking the obvious.

Comment on lines +1577 to +1578
final Decision.Type canAllocateOrRebalance = allocationDecision.type() == Type.THROTTLE
|| rebalanceDecision.type() == Type.THROTTLE ? Type.THROTTLE : Type.YES;
Member

Can we capture this as a method on Type? Or should this use compareToBetweenDecisions since it's technically still decision aggregation for a single node?

Contributor

I'd prefer not to use compareToBetweenDecisions because it sort-of implies that the order used by that method is significant. Here we are dealing only with THROTTLE and YES, so I think it's good to make it clear that it only cares about that.

I think we could add it to Type but perhaps as a subsequent PR to prevent holding things up any further.
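
For reference, a helper of the kind discussed here might look something like the sketch below. The class name, the method name `throttleIfEither`, and the local `Type` enum are hypothetical and exist only to make the example self-contained; the thread does not say what such a method would be called or where it would live.

```java
class ThrottleOrYesSketch {
    // Hypothetical stand-in for Decision.Type.
    enum Type { NO, THROTTLE, NOT_PREFERRED, YES }

    // Mirrors the expression under review: the combined result is THROTTLE
    // if either input is THROTTLE, otherwise YES. No other ordering between
    // decision types is implied or used.
    static Type throttleIfEither(Type allocationDecision, Type rebalanceDecision) {
        return allocationDecision == Type.THROTTLE || rebalanceDecision == Type.THROTTLE
            ? Type.THROTTLE
            : Type.YES;
    }

    public static void main(String[] args) {
        System.out.println(throttleIfEither(Type.YES, Type.THROTTLE));
        System.out.println(throttleIfEither(Type.YES, Type.YES));
    }
}
```

Because the method only ever checks for THROTTLE, it stays independent of both comparison orders, which matches the preference stated in the reply above.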

 for (AllocationDecider decider : deciders) {
     var decision = deciderAction.apply(decider);
-    if (mostNegativeDecision.type().higherThan(decision.type())) {
+    if (mostNegativeDecision.type().compareToBetweenDecisions(decision.type()) > 0) {
Member

For my understanding, this is a behaviour change to the existing code, right? It currently takes a THROTTLE over NOT_PREFERRED. But it will be the opposite after this change?

Essentially, do the two places where compareToBetweenDecisions is used indicate a behaviour change compared to the existing code? It is effectively reverting to the behaviour before #137228. Is that the target here? It does not seem to align with the PR description, though, which says

Fixing a bug where AllocationDeciders could summarize
AllocationDecider responses as NOT_PREFERRED, which allows shard
movement, when an AllocationDecider responded THROTTLE.

I am a bit confused ...

Contributor

It is effectively reverting to the behaviour before #137228

Yep, that's the goal. It turns out there are two prioritisation orders, so a single order can't define them; here we make them explicit.

Contributor

Yes, that is the target: to revert the between-decision comparison to what it was prior to that PR, but keep the node-level comparison from the PR.

With the change in this PR, we will end up with THROTTLE overruling NOT_PREFERRED, like it used to. Notice that here we check whether the most negative decision we've seen so far is "higher" than the current decision and, if so, pick the current decision moving forward. NOT_PREFERRED.compareToBetweenDecisions(THROTTLE) is > 0, making us keep THROTTLE. (Sorry for spelling this out, trying to save a round of interaction.)
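
The aggregation described above can be traced with a small stand-alone sketch. It is a simplified illustration under stated assumptions rather than the actual `AllocationDeciders` code: the nested `Type` enum is a hypothetical stand-in declared in between-decisions order, `mostNegative` is a made-up helper name, and deciders are reduced to their already-computed response types.

```java
import java.util.List;

class DeciderAggregationSketch {
    // Hypothetical stand-in for Decision.Type, declared in the
    // between-decisions order (most negative first) so ordinal() can be used.
    enum Type {
        NO, THROTTLE, NOT_PREFERRED, YES;

        int compareToBetweenDecisions(Type other) {
            return Integer.compare(ordinal(), other.ordinal());
        }
    }

    // Keeps the most negative response seen so far, as in the loop under review.
    static Type mostNegative(List<Type> deciderResponses) {
        Type mostNegativeDecision = Type.YES;
        for (Type decision : deciderResponses) {
            // Positive result: the current candidate is "higher" (less negative)
            // than the new decision, so the new decision replaces it.
            if (mostNegativeDecision.compareToBetweenDecisions(decision) > 0) {
                mostNegativeDecision = decision;
            }
        }
        return mostNegativeDecision;
    }

    public static void main(String[] args) {
        // THROTTLE overrules NOT_PREFERRED regardless of the order deciders respond in.
        System.out.println(mostNegative(List.of(Type.YES, Type.NOT_PREFERRED, Type.THROTTLE))); // THROTTLE
        System.out.println(mostNegative(List.of(Type.THROTTLE, Type.NOT_PREFERRED)));           // THROTTLE
    }
}
```

With the responses YES, NOT_PREFERRED, THROTTLE, the running result moves from YES to NOT_PREFERRED and then to THROTTLE, which is the behaviour the PR restores.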

Member

Ah ok, thanks both for explaining, especially the explicit call-out from Henning! I got it now.

Contributor

@henningandersen henningandersen left a comment

LGTM.

…routing/allocation/allocator/NonDesiredBalanceIT.java

Co-authored-by: Yang Wang <ywangd@gmail.com>
Contributor

@nicktindall nicktindall left a comment

I'm happy with the PR as-is, and will implement any required tidy-up as a follow-up.

Member

@ywangd ywangd left a comment

LGTM

Comment on lines +51 to +56
protected Settings nodeSettings(int nodeOrdinal, Settings otherSettings) {
return Settings.builder()
.put(super.nodeSettings(nodeOrdinal, otherSettings))
.put(ClusterModule.SHARDS_ALLOCATOR_TYPE_SETTING.getKey(), ClusterModule.BALANCED_ALLOCATOR)
.build();
}
Member

Nit: Since we are starting nodes manually for each test, I think we can merge the two test classes and start the node in each test method with its own node settings. That seems less cluttered to me.

Contributor

Yeah, definitely some de-duplication due here. I'll add it to the follow-up.

@nicktindall nicktindall enabled auto-merge (squash) January 7, 2026 10:49
@nicktindall nicktindall merged commit 3d81e05 into elastic:main Jan 7, 2026
35 checks passed
@elasticsearchmachine
Collaborator

💚 Backport successful

Status Branch Result
9.3

DiannaHohensee added a commit to DiannaHohensee/elasticsearch that referenced this pull request Jan 7, 2026
Fixing a bug where AllocationDeciders could summarize
AllocationDecider responses as NOT_PREFERRED, which allows shard
movement, when an AllocationDecider responded THROTTLE.

Relates ES-13903

Co-authored-by: Nick Tindall <nick.tindall@elastic.co>
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
szybia added a commit to szybia/elasticsearch that referenced this pull request Jan 7, 2026
* upstream/main: (191 commits)
  Overall Decision for Deciders prioritizes THROTTLE (elastic#140237)
  Apply group by all logic not only to top-level aggregates (elastic#140248)
  [ES|QL] Refactor MV_UNION and MV_INTERSECTION to use shared set operation helper (elastic#139982)
  Avoid reading entire bloom filter file on reader open (elastic#139374)
  Mark bloom filter files for random access (elastic#139375)
  Ensure that the buffer used for ES93BloomFilterStoredFieldsFormat is zeroed (elastic#139034)
  Add busy assertion to avoid race condition for testStalledShardMigrationProperlyDetected (elastic#140230)
  Remove line number check for testTransitiveFindsDeepCallChain (elastic#140228)
  Allow a slight difference in rescored docs (elastic#139931)
  Mute org.elasticsearch.xpack.inference.integration.AuthorizationTaskExecutorIT testCreatesEisChatCompletion_DoesNotRemoveEndpointWhenNoLongerAuthorized elastic#138480
  Start exchange sink fetchers concurrently (elastic#140196)
  Allow allocation to replacement target node on vacate completion (elastic#140150)
  Ignore JNA cleaner threads in SecureHdfsRepositoryAnalysisRestIT (elastic#139925)
  DeterministicQueue refactor and enhancement (elastic#140151)
  Always error out if CCS expression shows up when CCS is not supported (elastic#139009)
  Use IllegalArgumentException over RepositoryException for readonly-repository checks (elastic#140200)
  Guard promql capabilities in AnalyzerTests (elastic#140232)
  [Inference API] Fix flaky AuthorizationTaskExecutorIT tests (elastic#139978)
  Cleaning up exitable vector value impls (elastic#140190)
  [Inference API] Fix auth exception listener not called bug (elastic#139966)
  ...
@szybia
Contributor

szybia commented Jan 7, 2026

FWIW, if this is the causing PR (I assume the test will be muted soon), it is failing quite frequently on main, if I'm not mistaken:

~/repos/elasticsearch $ ./gradlew ":server:test" --tests "org.elasticsearch.cluster.routing.allocation.decider.AllocationDecidersTests.testCheckAllDecidersBeforeReturningNotPreferred" -Dtests.iters=10000

...

  2> REPRODUCE WITH: ./gradlew ":server:test" --tests "org.elasticsearch.cluster.routing.allocation.decider.AllocationDecidersTests.testCheckAllDecidersBeforeReturningNotPreferred {seed=[85009956502A5DC7:200A75E136AA6C09]}" -Dtests.seed=85009956502A5DC7 -Dtests.locale=ca-FR -Dtests.timezone=Indian/Chagos -Druntime.java=25
  2> java.lang.AssertionError:
    Expected: <NOT_PREFERRED()>
         but: was <THROTTLE()>
        at __randomizedtesting.SeedInfo.seed([85009956502A5DC7:200A75E136AA6C09]:0)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
        at org.elasticsearch.test.ESTestCase.assertThat(ESTestCase.java:2853)
        at org.elasticsearch.cluster.routing.allocation.decider.AllocationDecidersTests.lambda$verifyDecidersCall$21(AllocationDecidersTests.java:194)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at org.elasticsearch.cluster.routing.allocation.decider.AllocationDecidersTests.verifyDecidersCall(AllocationDecidersTests.java:182)

...

10000 tests completed, 3210 failed

burqen added a commit to burqen/elasticsearch that referenced this pull request Jan 7, 2026
THROTTLE to take precedence over NOT_PREFERRED. This behaviour was
changed in elastic#140237 but this randomized test was not updated. Fixed here.
burqen added a commit that referenced this pull request Jan 7, 2026
THROTTLE to take precedence over NOT_PREFERRED. This behaviour was
changed in #140237 but this randomized test was not updated. Fixed here.
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Jan 7, 2026
Fixing a bug where AllocationDeciders could summarize
AllocationDecider responses as NOT_PREFERRED, which allows shard
movement, when an AllocationDecider responded THROTTLE.

Relates ES-13903

Co-authored-by: Nick Tindall <nick.tindall@elastic.co>
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
sidosera pushed a commit to sidosera/elasticsearch that referenced this pull request Jan 7, 2026
THROTTLE to take precedence over NOT_PREFERRED. This behaviour was
changed in elastic#140237 but this randomized test was not updated. Fixed here.
elasticsearchmachine pushed a commit that referenced this pull request Jan 8, 2026
Fixing a bug where AllocationDeciders could summarize
AllocationDecider responses as NOT_PREFERRED, which allows shard
movement, when an AllocationDecider responded THROTTLE.

Relates ES-13903

Co-authored-by: Nick Tindall <nick.tindall@elastic.co>
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
elasticsearchmachine pushed a commit that referenced this pull request Jan 8, 2026
* Fix flaky test: AllocationDecidersTests (#140271)

THROTTLE to take precedence over NOT_PREFERRED. This behaviour was
changed in #140237 but this randomized test was not updated. Fixed here.

* Unmute test

---------

Co-authored-by: Nick Tindall <nick.tindall@elastic.co>

Labels

auto-backport Automatically create backport pull requests when merged >bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) serverless-linked Added by automation, don't add manually Team:Distributed Coordination (obsolete) Meta label for Distributed Coordination team. Obsolete. Please do not use. v9.3.0 v9.4.0
