
Conversation

@henningandersen
Contributor

We executed shard started at urgent priority, but the subsequent reroute was at normal priority, causing subsequent recoveries to possibly run after other less important actions. Now do the reroute at high priority.
@henningandersen henningandersen added the :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Oct 29, 2025
@elasticsearchmachine elasticsearchmachine added v9.3.0 Team:Distributed Coordination Meta label for Distributed Coordination team labels Oct 29, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@elasticsearchmachine
Collaborator

Hi @henningandersen, I've created a changelog YAML for you.

Member

@ywangd ywangd left a comment


I have a question

```diff
 rerouteService.reroute(
     "reroute after starting shards",
-    Priority.NORMAL,
+    Priority.HIGH,
```
Member


Do we want to check whether there are unassigned shards before promoting it to HIGH priority? I'd be ok with URGENT if there are unassigned shards. But we can take one step at a time.

Contributor Author


I decided not to do so, for a couple of reasons:

  1. Simplicity.
  2. Seems safe enough - shard started is not that frequent and is batched - and so are the reroutes.
  3. Relocations off a shutting down node could also run into this and thus delay vacating a node.
  4. Every shard initialization spends some time on the data node, during which the cluster can attend to other things before shard started comes back (of course assuming things otherwise work well).

But happy to change this if you find it important.

I prefer to only go to HIGH to avoid bumping the priority too much until we have evidence we need it.

Member


I was suggesting that mostly to see whether we can bump it higher, e.g. URGENT, so that it does not get blocked by put-mapping requests. That is something we have observed a few times in production clusters. But if we are sticking to HIGH, the conditional priority is probably not entirely necessary.

Member


For the record, I still think conditional urgent is useful. But we can iterate on this.

Contributor


HIGH is sufficient to avoid getting blocked by a stream of other HIGH priority requests such as put-mapping ones, because all the HIGH tasks run in submission order.
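The "submission order" point above can be sketched with a minimal model. This is a toy, not the real prioritized executor: when every task sits at the same priority, only the insertion sequence number matters, so a later stream of HIGH tasks cannot jump ahead of an earlier HIGH reroute.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model (not the real executor): tasks of equal priority are ordered
// purely by an insertion sequence number, i.e. FIFO within that priority.
public class SamePriorityFifoSketch {
    record Task(long seqNo, String name) {} // all tasks here are HIGH

    static List<String> run() {
        PriorityQueue<Task> queue =
            new PriorityQueue<>(Comparator.comparingLong(Task::seqNo));
        long seq = 0;
        queue.add(new Task(seq++, "put-mapping-1"));
        queue.add(new Task(seq++, "reroute after starting shards"));
        queue.add(new Task(seq++, "put-mapping-2")); // submitted later, runs later
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) order.add(queue.poll().name());
        return order;
    }

    public static void main(String[] args) {
        // The reroute is not starved by HIGH tasks submitted after it.
        System.out.println(run());
    }
}
```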

Member


Err I mis-remembered put-mapping to be URGENT. Thanks for explaining.

Contributor

@DaveCTurner DaveCTurner left a comment


Henning and I discussed this yesterday and he convinced me this is a reasonable change. It LGTM, but this does not mean we are setting a precedent for other master-task-priority tweaks.

We cannot go higher than HIGH here without regressing #44433, but I can see an argument that maybe we should have gone with HIGH rather than NORMAL in the first place.

Also, the desired-balance allocator means that #44433 should be less of a big deal now than it was at the time.

Member

@ywangd ywangd left a comment


LGTM

@henningandersen henningandersen merged commit c5b0360 into elastic:main Oct 29, 2025
34 checks passed
chrisparrinello pushed a commit to chrisparrinello/elasticsearch that referenced this pull request Nov 3, 2025
* Shard started reroute high priority

Labels

:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0
