Skip to content

Faster access to started shards count of the index in each node #53559

@kkewwei

Description

@kkewwei

ES_VERSION: 7.6.0
JVM version : JDK1.8.0_112
OS version : linux
Description of the problem including expected versus actual behavior:
As it's known, Updating ClusterState on master may cost too much time, which is not good for cluster. During the updating ClusterState, ShardsLimitAllocationDecider deciders iterate through all the shards on a node to find STARTED ones belonging to the index when cluster.routing.allocation.total_shards_per_node > 0, Which will cost too much time.

In out product, There are 39 nodes and 2,000 indices, 50,000 shards, but the time to update cluster state reach at 3.4min, It's intolerable.

To find out why it cost so much time on updating cluste state, I get the thread stack about updateTask, such that:

"[node-1][clusterService#updateTask][T#1]" #21 daemon prio=5 os_prio=0 tid=0x00007fc5c88fa800 nid=0x3369 runnable [0x00007fc58431a000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Collections$UnmodifiableCollection$1.hasNext(Collections.java:1041)
        at org.elasticsearch.cluster.routing.allocation.decider.ShardsLimitAllocationDecider.doDecide(ShardsLimitAllocationDecider.java:112)
        at org.elasticsearch.cluster.routing.allocation.decider.ShardsLimitAllocationDecider.canAllocate(ShardsLimitAllocationDecider.java:88)
        at org.elasticsearch.cluster.routing.allocation.decider.AllocationDeciders.canAllocate(AllocationDeciders.java:73)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.decideMove(BalancedShardsAllocator.java:707)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator$Balancer.moveShards(BalancedShardsAllocator.java:648)
        at org.elasticsearch.cluster.routing.allocation.allocator.BalancedShardsAllocator.allocate(BalancedShardsAllocator.java:123)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.reroute(AllocationService.java:329)
        at org.elasticsearch.cluster.routing.allocation.AllocationService.applyStartedShards(AllocationService.java:100)
        at org.elasticsearch.cluster.action.shard.ShardStateAction$ShardStartedClusterStateTaskExecutor.execute(ShardStateAction.java:438)
        at org.elasticsearch.cluster.service.ClusterService.executeTasks(ClusterService.java:634)
        at org.elasticsearch.cluster.service.ClusterService.calculateTaskOutputs(ClusterService.java:612)

I try several times and get the same thread stack, it seems that ShardsLimitAllocationDecider.doDecide will cost too much time, the related code:

       if (indexShardLimit <= 0 && clusterShardLimit <= 0) {
            return allocation.decision(Decision.YES, NAME, "total shard limits are disabled: [index: %d, cluster: %d] <= 0",
                    indexShardLimit, clusterShardLimit);
        }
        int indexShardCount = 0;
        int nodeShardCount = 0;
        for (ShardRouting nodeShard : node) {
            // don't count relocating shards...
            if (nodeShard.relocating()) {
                continue;
            }
            nodeShardCount++;
            if (nodeShard.index().equals(shardRouting.index())) {
                indexShardCount++;
            }
        }

It will iterate 50000*50000/39 = 64,000,000 times, which will cost too much time.

There is room for optimization to avoid iterating the node:
1.If indexShardLimit=-1 and clusterShardLimit>0, we need't to count indexShardCount and nodeShardCount by iterating, nodeShardCount = node.size() - node.numberOfShardsWithState(ShardRoutingState.RELOCATING), indexShardCount is useless.
2. If we could count the started shards of each index in each node in RoutingNode to avoid the iteration?

Metadata

Metadata

Assignees

Labels

:Distributed Coordination/AllocationAll issues relating to the decision making around placing a shard (both master logic & on the nodes)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions