Skip to content

Conversation

@idegtiarenko
Copy link
Contributor

@idegtiarenko idegtiarenko commented Jul 8, 2022

Fix a bug when a shard could be re-balanced using outdated node weights in case 2 or more shards movements in a row for a single index were simulated due to THROTTLE.

Closes: #88384

Add a test that checks a corner situation when currently after moving
some shards around we end up with exactly the same balance as before.
@idegtiarenko idegtiarenko added :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. labels Jul 8, 2022
@elasticsearchmachine
Copy link
Collaborator

Hi @idegtiarenko, I've created a changelog YAML for you.

@idegtiarenko idegtiarenko marked this pull request as ready for review July 12, 2022 11:44
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

addIndex(metadataBuilder, routingTableBuilder, "index-3", 10, Map.of("node-0", 3, "node-1", 3, "node-2", 4));
addIndex(metadataBuilder, routingTableBuilder, "index-4", 10, Map.of("node-0", 4, "node-1", 3, "node-2", 3));
addIndex(metadataBuilder, routingTableBuilder, "index-5", 10, Map.of("node-0", 3, "node-1", 4, "node-2", 3));
addIndex(metadataBuilder, routingTableBuilder, "index-6", 10, Map.of("node-0", 3, "node-1", 3, "node-2", 4));
Copy link
Contributor Author

@idegtiarenko idegtiarenko Jul 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am checking if it is possible to reproduce it with a simpler setup

assert decision.type() == Type.THROTTLE;
minNode.addShard(shard.relocate(minNode.getNodeId(), shardSize));
return false;
return true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add a bit of context:

return false is used when exiting this method without moving a shard and as a result not updating the node weights.
This means that previously the weights here were not updated if a move was simulated. This could lead to rebalancing using outdated weights if two or more moves in a row for a single index are simulated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

The two branches of the if (decision.type() == Type.YES) test are now almost the same. Does it work to consolidate them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's more to consolidate. How about this?

diff --git a/server/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java b/server/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
index 97bdbd7367a..66c1c7f2001 100644
--- a/server/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
+++ b/server/src/main/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java
@@ -1073,18 +1073,21 @@ public class BalancedShardsAllocator implements ShardsAllocator {
                     maxNode.removeShard(shard);
                     long shardSize = allocation.clusterInfo().getShardSize(shard, ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE);

-                    if (decision.type() == Type.YES) {
-                        /* only allocate on the cluster if we are not throttled */
-                        logger.debug("Relocate [{}] from [{}] to [{}]", shard, maxNode.getNodeId(), minNode.getNodeId());
-                        minNode.addShard(routingNodes.relocateShard(shard, minNode.getNodeId(), shardSize, allocation.changes()).v1());
-                        return true;
-                    } else {
-                        /* allocate on the model even if throttled */
-                        logger.debug("Simulate relocation of [{}] from [{}] to [{}]", shard, maxNode.getNodeId(), minNode.getNodeId());
-                        assert decision.type() == Type.THROTTLE;
-                        minNode.addShard(shard.relocate(minNode.getNodeId(), shardSize));
-                        return false;
-                    }
+                    assert decision.type() == Type.YES || decision.type() == Type.THROTTLE : decision.type();
+                    logger.debug(
+                        "decision [{}]: relocate [{}] from [{}] to [{}]",
+                        decision.type(),
+                        shard,
+                        maxNode.getNodeId(),
+                        minNode.getNodeId()
+                    );
+                    minNode.addShard(
+                        decision.type() == Type.YES
+                            /* only allocate on the cluster if we are not throttled */
+                            ? routingNodes.relocateShard(shard, minNode.getNodeId(), shardSize, allocation.changes()).v1()
+                            : shard.relocate(minNode.getNodeId(), shardSize)
+                    );
+                    return true;
                 }
             }
             logger.trace("No shards of [{}] can relocate from [{}] to [{}]", idx, maxNode.getNodeId(), minNode.getNodeId());

@idegtiarenko idegtiarenko changed the title Test balance improves after rebalancing Recalculate balance using up-to date weights when THROTTLE Jul 12, 2022
@idegtiarenko idegtiarenko requested a review from DaveCTurner July 12, 2022 12:49
Comment on lines 99 to 105
addIndex(metadataBuilder, routingTableBuilder, "index-0", Map.of("node-0", 4, "node-1", 4, "node-2", 2));
addIndex(metadataBuilder, routingTableBuilder, "index-1", Map.of("node-0", 4, "node-1", 4, "node-2", 2));
addIndex(metadataBuilder, routingTableBuilder, "index-2", Map.of("node-0", 3, "node-1", 3, "node-2", 4));
addIndex(metadataBuilder, routingTableBuilder, "index-3", Map.of("node-0", 3, "node-1", 3, "node-2", 4));
addIndex(metadataBuilder, routingTableBuilder, "index-4", Map.of("node-0", 4, "node-1", 3, "node-2", 3));
addIndex(metadataBuilder, routingTableBuilder, "index-5", Map.of("node-0", 3, "node-1", 4, "node-2", 3));
addIndex(metadataBuilder, routingTableBuilder, "index-6", Map.of("node-0", 3, "node-1", 3, "node-2", 4));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be quite dependent on the ordering of these indices in a map. Should we randomise their names and the order in which they're added?

assert decision.type() == Type.THROTTLE;
minNode.addShard(shard.relocate(minNode.getNodeId(), shardSize));
return false;
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

The two branches of the if (decision.type() == Type.YES) test are now almost the same. Does it work to consolidate them?

@idegtiarenko idegtiarenko requested a review from DaveCTurner July 12, 2022 14:32
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@idegtiarenko idegtiarenko merged commit a23e416 into elastic:master Jul 13, 2022
@idegtiarenko idegtiarenko deleted the test_balance_improves branch July 13, 2022 06:59
@@ -0,0 +1,5 @@
pr: 88385
summary: Test balance improves after rebalancing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we improve the summary here? It doesn't tell me much about the bug or the fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, opened a pr for this: #88709

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BalancedShardsAllocator rebalancing might move shards but not improve the balance

5 participants