Add a guardrail to limit the maximum number of shards on the cluster #6143
Force-pushed from 5fcf038 to 0ee3948.
Gradle Check (Jenkins) Run Completed with:
Force-pushed from 0d89de4 to ad53c45.
Bukhtawar left a comment:
Overall LGTM; ensure we handle the min case of the node and cluster limits.
Force-pushed from ad53c45 to 9e5a63e.
Codecov Report
```diff
@@             Coverage Diff              @@
##               main    #6143      +/-   ##
============================================
- Coverage     70.87%   70.86%   -0.01%
+ Complexity    58830    58823       -7
============================================
  Files          4776     4776
  Lines        280993   281020      +27
  Branches      40598    40599       +1
============================================
  Hits         199141   199141
- Misses        65488    65557      +69
+ Partials      16364    16322      -42
```
Force-pushed from 9e5a63e to 367840e.
```java
final String inValidMaxShardSettingErrorMessage = "Max shard per cluster threshold "
    + maxShardsInCluster
    + " should be greater than or equal to max shard per node "
    + maxShardsPerNode;
return Optional.of(inValidMaxShardSettingErrorMessage);
```
Curious how this would behave, since the node count is dynamic. Let's say max shards per node is initially 100, there are 10 nodes, and max shards per cluster is 1010. If another node is then added to the cluster, 1010 (max shards per cluster) is less than 100 * 11 = 1100. Maybe we can just put a condition as mentioned in this comment and remove this condition here? wdyt @sachinpkale @Bukhtawar @reta?
And maxShardsInCluster = Math.min(maxShardsInCluster, maxShardsPerNode * nodeCount) can stay as a single else block.
Addressed the comment.
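A minimal Java sketch of the limit calculation this thread converged on: when a cluster-wide threshold is set, the effective limit is the smaller of that threshold and maxShardsPerNode * nodeCount, so adding nodes cannot raise the total past it. The class name, the method name, and modelling an unset threshold as an empty OptionalLong are assumptions for illustration, not the code merged in this PR. The main method replays the 10-node and 11-node scenario from the review comment above.

```java
import java.util.OptionalLong;

// Hypothetical sketch: names are illustrative, not OpenSearch's actual API.
final class EffectiveShardLimit {

    /**
     * Returns the total number of shards the cluster may hold.
     *
     * @param maxShardsPerNode   per-node limit (cluster.max_shards_per_node)
     * @param nodeCount          current node count; changes as nodes join or leave
     * @param maxShardsInCluster optional cluster-wide threshold; empty when unset (assumption)
     */
    static long effectiveLimit(int maxShardsPerNode, int nodeCount, OptionalLong maxShardsInCluster) {
        // Limit implied by the per-node setting alone; it grows with the node count.
        long perNodeLimit = (long) maxShardsPerNode * nodeCount;
        if (!maxShardsInCluster.isPresent()) {
            // No cluster-wide threshold: keep the existing per-node behaviour.
            return perNodeLimit;
        }
        // The stricter of the two limits wins, so extra nodes cannot lift the
        // total above the cluster-wide threshold.
        return Math.min(maxShardsInCluster.getAsLong(), perNodeLimit);
    }

    public static void main(String[] args) {
        OptionalLong clusterLimit = OptionalLong.of(1010);
        // 10 nodes: 100 * 10 = 1000 is stricter than 1010.
        System.out.println(effectiveLimit(100, 10, clusterLimit)); // prints 1000
        // 11 nodes: 100 * 11 = 1100, so the 1010 threshold applies.
        System.out.println(effectiveLimit(100, 11, clusterLimit)); // prints 1010
        // Threshold unset: the limit scales with the node count.
        System.out.println(effectiveLimit(100, 11, OptionalLong.empty())); // prints 1100
    }
}
```

Taking the minimum instead of rejecting the setting up front keeps the threshold valid as nodes join or leave; only the limit that is actually enforced changes.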
Force-pushed from 367840e to dd5e23d.
Signed-off-by: Rishav Sagar <rissag@amazon.com>
Force-pushed from dd5e23d to b1397cf.
Signed-off-by: Rishav Sagar <rissag@amazon.com>
(cherry picked from commit e42b76f)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit e42b76f)
Signed-off-by: Rishav Sagar <rissag@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Rishav Sagar <rissag@amazon.com>
Description
We have observed that as the shard count on a cluster increases, the cluster becomes more prone to instability and availability issues. We have seen multiple instances where users ran into problems because of too many shards on their cluster.
As of now, OpenSearch allows us to restrict the number of shards per node using the setting cluster.max_shards_per_node inside ShardLimitValidator. The maximum number of shards that can be present on the cluster is therefore maxShardsPerNode * nodeCount. The issue is that this total limit can be raised simply by adding nodes; there is no way to cap the maximum number of shards on the cluster as a whole.
I suggest adding a similar setting inside ShardLimitValidator that controls the maximum number of shards that can be present on a cluster. If the total number of shards exceeds this threshold, ShardLimitValidator will block any operation that creates new shards. The threshold will be an optional cluster setting that users can configure dynamically.
Issues Resolved
#6050
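The blocking behaviour described above can be sketched as a check run before any operation that creates shards: count the shards the request would add and reject it if the cluster total would exceed the limit. This is a hypothetical sketch, not the ShardLimitValidator code; the class name, method names, and error text are illustrative, and counting each replica copy as a shard (primaries * (1 + replicas)) is an assumption about how new shards are tallied.

```java
import java.util.Optional;

// Hypothetical sketch of the proposed guardrail; names and message text are
// illustrative, not the actual ShardLimitValidator implementation.
final class ClusterShardGuardrail {

    // Assumption: every primary and every replica copy counts as one shard.
    static long shardsToCreate(int primaries, int replicas) {
        return (long) primaries * (1 + replicas);
    }

    /**
     * Returns an error message if creating newShards would push the cluster
     * past totalShardLimit, or an empty Optional if the operation may proceed.
     */
    static Optional<String> checkShardLimit(long currentShards, long newShards, long totalShardLimit) {
        long resultingShards = currentShards + newShards;
        if (resultingShards > totalShardLimit) {
            return Optional.of("this action would add [" + newShards + "] shards, but the cluster already has ["
                + currentShards + "] of a maximum [" + totalShardLimit + "] shards");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // An index with 5 primaries and 1 replica adds 10 shards.
        long newShards = shardsToCreate(5, 1);
        // 1000 + 10 = 1010 does not exceed a limit of 1010, so this is allowed.
        System.out.println(checkShardLimit(1000, newShards, 1010).isPresent()); // prints false
        // 1005 + 10 = 1015 exceeds 1010, so the operation is blocked.
        System.out.println(checkShardLimit(1005, newShards, 1010).orElse("allowed"));
    }
}
```

In this sketch, totalShardLimit stands for whatever limit is enforced; following the review discussion, that would be the smaller of the optional cluster-wide threshold and maxShardsPerNode * nodeCount.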
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.