Add PrimaryShardBatchAllocator to take allocation decisions for a batch of shards #8916

Merged

amkhar
Contributor

@amkhar amkhar commented Jul 27, 2023

Description

This PR is a continuation of the project #5098
Meta issue: #8098 (also lists the order in which this PR should be merged)
This PR can be reviewed after #9760

During node drops and joins, the metadata of unassigned shards is fetched on a per-shard basis by AsyncShardFetch:

public abstract class AsyncShardFetch<T extends BaseNodeResponse> implements Releasable {

PrimaryShardAllocator builds an allocation decision for a single unassigned shard. Building decisions for a batch of shards needs new support, so this PR adds a new class, PrimaryShardBatchAllocator, for that purpose.
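The shift from per-shard to batched decisions can be illustrated with a minimal Java sketch. It is not the PrimaryShardBatchAllocator implementation: the class `BatchPrimaryAllocationSketch`, the records `ShardCopy` and `Decision`, and the method `decideBatch` are hypothetical, and the selection rule (choose a node holding a copy whose allocation id is in sync, preferring a copy that was previously primary) is a simplified assumption. What it shows is that a single batched fetch result, keyed by shard, is enough to produce a decision for every shard in the batch in one pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical, simplified model of batch primary allocation. These are NOT
 * OpenSearch types; they only show one decision pass over a whole batch.
 */
public class BatchPrimaryAllocationSketch {

    /** A shard copy reported by a node (hypothetical stand-in for fetched shard state). */
    record ShardCopy(String nodeId, String allocationId, boolean wasPrimary) {}

    /** Result for one shard: the chosen node, or empty when no in-sync copy was found. */
    record Decision(String shardId, Optional<String> nodeId) {}

    /**
     * @param inSyncIds   shardId -> in-sync allocation ids for every shard in the batch (assumed input)
     * @param batchResult shardId -> copies found on data nodes, from one batched fetch (assumed input)
     */
    static List<Decision> decideBatch(Map<String, Set<String>> inSyncIds,
                                      Map<String, List<ShardCopy>> batchResult) {
        List<Decision> decisions = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : inSyncIds.entrySet()) {
            Set<String> inSync = entry.getValue();
            List<ShardCopy> copies = batchResult.getOrDefault(entry.getKey(), List.of());
            // Keep only in-sync copies; among those, try copies that were primary first.
            Optional<String> node = copies.stream()
                .filter(copy -> inSync.contains(copy.allocationId()))
                .sorted(Comparator.comparing(ShardCopy::wasPrimary).reversed())
                .map(ShardCopy::nodeId)
                .findFirst();
            decisions.add(new Decision(entry.getKey(), node));
        }
        return decisions;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> inSync = new LinkedHashMap<>();
        inSync.put("[idx][0]", Set.of("a1"));
        inSync.put("[idx][1]", Set.of("b1"));
        Map<String, List<ShardCopy>> fetched = Map.of(
            "[idx][0]", List.of(new ShardCopy("node-1", "stale", true),
                                new ShardCopy("node-2", "a1", false)),
            "[idx][1]", List.of(new ShardCopy("node-3", "b0", true)));
        for (Decision d : decideBatch(inSync, fetched)) {
            // Prints "[idx][0] -> node-2" then "[idx][1] -> no valid copy".
            System.out.println(d.shardId() + " -> " + d.nodeId().orElse("no valid copy"));
        }
    }
}
```

In this sketch the stale copy on node-1 is skipped even though it was primary, because its allocation id is not in the in-sync set; the real allocator applies further checks (throttling, deciders, store exceptions) that are deliberately left out here.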

Note:

  1. UTs and ITs will be added in upcoming commits.
  2. Gradle check will fail on compilation because this PR depends on multiple other PRs:
    • Refactoring of the existing PSA class is done in PrimaryShardAllocator refactor to abstract out shard state and method calls #9760
  3. CHANGELOG will be updated only after the last PR for this project is raised (being merged).

Additional context
Please go through the discussion in #5098 to understand the overall enhancement approach, and check #8098 for the sub-tasks of the overall project.

Related Issues

Resolves #8960

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity. Remove stalled label or comment or this will be closed in 7 days.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled label Aug 27, 2023
@amkhar
Contributor Author

amkhar commented Aug 27, 2023

PR will be updated in a few days.

@github-actions github-actions bot added the bug and Cluster Manager labels Sep 21, 2023
@github-actions
Contributor

github-actions bot commented Sep 21, 2023

Compatibility status:

Checks if related components are compatible with change d372c54

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/flow-framework.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/sql.git]

Contributor

❕ Gradle check result for 6112f4b: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov bot commented Mar 15, 2024

Codecov Report

Attention: Patch coverage is 58.21918%, with 61 lines in your changes missing coverage. Please review.

Project coverage is 71.37%. Comparing base (b15cb0c) to head (f702715).
Report is 44 commits behind head on main.

❗ Current head f702715 differs from the pull request's most recent head d372c54. Consider uploading reports for commit d372c54 to get more accurate results.

File                                                   Patch %   Lines
...teway/TransportNodesGatewayStartedShardHelper.java   29.03%   44 Missing ⚠️
...ateway/TransportNodesListGatewayStartedShards.java   63.15%   5 Missing and 2 partials ⚠️
...ices/shards/TransportIndicesShardStoresAction.java    0.00%   4 Missing ⚠️
...opensearch/gateway/PrimaryShardBatchAllocator.java   91.89%   3 Missing ⚠️
...y/TransportNodesListGatewayStartedShardsBatch.java    0.00%   2 Missing ⚠️
.../org/opensearch/gateway/PrimaryShardAllocator.java   95.45%   1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8916      +/-   ##
============================================
- Coverage     71.42%   71.37%   -0.05%     
- Complexity    59978    60087     +109     
============================================
  Files          4985     4993       +8     
  Lines        282275   282751     +476     
  Branches      40946    40998      +52     
============================================
+ Hits         201603   201825     +222     
- Misses        63999    64179     +180     
- Partials      16673    16747      +74     
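The Codecov totals above are internally consistent, and the arithmetic can be checked with a short Java sketch. It assumes that line coverage is hits / (hits + misses + partials) and that the percentage is truncated (rounded down) to two decimals; both are assumptions about Codecov's defaults, not something the report states. `CoverageArithmetic` and `coveragePercent` are illustrative names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Illustrative check of the coverage diff numbers; not part of OpenSearch or Codecov. */
public class CoverageArithmetic {

    /** Assumed formula: hits / (hits + misses + partials), in percent, truncated to 2 decimals. */
    static BigDecimal coveragePercent(long hits, long misses, long partials) {
        long lines = hits + misses + partials;
        return BigDecimal.valueOf(hits * 100)
            .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        // Hits + Misses + Partials should equal the "Lines" row for each column.
        System.out.println(201603 + 63999 + 16673); // 282275 (main)
        System.out.println(201825 + 64179 + 16747); // 282751 (#8916)

        // Per-file missing lines (partials included) should add up to the 61 reported.
        System.out.println(44 + (5 + 2) + 4 + 3 + 2 + 1); // 61

        BigDecimal base = coveragePercent(201603, 63999, 16673); // 71.42
        BigDecimal head = coveragePercent(201825, 64179, 16747); // 71.37
        // Prints "71.42 -> 71.37 (-0.05)"
        System.out.println(base + " -> " + head + " (" + head.subtract(base) + ")");
    }
}
```

The rounding mode matters here: with RoundingMode.HALF_UP the head figure would come out as 71.38 (the exact value is about 71.379), so truncation is the assumption that reproduces the reported 71.37%.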


Signed-off-by: Aman Khare <[email protected]>
Contributor

❌ Gradle check result for e798d7a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❕ Gradle check result for f702715: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@amkhar
Contributor Author

amkhar commented Mar 16, 2024

testConcurrentDecommissionAction

Flaky test: #12197

Signed-off-by: Aman Khare <[email protected]>
Contributor

❌ Gradle check result for d372c54: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@amkhar
Contributor Author

amkhar commented Mar 18, 2024

Flaky test testConnectAndExecuteRequest #12338

@shwetathareja shwetathareja removed the bug label Mar 18, 2024
@github-actions github-actions bot added the bug label Mar 18, 2024
Contributor

❌ Gradle check result for d372c54: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@amkhar
Contributor Author

amkhar commented Mar 19, 2024

Flaky test testDelayWithALargeAmountOfShards: #10558
The reason and stack trace are the same as in the previous failures.

@shwetathareja shwetathareja merged commit a499d1e into opensearch-project:main Mar 19, 2024
34 of 40 checks passed
@shiv0408 shiv0408 added the backport 2.x label and removed the bug label Mar 19, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 19, 2024
…ch of shards (#8916)

* Add PrimaryShardBatchAllocator to take allocation decisions for a batch of shards

Signed-off-by: Aman Khare <[email protected]>
(cherry picked from commit a499d1e)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@amkhar amkhar added and removed the backport 2.x label Mar 21, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 21, 2024
…ch of shards (#8916)

* Add PrimaryShardBatchAllocator to take allocation decisions for a batch of shards

Signed-off-by: Aman Khare <[email protected]>
(cherry picked from commit a499d1e)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shwetathareja pushed a commit that referenced this pull request Mar 21, 2024
…ch of shards (#8916) (#12813)

* Add PrimaryShardBatchAllocator to take allocation decisions for a batch of shards


(cherry picked from commit a499d1e)

Signed-off-by: Aman Khare <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 21, 2024
…ch of shards (#8916) (#12813)

* Add PrimaryShardBatchAllocator to take allocation decisions for a batch of shards

(cherry picked from commit a499d1e)

Signed-off-by: Aman Khare <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
(cherry picked from commit b2d22d4)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shwetathareja pushed a commit that referenced this pull request Mar 21, 2024
…ch of shards (#8916) (#12813) (#12823)

* Add PrimaryShardBatchAllocator to take allocation decisions for a batch of shards

(cherry picked from commit a499d1e)
(cherry picked from commit b2d22d4)

Signed-off-by: Aman Khare <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…ch of shards (opensearch-project#8916)

* Add PrimaryShardBatchAllocator to take allocation decisions for a batch of shards

Signed-off-by: Aman Khare <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

[Enhancement] Add a batch allocator for building allocation decisions for multiple primary & replica shards
5 participants