Restore remote index shards with ExistingStoreRecoverySource after restore from remote state #10665


@linuxpi linuxpi commented Oct 17, 2023

Description

After quorum-loss recovery, remote index shards will be restored in the cluster metadata with EXISTING_STORE as their RecoverySource. This is the same as for non-remote shards.

Related Issues

Resolves #10658

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot commented Oct 17, 2023

Compatibility status:

Checks if related components are compatible with change 91b3ea2

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@github-actions

Gradle Check (Jenkins) Run Completed with:

@github-actions github-actions bot added the bug (Something isn't working) and Cluster Manager labels Oct 18, 2023

linuxpi commented Oct 19, 2023

@linuxpi can this be simplified further, so that after metadata recovery from remote, indices are restored in the cluster state as "CLUSTER_RECOVERED" (which is the default behavior)? Then, if the latest copy of a shard is available on any node, it will be assigned; otherwise the shard will turn red. Essentially, let it behave as if shards are recovering from local disk after a cluster restart. So when recovering metadata, if you don't add any entry for the shard RoutingTable, then ClusterStateUpdaters will add a recovery entry on its own:

routingTableBuilder.addAsRecovery(cursor);

(let's remove the code we added here earlier for special handling of remote indices)

and when the user triggers the restore API, it will set the recovery source and everything else properly.

So if I got it correctly, in the case of remote state auto restore, we want to initialize shards with ExistingStore recovery. If any of the shards finds data on local disk on a data node, it will come up, and the others will fail?

So, thinking from the perspective of the split primaries issue: if the shard comes up in the network partition that contains the cluster manager, the isolated primary in the other partition would try to do primary term validation with this new shard that just came up, and it won't be able to?
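
The fallback discussed in this comment can be sketched roughly as follows. This is a simplified model and not OpenSearch code: `RecoverySource`, `ShardState`, `addMissingAsRecovery`, and `allocate` are hypothetical stand-ins, each index is assumed to have a single shard, and the recovery entry is assumed to be an existing-store one. It only illustrates the idea that an index without a routing entry gets a recovery entry, and that its shard then starts only if some node still holds a local copy.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RecoveryFallbackSketch {
    // Hypothetical stand-ins for illustration; these are not OpenSearch classes.
    enum RecoverySource { EXISTING_STORE }
    enum ShardState { UNASSIGNED, STARTED }

    /** Adds an existing-store recovery entry for every index that has no routing entry yet. */
    static Map<String, RecoverySource> addMissingAsRecovery(
            List<String> indicesInMetadata, Map<String, RecoverySource> routingTable) {
        Map<String, RecoverySource> result = new LinkedHashMap<>(routingTable);
        for (String index : indicesInMetadata) {
            // Assumption: the default recovery entry uses an existing-store source.
            result.putIfAbsent(index, RecoverySource.EXISTING_STORE);
        }
        return result;
    }

    /** A shard recovering from an existing store starts only if a local copy is found. */
    static ShardState allocate(String index, Set<String> indicesWithLocalCopy) {
        return indicesWithLocalCopy.contains(index) ? ShardState.STARTED : ShardState.UNASSIGNED;
    }

    public static void main(String[] args) {
        // No routing entries were added during metadata recovery.
        Map<String, RecoverySource> routing =
            addMissingAsRecovery(List.of("logs", "metrics"), Map.of());
        // Only "logs" still has shard data on a data node's local disk.
        for (String index : routing.keySet()) {
            System.out.println(index + " " + routing.get(index) + " " + allocate(index, Set.of("logs")));
        }
    }
}
```

In this model, `logs` ends up `STARTED` and `metrics` stays `UNASSIGNED`, which corresponds to the "assigned, else the shard will turn red" behavior described above.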

@linuxpi linuxpi requested a review from abbashus as a code owner October 20, 2023 03:11

Signed-off-by: bansvaru <[email protected]>
@linuxpi linuxpi changed the title from "introduce new REMOTE_METADATA_RECOVERED UnassignedInfo Reason to control remote shard recovery" to "Restore remote index shards with ExistingStoreRecoverySource after restore from remote state" Oct 20, 2023

Signed-off-by: bansvaru <[email protected]>

linuxpi commented Oct 20, 2023

Flaky Test - #9464

org.opensearch.search.query.QueryPhaseTests.testQueryTimeoutChecker {p0=0 p1=org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher@42b43a6}

@shwetathareja shwetathareja merged commit 6641ef8 into opensearch-project:main Oct 20, 2023
17 of 19 checks passed
@shwetathareja shwetathareja added the backport 2.x (Backport to 2.x branch) label Oct 20, 2023
@linuxpi linuxpi deleted the ensure-no-split-primaries-state-restore branch October 20, 2023 12:37
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 20, 2023
…store from remote state (#10665)

* Restore remote index shards with ExistingStoreRecoverySource after restore from remote state

Signed-off-by: bansvaru <[email protected]>
(cherry picked from commit 6641ef8)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
shwetathareja pushed a commit that referenced this pull request Oct 22, 2023
…store from remote state (#10665) (#10779)

* Restore remote index shards with ExistingStoreRecoverySource after restore from remote state

(cherry picked from commit 6641ef8)

Signed-off-by: bansvaru <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
austintlee pushed a commit to austintlee/OpenSearch that referenced this pull request Oct 23, 2023
…store from remote state (opensearch-project#10665)

* Restore remote index shards with ExistingStoreRecoverySource after restore from remote state

Signed-off-by: bansvaru <[email protected]>
@linuxpi linuxpi restored the ensure-no-split-primaries-state-restore branch January 29, 2024 12:51
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…store from remote state (opensearch-project#10665)

* Restore remote index shards with ExistingStoreRecoverySource after restore from remote state

Signed-off-by: bansvaru <[email protected]>
Signed-off-by: Shivansh Arora <[email protected]>
Labels
  • backport 2.x (Backport to 2.x branch)
  • bug (Something isn't working)
  • Cluster Manager
  • skip-changelog
  • Storage:Remote
  • Storage (Issues and PRs relating to data and metadata storage)
  • v2.12.0 (Issues and PRs related to version 2.12.0)
3 participants