
Fix Shallow copy snapshot failures on closed index #16868

Merged: 7 commits merged on Jan 9, 2025


@astute-decipher (Contributor) commented Dec 17, 2024

Description

We observed that for a remote-store-backed index, once the index is closed, the shallow snapshot fails for the shard with the following error:
java.nio.file.NoSuchFileException: Metadata file is not present for given primary term <X> and generation <Y> .
Root-causing the issue, we found:

  • The last segment generation differs between the local node directory and the remote store directory: the remote store lags by one generation.
  • The shallow_v1 snapshot tries to find the latest segment generation in the remote store, which fails because that generation was never uploaded.
  • The last segments_N file written while closing the index was uploaded to the remote store.
  • However, after a successful close, a read-only engine is opened for the index. It performs recovery and creates a new segments_N file, but because no refresh listener is available to it, that file is never uploaded to the remote store.
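To make the root cause concrete, here is a toy model in Java. It is a sketch under stated assumptions, not OpenSearch code: `GenerationMismatchDemo`, `RemoteStore`, `upload`, `metadataFor` and the `metadata_gen_<N>` file names are all hypothetical, and the remote store is reduced to an in-memory map from commit generation to metadata file. It only shows why looking up the latest local generation fails once the read-only engine has written a segments_N file that never reaches the remote store.

```java
import java.nio.file.NoSuchFileException;
import java.util.TreeMap;

/** Hypothetical toy model of the local/remote generation mismatch (not OpenSearch code). */
final class GenerationMismatchDemo {

    /** In-memory stand-in for the remote segment store: generation -> metadata file name. */
    static final class RemoteStore {
        private final TreeMap<Long, String> metadataByGeneration = new TreeMap<>();

        void upload(long generation) {
            metadataByGeneration.put(generation, "metadata_gen_" + generation);
        }

        /** Fails, like the reported error, when the generation was never uploaded. */
        String metadataFor(long primaryTerm, long generation) throws NoSuchFileException {
            String file = metadataByGeneration.get(generation);
            if (file == null) {
                throw new NoSuchFileException(
                    "Metadata file is not present for given primary term " + primaryTerm
                        + " and generation " + generation);
            }
            return file;
        }
    }

    public static void main(String[] args) {
        long primaryTerm = 1;
        RemoteStore remote = new RemoteStore();

        long localGeneration = 5;       // segments_5: last commit written while closing the index
        remote.upload(localGeneration); // this one does reach the remote store

        localGeneration++;              // segments_6: written locally by the read-only engine's recovery;
                                        // no refresh listener, so it is never uploaded

        try {
            // Pre-fix behaviour, as modelled here: look up the latest *local* generation remotely.
            remote.metadataFor(primaryTerm, localGeneration);
        } catch (NoSuchFileException e) {
            System.out.println("shallow snapshot fails: " + e.getMessage());
        }
    }
}
```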

Approach:

  • Take the snapshot using the last successfully uploaded segment generation.
  • Fetch the latest metadata file from the remote directory and acquire a lock on that commit generation.
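The approach above can be illustrated with a small hypothetical model. It is not the merged implementation: `ShallowSnapshotSketch`, `latestUploadedGeneration`, `snapshotGeneration` and `isLocked` are illustrative names, metadata files are keyed by generation only (primary term is ignored), and adding a generation to an in-memory set stands in for acquiring the remote-store lock.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: snapshot the latest *uploaded* generation and lock it. */
final class ShallowSnapshotSketch {

    // Remote store stand-in: commit generation -> metadata file name (illustrative names).
    private final TreeMap<Long, String> remoteMetadataByGeneration = new TreeMap<>();
    // Stand-in for remote-store locks held by snapshots.
    private final Set<Long> lockedGenerations = new HashSet<>();

    void upload(long generation) {
        remoteMetadataByGeneration.put(generation, "metadata_gen_" + generation);
    }

    /** Latest generation whose metadata file actually made it to the remote store, if any. */
    Optional<Long> latestUploadedGeneration() {
        return remoteMetadataByGeneration.isEmpty()
            ? Optional.empty()
            : Optional.of(remoteMetadataByGeneration.lastKey());
    }

    /** Locks the latest uploaded generation and returns it as the snapshot's commit generation. */
    long snapshotGeneration() {
        long generation = latestUploadedGeneration()
            .orElseThrow(() -> new IllegalStateException("no metadata file in remote store"));
        lockedGenerations.add(generation); // stands in for taking the lock on that commit generation
        return generation;
    }

    boolean isLocked(long generation) {
        return lockedGenerations.contains(generation);
    }

    public static void main(String[] args) {
        ShallowSnapshotSketch shard = new ShallowSnapshotSketch();
        shard.upload(5);          // segments_5 uploaded during close
        long localGeneration = 6; // segments_6 exists only locally (read-only engine)
        long snapshotGen = shard.snapshotGeneration();
        System.out.println("local=" + localGeneration + ", snapshot uses=" + snapshotGen
            + ", locked=" + shard.isLocked(snapshotGen));
    }
}
```

In this model the snapshot references generation 5 rather than the local generation 6, i.e. a commit that is known to exist in the remote store, so the lookup that failed before has nothing missing to look for.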

Related Issues

Resolves #13805

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

❌ Gradle check result for 88c3280: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@astute-decipher changed the title from "Fix shallow v1 snapshot failures on closed index" to "Fix Shallow copy snapshot failures on closed index" Dec 17, 2024
@astute-decipher self-assigned this Dec 17, 2024

❕ Gradle check result for e4fd52b: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 53.84615% with 24 lines in your changes missing coverage. Please review.

Project coverage is 72.25%. Comparing base (e7e19f7) to head (4bd2acb).
Report is 2 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...rg/opensearch/snapshots/SnapshotShardsService.java    59.37%    13 Missing ⚠️
...ch/repositories/blobstore/BlobStoreRepository.java    53.84%    5 Missing and 1 partial ⚠️
...in/java/org/opensearch/index/shard/IndexShard.java    33.33%    2 Missing and 2 partials ⚠️
...n/java/org/opensearch/repositories/Repository.java    0.00%     1 Missing ⚠️
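The patch-coverage numbers in the table are mutually consistent. The per-file covered/total line counts below are not printed in the report; they are inferred from the percentages and missing-line counts (for example, 59.37% with 13 missing fits 19 of 32 lines). The sketch assumes Codecov counts partial lines as not covered and rounds percentages down, which matches the figures shown (59.375% appears as 59.37%). `PatchCoverage` and `percent` are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper reconstructing the patch-coverage figures from inferred line counts. */
final class PatchCoverage {

    /** hits / total as a percentage, rounded down (truncated) to the given number of decimals. */
    static BigDecimal percent(long hits, long total, int scale) {
        return BigDecimal.valueOf(hits)
            .multiply(BigDecimal.valueOf(100))
            .divide(BigDecimal.valueOf(total), scale, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        String[] files = {"SnapshotShardsService", "BlobStoreRepository", "IndexShard", "Repository"};
        // {covered lines, lines in patch}; missing or partial = total - covered (inferred, see above)
        long[][] counts = {{19, 32}, {7, 13}, {2, 6}, {0, 1}};

        long hits = 0;
        long total = 0;
        for (int i = 0; i < files.length; i++) {
            hits += counts[i][0];
            total += counts[i][1];
            System.out.println(files[i] + ": " + percent(counts[i][0], counts[i][1], 2) + "% ("
                + (counts[i][1] - counts[i][0]) + " missing or partial)");
        }
        System.out.println("patch: " + percent(hits, total, 5) + "% ("
            + (total - hits) + " missing or partial)");
    }
}
```

Summing gives 28 of 52 patch lines covered, which accounts for the 24 missing or partial lines (13 + 6 + 4 + 1) and the 53.84615% patch coverage reported above.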
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16868      +/-   ##
============================================
+ Coverage     72.20%   72.25%   +0.04%     
+ Complexity    65289    65262      -27     
============================================
  Files          5299     5299              
  Lines        303536   303565      +29     
  Branches      43941    43947       +6     
============================================
+ Hits         219180   219329     +149     
+ Misses        66441    66227     -214     
- Partials      17915    18009      +94     
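The project-level totals in the coverage diff are internally consistent as well: for both base and head, Lines = Hits + Misses + Partials, and Hits / Lines rounded down to two decimals reproduces 72.20% and 72.25%. The sketch below checks this; `ProjectCoverageCheck` and `Totals` are hypothetical names, and the round-down mode is an assumption inferred from the displayed values.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical helper cross-checking the Codecov project totals for base (main) and head. */
final class ProjectCoverageCheck {

    static final class Totals {
        final long lines;
        final long hits;
        final long misses;
        final long partials;

        Totals(long lines, long hits, long misses, long partials) {
            this.lines = lines;
            this.hits = hits;
            this.misses = misses;
            this.partials = partials;
        }

        /** Each tracked line is counted once as a hit, a miss, or a partial. */
        boolean consistent() {
            return lines == hits + misses + partials;
        }

        /** Coverage = hits / lines in percent, rounded down to two decimals (assumed rounding mode). */
        BigDecimal coveragePercent() {
            return BigDecimal.valueOf(hits)
                .multiply(BigDecimal.valueOf(100))
                .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
        }
    }

    public static void main(String[] args) {
        Totals base = new Totals(303536, 219180, 66441, 17915);
        Totals head = new Totals(303565, 219329, 66227, 18009);
        for (Totals t : new Totals[] {base, head}) {
            System.out.println("consistent=" + t.consistent() + ", coverage=" + t.coveragePercent() + "%");
        }
        System.out.println("lines added=" + (head.lines - base.lines)
            + ", hits added=" + (head.hits - base.hits));
    }
}
```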

github-actions bot commented Jan 2, 2025

✅ Gradle check result for 3e86036: SUCCESS

github-actions bot commented Jan 6, 2025

❕ Gradle check result for ec0f609: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Shubh Sahu added 5 commits January 7, 2025 11:01
Signed-off-by: Shubh Sahu <[email protected]>
Signed-off-by: Shubh Sahu <[email protected]>
Signed-off-by: Shubh Sahu <[email protected]>
Signed-off-by: Shubh Sahu <[email protected]>

github-actions bot commented Jan 7, 2025

✅ Gradle check result for f495cad: SUCCESS

Signed-off-by: Shubh Sahu <[email protected]>

github-actions bot commented Jan 7, 2025

❌ Gradle check result for 3947531: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot commented Jan 8, 2025

✅ Gradle check result for 3947531: SUCCESS

github-actions bot commented Jan 8, 2025

✅ Gradle check result for 4bd2acb: SUCCESS

@astute-decipher added the backport 2.x label (Backport to 2.x branch) Jan 8, 2025
@astute-decipher (Contributor, Author) commented:

Added ITs and UTs to cover most of the newly added lines, but since ITs don't count toward code coverage, the Codecov check is failing.

@ashking94 merged commit 2eadf12 into opensearch-project:main Jan 9, 2025
41 of 42 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 9, 2025
* Fix shallow v1 snapshot failures on closed index

Signed-off-by: Shubh Sahu <[email protected]>

* UT fix

Signed-off-by: Shubh Sahu <[email protected]>

* Adding UT

Signed-off-by: Shubh Sahu <[email protected]>

* small fix

Signed-off-by: Shubh Sahu <[email protected]>

* Addressing comments

Signed-off-by: Shubh Sahu <[email protected]>

* Addressing comments

Signed-off-by: Shubh Sahu <[email protected]>

* Modifying IT to restore snapshot

Signed-off-by: Shubh Sahu <[email protected]>

---------

Signed-off-by: Shubh Sahu <[email protected]>
Co-authored-by: Shubh Sahu <[email protected]>
(cherry picked from commit 2eadf12)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ashking94 pushed a commit that referenced this pull request Jan 9, 2025
* Fix shallow v1 snapshot failures on closed index

* UT fix

* Adding UT

* small fix

* Addressing comments

* Addressing comments

* Modifying IT to restore snapshot

---------

(cherry picked from commit 2eadf12)

Signed-off-by: Shubh Sahu <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Shubh Sahu <[email protected]>
Labels
backport 2.x (Backport to 2.x branch), skip-changelog

4 participants