

[Snapshot Interop] Shallow copy snapshots failing for closed indices #13805

Open · harishbhakuni opened this issue May 23, 2024 · 3 comments
Labels: bug (Something isn't working), Storage:Snapshots


@harishbhakuni (Contributor) commented May 23, 2024

Describe the bug

We recently found an issue where shallow copy snapshots fail for closed indices, while full copy snapshots of the same indices succeed.

Snapshot shard failed
java.nio.file.NoSuchFileException: Metadata file is not present for given primary term 2 and generation 6
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.getMetadataFileForCommit(RemoteSegmentStoreDirectory.java:527)
    at org.opensearch.index.store.RemoteSegmentStoreDirectory.acquireLock(RemoteSegmentStoreDirectory.java:480)
    at org.opensearch.index.shard.IndexShard.acquireLockOnCommitData(IndexShard.java:1655)
    at org.opensearch.snapshots.SnapshotShardsService.snapshot(SnapshotShardsService.java:631)
    at org.opensearch.snapshots.SnapshotShardsService$1.doRun(SnapshotShardsService.java:393)
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractPrioritizedRunnable.doRun(ThreadContext.java:979)
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)

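For shallow copy snapshots, we reference the latest remote store data and acquire a lock on it. Normally, the flush performed as part of the snapshot uploads that data to the remote store. Because the indices are closed, however, no new data is written to the remote store, so the metadata file for the commit being snapshotted (primary term 2, generation 6 in the trace above) is not present, and the snapshot fails.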
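To make the failure mode concrete, here is a minimal, hypothetical Java model. It is not OpenSearch code: `ShallowSnapshotModel`, `upload`, and the `metadata__<term>__<gen>` naming are illustrative inventions, and only the exception message mirrors the trace above. It assumes the last commit uploaded before the index was closed was generation 5. Under that assumption, looking up generation 6 fails in the same way `RemoteSegmentStoreDirectory.getMetadataFileForCommit` does in the trace.

```java
import java.nio.file.NoSuchFileException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (NOT OpenSearch code): the remote store only knows about
// commits whose metadata was actually uploaded.
class ShallowSnapshotModel {

    // Identifies a commit by primary term and generation.
    record CommitKey(long primaryTerm, long generation) {}

    // Metadata files that reached the remote store.
    private final Map<CommitKey, String> uploadedMetadata = new HashMap<>();

    // Simulates a flush/refresh uploading commit metadata to the remote store.
    void upload(long primaryTerm, long generation) {
        uploadedMetadata.put(new CommitKey(primaryTerm, generation),
                "metadata__" + primaryTerm + "__" + generation);
    }

    // Simplified stand-in for the lookup that throws in the stack trace.
    String getMetadataFileForCommit(long primaryTerm, long generation) throws NoSuchFileException {
        String file = uploadedMetadata.get(new CommitKey(primaryTerm, generation));
        if (file == null) {
            throw new NoSuchFileException("Metadata file is not present for given primary term "
                    + primaryTerm + " and generation " + generation);
        }
        return file;
    }

    public static void main(String[] args) {
        ShallowSnapshotModel remote = new ShallowSnapshotModel();
        // Assumption: generation 5 was the last upload before the index was closed.
        remote.upload(2, 5);
        // The snapshot targets commit (2, 6), but no upload happened for the closed index.
        try {
            remote.getMetadataFileForCommit(2, 6);
            System.out.println("lock acquired");
        } catch (NoSuchFileException e) {
            // prints: Snapshot shard failed: Metadata file is not present for given primary term 2 and generation 6
            System.out.println("Snapshot shard failed: " + e.getMessage());
        }
    }
}
```

A plain `HashMap` keyed by a record is enough here because records get value-based `equals` and `hashCode`.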

Related component

Storage:Snapshots

To Reproduce

  1. Create a remote store enabled cluster.
  2. Create indices and close them.
  3. Register a snapshot repository with shallow copy snapshots enabled, or use the system repository created during cluster creation.
  4. Trigger a snapshot. It fails.

Expected behavior

Snapshots of closed indices should succeed, as full copy snapshots already do.


@peternied (Member)

[Triage - attendees 1 2 3 4 5 6]
@harishbhakuni Thanks for creating this issue; we would welcome a pull request to address this bug.

sachinpkale moved this from 🆕 New to Next (Next Quarter) in Storage Project Board on May 30, 2024.

@sachinpkale (Member)

[Storage Triage - attendees 1 2 3 4 5 6 7 8 9 10]

Added release target 2.16

@moritzzimmer

We are facing the same issue on AWS OpenSearch Service. Unfortunately, it also seems to block domain upgrades, since the service takes a shallow snapshot first.

As a workaround, we had to reopen and/or delete the closed indices before we could upgrade our domain.

Projects
Status: Next (Next Quarter)
Development

No branches or pull requests

5 participants