Skip to content

Conversation

@aswinshakil
Copy link
Member

What changes were proposed in this pull request?

  1. SnapshotDeletingService should exit early if the bucket is out of scope and avoid expanding buckets that are out of scope.
  2. SnapshotUtils.getNextActiveSnapshot() throws NoSuchElement exception if the snapshot is deleted in the previous run and OMDoubleBuffer is not flushed yet, so the transactions are not yet persisted.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9636

How was this patch tested?

Existing Tests.

@aswinshakil aswinshakil added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Nov 6, 2023
Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @aswinshakil.

Just curious how you found the issue. It would be great if we can add unit or integration test for it.

String deletedDirKey = deletedDir.getKey();

// Exit for dirs out of snapshot scope.
if (!deletedDirKey.startsWith(dbBucketKeyForDir)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use Table#iterator(prefix)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not aware this existed. Yes, we can do it, but we need to change this in multiple places and test it. I can do it as a part of a larger separate patch.

String tableKey = chainManager.getTableKey(nextPathSnapshot);
SnapshotInfo nextSnapshotInfo =
omSnapshotManager.getSnapshotInfo(tableKey);
UUID nextPathSnapshot =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aswinshakil When would getNextActiveSnapshot be called and the current snapshot also deleted in parallel. I guess this should not happen right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called when the current snapshot is deleted. This is to set the deep clean flag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are deleting the current snapshot from the chain then why call this function at the first place, it would return null anyhow right? So do you mean to say we are deleting snapshots which are not deep cleaned? So are we not taking any lock on the portion of the chain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #5339, KeyDeletingService runs every 100ms. So even before the 1st request is flushed by the OMDoubleBuffer, The second request(same) comes again. At that time the chain already removed the current deleted snapshot (from the 1st request). When the 2nd request (same as 1st) comes this will throw NoSuchElementException

Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @aswinshakil

It would be great if you can add this in description. Or explain how batch delete/purge is caused this failure.

Copy link
Contributor

@prashantpogde prashantpogde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aswinshakil
Copy link
Member Author

Thanks for the review @hemantk-12 @swamirishi @prashantpogde

@aswinshakil aswinshakil merged commit ad24146 into apache:master Nov 14, 2023
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Feb 1, 2024
…d exit loop early in SnapshotDeletingService (apache#5554)

(cherry picked from commit ad24146)
Change-Id: I4dbe9a51b4383730e5146266b1bcf82e3d5ba8b7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants