Skip to content

HDDS-13785. Remove orphan versions from SnapshotLocalData Yaml file#9150

Merged
jojochuang merged 155 commits intoapache:masterfrom
swamirishi:HDDS-13785
Nov 6, 2025
Merged

HDDS-13785. Remove orphan versions from SnapshotLocalData Yaml file#9150
jojochuang merged 155 commits intoapache:masterfrom
swamirishi:HDDS-13785

Conversation

@swamirishi
Copy link
Contributor

@swamirishi swamirishi commented Oct 14, 2025

What changes were proposed in this pull request?

Snapshot Versions which don't have any predecessors can be safely removed since the version's value won't be used and it can just add more data to be stored unncecessarily in the yaml file increasing serialization and derserialization time.
Depends on #9193 & #9140

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13785

How was this patch tested?

Adding additional unit tests

Change-Id: Iba47aeb21663dfa407ab71339cef02c0d74b49f2
Change-Id: Ifd2feca1fddb144e4955db025f0b15a2ab1f3bfe
Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
…otLocalDataManager

Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
Change-Id: I32bcaf2a1fb290f1790c02872a0230cd65586636
Change-Id: I105a2e8178c0444d52de41b99801f4ceb6d57ffd

# Conflicts:
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/Checksum.java
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOmSnapshotLocalDataYaml.java
Change-Id: I985170e38fb8beeb784048e85a08a4c79e1aec97
Change-Id: I33e6e6e825bf23c323ad7ed593d800a11720fa4f
Change-Id: If30b2c766db82adde72145c8ecd3e590ef54cc2d
Change-Id: Id3f2c49050bc3476b9e0f5f51dacb6d9acc4c2f7
Change-Id: I432960725b4c6c55aa906b5780cc3027e41e10db
Change-Id: I3c5514e5bbd251a2b5297d8f074cfde5c71fa543
Change-Id: Ib5a9e6c91bdccba17820263c47eaf2c8400e930d
Change-Id: Ica36e0615c7bc6aa9b6a7f6fafafd0f830d4bafb
Change-Id: I26b66f266bb7677e4b1078f5fcd9f2ce3a651a70

# Conflicts:
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: I1d93dbc048a42cc55ff1f8ffa420e52f967527b8
Change-Id: I34202928a7a367dd0a1e57219317ff34de352b78
Change-Id: Iad6f26cb71ec921c51ee2d138745df1a2663533f
Change-Id: Ic5f7e249cfb9cb3973cbcd4abd36b22a6ff8f5aa
…calDataProvider

Change-Id: I3a004b4b435075a4348960aeed642e8da71e7e72
Change-Id: I06990bc9ab8fc7e1eb7bec255646a650bd8c35fe
Change-Id: I4c6c61c83aa9fadab8ecef854b99dcc0a89a2208
Change-Id: I0e476322372a302572f1fe79cbf2e874bfeac2ed
Change-Id: I31004e0c95dad64411c6fe848501a82f2f773cba
Change-Id: Id317c8b56e8b25c122b68eaf96599b9690d08f79
Change-Id: I3849387d064e093634e69cdaf870d27c1934cda5

# Conflicts:
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java
#	hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalData.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: Ie5e5f3dab4324103e8855dd15619d7755f0422e6
Change-Id: I55bd5c3ef7fc32910a9111328638de2edffcd541
Change-Id: I880997d3eebdf378f14c203c61c2d63b2d17552e
Change-Id: I13ba8e2fd012a3c964d657e83496c93a4f55a3be
Comment on lines +338 to +340
int countBeforeCheck = entry.getValue();
checkOrphanSnapshotVersions(metadataManager, chainManager, snapshotId);
decrementOrphanCheckCount(snapshotId, countBeforeCheck);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the count always decremented to 0?

Copy link
Contributor Author

@swamirishi swamirishi Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, If there was another update just after the snapshot versions were updated or while the entries were updated then we don't want to decrement it completely to 0. For instance when a snapshot has been purged but it has not flushed because of double buffer thread we know we have to delete it but only after flush so the commit can just increment the value again meaning the orphan check needs to be performed again in some time.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces an orphan snapshot version cleanup mechanism for the Ozone Manager snapshot local data manager. The changes add periodic background scanning to detect and remove unused snapshot versions that are no longer referenced.

  • Added a scheduled service to periodically clean up orphan snapshot versions
  • Introduced tracking of snapshots that need orphan version checks
  • Added logic to delete entire YAML files when all versions are removed

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
AbstractReclaimableFilterTest.java Updated test to mock OmMetadataManagerImpl instead of interface and add snapshot chain manager mock
TestOmSnapshotLocalDataManager.java Added comprehensive tests for orphan version deletion scenarios and updated all test instantiations to pass new required parameters
OmSnapshotLocalDataManager.java Implemented core orphan cleanup logic with scheduler, tracking map, and version removal during commit
OmSnapshotManager.java Added static utility method to check if a snapshot has been purged
OMConfigKeys.java Added configuration constants for the cleanup service interval
ozone-default.xml Added configuration property for cleanup service interval with 5-minute default

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +352 to +353
for (Map.Entry<Integer, LocalDataVersionNode> integerLocalDataVersionNodeEntry : getVersionNodeMap()
.get(snapshotId).getSnapshotVersions().entrySet()) {
Copy link

Copilot AI Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential NullPointerException if getVersionNodeMap().get(snapshotId) returns null. This can occur if the snapshot is removed from the map between the time it's added to snapshotToBeCheckedForOrphans and when this method executes. Add a null check before iterating.

Suggested change
for (Map.Entry<Integer, LocalDataVersionNode> integerLocalDataVersionNodeEntry : getVersionNodeMap()
.get(snapshotId).getSnapshotVersions().entrySet()) {
LocalDataVersionNodeMap versionNodeMap = getVersionNodeMap().get(snapshotId);
if (versionNodeMap == null) {
// The snapshotId is no longer present; skip orphan check for this snapshot.
return;
}
for (Map.Entry<Integer, LocalDataVersionNode> integerLocalDataVersionNodeEntry : versionNodeMap.getSnapshotVersions().entrySet()) {

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

@swamirishi swamirishi Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot happen since this happens inside a write lock. The inmemory structure would always be same as the file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure about this? Looks like write lock is held later inside commit() --> upsert(). Right at here there's no lock.

Comment on lines 860 to 863
// If the transactionInfo set this means the snapshot has been purged and the entire yaml file could have
// become an orphan if the version is also updated it
// could mean that there could be some orphan version present within the
// same snapshot.
Copy link

Copilot AI Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has unclear grammar and structure. Consider rewording to: 'If the transactionInfo is set, this means the snapshot has been purged and the entire YAML file could have become an orphan. If the version is also updated, it could mean that there are orphan versions present within the same snapshot.'

Suggested change
// If the transactionInfo set this means the snapshot has been purged and the entire yaml file could have
// become an orphan if the version is also updated it
// could mean that there could be some orphan version present within the
// same snapshot.
// If the transactionInfo is set, this means the snapshot has been purged and the entire YAML file could have become an orphan.
// If the version is also updated, it could mean that there are orphan versions present within the same snapshot.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment

assertTrue(localDataManager.getSnapshotToBeCheckedForOrphans().containsKey(secondSnapId));
localDataManager.checkOrphanSnapshotVersions(omMetadataManager, null, secondSnapId);
if (purgeSnapshot) {
NoSuchFileException e = assertThrows(NoSuchFileException.class,
Copy link

Copilot AI Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable 'NoSuchFileException e' is never read.

Suggested change
NoSuchFileException e = assertThrows(NoSuchFileException.class,
assertThrows(NoSuchFileException.class,

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Change-Id: I39e1d4a1f504241a2ebbfb5b36c84e887512c8ed
Change-Id: I8debbcb14edd88ad0542fbc5d1f5c9152788016c
Change-Id: I6870b5263d104e4179dbc13b24bd923238ff9171
Change-Id: I5fb8bcc7beb9941e67df0d02a40b09e30a4c3880
Change-Id: I30f586045a567b0e0e3200a449d7627d96f59dd5
Change-Id: Ia13da4aa1fb25768b0dac64fd4203ed1ca596284
Change-Id: I88f541c712d847515f6ef5b1d094cc5168936966
Change-Id: I71e2210c6398aa9b64f93137bb5655e1c41026e3
Change-Id: I5541c7f931bb0b60af238102b5afc8086be53a6c
Change-Id: I7077460d4b9d87d460244e3df51d51f078a83970
Change-Id: If9cd8e82083f90997d7ce052408b33648d9b2c2a
Change-Id: I011b0b80bc646c94b422ab808935de4fbee2bf30

# Conflicts:
#	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMDbCheckpointServletInodeBasedXfer.java
#	hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServletInodeBasedXfer.java
Change-Id: Ia02449ab5237ddb6021fa3a62ed66290c1dff83d
Change-Id: Iba53e903a7d656cadb9944505a4371a8a252eeaa

# Conflicts:
#	hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestOmSnapshotLocalDataManager.java
Change-Id: I93602a49fc217d6167ed9629b6ff3ffcaccc8afa
@swamirishi swamirishi marked this pull request as ready for review November 3, 2025 19:10
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

posting a few comments.
I think there are a few corner cases/ potential race condition that need further scrutiny.


private void upsertNode(UUID snapshotId, SnapshotVersionsMeta snapshotVersions) throws IOException {
private void upsertNode(UUID snapshotId, SnapshotVersionsMeta snapshotVersions,
boolean transactionInfoSet) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variable name 'transactionInfoSet' does not convey what it is for. I wasn't able to tell what it's for until I read further down in the comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don

localDataGraph.putEdge(predecessor, entry.getValue());
}
}
if (existingSnapVersions != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't like that this method does two very different tasks, which is hard to tell what exactly it does by looking at the method name. please consider refactor this block of code out to another method, call it something like 'scheduleOrphanMetadataCheck'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sure makes sense let me work on this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

super.snapshotId, snapshotLocalDataFile.getAbsolutePath());
if (!snapshotLocalDataFile.delete()) {
throw new IOException("Unable to delete file " + snapshotLocalDataFile.getAbsolutePath());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else: clean: no prior snapshot versions and no local meta file. nothing to do here.

Comment on lines +352 to +353
for (Map.Entry<Integer, LocalDataVersionNode> integerLocalDataVersionNodeEntry : getVersionNodeMap()
.get(snapshotId).getSnapshotVersions().entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure about this? Looks like write lock is held later inside commit() --> upsert(). Right at here there's no lock.

@swamirishi
Copy link
Contributor Author

are you sure about this? Looks like write lock is held later inside commit() --> upsert(). Right at here there's no lock.
Oh the write lock is just for synchronizing reads and writes on the graph data structure. The write lock is already held while opening the WritableProvider

Change-Id: Ied3c728c1f566bdd9327b6b5340892bed993cbf6
Change-Id: I1105c70ad060e8a344d042eadc3fa65b1469b716
Change-Id: Idc77f8a9a9a6cecb43b69d5532cf1bb2c679ce78
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jojochuang jojochuang merged commit d85440b into apache:master Nov 6, 2025
43 checks passed
@jojochuang
Copy link
Contributor

Merged. Thanks @swamirishi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants