HDDS-13785. Remove orphan versions from SnapshotLocalData Yaml file#9150
HDDS-13785. Remove orphan versions from SnapshotLocalData Yaml file#9150jojochuang merged 155 commits intoapache:masterfrom
Conversation
Change-Id: Iba47aeb21663dfa407ab71339cef02c0d74b49f2
Change-Id: Ifd2feca1fddb144e4955db025f0b15a2ab1f3bfe
Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
…otLocalDataManager Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
Change-Id: I32bcaf2a1fb290f1790c02872a0230cd65586636
Change-Id: I105a2e8178c0444d52de41b99801f4ceb6d57ffd # Conflicts: # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/Checksum.java # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java # hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOmSnapshotLocalDataYaml.java
Change-Id: I985170e38fb8beeb784048e85a08a4c79e1aec97
Change-Id: I33e6e6e825bf23c323ad7ed593d800a11720fa4f
Change-Id: If30b2c766db82adde72145c8ecd3e590ef54cc2d
Change-Id: Id3f2c49050bc3476b9e0f5f51dacb6d9acc4c2f7
Change-Id: I432960725b4c6c55aa906b5780cc3027e41e10db
Change-Id: I3c5514e5bbd251a2b5297d8f074cfde5c71fa543
Change-Id: Ib5a9e6c91bdccba17820263c47eaf2c8400e930d
Change-Id: Ica36e0615c7bc6aa9b6a7f6fafafd0f830d4bafb
Change-Id: I26b66f266bb7677e4b1078f5fcd9f2ce3a651a70 # Conflicts: # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: I1d93dbc048a42cc55ff1f8ffa420e52f967527b8
Change-Id: I34202928a7a367dd0a1e57219317ff34de352b78
Change-Id: Iad6f26cb71ec921c51ee2d138745df1a2663533f
Change-Id: Ic5f7e249cfb9cb3973cbcd4abd36b22a6ff8f5aa
…calDataProvider Change-Id: I3a004b4b435075a4348960aeed642e8da71e7e72
Change-Id: I06990bc9ab8fc7e1eb7bec255646a650bd8c35fe
Change-Id: I4c6c61c83aa9fadab8ecef854b99dcc0a89a2208
Change-Id: I0e476322372a302572f1fe79cbf2e874bfeac2ed
Change-Id: I31004e0c95dad64411c6fe848501a82f2f773cba
Change-Id: Id317c8b56e8b25c122b68eaf96599b9690d08f79
Change-Id: I3849387d064e093634e69cdaf870d27c1934cda5 # Conflicts: # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalData.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: Ie5e5f3dab4324103e8855dd15619d7755f0422e6
Change-Id: I55bd5c3ef7fc32910a9111328638de2edffcd541
Change-Id: I880997d3eebdf378f14c203c61c2d63b2d17552e
Change-Id: I13ba8e2fd012a3c964d657e83496c93a4f55a3be
| int countBeforeCheck = entry.getValue(); | ||
| checkOrphanSnapshotVersions(metadataManager, chainManager, snapshotId); | ||
| decrementOrphanCheckCount(snapshotId, countBeforeCheck); |
There was a problem hiding this comment.
Is the count always decremented to 0?
There was a problem hiding this comment.
No, If there was another update just after the snapshot versions were updated or while the entries were updated then we don't want to decrement it completely to 0. For instance when a snapshot has been purged but it has not flushed because of double buffer thread we know we have to delete it but only after flush so the commit can just increment the value again meaning the orphan check needs to be performed again in some time.
There was a problem hiding this comment.
Pull Request Overview
This PR introduces an orphan snapshot version cleanup mechanism for the Ozone Manager snapshot local data manager. The changes add periodic background scanning to detect and remove unused snapshot versions that are no longer referenced.
- Added a scheduled service to periodically clean up orphan snapshot versions
- Introduced tracking of snapshots that need orphan version checks
- Added logic to delete entire YAML files when all versions are removed
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| AbstractReclaimableFilterTest.java | Updated test to mock OmMetadataManagerImpl instead of interface and add snapshot chain manager mock |
| TestOmSnapshotLocalDataManager.java | Added comprehensive tests for orphan version deletion scenarios and updated all test instantiations to pass new required parameters |
| OmSnapshotLocalDataManager.java | Implemented core orphan cleanup logic with scheduler, tracking map, and version removal during commit |
| OmSnapshotManager.java | Added static utility method to check if a snapshot has been purged |
| OMConfigKeys.java | Added configuration constants for the cleanup service interval |
| ozone-default.xml | Added configuration property for cleanup service interval with 5-minute default |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...ne-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Show resolved
Hide resolved
| for (Map.Entry<Integer, LocalDataVersionNode> integerLocalDataVersionNodeEntry : getVersionNodeMap() | ||
| .get(snapshotId).getSnapshotVersions().entrySet()) { |
There was a problem hiding this comment.
Potential NullPointerException if getVersionNodeMap().get(snapshotId) returns null. This can occur if the snapshot is removed from the map between the time it's added to snapshotToBeCheckedForOrphans and when this method executes. Add a null check before iterating.
| for (Map.Entry<Integer, LocalDataVersionNode> integerLocalDataVersionNodeEntry : getVersionNodeMap() | |
| .get(snapshotId).getSnapshotVersions().entrySet()) { | |
| LocalDataVersionNodeMap versionNodeMap = getVersionNodeMap().get(snapshotId); | |
| if (versionNodeMap == null) { | |
| // The snapshotId is no longer present; skip orphan check for this snapshot. | |
| return; | |
| } | |
| for (Map.Entry<Integer, LocalDataVersionNode> integerLocalDataVersionNodeEntry : versionNodeMap.getSnapshotVersions().entrySet()) { |
There was a problem hiding this comment.
This cannot happen since this happens inside a write lock. The inmemory structure would always be same as the file
There was a problem hiding this comment.
are you sure about this? Looks like write lock is held later inside commit() --> upsert(). Right at here there's no lock.
...ne-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Outdated
Show resolved
Hide resolved
...ne-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Outdated
Show resolved
Hide resolved
| // If the transactionInfo set this means the snapshot has been purged and the entire yaml file could have | ||
| // become an orphan if the version is also updated it | ||
| // could mean that there could be some orphan version present within the | ||
| // same snapshot. |
There was a problem hiding this comment.
The comment has unclear grammar and structure. Consider rewording to: 'If the transactionInfo is set, this means the snapshot has been purged and the entire YAML file could have become an orphan. If the version is also updated, it could mean that there are orphan versions present within the same snapshot.'
| // If the transactionInfo set this means the snapshot has been purged and the entire yaml file could have | |
| // become an orphan if the version is also updated it | |
| // could mean that there could be some orphan version present within the | |
| // same snapshot. | |
| // If the transactionInfo is set, this means the snapshot has been purged and the entire YAML file could have become an orphan. | |
| // If the version is also updated, it could mean that there are orphan versions present within the same snapshot. |
...ne-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Show resolved
Hide resolved
| assertTrue(localDataManager.getSnapshotToBeCheckedForOrphans().containsKey(secondSnapId)); | ||
| localDataManager.checkOrphanSnapshotVersions(omMetadataManager, null, secondSnapId); | ||
| if (purgeSnapshot) { | ||
| NoSuchFileException e = assertThrows(NoSuchFileException.class, |
There was a problem hiding this comment.
Variable 'NoSuchFileException e' is never read.
| NoSuchFileException e = assertThrows(NoSuchFileException.class, | |
| assertThrows(NoSuchFileException.class, |
...ne-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Show resolved
Hide resolved
Change-Id: I39e1d4a1f504241a2ebbfb5b36c84e887512c8ed
Change-Id: I8debbcb14edd88ad0542fbc5d1f5c9152788016c
Change-Id: I6870b5263d104e4179dbc13b24bd923238ff9171
Change-Id: I5fb8bcc7beb9941e67df0d02a40b09e30a4c3880
Change-Id: I30f586045a567b0e0e3200a449d7627d96f59dd5
Change-Id: Ia13da4aa1fb25768b0dac64fd4203ed1ca596284
Change-Id: I88f541c712d847515f6ef5b1d094cc5168936966
Change-Id: I71e2210c6398aa9b64f93137bb5655e1c41026e3
Change-Id: I5541c7f931bb0b60af238102b5afc8086be53a6c
Change-Id: I7077460d4b9d87d460244e3df51d51f078a83970
Change-Id: If9cd8e82083f90997d7ce052408b33648d9b2c2a
Change-Id: I011b0b80bc646c94b422ab808935de4fbee2bf30 # Conflicts: # hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMDbCheckpointServletInodeBasedXfer.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServletInodeBasedXfer.java
Change-Id: Ia02449ab5237ddb6021fa3a62ed66290c1dff83d
Change-Id: Iba53e903a7d656cadb9944505a4371a8a252eeaa # Conflicts: # hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestOmSnapshotLocalDataManager.java
Change-Id: I93602a49fc217d6167ed9629b6ff3ffcaccc8afa
jojochuang
left a comment
There was a problem hiding this comment.
posting a few comments.
I think there are a few corner cases/ potential race condition that need further scrutiny.
...ne-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Show resolved
Hide resolved
...ne-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Show resolved
Hide resolved
|
|
||
| private void upsertNode(UUID snapshotId, SnapshotVersionsMeta snapshotVersions) throws IOException { | ||
| private void upsertNode(UUID snapshotId, SnapshotVersionsMeta snapshotVersions, | ||
| boolean transactionInfoSet) throws IOException { |
There was a problem hiding this comment.
the variable name 'transactionInfoSet' does not convey what it is for. I wasn't able to tell what it's for until I read further down in the comments.
| localDataGraph.putEdge(predecessor, entry.getValue()); | ||
| } | ||
| } | ||
| if (existingSnapVersions != null) { |
There was a problem hiding this comment.
i don't like that this method does two very different tasks, which is hard to tell what exactly it does by looking at the method name. please consider refactor this block of code out to another method, call it something like 'scheduleOrphanMetadataCheck'
There was a problem hiding this comment.
yeah sure makes sense let me work on this
| super.snapshotId, snapshotLocalDataFile.getAbsolutePath()); | ||
| if (!snapshotLocalDataFile.delete()) { | ||
| throw new IOException("Unable to delete file " + snapshotLocalDataFile.getAbsolutePath()); | ||
| } |
There was a problem hiding this comment.
else: clean: no prior snapshot versions and no local meta file. nothing to do here.
| for (Map.Entry<Integer, LocalDataVersionNode> integerLocalDataVersionNodeEntry : getVersionNodeMap() | ||
| .get(snapshotId).getSnapshotVersions().entrySet()) { |
There was a problem hiding this comment.
are you sure about this? Looks like write lock is held later inside commit() --> upsert(). Right at here there's no lock.
|
Change-Id: Ied3c728c1f566bdd9327b6b5340892bed993cbf6
Change-Id: I1105c70ad060e8a344d042eadc3fa65b1469b716
Change-Id: Idc77f8a9a9a6cecb43b69d5532cf1bb2c679ce78
|
Merged. Thanks @swamirishi |
What changes were proposed in this pull request?
Snapshot Versions which don't have any predecessors can be safely removed since the version's value won't be used and it can just add more data to be stored unncecessarily in the yaml file increasing serialization and derserialization time.
Depends on #9193 & #9140
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13785
How was this patch tested?
Adding additional unit tests