HDDS-13867. SnapshotDiff delta file computation should happen based on LocalDataYaml file#9268
HDDS-13867. SnapshotDiff delta file computation should happen based on LocalDataYaml file#9268swamirishi merged 186 commits intoapache:masterfrom
Conversation
Change-Id: Iba47aeb21663dfa407ab71339cef02c0d74b49f2
Change-Id: Ifd2feca1fddb144e4955db025f0b15a2ab1f3bfe
Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
…otLocalDataManager Change-Id: I34536ff06efb7d5a4942853f0fd83942ab398b5f
Change-Id: I32bcaf2a1fb290f1790c02872a0230cd65586636
Change-Id: I105a2e8178c0444d52de41b99801f4ceb6d57ffd # Conflicts: # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/Checksum.java # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java # hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOmSnapshotLocalDataYaml.java
Change-Id: I985170e38fb8beeb784048e85a08a4c79e1aec97
Change-Id: I33e6e6e825bf23c323ad7ed593d800a11720fa4f
Change-Id: If30b2c766db82adde72145c8ecd3e590ef54cc2d
Change-Id: Id3f2c49050bc3476b9e0f5f51dacb6d9acc4c2f7
Change-Id: I432960725b4c6c55aa906b5780cc3027e41e10db
Change-Id: I3c5514e5bbd251a2b5297d8f074cfde5c71fa543
Change-Id: Ib5a9e6c91bdccba17820263c47eaf2c8400e930d
Change-Id: Ica36e0615c7bc6aa9b6a7f6fafafd0f830d4bafb
Change-Id: I26b66f266bb7677e4b1078f5fcd9f2ce3a651a70 # Conflicts: # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: I1d93dbc048a42cc55ff1f8ffa420e52f967527b8
Change-Id: I34202928a7a367dd0a1e57219317ff34de352b78
Change-Id: Iad6f26cb71ec921c51ee2d138745df1a2663533f
Change-Id: Ic5f7e249cfb9cb3973cbcd4abd36b22a6ff8f5aa
…calDataProvider Change-Id: I3a004b4b435075a4348960aeed642e8da71e7e72
Change-Id: I06990bc9ab8fc7e1eb7bec255646a650bd8c35fe
Change-Id: I4c6c61c83aa9fadab8ecef854b99dcc0a89a2208
Change-Id: I0e476322372a302572f1fe79cbf2e874bfeac2ed
Change-Id: I31004e0c95dad64411c6fe848501a82f2f773cba
Change-Id: Id317c8b56e8b25c122b68eaf96599b9690d08f79
Change-Id: I3849387d064e093634e69cdaf870d27c1934cda5 # Conflicts: # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/ObjectSerializer.java # hadoop-hdds/common/src/main/java/org/apache/hadoop/ozone/util/YamlSerializer.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalData.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotLocalDataYaml.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/OmSnapshotLocalDataManager.java
Change-Id: Ie5e5f3dab4324103e8855dd15619d7755f0422e6
Change-Id: I55bd5c3ef7fc32910a9111328638de2edffcd541
Change-Id: I880997d3eebdf378f14c203c61c2d63b2d17552e
Change-Id: I13ba8e2fd012a3c964d657e83496c93a4f55a3be
30b5293 to
151e887
Compare
…nable to resolve to a previous snapshot id Change-Id: I9382a816415880596534060a841b2840fc5c2b1f
Change-Id: Icf0be64110f56a095dff6954fe3af281873bfc49
Change-Id: I98a74e0d6815f28792096f0992e1aca933706963
Change-Id: I85d6072d0826ccc7f81a2579af307bfad4f049a7
Change-Id: Ibcc1fbd3e44d8b2f49afd83d5222c8d76ec51c6a
Change-Id: I06301f780ff7fff2ab652722872b1707501d61ec
Change-Id: Ia4eeab3598fa78fc2606e6efb9a30a12b6a944c3
Change-Id: Ibef23601b26e5e7062a9cf3a6f0a9c5af4862bb0 # Conflicts: # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Change-Id: Ic5ea1918e91b86a0b2c410bfc1b4c4206858a2f1
Change-Id: Ic1a5ef57f528fa5e6be0a35b603943666d717ac3
Change-Id: I13eb719258714697f875d0b64d38e1185ba75db1
Change-Id: I8e93e6ec779ef027edd8125fe8cd33e84e49ba95
Change-Id: Id465f7d04daeb61f3638a5345d58fb95ebdfb223 # Conflicts: # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdb/util/RdbUtil.java # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/DifferSnapshotInfo.java # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDiffUtils.java # hadoop-hdds/rocksdb-checkpoint-differ/src/test/java/org/apache/ozone/rocksdiff/TestRocksDBCheckpointDiffer.java # hadoop-hdds/rocksdb-checkpoint-differ/src/test/java/org/apache/ozone/rocksdiff/TestRocksDiffUtils.java # hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/freon/TestOMSnapshotDAG.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmSnapshotManager.java # hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java # hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/snapshot/TestSnapshotDiffManager.java
Change-Id: I450d29c5871cc8bb84a067651c6802b74bcde8bd
Change-Id: I83fe0a362530c9aeddee5f959b135a7b282e617f # Conflicts: # hadoop-hdds/framework/src/test/java/org/apache/hadoop/hdds/utils/db/StringInMemoryTestTable.java # hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the SnapshotDiff delta file computation to use information from LocalDataYaml files instead of directly reading from RocksDB. This architectural change improves efficiency and aligns with the snapshot management design.
Key changes:
- Refactored
DifferSnapshotInfoto use SST file information from YAML metadata instead of direct RocksDB access - Updated
RocksDBCheckpointDifferto work withSstFileInfoobjects and introducedDifferSnapshotVersionwrapper class - Modified
SnapshotDiffManagerto useOmSnapshotLocalDataManagerfor retrieving snapshot metadata
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 15 comments.
Show a summary per file
| File | Description |
|---|---|
SnapshotDiffManager.java |
Updated to retrieve snapshot metadata from LocalDataManager and pass version mapping to differ; refactored delta file computation to use SstFileInfo objects |
DifferSnapshotInfo.java |
Completely refactored to store SST file information from YAML metadata instead of RocksDB references; changed from database path string to version-based path function |
RocksDBCheckpointDiffer.java |
Added DifferSnapshotVersion inner class; removed direct RocksDB file reading methods; updated diff computation to work with SstFileInfo maps instead of file name sets |
RocksDiffUtils.java |
Simplified filtering methods to work directly with SstFileInfo objects; removed RocksDB-dependent filtering logic |
RdbUtil.java |
Updated utility methods to return SstFileInfo objects instead of file path strings |
SstFileInfo.java |
Added getFilePath() helper method to construct full file paths |
OmSnapshotManager.java |
Updated SnapshotDiffManager instantiation to include LocalDataManager dependency |
TestSnapshotDiffManager.java |
Updated test mocks to work with new SstFileInfo-based APIs and added mock for LocalDataManager |
TestOMSnapshotDAG.java |
Refactored test to use LocalDataManager for snapshot metadata; updated differ API calls with new parameters |
TestRocksDiffUtils.java |
Updated tests to work with Map-based SstFileInfo filtering instead of Set-based string filtering |
TestRocksDBCheckpointDiffer.java |
Extensively refactored tests to work with SstFileInfo objects and DifferSnapshotVersion wrapper; removed RocksDB mocking |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| internalGetSSTDiffList(src, dest, fwdDAGSameFiles, fwdDAGDifferentFiles); | ||
| } else { | ||
| Set<SstFileInfo> srcSstFileInfos = new HashSet<>(src.getSstFileMap().values()); | ||
| Set<SstFileInfo> destSstFileInfos = new HashSet<>(src.getSstFileMap().values()); |
There was a problem hiding this comment.
The method getSSTDiffList is using src.getSstFileMap().values() for both srcSstFileInfos and destSstFileInfos on line 832. This appears to be a bug as it should be using dest.getSstFileMap().values() for the destination snapshot.
This will cause the diff computation to incorrectly compare the source snapshot against itself, resulting in inaccurate delta files.
| Set<SstFileInfo> destSstFileInfos = new HashSet<>(src.getSstFileMap().values()); | |
| Set<SstFileInfo> destSstFileInfos = new HashSet<>(dest.getSstFileMap().values()); |
| return generation; | ||
| } | ||
|
|
||
| private Set<String> getSstFiles() { |
There was a problem hiding this comment.
[nitpick] The variable name getSstFiles returns a Set<String> but reads more like it should return file objects. Consider renaming to getSstFileNames for better clarity about what is being returned.
| private Set<String> getSstFiles() { | |
| private Set<String> getSstFileNames() { |
There was a problem hiding this comment.
This is not the sstFileName but just the id
| String sstFilesDirForSnapDiffJob) { | ||
| DifferSnapshotInfo dest, Map<Integer, Integer> versionMap, TablePrefixInfo prefixInfo, | ||
| Set<String> tablesToLookup, String sstFilesDirForSnapDiffJob) throws IOException { | ||
| int srcVersion = src.getMaxVersion(); |
There was a problem hiding this comment.
The method getMaxVersion() on line 75 returns Integer which can be null if the versionSstFiles map is empty. However, the method is called on line 783 without null checking, which could result in a NullPointerException.
Although line 373-375 in SnapshotDiffManager.java includes an empty check, it's better to make getMaxVersion() return int and throw an exception internally if the map is empty, or document that it may return null and handle it at all call sites.
| private Path getDbPath() { | ||
| return dbPath; | ||
| } | ||
|
|
||
| private long getGeneration() { | ||
| return generation; | ||
| } | ||
|
|
||
| private Set<String> getSstFiles() { | ||
| return sstFiles.keySet(); | ||
| } | ||
|
|
||
| private Map<String, SstFileInfo> getSstFileMap() { |
There was a problem hiding this comment.
[nitpick] The newly added DifferSnapshotVersion class has private getter methods (getDbPath(), getGeneration(), getSstFiles(), getSstFileMap()) on lines 907-921. These private methods are only accessed within the RocksDBCheckpointDiffer class. Consider whether these should be package-private or if the encapsulation is appropriate for this inner/nested class design.
| private Path getDbPath() { | |
| return dbPath; | |
| } | |
| private long getGeneration() { | |
| return generation; | |
| } | |
| private Set<String> getSstFiles() { | |
| return sstFiles.keySet(); | |
| } | |
| private Map<String, SstFileInfo> getSstFileMap() { | |
| Path getDbPath() { | |
| return dbPath; | |
| } | |
| long getGeneration() { | |
| return generation; | |
| } | |
| Set<String> getSstFiles() { | |
| return sstFiles.keySet(); | |
| } | |
| Map<String, SstFileInfo> getSstFileMap() { |
There was a problem hiding this comment.
private functions is good for private class
| private final String dbPath; | ||
| private final UUID snapshotId; | ||
| private final long snapshotGeneration; | ||
| private final UUID id; |
There was a problem hiding this comment.
[nitpick] The variable name id on line 37 in DifferSnapshotInfo is too generic. Since this represents a snapshot identifier, it should be renamed to snapshotId to match the original naming and improve code clarity.
| deltaFiles.addAll(inputFiles); | ||
| Set<SstFileInfo> inputFiles = filterRelevantSstFiles(getSSTFileListForSnapshot(fromSnapshot, tablesToLookUp), | ||
| tablesToLookUp, tablePrefixes); | ||
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); |
There was a problem hiding this comment.
Access of element annotated with VisibleForTesting found in production code.
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | |
| // Avoid using getDbLocation() annotated with @VisibleForTesting. | |
| // Use a public API to get the DB path, or refactor to pass it in. | |
| Path fromSnapshotPath = Paths.get(fromSnapshot.getSnapshotPath()); |
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | ||
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); |
There was a problem hiding this comment.
Access of element annotated with VisibleForTesting found in production code.
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | |
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | |
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDatabasePath(); | |
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDatabasePath(); |
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | ||
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); |
There was a problem hiding this comment.
Access of element annotated with VisibleForTesting found in production code.
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | |
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | |
| Path fromSnapshotPath = Paths.get(fromSnapshot.getMetadataManager().getStore().getDbPath()); | |
| Path toSnapshotPath = Paths.get(toSnapshot.getMetadataManager().getStore().getDbPath()); |
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | ||
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); |
There was a problem hiding this comment.
Access of element annotated with VisibleForTesting found in production code.
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | |
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | |
| Path fromSnapshotPath = ((RDBStore) fromSnapshot.getMetadataManager().getStore()).getDbPath(); | |
| Path toSnapshotPath = ((RDBStore) toSnapshot.getMetadataManager().getStore()).getDbPath(); |
There was a problem hiding this comment.
Unnecessary change
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | ||
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); |
There was a problem hiding this comment.
Access of element annotated with VisibleForTesting found in production code.
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | |
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDbLocation().getAbsoluteFile().toPath(); | |
| Path fromSnapshotPath = fromSnapshot.getMetadataManager().getStore().getDbPath().toAbsolutePath(); | |
| Path toSnapshotPath = toSnapshot.getMetadataManager().getStore().getDbPath().toAbsolutePath(); |
Change-Id: I1d7aded575ac6daaa5f78b3be9f1821d94cb24c6
jojochuang
left a comment
There was a problem hiding this comment.
I roughly follows what this is for, but the scope of change is very big.
As far as I can tell, it should not have visible change to end users. All changes are needed to convert the use of snapshot rocksdb to LocalDataYaml.
There are also refactors that shouldn't need to come in through the same PR. But apart from a few javadoc suggestion it looks fine from a quick look.
| return RdbUtil.getSSTFilesForComparison(((RDBStore)snapshot | ||
| .getMetadataManager().getStore()).getDb().getManagedRocksDb(), | ||
| tablesToLookUp); | ||
| protected Set<SstFileInfo> getSSTFileListForSnapshot(OmSnapshot snapshot, Set<String> tablesToLookUp) { |
There was a problem hiding this comment.
it's actually a set, not a list.
| */ | ||
| private String getSSTFullPath(String sstFilenameWithoutExtension, | ||
| String... dbPaths) { | ||
| private String getSSTFullPath(String sstFilenameWithoutExtension, Path... dbPaths) { |
There was a problem hiding this comment.
the update to this helper method is not relevant to this PR and it's semantically the same as before.
There was a problem hiding this comment.
yeah I just included the same in this PR since this would be really small change for another PR
| * | ||
| * @param src source snapshot | ||
| * @param dest destination snapshot | ||
| * @param tablesToLookup tablesToLookup set of table (column family) names used to restrict which SST files to return. |
There was a problem hiding this comment.
please update javadocs.
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Show resolved
Hide resolved
| .map(i -> i.stream().map(SstFileInfo::getFileName).sorted().collect(Collectors.toList())).orElse(null)); | ||
| } catch (RuntimeException rtEx) { | ||
| if (!expectingException) { | ||
| rtEx.printStackTrace(); |
There was a problem hiding this comment.
do not print to stdout.
|
|
||
| @ParameterizedTest | ||
| @MethodSource("values") | ||
| public void testFilterRelevantSstFilesFromDB(String validSSTColumnFamilyName, |
There was a problem hiding this comment.
why remove the test case?
There was a problem hiding this comment.
this test case is not valid anymore. There is only a filter function which doesn't read anything from the db anymore
Change-Id: Iba0dd30952d17d11eae900213437aa4c28564448
Change-Id: Idba640c859ed4906842f9fafebf04825409053ef
Some of the refactoring changes are unavoidable |
jojochuang
left a comment
There was a problem hiding this comment.
Looks good to me. Just a few javadoc suggestions. A few copilot comments look legit.
I'm okay to merge once the CI passes.
| /** | ||
| * Filter sst files based on prefixes. | ||
| */ | ||
| public static <T> Map<T, SstFileInfo> filterRelevantSstFiles(Map<T, SstFileInfo> inputFiles, |
There was a problem hiding this comment.
note: like the previous method, the parameter inputFiles is not just an input: the map can be mutated.
There was a problem hiding this comment.
changed the variable name and added a note in the javadoc as well
| * @param prefixInfo TablePrefixInfo to filter irrelevant SST files; can be null. | ||
| * @param tablesToLookup tablesToLookup Set of column-family (table) names to include when reading SST files; | ||
| * must be non-null. | ||
| * @return A list of SST files without extension. e.g. ["000050", "000060"] |
There was a problem hiding this comment.
| * @return A list of SST files without extension. e.g. ["000050", "000060"] | |
| * @param useCompactionDag whether to consult the compaction DAG to compute the SST diff. | |
| * If true, the method uses compaction history to produce an | |
| * incremental diff; if false, a full (conservative) diff will be returned. | |
| * @return A list of SST files without extension. e.g. ["000050", "000060"] |
| Optional<List<SstFileInfo>> sstDiffList = getSSTDiffList(srcSnapshotVersion, destSnapshotVersion, prefixInfo, | ||
| tablesToLookup, srcVersion == 0); |
There was a problem hiding this comment.
| Optional<List<SstFileInfo>> sstDiffList = getSSTDiffList(srcSnapshotVersion, destSnapshotVersion, prefixInfo, | |
| tablesToLookup, srcVersion == 0); | |
| // if the source snapshot is the first version (srcVersion == 0), uses the compaction DAG path; | |
| // otherwise, uses the fallback set-comparison path. | |
| boolean useCompactionDag = srcVersion == 0; | |
| Optional<List<SstFileInfo>> sstDiffList = getSSTDiffList(srcSnapshotVersion, destSnapshotVersion, prefixInfo, | |
| tablesToLookup, useCompactionDag); |
Change-Id: Icac0512a6b5663cfda6060b876dd9a331352481a
|
thank you @jojochuang for reviewing the PR |
What changes were proposed in this pull request?
SnapshotDiff delta file computation should happen based on LocalDataYaml file instead of reading the rocksdb.
Changing rocksDBCheckpointDiffer to get the SstFileInfos from yaml file to compute the diff directly.
Depends on #9235 and #9269
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13867
How was this patch tested?
Additional unit tests