HDDS-13612. Track SST file ranges in sst file metadata#8981
HDDS-13612. Track SST file ranges in sst file metadata#8981swamirishi merged 6 commits intoapache:masterfrom
Conversation
|
This commit makes significant changes to the way Ozone's snapshot local data tracks and serializes SST (Sorted String Table) file information. Here’s a summary of all changes: 1. Refactoring SST File Tracking Structure
2. Class and Code Structure Changes
3. YAML Serialization/Deserialization Improvements
4. API and Method Changes
5. Test Updates
6. Constants and Field Additions
7. Code Clean-Up
Impact and Motivation
In summary: |
Change-Id: I822eb168408b65364bc9e009a5818d7f5dfd907f
Change-Id: I2bc385065589e2ea2ffde0c09ace3ab7d4efb5ce
Change-Id: I4c04ad1f9482b4a4cb26364ed1bcaa93061d62b3 # Conflicts: # pom.xml
Change-Id: Icf245ff856f13ffab73beefc84b743f0de69ef58
Change-Id: I220b088e09d90c7dd86af0f27a0bdfe8243a5ac0
There was a problem hiding this comment.
Pull Request Overview
This PR enhances the snapshot metadata system by tracking SST file key ranges for snapshot diff functionality. The changes replace the previous approach of tracking simple SST file lists with a more detailed structure that includes key ranges and version metadata.
- Introduces SstFileInfo class to track file names, start/end keys, and column families
- Refactors snapshot metadata to use version-based tracking with VersionMeta class
- Implements object pooling for YAML operations to improve performance
Reviewed Changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| pom.xml | Added commons-pool2 dependency for object pooling |
| hadoop-ozone/ozone-manager/pom.xml | Added commons-pool2 dependency to ozone-manager module |
| SstFileInfo.java | New class for tracking SST file metadata with key ranges |
| CompactionFileInfo.java | Refactored to extend SstFileInfo base class |
| OmSnapshotLocalData.java | Replaced SST file tracking with VersionMeta structure |
| OmSnapshotLocalDataYaml.java | Added YAML pooling and updated serialization for new metadata structure |
| OmSnapshotManager.java | Added YAML object pooling and updated snapshot creation |
| OzoneConsts.java | Added constants for new YAML field names |
| Test files | Updated tests to accommodate new metadata structure with mocks |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
|
||
| // Get the YAML representation | ||
| final Yaml yaml = getYamlForSnapshotLocalData(); | ||
| try (UncheckedAutoCloseableSupplier<Yaml> yaml = snapshotManager.getSnapshotLocalYaml()) { |
There was a problem hiding this comment.
The verifyChecksum method should handle the IOException that can be thrown by getSnapshotLocalYaml() instead of letting it propagate as an unchecked exception. Consider wrapping the IOException in a RuntimeException or adding it to the method signature.
| private class RepresentVersionMeta implements Represent { | ||
| @Override | ||
| public Node representData(Object data) { | ||
| VersionMeta meta = (VersionMeta) data; | ||
| Map<String, Object> map = new java.util.LinkedHashMap<>(); | ||
| map.put(OzoneConsts.OM_SLD_VERSION_META_PREV_SNAP_VERSION, meta.getPreviousSnapshotVersion()); | ||
| map.put(OzoneConsts.OM_SLD_VERSION_META_SST_FILES, meta.getSstFiles()); | ||
|
|
||
| if (!isValid) { | ||
| LOG.warn("Checksum verification failed for snapshot local data. " + | ||
| "Stored: {}, Computed: {}", storedChecksum, computedChecksum); | ||
| return representMapping(SNAPSHOT_VERSION_META_TAG, map, DumperOptions.FlowStyle.BLOCK); | ||
| } | ||
| } |
There was a problem hiding this comment.
The cast to VersionMeta on line 159 is unsafe. Consider adding a type check before casting to prevent ClassCastException if an incorrect object type is passed.
| private class RepresentSstFileInfo implements Represent { | ||
| @Override | ||
| public Node representData(Object data) { | ||
| SstFileInfo info = (SstFileInfo) data; | ||
| Map<String, Object> map = new java.util.LinkedHashMap<>(); | ||
| map.put(OzoneConsts.OM_SST_FILE_INFO_FILE_NAME, info.getFileName()); | ||
| map.put(OzoneConsts.OM_SST_FILE_INFO_START_KEY, info.getStartKey()); | ||
| map.put(OzoneConsts.OM_SST_FILE_INFO_END_KEY, info.getEndKey()); | ||
| map.put(OzoneConsts.OM_SST_FILE_INFO_COL_FAMILY, info.getColumnFamily()); | ||
|
|
||
| // Explicitly create a mapping node with the desired tag | ||
| return representMapping(SST_FILE_INFO_TAG, map, DumperOptions.FlowStyle.BLOCK); | ||
| } | ||
| } |
There was a problem hiding this comment.
The cast to SstFileInfo on line 144 is unsafe. Consider adding a type check before casting to prevent ClassCastException if an incorrect object type is passed.
| if (!(o instanceof CompactionFileInfo)) { | ||
| return false; | ||
| } | ||
| return super.equals(o) && pruned == ((CompactionFileInfo)o).pruned; |
There was a problem hiding this comment.
[nitpick] The cast to CompactionFileInfo is unsafe since the type check was performed for CompactionFileInfo. However, consider storing the cast result in a variable for better readability: CompactionFileInfo other = (CompactionFileInfo) o; return super.equals(o) && pruned == other.pruned;
| return super.equals(o) && pruned == ((CompactionFileInfo)o).pruned; | |
| CompactionFileInfo other = (CompactionFileInfo) o; | |
| return super.equals(o) && pruned == other.pruned; |
| * Creates a OmSnapshotLocalData object with default values. | ||
| */ | ||
| public OmSnapshotLocalData(Map<String, Set<String>> uncompactedSSTFileList) { | ||
| public OmSnapshotLocalData(List<LiveFileMetaData> uncompactedSSTFileList, UUID previousSnapshotId) { |
There was a problem hiding this comment.
What is the reason to store previousSnapshotId in the snapshot YAML file when it is already available in the SnapshotInfoTable?
Do we plan to update the previous snapshot ID in the YAML whenever there is a change to the snapshot chain as well i.e., snapshot delete?
There was a problem hiding this comment.
yeah the previousSnapshotId in the snapshot chain could be different. Since we are not updating the previous snapshot version value within the same ratis txn it is impossible to achieve atomicity when done via ratis when we have an independent yaml file.
|
@smengcl @SaketaChalamchala @jojochuang Can you take a look at this patch? |
jojochuang
left a comment
There was a problem hiding this comment.
I don't think i am doing a quality review, but this looks good to me.
| } | ||
| }); | ||
| return sstFileList; | ||
| return store.getDb().getLiveFilesMetaData().stream() |
There was a problem hiding this comment.
I remembered seeing a case where getLiveFilesMetaData() took too long (too many sst files?) and the everything came to a halt. This is used inside OMSnapshotCreateResponse() so if that happens it's going to slow down OM for sure.
But I guess there's no way around it. Perhaps we can add a metric to measure the time spent in snapshot create response handler (do we have that already? not sure) so at least we know what happened.
There was a problem hiding this comment.
Eventually we can move this out from the createSnapshot flow and move this to a background service after HDDS-13628
| // Map of version to compacted SST file list | ||
| // Map<version, Map<Table, sstFileList>> | ||
| private Map<Integer, Map<String, Set<String>>> compactedSSTFileList; | ||
| private UUID previousSnapshotId; |
| // Map of version to compacted SST file list | ||
| // Map<version, Map<Table, sstFileList>> | ||
| private Map<Integer, Map<String, Set<String>>> compactedSSTFileList; | ||
| private UUID previousSnapshotId; |
There was a problem hiding this comment.
The snapshot meta should also track the previous snapshotId meta to know the version built from which snapshot and the corresponding version.
It occurred to be that we should have a defrag log somewhere to keep track of thing happening on each OM.
There was a problem hiding this comment.
I guess we can think about adding some kind of defrag log to a separate file to debug operations similar to the container log file in the datanode
smengcl
left a comment
There was a problem hiding this comment.
Thanks @swamirishi . Structurally lgtm. Comments inline.
Change-Id: Id4b3d6fb8b3138eef663cc35ba8690ae6c6cd5dc
|
thank you for reviewing the patch @jojochuang @smengcl @SaketaChalamchala |
What changes were proposed in this pull request?
Snapshot metadata should track sst file key ranges. These ranges would be used for snapshot diff. The snapshot meta should also track the previous snapshotId meta to know the version built from which snapshot and the corresponding version.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13612
How was this patch tested?
Changed existing unit tests to account for the additional fields