HDDS-13849. Refactor getTablePrefix function in SnapshotDiff flow#9235
HDDS-13849. Refactor getTablePrefix function in SnapshotDiff flow#9235swamirishi merged 8 commits intoapache:masterfrom
Conversation
Change-Id: If068daeb7ca9bf3234614a29da90edcb9fb5778d
Change-Id: If064b64d1dfd624324090a946a33eab55535f28f
Change-Id: I3ed6575a96a12c349bb229fb3772c256a986f242
Change-Id: Iaddc93f6dc50a34dd5ac3eaac92b7400f4856ccf
Change-Id: I1eb91064f806d5cd5393c74597c727e41eb11742
|
@SaketaChalamchala can you review this |
Change-Id: I98a74e0d6815f28792096f0992e1aca933706963
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the snapshot diff flow by moving table prefix computation from a utility class to the metadata manager implementation level. The key improvement is eliminating redundant SST file filtering operations by centralizing table prefix management and changing the representation from Map<String, String> to a dedicated TablePrefixInfo class.
Key Changes:
- Introduced
TablePrefixInfoclass to encapsulate table-to-prefix mappings - Moved table prefix computation from
SnapshotUtils.getColumnFamilyToKeyPrefixMap()toOMMetadataManager.getTableBucketPrefix() - Updated SST filtering methods to accept
Set<String>instead ofList<String>for table lookups - Consolidated duplicate table cleanup methods in
OmSnapshotManager
Reviewed Changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
TablePrefixInfo.java |
New class encapsulating table prefix mappings with immutable map |
OMMetadataManager.java |
Added interface methods for computing table bucket prefixes |
OmMetadataManagerImpl.java |
Implemented table prefix computation with switch-based lookup |
SnapshotDiffManager.java |
Updated to use TablePrefixInfo and Set<String> for table lookups |
RocksDiffUtils.java |
Refactored SST filtering to use TablePrefixInfo and table sets |
RocksDBCheckpointDiffer.java |
Updated DAG traversal to filter by table set |
KeyManagerImpl.java |
Modified to use table-based prefix lookup |
OmSnapshotManager.java |
Consolidated three separate table cleanup methods into one |
| Test files | Updated mocks and assertions to use new types |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| LOG.info("{} actual row count={}, expectedCount={}", table.getName(), | ||
| count.get(), expectedCount); | ||
| }); | ||
| System.out.println("Swaminathan \t" + count.get() + "\t" + expectedCount); |
There was a problem hiding this comment.
Debug print statement should be removed before merging to production. Use the existing LOG.info statement instead or remove entirely.
| System.out.println("Swaminathan \t" + count.get() + "\t" + expectedCount); | |
| // Debug print statement removed; information is already logged above. |
| public static String getFirstNChars(String str, int n) { | ||
| if (str == null || str.length() < n) { | ||
| return str; | ||
| } | ||
| return str.substring(0, n); | ||
| } |
There was a problem hiding this comment.
The method returns the original string when str.length() < n, but should return the full string only when it's shorter. When str is null, returning null may cause NullPointerExceptions in callers. Consider returning an empty string for null input or documenting this behavior clearly.
| return false; | ||
| } | ||
|
|
||
| // NOTE: Update both getTableBucketPrefixInfo(volume, bucket) & getTableBucketPrefix(tableName, volume, bucket) |
There was a problem hiding this comment.
The comment references 'getTableBucketPrefixInfo' but the actual method name is 'getTableBucketPrefix' (without 'Info'). Update the comment to match the actual method signature.
| // NOTE: Update both getTableBucketPrefixInfo(volume, bucket) & getTableBucketPrefix(tableName, volume, bucket) | |
| // NOTE: Update both getTableBucketPrefix(volume, bucket) & getTableBucketPrefix(tableName, volume, bucket) |
| case DELETED_DIR_TABLE: | ||
| case OPEN_FILE_TABLE: | ||
| return getBucketKeyPrefixFSO(volume, bucket); | ||
| default: |
There was a problem hiding this comment.
Returning an empty string as default for unrecognized table names could silently mask configuration errors. Consider logging a warning when an unknown table name is encountered to aid debugging.
| default: | |
| default: | |
| LOG.warn("Unknown table name '{}' passed to getTableBucketPrefix (volume='{}', bucket='{}'). Returning empty string.", tableName, volume, bucket); |
jojochuang
left a comment
There was a problem hiding this comment.
Apart from cosmetic issues and missing javadocs, this looks good.
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Show resolved
Hide resolved
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Show resolved
Hide resolved
| LOG.info("{} actual row count={}, expectedCount={}", table.getName(), | ||
| count.get(), expectedCount); | ||
| }); | ||
| System.out.println("Swaminathan \t" + count.get() + "\t" + expectedCount); |
| public static String getFirstNChars(String str, int n) { | ||
| if (str == null || str.length() < n) { | ||
| return str; | ||
| } | ||
| return str.substring(0, n); | ||
| } |
...ksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java
Show resolved
Hide resolved
Change-Id: I06301f780ff7fff2ab652722872b1707501d61ec
Change-Id: I13eb719258714697f875d0b64d38e1185ba75db1
ea64602 to
1507659
Compare
|
Thank you @jojochuang for reviewing the patch |
|
@swamirishi please set fix version when resolving Jira issue after PR merge |
What changes were proposed in this pull request?
Currently get table Prefix function is placed inside snapshotUtils. However the implementation should dictate the various table prefix. For every diff computation corresponding to a table DAG traversal is done for all tables redundantly making the snapshot diff operation do redundant sst file seeks and reads which could have been avoided.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13849
How was this patch tested?
Fixed existing unit tests