Skip to content

HDDS-13849. Refactor getTablePrefix function in SnapshotDiff flow#9235

Merged
swamirishi merged 8 commits intoapache:masterfrom
swamirishi:HDDS-13849
Nov 11, 2025
Merged

HDDS-13849. Refactor getTablePrefix function in SnapshotDiff flow#9235
swamirishi merged 8 commits intoapache:masterfrom
swamirishi:HDDS-13849

Conversation

@swamirishi
Copy link
Contributor

What changes were proposed in this pull request?

Currently get table Prefix function is placed inside snapshotUtils. However the implementation should dictate the various table prefix. For every diff computation corresponding to a table DAG traversal is done for all tables redundantly making the snapshot diff operation do redundant sst file seeks and reads which could have been avoided.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13849

How was this patch tested?

Fixed existing unit tests

Change-Id: If068daeb7ca9bf3234614a29da90edcb9fb5778d
Change-Id: If064b64d1dfd624324090a946a33eab55535f28f
Change-Id: I3ed6575a96a12c349bb229fb3772c256a986f242
Change-Id: Iaddc93f6dc50a34dd5ac3eaac92b7400f4856ccf
Change-Id: I1eb91064f806d5cd5393c74597c727e41eb11742
@swamirishi
Copy link
Contributor Author

@SaketaChalamchala can you review this

Change-Id: I98a74e0d6815f28792096f0992e1aca933706963
@jojochuang jojochuang requested a review from Copilot November 10, 2025 20:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the snapshot diff flow by moving table prefix computation from a utility class to the metadata manager implementation level. The key improvement is eliminating redundant SST file filtering operations by centralizing table prefix management and changing the representation from Map<String, String> to a dedicated TablePrefixInfo class.

Key Changes:

  • Introduced TablePrefixInfo class to encapsulate table-to-prefix mappings
  • Moved table prefix computation from SnapshotUtils.getColumnFamilyToKeyPrefixMap() to OMMetadataManager.getTableBucketPrefix()
  • Updated SST filtering methods to accept Set<String> instead of List<String> for table lookups
  • Consolidated duplicate table cleanup methods in OmSnapshotManager

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
TablePrefixInfo.java New class encapsulating table prefix mappings with immutable map
OMMetadataManager.java Added interface methods for computing table bucket prefixes
OmMetadataManagerImpl.java Implemented table prefix computation with switch-based lookup
SnapshotDiffManager.java Updated to use TablePrefixInfo and Set<String> for table lookups
RocksDiffUtils.java Refactored SST filtering to use TablePrefixInfo and table sets
RocksDBCheckpointDiffer.java Updated DAG traversal to filter by table set
KeyManagerImpl.java Modified to use table-based prefix lookup
OmSnapshotManager.java Consolidated three separate table cleanup methods into one
Test files Updated mocks and assertions to use new types

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

LOG.info("{} actual row count={}, expectedCount={}", table.getName(),
count.get(), expectedCount);
});
System.out.println("Swaminathan \t" + count.get() + "\t" + expectedCount);
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug print statement should be removed before merging to production. Use the existing LOG.info statement instead or remove entirely.

Suggested change
System.out.println("Swaminathan \t" + count.get() + "\t" + expectedCount);
// Debug print statement removed; information is already logged above.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix this

Comment on lines +127 to +132
public static String getFirstNChars(String str, int n) {
if (str == null || str.length() < n) {
return str;
}
return str.substring(0, n);
}
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method returns the original string when str.length() < n, but should return the full string only when it's shorter. When str is null, returning null may cause NullPointerExceptions in callers. Consider returning an empty string for null input or documenting this behavior clearly.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine.

return false;
}

// NOTE: Update both getTableBucketPrefixInfo(volume, bucket) & getTableBucketPrefix(tableName, volume, bucket)
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment references 'getTableBucketPrefixInfo' but the actual method name is 'getTableBucketPrefix' (without 'Info'). Update the comment to match the actual method signature.

Suggested change
// NOTE: Update both getTableBucketPrefixInfo(volume, bucket) & getTableBucketPrefix(tableName, volume, bucket)
// NOTE: Update both getTableBucketPrefix(volume, bucket) & getTableBucketPrefix(tableName, volume, bucket)

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

case DELETED_DIR_TABLE:
case OPEN_FILE_TABLE:
return getBucketKeyPrefixFSO(volume, bucket);
default:
Copy link

Copilot AI Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning an empty string as default for unrecognized table names could silently mask configuration errors. Consider logging a warning when an unknown table name is encountered to aid debugging.

Suggested change
default:
default:
LOG.warn("Unknown table name '{}' passed to getTableBucketPrefix (volume='{}', bucket='{}'). Returning empty string.", tableName, volume, bucket);

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from cosmetic issues and missing javadocs, this looks good.

LOG.info("{} actual row count={}, expectedCount={}", table.getName(),
count.get(), expectedCount);
});
System.out.println("Swaminathan \t" + count.get() + "\t" + expectedCount);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix this

Comment on lines +127 to +132
public static String getFirstNChars(String str, int n) {
if (str == null || str.length() < n) {
return str;
}
return str.substring(0, n);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is fine.

Change-Id: I06301f780ff7fff2ab652722872b1707501d61ec
Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Change-Id: I13eb719258714697f875d0b64d38e1185ba75db1
@swamirishi swamirishi merged commit ff41ce6 into apache:master Nov 11, 2025
43 checks passed
@swamirishi
Copy link
Contributor Author

Thank you @jojochuang for reviewing the patch

@adoroszlai
Copy link
Contributor

@swamirishi please set fix version when resolving Jira issue after PR merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants