Conversation

@smengcl
Contributor

@smengcl smengcl commented Oct 7, 2025

What changes were proposed in this pull request?

Implement the Background Snapshot Defragmentation Service outlined in the design doc.

Some commits are cherry-picked from the POC and rebased/changed.

  • WIP

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13009

How was this patch tested?

  • To cherry-pick test cases from POC
  • Add additional test cases as we see fit

@smengcl smengcl requested a review from swamirishi October 7, 2025 21:24
@smengcl smengcl added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Oct 7, 2025
Contributor

@swamirishi swamirishi left a comment

@smengcl thank you for starting work on the framework for the defrag service. I believe the approach used here to defrag a snapshot is not the one in the design doc, and it could be suboptimal. I have left comments inline.

throw new RocksDatabaseException("Failed to create temporary SST directory: " + tempSstDir, e);
}

LOG.info("Applied {} total incremental changes using snapshotDiff approach", totalChanges);
Contributor

BTW, we should truncate the other tables (deletedTable, deletedDirectoryTable, snapshotRenamedTable, bucketTable, volumeTable, etc.) and bulk import them directly into the defragged snapshot DB.
Here we can use dumpToFile and loadFromFile, which dump the table into an SST file that can be loaded directly. This would be much more efficient.

@Override
public void dumpToFileWithPrefix(File externalFile, KEY prefix) throws RocksDatabaseException, CodecException {
  rawTable.dumpToFileWithPrefix(externalFile, encodeKey(prefix));
}

@Override
public void loadFromFile(File externalFile) throws RocksDatabaseException {
  rawTable.loadFromFile(externalFile);
}
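
To make the suggested flow concrete, here is a minimal toy sketch of "truncate, then bulk load from a dumped file". It is not the Ozone or RocksDB API: a sorted map stands in for a table, a plain text file stands in for an SST file, and the class name `TableCopySketch` and its method signatures are hypothetical, borrowed only loosely from the method names quoted above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy stand-in for the dump-then-bulk-load flow; NOT the Ozone or RocksDB API. */
public class TableCopySketch {

  /** Writes every entry whose key starts with prefix to file, in sorted key order. */
  static void dumpToFileWithPrefix(NavigableMap<String, String> table, Path file, String prefix) {
    List<String> lines = new ArrayList<>();
    for (Map.Entry<String, String> e : table.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        // Assumption: keys never contain a tab, so it is a safe separator here.
        lines.add(e.getKey() + "\t" + e.getValue());
      }
    }
    try {
      Files.write(file, lines, StandardCharsets.UTF_8);
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /** Truncates the target table, then loads every entry stored in file. */
  static void loadFromFile(NavigableMap<String, String> target, Path file) {
    target.clear(); // the "truncate" step from the review comment
    try {
      for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
        int tab = line.indexOf('\t');
        target.put(line.substring(0, tab), line.substring(tab + 1));
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) throws IOException {
    NavigableMap<String, String> source = new TreeMap<>();
    source.put("/vol1/bucket1/a", "1");
    source.put("/vol1/bucket1/b", "2");
    source.put("/vol2/bucket9/c", "3");

    NavigableMap<String, String> defragged = new TreeMap<>();
    defragged.put("stale", "x"); // leftover entry that the truncate step removes

    Path file = Files.createTempFile("deletedTable", ".dump");
    dumpToFileWithPrefix(source, file, "/vol1/bucket1/");
    loadFromFile(defragged, file);
    Files.delete(file);

    System.out.println(defragged); // only the entries under the chosen prefix remain
  }
}
```

The point of the pattern is that the copy happens as one sequential file write and one sequential load, rather than key-by-key writes into the target; whether that matches the real cost profile of SST ingestion in the defragged snapshot DB is something the benchmark would have to confirm.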

Contributor

@jojochuang jojochuang left a comment

we would need tests for the new service.

@jojochuang jojochuang requested a review from Copilot October 8, 2025 21:27
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements a Background Snapshot Defragmentation Service for Ozone, designed to optimize snapshot storage by defragmenting RocksDB instances containing fragmented data. The service processes snapshots sequentially in the active chain, performing full defragmentation for the first snapshot and incremental defragmentation for subsequent ones based on snapshot diffs.

Key changes:

  • Introduces SnapshotDefragService for background snapshot defragmentation
  • Adds configuration keys and service integration into the key management framework
  • Modifies RocksDB utilities to support SST file operations and database access for defragmentation
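
The sequential processing described in the overview (full defragmentation for the first snapshot in the chain, incremental defragmentation for each later one based on a diff against its predecessor) can be sketched roughly as follows. This is only an illustration of that summary, not code from SnapshotDefragService; the class `DefragPlanSketch`, the `Step` type, and the snapshot names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative planner only; NOT the SnapshotDefragService implementation. */
public class DefragPlanSketch {

  enum Mode { FULL, INCREMENTAL }

  /** One planned unit of work: which snapshot, how, and against which predecessor. */
  static final class Step {
    final String snapshot;
    final Mode mode;
    final String diffBase; // null for a full defrag

    Step(String snapshot, Mode mode, String diffBase) {
      this.snapshot = snapshot;
      this.mode = mode;
      this.diffBase = diffBase;
    }

    @Override
    public String toString() {
      return mode == Mode.FULL
          ? snapshot + ": FULL"
          : snapshot + ": INCREMENTAL (diff against " + diffBase + ")";
    }
  }

  /** Walks the chain oldest-first; the first snapshot is FULL, the rest INCREMENTAL. */
  static List<Step> plan(List<String> chain) {
    List<Step> steps = new ArrayList<>();
    String previous = null;
    for (String snapshot : chain) {
      if (previous == null) {
        steps.add(new Step(snapshot, Mode.FULL, null));
      } else {
        steps.add(new Step(snapshot, Mode.INCREMENTAL, previous));
      }
      previous = snapshot;
    }
    return steps;
  }

  public static void main(String[] args) {
    for (Step step : plan(Arrays.asList("snap1", "snap2", "snap3"))) {
      System.out.println(step);
    }
  }
}
```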

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

File | Description
SnapshotDefragService.java | Main service implementation for defragmenting snapshots
KeyManagerImpl.java | Integration of defrag service into key management lifecycle
KeyManager.java | Interface addition for defrag service access
OMConfigKeys.java | Configuration keys for defrag service parameters
RocksDatabase.java | Exposed methods for checkpoint creation and data access
RDBSstFileWriter.java | Added delete operation and made class public
OzoneConsts.java | Added constant for defragmented checkpoint directory
OzoneConfigKeys.java | Timeout configuration for defrag service
Comments suppressed due to low confidence (1)

hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/SnapshotDefragService.java:1

  • The configuration is disabled by default pending upgrade handling completion. This indicates the feature may not be ready for production use and the upgrade compatibility should be addressed.

@smengcl
Contributor Author

smengcl commented Oct 9, 2025

we would need tests for the new service.

Of course. As I mentioned in the description, the test cases will be cherry-picked from the POC later. The service alone is a diff of over 1000 lines, so I wanted that to be reviewed first. :)

Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 17 comments.


</property>
<property>
<name>ozone.snapshot.defrag.limit.per.task</name>
<value>1</value>
Contributor

we need to benchmark to find a more appropriate value.

@smengcl
Contributor Author

smengcl commented Oct 31, 2025

I have addressed the majority of the comments in this PR except 3:

  1. Truncate other tables: #9117 (comment)
  2. Upgrade handling (new layout feature): #9117 (comment)
  3. Testing/benchmarking the config for a better default: #9117 (comment)

This draft will be closed in favor of a new PR, #9227, because of the rebase needed after merging #9133 and other improvements.

@smengcl smengcl closed this Oct 31, 2025
@smengcl smengcl changed the title from "HDDS-13009. Background snapshot defrag service" to "HDDS-13009. Background snapshot defrag service (Draft v1)" Oct 31, 2025