Conversation

@peterxcli (Owner) commented May 11, 2025

What changes were proposed in this pull request?

see apache#8178

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12960

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests, workflow run on the fork git repo.)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this.)

@peterxcli peterxcli force-pushed the hdds12960-implement-finer-compaction-and-provide-benchmark-data branch 6 times, most recently from 93ff6c9 to e5a0850 on May 12, 2025 08:27
@peterxcli peterxcli changed the title HDDS-12960. Implement finer compaction and provide benchmark data HDDS-12960. Implement fine grained range compaction and provide benchmark data May 12, 2025
@peterxcli peterxcli marked this pull request as draft May 12, 2025 08:33
@peterxcli peterxcli force-pushed the hdds12960-implement-finer-compaction-and-provide-benchmark-data branch 3 times, most recently from 82ed1f1 to 5504527 on May 25, 2025 07:46
@peterxcli peterxcli force-pushed the hdds12960-implement-finer-compaction-and-provide-benchmark-data branch 9 times, most recently from 84e331e to 19ac749 on May 31, 2025 04:18
@peterxcli peterxcli force-pushed the hdds12960-implement-finer-compaction-and-provide-benchmark-data branch from 69ca118 to 1e0a53e on June 3, 2025 16:13
@peterxcli peterxcli self-assigned this Jun 6, 2025
@peterxcli peterxcli force-pushed the hdds12960-implement-finer-compaction-and-provide-benchmark-data branch from 8e9dcc7 to ff75081 on June 6, 2025 13:44
Comment on lines +141 to +148
LOG.info("Scheduling range compaction compactor for table {}", tableName);
executor.scheduleWithFixedDelay(
compactor::run,
checkIntervalMs, // TODO: randomize the start time
checkIntervalMs,
TimeUnit.MILLISECONDS);
}
@peterxcli (Owner, Author):

Need to separate the scheduler start time...
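One way to address the TODO above is to add a random jitter to the initial delay so per-table compactors don't all fire at the same instant. A minimal sketch, assuming the `executor`, `compactor`, and `checkIntervalMs` shown in the snippet (the class and method names here are illustrative, not the PR's code):

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public final class JitteredSchedulerSketch {

  /** Initial delay in [checkIntervalMs, 2 * checkIntervalMs): one full
   *  interval plus a random jitter of up to one more interval. */
  static long computeInitialDelayMs(long checkIntervalMs) {
    return checkIntervalMs + ThreadLocalRandom.current().nextLong(checkIntervalMs);
  }

  static void schedule(ScheduledExecutorService executor, Runnable compactor,
      long checkIntervalMs) {
    executor.scheduleWithFixedDelay(
        compactor,
        computeInitialDelayMs(checkIntervalMs), // randomized start time
        checkIntervalMs,
        TimeUnit.MILLISECONDS);
  }
}
```

Since `scheduleWithFixedDelay` measures subsequent delays from the end of each run, only the first firing needs to be spread out.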

…nd enhance KeyRangeStats to use long for entry counts. Introduce FSOTableCompactor and OBSTableCompactor classes for improved compaction strategies based on bucket layouts.
…yPrefixUpperBound method in AbstractCompactor for improved key range handling. Update FSOTableCompactor and OBSTableCompactor to utilize new methods and streamline compaction logic.
@github-actions
No such command: gemini. Available commands:

  • /close : Close pending pull request temporary
  • /help : Show all the available comment commands
  • /label : add new label to the issue: /label <label>
  • /pending : Add a REQUESTED_CHANGE type review to mark issue non-mergeable: /pending <reason>
  • /ready : Dismiss all the blocking reviews by github-actions bot
  • /retest : provide help on how to trigger new CI build

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant new feature: fine-grained range compaction for RocksDB in Ozone. It adds the necessary infrastructure including DBStore API extensions, a RangeCompactionService, and Compactor implementations. The OBSTableCompactor includes detailed logic for identifying and preparing key ranges based on bucket boundaries and SST file statistics. Key changes also involve using a custom RocksDB fork, likely for specific C++ API access, and updating benchmark tools.

Major feedback points:

  • The FSOTableCompactor is currently a stub and needs implementation for FSO range compaction to be functional.
  • There are minor edge cases in StringUtils.getKeyPrefixUpperBound to consider.
  • A performance consideration for OmKeyBenchGenerator.pickRandomKey with very large live key sets.

Comment on lines +101 to +102
public static String getKeyPrefixUpperBound(String key) {
  return key.substring(0, key.length() - 1) + (char) (key.charAt(key.length() - 1) + 1);
}

high

If the input key is an empty string, a StringIndexOutOfBoundsException will be thrown. Consider adding a check for an empty key at the beginning of the method.
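A guarded variant is sketched below. The class wrapper and the choice of `IllegalArgumentException` are assumptions for illustration; the original method lives in the PR's StringUtils:

```java
public final class StringUtilsSketch {

  /** Smallest string strictly greater than every string that has
   *  {@code key} as a prefix, built by bumping the last character. */
  public static String getKeyPrefixUpperBound(String key) {
    if (key == null || key.isEmpty()) {
      throw new IllegalArgumentException("key must be non-empty");
    }
    int last = key.length() - 1;
    // Note: a trailing Character.MAX_VALUE would wrap to '\u0000' here,
    // another edge case a caller may need to reject.
    return key.substring(0, last) + (char) (key.charAt(last) + 1);
  }
}
```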

Comment on lines 49 to 78
protected void collectRangesNeedingCompaction(List<KeyRange> ranges) {
// Iterator<Map.Entry<CacheKey<String>, CacheValue<OmBucketInfo>>> bucketIterator = getBucketIterator();
// while (bucketIterator.hasNext()) {
// Map.Entry<CacheKey<String>, CacheValue<OmBucketInfo>> entry = bucketIterator.next();
// String bucketKey = entry.getKey().getCacheKey();
// OmBucketInfo bucketInfo = entry.getValue().getCacheValue();

// if (nextBucket != null && !nextBucket.equals(bucketKey)) {
// continue;
// }

// // For FSO, we need to handle parent IDs
// if (nextParentId != null) {
// // Continue with the same bucket but different parent ID
// KeyRange parentRange = new KeyRange(
// getParentKey(bucketKey, nextParentId),
// getNextParentKey(bucketKey, nextParentId));
// processRange(parentRange, ranges);
// } else {
// // Start with the first parent ID for this bucket
// String firstParentId = getFirstParentId(bucketKey);
// if (firstParentId != null) {
// KeyRange parentRange = new KeyRange(
// getParentKey(bucketKey, firstParentId),
// getNextParentKey(bucketKey, firstParentId));
// processRange(parentRange, ranges);
// }
// }
// }
}

high

The collectRangesNeedingCompaction method is currently commented out. This implementation needs to be completed for FSO range compaction to work.

Comment on lines +105 to +107
public static String max(String str1, String str2) {
return str1.compareTo(str2) > 0 ? str1 : str2;
}

medium

Consider adding null checks for str1 and str2 or annotating them with @Nonnull to clarify the contract of this method. If nulls are possible, define the desired behavior.

Comment on lines +109 to +111
public static String min(String str1, String str2) {
return str1.compareTo(str2) < 0 ? str1 : str2;
}

medium

Similar to the max method, consider adding null checks for str1 and str2 or annotating them with @Nonnull to clarify the contract. Define behavior for null inputs if they are permissible.
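One way to make the contract explicit for both helpers is to reject nulls eagerly rather than define a null ordering. A sketch (the wrapper class name is an assumption; `Objects.requireNonNull` throws NullPointerException with the given message):

```java
import java.util.Objects;

public final class StringCompareSketch {

  /** Returns the lexicographically larger of two non-null strings. */
  public static String max(String str1, String str2) {
    Objects.requireNonNull(str1, "str1");
    Objects.requireNonNull(str2, "str2");
    return str1.compareTo(str2) > 0 ? str1 : str2;
  }

  /** Returns the lexicographically smaller of two non-null strings. */
  public static String min(String str1, String str2) {
    Objects.requireNonNull(str1, "str1");
    Objects.requireNonNull(str2, "str2");
    return str1.compareTo(str2) < 0 ? str1 : str2;
  }
}
```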

<phase>generate-sources</phase>
<configuration>
<url>https://github.com/facebook/rocksdb/archive/refs/tags/v${rocksdb.version}.tar.gz</url>
<url>https://github.com/peterxcli/rocksdb/archive/refs/tags/v${rocksdb.version}.tar.gz</url>

medium

The RocksDB download URL has been changed to a personal fork (peterxcli/rocksdb) and the version is now 7.7.3.2. Document the reason for this fork in the PR description or in a comment here.

Comment on lines +258 to +273
<!-- <exec dir="${project.build.directory}/rocksdb/rocksdb-${rocksdb.version}" executable="make" failonerror="true">
<arg line="PORTABLE=1" />
<arg line="DEBUG_LEVEL=0" />
<arg line="EXTRA_CXXFLAGS='-fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-error -Wno-unused-variable'" />
<arg line="-j${system.numCores}" />
<arg line="rocksdbjava" />
</exec> -->
<!-- Install the patched rocksdbjni to local Maven repository -->
<!-- <exec executable="mvn" failonerror="true">
<arg line="install:install-file" />
<arg line="-Dfile=${project.build.directory}/rocksdb/rocksdb-${rocksdb.version}/java/target/rocksdbjni-${rocksdb.version}-${rocksdbOSName}.jar" />
<arg line="-DgroupId=org.rocksdb" />
<arg line="-DartifactId=rocksdbjni" />
<arg line="-Dversion=${rocksdb.version}" />
<arg line="-Dpackaging=jar" />
</exec> -->

medium

The make rocksdbjava and mvn install:install-file steps are commented out. Please confirm that the pre-built JNI artifact from the peterxcli/rocksdb fork (version 7.7.3.2) is expected to be available in the GitHub Maven repository defined in the main pom.xml and contains the necessary JNI bindings.

Comment on lines +243 to +257
private String pickRandomKey() {
int size = liveKeys.size();
if (size == 0) {
return null;
}

// Convert to array for O(1) random access - more efficient than stream().skip()
String[] keysArray = liveKeys.toArray(new String[0]);
if (keysArray.length == 0) {
return null; // Race condition check
}

int index = ThreadLocalRandom.current().nextInt(keysArray.length);
return keysArray[index];
}

medium

Converting liveKeys to an array using toArray(new String[0]) inside pickRandomKey() can be inefficient if the set is very large. Consider alternative strategies for random sampling from a concurrent set if maxLiveKeys is substantial and pickRandomKey() is called frequently.
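If the live-key set is large, one alternative is to keep the keys in a list backed by an index map, which gives O(1) insert, swap-delete, and random pick without copying the whole set per call. This is a single-threaded sketch (the PR's set is concurrent, so a real replacement would need external synchronization); all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public final class RandomKeyPool {
  private final List<String> keys = new ArrayList<>();
  private final Map<String, Integer> index = new HashMap<>();

  void add(String key) {
    if (index.containsKey(key)) {
      return;
    }
    index.put(key, keys.size());
    keys.add(key);
  }

  void remove(String key) {
    Integer i = index.remove(key);
    if (i == null) {
      return;
    }
    // Swap the last element into the removed slot, then drop the tail.
    String last = keys.remove(keys.size() - 1);
    if (!last.equals(key)) {
      keys.set(i, last);
      index.put(last, i);
    }
  }

  String pickRandom() {
    if (keys.isEmpty()) {
      return null;
    }
    return keys.get(ThreadLocalRandom.current().nextInt(keys.size()));
  }
}
```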

Comment on lines +81 to +102
// TODO: merge consecutive ranges if they aren't too big; note the situation where the iterator has been reset
// to be more specific,
// constraint 1: the temp start key and end key are either null or non-null at the same time; call them the temp range
// constraint 2: once the temp range is submitted, clear the temp range
// if range is null:
// 1. if temp range is not null, then submit and reset the temp range
// otherwise, do nothing
// if range is a split range:
// 1. if temp range is null, submit the range directly
// 2. if temp range is not null and the temp accumulated numEntries plus the current range's numEntries
// is greater than maxEntries, then submit **the temp range and the current range**
// 3. if temp range is not null, and the temp accumulated numEntries plus the current range's numEntries
// is less than maxEntries, then submit **a range composed of the temp range and the current range**
// (note the difference between points 2 and 3)
// if range is not a split range:
// 1. if temp range is null, set the temp start key to the current range's start key, and the temp end key as well
// 2. if temp range is not null and the temp accumulated numEntries plus the current range's numEntries
// is greater than maxEntries, then submit the temp range and set the temp range to the current range
// 3. if temp range is not null and the temp accumulated numEntries plus the current range's numEntries
// is less than maxEntries,
// then **set the temp range to a range composed of the temp range and the current range**
// (note the difference between points 2 and 3)

medium

The TODO comment here outlines a potential optimization for merging consecutive small ranges that need compaction. Implementing this could improve compaction efficiency.
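The state machine in that TODO can be sketched as one reading of the rules; `KeyRange`, the `split` flag, `maxEntries`, and the method names below are simplified stand-ins for the PR's actual types, not its implementation:

```java
import java.util.ArrayList;
import java.util.List;

public final class RangeMerger {
  static final class KeyRange {
    final String start;
    final String end;
    final long numEntries;
    final boolean split; // true if this range came from splitting a larger one
    KeyRange(String start, String end, long numEntries, boolean split) {
      this.start = start;
      this.end = end;
      this.numEntries = numEntries;
      this.split = split;
    }
  }

  private final long maxEntries;
  private final List<KeyRange> submitted = new ArrayList<>();
  private KeyRange temp; // pending, not yet submitted (the "temp range")

  RangeMerger(long maxEntries) {
    this.maxEntries = maxEntries;
  }

  private KeyRange merged(KeyRange a, KeyRange b) {
    return new KeyRange(a.start, b.end, a.numEntries + b.numEntries, false);
  }

  void accept(KeyRange range) {
    if (range == null) {              // flush: submit and reset the temp range
      if (temp != null) {
        submitted.add(temp);
        temp = null;
      }
      return;
    }
    if (range.split) {
      if (temp == null) {
        submitted.add(range);         // submit a split range directly
      } else if (temp.numEntries + range.numEntries > maxEntries) {
        submitted.add(temp);          // too big to merge: submit both separately
        submitted.add(range);
        temp = null;
      } else {
        submitted.add(merged(temp, range)); // small enough: submit as one range
        temp = null;
      }
    } else {
      if (temp == null) {
        temp = range;                 // start accumulating
      } else if (temp.numEntries + range.numEntries > maxEntries) {
        submitted.add(temp);          // flush, then restart accumulation
        temp = range;
      } else {
        temp = merged(temp, range);   // keep accumulating
      }
    }
  }

  List<KeyRange> results() {
    return submitted;
  }
}
```

Feeding `null` at iterator reset (or end of iteration) flushes the pending temp range, which matches the "if range is null" clause above.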

gemini-code-assist[bot]

This comment was marked as duplicate.

Repository owner deleted a comment from gemini-code-assist bot Jun 16, 2025
peterxcli added 3 commits July 4, 2025 21:59
…ctory-aware range processing and adaptive splitting based on directory hierarchy. Enhance logging for better traceability during compaction operations.
…nt-finer-compaction-and-provide-benchmark-data
@github-actions

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Nov 14, 2025
@peterxcli peterxcli changed the title HDDS-12960. Implement fine grained range compaction and provide benchmark data HDDS-12960. SST Statistics-Based RocksDB Compaction Scheduling Nov 14, 2025
@github-actions github-actions bot removed the stale label Nov 15, 2025
@github-actions

github-actions bot commented Dec 6, 2025

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions bot added the stale label Dec 6, 2025
@github-actions

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.
