HDDS-12960. SST Statistics-Based RocksDB Compaction Scheduling #2
Conversation
```java
LOG.info("Scheduling range compaction compactor for table {}", tableName);
executor.scheduleWithFixedDelay(
    compactor::run,
    checkIntervalMs, // TODO: randomize the start time
    checkIntervalMs,
    TimeUnit.MILLISECONDS);
}
```
Need to separate the scheduler start time...
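A minimal sketch of one way to address the TODO, assuming the same `executor`, `compactor`, and `checkIntervalMs` as in the snippet above (the jitter bound and the `java.util.concurrent.ThreadLocalRandom` import are illustrative choices, not the PR's code):

```java
// Hypothetical sketch: randomize only the initial delay so per-table
// compactors start staggered, while keeping the steady-state interval fixed.
long initialDelayMs =
    checkIntervalMs + ThreadLocalRandom.current().nextLong(checkIntervalMs);
executor.scheduleWithFixedDelay(
    compactor::run,
    initialDelayMs,   // randomized start time, as the TODO suggests
    checkIntervalMs,  // unchanged delay between subsequent runs
    TimeUnit.MILLISECONDS);
```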
…nd enhance KeyRangeStats to use long for entry counts. Introduce FSOTableCompactor and OBSTableCompactor classes for improved compaction strategies based on bucket layouts.
…yPrefixUpperBound method in AbstractCompactor for improved key range handling. Update FSOTableCompactor and OBSTableCompactor to utilize new methods and streamline compaction logic.
No such command.
Code Review
This pull request introduces a significant new feature: fine-grained range compaction for RocksDB in Ozone. It adds the necessary infrastructure including DBStore API extensions, a RangeCompactionService, and Compactor implementations. The OBSTableCompactor includes detailed logic for identifying and preparing key ranges based on bucket boundaries and SST file statistics. Key changes also involve using a custom RocksDB fork, likely for specific C++ API access, and updating benchmark tools.
Major feedback points:
- The `FSOTableCompactor` is currently a stub and needs implementation for FSO range compaction to be functional.
- There are minor edge cases in `StringUtils.getKeyPrefixUpperBound` to consider.
- A performance consideration for `OmKeyBenchGenerator.pickRandomKey` with very large live key sets.
```java
public static String getKeyPrefixUpperBound(String key) {
  return key.substring(0, key.length() - 1) + (char) (key.charAt(key.length() - 1) + 1);
}
```
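The edge cases flagged above can be sketched out like this (an illustrative variant, not the PR's code): an empty `key` makes the one-liner throw `StringIndexOutOfBoundsException`, and a trailing `Character.MAX_VALUE` wraps around instead of producing a valid upper bound.

```java
// Hedged sketch: compute the smallest string greater than every string with
// the given prefix, skipping trailing characters that cannot be incremented.
public static String getKeyPrefixUpperBound(String key) {
  int i = key.length() - 1;
  while (i >= 0 && key.charAt(i) == Character.MAX_VALUE) {
    i--; // this position cannot be incremented; drop it and look left
  }
  if (i < 0) {
    return null; // empty or all-MAX_VALUE key: no finite upper bound exists
  }
  return key.substring(0, i) + (char) (key.charAt(i) + 1);
}
```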
...one/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/compaction/FSOTableCompactor.java (review thread resolved)
```java
protected void collectRangesNeedingCompaction(List<KeyRange> ranges) {
  // Iterator<Map.Entry<CacheKey<String>, CacheValue<OmBucketInfo>>> bucketIterator = getBucketIterator();
  // while (bucketIterator.hasNext()) {
  //   Map.Entry<CacheKey<String>, CacheValue<OmBucketInfo>> entry = bucketIterator.next();
  //   String bucketKey = entry.getKey().getCacheKey();
  //   OmBucketInfo bucketInfo = entry.getValue().getCacheValue();
  //
  //   if (nextBucket != null && !nextBucket.equals(bucketKey)) {
  //     continue;
  //   }
  //
  //   // For FSO, we need to handle parent IDs
  //   if (nextParentId != null) {
  //     // Continue with the same bucket but different parent ID
  //     KeyRange parentRange = new KeyRange(
  //         getParentKey(bucketKey, nextParentId),
  //         getNextParentKey(bucketKey, nextParentId));
  //     processRange(parentRange, ranges);
  //   } else {
  //     // Start with the first parent ID for this bucket
  //     String firstParentId = getFirstParentId(bucketKey);
  //     if (firstParentId != null) {
  //       KeyRange parentRange = new KeyRange(
  //           getParentKey(bucketKey, firstParentId),
  //           getNextParentKey(bucketKey, firstParentId));
  //       processRange(parentRange, ranges);
  //     }
  //   }
  // }
}
```
```java
public static String max(String str1, String str2) {
  return str1.compareTo(str2) > 0 ? str1 : str2;
}
```
```java
public static String min(String str1, String str2) {
  return str1.compareTo(str2) < 0 ? str1 : str2;
}
```
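Given the PR's approach of deriving ranges from bucket boundaries and SST file statistics, these helpers would plausibly be used to intersect key spans; a hypothetical usage (the variable names below are assumed for illustration, not from the PR):

```java
// Clamp an SST file's key span to a bucket's key range.
String rangeStart = StringUtils.max(sstSmallestKey, bucketStartKey);
String rangeEnd = StringUtils.min(sstLargestKey, bucketEndKey);
boolean overlaps = rangeStart.compareTo(rangeEnd) < 0;
```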
```diff
   <phase>generate-sources</phase>
   <configuration>
-    <url>https://github.com/facebook/rocksdb/archive/refs/tags/v${rocksdb.version}.tar.gz</url>
+    <url>https://github.com/peterxcli/rocksdb/archive/refs/tags/v${rocksdb.version}.tar.gz</url>
```
```xml
<!-- <exec dir="${project.build.directory}/rocksdb/rocksdb-${rocksdb.version}" executable="make" failonerror="true">
  <arg line="PORTABLE=1" />
  <arg line="DEBUG_LEVEL=0" />
  <arg line="EXTRA_CXXFLAGS='-fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-error -Wno-unused-variable'" />
  <arg line="-j${system.numCores}" />
  <arg line="rocksdbjava" />
</exec> -->
<!-- Install the patched rocksdbjni to local Maven repository -->
<!-- <exec executable="mvn" failonerror="true">
  <arg line="install:install-file" />
  <arg line="-Dfile=${project.build.directory}/rocksdb/rocksdb-${rocksdb.version}/java/target/rocksdbjni-${rocksdb.version}-${rocksdbOSName}.jar" />
  <arg line="-DgroupId=org.rocksdb" />
  <arg line="-DartifactId=rocksdbjni" />
  <arg line="-Dversion=${rocksdb.version}" />
  <arg line="-Dpackaging=jar" />
</exec> -->
```
```java
private String pickRandomKey() {
  int size = liveKeys.size();
  if (size == 0) {
    return null;
  }

  // Convert to array for O(1) random access - more efficient than stream().skip()
  String[] keysArray = liveKeys.toArray(new String[0]);
  if (keysArray.length == 0) {
    return null; // Race condition check
  }

  int index = ThreadLocalRandom.current().nextInt(keysArray.length);
  return keysArray[index];
}
```
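Regarding the reviewer's note about very large live key sets: `toArray` copies the whole set on every pick, making each call O(n). A common alternative is an array-backed set with an index map, giving O(1) add, remove, and sample. A self-contained sketch (a hypothetical helper class, not part of the PR):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Sketch: O(1) random sampling without copying the key set on every pick.
class LiveKeySampler {
  private final List<String> keys = new ArrayList<>();
  private final Map<String, Integer> indexOf = new HashMap<>();

  synchronized void add(String key) {
    if (indexOf.putIfAbsent(key, keys.size()) == null) {
      keys.add(key);
    }
  }

  synchronized void remove(String key) {
    Integer i = indexOf.remove(key);
    if (i == null) {
      return;
    }
    String last = keys.remove(keys.size() - 1); // O(1) pop from the tail
    if (i < keys.size()) {
      keys.set(i, last);    // fill the vacated slot with the former tail element
      indexOf.put(last, i);
    }
  }

  synchronized String pickRandom() {
    return keys.isEmpty() ? null
        : keys.get(ThreadLocalRandom.current().nextInt(keys.size()));
  }
}
```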
```java
// TODO: merge consecutive ranges if they aren't too big; note that the iterator may have been reset.
// To be more specific:
// Constraint 1: the temp start key and end key are either both null or both non-null; call them the temp range.
// Constraint 2: once the temp range is submitted, clear it.
// If range is null:
//   1. if the temp range is not null, submit and reset the temp range;
//      otherwise, do nothing.
// If range is a split range:
//   1. if the temp range is null, submit the range directly;
//   2. if the temp range is not null and its accumulated numEntries plus the current range's numEntries
//      is greater than maxEntries, submit **the temp range and the current range** separately;
//   3. if the temp range is not null and its accumulated numEntries plus the current range's numEntries
//      is less than maxEntries, submit **a single range composed of the temp range and the current range**
//      (note the difference between points 2 and 3).
// If range is not a split range:
//   1. if the temp range is null, set it to the current range's start key and end key;
//   2. if the temp range is not null and its accumulated numEntries plus the current range's numEntries
//      is greater than maxEntries, submit the temp range and set the temp range to the current range;
//   3. if the temp range is not null and its accumulated numEntries plus the current range's numEntries
//      is less than maxEntries, **set the temp range to a range composed of the temp range and the current range**
//      (note the difference between points 2 and 3).
```
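The state machine this TODO describes can be sketched directly. The following is an illustrative rendering under assumed names: a `KeyRange` with a `getNumEntries()` accessor, a `merge` helper, and a `submit` sink are placeholders, not the PR's actual API.

```java
private KeyRange tempRange; // the buffered "temp range"; null when empty

void accept(KeyRange range, boolean isSplitRange) {
  if (range == null) { // iterator was reset: flush anything buffered
    if (tempRange != null) {
      submit(tempRange);
      tempRange = null;
    }
  } else if (isSplitRange) {
    if (tempRange == null) {
      submit(range);                          // case 1: submit directly
    } else if (tempRange.getNumEntries() + range.getNumEntries() > maxEntries) {
      submit(tempRange);                      // case 2: submit both separately
      submit(range);
      tempRange = null;
    } else {
      submit(merge(tempRange, range));        // case 3: submit as one merged range
      tempRange = null;
    }
  } else {
    if (tempRange == null) {
      tempRange = range;                      // case 1: start buffering
    } else if (tempRange.getNumEntries() + range.getNumEntries() > maxEntries) {
      submit(tempRange);                      // case 2: flush, then re-buffer
      tempRange = range;
    } else {
      tempRange = merge(tempRange, range);    // case 3: keep accumulating
    }
  }
}
```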
...e/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/service/RangeCompactionService.java (review thread resolved)
…ctory-aware range processing and adaptive splitting based on directory hierarchy. Enhance logging for better traceability during compaction operations.
…nt-finer-compaction-and-provide-benchmark-data
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.
Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.
What changes were proposed in this pull request?
see apache#8178
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-12960
How was this patch tested?
(Please explain how this patch was tested. Ex: unit tests, manual tests, workflow run on the fork git repo.)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this.)