HDDS-8943. [Snapshot] Limit the total size of sst files in bootstrapping tarball. #5014
Conversation
```xml
<property>
  <name>ozone.om.ratis.snapshot.max.total.sst.size</name>
  <value>100000000</value>
</property>
```
What is the unit (MB/GB)?
It is just bytes. Should be MB. I'll change it.
Resolved review threads on:
- hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/HddsServerUtil.java (outdated)
- hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/RDBSnapshotProvider.java
- ...op-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMRatisSnapshots.java (outdated)
```java
if (copySize.get() + fileSize > maxTotalSstSize) {
  return false;
}
```
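One way to address the single-huge-SST concern discussed in this thread is to always admit at least one file per tarball, so even an oversized SST eventually ships. The sketch below is only an illustration of that idea; `SstBatchSelector`, `selectBatch`, and their signatures are hypothetical names, not Ozone's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a size-capped batch selector that always admits at
// least one file per batch. A single SST larger than maxTotalSstSize is
// still included (alone), guaranteeing forward progress.
public class SstBatchSelector {
  static Map<String, Long> selectBatch(Map<String, Long> candidates,
                                       long maxTotalSstSize) {
    Map<String, Long> batch = new LinkedHashMap<>();
    long copySize = 0;
    for (Map.Entry<String, Long> e : candidates.entrySet()) {
      long fileSize = e.getValue();
      // Admit the file if it fits, OR if the batch is still empty --
      // the latter clause is what rescues an oversized SST.
      if (copySize + fileSize <= maxTotalSstSize || batch.isEmpty()) {
        batch.put(e.getKey(), fileSize);
        copySize += fileSize;
      } else {
        break; // close this tarball; the follower requests another one
      }
    }
    return batch;
  }

  public static void main(String[] args) {
    Map<String, Long> files = new LinkedHashMap<>();
    files.put("huge.sst", 150_000_000L); // exceeds the 100 MB cap by itself
    files.put("small.sst", 1_000_000L);
    // The oversized file is still selected, alone, in the first batch.
    System.out.println(selectBatch(files, 100_000_000L).size());
  }
}
```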
Hmm. What happens if there is one single huge SST file (exceeding 100 MB) in the DB? With the current logic, will that SST file ever be included in the tarball?
This could happen if some SST goes deep down in the compaction levels (or when SST compression is disabled, which shouldn't happen in production, but can be used for testing).
It looks like the default single SST file size limit is 64 MB:

ozone/hadoop-hdds/container-service/src/test/resources/123-dn-container.db/OPTIONS-000036, line 128 in 275653e:

```
target_file_size_base=67108864
```

And the level multiplier is 1 as well:

ozone/hadoop-hdds/container-service/src/test/resources/123-dn-container.db/OPTIONS-000036, line 109 in 275653e:

```
target_file_size_multiplier=1
```

which implies that the largest SST file currently in Ozone should be around 64 MB, according to the tuning guide:

> target_file_size_base and target_file_size_multiplier -- Files in level 1 will have target_file_size_base bytes. Each next level's file size will be target_file_size_multiplier bigger than previous one. However, by default target_file_size_multiplier is 1, so files in all L1..Lmax levels are equal. Increasing target_file_size_base will reduce total number of database files, which is generally a good thing. We recommend setting target_file_size_base to be max_bytes_for_level_base / 10, so that there are 10 files in level 1.
So in theory this wouldn't be a problem with ozone.om.ratis.snapshot.max.total.sst.size set to 100 MB (as it currently is), because the largest SST file wouldn't exceed that. However, IMO it could still be an issue if ozone.om.ratis.snapshot.max.total.sst.size is misconfigured or if the RocksDB config is tuned in the future.
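A quick sanity check of the arithmetic above: target_file_size_base is a binary 64 MiB, while the 100000000-byte cap is a decimal 100 MB (about 95.4 MiB), so the default largest SST still fits under the cap. The variable names below are illustrative, not Ozone configuration keys:

```java
// Sanity-checking the sizes discussed above. With a file-size multiplier
// of 1, every level's target file size equals the base, so the largest
// expected SST (~64 MiB) fits under the 100000000-byte snapshot cap.
public class SstSizeMath {
  public static void main(String[] args) {
    long targetFileSizeBase = 67_108_864L;   // from the OPTIONS file
    int targetFileSizeMultiplier = 1;        // from the OPTIONS file
    long snapshotSstCap = 100_000_000L;      // proposed decimal 100 MB cap

    System.out.println(targetFileSizeBase == 64L * 1024 * 1024);  // true
    long largestExpectedSst = targetFileSizeBase * targetFileSizeMultiplier;
    System.out.println(largestExpectedSst < snapshotSstCap);      // true
  }
}
```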
@prashantpogde accidentally merged this PR. You may open an addendum PR for the same JIRA (or filing another JIRA will do as well).
Will do, thanks @smengcl !
smengcl left a comment:
Thanks a lot @GeorgeJahad for the improvement. I have a few comments inline.
Co-authored-by: Siyao Meng <[email protected]>
Thank you @GeorgeJahad for making these changes.
Agreed. I'll fix it in an addendum PR. Thanks for the research @smengcl!
What changes were proposed in this pull request?
The incremental checkpointing PR I created here (#4770) allows multiple increments of data if the initial tarball takes too long to download.
This is needed for OM snapshots but is not by itself sufficient, because the first increment can grow quite large: it includes all SST files that exist at the time it is created. (Each subsequent increment only includes the SST files created after the first tarball was created.)
This means there is no limit on the size of the initial tarball.
To alleviate that, this PR adds the "max sst size" config flag: OZONE_OM_RATIS_SNAPSHOT_MAX_TOTAL_SST_SIZE_KEY.
The tarball creation process has been modified to close the tarball once it reaches that total size of SST files. If that leaves the tarball incomplete, the follower will know to request another, incremental one.
Note that with this design all the non-SST files are in the final tarball, because they are mutable and we want the latest version of each.
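The multi-round download described above can be sketched as a follower-side loop that keeps requesting tarballs until the leader marks one as complete. This is only an illustration of the protocol shape; `SnapshotSource`, `DownloadResult`, and `fetchTarball` are hypothetical names, not Ozone's actual API:

```java
// Hypothetical sketch of the follower-side loop implied above: request
// tarballs round by round; each capped tarball carries a slice of the SST
// files, and only the final (complete) one carries the mutable non-SST files.
public class IncrementalDownload {
  record DownloadResult(boolean complete, int sstFilesReceived) {}

  interface SnapshotSource {
    DownloadResult fetchTarball(int round);
  }

  static int downloadAll(SnapshotSource source) {
    int totalSstFiles = 0;
    int round = 0;
    DownloadResult r;
    do {
      r = source.fetchTarball(round++);
      totalSstFiles += r.sstFilesReceived();
    } while (!r.complete()); // keep asking until the leader says it's done
    return totalSstFiles;
  }

  public static void main(String[] args) {
    // Simulate a leader that needs 3 rounds (4 SST files each) to ship all.
    SnapshotSource fake = round -> new DownloadResult(round >= 2, 4);
    System.out.println(downloadAll(fake)); // 12
  }
}
```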
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-8943
How was this patch tested?
Tests were updated accordingly. However, due to another bug, one of the test classes is entirely disabled: https://issues.apache.org/jira/browse/HDDS-8952
It will be re-enabled once that ticket is addressed.