Conversation

@GeorgeJahad
Contributor

What changes were proposed in this pull request?

The incremental checkpointing PR I created here: #4770 allows multiple increments of data if the initial tarball took too long to download.

This is needed for OM snapshots but is not completely sufficient, because the first increment can grow quite large: it includes all sst files that exist at the time it is created. (Each subsequent increment only includes the sst files created after the first tarball was created.)

This means there is no limit to the size of the initial tarball.

In order to alleviate that, this PR adds the "max sst size" config flag: OZONE_OM_RATIS_SNAPSHOT_MAX_TOTAL_SST_SIZE_KEY.

The tarball creation process has been modified to close the tarball once the total size of its sst files reaches that limit. If that leaves the tarball incomplete, the follower will know to request another, incremental one.

Note that with this design all the non-sst files are included in the final tarball, because they are mutable and we want the latest version of each.
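The capped copy loop described above can be sketched roughly as follows (class, method, and field names here are illustrative, not the actual Ozone code):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class TarballSketch {
  // Illustrative cap: roughly 100 MB of sst data per tarball increment.
  static final long MAX_TOTAL_SST_SIZE = 100_000_000L;

  /**
   * Returns true if the tarball is complete, false if it was closed early
   * because adding the next sst file would exceed the size cap; the
   * follower then requests another, incremental tarball.
   */
  static boolean addSstFiles(List<Long> sstFileSizes, AtomicLong copySize) {
    for (long fileSize : sstFileSizes) {
      if (copySize.get() + fileSize > MAX_TOTAL_SST_SIZE) {
        return false;  // close this tarball; more increments to follow
      }
      copySize.addAndGet(fileSize);  // "copy" the file into the tarball
    }
    return true;  // all sst files fit; this is the final tarball
  }

  public static void main(String[] args) {
    AtomicLong copied = new AtomicLong();
    // Three 40 MB files: the third would push the total past 100 MB.
    boolean complete =
        addSstFiles(List.of(40_000_000L, 40_000_000L, 40_000_000L), copied);
    System.out.println(complete + " " + copied.get());  // prints: false 80000000
  }
}
```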

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-8943

How was this patch tested?

Tests were updated accordingly. However, due to another bug, one of the test classes is entirely disabled: https://issues.apache.org/jira/browse/HDDS-8952

It will be re-enabled once that ticket is addressed.

@GeorgeJahad GeorgeJahad changed the title from "HDDS-8943. Limit the total size of sst files in bootstrapping tarball." to "HDDS-8943. [Snapshot] Limit the total size of sst files in bootstrapping tarball." on Jul 1, 2023
@adoroszlai adoroszlai added the "snapshot" label (https://issues.apache.org/jira/browse/HDDS-6517) on Jul 2, 2023
```xml
</property>
<property>
  <name>ozone.om.ratis.snapshot.max.total.sst.size</name>
  <value>100000000</value>
```
Contributor


What is the unit (MB/GB)?

Contributor Author


It is just bytes. It should be MB. I'll change it.
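For reference, the default value above is interpreted as plain bytes, so 100000000 is exactly 100 MB in decimal units and roughly 95.4 MiB in binary units; a quick check:

```java
public class UnitCheck {
  static double toMb(long bytes)  { return bytes / 1_000_000.0; }        // decimal MB
  static double toMib(long bytes) { return bytes / (1024.0 * 1024.0); }  // binary MiB

  public static void main(String[] args) {
    // Default of ozone.om.ratis.snapshot.max.total.sst.size, in bytes.
    long value = 100_000_000L;
    System.out.printf("%.1f MB, %.1f MiB%n", toMb(value), toMib(value));
    // prints: 100.0 MB, 95.4 MiB
  }
}
```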

@smengcl smengcl self-requested a review July 10, 2023 21:41
Comment on lines +275 to +276

```java
if (copySize.get() + fileSize > maxTotalSstSize) {
  return false;
```
Contributor

@smengcl smengcl Jul 10, 2023


Hmm. What happens if there is one single huge SST file (exceeding 100 MB) in the DB? In that case will that SST file ever be included in the tarball with the current logic?

It might happen if some SST file goes deep down in the compaction levels (or when SST compression is disabled, which shouldn't happen in production but can be used for testing).
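The concern can be demonstrated with the check quoted above: if a single sst file is larger than the cap, the condition holds even for an empty tarball, so the file is never copied (names below are illustrative, mirroring the quoted condition):

```java
public class OversizedSstDemo {
  static final long MAX_TOTAL_SST_SIZE = 100_000_000L;  // 100 MB cap

  // Mirrors the quoted check: can this file still fit in the current tarball?
  static boolean fits(long copySize, long fileSize) {
    return copySize + fileSize <= MAX_TOTAL_SST_SIZE;
  }

  public static void main(String[] args) {
    long hugeSst = 150_000_000L;  // a single 150 MB sst file
    // Even with an empty tarball (copySize == 0) the file never fits,
    // so with the current logic it would be skipped on every increment.
    System.out.println(fits(0L, hugeSst));  // prints: false
  }
}
```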

Contributor

@smengcl smengcl Jul 11, 2023


It looks like the default single SST file size limit is 64 MB, and the level multiplier is 1 as well, which implies that the largest SST file currently in Ozone should be around 64 MB, according to the tuning guide:

"target_file_size_base and target_file_size_multiplier -- Files in level 1 will have target_file_size_base bytes. Each next level's file size will be target_file_size_multiplier bigger than previous one. However, by default target_file_size_multiplier is 1, so files in all L1..Lmax levels are equal. Increasing target_file_size_base will reduce total number of database files, which is generally a good thing. We recommend setting target_file_size_base to be max_bytes_for_level_base / 10, so that there are 10 files in level 1."

So, in theory, this wouldn't be a problem if the ozone.om.ratis.snapshot.max.total.sst.size config is set to 100 MB (as it currently is), because the largest SST file wouldn't exceed that. However, IMO it could still be an issue when ozone.om.ratis.snapshot.max.total.sst.size is misconfigured or when the RocksDB config is tuned in the future.
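The sizing argument above can be checked with a little arithmetic (the constants mirror the defaults quoted from the tuning guide; the helper is only an illustration of its formula):

```java
public class SstSizeEstimate {
  // Tuning-guide formula: files in level N are base * multiplier^(N-1) bytes.
  static long fileSizeAtLevel(long base, int multiplier, int level) {
    long size = base;
    for (int i = 2; i <= level; i++) {
      size *= multiplier;
    }
    return size;
  }

  public static void main(String[] args) {
    long base = 64L * 1024 * 1024;  // target_file_size_base default (64 MB)
    int  mult = 1;                  // target_file_size_multiplier default
    // With multiplier 1 every level's files stay at 64 MB, so the largest
    // sst file should stay well under a 100 MB cap.
    System.out.println(fileSizeAtLevel(base, mult, 6));  // prints: 67108864
  }
}
```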

@prashantpogde accidentally merged this PR. You may open an addendum PR for the same jira (or file another jira; that will do as well).

Contributor Author


Will do, thanks @smengcl !

Contributor

@smengcl smengcl left a comment


Thanks a lot @GeorgeJahad for the improvement. I have a few comments inline.

@prashantpogde prashantpogde merged commit 66c4c06 into apache:master Jul 11, 2023
@prashantpogde
Contributor

Thank you @GeorgeJahad for making these changes.

@GeorgeJahad
Contributor Author

However, IMO it could still be an issue when ozone.om.ratis.snapshot.max.total.sst.size is misconfigured or when the RocksDB config is tuned in the future.

Agreed. I'll fix it in an addendum pr. Thanks for the research @smengcl !
