HDDS-13003. [Design Doc] Snapshot Defragmentation to reduce storage footprint #8514
Closed

Commits (8):
- `5cd218d` HDDS-13003. [Design Doc] Snapshot Compaction to reduce storage footprint (swamirishi)
- `82ca33e` Merge remote-tracking branch 'apache/master' into HEAD (swamirishi)
- `17573a6` HDDS-13003. Add force manual compaction for first snapshot in chain (swamirishi)
- `421373a` Update SnapshotCompaction.md (swamirishi)
- `8f0b329` Apply suggestion from @jojochuang (smengcl)
- `c12e519` Change term: Snapshot Compaction -> Snapshot Defragmentation (smengcl)
- `e0d2e64` Rename the design doc file itself (smengcl)
- `e7f846c` Address Wei-Chiu's comments. (smengcl)
# Improving Snapshot Scale
[HDDS-13003](https://issues.apache.org/jira/browse/HDDS-13003)
# Problem Statement
In Apache Ozone, creating a snapshot currently takes a checkpoint of the Active Object Store (AOS) RocksDB, and the compaction of SST files is tracked over time. This model is efficient when snapshots are short-lived, because a checkpoint is essentially a set of hard links to the SST files of the AOS RocksDB. However, if an older snapshot persists while significant churn (compactions and writes) occurs in the AOS RocksDB, the snapshot RocksDB can diverge significantly from both the AOS RocksDB and other snapshot RocksDB instances: the snapshot keeps its old SST files alive while the AOS holds rewritten copies of much of the same data. This divergence causes storage requirements to grow linearly with the number of snapshots.
# Solution Proposal
The primary inefficiency in the current snapshot mechanism stems from continuous RocksDB compactions in the AOS, which can cause the same key, file, or directory entry to appear in multiple SST files retained by different snapshots. Ideally, each unique key, file, or directory entry should reside in only one SST file, eliminating redundant storage and the multiplier effect caused by snapshots. If this is achieved, the total RocksDB size would be proportional to the total number of unique keys in the system rather than to the number of snapshots.
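To make the multiplier effect concrete, here is a toy model, not Ozone code: an SST file is modeled as a set of entries (single letters, assumed identical wherever the same letter appears) and a snapshot checkpoint as a set of hard-linked SST file names. The class and method names are hypothetical. Hard links mean each distinct file is stored once, yet an entry rewritten by AOS compaction into a new file is still stored again.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of snapshot storage; names are illustrative only. */
public class SnapshotFootprintSketch {

    /** Entries physically on disk: each distinct SST file counts once, since hard links share data. */
    static int storedEntries(Map<String, Set<String>> sstContents, List<Set<String>> snapshots) {
        Set<String> distinctFiles = new HashSet<>();
        for (Set<String> snapshot : snapshots) {
            distinctFiles.addAll(snapshot);
        }
        int total = 0;
        for (String file : distinctFiles) {
            total += sstContents.get(file).size();
        }
        return total;
    }

    /** Ideal footprint: every unique entry stored exactly once. */
    static int uniqueEntries(Map<String, Set<String>> sstContents, List<Set<String>> snapshots) {
        Set<String> unique = new HashSet<>();
        for (Set<String> snapshot : snapshots) {
            for (String file : snapshot) {
                unique.addAll(sstContents.get(file));
            }
        }
        return unique.size();
    }

    public static void main(String[] args) {
        // snap1 was taken before the AOS compacted sst1 + sst2 (plus a new key d) into sst3;
        // snap2 was taken afterwards and links only sst3.
        Map<String, Set<String>> sst = Map.of(
                "sst1", Set.of("a", "b"),
                "sst2", Set.of("c"),
                "sst3", Set.of("a", "b", "c", "d"));
        List<Set<String>> snapshots = List.of(Set.of("sst1", "sst2"), Set.of("sst3"));
        System.out.println(storedEntries(sst, snapshots)); // 7
        System.out.println(uniqueEntries(sst, snapshots)); // 4
    }
}
```

Entries `a`, `b` and `c` are stored twice because each snapshot pins a different SST file containing them; the goal stated above is to bring the first number down to the second.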
## Snapshot Compaction
Currently, automatic RocksDB compactions are disabled for snapshot RocksDB instances to preserve snapshot diff performance, so no form of compaction takes place on them. However, snapshots can still be compacted, in the sense used here, if the next snapshot in the chain is represented as a checkpoint of the previous snapshot plus a diff stored in a separate SST file. The proposed approach rewrites snapshots iteratively, starting from the beginning of the snapshot chain, and stores the restructured snapshots in a separate directory. Note that this has nothing to do with compacting a snapshot's RocksDB: RocksDB auto compaction will not be enabled on snapshot RocksDB instances.
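The chain rewrite can be sketched in memory as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: each snapshot is modeled as a sorted key/value map, each "separate SST file" as an in-memory diff layer, and all names are hypothetical. The first snapshot in the chain becomes a full layer; every later snapshot stores only its diff against the previous one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical in-memory model of rewriting a snapshot chain as base + diff layers. */
public class SnapshotLayeringSketch {

    /** Diff of next vs prev: changed keys map to their new value, deleted keys to Optional.empty(). */
    static NavigableMap<String, Optional<String>> diff(Map<String, String> prev, Map<String, String> next) {
        NavigableMap<String, Optional<String>> layer = new TreeMap<>();
        for (Map.Entry<String, String> e : next.entrySet()) {
            if (!e.getValue().equals(prev.get(e.getKey()))) {
                layer.put(e.getKey(), Optional.of(e.getValue())); // added or modified
            }
        }
        for (String key : prev.keySet()) {
            if (!next.containsKey(key)) {
                layer.put(key, Optional.empty()); // tombstone: deleted since prev
            }
        }
        return layer;
    }

    /** Walks the chain from the first snapshot: layer 0 is a full copy, layer i the diff vs snapshot i-1. */
    static List<NavigableMap<String, Optional<String>>> rewriteChain(List<Map<String, String>> chain) {
        List<NavigableMap<String, Optional<String>>> layers = new ArrayList<>();
        Map<String, String> prev = Collections.emptyMap();
        for (Map<String, String> snapshot : chain) {
            layers.add(diff(prev, snapshot));
            prev = snapshot;
        }
        return layers;
    }

    /** Reads snapshot i back by applying layers 0..i in order. */
    static Map<String, String> materialize(List<NavigableMap<String, Optional<String>>> layers, int i) {
        Map<String, String> view = new TreeMap<>();
        for (int j = 0; j <= i; j++) {
            for (Map.Entry<String, Optional<String>> e : layers.get(j).entrySet()) {
                if (e.getValue().isPresent()) {
                    view.put(e.getKey(), e.getValue().get());
                } else {
                    view.remove(e.getKey());
                }
            }
        }
        return view;
    }

    public static void main(String[] args) {
        Map<String, String> s1 = Map.of("a", "1", "b", "1", "c", "1");
        Map<String, String> s2 = Map.of("a", "2", "b", "1", "c", "1"); // a modified
        Map<String, String> s3 = Map.of("a", "2", "c", "1", "d", "1"); // b deleted, d added
        List<NavigableMap<String, Optional<String>>> layers = rewriteChain(List.of(s1, s2, s3));
        int stored = layers.stream().mapToInt(Map::size).sum();
        System.out.println(stored);                            // 6 entries instead of 9
        System.out.println(materialize(layers, 2).equals(s3)); // true
    }
}
```

Deleted keys need an explicit tombstone in the diff layer; without it, reading a later snapshot through the earlier layers would resurrect them.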
1. ### Introducing a last compaction time
A new boolean flag (`needsCompaction`), a timestamp (`lastCompactionTime`), and an int (`version`) will be added to the snapshot metadata. If absent, `needsCompaction` defaults to `true`.
A new `Map<String, List<Long>>` (`uncompactedSstFiles`) also needs to be added to the snapshot metadata as part of the snapshot create operation. It stores the original list of SST files in the uncompacted copy of the snapshot for each of `keyTable`, `fileTable` and `directoryTable`.
Since this metadata is not going to be consistent across all OMs, it has to be written to a local YAML file inside the snapshot directory and loaded into memory by the SnapshotChainManager on startup. Updates to it therefore must not go through Ratis.
An additional `Map<Integer, Map<String, List<Long>>>` (`compactedSstFiles`) also needs to be added to the snapshot metadata. It maintains the SST file lists of the different compacted versions of a snapshot, keyed by the snapshot version number.
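A possible in-memory shape for this per-OM metadata is sketched below. Only the four field names come from the text above; the class name, the method names, the meaning of the `Long` values (assumed to be SST file numbers), `version == 0` standing for the uncompacted copy, and clearing `needsCompaction` after a compaction are all assumptions. Writing the object to the local YAML file is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory form of the local (non-Ratis) snapshot metadata. */
public class SnapshotLocalMetadataSketch {
    private boolean needsCompaction = true;  // defaults to true when absent
    private long lastCompactionTime = 0L;    // assumption: epoch millis, 0 = never compacted
    private int version = 0;                 // assumption: 0 = original, uncompacted checkpoint
    // table name (keyTable/fileTable/directoryTable) -> SST file numbers of the uncompacted copy
    private final Map<String, List<Long>> uncompactedSstFiles;
    // compacted version -> (table name -> SST file numbers of that version)
    private final Map<Integer, Map<String, List<Long>>> compactedSstFiles = new TreeMap<>();

    public SnapshotLocalMetadataSketch(Map<String, List<Long>> uncompactedSstFiles) {
        // Captured once, when the snapshot is created (shallow copy).
        this.uncompactedSstFiles = Map.copyOf(uncompactedSstFiles);
    }

    /** Records a newly compacted version and returns its version number. */
    public int recordCompaction(Map<String, List<Long>> sstFilesByTable, long nowMillis) {
        version++;
        compactedSstFiles.put(version, Map.copyOf(sstFilesByTable));
        lastCompactionTime = nowMillis;
        needsCompaction = false; // assumption: cleared after a successful compaction
        return version;
    }

    /** SST files backing the version of the snapshot that reads should currently use. */
    public Map<String, List<Long>> currentSstFiles() {
        return version == 0 ? uncompactedSstFiles : compactedSstFiles.get(version);
    }

    public boolean needsCompaction() { return needsCompaction; }
    public long getLastCompactionTime() { return lastCompactionTime; }
    public int getVersion() { return version; }
}
```

Keeping every compacted version in `compactedSstFiles`, rather than only the latest, matches the text's description of a map keyed by version number.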
2. ### Snapshot Cache Lock for Read Prevention
A snapshot lock will be introduced in the snapshot cache to prevent reads on a specific snapshot during compaction. This ensures no active reads occur while replacing the underlying RocksDB instance.
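One common way to get this behavior is a per-snapshot `java.util.concurrent.locks.ReentrantReadWriteLock`: reads share the read lock, and swapping in the compacted DB takes the write lock, which waits for in-flight reads and blocks new ones. The sketch below assumes that approach; the class, its methods, and the generic handle type `D` (standing in for a RocksDB handle) are hypothetical and not the snapshot cache's actual API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

/** Hypothetical snapshot cache guarding each snapshot's DB handle with a read/write lock. */
public class SnapshotCacheLockSketch<D> {
    private final Map<String, ReadWriteLock> locks = new ConcurrentHashMap<>();
    private final Map<String, D> handles = new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(String snapshotId) {
        return locks.computeIfAbsent(snapshotId, id -> new ReentrantReadWriteLock());
    }

    /** Runs a read against the snapshot while holding its shared (read) lock. */
    public <R> R read(String snapshotId, Function<D, R> reader) {
        ReadWriteLock lock = lockFor(snapshotId);
        lock.readLock().lock();
        try {
            D db = handles.get(snapshotId);
            if (db == null) {
                throw new IllegalStateException("Snapshot not loaded: " + snapshotId);
            }
            return reader.apply(db);
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Installs a new handle (e.g. the compacted DB) under the exclusive (write) lock.
     * Returns the previous handle, or null if none, so the caller can close it.
     */
    public D replace(String snapshotId, D newHandle) {
        ReadWriteLock lock = lockFor(snapshotId);
        lock.writeLock().lock();
        try {
            return handles.put(snapshotId, newHandle);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Returning the old handle lets the caller close it outside the lock, so the exclusive section stays as short as a map update.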
3. ### Directory Structure Changes
Snapshots currently reside in the `db.checkpoints` directory. The proposal introduces a `db.checkpoints.compacted` directory for compacted snapshots. The OM DB checkpoint directory names inside the `db.checkpoints.compacted` directory should follow this format:
Review comment (Contributor):

Actually, snapshots are under `db.snapshots`, e.g.:

- `/var/lib/hadoop-ozone/om/data/db.snapshots`
- `/var/lib/hadoop-ozone/om/data/db.snapshots/diffState`
- `/var/lib/hadoop-ozone/om/data/db.snapshots/diffState/compaction-sst-backup`
- `/var/lib/hadoop-ozone/om/data/db.snapshots/diffState/snapDiff`
- `/var/lib/hadoop-ozone/om/data/db.snapshots/diffState/compaction-log`
- `/var/lib/hadoop-ozone/om/data/db.snapshots/checkpointState`

Newly created FS snapshots would be under `./db.snapshots/checkpointState`.