HDDS-13847. Introduce Snapshot Content Lock to lock table contents#9212
HDDS-13847. Introduce Snapshot Content Lock to lock table contents#9212swamirishi merged 5 commits intoapache:masterfrom
Conversation
Change-Id: I35cabd1baee6887034541a9b4ee76578c1015805
|
@SaketaChalamchala can you review this patch? |
Change-Id: Ia518f4b87df6565440a33d406a7d3107454ba1c2 # Conflicts: # hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/lock/FlatResource.java
Change-Id: Ib1d353f5c5a5cc2be8ecbebf853b29d36754ce52
Change-Id: I2cc6932b54a3458fc6ffe817fe5bd3403eac4f63
|
|
||
| IOzoneManagerLock lock = omMetadataManager.getLock(); | ||
| UUID fromSnapshotId = fromSnapshot.getSnapshotId(); | ||
| OMLockDetails lockDetails = lock.acquireReadLock(SNAPSHOT_DB_CONTENT_LOCK, fromSnapshotId.toString()); |
There was a problem hiding this comment.
do we only use read lock on SNAPSHOT_DB_CONTENT_LOCK?
There was a problem hiding this comment.
We would acquire a write lock when we want to do a defrag
There was a problem hiding this comment.
Acquire write GC_LOCK
Snapshot is flushed
take a checkpoint on tmp path
compact if first snapshot(kforce Compaction)
update keyTable, fileTable, directory()
Acquire write SNAPSHOT_DB_CONTENT_LOCK
truncate and ingest other tables(dumpToSst and ingestSstFile)
delete if new prodPath if exists and mv tmp to production version snapshotDB path
bump version snapshotLocalDataYaml -> new versionPath
SNapshot cache lock on snapshotId(previosu rocksdb instance)
delete of old version of snapshot path
release snapshot cache lock on snapshotId
releaseWrite content Lock
Release gc lock
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a new lock type SNAPSHOT_DB_CONTENT_LOCK to protect snapshot RocksDB contents during operations that read or modify snapshot data. The changes add proper locking around snapshot database access during key purge and snapshot move operations.
Key changes:
- Added a new
SNAPSHOT_DB_CONTENT_LOCKflat resource for locking snapshot database contents - Enhanced
MultiSnapshotLocksto accept an initial capacity parameter for better memory efficiency - Added lock acquisition/release around snapshot database operations in purge and move responses
- Added test coverage to verify lock acquisition behavior
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| FlatResource.java | Adds new SNAPSHOT_DB_CONTENT_LOCK enum value |
| MultiSnapshotLocks.java | Adds constructor parameter for initial list capacity optimization |
| OMSnapshotMoveTableKeysResponse.java | Acquires and releases snapshot DB content locks around snapshot operations |
| OMKeyPurgeResponse.java | Acquires and releases snapshot DB content locks during key purge |
| OMDirectoriesPurgeResponseWithFSO.java | Acquires and releases snapshot DB content locks during directory purge |
| ReclaimableFilter.java | Passes capacity hint to MultiSnapshotLocks constructor |
| SnapshotDeletingService.java | Passes capacity hint to MultiSnapshotLocks constructor |
| SnapshotDefragService.java | Passes capacity hint to MultiSnapshotLocks constructor |
| TestSnapshotRequestAndResponse.java | Wraps metadata manager with spy, adds spy import, removes trailing blank line |
| TestOMSnapshotMoveTableKeysResponse.java | Adds test coverage for lock acquisition verification |
| TestOMKeyRequest.java | Removes trailing blank line |
| TestOMKeyPurgeRequestAndResponse.java | Adds test coverage for lock acquisition verification |
| TestOMDirectoriesPurgeRequestAndResponse.java | Adds test coverage for lock acquisition verification |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/lock/FlatResource.java
Outdated
Show resolved
Hide resolved
Change-Id: I5300e857b55b24d671e823afbe10fbd10f071374
|
thank you @smengcl for reviewing the patch |
What changes were proposed in this pull request?
Introduce Snapshot Content Lock to prevent writes on a snapshot during snapshot defrag. Even though GC lock would be acquired during defrag on leader the contents in the snapshot is only guaranteed to be same for leader however defrag on follower can happen at any time, thus a lock should be acquired while the defrag to prevent the contents from changing in the snapshot which would mainly happen in 3 transactions during OM double buffer flush on followers i.e.
OMSnapshotMoveTableKeysResponse, OMKeyPurgeResponse, OMDirectoriesPurgeResponseWithFSO.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-13847
How was this patch tested?
Additional unit tests added.