Skip to content

HDDS-13847. Introduce Snapshot Content Lock to lock table contents#9212

Merged
swamirishi merged 5 commits intoapache:masterfrom
swamirishi:HDDS-13847
Nov 1, 2025
Merged

HDDS-13847. Introduce Snapshot Content Lock to lock table contents#9212
swamirishi merged 5 commits intoapache:masterfrom
swamirishi:HDDS-13847

Conversation

@swamirishi
Copy link
Contributor

What changes were proposed in this pull request?

Introduce Snapshot Content Lock to prevent writes on a snapshot during snapshot defrag. Even though GC lock would be acquired during defrag on leader the contents in the snapshot is only guaranteed to be same for leader however defrag on follower can happen at any time, thus a lock should be acquired while the defrag to prevent the contents from changing in the snapshot which would mainly happen in 3 transactions during OM double buffer flush on followers i.e. 
OMSnapshotMoveTableKeysResponse, OMKeyPurgeResponse, OMDirectoriesPurgeResponseWithFSO.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13847

How was this patch tested?

Additional unit tests added.

Change-Id: I35cabd1baee6887034541a9b4ee76578c1015805
@swamirishi swamirishi added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Oct 28, 2025
@swamirishi
Copy link
Contributor Author

@SaketaChalamchala can you review this patch?

Change-Id: Ia518f4b87df6565440a33d406a7d3107454ba1c2

# Conflicts:
#	hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/lock/FlatResource.java
Change-Id: Ib1d353f5c5a5cc2be8ecbebf853b29d36754ce52
Change-Id: I2cc6932b54a3458fc6ffe817fe5bd3403eac4f63

IOzoneManagerLock lock = omMetadataManager.getLock();
UUID fromSnapshotId = fromSnapshot.getSnapshotId();
OMLockDetails lockDetails = lock.acquireReadLock(SNAPSHOT_DB_CONTENT_LOCK, fromSnapshotId.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we only use read lock on SNAPSHOT_DB_CONTENT_LOCK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would acquire a write lock when we want to do a defrag

Copy link
Contributor Author

@swamirishi swamirishi Nov 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acquire write GC_LOCK
Snapshot is flushed

   take a checkpoint on tmp path
   compact if first snapshot(kforce Compaction)
   update keyTable, fileTable, directory()
   Acquire write SNAPSHOT_DB_CONTENT_LOCK
    truncate and ingest other tables(dumpToSst and ingestSstFile)
    delete if new prodPath if exists and mv tmp to production version snapshotDB path
    bump version snapshotLocalDataYaml -> new versionPath
    SNapshot cache lock on snapshotId(previosu rocksdb instance)
        delete of old version of snapshot path
    release snapshot cache lock on snapshotId
   releaseWrite content Lock
Release gc lock

@smengcl smengcl requested a review from Copilot November 1, 2025 02:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new lock type SNAPSHOT_DB_CONTENT_LOCK to protect snapshot RocksDB contents during operations that read or modify snapshot data. The changes add proper locking around snapshot database access during key purge and snapshot move operations.

Key changes:

  • Added a new SNAPSHOT_DB_CONTENT_LOCK flat resource for locking snapshot database contents
  • Enhanced MultiSnapshotLocks to accept an initial capacity parameter for better memory efficiency
  • Added lock acquisition/release around snapshot database operations in purge and move responses
  • Added test coverage to verify lock acquisition behavior

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
FlatResource.java Adds new SNAPSHOT_DB_CONTENT_LOCK enum value
MultiSnapshotLocks.java Adds constructor parameter for initial list capacity optimization
OMSnapshotMoveTableKeysResponse.java Acquires and releases snapshot DB content locks around snapshot operations
OMKeyPurgeResponse.java Acquires and releases snapshot DB content locks during key purge
OMDirectoriesPurgeResponseWithFSO.java Acquires and releases snapshot DB content locks during directory purge
ReclaimableFilter.java Passes capacity hint to MultiSnapshotLocks constructor
SnapshotDeletingService.java Passes capacity hint to MultiSnapshotLocks constructor
SnapshotDefragService.java Passes capacity hint to MultiSnapshotLocks constructor
TestSnapshotRequestAndResponse.java Wraps metadata manager with spy, adds spy import, removes trailing blank line
TestOMSnapshotMoveTableKeysResponse.java Adds test coverage for lock acquisition verification
TestOMKeyRequest.java Removes trailing blank line
TestOMKeyPurgeRequestAndResponse.java Adds test coverage for lock acquisition verification
TestOMDirectoriesPurgeRequestAndResponse.java Adds test coverage for lock acquisition verification

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Change-Id: I5300e857b55b24d671e823afbe10fbd10f071374
@swamirishi swamirishi marked this pull request as ready for review November 1, 2025 03:51
@swamirishi swamirishi merged commit c21ec5d into apache:master Nov 1, 2025
53 of 55 checks passed
@swamirishi
Copy link
Contributor Author

thank you @smengcl for reviewing the patch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

snapshot https://issues.apache.org/jira/browse/HDDS-6517

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants