HDDS-9198. Changed snapshot purge to single purge instead of batch purge #6491
What changes were proposed in this pull request?
We found two race-condition issues, HDDS-10524 and HDDS-10590, which are fixed in PR #6443 and PR #6456 respectively.
The way batch snapshot purge is currently processed still has an issue.
As part of a snapshot purge, the deep clean flag of the next active snapshot is updated, along with the global previous and path previous of the next snapshots in the global and path-level chains. To do this, OMSnapshotPurgeRequest maintains two maps, updatedSnapInfos and updatedPathPreviousAndGlobalSnapshots, and OMSnapshotPurgeResponse then flushes these maps one after the other. This sequential flush can corrupt the snapshot chain.
For example, assume that the deep clean info update changes snapshots as {E -> E', F -> F', B' -> B'', G -> G'}, kept in updatedSnapInfos: [E', F', B'', G'], and that the previous-snapshot update changes snapshots as {A -> A', B -> B', C -> C', D -> D'}, kept in updatedPathPreviousAndGlobalSnapshots: [A', B', C', D'].
After the purge, the final snapshot list should be [A', B'', C', D', E', F', G']. But because these maps are added to the batch sequentially, the result is either [A', B', C', D', E', F', G'] or [A', B'', C', D', E', F', G'], depending on which map is added to the batch first. The problem can still occur even if the flush order of the two maps is fixed.
Ideally, the updates should be flushed in the same order in which the purge batch was processed.
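The effect can be reproduced with a small, self-contained sketch. It is not the Ozone code: the class `SnapshotFlushOrderDemo`, its methods, and the use of plain maps in place of the snapshot info table and the write batch are all hypothetical, and the interleaving in `updatesInOrder` is an assumption (only "B' is written before B''" is taken from the example above).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SnapshotFlushOrderDemo {

  // Stand-in for updatedSnapInfos: snapshot key -> version after the deep clean update.
  static Map<String, String> deepCleanUpdates() {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("E", "E'");
    m.put("F", "F'");
    m.put("B", "B''"); // B was already B' when the deep clean flag was set
    m.put("G", "G'");
    return m;
  }

  // Stand-in for updatedPathPreviousAndGlobalSnapshots.
  static Map<String, String> previousUpdates() {
    Map<String, String> m = new LinkedHashMap<>();
    m.put("A", "A'");
    m.put("B", "B'");
    m.put("C", "C'");
    m.put("D", "D'");
    return m;
  }

  // Flushes one whole map, then the other; for a shared key the second map wins.
  static Map<String, String> flushMapsSequentially(
      Map<String, String> first, Map<String, String> second) {
    Map<String, String> table = new LinkedHashMap<>();
    table.putAll(first);
    table.putAll(second);
    return table;
  }

  // Assumed order in which the request made the updates (B' strictly before B'').
  static List<String[]> updatesInOrder() {
    List<String[]> updates = new ArrayList<>();
    updates.add(new String[] {"A", "A'"});
    updates.add(new String[] {"B", "B'"});
    updates.add(new String[] {"C", "C'"});
    updates.add(new String[] {"D", "D'"});
    updates.add(new String[] {"E", "E'"});
    updates.add(new String[] {"F", "F'"});
    updates.add(new String[] {"B", "B''"});
    updates.add(new String[] {"G", "G'"});
    return updates;
  }

  // Replays the updates in the order they were made, so the latest version wins.
  static Map<String, String> flushInUpdateOrder(List<String[]> updates) {
    Map<String, String> table = new LinkedHashMap<>();
    for (String[] update : updates) {
      table.put(update[0], update[1]);
    }
    return table;
  }

  public static void main(String[] args) {
    System.out.println("deep clean map first: B = "
        + flushMapsSequentially(deepCleanUpdates(), previousUpdates()).get("B"));
    System.out.println("previous map first:   B = "
        + flushMapsSequentially(previousUpdates(), deepCleanUpdates()).get("B"));
    System.out.println("update order:         B = "
        + flushInUpdateOrder(updatesInOrder()).get("B"));
  }
}
```

In this sketch, flushing the deep clean map first leaves B at the stale B', while replaying the updates in the order they were made leaves B at B''. Flushing the previous map first also happens to give B'' here, but only because of how this particular example is interleaved.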
This change fixes the issue by making snapshot purge take one snapshot at a time rather than a list of snapshots. For backward compatibility, when a Ratis transaction contains a list of snapshots, a new object is introduced to maintain the order of the updates, so that they are flushed in the same order in which they were made in OMSnapshotPurgeRequest.
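As a rough illustration of why purging one snapshot at a time keeps the chain consistent, the sketch below models a single chain in which each snapshot records only its previous snapshot. It is a simplified, hypothetical model: `SinglePurgeDemo`, `Chain`, and `purgeOneAtATime` are not Ozone classes or methods, and the real code tracks both a global and a path-level chain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SinglePurgeDemo {

  // Hypothetical, simplified snapshot chain: snapshot -> its previous snapshot (null for the first).
  static final class Chain {
    private final Map<String, String> previous = new LinkedHashMap<>();

    Chain(List<String> orderedSnapshots) {
      String prev = null;
      for (String snapshot : orderedSnapshots) {
        previous.put(snapshot, prev);
        prev = snapshot;
      }
    }

    // Purges one snapshot and relinks whoever pointed at it to its previous snapshot.
    void purge(String snapshot) {
      if (!previous.containsKey(snapshot)) {
        throw new IllegalArgumentException("Unknown snapshot: " + snapshot);
      }
      String prevOfPurged = previous.remove(snapshot);
      for (Map.Entry<String, String> entry : previous.entrySet()) {
        if (snapshot.equals(entry.getValue())) {
          entry.setValue(prevOfPurged);
        }
      }
    }

    String previousOf(String snapshot) {
      return previous.get(snapshot);
    }

    List<String> snapshots() {
      return new ArrayList<>(previous.keySet());
    }
  }

  // Handles a list of snapshots (the backward-compatible case) by applying
  // each purge against the state left by the one before it.
  static Chain purgeOneAtATime(List<String> chain, List<String> toPurge) {
    Chain result = new Chain(chain);
    for (String snapshot : toPurge) {
      result.purge(snapshot);
    }
    return result;
  }

  public static void main(String[] args) {
    Chain chain = purgeOneAtATime(List.of("A", "B", "C", "D"), List.of("B", "C"));
    System.out.println(chain.snapshots() + ", previous of D = " + chain.previousOf("D"));
  }
}
```

Because each purge sees the result of the previous one, purging B and then C relinks D to A; no update is computed from a stale view of the chain.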
What is the link to the Apache JIRA
HDDS-9198
How was this patch tested?
Added and updated unit tests.