HDDS-8973. Ozone SCM HA should not allocate duplicate IDs when transferring leadership #5018
What changes were proposed in this pull request?
Currently, when we manually transfer SCM leadership, the old SCM leader may allocate IDs that duplicate IDs allocated by the new SCM leader. This can cause serious issues. For example, if the SCM allocates the same ID to the Blocks of CreateKey requests and those Blocks are allocated to the same Container, the Blocks will overwrite each other's chunk files and the data will be lost.
Reproduce
Generate a sustained, heavy write load and transfer SCM leadership with the leadership transfer command; the resulting log messages can then be observed on the DN.
Root Cause
The reason for this problem is that batch.lastId is updated before stateManager.allocateBatch has executed successfully:
ozone/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SequenceIdGenerator.java
Line 124 in dd25740
As a result, subsequent requests from other threads can be served an illegitimate ID from a batch that was never successfully allocated:
ozone/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SequenceIdGenerator.java
Lines 114 to 116 in dd25740
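To make the ordering problem concrete, here is a minimal, self-contained Java sketch. It is not the real SequenceIdGenerator: the classes Store and Generator, the leader flag, the methods persist, refillNaive and refillSafe, and the batch size of 100 are all hypothetical simplifications, assuming that persisting a batch (the role Store.allocateBatch plays here) throws once the SCM has lost leadership. The naive path advances lastId before the persist call, so after a failed call the old SCM keeps serving IDs from a batch that was never persisted. The safe path only touches the in-memory batch after the persist succeeds. This illustrates one possible ordering fix and may differ from the actual change in this PR.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

public class SequenceIdSketch {
  static final long BATCH_SIZE = 100;
  /** Hypothetical stand-in for the replicated "last allocated ID". */
  static class Store {
    private long persistedLastId = 0;
    // Persists a new batch only if prevLastId is still the current value.
    synchronized boolean allocateBatch(long prevLastId, long newLastId) {
      if (persistedLastId != prevLastId) return false;
      persistedLastId = newLastId;
      return true;
    }
    synchronized long getLastId() { return persistedLastId; }
  }

  /** One SCM's in-memory batch [nextId, lastId]; not the real Ozone class. */
  static class Generator {
    private final Store store;
    private final boolean naive;
    private final ReentrantLock lock = new ReentrantLock();
    volatile boolean leader = true;
    private long nextId = 1;
    private long lastId = 0;

    Generator(Store store, boolean naive) {
      this.store = store;
      this.naive = naive;
    }

    // Hypothetical: persisting a batch fails once this SCM is no longer leader.
    private boolean persist(long prevLastId, long newLastId) {
      if (!leader) throw new IllegalStateException("not leader");
      return store.allocateBatch(prevLastId, newLastId);
    }

    long getNextId() {
      lock.lock();
      try {
        if (nextId <= lastId) {
          return nextId++; // served from the in-memory batch, no persist call
        }
        return naive ? refillNaive() : refillSafe();
      } finally {
        lock.unlock();
      }
    }

    // Buggy ordering: lastId is advanced before the batch is persisted.
    private long refillNaive() {
      while (true) {
        long prevLastId = lastId;
        nextId = prevLastId + 1;
        lastId = prevLastId + BATCH_SIZE; // BUG: kept even if persist() throws
        if (persist(prevLastId, lastId)) {
          return nextId++;
        }
        lastId = store.getLastId(); // lost a race: reload and retry
      }
    }

    // Fixed ordering: in-memory state changes only after a successful persist.
    private long refillSafe() {
      long prevLastId = lastId;
      while (true) {
        long newLastId = prevLastId + BATCH_SIZE;
        if (persist(prevLastId, newLastId)) {
          nextId = prevLastId + 1;
          lastId = newLastId;
          return nextId++;
        }
        prevLastId = store.getLastId(); // lost a race: reload and retry
      }
    }
  }

  // True if an old and a new leader sharing one store hand out the same ID.
  public static boolean producesDuplicate(boolean naive) {
    Store store = new Store();
    Generator oldScm = new Generator(store, naive);
    Generator newScm = new Generator(store, naive);
    oldScm.leader = false; // leadership has moved to newScm
    Set<Long> oldIds = new HashSet<>();
    for (int i = 0; i < 2; i++) { // two in-flight requests on the old SCM
      try {
        oldIds.add(oldScm.getNextId());
      } catch (IllegalStateException e) { /* rejected: cannot persist a batch */ }
    }
    return oldIds.contains(newScm.getNextId());
  }
  public static void main(String[] args) {
    System.out.println("naive duplicate: " + producesDuplicate(true));
    System.out.println("safe duplicate: " + producesDuplicate(false));
  }
}
```

Running main prints "naive duplicate: true" and then "safe duplicate: false". With the naive ordering, the first request on the old SCM fails but leaves the unpersisted range 1..100 in memory, so the second request returns ID 1, while the new leader, starting from the persisted value 0, also allocates ID 1. With the safe ordering, both requests on the old SCM are rejected and no duplicate is handed out.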
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-8973
How was this patch tested?
unit test