HDDS-11763. Implement container repair logic within datanodes. #7474
errose28
left a comment
Thanks for working on this @aswinshakil
@aswinshakil can you resolve the merge conflicts?
…com/apache/ozone into HDDS-11763-repair
…pair
Conflicts:
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestReconcileContainerCommandHandler.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
  hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java
errose28
left a comment
Thanks for the updates @aswinshakil. I've only reviewed the non-test code for now since there are a lot of comments. I think we need to look into using BlockInputStream and seek for the reading part of repairs instead of implementing our own lower level logic.
errose28
left a comment
Thanks for the updates @aswinshakil. Still a fair amount of comments, although most are minor. I've reviewed everything except TestKeyValueHandler and TestContainerCommandReconciliation. Once the prod code looks good I will start reviewing those larger test changes.
I am done with my review. We can merge once @errose28 approves as well. We can add more tests in follow-up Jiras.
hemantk-12
left a comment
Thanks @aswinshakil for the patch.
    chunkByteBuffer.flip();
    ChunkBuffer chunkBuffer = ChunkBuffer.wrap(chunkByteBuffer);
    writeChunkForClosedContainer(chunkInfo, blockID, chunkBuffer, container);
We need a comment here also explaining that if we are missing a few chunks at the end and one update fails, we may get holes in the block.
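To make the concern concrete, here is a minimal sketch of how a hole can appear. It does not use any Ozone classes: `BlockHoleSketch`, `repairTail`, and the 4-byte `CHUNK_SIZE` are hypothetical names chosen for illustration, and a plain `FileChannel` stands in for the block file. It assumes each missing chunk is written at its own offset and that a failed chunk is skipped rather than aborting the whole repair.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Ozone implementation.
class BlockHoleSketch {
  static final int CHUNK_SIZE = 4;

  /**
   * Writes chunks firstMissing..lastMissing at offset index * CHUNK_SIZE.
   * The chunk at failingChunk simulates a failed update and is skipped.
   * Returns the indexes of the chunks that were not written.
   */
  static List<Integer> repairTail(Path blockFile, int firstMissing,
      int lastMissing, int failingChunk) throws IOException {
    List<Integer> failed = new ArrayList<>();
    try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.WRITE)) {
      for (int i = firstMissing; i <= lastMissing; i++) {
        if (i == failingChunk) {
          failed.add(i);
          continue;
        }
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {1, 1, 1, 1});
        long pos = (long) i * CHUNK_SIZE;
        // Positional write: writing past the current end grows the file.
        // The bytes in the skipped range are unspecified (commonly zeros).
        while (buf.hasRemaining()) {
          pos += ch.write(buf, pos);
        }
      }
    }
    return failed;
  }

  public static void main(String[] args) throws IOException {
    Path f = Files.createTempFile("block", ".data");
    Files.write(f, new byte[CHUNK_SIZE]);           // chunk 0 already present
    List<Integer> failed = repairTail(f, 1, 3, 2);  // chunk 2 fails
    System.out.println("failed=" + failed + " size=" + Files.size(f));
    Files.delete(f);
  }
}
```

In this sketch the file ends up long enough to cover chunk 3 even though chunk 2 was never written, so the length alone does not reveal the gap. That is the situation the requested code comment should warn about.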
errose28
left a comment
Thanks for the updates @aswinshakil. Just minor comments on the tests left. This part should be much simpler once HDDS-10374 and HDDS-11942 are done.
errose28
left a comment
LGTM once CI is green. Thanks for the consistent effort on this @aswinshakil.
b34c537 into apache:HDDS-10239-container-reconciliation
…anner-builds-mt
* HDDS-10239-container-reconciliation: (646 commits)
  HDDS-11763. Implement container repair logic within datanodes. (apache#7474)
  HDDS-11887. Recon - Identify container replicas difference based on content checksums (apache#7942)
  HDDS-12397. Persist putBlock for closed container. (apache#7943)
  HDDS-12361. Mark testGetSnapshotDiffReportJob as flaky
  HDDS-12375. Random object created and used only once (apache#7933)
  HDDS-12335. Fix ozone admin namespace summary to give complete output (apache#7908)
  HDDS-12364. Require Override annotation for overridden methods (apache#7923)
  HDDS-11530. Support listMultipartUploads max uploads and markers (apache#7817)
  HDDS-11867. Remove code paths for non-Ratis SCM. (apache#7911)
  HDDS-12363. Add import options to IntelliJ IDEA style settings (apache#7921)
  HDDS-12215. Mark testContainerStateMachineRestartWithDNChangePipeline as flaky
  HDDS-12331. BlockOutputStream.failedServers is not thread-safe (apache#7885)
  HDDS-12188. Move server-only upgrade classes from hdds-common to hdds-server-framework (apache#7903)
  HDDS-12362. Remove temporary checkstyle suppression file (apache#7920)
  HDDS-12343. Fix spotbugs warnings in Recon (apache#7902)
  HDDS-12286. Fix license headers and imports for ozone-tools (apache#7919)
  HDDS-12284. Fix license headers and imports for ozone-s3-secret-store (apache#7917)
  HDDS-12275. Fix license headers and imports for ozone-integration-test (apache#7904)
  HDDS-12164. Rename and deprecate DFSConfigKeysLegacy config keys (apache#7803)
  HDDS-12283. Fix license headers and imports for ozone-recon-codegen (apache#7916)
  ...
Conflicts:
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeWriter.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
  hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeTestUtils.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerMerkleTreeWriter.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestBackgroundContainerDataScanner.java
  hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOnDemandContainerDataScanner.java
  hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
  hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/scanner/TestBackgroundContainerDataScannerIntegration.java
  hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/scanner/TestBackgroundContainerMetadataScannerIntegration.java
  hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/scanner/TestOnDemandContainerDataScannerIntegration.java
    .build();
    // Under construction is set here, during BlockInputStream#initialize() it is used to update the block length.
    blkInfo.setUnderConstruction(true);
    try (BlockInputStream blockInputStream = (BlockInputStream) blockInputStreamFactory.create(
@aswinshakil, @errose28, we need a better API to access the block data. Otherwise, the underlying stream cannot be easily changed (as you may know, we are working on a new read stream in HDDS-10338).
Cc @chungen0126, @sodonnel
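One possible reading of this request, sketched under assumptions: the repair code could depend on a narrow read interface instead of casting to a concrete stream class, so the stream behind it can be swapped later. Every name below (`BlockDataReader`, `InMemoryBlockDataReader`, `ChunkFetcher`) is hypothetical and does not exist in Ozone. The in-memory implementation is only there to make the sketch runnable.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical abstraction: callers see offsets and bytes, not a stream type.
interface BlockDataReader extends Closeable {
  long length() throws IOException;

  /** Reads into dst starting at offset; returns bytes read, or -1 at end of data. */
  int readAt(long offset, ByteBuffer dst) throws IOException;
}

// Stand-in implementation backed by a byte array, for illustration only.
final class InMemoryBlockDataReader implements BlockDataReader {
  private final byte[] data;

  InMemoryBlockDataReader(byte[] data) {
    this.data = data.clone();
  }

  @Override
  public long length() {
    return data.length;
  }

  @Override
  public int readAt(long offset, ByteBuffer dst) {
    if (offset < 0 || offset >= data.length) {
      return -1;
    }
    int n = Math.min(dst.remaining(), data.length - (int) offset);
    dst.put(data, (int) offset, n);
    return n;
  }

  @Override
  public void close() {
    // Nothing to release for the in-memory stand-in.
  }
}

// Repair-side helper that only knows about the interface.
final class ChunkFetcher {
  static byte[] readChunk(BlockDataReader reader, long offset, int length)
      throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(length);
    long pos = offset;
    while (buf.hasRemaining()) {
      int n = reader.readAt(pos, buf);
      if (n <= 0) {
        break; // end of data: return whatever was read so far
      }
      pos += n;
    }
    buf.flip();
    byte[] out = new byte[buf.remaining()];
    buf.get(out);
    return out;
  }
}
```

With this shape, a different read stream would only need a new `BlockDataReader` implementation, and `ChunkFetcher` would stay unchanged. Whether the project prefers an interface like this or another mechanism is not stated in the thread.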
What changes were proposed in this pull request?
#7293 implements the container comparison logic and finds the diff between two container Merkle trees.
This patch is part 2 of 2, following that patch. It implements repairing one container replica from its peer container replica.
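As a rough illustration of repairing a replica from a peer, the sketch below models a replica as an in-memory map and applies a diff that lists missing blocks, missing chunks, and corrupt chunks. All types and names here (`ReplicaRepairSketch`, `Diff`, `repair`) are hypothetical simplifications and are not the classes used in this patch. In particular, the real code works with on-disk chunk files and checksums rather than strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model: a replica is blockId -> (chunk offset -> chunk payload).
final class ReplicaRepairSketch {

  /** Simplified stand-in for a diff report; chunk entries are {blockId, offset}. */
  static final class Diff {
    final List<Long> missingBlocks = new ArrayList<>();
    final List<long[]> missingChunks = new ArrayList<>();
    final List<long[]> corruptChunks = new ArrayList<>();
  }

  /** Copies everything named in the diff from the peer into the local replica. */
  static void repair(Map<Long, SortedMap<Long, String>> local,
      Map<Long, SortedMap<Long, String>> peer, Diff diff) {
    // Missing blocks: copy the whole block from the peer.
    for (long blockId : diff.missingBlocks) {
      SortedMap<Long, String> peerBlock = peer.get(blockId);
      if (peerBlock != null) {
        local.put(blockId, new TreeMap<>(peerBlock));
      }
    }
    // Missing chunks are added; corrupt chunks are overwritten in place.
    copyChunks(local, peer, diff.missingChunks);
    copyChunks(local, peer, diff.corruptChunks);
  }

  private static void copyChunks(Map<Long, SortedMap<Long, String>> local,
      Map<Long, SortedMap<Long, String>> peer, List<long[]> chunks) {
    for (long[] ref : chunks) {
      long blockId = ref[0];
      long offset = ref[1];
      SortedMap<Long, String> peerBlock = peer.get(blockId);
      if (peerBlock == null || !peerBlock.containsKey(offset)) {
        continue; // the peer cannot supply this chunk; leave local as-is
      }
      local.computeIfAbsent(blockId, id -> new TreeMap<>())
          .put(offset, peerBlock.get(offset));
    }
  }
}
```

Adding a missing chunk and replacing a corrupt one are the same `put` in this model. In a real datanode they differ, because an overwrite has to replace data that is already on disk.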
The ContainerDiffReport generated by the comparison logic consists of missing blocks, missing chunks, and corrupt chunks. These blocks and chunks are read from the peer container and then added to our container replica, or used to replace its existing copies.
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-11763
How was this patch tested?