Skip to content

Conversation

@aswinshakil
Copy link
Member

@aswinshakil aswinshakil commented Nov 22, 2024

What changes were proposed in this pull request?

#7293 Implements the container comparison logic and finds the container diff between two container merkle tree.

This patch is part (2/2) of the previous patch. It implements repairing one container replica with its peer container replica.

The ContainerDiffReport generated by the comparison logic consists of missing blocks, missing chunks, and corrupt chunks. These blocks/chunks are read from the peer container and added or replaced with our container replica.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11763

How was this patch tested?

  • Unit test and Integration test

@aswinshakil aswinshakil marked this pull request as ready for review January 14, 2025 18:28
Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @aswinshakil

@kerneltime
Copy link
Contributor

@aswinshakil can you resolve the merge conflicts?

…pair

Conflicts:
	hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
	hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/TestReconcileContainerCommandHandler.java
	hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueHandler.java
	hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java
Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @aswinshakil. I've only reviewed the non-test code for now since there are a lot of comments. I think we need to look into using BlockInputStream and seek for the reading part of repairs instead of implementing our own lower level logic.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @aswinshakil. Still a fair amount of comments, although most are minor. I've reviewed everything except TestKeyValueHandler and TestContainerCommandReconciliation. Once the prod code looks good I will start reviewing those larger test changes.

@kerneltime
Copy link
Contributor

I am done with my review. We can merge one @errose28 approves as well. We can add more tests in follow up jiras.

Copy link
Contributor

@hemantk-12 hemantk-12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aswinshakil for the patch.


chunkByteBuffer.flip();
ChunkBuffer chunkBuffer = ChunkBuffer.wrap(chunkByteBuffer);
writeChunkForClosedContainer(chunkInfo, blockID, chunkBuffer, container);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a comment here also explaining that if we are missing a few chunks at the end and one update fails, we may get holes in the block.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @aswinshakil. Just minor comments on the tests left. This part should be much simpler once HDDS-10374 and HDDS-11942 are done.

Copy link
Contributor

@errose28 errose28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once CI is green. Thanks for the consistent effort on this @aswinshakil.

@kerneltime kerneltime merged commit b34c537 into apache:HDDS-10239-container-reconciliation Apr 10, 2025
43 checks passed
errose28 added a commit to errose28/ozone that referenced this pull request Apr 11, 2025
…anner-builds-mt

* HDDS-10239-container-reconciliation: (646 commits)
  HDDS-11763. Implement container repair logic within datanodes. (apache#7474)
  HDDS-11887. Recon - Identify container replicas difference based on content checksums (apache#7942)
  HDDS-12397. Persist putBlock for closed container. (apache#7943)
  HDDS-12361. Mark testGetSnapshotDiffReportJob as flaky
  HDDS-12375. Random object created and used only once (apache#7933)
  HDDS-12335. Fix ozone admin namespace summary to give complete output (apache#7908)
  HDDS-12364. Require Override annotation for overridden methods (apache#7923)
  HDDS-11530. Support listMultipartUploads max uploads and markers (apache#7817)
  HDDS-11867. Remove code paths for non-Ratis SCM. (apache#7911)
  HDDS-12363. Add import options to IntelliJ IDEA style settings (apache#7921)
  HDDS-12215. Mark testContainerStateMachineRestartWithDNChangePipeline as flaky
  HDDS-12331. BlockOutputStream.failedServers is not thread-safe (apache#7885)
  HDDS-12188. Move server-only upgrade classes from hdds-common to hdds-server-framework (apache#7903)
  HDDS-12362. Remove temporary checkstyle suppression file (apache#7920)
  HDDS-12343. Fix spotbugs warnings in Recon (apache#7902)
  HDDS-12286. Fix license headers and imports for ozone-tools (apache#7919)
  HDDS-12284. Fix license headers and imports for ozone-s3-secret-store (apache#7917)
  HDDS-12275. Fix license headers and imports for ozone-integration-test (apache#7904)
  HDDS-12164. Rename and deprecate DFSConfigKeysLegacy config keys (apache#7803)
  HDDS-12283. Fix license headers and imports for ozone-recon-codegen (apache#7916)
  ...

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeWriter.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeTestUtils.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerMerkleTreeWriter.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestKeyValueContainerCheck.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestBackgroundContainerDataScanner.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/ozoneimpl/TestOnDemandContainerDataScanner.java
hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/scanner/TestBackgroundContainerDataScannerIntegration.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/scanner/TestBackgroundContainerMetadataScannerIntegration.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/scanner/TestOnDemandContainerDataScannerIntegration.java
.build();
// Under construction is set here, during BlockInputStream#initialize() it is used to update the block length.
blkInfo.setUnderConstruction(true);
try (BlockInputStream blockInputStream = (BlockInputStream) blockInputStreamFactory.create(
Copy link
Contributor

@szetszwo szetszwo Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aswinshakil , @errose28 , We need a better API to access the block data. Otherwise, the underlying stream cannot be easily changed (as you may know, we are working on a new read stream in HDDS-10338.)

Cc @chungen0126 , @sodonnel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants