
@errose28 (Contributor) commented May 6, 2025

What changes were proposed in this pull request?

Create a unit test framework that writes container data locally and mocks an Ozone client to operate directly on those containers. This will allow us to test multiple failure combinations across replicas using container scan and repair, while running much faster than a MiniOzoneCluster integration test.
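A minimal, hypothetical sketch of what "multiple failure combinations across replicas" can look like in a unit test: enumerate every assignment of a fault kind to each replica, so that each assignment can parameterize one scan-and-repair test case. The `Fault` kinds and the `ReplicaFaultCombinations` class are illustrative assumptions, not code from this PR or from Ozone.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: enumerate every assignment of a fault to each replica
 * of a container, so each combination can drive one scan-and-repair test case.
 * None of these names come from the Ozone code base.
 */
public class ReplicaFaultCombinations {

  /** Hypothetical fault kinds that a test might inject into a local replica. */
  public enum Fault { NONE, MISSING_BLOCKS, CORRUPT_CHUNKS, MISSING_AND_CORRUPT }

  /** Returns the cartesian product Fault^replicaCount, one list per combination. */
  public static List<List<Fault>> combinations(int replicaCount) {
    List<List<Fault>> result = new ArrayList<>();
    result.add(new ArrayList<>()); // start from a single empty assignment
    for (int i = 0; i < replicaCount; i++) {
      List<List<Fault>> next = new ArrayList<>();
      for (List<Fault> prefix : result) {
        // Extend each partial assignment with every possible fault for replica i.
        for (Fault f : Fault.values()) {
          List<Fault> extended = new ArrayList<>(prefix);
          extended.add(f);
          next.add(extended);
        }
      }
      result = next;
    }
    return result;
  }

  public static void main(String[] args) {
    List<List<Fault>> all = combinations(3);
    System.out.println(all.size()); // 64, i.e. 4^3 combinations
    System.out.println(all.get(0)); // [NONE, NONE, NONE]
  }
}
```

With four fault kinds and three replicas this yields 64 cases. A real framework may well filter or sample these, for example skipping combinations where no replica holds a healthy copy of a block, since reconciliation would have nothing to repair from.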

The major components of this change are:

  • Migrate a test from TestKeyValueHandler into its own test class for unit testing scan and repair across replicas.
  • Numerous fixes to the container repair code for bugs exposed by this test.
  • Logging improvements to track reconciliation progress.

As part of the review, it will help to pull #7490, which also incorporates this change, and run the tests there, where they pass with the scanner changes. Inspect the new log messages to make sure they help follow what reconciliation is doing in each of these situations. The case with 10 missing blocks and 5 corrupt chunks is one of the most involved tests to inspect the output of.

What is the link to the Apache JIRA

HDDS-12980

How was this patch tested?

The tests depend on HDDS-10374/#7490 to run. For now the tests are ignored so that the review of HDDS-10374/#7490 can be split up. I have also pushed the tests to that change, where they pass. Once this PR is merged, we can start running the tests in HDDS-10374/#7490.

@aswinshakil (Member) left a comment

Posting the intermediate review here. I still need to review the tests and the KeyValueHandler reconciliation logic.

+ data.getContainerID(), ex);
}
// Set in-memory data checksum.
data.setDataChecksum(checksumInfo.getContainerMerkleTree().getDataChecksum());
@aswinshakil (Member):

Right now this is also used by markBlockAsDeleted. In case of a failure in that code flow, we also want to set the checksum.

@errose28 (Contributor, Author):

I removed it from here because markBlocksAsDeleted does not affect the Merkle tree and should not change the data checksum.

@aswinshakil (Member) left a comment

Thanks for working on this @errose28. I have posted some comments here.

@errose28 (Contributor, Author) commented May 8, 2025

@aswinshakil All comments should be addressed in the latest commits. I just need to circle back for some checkstyle failures.

@aswinshakil (Member) left a comment

Thanks for the patch @errose28. The changes look good to me. Let me trigger the CI.

@aswinshakil marked this pull request as ready for review May 9, 2025 08:04

@aswinshakil (Member) left a comment

The test failures are not related to this change. LGTM, +1

@aswinshakil merged commit a355664 into apache:HDDS-10239-container-reconciliation May 13, 2025
38 of 43 checks passed
@errose28 (Contributor, Author) commented

Thanks for the review @aswinshakil. We can do a reverse merge of master into the feature branch to bring in #8443 which should resolve the test issues for future PRs.

errose28 added a commit to errose28/ozone that referenced this pull request May 14, 2025
…anner-builds-mt

* HDDS-10239-container-reconciliation:
  HDDS-12980. Add unit test framework for reconciliation. (apache#8402)

hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeWriter.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/ozoneimpl/ContainerController.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTreeTestUtils.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerMerkleTreeWriter.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/keyvalue/TestContainerReconciliationWithMockDatanodes.java
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/checksum/TestContainerCommandReconciliation.java