-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-10887. Implement a basic Merkle Tree Manager. #6778
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-10887. Implement a basic Merkle Tree Manager. #6778
Conversation
* HDDS-10239-container-reconciliation: HDDS-10372. SCM and Datanode communication for reconciliation (apache#6506) HDDS-10239. Storage Container Reconciliation. (apache#6121)
...rvice/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumManager.java
Outdated
Show resolved
Hide resolved
...rvice/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumManager.java
Outdated
Show resolved
Hide resolved
...rvice/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumManager.java
Show resolved
Hide resolved
| return tree; | ||
| } | ||
|
|
||
| private ContainerProtos.ContainerChecksumInfo readFile() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not move this into the ChecksumTreeManager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ContainerChecksumInfo file/proto is not meant to be interacted with directly. Updates and modifications to that file should go through the manager. This would not be clear if there was a public method in the manager to read the file directly.
Making the method public with @VisibleForTesting is also not ideal because the base directory used in each method is different so that would add an extra parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already exposing writeContainerDataTree(), Can we do something like that to test the read() functionality? I understand both writeContainerDataTree() and markBlocksAsDeleted() implicitly test read(). I have made this change as a part of #6864 to capture read latency. Let me know if it works in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline:
- The read and write metrics are separate so the can still be tested even if the only public methods exposed are doing read-modify-write.
- GRPC API to move the tree between datanodes will operate at the file level and will not deserialize the protobuf.
- If a debug method is added to the client and we need to render the trees we can add such a method to hdds-common or similar package. That could also be invoked from this class if necessary.
- This manager is designed to run in the datanode over a pool of containers and will not be suitable for client use anyways.
| containerChecksumBuffer.putLong(blockTreeProto.getBlockChecksum()); | ||
| } | ||
| containerChecksumBuffer.flip(); | ||
| checksumImpl.update(containerChecksumBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to take the longs of the blocks and just call update with the bits shifted to calculate the checksum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I can update this to use Java's CRC32 which has an update(int) method instead of our own ByteBuffer based implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some offline discussion we are going to handle any optimizations in checksum computation in HDDS-11077
...er-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTree.java
Show resolved
Hide resolved
|
Will give it one more round tonight. Overall the PR looks good, most of the changes can be done in a follow up PR. |
|
Going to push a few more commits before this is ready for full CI. Converting to draft to save CI time. |
| return tree; | ||
| } | ||
|
|
||
| private ContainerProtos.ContainerChecksumInfo readFile() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already exposing writeContainerDataTree(), Can we do something like that to test the read() functionality? I understand both writeContainerDataTree() and markBlocksAsDeleted() implicitly test read(). I have made this change as a part of #6864 to capture read latency. Let me know if it works in this case.
hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto
Outdated
Show resolved
Hide resolved
…-delete * HDDS-10239-container-reconciliation: HDDS-10887. Implement a basic Merkle Tree Manager. (apache#6778) HDDS-10923. Container Scanner should still scan unhealthy containers. (apache#6809) Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerChecksumTreeManager.java
What changes were proposed in this pull request?
Implement a central manager to coordinate reads and writes to datanode container merkle tree/checksum information. This patch defines the bare minimum API and protobuf structures needed for reconciliation:
Anything more advanced can be added with its implementation when or if it is needed. This includes the ability to diff two merkle trees, which is stubbed out here and planned for HDDS-10928.
What is the link to the Apache JIRA
HDDS-10887
How was this patch tested?
Unit tests added.