@errose28 errose28 commented Jun 6, 2024

What changes were proposed in this pull request?

Implement a central manager to coordinate reads and writes to datanode container merkle tree/checksum information. This patch defines the bare minimum API and protobuf structures needed for reconciliation:

  • Construct a container merkle tree from scratch
  • Mark blocks as deleted within a container
  • Persist the tree and deleted block information

Anything more advanced can be added with its implementation when or if it is needed. This includes the ability to diff two merkle trees, which is stubbed out here and planned for HDDS-10928.
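For readers unfamiliar with the shape of the data, here is a minimal sketch of the three operations listed above, minus persistence. It is not the code in this patch: the class `MerkleTreeSketch`, its method names, and the use of `java.util.zip.CRC32` are assumptions made for illustration. The only detail taken from the patch is the idea of packing child checksums into a buffer of longs and checksumming that buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.zip.CRC32;

// Hypothetical illustration only; not the classes added by this patch.
class MerkleTreeSketch {
  // Block ID -> checksums of that block's chunks. Sorted so the hash is stable.
  private final SortedMap<Long, List<Long>> blocks = new TreeMap<>();
  // Deleted block IDs are tracked next to the tree, not inside it (assumption).
  private final Set<Long> deletedBlocks = new TreeSet<>();

  void addBlock(long blockID, List<Long> chunkChecksums) {
    blocks.put(blockID, new ArrayList<>(chunkChecksums));
  }

  void markBlocksAsDeleted(Collection<Long> blockIDs) {
    deletedBlocks.addAll(blockIDs);
  }

  Set<Long> getDeletedBlocks() {
    return deletedBlocks;
  }

  // Parent checksum: pack the child checksums as longs, then CRC the buffer.
  static long checksumOf(Collection<Long> children) {
    ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES * children.size());
    for (long child : children) {
      buffer.putLong(child);
    }
    buffer.flip();
    CRC32 crc = new CRC32();
    crc.update(buffer);
    return crc.getValue();
  }

  long blockChecksum(long blockID) {
    return checksumOf(blocks.get(blockID));
  }

  // Container checksum: CRC over the block checksums in block ID order.
  long containerChecksum() {
    List<Long> blockChecksums = new ArrayList<>();
    for (long blockID : blocks.keySet()) {
      blockChecksums.add(blockChecksum(blockID));
    }
    return checksumOf(blockChecksums);
  }
}
```

Keeping the blocks in a sorted map means two replicas that add the same blocks in a different order should still arrive at the same container checksum, which is the property reconciliation would rely on.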

What is the link to the Apache JIRA

HDDS-10887

How was this patch tested?

Unit tests added.

errose28 added 2 commits June 5, 2024 21:48
* HDDS-10239-container-reconciliation:
  HDDS-10372. SCM and Datanode communication for reconciliation (apache#6506)
  HDDS-10239. Storage Container Reconciliation. (apache#6121)
@errose28 errose28 marked this pull request as ready for review June 21, 2024 21:39
return tree;
}

private ContainerProtos.ContainerChecksumInfo readFile() throws IOException {
Contributor

Why not move this into the ChecksumTreeManager?

Contributor Author

The ContainerChecksumInfo file/proto is not meant to be interacted with directly. Updates and modifications to that file should go through the manager. This would not be clear if there was a public method in the manager to read the file directly.

Making the method public with @VisibleForTesting is also not ideal because the base directory used in each method is different, so that would add an extra parameter.

Member

We are already exposing writeContainerDataTree(). Can we do something like that to test the read() functionality? I understand that both writeContainerDataTree() and markBlocksAsDeleted() implicitly test read(). I have made this change as part of #6864 to capture read latency. Let me know if it works in this case.

Contributor Author

Discussed offline:

  • The read and write metrics are separate, so they can still be tested even if the only public methods exposed are doing read-modify-write.
  • The gRPC API to move the tree between datanodes will operate at the file level and will not deserialize the protobuf.
  • If a debug method is added to the client and we need to render the trees, we can add such a method to hdds-common or a similar package. That could also be invoked from this class if necessary.
    • This manager is designed to run in the datanode over a pool of containers and will not be suitable for client use anyway.
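To illustrate the first point, here is a minimal sketch of a manager whose file read stays private while the read and write metrics remain individually observable. Everything in it is hypothetical: the class `ChecksumManagerSketch`, the counter getters, and the on-disk format (raw longs instead of the ContainerChecksumInfo protobuf) are assumptions chosen to keep the example self-contained.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; the real manager persists a protobuf, not raw longs.
class ChecksumManagerSketch {
  private final Path file;
  private final AtomicLong readCount = new AtomicLong();
  private final AtomicLong writeCount = new AtomicLong();

  ChecksumManagerSketch(Path file) {
    this.file = file;
  }

  // Public entry point: read the current state, modify it, write it back.
  public synchronized void markBlocksAsDeleted(Collection<Long> blockIDs)
      throws IOException {
    SortedSet<Long> deleted = readFile();
    deleted.addAll(blockIDs);
    writeFile(deleted);
  }

  public long getReadCount() {
    return readCount.get();
  }

  public long getWriteCount() {
    return writeCount.get();
  }

  // Private on purpose: callers never touch the file contents directly.
  private SortedSet<Long> readFile() throws IOException {
    readCount.incrementAndGet();
    SortedSet<Long> deleted = new TreeSet<>();
    if (Files.exists(file)) {
      ByteBuffer buffer = ByteBuffer.wrap(Files.readAllBytes(file));
      while (buffer.remaining() >= Long.BYTES) {
        deleted.add(buffer.getLong());
      }
    }
    return deleted;
  }

  private void writeFile(SortedSet<Long> deleted) throws IOException {
    writeCount.incrementAndGet();
    ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES * deleted.size());
    for (long blockID : deleted) {
      buffer.putLong(blockID);
    }
    // Default options create the file or truncate an existing one.
    Files.write(file, buffer.array());
  }
}
```

A test can call only `markBlocksAsDeleted()` and still assert on the read counter and the write counter separately, so the read path is covered without a public read method.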

containerChecksumBuffer.putLong(blockTreeProto.getBlockChecksum());
}
containerChecksumBuffer.flip();
checksumImpl.update(containerChecksumBuffer);
Contributor

It should be possible to take the longs of the blocks and just call update with the bits shifted to calculate the checksum.

Contributor Author

Yeah, I can update this to use Java's CRC32, which has an update(int) method, instead of our own ByteBuffer-based implementation.

Contributor Author

After some offline discussion, we are going to handle any optimizations in checksum computation in HDDS-11077.
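For reference, a rough sketch of the shift-based approach suggested in this thread, next to the buffer-based one. It uses `java.util.zip.CRC32` for both paths, which is an assumption; the patch uses its own ByteBuffer-based checksum implementation, and the class and method names here are hypothetical. `CRC32.update(int)` consumes only the low eight bits of its argument, so each long is fed one byte at a time, most significant byte first, to match the default big-endian order of `ByteBuffer.putLong`.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical comparison of the two approaches; not code from this patch.
class ShiftedCrcSketch {
  // Buffer-based: pack every block checksum into a ByteBuffer, then update once.
  static long viaBuffer(long[] blockChecksums) {
    ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES * blockChecksums.length);
    for (long value : blockChecksums) {
      buffer.putLong(value);
    }
    buffer.flip();
    CRC32 crc = new CRC32();
    crc.update(buffer);
    return crc.getValue();
  }

  // Shift-based: no intermediate buffer, one update(int) call per byte.
  static long viaShifts(long[] blockChecksums) {
    CRC32 crc = new CRC32();
    for (long value : blockChecksums) {
      // Shifts 56, 48, ..., 0 visit the bytes in big-endian order.
      for (int shift = Long.SIZE - Byte.SIZE; shift >= 0; shift -= Byte.SIZE) {
        crc.update((int) (value >>> shift));
      }
    }
    return crc.getValue();
  }
}
```

Both methods should return the same value for the same input. Whether the shift-based version is actually faster would need to be measured, since it trades one buffer allocation for eight method calls per block.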

@kerneltime
Contributor

Will give it one more round tonight. Overall the PR looks good; most of the changes can be done in a follow-up PR.

@errose28 errose28 marked this pull request as draft June 25, 2024 18:09
@errose28
Contributor Author

Going to push a few more commits before this is ready for full CI. Converting to draft to save CI time.

@errose28 errose28 marked this pull request as ready for review June 27, 2024 16:27
@kerneltime kerneltime merged commit d585363 into apache:HDDS-10239-container-reconciliation Jun 28, 2024
errose28 added a commit to errose28/ozone that referenced this pull request Jun 28, 2024
…-delete

* HDDS-10239-container-reconciliation:
  HDDS-10887. Implement a basic Merkle Tree Manager. (apache#6778)
  HDDS-10923. Container Scanner should still scan unhealthy containers. (apache#6809)

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerChecksumTreeManager.java