@aswinshakil
Member

What changes were proposed in this pull request?

In #8204, during datanode startup, we read the <containerID>.tree file to get the container checksum, which is used to update the in-memory container data. We want to store this dataChecksum in RocksDB because, in a large cluster, deserializing <containerID>.tree for every container on startup may slow down startup. In this patch:

  • We store the dataChecksum in RocksDB so that we don't have to deserialize the .tree file on every DN restart.
  • We fix packing and unpacking of the .tree file during container import/export.
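
To illustrate the startup path described above, here is a minimal sketch, not the actual Ozone code. Every name in it (ChecksumStartupSketch, checksumTable, treeReader, loadDataChecksum) is hypothetical, a HashMap stands in for the RocksDB table, and the fallback to the .tree file when the DB has no entry is an assumption about the intended behavior. Treating 0 as "no checksum" is also an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.LongFunction;

/** Hypothetical sketch; none of these names are Ozone APIs. */
public class ChecksumStartupSketch {

  /** Stand-in for the per-container dataChecksum entry kept in RocksDB. */
  private final Map<Long, Long> checksumTable = new HashMap<>();

  /** Stand-in for deserializing <containerID>.tree; empty if the file is missing. */
  private final LongFunction<OptionalLong> treeReader;

  /** Counts how often the expensive tree read was needed. */
  private int treeReads = 0;

  public ChecksumStartupSketch(LongFunction<OptionalLong> treeReader) {
    this.treeReader = treeReader;
  }

  /** Returns the checksum to place in the in-memory container data at startup. */
  public long loadDataChecksum(long containerId) {
    // Fast path: the checksum was already persisted in the DB.
    Long cached = checksumTable.get(containerId);
    if (cached != null) {
      return cached;
    }
    // Slow path (assumed fallback): read the tree file once, then persist the result.
    treeReads++;
    OptionalLong fromTree = treeReader.apply(containerId);
    if (fromTree.isPresent()) {
      checksumTable.put(containerId, fromTree.getAsLong());
      return fromTree.getAsLong();
    }
    // Neither the DB nor the tree file has a checksum; 0 means "not set" in this sketch.
    return 0L;
  }

  public int getTreeReads() {
    return treeReads;
  }
}
```

With this shape, a second startup for the same container is served from the table and never touches the tree file, which is the saving the patch is after.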

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-12824

How was this patch tested?

Updated existing tests

Contributor

@errose28 errose28 left a comment

Thanks for fixing this @aswinshakil. I have done a partial review, have not looked at the import/export changes yet.

@aswinshakil aswinshakil changed the base branch from HDDS-10239-container-reconciliation to master June 26, 2025 07:12
Contributor

@errose28 errose28 left a comment

Thanks for the updates @aswinshakil. I reviewed the import/export code as well this round, so there are some more comments in that area.

Contributor

@errose28 errose28 left a comment

Thanks for the fixes @aswinshakil. Looks like we should wait until #8565 is merged before proceeding with this one since they overlap in some areas. We should update TestContainerReader or somewhere similar for tests of loading containers when the tree and checksums may or may not exist.

Contributor

@errose28 errose28 left a comment

Left a few more comments mostly based on non-existent tree handling after #8565.

@errose28
Contributor

Test failure was not related to this change. It is being fixed in #8876

Contributor

@errose28 errose28 left a comment

Thanks for the updates. A few more comments, and this one still needs to be addressed.

@aswinshakil
Member Author

@errose28 testContainerLoadingWithNoChecksumAnywhere and testContainerLoadingWithoutMerkleTree address the comment mentioned above.

@errose28
Contributor

> @errose28 testContainerLoadingWithNoChecksumAnywhere and testContainerLoadingWithoutMerkleTree address the #8604 (comment).

These tests are in TestContainerReader for startup testing, but that comment is on TestTarContainerPacker for import/export. We are testing that the tree file made it through import/export, but we are not checking that the DB and in-memory ContainerData got updated with its checksum at the destination.
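
As a rough illustration of the missing check, here is a sketch of the three assertions an import/export test could make at the destination. It is not the real TestTarContainerPacker code: Destination, ContainerState, importContainer and verifyImport are hypothetical stand-ins, and plain maps model the tree file, the DB and the in-memory ContainerData.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names are Ozone APIs. */
public class ImportChecksumCheckSketch {

  /** Stand-in for the in-memory ContainerData; 0 means "not set" in this sketch. */
  static final class ContainerState {
    long dataChecksum;
  }

  /** Stand-in for the destination datanode after an import. */
  static final class Destination {
    final Map<Long, Long> treeFiles = new HashMap<>(); // containerID -> checksum held in the tree file
    final Map<Long, Long> db = new HashMap<>();        // containerID -> checksum persisted in the DB
    final Map<Long, ContainerState> inMemory = new HashMap<>();

    /** Models an import that restores the tree file and updates both other places. */
    void importContainer(long containerId, long treeChecksum) {
      treeFiles.put(containerId, treeChecksum);
      db.put(containerId, treeChecksum);
      ContainerState state = new ContainerState();
      state.dataChecksum = treeChecksum;
      inMemory.put(containerId, state);
    }
  }

  /** Checks all three places, not only that the tree file survived the round trip. */
  static void verifyImport(Destination dest, long containerId, long expected) {
    check(dest.treeFiles.containsKey(containerId), "tree file missing after import");
    check(Long.valueOf(expected).equals(dest.db.get(containerId)), "DB checksum not updated");
    ContainerState state = dest.inMemory.get(containerId);
    check(state != null && state.dataChecksum == expected, "in-memory checksum not updated");
  }

  private static void check(boolean ok, String message) {
    if (!ok) {
      throw new AssertionError(message);
    }
  }
}
```

A test that only inspects the tree file would pass even if the last two checks failed, which is the gap described above.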

Contributor

@errose28 errose28 left a comment

LGTM

@errose28 errose28 merged commit 5e03af1 into apache:master Aug 6, 2025
42 checks passed
errose28 added a commit to errose28/ozone that referenced this pull request Aug 12, 2025
* master: (55 commits)
  HDDS-13525. Rename configuration property to ozone.om.compaction.service.enabled (apache#8928)
  HDDS-13519. Reconciliation should continue if a peer datanode is unreachable (apache#8908)
  HDDS-13566. Fix incorrect authorizer class in ACL documentation (apache#8931)
  HDDS-13084. Trigger on-demand container scan when a container moves from open to unhealthy. (apache#8904)
  HDDS-13432. Accelerating Namespace Usage Calculation in Recon using - Materialised Approach (apache#8797)
  HDDS-13557. Bump jline to 3.30.5 (apache#8920)
  HDDS-13556. Bump assertj-core to 3.27.4 (apache#8919)
  HDDS-13543. [Docs] Design doc for OM bootstrapping process with snapshots. (apache#8900)
  HDDS-13541. Bump sonar-maven-plugin to 5.1.0.4751 (apache#8911)
  HDDS-13101. Remove duplicate information in datanode list output (apache#8523)
  HDDS-13528. Handle null paths when the NSSummary is initializing (apache#8901)
  HDDS-12990. (addendum) Generate tree from metadata when it does not exist during getContainerChecksumInfo call (apache#8881)
  HDDS-13086. Block duplicate reconciliation requests for the same container and datanode within the datanode. (apache#8905)
  HDDS-12990. Generate tree from metadata when it doesn't exist during getContainerChecksumInfo call (apache#8881)
  HDDS-12824. Optimize container checksum read during datanode startup (apache#8604)
  HDDS-13522. Rename axisLabel for No. of delete request received (apache#8879)
  HDDS-12196. Document ozone repair cli (apache#8849)
  HDDS-13514. Intermittent failure in TestNSSummaryMemoryLeak (apache#8889)
  HDDS-13423. Log reason for triggering on-demand container scan (apache#8854)
  HDDS-13466. Disable flaky TestOmSnapshotFsoWithNativeLibWithLinkedBuckets
  ...