Skip to content

Conversation

@errose28
Copy link
Contributor

@errose28 errose28 commented Jun 4, 2025

What changes were proposed in this pull request?

In a cluster upgraded to the first version supporting reconciliation, it is likely that block deletion will reach many closed containers before the scanner does, which will generate the checksum file with a block delete list but no merkle tree. This PR adds some improvements to this area:

  • Verified our tree parsing code will not raise errors when it encounters a checksum file with this content.
  • File existence should not be used to imply that a checksum for the container has been generated. The in-memory ContainerData#dataChecksum should be used for this instead.

Additionally a few minor fixes were noticed while debugging this and have been added here:

  • Add container ID to container diff toString
  • Do not generate a data checksum for CLOSING containers. Wait until writes are fully blocked by either CLOSED, QUASI_CLOSED, or UNHEALTHY states.
  • Add logging in the tree diff for chunks that are skipped because the peer's copy is unhealthy. We already have logging for this in the main reconciliation section as well, but some may have been filtered out in this section prior.

What is the link to the Apache JIRA

HDDS-13083

How was this patch tested?

It turns out we already have a test that reading a checksum file with only a deleted block list returns an empty merkle tree instead of null: org.apache.hadoop.ozone.container.checksum.TestContainerChecksumTreeManager#testWriteOnlyDeletedBlocksToFile

I added a unit test for the new needsDataChecksum method that replaces the old file existence check.

The remainder of this change is refactoring to simplify and remove public methods in ContainerChecksumTreeManager to make sure no callers mistakenly assume the existence of a container file means that a checksum was already calculated for that container. Existing tests make sure that no functionality has changed.

I attempted to add more tests for logging in this commit but due to quirks with the log capturer and/or log4j config I could not get them to pass, despite the code to log the messages definitely being hit. After much troubleshooting I abandoned the effort.

@errose28 errose28 changed the title Hdds 13083 handle empty mt HDDS-13083 Handle cases where block deletion generates tree file before scanner Jun 5, 2025
@errose28 errose28 changed the title HDDS-13083 Handle cases where block deletion generates tree file before scanner HDDS-13083. Handle cases where block deletion generates tree file before scanner Jun 5, 2025
Copy link
Member

@aswinshakil aswinshakil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @errose28. I have few minor comments, otherwise LGTM

@errose28
Copy link
Contributor Author

Aside from an unrelated Recon test failure that passed locally, CI is green on my fork. @aswinshakil this should be ready for another look.

@aswinshakil
Copy link
Member

HDDS-10239-container-reconciliation is merged into master. Please merge this change into master and any other future changes into master

* master: (170 commits)
  HDDS-12468. Check for space availability for all dns while container creation in pipeline (apache#8663)
  HDDS-12151. Fail write when volume is full considering min free space (apache#8642)
  HDDS-13251. Update Byteman usage README license (apache#8700)
  HDDS-13251. Support dynamic Byteman scripts via bmsubmit in ozonesecure-ha (apache#8654)
  HDDS-12070. Bump Ratis to 3.2.0 (apache#8689)
  HDDS-12984. Use InodeID to identify the SST files inside the tarball. (apache#8477)
  HDDS-13319. Simplify KeyPrefixFilter (apache#8692)
  HDDS-13295. Remove jackson1 exclusions for hadoop-common (apache#8687)
  HDDS-13314. Remove unused maven-pdf-plugin (apache#8686)
  HDDS-13324. Optimize memory footprint for Recon listKeys API (apache#8680)
  HDDS-13240. Add newly added metrics into grafana dashboard. (apache#8656)
  HDDS-13309. Add keyIterator/valueIterator methods to Table. (apache#8675)
  HDDS-13325. Introduce OZONE_SERVER_OPTS for common options for server processes (apache#8685)
  HDDS-13318. Simplify the getRangeKVs methods in Table (apache#8683)
  HDDS-13322. Remove module auto-detection from flaky-test-check (apache#8679)
  HDDS-13289. Remove usage of Jetty StringUtil (apache#8684)
  HDDS-13270. Reduce getBucket API invocations in S3 bucket owner verification (apache#8653)
  HDDS-13288. Container checksum file proto changes to account for deleted blocks. (apache#8649)
  HDDS-13306. Intermittent failure in testDirectoryDeletingServiceIntervalReconfiguration (apache#8682)
  HDDS-13317. Table should support empty array/String (apache#8676)
  ...

Conflicts:
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java
hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerChecksumTreeManager.java
@errose28 errose28 changed the base branch from HDDS-10239-container-reconciliation to master June 26, 2025 18:30
@errose28
Copy link
Contributor Author

I fixed the conflicts and updated the target branch. Once CI is green on my fork I will take this out of draft.

@errose28 errose28 marked this pull request as ready for review June 26, 2025 23:16
@errose28
Copy link
Contributor Author

errose28 commented Jun 27, 2025

The test failure is intermittent. It is related to reconciliation but not this change. I filed HDDS-13346 to fix it and tagged it as flaky in this change.

@kerneltime
Copy link
Contributor

LGTM

Copy link
Member

@aswinshakil aswinshakil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I have a minor comment. Apart from that we are good.

Also can we resolve the merge conflicts here?

errose28 added 4 commits July 22, 2025 13:53
* master: (90 commits)
  HDDS-13308. OM should expose Ratis config for increasing pending write limits (apache#8668)
  HDDS-8903. Add validation for ozone.om.snapshot.db.max.open.files. (apache#8787)
  HDDS-13429. Custom metadata headers with uppercase characters are not supported (apache#8805)
  HDDS-13448. DeleteBlocksCommandHandler thread stop for normal exception (apache#8816)
  HDDS-13346. Intermittent failure in TestCloseContainer#testContainerChecksumForClosedContainer (apache#8771)
  HDDS-13125. Add metrics for monitoring the SST file pruning threads. (apache#8764)
  HDDS-13367. [Docs] User doc for container balancer. (apache#8726)
  HDDS-13200. OM RocksDB Grafana Dashbroad shows no data on all panels (apache#8577)
  HDDS-13428. Recon - Retrigger of build whole NSSummary tree task submission inconsistency. (apache#8793)
  HDDS-13378. [Docs] Add a Production page under Getting Started (apache#8734)
  HDDS-13403. [Docs] Make feature proposal process more visible. (apache#8758)
  HDDS-11797. Remove cyclic dependency between SCMSafeModeManager and SafeModeRules (apache#8782)
  HDDS-13213. KeyDeletingService should limit task size by both key count and serialized size. (apache#8757)
  HDDS-13387. OMSnapshotCreateRequest logs invalid warning about DefaultReplicationConfig (apache#8760)
  HDDS-13405. ozone admin container create runs forever without kinit (apache#8765)
  HDDS-11514. Set optimal default values for delete configurations based on live cluster testing. (apache#8766)
  HDDS-13376. Add server-side limit note to ozone sh snapshot diff --page-size option (apache#8791)
  HDDS-11679. Support multiple S3Gs in MiniOzoneCluster (apache#8733)
  HDDS-13424. Use lsof instead of fuser to find if file is used in AbstractTestChunkManager (apache#8790)
  HDDS-13427. Bump awssdk to 2.31.78 (apache#8792)
  ...
@errose28 errose28 requested a review from aswinshakil July 22, 2025 18:05
Copy link
Member

@aswinshakil aswinshakil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM+1

@aswinshakil aswinshakil merged commit 1c2c7f3 into apache:master Jul 23, 2025
42 checks passed
errose28 added a commit to errose28/ozone that referenced this pull request Jul 30, 2025
* master: (730 commits)
  HDDS-13083. Handle cases where block deletion generates tree file before scanner (apache#8565)
  HDDS-12982. Reduce log level for snapshot validation failure (apache#8851)
  HDDS-13396. Documentation: Improve the top-level overview page for new users. (apache#8753)
  HDDS-13176. containerIds table value format change to proto from string (apache#8589)
  HDDS-13449. Incorrect Interrupt Handling for DirectoryDeletingService and KeyDeletingService (apache#8817)
  HDDS-2453. Add Freon tests for S3 MPU Keys (apache#8803)
  HDDS-13237. Container data checksum should contain block IDs. (apache#8773)
  HDDS-13489. Fix SCMBlockdeleting unnecessary iteration in corner case. (apache#8847)
  HDDS-13464. Make ozone.snapshot.filtering.service.interval reconfigurable (apache#8825)
  HDDS-13473. Amend validation for OZONE_OM_SNAPSHOT_DB_MAX_OPEN_FILES (apache#8829)
  HDDS-13435. Add an OzoneManagerAuthorizer interface (apache#8840)
  HDDS-8565. Recon memory leak in NSSummary (apache#8823).
  HDDS-12852. Implement a sliding window counter utility (apache#8498)
  HDDS-12000. Add unit test for RatisContainerSafeModeRule and ECContainerSafeModeRule (apache#8801)
  HDDS-13092. Container scanner should trigger volume scan when marking a container unhealthy (apache#8603)
  HDDS-13070. OM Follower changes to create and place sst files from hardlink file. (apache#8761)
  HDDS-13482. Mark testWriteStateMachineDataIdempotencyWithClosedContainer as flaky
  HDDS-13481. Fix success latency metric in SCM panels of deletion grafana dashboard (apache#8835)
  HDDS-13468. Update default value of ozone.scm.ha.dbtransactionbuffer.flush.interval. (apache#8834)
  HDDS-13410. Control block deletion for each DN from SCM. (apache#8767)
  ...

hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerReplicaInfo.java
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/container/ReconcileSubcommand.java
hadoop-ozone/cli-admin/src/test/java/org/apache/hadoop/hdds/scm/cli/container/TestReconcileSubcommand.java
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jul 31, 2025
Gargi-jais11 pushed a commit to Gargi-jais11/ozone that referenced this pull request Aug 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants