Skip to content

Conversation

@kostacie
Copy link
Contributor

What changes were proposed in this pull request?

In this PR, unit tests for RatisContainerSafeModeRule and ECContainerSafeModeRule have been added.

What is the link to the Apache JIRA

HDDS-12000

How was this patch tested?

https://github.com/kostacie/ozone/actions/runs/16267148185

Copy link
Contributor

@jojochuang jojochuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more corner case scenarios are worth adding:

  @Test
   public void testNoContainersPresent() {
       when(containerManager.getContainers(ReplicationType.RATIS)).thenReturn(Collections.emptyList());
       rule.refresh(false);
       assertEquals(0.0, rule.getCurrentContainerThreshold(), "Threshold should be 0.0 when no containers present");
       assertFalse(rule.validate(), "Validate should return false when no containers present");
   }

   @Test
   public void testAllContainersClosed() {
       List<ContainerInfo> closedContainers = Arrays.asList(
           ScmTestUtils.getContainerWithState(ContainerReplicaProto.State.CLOSED),
           ScmTestUtils.getContainerWithState(ContainerReplicaProto.State.CLOSED)
       );
       when(containerManager.getContainers(ReplicationType.RATIS)).thenReturn(closedContainers);
       rule.refresh(false);
       assertEquals(0.0, rule.getCurrentContainerThreshold(), "Threshold should be 0.0 when all containers are closed");
       assertFalse(rule.validate(), "Validate should return false when all containers are closed");
   }

   @Test
   public void testAllContainersOpen() {
       List<ContainerInfo> openContainers = Arrays.asList(
           ScmTestUtils.getContainerWithState(ContainerReplicaProto.State.OPEN),
           ScmTestUtils.getContainerWithState(ContainerReplicaProto.State.OPEN)
       );
       when(containerManager.getContainers(ReplicationType.RATIS)).thenReturn(openContainers);
       rule.refresh(false);
       // Depending on the rule logic, adjust the expectation below:
       assertEquals(1.0, rule.getCurrentContainerThreshold(), "Threshold should be 1.0 when all containers are open");
       assertTrue(rule.validate(), "Validate should return true when all containers are open");
   }

   @Test
   public void testNullContainerInfo() {
       List<ContainerInfo> containers = Arrays.asList(
           null,
           ScmTestUtils.getContainerWithState(ContainerReplicaProto.State.OPEN)
       );
       when(containerManager.getContainers(ReplicationType.RATIS)).thenReturn(containers);
       assertDoesNotThrow(() -> rule.refresh(false), "Null containers should be skipped/handled gracefully");
   }

   @Test
   public void testDuplicateContainerIdsInReports() {
       ContainerInfo container = ScmTestUtils.getContainerWithState(ContainerReplicaProto.State.OPEN);
       int containerId = container.getContainerID();
       rule.refresh(false);
       rule.process(containerId);
       rule.process(containerId); // Duplicate process
       // The rule should not double-count the container
       // Add your assertion based on how the threshold should change
   }

   @Test
   public void testValidateBasedOnReportProcessingTrue() {
       rule.setValidateBasedOnReportProcessing(true);
       // Simulate report processing
       ContainerInfo container = ScmTestUtils.getContainerWithState(ContainerReplicaProto.State.OPEN);
       when(containerManager.getContainers(ReplicationType.RATIS)).thenReturn(Collections.singletonList(container));
       rule.refresh(false);
       rule.process(container.getContainerID());
       assertTrue(rule.validate(), "Should validate based on report processing");
   }
  1. No Containers Present
    Test behavior when the container list is empty.
    This will verify that methods like refresh, validate, and process handle the empty case gracefully.

  2. All Containers in CLOSED State
    Test with all containers in CLOSED state to ensure the rule behaves as expected (e.g., threshold remains at 0, or validate returns false if that's the intended logic).

  3. All Containers in OPEN State
    Test with all containers in OPEN state to verify the rule's threshold and validation logic.

  4. Null or Corrupt ContainerInfo
    Test handling of nulls or bogus container objects in the list—either skip or throw a controlled exception.

  5. Duplicate Container IDs in Reports
    Test processing a report that contains duplicate replicas/containers to ensure no double-counting.

  6. validateBasedOnReportProcessing = true
    You currently call rule.setValidateBasedOnReportProcessing(false);. Add a test with this flag set to true and verify the code path.

}

@ParameterizedTest
@EnumSource(value = LifeCycleState.class, names = {"OPEN", "CLOSED"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about covering more states?

enum LifeCycleState {
    OPEN = 1;
    CLOSING = 2;
    QUASI_CLOSED = 3;
    CLOSED = 4;
    DELETING = 5;
    DELETED = 6; // object is deleted.
    RECOVERING = 7;
}

@kostacie
Copy link
Contributor Author

kostacie commented Jul 15, 2025

@jojochuang thank you for the review.

I would only notice that there is no ScmTestUtils.getContainerWithState() method and the process() requires a NodeRegistrationContainerReport object, so I'll add and modify the tests taking this into account.

Also, there is an issue with two suggested tests testNoContainersPresentand testNullContainerInfo.
The first one is failing for some reason as getCurrentContainerThreshold equals 1.0 and rule.validate() returns true instead of false. Is it incorrect result for this test?
The second one causes NullPointerException while getting LifeCycleState because isClosed method (in RatisContainerSafeModeRule and ECContainerSafeModeRule) doesn't check nor ignore a value if it is null:

private boolean isClosed(ContainerInfo container) {
final LifeCycleState state = container.getState();
return state == LifeCycleState.QUASI_CLOSED || state == LifeCycleState.CLOSED;
}

@jojochuang jojochuang merged commit cc0a7ba into apache:master Jul 21, 2025
28 checks passed
@jojochuang
Copy link
Contributor

Thanks @kostacie this is merged. I added the "Generated-by:" tag because the suggested change was recommended by Copilot, which I took.

@adoroszlai
Copy link
Contributor

Thanks @kostacie for the patch. There is much duplication between the new test classes, created HDDS-13485 to address that.

errose28 added a commit to errose28/ozone that referenced this pull request Jul 30, 2025
* master: (730 commits)
  HDDS-13083. Handle cases where block deletion generates tree file before scanner (apache#8565)
  HDDS-12982. Reduce log level for snapshot validation failure (apache#8851)
  HDDS-13396. Documentation: Improve the top-level overview page for new users. (apache#8753)
  HDDS-13176. containerIds table value format change to proto from string (apache#8589)
  HDDS-13449. Incorrect Interrupt Handling for DirectoryDeletingService and KeyDeletingService (apache#8817)
  HDDS-2453. Add Freon tests for S3 MPU Keys (apache#8803)
  HDDS-13237. Container data checksum should contain block IDs. (apache#8773)
  HDDS-13489. Fix SCMBlockdeleting unnecessary iteration in corner case. (apache#8847)
  HDDS-13464. Make ozone.snapshot.filtering.service.interval reconfigurable (apache#8825)
  HDDS-13473. Amend validation for OZONE_OM_SNAPSHOT_DB_MAX_OPEN_FILES (apache#8829)
  HDDS-13435. Add an OzoneManagerAuthorizer interface (apache#8840)
  HDDS-8565. Recon memory leak in NSSummary (apache#8823).
  HDDS-12852. Implement a sliding window counter utility (apache#8498)
  HDDS-12000. Add unit test for RatisContainerSafeModeRule and ECContainerSafeModeRule (apache#8801)
  HDDS-13092. Container scanner should trigger volume scan when marking a container unhealthy (apache#8603)
  HDDS-13070. OM Follower changes to create and place sst files from hardlink file. (apache#8761)
  HDDS-13482. Mark testWriteStateMachineDataIdempotencyWithClosedContainer as flaky
  HDDS-13481. Fix success latency metric in SCM panels of deletion grafana dashboard (apache#8835)
  HDDS-13468. Update default value of ozone.scm.ha.dbtransactionbuffer.flush.interval. (apache#8834)
  HDDS-13410. Control block deletion for each DN from SCM. (apache#8767)
  ...

hadoop-hdds/common/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerReplicaInfo.java
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/container/ReconcileSubcommand.java
hadoop-ozone/cli-admin/src/test/java/org/apache/hadoop/hdds/scm/cli/container/TestReconcileSubcommand.java
jojochuang pushed a commit to jojochuang/ozone that referenced this pull request Jul 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants