Contributor

@afilpp afilpp commented Mar 29, 2024

What changes were proposed in this pull request?

HDDS-10612. Add Robot test to verify Container Balancer for RATIS containers

Currently, Container Balancer is covered only by unit tests; there are no acceptance tests at all. At a minimum, we should add a Robot test to verify Container Balancer for RATIS containers. A Robot test for the EC case should probably be added in the future.

Test case:

  1. Move 1 datanode to maintenance mode (we use 4 datanodes in this test)
  2. Create multiple keys (after loading the data, we check that 3 datanodes are ~60% busy, and the one that is in maintenance mode is empty)
  3. Start datanode recommission (wait until datanode recommissioning is completed)
  4. Start container balancer (wait until container balancer is completed)
  5. Check results (after balancing, all 4 datanodes should show approximately the same data distribution)
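
The final check in step 5 can be sketched as follows. This is only an illustration, not the Robot test from this patch: the `is_balanced` helper, the 10% threshold, and the byte counts are all assumptions made for the example. It treats the cluster as balanced when every datanode's utilization is within the threshold of the cluster-wide average.

```python
from typing import Sequence


def utilization(used_bytes: int, capacity_bytes: int) -> float:
    """Return the fraction of a datanode's capacity that is in use."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return used_bytes / capacity_bytes


def is_balanced(
    used: Sequence[int],
    capacity: Sequence[int],
    threshold: float = 0.10,  # assumed tolerance, not taken from the patch
) -> bool:
    """Return True if every node is within `threshold` of the cluster average."""
    if not used or len(used) != len(capacity):
        raise ValueError("used and capacity must be non-empty and the same length")
    # Cluster-wide average utilization: total used space over total capacity.
    cluster_avg = sum(used) / sum(capacity)
    return all(
        abs(utilization(u, c) - cluster_avg) <= threshold
        for u, c in zip(used, capacity)
    )


# Hypothetical numbers mirroring the scenario: 3 datanodes at ~60%,
# and the one that was in maintenance mode still empty.
before = is_balanced([60, 60, 60, 0], [100, 100, 100, 100])
# After balancing, data is spread roughly evenly over all 4 datanodes.
after = is_balanced([50, 45, 45, 40], [100, 100, 100, 100])
```

With these made-up values, `before` is false because the empty datanode is far below the cluster average, and `after` is true because every datanode is within the assumed tolerance.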

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10612

How was this patch tested?

Added Robot test

@adoroszlai adoroszlai self-requested a review March 29, 2024 13:59
Contributor

@adoroszlai adoroszlai left a comment

Thanks @afilpp for the patch. Overall looks good, added some minor comments.

It would be nice to create the environment as an add-on for ozone-ha instead of a completely separate one, but we can check if it's feasible in a follow-up task.

Contributor

@adoroszlai adoroszlai left a comment

Thanks @afilpp for updating the patch, LGTM.

Contributor

@myskov myskov left a comment

LGTM, @siddhantsangwan please take a look

Contributor

@siddhantsangwan siddhantsangwan left a comment

The robot test logic LGTM.

@myskov myskov merged commit 129cdc1 into apache:master Apr 2, 2024
Contributor

ivandika3 commented Apr 4, 2024

It seems there is an intermittent failure in the acceptance test:

https://github.com/apache/ozone/actions/runs/8546730074/job/23418032793

@afilpp Could you take a look?

Edit: See the comment in HDDS-10612 for a possible root cause.

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request May 29, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Sep 16, 2024
xichen01 pushed a commit to xichen01/ozone that referenced this pull request Sep 18, 2024

6 participants