HDDS-9889. Refactor tests related to dynamical adaptation for datanode limits in ContainerBalancer #5758
Conversation
@JacksonYao287, @sumitagrawl, please review the changes
(force-pushed c65d14f to a839f3e)
@adoroszlai, UPD: done!
(force-pushed eed7a9a to f8dc3fc)
UPD: Today I rebased this PR on the master branch (to get the latest changes)

Thanks @Montura. Only needs a review from @siddhantsangwan. @sumitagrawl, can you please review?
(force-pushed f8dc3fc to 5d175ae)
@siddhantsangwan @sumitagrawl could you please review?
UPD: Today I merged the master branch into this PR to resolve conflicts
@siddhantsangwan @sumitagrawl could you please review?
@Montura please keep in mind that end of year is usually a holiday season in many places
@Montura Sorry about the code conflicts. This PR does not allow edits from maintainers, is that intentional? If it did, I'd try to keep it updated after merging PRs touching the same files.
@siddhantsangwan @sumitagrawl please take a look at the patch to provide high-level feedback while the conflicts are being resolved

Tomorrow I'll resolve the conflicts; it wasn't intentional to forbid PR editing for maintainers. It's my first PR here, I'll do better next time.
No worries. |
@Montura Thanks for working on this. I'm trying to understand the problem. If you wrote a test that fails without your fix, please point me to it.

Sure, I'm merging current master now; when I finish, I'll point you to the test.

I didn't really get this. Can you please elaborate? It'd be helpful to have a small example where you describe this problem.
Let's imagine that you have a cluster whose total DN count is any value in [4, 9] (4, 5, 6, 7, 8, or 9). Then the maximum number of DNs that could be involved in balancing for that cluster is computed with a Java primitive narrowing conversion. Per the spec:

> The floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3). The Java programming language uses round toward zero when converting a floating value to an integer (§5.1.3), which acts, in this case, as though the number were truncated, discarding the mantissa bits. Rounding toward zero chooses as its result the format's value closest to and no greater in magnitude than the infinitely precise result.

So we get under-utilized nodes that will never take part in balancing at all. All clusters with a DN count > 3 and < 10 will start balancing and do nothing, because the collections of source and target datanodes are reset.
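The truncation being described can be sketched in plain Java. Note this is a minimal illustration, not the actual ContainerBalancer code: `maxDatanodesToInvolve` is a hypothetical helper, and the 20 percent figure is an assumed value for the percentage limit.

```java
public class NarrowingDemo {
  // Hypothetical helper mirroring the limit computation: a percentage of the
  // cluster size, narrowed to int. The (int) cast rounds toward zero
  // (JLS §5.1.3), so the fractional part is simply discarded.
  static int maxDatanodesToInvolve(int totalNodes, double maxPercentage) {
    return (int) (totalNodes * maxPercentage / 100.0);
  }

  public static void main(String[] args) {
    // With an assumed 20% limit, every cluster of 4..9 nodes gets a limit
    // of 0 or 1, which can be too few to pick both a source and a target.
    for (int nodes = 4; nodes <= 9; nodes++) {
      System.out.println(nodes + " nodes -> limit "
          + maxDatanodesToInvolve(nodes, 20.0));
    }
  }
}
```

Under that assumed 20% limit, the computed maximum only reaches 2 once the cluster has 10 nodes, which matches the "> 3 and < 10" range above.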
Ah, I understand. Yes, I've seen this happen in some small clusters. The recommendation is to increase the value of `datanodes.involved.max.percentage.per.iteration`.
Ok, makes sense. Let me rewrite the tests to verify the desired behavior by increasing the value of `datanodes.involved.max.percentage.per.iteration`. What do you think? UPD: I've updated the PR accordingly.
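As a sketch of the workaround discussed here, the setting might be raised in `ozone-site.xml`. The `hdds.container.balancer.` prefix on the key is an assumption; the thread only uses the short property name.

```xml
<property>
  <!-- Assumed full key; the thread refers to it by its short name
       datanodes.involved.max.percentage.per.iteration -->
  <name>hdds.container.balancer.datanodes.involved.max.percentage.per.iteration</name>
  <!-- 100 lets every datanode in a small (fewer than 10 nodes) cluster participate -->
  <value>100</value>
</property>
```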
(force-pushed 0332673 to 0239124)
(force-pushed 1c19524 to fdb7ed6)
…roperty explicitly in tests
(force-pushed fdb7ed6 to 89bd705)
…ancerConfiguration more explicitly
(force-pushed 492863c to 1f126d1)
@siddhantsangwan, I applied all your suggestions. Please look at the PR once again
siddhantsangwan left a comment:
@Montura Thanks for the update. LGTM.
@Montura can you push an empty commit with no changes, and the commit message saying "Trigger CI"?

…for 1 workflow. @adoroszlai any idea?

Empty commits are disabled; some change is required. Let's wait for @adoroszlai
Done

Merge please

Thanks @Montura for continued efforts on this. Thanks @siddhantsangwan for the review.
…e limits in ContainerBalancer (apache#5758) (cherry picked from commit 78a7e7a)
Dynamic adaptation (introduced in HDDS-5526) for `datanodes.involved.max.percentage.per.iteration` in the container balancer doesn't work well in some cases. Sometimes the number of under-utilized nodes may not be sufficient to satisfy the limit on the maximum percentage of datanodes participating in a balancing iteration (`datanodes.involved.max.percentage.per.iteration`). Thus, the collections of source and target datanodes are reset and balancing is skipped (see comment). The issue can be easily detected when the cluster has few nodes (< 10), for example 4 or 5. To fix this case we have to set the `datanodes.involved.max.percentage.per.iteration` value to 100.

@siddhantsangwan wrote a small documentation page with some details about the container balancer.
What changes were proposed in this pull request?
Introduced the `TestableCluster` class to reuse in tests for clusters with different numbers of datanodes.

What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-9889
How was this patch tested?
`hdds.scm.container.balancer.TestContainerBalancerTask` is reworked:
- `hdds.scm.container.balancer.MockedSCM` for setting up a testable `hdds.scm.server.StorageContainerManager`
- `hdds.scm.container.balancer.TestableCluster` for creating a test cluster with a required number of datanodes
- `TestContainerBalancerDatanodeNodeLimit` test with 3 tests extracted from `TestContainerBalancerTask`, using clusters with different node counts