Skip to content

Conversation

@siddhantsangwan
Copy link
Contributor

What changes were proposed in this pull request?

The ContainerBalancerMetrics class is responsible for recording metrics. These metrics always show 0 when accessed through JMX, while expected values are greater than 0.

I deleted some metrics of the type MutableGaugeLong and instead used the type MutableCounterLong. Introduced a new metric countIterations that keeps count of the number of iterations that balancer has run for.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6244

How was this patch tested?

Updated TestContainerBalancer#testMetrics()

@siddhantsangwan
Copy link
Contributor Author

@lokeshj1703 @JacksonYao287 please take a look!

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhantsangwan Thanks for working on this! The changes look good to me. I have a few comments inline.

# Conflicts:
#	hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/container/balancer/TestContainerBalancer.java
@siddhantsangwan
Copy link
Contributor Author

@lokeshj1703 I've replaced the current metrics and included more testing.

Copy link
Contributor

@JacksonYao287 JacksonYao287 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @siddhantsangwan for the work ! the change looks good.
I also suggest that we can add aggregate metrics(total containers and total datasize). but this can be implemented in a seperate jira.

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @siddhantsangwan for updating the PR! I have some minor comments.

@siddhantsangwan
Copy link
Contributor Author

Thanks for reviewing @lokeshj1703 @JacksonYao287
Addressed your comments and opened a new jira for aggregate metrics: https://issues.apache.org/jira/browse/HDDS-6362

Copy link
Contributor

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating the PR! The changes look good to me. +1.
I have one minor comment, let me know if you would like to address it in this or new PR.

private MutableCounterLong dataSizeUnbalancedGB;

@Metric(about = "Number of unbalanced datanodes.")
private MutableCounterLong datanodesNumUnbalanced;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will also recommend changing names of other metrics. Like datanodesNum -> numDatanodes and movedContainersNum -> numMovedContainers.
We can do it in a separate PR as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I remember we had decided on this naming convention in a previous PR: #2230 (comment)

But this does conflict with numIterations where "num" is the prefix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. My idea is that it should be easy to search these metrics in prometheus. Maybe we need to have common prefix for balancer then. Can we check some naming guide here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into some examples. We've used "num" as a prefix in other places, such as numCacheHits, numCacheMisses, numInFlightReplications, numWriteStateMachineOps etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lokeshj1703 I've fixed the naming.

@lokeshj1703 lokeshj1703 merged commit b83c1f9 into apache:master Mar 3, 2022
@lokeshj1703
Copy link
Contributor

@siddhantsangwan Thanks for the contribution! @JacksonYao287 Thanks for the reviews! I have committed the PR to master branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants