HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode #6185
    decommissioningNodes.size() + " node(s)");
    }

    String metricsJson = scmClient.getMetrics("Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics");
This is nice - that you can filter the metrics server side with the query string. I thought we would have to do that client side, but this is better.
The change LGTM. Have you tried it out in a docker-compose cluster and validated that it all works fine when one node is decommissioning, when multiple nodes are decommissioning, and when none are decommissioning?
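The server-side filtering the reviewer mentions follows the Hadoop JMXJsonServlet convention of passing the MBean name as a `qry` query-string parameter, so only the matching bean is returned. A minimal sketch of building such a request URL (the host and port here are illustrative, not from this PR):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class JmxQuery {
  // Builds a JMXJsonServlet-style URL that filters beans server side
  // via the "qry" parameter. Host/port values are placeholders.
  public static String metricsUrl(String host, int port, String beanName)
      throws UnsupportedEncodingException {
    return "http://" + host + ":" + port + "/jmx?qry="
        + URLEncoder.encode(beanName, "UTF-8");
  }

  public static void main(String[] args) throws Exception {
    System.out.println(metricsUrl("scm", 9876,
        "Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics"));
  }
}
```

Fetching this URL returns only the `NodeDecommissionMetrics` bean rather than the full JMX dump, which is what makes the client-side handling in this PR simple.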
I have tested all three cases in docker-compose. Case-2: 1 node (the first output is from before the metrics are available; the second is once the metrics are updated). Case-3: 2 nodes decommissioning. (The empty braces at the end of each output are the container lists; they can be ignored for this PR.)
What changes were proposed in this pull request?
To track the progress of datanode decommissioning, the decommission status command needs to show the number of pipelines associated with the datanode and the number of containers on the datanode blocking decommissioning (i.e., unhealthy and under-replicated containers).
These counts, along with the time at which decommissioning was started for the datanode, are recorded as metrics in NodeDecommissionMetrics. This PR introduces a class in scm-server, similar to JMXJsonServlet (from hadoop-common), which can serve a request for the metrics of a specific class. The response is parsed to display the counts and start time for each node currently in DECOMMISSIONING.
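The parsing step described above can be sketched as a small helper that pulls a numeric metric out of the JMX JSON payload. This is a dependency-free illustration using a regex; the real code would use a proper JSON parser, and the metric key names below are hypothetical, not the actual NodeDecommissionMetrics names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DecommissionMetricsParser {
  // Extracts an integer-valued metric by key from a JMX JSON payload.
  // Returns -1 when the key is absent (e.g. metrics not yet updated).
  public static long metricValue(String json, String key) {
    Matcher m = Pattern
        .compile("\"" + Pattern.quote(key) + "\"\\s*:\\s*(\\d+)")
        .matcher(json);
    return m.find() ? Long.parseLong(m.group(1)) : -1L;
  }

  public static void main(String[] args) {
    // Illustrative payload; key names are made up for this sketch.
    String json = "{\"beans\":[{\"PipelinesWaitingToClose\":2,"
        + "\"UnderReplicatedContainers\":5}]}";
    System.out.println(metricValue(json, "PipelinesWaitingToClose"));
    System.out.println(metricValue(json, "UnderReplicatedContainers"));
  }
}
```

Returning a sentinel for a missing key matches the observed behavior in testing, where the command has to cope with metrics that have not been published yet for a freshly decommissioning node.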
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-9738
How was this patch tested?
Updated tests in TestDecommissionStatusSubCommand. Also tested in a docker cluster.
When metrics haven't been updated-
When metrics have been updated-