@Tejaskriya Tejaskriya commented Feb 7, 2024

What changes were proposed in this pull request?

To track the progress of a datanode's decommissioning, the decommission status command needs to show the number of pipelines associated with the datanode and the number of containers on it that are blocking the decommissioning (i.e., unhealthy and under-replicated containers).
These counts, along with the time at which decommissioning started for the datanode, are stored as metrics in NodeDecommissionMetrics. This PR introduces a class in scm-server, similar to JMXJsonServlet (from hadoop-common), which can accept a request for the metrics of a specific class. The response is parsed to display the counts and start time for each node currently in DECOMMISSIONING.
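As a rough illustration of the display step described above, the following sketch renders the per-node lines from a map of metric values and falls back to the error line when the metrics are not available yet. It is not the code from this PR: the class name, the metric keys, and the assumption that the start time is stored as epoch milliseconds are all hypothetical, chosen only to reproduce the output format shown in the testing section.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DecommissionStatusSketch {
  // Hypothetical metric keys; the real names in NodeDecommissionMetrics may differ.
  static final String START_TIME = "startTimeMs";
  static final String PIPELINES = "pipelines";
  static final String UNDER_REPLICATED = "underReplicated";
  static final String UNCLOSED = "unclosed";

  // Matches the "13/02/2024 05:28:28 UTC" style seen in the command output.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss 'UTC'").withZone(ZoneOffset.UTC);

  // Assumes the start time is an epoch timestamp in milliseconds.
  public static String formatStartTime(long epochMillis) {
    return FORMAT.format(Instant.ofEpochMilli(epochMillis));
  }

  // Builds the status lines for one node, or the error line if any metric is missing.
  public static String render(String hostName, Map<String, Number> metrics) {
    List<String> required = List.of(START_TIME, PIPELINES, UNDER_REPLICATED, UNCLOSED);
    if (metrics == null || !metrics.keySet().containsAll(required)) {
      return "Error getting pipeline and container counts for " + hostName;
    }
    return "Decommission started at : "
        + formatStartTime(metrics.get(START_TIME).longValue()) + "\n"
        + "No. of Pipelines: " + metrics.get(PIPELINES).intValue() + "\n"
        + "No. of UnderReplicated containers: "
        + metrics.get(UNDER_REPLICATED).doubleValue() + "\n"
        + "No. of Unclosed Containers: " + metrics.get(UNCLOSED).doubleValue();
  }

  public static void main(String[] args) {
    Map<String, Number> metrics = new HashMap<>();
    metrics.put(START_TIME, 1707802108000L);
    metrics.put(PIPELINES, 1);
    metrics.put(UNDER_REPLICATED, 0.0);
    metrics.put(UNCLOSED, 0.0);
    System.out.println(render("ozone-datanode-1.ozone_default", metrics));
    // Metrics not published yet: only the error line is printed.
    System.out.println(render("ozone-datanode-1.ozone_default", null));
  }
}
```

Treating a partially populated map the same as a missing one keeps the output to either the full block or the single error line, which is the behaviour visible in the sample runs.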

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-9738

How was this patch tested?

Updated tests in TestDecommissionStatusSubCommand. Also tested in a Docker cluster.
When the metrics haven't been updated yet:

bash-4.2$ ozone admin datanode status decommission

Decommission Status: DECOMMISSIONING - 1 node(s)

Datanode: 554a15ca-5530-4d4a-8b3a-10d475725fe1 (/default-rack/172.19.0.11/ozone-datanode-1.ozone_default)
Error getting pipeline and container counts for ozone-datanode-1.ozone_default
{}

When the metrics have been updated:

bash-4.2$ ozone admin datanode status decommission

Decommission Status: DECOMMISSIONING - 1 node(s)

Datanode: 554a15ca-5530-4d4a-8b3a-10d475725fe1 (/default-rack/172.19.0.11/ozone-datanode-1.ozone_default)
Decommission started at : 13/02/2024 05:28:28 UTC
No. of Pipelines: 1
No. of UnderReplicated containers: 0.0
No. of Unclosed Containers: 0.0
{}

@Tejaskriya Tejaskriya changed the title from "HDDS-9738." to "HDDS-9738. Display startTime, pipeline and container counts for decommissioning datanode" Feb 7, 2024
decommissioningNodes.size() + " node(s)");
}

String metricsJson = scmClient.getMetrics("Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics");

This is nice - that you can filter the metrics server side with the query string. I thought we would have to do that client side, but this is better.
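The name passed to getMetrics above follows the JMX object-name syntax (domain, then key=value properties). As a small sketch of how such a name selects one bean and excludes others, the JDK's own javax.management.ObjectName can be used for the match. This only illustrates the matching semantics and is not the SCM-side implementation; the class name MetricsQuerySketch and the second bean name in the usage are hypothetical.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class MetricsQuerySketch {
  // Returns true if the bean name is selected by the query, which may be an
  // exact object name or a pattern such as "Hadoop:service=X,*".
  public static boolean matches(String query, String beanName) {
    try {
      return new ObjectName(query).apply(new ObjectName(beanName));
    } catch (MalformedObjectNameException e) {
      throw new IllegalArgumentException("Invalid JMX object name", e);
    }
  }

  public static void main(String[] args) {
    String decom = "Hadoop:service=StorageContainerManager,name=NodeDecommissionMetrics";
    // Hypothetical name of some other metrics bean, used only for contrast.
    String other = "Hadoop:service=StorageContainerManager,name=SomeOtherMetrics";

    System.out.println(matches(decom, decom)); // exact name selects the bean
    System.out.println(matches(decom, other)); // a different bean is filtered out
    System.out.println(matches("Hadoop:service=StorageContainerManager,*", other)); // pattern selects both
  }
}
```

Filtering by an exact name on the server keeps the response limited to the one bean the command needs, so the client does not have to scan the full metrics dump.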

@sodonnel
Contributor

The change LGTM. Have you tried it out in a docker-compose cluster and validated that it all works fine when one node is decommissioning, when multiple nodes are decommissioning, and when none are decommissioning?

@Tejaskriya
Contributor Author

Tejaskriya commented Feb 13, 2024

I have tested it for all three cases in docker-compose:
Case-1: no nodes

bash-4.2$ ozone admin datanode status decommission

Decommission Status: DECOMMISSIONING - 0 node(s)

Case-2: 1 node (first output is when metrics are not available yet, second is once the metrics are updated)

bash-4.2$ ozone admin datanode status decommission

Decommission Status: DECOMMISSIONING - 1 node(s)

Datanode: 554a15ca-5530-4d4a-8b3a-10d475725fe1 (/default-rack/172.19.0.11/ozone-datanode-1.ozone_default)
Error getting pipeline and container counts for ozone-datanode-1.ozone_default
{}

bash-4.2$ ozone admin datanode status decommission

Decommission Status: DECOMMISSIONING - 1 node(s)

Datanode: 554a15ca-5530-4d4a-8b3a-10d475725fe1 (/default-rack/172.19.0.11/ozone-datanode-1.ozone_default)
Decommission started at : 13/02/2024 05:28:28 UTC
No. of Pipelines: 1
No. of UnderReplicated containers: 0.0
No. of Unclosed Containers: 0.0
{}

Case-3: 2 nodes decommissioning

bash-4.2$ ozone admin datanode status decommission

Decommission Status: DECOMMISSIONING - 2 node(s)

Datanode: 1a067845-b5a2-4f2a-b1c8-70a2140173ee (/default-rack/172.23.0.8/ozone-datanode-2.ozone_default)
Decommission started at : 13/02/2024 06:31:31 UTC
No. of Pipelines: 1
No. of UnderReplicated containers: 0.0
No. of Unclosed Containers: 0.0
{}

Datanode: 2df9e226-3e04-404c-8836-986231ab2b82 (/default-rack/172.23.0.10/ozone-datanode-1.ozone_default)
Decommission started at : 13/02/2024 06:31:27 UTC
No. of Pipelines: 2
No. of UnderReplicated containers: 0.0
No. of Unclosed Containers: 0.0
{}

(The empty braces at the end of each output are for the container lists; they can be ignored for this PR.)
The results are the same for secure and HA clusters as well.

@sodonnel sodonnel merged commit 3c4683e into apache:master Feb 13, 2024