Skip to content

Conversation

@mladjan-gadzic
Copy link
Contributor

What changes were proposed in this pull request?

Add servername tag for UGI metrics.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7648

How was this patch tested?

  • unit test
  • manual test

@mladjan-gadzic mladjan-gadzic marked this pull request as ready for review December 15, 2022 17:01
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mladjan-gadzic for the patch.

Comment on lines 32 to 41
<properties>
<ozone-common.version>1.3.0-SNAPSHOT</ozone-common.version>
</properties>

<dependencies>
<dependency>
<groupId>org.apache.ozone</groupId>
<artifactId>ozone-common</artifactId>
<version>${ozone-common.version}</version>
</dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not introduce backwards dependency (hdds -> ozone). If we want to access some code currently in ozone from both, we should move it to hdds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a need for org.apache.hadoop.ozone.om.OMConfigKeys. Where exactly do you propose it should be moved to make the most sense out of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.apache.hadoop.ozone.om package (om part yet to be created but at same level where org.apache.hadoop.hdds.scm.ScmConfigKeys resides) from common module from hadoop-hdds (hdds-common artifact id) sounds reasonable enough, but there are quite a few moving parts together with org.apache.hadoop.ozone.om.OMConfigKeys.

Copy link
Contributor

@xBis7 xBis7 Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai Thanks for looking into this. I don't think there is a package where we can access both OMConfigKeys and ScmConfigKeys but we might be able to simplify the switch case that requires both of them. We present prometheus metrics only for SCM and OM. We can only check the ScmConfigKeys and if it doesn't match then we can assume the server is OM without checking OMConfigKeys.

This might not be the best approach. We have the hostname available. There might be a way to use that to get the server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai @xBis7 Thanks for the suggestions! Check for ScmConfigKeys is enough right now - it might not be the best future proof solution but it is still better than moving tons of classes just in order to use OMConfigKeys.

Copy link
Contributor

@adoroszlai adoroszlai Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a simple, more generic solution: PrometheusMetricsSink is created in BaseHttpServer, which has service-specific subclasses. So define a method in BaseHttpServer that returns the info you want to pass to PrometheusMetricsSink, then override in OzoneManagerHttpServer, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adoroszlai This might be exactly what I was looking for. Thanks for suggestion, will try to expand on it!

@adoroszlai
Copy link
Contributor

@xBis7 please review

Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mladjan-gadzic Thanks for working on this. I left a few comments inline.

This code here should also be moved under PrometheusMetricsSinkUtil. PrometheusMetricsSink should call PrometheusMetricsSinkUtil methods and these methods in return should call any other methods that belong to DecayRpcSchedulerUtil, UgiMetricsUtil etc.

I'm proposing this refactoring in order to facilitate creating similar classes that might edit other metrics. We are currently editing DecayRpcScheduler and UGI metrics. There might be a future need to make similar changes to other metrics as well.

* Util class for
* {@link org.apache.hadoop.hdds.server.http.PrometheusMetricsSink}.
*/
public final class PrometheusMetricsSinkUtil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class should be more generic and agnostic to any metrics specific operations. It would be best to have another final class called UgiMetricsUtil similar to DecayRpcSchedulerUtil and move the relevant methods under it.

Also, if it turns out that we have a dependency issue, it will be best to have the code that causes the issue in a new class and maybe move that class under another package.

*/
public final class PrometheusMetricsSinkUtil {

private static final String UGI_METRICS = "ugi_metrics";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under UgiMetricsUtil

public final class PrometheusMetricsSinkUtil {

private static final String UGI_METRICS = "ugi_metrics";
private static final String SERVER_NAME = "unknown";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under UgiMetricsUtil

* @param key metrics entry key
* @param metricTags list of metric tags
*/
private static void addServerNameTag(String key,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under UgiMetricsUtil

* @param username caller username
* @param metricTags list of metric tags
*/
private static void addUsernameTag(String username,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under DecayRpcSchedulerUtil

* @param serverPort server port
* @return <tt>metricKey</tt> with replaced servername tag value.
*/
public static String replaceServerNameTagValue(String metricEntryKey,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under UgiMetricsUtil

case OMConfigKeys.OZONE_OM_HTTP_BIND_PORT_DEFAULT:
serverName = "OM";
break;
case ScmConfigKeys.OZONE_SCM_HTTP_BIND_PORT_DEFAULT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @adoroszlai pointed out, this should be ScmConfigKeys.OZONE_SCM_HTTP_ADDRESS_KEY and from there get the port, in order to cover the case of a custom port.

Copy link
Contributor

@xBis7 xBis7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mladjan-gadzic I left one comment inline. Apart from that LGTM. @adoroszlai Could you take another look?

import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

class UgiMetricsUtilTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be renamed to follow test naming conventions.

Suggested change
class UgiMetricsUtilTest {
class TestUgiMetricsUtil {

import org.apache.hadoop.metrics2.MetricsTag;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc comment missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xBis7 Thank you!

# Conflicts:
#	hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/server/http/PrometheusMetricsSink.java
#	hadoop-hdds/framework/src/test/java/org/apache/hadoop/hdds/server/http/TestPrometheusMetricsSink.java
Copy link
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mladjan-gadzic for updating the patch.

@adoroszlai adoroszlai merged commit eb031df into apache:master Jan 3, 2023
@adoroszlai
Copy link
Contributor

Thanks @xBis7 for the reviews.

errose28 added a commit to errose28/ozone that referenced this pull request Jan 9, 2023
* master: (176 commits)
  HDDS-7726. EC: Enhance datanode reconstruction log message (apache#4155)
  HDDS-7739. EC: Increase the information in the RM sending command log message (apache#4153)
  HDDS-7652. Volume Quota not enforced during write when bucket quota is not set (apache#4124)
  HDDS-7628. Intermittent failure in TestOzoneContainerWithTLS (apache#4142)
  HDDS-7695. EC metrics related to replication commands don't add up (apache#4152)
  HDDS-7729. EC: ECContainerReplicaCount should handle pending delete of unhealthy replicas (apache#4146)
  HDDS-7738. SCM terminates when adding container to a closed pipeline (apache#4154)
  HDDS-7243. Remove RequestFeatureValidator from echoRPC method which supports only ValidationCondition.OLDER_CLIENT_REQUESTS (apache#4051)
  HDDS-7708. No check for certificate duration config scenarios. (apache#4149)
  HDDS-7727. EC: SCM unregistered event handler for DatanodeCommandCountUpdated (apache#4147)
  HDDS-7606. Add SCM HA support in intellij run (apache#4058)
  HDDS-7666. EC: Unrecoverable EC containers with some remaining replicas may block decommissioning (apache#4118)
  HDDS-7339. Implement Certificate renewal task for services (apache#3982)
  HDDS-7696. MisReplicationHandler does not consider QUASI_CLOSED replicas as sources (apache#4144)
  HDDS-7714. Docker cluster ozone-om-ha fails during docker-compose up (apache#4137)
  HDDS-7716. Log read requests rejected with permission denied in OM audit (apache#4136)
  HDDS-7588. Intermittent failure in TestObjectStoreWithLegacyFS#testFlatKeyStructureWithOBS (apache#4040)
  HDDS-7633. Compile error with Java 11: package com.sun.jmx.mbeanserver is not visible (apache#4077)
  HDDS-7648. Add a servername tag in UGI metrics. (apache#4094)
  HDDS-7564. Update Ozone version after 1.3.0 release (apache#4115)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants