@aierate aierate commented Jan 23, 2024

What changes were proposed in this pull request?

  • Replace the MutableRate metrics in XceiverClientMetrics with Performance metrics, for both Client and S3G.

  • Performance combines MutableQuantiles, MutableMinMax, and MutableStat metrics; the MutableStat is the same as the original MutableRate metric.

  • Replacing MutableRate with Performance keeps the original MutableRate output unchanged and adds configurable MutableQuantiles and MutableMinMax metrics.

  • Reference from the following link:
    HDDS-9717. Add P99 quantiles and Min/Max Metrics for S3G Performance Metrics #5627
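
To illustrate the idea in the bullets above, here is a minimal sketch of a composite latency metric in which one `add` call feeds both the rate-style view (NumOps, AvgTime) and the min/max view. `LatencyAggregate` and its methods are hypothetical names chosen for this example; this is not the Performance class from the patch, and quantiles are left out for brevity.

```java
// Hypothetical stand-in for illustration; not the Performance class from the patch.
final class LatencyAggregate {
  private long numOps;        // reported as <Op>LatencyNumOps
  private double totalTime;   // used to derive <Op>LatencyAvgTime
  private boolean hasSample;  // false until the first sample arrives
  private double minTime;     // reported as <Op>LatencyIMinTime
  private double maxTime;     // reported as <Op>LatencyIMaxTime

  // A single call updates every view, so call sites do not need to change.
  synchronized void add(long latency) {
    numOps++;
    totalTime += latency;
    minTime = hasSample ? Math.min(minTime, latency) : latency;
    maxTime = hasSample ? Math.max(maxTime, latency) : latency;
    hasSample = true;
  }

  synchronized long numOps() { return numOps; }

  synchronized double avgTime() { return numOps == 0 ? 0.0 : totalTime / numOps; }

  synchronized double minTime() { return minTime; }

  synchronized double maxTime() { return maxTime; }

  public static void main(String[] args) {
    LatencyAggregate writeChunk = new LatencyAggregate();
    writeChunk.add(20);
    writeChunk.add(40);
    System.out.println(writeChunk.numOps() + " ops, avg " + writeChunk.avgTime()
        + ", min " + writeChunk.minTime() + ", max " + writeChunk.maxTime());
  }
}
```

Because the count and average are computed exactly as before, adding the min/max fields does not change the values that existing dashboards read.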

Before

http://ozone:9878/jmx?qry=Hadoop:service=S3Gateway,name=XceiverClientMetrics

 {
    "name" : "Hadoop:service=S3Gateway,name=XceiverClientMetrics",
    "modelerType" : "XceiverClientMetrics",
    "tag.Context" : "dfs",
    "tag.Hostname" : "conway-hadoop3",
    "numPendingCreateContainer" : 0,
    "opCountCreateContainer" : 0,
    "CreateContainerLatencyNumOps" : 0,
    "CreateContainerLatencyAvgTime" : 0.0,
    
    .
    .
    .

    "opCountWriteChunk" : 0,
    "WriteChunkLatencyNumOps" : 0,
    "WriteChunkLatencyAvgTime" : 0,
    
    .
    . 
    .
   
    "numPendingStreamWrite" : 0,
    "opCountStreamWrite" : 0,
    "StreamWriteLatencyNumOps" : 0,
    "StreamWriteLatencyAvgTime" : 0.0,
    "EcReconstructionFailsTotal" : 0,
    "EcReconstructionTotal" : 0,
    "PendingOps" : 0,
    "TotalOps" : 0
  },

After

http://ozone:9878/jmx?qry=Hadoop:service=S3Gateway,name=XceiverClientMetrics
Default (adds MutableMinMax metrics):

 {
    "name" : "Hadoop:service=S3Gateway,name=XceiverClientMetrics",
    "modelerType" : "XceiverClientMetrics",
    "tag.Context" : "dfs",
    "tag.Hostname" : "conway-hadoop3",
    "numPendingCreateContainer" : 0,
    "opCountCreateContainer" : 0,
    "CreateContainerLatencyNumOps" : 0,
    "CreateContainerLatencyAvgTime" : 0.0,
    "CreateContainerLatencyIMinTime" : 0,
    "CreateContainerLatencyIMaxTime" : 0,
    
    .
    .
    .

    "opCountWriteChunk" : 1,
    "WriteChunkLatencyNumOps" : 1,
    "WriteChunkLatencyAvgTime" : 701.0,
    "WriteChunkLatencyIMinTime" : 0,
    "WriteChunkLatencyIMaxTime" : 0,
    
    .
    . 
    .
   
    "numPendingStreamWrite" : 0,
    "opCountStreamWrite" : 0,
    "StreamWriteLatencyNumOps" : 0,
    "StreamWriteLatencyAvgTime" : 0.0,
    "StreamWriteLatencyIMinTime" : 0,
    "StreamWriteLatencyIMaxTime" : 0,
    "EcReconstructionFailsTotal" : 0,
    "EcReconstructionTotal" : 0,
    "PendingOps" : 0,
    "TotalOps" : 2
  },

With quantile metrics enabled:

<property>
  <name>ozone.xceiver.metrics.percentiles.intervals.seconds</name>
  <value>60,300</value>
</property>
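
As a rough sketch of how a value such as `60,300` maps to the metric names in the JMX output, the helper below parses the intervals and builds one `NumOps` name plus five percentile names per interval. `QuantileNames`, `parseIntervals`, and `names` are hypothetical helpers for this example only, and treating a blank value as "no quantiles" is an assumption, not something taken from the patch.

```java
// Hypothetical helper for illustration; not code from the patch.
final class QuantileNames {
  // Percentiles that appear in the JMX output in this PR.
  static final int[] PERCENTILES = {50, 75, 90, 95, 99};

  // Parses a value such as "60,300" into {60, 300}.
  // Assumption: a null or blank value means no quantile intervals.
  static int[] parseIntervals(String value) {
    if (value == null || value.trim().isEmpty()) {
      return new int[0];
    }
    String[] parts = value.split(",");
    int[] intervals = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      intervals[i] = Integer.parseInt(parts[i].trim());
    }
    return intervals;
  }

  // Builds names such as WriteChunkLatency60s99thPercentileTime.
  static java.util.List<String> names(String prefix, String value) {
    java.util.List<String> names = new java.util.ArrayList<>();
    for (int interval : parseIntervals(value)) {
      names.add(prefix + interval + "sNumOps");
      for (int p : PERCENTILES) {
        names.add(prefix + interval + "s" + p + "thPercentileTime");
      }
    }
    return names;
  }

  public static void main(String[] args) {
    for (String name : names("WriteChunkLatency", "60,300")) {
      System.out.println(name);
    }
  }
}
```

With two intervals configured, each operation gains twelve additional attributes, which matches the shape of the output in this PR.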

http://ozone:9878/jmx?qry=Hadoop:service=S3Gateway,name=XceiverClientMetrics

 {
    "name" : "Hadoop:service=S3Gateway,name=XceiverClientMetrics",
    "modelerType" : "XceiverClientMetrics",
    "tag.Context" : "dfs",
    "tag.Hostname" : "conway-hadoop3",
    "numPendingCreateContainer" : 0,
    "opCountCreateContainer" : 0,
    "CreateContainerLatencyNumOps" : 0,
    "CreateContainerLatencyAvgTime" : 0.0,
    "CreateContainerLatency60sNumOps" : 0,
    "CreateContainerLatency60s50thPercentileTime" : 0,
    "CreateContainerLatency60s75thPercentileTime" : 0,
    "CreateContainerLatency60s90thPercentileTime" : 0,
    "CreateContainerLatency60s95thPercentileTime" : 0,
    "CreateContainerLatency60s99thPercentileTime" : 0,
    "CreateContainerLatency300sNumOps" : 0,
    "CreateContainerLatency300s50thPercentileTime" : 0,
    "CreateContainerLatency300s75thPercentileTime" : 0,
    "CreateContainerLatency300s90thPercentileTime" : 0,
    "CreateContainerLatency300s95thPercentileTime" : 0,
    "CreateContainerLatency300s99thPercentileTime" : 0,
    .
    .
    .
    "numPendingWriteChunk" : 0,
    "opCountWriteChunk" : 1,
    "WriteChunkLatencyNumOps" : 1,
    "WriteChunkLatencyAvgTime" : 722.0,
    "WriteChunkLatency60sNumOps" : 1,
    "WriteChunkLatency60s50thPercentileTime" : 722,
    "WriteChunkLatency60s75thPercentileTime" : 722,
    "WriteChunkLatency60s90thPercentileTime" : 722,
    "WriteChunkLatency60s95thPercentileTime" : 722,
    "WriteChunkLatency60s99thPercentileTime" : 722,
    "WriteChunkLatency300sNumOps" : 0,
    "WriteChunkLatency300s50thPercentileTime" : 0,
    "WriteChunkLatency300s75thPercentileTime" : 0,
    "WriteChunkLatency300s90thPercentileTime" : 0,
    "WriteChunkLatency300s95thPercentileTime" : 0,
    "WriteChunkLatency300s99thPercentileTime" : 0,
    "WriteChunkLatencyIMinTime" : 0,
    "WriteChunkLatencyIMaxTime" : 0,
    .
    .
    .
    "numPendingStreamWrite" : 0,
    "opCountStreamWrite" : 0,
    "StreamWriteLatencyNumOps" : 0,
    "StreamWriteLatencyAvgTime" : 0.0,
    "StreamWriteLatency60sNumOps" : 0,
    "StreamWriteLatency60s50thPercentileTime" : 0,
    "StreamWriteLatency60s75thPercentileTime" : 0,
    "StreamWriteLatency60s90thPercentileTime" : 0,
    "StreamWriteLatency60s95thPercentileTime" : 0,
    "StreamWriteLatency60s99thPercentileTime" : 0,
    "StreamWriteLatency300sNumOps" : 0,
    "StreamWriteLatency300s50thPercentileTime" : 0,
    "StreamWriteLatency300s75thPercentileTime" : 0,
    "StreamWriteLatency300s90thPercentileTime" : 0,
    "StreamWriteLatency300s95thPercentileTime" : 0,
    "StreamWriteLatency300s99thPercentileTime" : 0,
    "StreamWriteLatencyIMinTime" : 0,
    "StreamWriteLatencyIMaxTime" : 0,
    "EcReconstructionFailsTotal" : 0,
    "EcReconstructionTotal" : 0,
    "PendingOps" : 0,
    "TotalOps" : 2
  }
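
In the output above, every 60s percentile for WriteChunk equals 722 because the window holds a single sample. The sketch below shows this with an exact nearest-rank percentile. Note that this is a deliberately simple substitute: a metrics library would typically use a streaming estimator rather than sorting all samples, and `NearestRank` is a hypothetical name for this example.

```java
// Hypothetical helper for illustration; exact nearest-rank, not a streaming estimator.
final class NearestRank {
  // Returns the p-th percentile (0 < p <= 100) of the samples, or 0 if there are none.
  static long percentile(long[] samples, int p) {
    if (samples.length == 0) {
      return 0; // mirrors the 0 reported for windows without samples
    }
    long[] sorted = samples.clone();
    java.util.Arrays.sort(sorted);
    // Nearest-rank: the smallest rank covering at least p percent of the samples.
    int rank = (int) Math.ceil(p / 100.0 * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }

  public static void main(String[] args) {
    long[] oneSample = {722};
    for (int p : new int[] {50, 75, 90, 95, 99}) {
      System.out.println(p + "th: " + percentile(oneSample, p));
    }
  }
}
```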

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10182

How was this patch tested?

Tested manually.

@aierate aierate changed the title HDDS-10182. Add P99 quantiles and Min/Max Metrics for Xceiver Client Performance Metrics for S3G and Client HDDS-10182. Add P99 quantiles and Min/Max Metrics for Xceiver Client Performance Metrics Jan 23, 2024

aierate commented Jan 23, 2024

@xichen01 Can you help me review the code? Thank you.


@adoroszlai adoroszlai left a comment

Thanks @aierate for the patch.

@adoroszlai

@tanvipenumudy please take a look

@xichen01

@tanvipenumudy Thanks for your patch.

XceiverClientMetrics needs to implement MetricsSource and override getMetrics, like this:

import org.apache.hadoop.metrics2.MetricsCollector;
import org.apache.hadoop.metrics2.MetricsSource;

public final class XceiverClientMetrics implements MetricsSource {
  @Override
  public void getMetrics(MetricsCollector collector, boolean all) {
    //....
  }
}

This is because Hadoop Metrics only retrieves values automatically from Hadoop's built-in classes. MutableMinMax is a custom class, so if getMetrics is not overridden, MutableMinMax will not be updated.
You can verify that the Min/Max metrics are always 0 in the current implementation.
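
A minimal sketch of the point made in this comment, assuming the behavior it describes: a custom metric reaches the published record only when something explicitly snapshots it. Every type below (`RecordBuilder`, `MapRecord`, `CustomMinMax`, `ClientMetricsSketch`) is a hypothetical stand-in that only mimics the shape of a metrics framework; none of them is a Hadoop or Ozone class.

```java
// Hypothetical stand-ins for illustration; none of these are Hadoop or Ozone classes.
interface RecordBuilder {
  void addGauge(String name, double value);
}

// Collects published values in insertion order so they can be inspected.
final class MapRecord implements RecordBuilder {
  final java.util.Map<String, Double> values = new java.util.LinkedHashMap<>();

  @Override
  public void addGauge(String name, double value) {
    values.put(name, value);
  }
}

// A custom metric that the framework stand-in knows nothing about.
final class CustomMinMax {
  private double min = Double.MAX_VALUE;
  private double max = -Double.MAX_VALUE;

  void add(double value) {
    min = Math.min(min, value);
    max = Math.max(max, value);
  }

  void snapshot(String prefix, RecordBuilder builder) {
    builder.addGauge(prefix + "IMinTime", min);
    builder.addGauge(prefix + "IMaxTime", max);
  }
}

final class ClientMetricsSketch {
  private long numOps;
  private final CustomMinMax latency = new CustomMinMax();

  void record(double value) {
    numOps++;
    latency.add(value);
  }

  // Models automatic collection: only the built-in counter is published.
  void collectBuiltInOnly(RecordBuilder builder) {
    builder.addGauge("PutBlockLatencyNumOps", numOps);
  }

  // Models an explicit getMetrics override: the custom metric is snapshotted too.
  void getMetrics(RecordBuilder builder) {
    collectBuiltInOnly(builder);
    latency.snapshot("PutBlockLatency", builder);
  }

  public static void main(String[] args) {
    ClientMetricsSketch metrics = new ClientMetricsSketch();
    metrics.record(30);
    MapRecord record = new MapRecord();
    metrics.getMetrics(record);
    System.out.println(record.values);
  }
}
```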

aierate commented Jan 25, 2024

@tanvipenumudy Thanks for your patch.

XceiverClientMetrics needs to implement MetricsSource and override getMetrics, like this:

public final class XceiverClientMetrics implements MetricsSource {
  @Override
  public void getMetrics(MetricsCollector collector, boolean all) {
    //....
  }
}

This is because Hadoop Metrics only retrieves values automatically from Hadoop's built-in classes. MutableMinMax is a custom class, so if getMetrics is not overridden, MutableMinMax will not be updated. You can verify that the Min/Max metrics are always 0 in the current implementation.

@xichen01 Thank you very much for the suggestion. I have fixed this bug; please take a look.


aierate commented Jan 25, 2024

Here is the JMX output after fixing this bug:
http://ozone:9878/jmx?qry=Hadoop:service=S3Gateway,name=XceiverClientMetrics

{
    "name" : "Hadoop:service=S3Gateway,name=XceiverClientMetrics",
    "modelerType" : "XceiverClientMetrics",
    "tag.Context" : "dfs",
    "tag.Hostname" : "conway-hadoop3",
    "PendingOps" : 0,
    "TotalOps" : 4,
    "EcReconstructionTotal" : 0,
    "EcReconstructionFailsTotal" : 0,
    "numPendingCreateContainer" : 0,
    "opCountCreateContainer" : 0,
    "CreateContainerLatencyNumOps" : 0,
    "CreateContainerLatencyAvgTime" : 0.0,
    "CreateContainerLatency60sNumOps" : 0,
    "CreateContainerLatency60s50thPercentileTime" : 0,
    "CreateContainerLatency60s75thPercentileTime" : 0,
    "CreateContainerLatency60s90thPercentileTime" : 0,
    "CreateContainerLatency60s95thPercentileTime" : 0,
    "CreateContainerLatency60s99thPercentileTime" : 0,
    "CreateContainerLatency300sNumOps" : 0,
    "CreateContainerLatency300s50thPercentileTime" : 0,
    "CreateContainerLatency300s75thPercentileTime" : 0,
    "CreateContainerLatency300s90thPercentileTime" : 0,
    "CreateContainerLatency300s95thPercentileTime" : 0,
    "CreateContainerLatency300s99thPercentileTime" : 0,
    "CreateContainerLatencyIMinTime" : 3.4028234663852886E38,
    "CreateContainerLatencyIMaxTime" : 1.401298464324817E-45,
   .
   .
   .
   "numPendingPutBlock" : 0,
    "opCountPutBlock" : 2,
    "PutBlockLatencyNumOps" : 2,
    "PutBlockLatencyAvgTime" : 30.0,
    "PutBlockLatency60sNumOps" : 1,
    "PutBlockLatency60s50thPercentileTime" : 30,
    "PutBlockLatency60s75thPercentileTime" : 30,
    "PutBlockLatency60s90thPercentileTime" : 30,
    "PutBlockLatency60s95thPercentileTime" : 30,
    "PutBlockLatency60s99thPercentileTime" : 30,
    "PutBlockLatency300sNumOps" : 0,
    "PutBlockLatency300s50thPercentileTime" : 0,
    "PutBlockLatency300s75thPercentileTime" : 0,
    "PutBlockLatency300s90thPercentileTime" : 0,
    "PutBlockLatency300s95thPercentileTime" : 0,
    "PutBlockLatency300s99thPercentileTime" : 0,
    "PutBlockLatencyIMinTime" : 30.0,
    "PutBlockLatencyIMaxTime" : 30.0,
   . 
   .
   .
   "numPendingStreamWrite" : 0,
    "opCountStreamWrite" : 0,
    "StreamWriteLatencyNumOps" : 0,
    "StreamWriteLatencyAvgTime" : 0.0,
    "StreamWriteLatency60sNumOps" : 0,
    "StreamWriteLatency60s50thPercentileTime" : 0,
    "StreamWriteLatency60s75thPercentileTime" : 0,
    "StreamWriteLatency60s90thPercentileTime" : 0,
    "StreamWriteLatency60s95thPercentileTime" : 0,
    "StreamWriteLatency60s99thPercentileTime" : 0,
    "StreamWriteLatency300sNumOps" : 0,
    "StreamWriteLatency300s50thPercentileTime" : 0,
    "StreamWriteLatency300s75thPercentileTime" : 0,
    "StreamWriteLatency300s90thPercentileTime" : 0,
    "StreamWriteLatency300s95thPercentileTime" : 0,
    "StreamWriteLatency300s99thPercentileTime" : 0,
    "StreamWriteLatencyIMinTime" : 3.4028234663852886E38,
    "StreamWriteLatencyIMaxTime" : 1.401298464324817E-45
  }
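
The IMinTime and IMaxTime values for operations with no samples (3.4028234663852886E38 and 1.401298464324817E-45) are the values of `Float.MAX_VALUE` and `Float.MIN_VALUE` widened to double. That suggests, as an assumption rather than something confirmed in this thread, that the min/max tracker starts from these sentinels and reports them until the first sample arrives. A minimal sketch, with `SentinelMinMax` as a hypothetical name:

```java
// Hypothetical tracker for illustration; the sentinel choice is an assumption
// inferred from the JMX output, not taken from the patch.
final class SentinelMinMax {
  private double min = Float.MAX_VALUE;  // largest finite float
  private double max = Float.MIN_VALUE;  // smallest positive float, not the most negative

  void add(double value) {
    min = Math.min(min, value);
    max = Math.max(max, value);
  }

  double min() { return min; }

  double max() { return max; }

  public static void main(String[] args) {
    SentinelMinMax untouched = new SentinelMinMax();
    System.out.println(untouched.min() + " " + untouched.max());

    SentinelMinMax putBlock = new SentinelMinMax();
    putBlock.add(30);
    System.out.println(putBlock.min() + " " + putBlock.max());
  }
}
```

After a single sample of 30, both values become 30.0, consistent with the PutBlock entries above. Consumers of these metrics may want to treat the sentinel values as "no data" rather than as real latencies.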


aierate commented Jan 25, 2024

All unit tests have passed. https://github.com/aierate/ozone/actions


aierate commented Jan 30, 2024

@xBis7 Can you help me review the code? Thank you.

@xichen01 xichen01 changed the title HDDS-10182. Add P99 quantiles and Min/Max Metrics for Xceiver Client Performance Metrics HDDS-10182. Add P99 quantiles and Min/Max Metrics for Xceiver Client Performance Metrics Feb 1, 2024

xichen01 commented Feb 1, 2024

LGTM

@adoroszlai

@duongkame @kerneltime @tanvipenumudy please take a look

@xichen01

Thanks @aierate for the patch, @kerneltime @xichen01 for the review.

@xichen01 xichen01 merged commit d4606e1 into apache:master Mar 19, 2024