This repository has been archived by the owner on Sep 24, 2021. It is now read-only.

ExternalMetric reports incorrect value #52

Open
rbrigden opened this issue Sep 13, 2020 · 1 comment

Comments

@rbrigden

We have been noticing inconsistencies between the metric value reported by the HPA and the value reported by CloudWatch (CW). We are struggling to scale our system to keep up with a work queue and would appreciate some clarity.

I have the following setup for a custom metric that is posted to CW every 15 minutes. The metric is in the namespace OUR/NAMESPACE, has a single dimension QUEUE, and is named QUEUE_SIZE.

ExternalMetric

apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: <replace>-queue-length
spec:
  name: <replace>-queue-length
  resource:
    resource: "deployment"
  queries:
    - id: <replace>
      metricStat:
        metric:
          namespace: "OUR/NAMESPACE"
          metricName: "QUEUE_SIZE"
          dimensions:
            - name: QUEUE
              value: "<replace>"
        period: 1800
        stat: Average
        unit: Count
      returnData: true

HPA

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: <replace>-scaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: our-deployment
  minReplicas: 1
  maxReplicas: 200
  metrics:
    - type: External
      external:
        metricName: <replace>-queue-length
        targetAverageValue: 10

We run the CW query directly, as suggested in another issue:

aws cloudwatch get-metric-statistics --metric-name QUEUE_SIZE --start-time 2020-09-13T07:30:00z --end-time 2020-09-13T08:20:00z --period=1800 --namespace OUR/NAMESPACE --statistics Average --dimensions Name=QUEUE,Value=<replace> --unit Count
{
    "Label": "QUEUE_SIZE",
    "Datapoints": [
        {
            "Timestamp": "2020-09-13T07:30:00Z",
            "Average": 381.8333333333333,
            "Unit": "Count"
        }
    ]
}

We inspect the HPA and see the following:

Name:                                                 <replace>-scaler
Namespace:                                            web
Labels:                                               <none>
Annotations:                                          <none>
CreationTimestamp:                                    Sun, 13 Sep 2020 00:49:13 -0700
Reference:                                            Deployment/our-deployment
Metrics:                                              ( current / target )
  "<replace>-queue-length" (target average value):  10778m / 10
Min replicas:                                         1
Max replicas:                                         200
Deployment pods:                                      36 current / 36 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric <replace>-queue-length(nil)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range

The value reported by the HPA (10778m) appears to be nowhere close to the value in CW (381.83). We follow the logs in the metrics adapter, and it claims to successfully capture and report the external metric.
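For comparing the two numbers, here is a minimal sketch of one possible reading, offered as an assumption rather than a diagnosis. It assumes that the `m` suffix in the HPA output is the Kubernetes milli-unit suffix, that a `targetAverageValue` target makes the HPA divide the raw external metric by the current pod count before displaying it, and that the controller uses the default tolerance of 0.1. None of this is confirmed in this thread, and the helper names below are hypothetical.

```python
import math


def parse_quantity(q: str) -> float:
    """Parse a plain or milli-suffixed quantity such as '10778m' or '10'."""
    if q.endswith("m"):
        return float(q[:-1]) / 1000.0
    return float(q)


def per_pod_average(raw_metric: float, pods: int) -> float:
    """Assumed HPA behaviour for average-value targets: raw metric / pod count."""
    return raw_metric / pods


def desired_replicas(current_pods: int, current_avg: float,
                     target_avg: float, tolerance: float = 0.1) -> int:
    """Assumed scaling rule: no change while the ratio is within the tolerance."""
    ratio = current_avg / target_avg
    if abs(ratio - 1.0) <= tolerance:
        return current_pods
    return math.ceil(current_pods * ratio)


pods = 36                                   # "Deployment pods: 36 current"
hpa_current = parse_quantity("10778m")      # value shown by the HPA, per pod
implied_raw = hpa_current * pods            # raw metric implied by the HPA
cw_average = 381.8333333333333              # Average from get-metric-statistics
cw_per_pod = per_pod_average(cw_average, pods)

print(f"HPA per-pod value:        {hpa_current:.3f}")
print(f"Raw value implied by HPA: {implied_raw:.1f}")
print(f"CW average per pod:       {cw_per_pod:.3f}")
print(f"Desired replicas:         {desired_replicas(pods, hpa_current, 10.0)}")
```

Under these assumptions, 10778m is a per-pod figure of about 10.8, which corresponds to a raw queue size of roughly 388. That is close to the 381.83 returned by CW for a slightly different time window, and a per-pod value within 10% of the target of 10 would also match the "recommended size matches current size" condition.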

We would appreciate any tips to help us have the correct metric value supplied to the HPA. Thanks!

@chankh
Contributor

chankh commented Sep 16, 2020

Hi, can you also provide the log output from the adapter?
