
Warning FailedGetResourceMetric horizontal-pod-autoscaler missing request for cpu #79365

Closed
max-rocket-internet opened this issue Jun 25, 2019 · 69 comments · Fixed by #112544
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling.

Comments

@max-rocket-internet

What happened:

HPA always has a target of <unknown>/70% and events that say:

Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedComputeMetricsReplicas  36m (x12 over 38m)     horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu
  Warning  FailedGetResourceMetric       3m50s (x136 over 38m)  horizontal-pod-autoscaler  missing request for cpu
  • There is a single container in the pods and it has resource requests and limits set.
  • The metrics-server is running
  • All pods show metrics in kubectl top pod
  • All pods have metrics in kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"

Here's the HPA in YAML:

apiVersion: v1
items:
- apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    annotations:
      autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-06-25T09:56:21Z","reason":"SucceededGetScale","message":"the
        HPA controller was able to get the target''s current scale"},{"type":"ScalingActive","status":"False","lastTransitionTime":"2019-06-25T09:56:21Z","reason":"FailedGetResourceMetric","message":"the
        HPA was unable to compute the replica count: missing request for cpu"}]'
    creationTimestamp: "2019-06-25T09:56:06Z"
    labels:
      app: restaurant-monitor
      env: prd01
      grafana: saFkkx6ik
      rps_region: eu01
      team: vendor
    name: myapp
    namespace: default
    resourceVersion: "56108423"
    selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/myapp
    uid: 7345f8fb-972f-11e9-935d-02a07544d854
  spec:
    maxReplicas: 25
    minReplicas: 14
    scaleTargetRef:
      apiVersion: extensions/v1beta1
      kind: Deployment
      name: myapp
    targetCPUUtilizationPercentage: 70
  status:
    currentReplicas: 15
    desiredReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

What you expected to happen:

No <unknown> in HPA target

How to reproduce it (as minimally and precisely as possible):

I can't be sure. It's only a single HPA in our cluster. 10 other HPAs are working OK.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.12.6
  • Cloud provider or hardware configuration: EKS
@max-rocket-internet max-rocket-internet added the kind/bug Categorizes issue or PR as related to a bug. label Jun 25, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 25, 2019
@max-rocket-internet
Author

@DirectXMan12
/sig autoscaling

@k8s-ci-robot k8s-ci-robot added sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 25, 2019
@hex108
Contributor

hex108 commented Jun 26, 2019

@max-rocket-internet could you please share the yaml content of the pod? It seems the pod's cpu request is not set.

@max-rocket-internet
Author

@hex108

Sure. Here's from the deployment (kubectl get -o json deployment myapp | jq '.spec.template.spec.containers[].resources'):

{
  "limits": {
    "cpu": "2",
    "memory": "2Gi"
  },
  "requests": {
    "cpu": "1",
    "memory": "2Gi"
  }
}

This shows there's only a single container in these pods.

Here's a list of pods:

$ kubectl get -l app=restaurant-monitor pod
NAME                                                              READY   STATUS      RESTARTS   AGE
myapp-67d9c5849d-24qbs                    1/1     Running     0          18h
myapp-67d9c5849d-9l8z4                    1/1     Running     0          18h
myapp-67d9c5849d-bv6sf                    1/1     Running     0          18h
myapp-67d9c5849d-hgqw9                    1/1     Running     0          18h
myapp-67d9c5849d-j5n2r                    1/1     Running     0          18h
myapp-67d9c5849d-kctgn                    1/1     Running     0          18h
myapp-67d9c5849d-ldhmq                    1/1     Running     0          18h
myapp-67d9c5849d-mfrd5                    1/1     Running     0          18h
myapp-67d9c5849d-p8cz4                    1/1     Running     0          18h
myapp-67d9c5849d-rm9nl                    1/1     Running     0          18h
myapp-67d9c5849d-shlj6                    1/1     Running     0          18h
myapp-67d9c5849d-sxs8f                    1/1     Running     0          18h
myapp-67d9c5849d-tpfp8                    1/1     Running     0          17h
myapp-67d9c5849d-vsz78                    1/1     Running     0          18h
myapp-issue-detection-job-15613344fl42z   0/1     Completed   0          2d11h
myapp-issue-detection-job-15614208rmdkj   0/1     Completed   0          35h
myapp-issue-detection-job-1561507268cnr   0/1     Completed   0          11h

And resources from all pods ($ kubectl get -o json -l app=myapp pod | jq '.items[].spec.containers[].resources'):

{
  "limits": {
    "cpu": "2",
    "memory": "2Gi"
  },
  "requests": {
    "cpu": "1",
    "memory": "2Gi"
  }
}
{
  "limits": {
    "cpu": "2",
    "memory": "2Gi"
  },
  "requests": {
    "cpu": "1",
    "memory": "2Gi"
  }
}

That just repeats 14 times, once for each pod. And then the 3 completed pods show as:

{}
{}
{}

@max-rocket-internet
Author

Ahhh, I deleted those Completed pods and suddenly the HPA is back in action:

myapp                Deployment/myapp                12%/70%         14        25        14         25h

@max-rocket-internet
Author

max-rocket-internet commented Jun 26, 2019

But these Completed pods are not from the deployment that is specified in the HPA; they are created from a Job. Sure, they don't have resources set, but they should be ignored by the HPA, right?

Here's the pod JSON from one:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "annotations": {
      "checksum/config": "31e32a934d7d95c9399fc8ca8250ca6e6974c543e4ee16397b5dcd04b4399679"
    },
    "creationTimestamp": "2019-06-26T00:00:03Z",
    "generateName": "myapp-issue-detection-job-1561507200-",
    "labels": {
      "controller-uid": "59709b7a-97a5-11e9-b7c2-06c556123efe",
      "env": "prd01",
      "job-name": "myapp-issue-detection-job-1561507200",
      "team": "vendor"
    },
    "name": "myapp-issue-detection-job-1561507268cnr",
    "namespace": "default",
    "ownerReferences": [
      {
        "apiVersion": "batch/v1",
        "blockOwnerDeletion": true,
        "controller": true,
        "kind": "Job",
        "name": "myapp-issue-detection-job-1561507200",
        "uid": "59709b7a-97a5-11e9-b7c2-06c556123efe"
      }
    ],
    "resourceVersion": "56293646",
    "selfLink": "/api/v1/namespaces/default/pods/myapp-issue-detection-job-1561507268cnr",
    "uid": "59733023-97a5-11e9-b7c2-06c556123efe"
  }
}

I will test creating more Completed pods WITHOUT resources set and see if the issue returns. And then test creating more Completed pods WITH resources and see if it's OK.
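
For reference, the kind of thing I'll test is roughly this (illustrative only, not our real manifests): a Job whose pod template carries the same labels that the HPA's target Deployment selects on, but sets no resource requests. The Deployment and HPA selecting app: repro-app (with requests set on the Deployment's own containers) are assumed to already exist:

apiVersion: batch/v1
kind: Job
metadata:
  name: repro-job
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: repro-app        # same labels the target Deployment's selector matches
        team: vendor
    spec:
      restartPolicy: Never
      containers:
      - name: noop
        image: ubuntu
        command: ["sleep", "5"]
        resources: {}         # no requests, so the HPA reports "missing request for cpu"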

@hex108
Contributor

hex108 commented Jun 27, 2019

It is a little weird. How could I reproduce it?

@zq-david-wang

// GetResourceMetric gets the given resource metric (and an associated oldest timestamp)
// for all pods matching the specified selector in the given namespace
func (c *resourceMetricsClient) GetResourceMetric(resource v1.ResourceName, namespace string, selector labels.Selector) (PodMetricsInfo, time.Time, error) {
    metrics, err := c.client.PodMetricses(namespace).List(metav1.ListOptions{LabelSelector: selector.String()})
    if err != nil {
        return nil, time.Time{}, fmt.Errorf("unable to fetch metrics from resource metrics API: %v", err)
    }

The HPA uses only the selector to filter pods, without checking ownership. I think this is a horrible mistake!

@hex108
Contributor

hex108 commented Jun 27, 2019

@zq-david-wang It might be the cause of this issue. Would you like to send a PR for it? If not, I could try to fix it.

@max-rocket-internet Could you share the yaml of Deployment/myapp and Job/myapp-issue-detection-job, especially the labels part?

cc @DirectXMan12

@zq-david-wang

@hex108 I am not working on this. :)

@max-rocket-internet
Author

Here's the job:

apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2019-06-27T14:37:06Z"
  labels:
    app: app01
    controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
    env: prd01
    grafana: grafana_dashboard_link
    job-name: myapp-runner-job-1561646220
    rps_region: eu01
    team: vendor
  name: myapp-runner-job-1561646220
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: myapp-runner-job
    uid: d47b5438-98e6-11e9-935d-02a07544d854
  resourceVersion: "56786867"
  selfLink: /apis/batch/v1/namespaces/default/jobs/myapp-runner-job-1561646220
  uid: 09662df3-98e9-11e9-b7c2-06c556123efe
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
  template:
    metadata:
      annotations:
        checksum/config: 2177f5ab128ca89f6256ef363e9ea5615352d57fc5f207f614f0bc401d2c2b7e
      creationTimestamp: null
      labels:
        app: app01
        controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
        env: prd01
        grafana: grafana_dashboard_link
        job-name: myapp-runner-job-1561646220
        rps_region: eu01
        team: vendor
    spec:
      containers:
      - args:
        - -c
        - node main-report-most-unreachable.js
        command:
        - /bin/sh
        - -c
        - sleep 5
        env:
          # Deleted
        image: ubuntu
        imagePullPolicy: IfNotPresent
        name: app
        ports:
        - containerPort: 8060
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /rps_rm_service/app/config/parameters
          name: config
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 180
      volumes:
      - configMap:
          defaultMode: 420
          name: myapp
        name: config
status:
  completionTime: "2019-06-27T14:37:13Z"
  conditions:
  - lastProbeTime: "2019-06-27T14:37:13Z"
    lastTransitionTime: "2019-06-27T14:37:13Z"
    status: "True"
    type: Complete
  startTime: "2019-06-27T14:37:06Z"
  succeeded: 1

And here's the deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "224"
    kubernetes.io/change-cause: kubectl patch deployment myapp
      --kubeconfig=/home/spinnaker/.hal/default/staging/dependencies/1354313958-kubeconfig
      --context=eks-cluster01 --namespace=default --record=true --type=strategic
      --patch={"metadata":{"labels":{"app_version":"0.0.1.7293"}},"spec":{"template":{"metadata":{"labels":{"app_version":"0.0.1.7293"}},"spec":{"containers":[{"image":"xxx:0.0.1.7293","name":"app"}]}}}}
    moniker.spinnaker.io/application: myapp
  creationTimestamp: "2019-02-14T09:36:40Z"
  generation: 6399
  labels:
    app: app01
    app_version: 0.0.1.7293
    env: prd01
    grafana: grafana_dashboard_link
    rps_region: eu01
    team: vendor
  name: myapp
  namespace: default
  resourceVersion: "56784691"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/myapp
  uid: 08077060-303c-11e9-9855-0a17475bde48
spec:
  progressDeadlineSeconds: 600
  replicas: 14
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: app01
      env: prd01
      rps_region: eu01
      team: vendor
  strategy:
    rollingUpdate:
      maxSurge: 15%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum/config: d378dcf69c87a9daa71f2bd8d23e8584f884a181368619886595e99c8b3233a8
      creationTimestamp: null
      labels:
        app: app01
        app_version: 0.0.1.7293
        env: prd01
        grafana: grafana_dashboard_link
        rps_region: eu01
        team: vendor
    spec:
      containers:
      - args:
        - -c
        - node main-migrate.js && node main-start.js
        command:
        - sh
        env:
          # Deleted
        image: xxxx:0.0.1.7293
        imagePullPolicy: IfNotPresent
        livenessProbe:
          # Deleted
        name: app
        ports:
        - containerPort: 8060
          protocol: TCP
        readinessProbe:
          # Deleted
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: "1"
            memory: 2Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /rps_rm_service/app/config/parameters
          name: config
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 180
      volumes:
      - configMap:
          defaultMode: 420
          name: myapp
        name: config
status:
  availableReplicas: 14
  conditions:
  - lastTransitionTime: "2019-06-27T13:49:12Z"
    lastUpdateTime: "2019-06-27T13:49:12Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2019-06-05T14:30:55Z"
    lastUpdateTime: "2019-06-27T14:28:12Z"
    message: ReplicaSet "myapp-f88fc9499" has successfully
      progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 6399
  readyReplicas: 14
  replicas: 14
  updatedReplicas: 14

I deleted all the Completed pods and the HPA started working again. And then when new Completed pods are present, I see these events:

107s        Normal    SuccessfulCreate               Job                       Created pod: myapp-runner-job-15616450zpnrz
107s        Normal    SuccessfulCreate               CronJob                   Created job myapp-runner-job-1561645080
106s        Normal    Pulling                        Pod                       pulling image "ubuntu"
103s        Normal    Pulled                         Pod                       Successfully pulled image "ubuntu"
103s        Normal    Created                        Pod                       Created container
103s        Normal    Started                        Pod                       Started container
97s         Normal    SawCompletedJob                CronJob                   Saw completed job: myapp-runner-job-1561645080
87s         Warning   FailedGetResourceMetric        HorizontalPodAutoscaler   missing request for cpu
87s         Warning   FailedComputeMetricsReplicas   HorizontalPodAutoscaler   failed to get cpu utilization: missing request for cpu

@max-rocket-internet
Author

Maybe worth noting: even though I got these events and the HPA is saying failed to get cpu utilization: missing request for cpu, it still didn't transition to <unknown>/70% after 10 minutes. But if I create the HPA while these Completed pods are present, it stays in the <unknown> state.

@max-rocket-internet
Author

Any update @zq-david-wang?

@zq-david-wang

... @max-rocket-internet I am not working on this issue....
@hex108 are you working on this? Hope you did not get the impression that I am working on this.....

@max-rocket-internet
Author

I am not working on this issue...

Ah sorry, my mistake!

@hex108
Contributor

hex108 commented Jul 17, 2019

@zq-david-wang Ah, I missed that "not" :( I'll try to fix it.

@DeanPH

DeanPH commented Jul 25, 2019

Experiencing this same issue 👍

@hex108
Contributor

hex108 commented Jul 30, 2019

I checked the code, but I'm not sure whether it is intended to get the pod list just by labels and not check the owner reference. @DirectXMan12 Could you please help confirm it? Thanks! If it is not intended, I could send a PR to fix it.

@khteh

khteh commented Sep 4, 2019

I get the same issue:

 $ k get hpa
NAME                  REFERENCE                     TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
backend-hpa           StatefulSet/biz4x-backend     0%/75%          2         5         2          42d
frontend-hpa          StatefulSet/biz4x-frontend    0%/75%          2         5         2          42d
identityservice-hpa   StatefulSet/identityservice   <unknown>/75%   2         5         2          4h59m
kibana-hpa            StatefulSet/kibana            1%/75%          2         5         2          42d
send4x-hpa            StatefulSet/send4x-prod       0%/75%          2         5         2          4h59m
$ k describe hpa identityservice-hpa
Name:                                                  identityservice-hpa
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"identityservice-hpa","namespace"...
CreationTimestamp:                                     Wed, 04 Sep 2019 11:24:35 +0800
Reference:                                             StatefulSet/identityservice
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 75%
Min replicas:                                          2
Max replicas:                                          5
StatefulSet pods:                                      2 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: missing request for cpu
Events:
  Type     Reason                   Age                  From                       Message
  ----     ------                   ----                 ----                       -------
  Warning  FailedGetResourceMetric  40s (x1199 over 5h)  horizontal-pod-autoscaler  missing request for cpu

Here is my HPA spec:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: identityservice-hpa
  namespace: default 
spec:
  minReplicas: 2
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1 
    kind: StatefulSet
    name: identityservice
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 75

Wonder why this only happens to ONE StatefulSet but not the others?

@nigimaster

Hi,

Unfortunately, I am also facing the same issue:
Warning FailedGetResourceMetric 3m59s (x35656 over 6d5h) horizontal-pod-autoscaler missing request for cpu.

Any idea how to fix this?

@tedyu
Contributor

tedyu commented Dec 8, 2019

@max-rocket-internet
I submitted a tentative PR #86044

@manojnirania

experiencing the same issue

Configured HPA as

root@k8master:~/metrics-server/deploy/1.8+# kubectl describe horizontalpodautoscaler.autoscaling/nginx
Name:                                                  nginx
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Fri, 27 Dec 2019 21:29:07 +0000
Reference:                                             Deployment/nginx
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 80%
Min replicas:                                          2
Max replicas:                                          5
Deployment pods:                                       2 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: missing request for cpu
Events:
  Type     Reason                        Age   From                       Message
  ----     ------                        ----  ----                       -------
  Warning  FailedGetResourceMetric       2s    horizontal-pod-autoscaler  missing request for cpu
  Warning  FailedComputeMetricsReplicas  1s    horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: missing request for cpu

and

root@k8master:~/metrics-server/deploy/1.8+# kubectl get hpa
NAME    REFERENCE          TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   <unknown>/80%   2         5         2          4m3s
root@k8master:~/metrics-server/deploy/1.8+# 

Deployment CPU resources as

root@k8master:~/metrics-server/deploy/1.8+# kubectl get -o json deployment nginx | jq '.spec.template.spec.containers[].resources'
{
  "limits": {
    "cpu": "2"
  },
  "requests": {
    "cpu": "200m",
    "memory": "50Mi"
  }
}

Cluster details

root@k8master:~/metrics-server/deploy/1.8+# kubectl get node -o wide
NAME       STATUS   ROLES    AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8master   Ready    master   6d9h   v1.16.3   172.168.57.5   <none>        Ubuntu 16.04.6 LTS   4.15.0-72-generic   docker://18.9.7
k8node     Ready    <none>   6d9h   v1.16.3   172.168.57.3   <none>        Ubuntu 16.04.6 LTS   4.15.0-72-generic   docker://18.9.7
k8node2    Ready    <none>   6d9h   v1.16.3   172.168.57.7   <none>        Ubuntu 16.04.6 LTS   4.15.0-72-generic   docker://18.9.7

@alexvaque

I found the same issue, and in my case the reason the pod or pods are failing with the metrics is that the POD is not 100% ready... Check the health checks, security groups, etc.

Here's more info: https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html

@vumdao

vumdao commented Jul 2, 2021

I found the same issue, and in my case the reason the pod or pods are failing with the metrics is that the POD is not 100% ready... Check the health checks, security groups, etc.

Here's more info: https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html

@alexvaque In my case, I had to add the resource request to the deployment to fix the issue.

@ryunchang

ryunchang commented Jul 23, 2021

@welsh

It works for me thanks!!

@oleg-webdeveloper

I've got this issue when all my pods were under one selector, and I had to explicitly fill the resources blocks for each pod affected by this selector.

@vladimir259

Had the same issue with a deployment that could not scale because of the

"failed to get cpu utilization: missing request for cpu"

error that the HPA of the deployment was showing.

Finally got it fixed now.

Here are the reasons & background:

My deployment consists of:

  • a Job that runs at the beginning
  • a regular POD with three containers: two "sidecar" containers and one with the main app

The "main app" container had "resources" set.
Both "sidecar" containers did not.

So the first problem was the missing "resources" specs on both sidecar containers.

Such behavior with multiple containers in the POD is described in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Please note that if some of the Pod's containers do not have the relevant resource request set, CPU utilization for the Pod will not be defined and the autoscaler will not take any action for that metric. See the algorithm details section below for more information about how the autoscaling algorithm works.

The second problem was that the "Job" that ran before the actual app deployment ALSO has to have "resources" defined.

And THAT was really unexpected.

That is something @max-rocket-internet also stumbled upon & what I then tested.
@max-rocket-internet - thanks for the hint 🍺

So, TIL:

  • enable "resources" on ALL containers/jobs in the POD
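
To illustrate, a minimal sketch of what that means in a pod template (container names and values here are made up, not my actual manifests):

# Sketch only: every container that the HPA's label query can match should
# declare CPU requests, sidecars and Job containers included.
spec:
  containers:
  - name: main-app
    image: example/app:1.0
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 1Gi
  - name: sidecar
    image: example/sidecar:1.0
    resources:
      requests:
        cpu: 50m          # even a small request lets the HPA compute utilization
        memory: 64Mi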

@msonnleitner

Also ran into this. Had some old pods without requests that were in Shutdown state (because of preemptible nodes). Deleting these pods got the HPA working.

@rfatolahzade

I removed the cluster and rebuilt it from scratch. The problem doesn't appear anymore.

Thank you so much <3 In the minikube case, just run "minikube start".

@chirangaalwis

chirangaalwis commented Oct 31, 2021

Encountered the same issue when attempting HPA based on CPU utilisation. Fixed the issue by setting CPU resource request at the Deployment level.

@phuongleeo

It worked for me. The cause is that I used Helm to deploy the app along with a post-install job that had no resource limits.
I confirm the Pod's containers must have the relevant resource requests set.

@mlnj

mlnj commented Nov 15, 2021

Faced this same issue. In my case my pod container had limits set, but my Dapr-injected sidecar container didn't.

@AlexanderYastrebov
Contributor

AlexanderYastrebov commented Dec 29, 2021

I think the #88167 approach (which substitutes #86044, proposed by @tedyu) of checking the GroupVersionKind of pod metrics is suboptimal. E.g. one may have two deployments that use the same selector; the HPA would then get metrics for pods of both deployments, and checking the kind would not tell them apart.

See also #78761 (comment): basically the HPA, although it targets a deployment by name, uses the deployment's selector labels to get pod metrics.

@szuecs
Member

szuecs commented Jan 3, 2022

@AlexanderYastrebov I talked with @arjunrn and the underlying issue is that you can only query metrics-server by label.
So our "label": { "deployment": "my-deployment-7"} workaround is the way to go.
ref: #83878 (comment)
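
A sketch of that workaround (example values only; note that the selector of an existing apps/v1 Deployment is immutable, so applying this to an existing Deployment usually means recreating it): give the Deployment's pod template a label that nothing else in the namespace uses and include it in the selector, so the HPA's metrics query only matches this Deployment's pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment-7
spec:
  replicas: 2
  selector:
    matchLabels:
      deployment: my-deployment-7   # unique label, shared by no Job/CronJob pods
  template:
    metadata:
      labels:
        deployment: my-deployment-7
    spec:
      containers:
      - name: app
        image: example/app:1.0
        resources:
          requests:
            cpu: 500m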

@Mirdrack

My application is running in a pod with a sql-proxy as a sidecar.
I have solved this by adding resource limits to the sidecar definition in the deployment, but now I'm confused about how I should read the metrics.
Should I read the value as an average of my containers?

NAME                                           REFERENCE             TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/demo-app   Deployment/demo-app   31%/70%, 0%/70%   2         5         2          91m

Also, I have tried setting this in my HPA definition, but I have faced two issues:
1. The documentation is not clear about how to refer to the container in the pod
2. It seems like there is a bug on this: #105972

type: ContainerResource
containerResource:
  name: cpu
  container: application
  target:
    type: Utilization
    averageUtilization: 60

How can I set the metric based on my main container ignoring the sidecar?
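
For context, a complete manifest around that fragment would presumably look like the following, assuming the cluster supports ContainerResource metrics (autoscaling/v2; it was gated behind the HPAContainerMetrics feature gate in earlier releases) and that the main container is named application:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: application   # only this container's usage and request are considered
      target:
        type: Utilization
        averageUtilization: 60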

@TDanielsHL

Wondering if this issue should remain open; it seems what another user pointed out above could be the cause, and it was true in my case:

Such behavior with multiple containers in the POD is described in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Please note that if some of the Pod's containers do not have the relevant resource request set, CPU utilization for the Pod will not be defined and the autoscaler will not take any action for that metric. See the algorithm details section below for more information about how the autoscaling algorithm works.

@habibqureshi

habibqureshi commented Mar 1, 2022

I'm getting the same issue. I have enabled metrics-server in minikube; when I create the HPA it always says
FailedGetResourceMetric 4m15s (x21 over 9m16s) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API

My deployment is able to scale, but it does not scale down even after hours.

--------Edited------

I have tried the same deployment with a kind cluster and it's working fine; there is some issue with minikube.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 29, 2022
@mjudeikis
Contributor

/remove-lifecycle rotten
We keep seeing this with PSS on AKS and EKS. I suspect there is something wrong here. I haven't seen any evidence here that this is being fixed/worked on? I might be wrong.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 15, 2022
@pbetkier
Contributor

pbetkier commented Sep 2, 2022

Reading the discussion it seems to me the error message missing request for cpu can have multiple causes, which adds to the confusion. IMO a good action item would be to make the message more detailed, e.g. pointing to which pod and container didn't have the requests set.

@abhijit-dev82
Contributor

/assign
