
Warning FailedGetResourceMetric horizontal-pod-autoscaler missing request for cpu #79365

Closed
max-rocket-internet opened this issue Jun 25, 2019 · 69 comments · Fixed by #112544
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling.

Comments

@max-rocket-internet

What happened:

HPA always has a target of <unknown>/70% and events that say:

Events:
  Type     Reason                        Age                    From                       Message
  ----     ------                        ----                   ----                       -------
  Warning  FailedComputeMetricsReplicas  36m (x12 over 38m)     horizontal-pod-autoscaler  failed to get cpu utilization: missing request for cpu
  Warning  FailedGetResourceMetric       3m50s (x136 over 38m)  horizontal-pod-autoscaler  missing request for cpu
  • There is a single container in the pods and it has resource requests and limits set.
  • The metrics-server is running
  • All pods show metrics in kubectl top pod
  • All pods have metrics in kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"

Here's the HPA in YAML:

apiVersion: v1
items:
- apiVersion: autoscaling/v1
  kind: HorizontalPodAutoscaler
  metadata:
    annotations:
      autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-06-25T09:56:21Z","reason":"SucceededGetScale","message":"the
        HPA controller was able to get the target''s current scale"},{"type":"ScalingActive","status":"False","lastTransitionTime":"2019-06-25T09:56:21Z","reason":"FailedGetResourceMetric","message":"the
        HPA was unable to compute the replica count: missing request for cpu"}]'
    creationTimestamp: "2019-06-25T09:56:06Z"
    labels:
      app: restaurant-monitor
      env: prd01
      grafana: saFkkx6ik
      rps_region: eu01
      team: vendor
    name: myapp
    namespace: default
    resourceVersion: "56108423"
    selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/myapp
    uid: 7345f8fb-972f-11e9-935d-02a07544d854
  spec:
    maxReplicas: 25
    minReplicas: 14
    scaleTargetRef:
      apiVersion: extensions/v1beta1
      kind: Deployment
      name: myapp
    targetCPUUtilizationPercentage: 70
  status:
    currentReplicas: 15
    desiredReplicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

What you expected to happen:

No <unknown> in HPA target

How to reproduce it (as minimally and precisely as possible):

I can't be sure. It's only a single HPA in our cluster. 10 other HPAs are working OK.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.12.6
  • Cloud provider or hardware configuration: EKS
@max-rocket-internet max-rocket-internet added the kind/bug Categorizes issue or PR as related to a bug. label Jun 25, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 25, 2019
@max-rocket-internet
Author

@DirectXMan12
/sig autoscaling

@k8s-ci-robot k8s-ci-robot added sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 25, 2019
@hex108
Contributor

hex108 commented Jun 26, 2019

@max-rocket-internet could you please share the yaml content of the pod? It seems the pod's cpu request is not set.

@max-rocket-internet
Author

@hex108

Sure. Here's from the deployment (kubectl get -o json deployment myapp | jq '.spec.template.spec.containers[].resources'):

{
  "limits": {
    "cpu": "2",
    "memory": "2Gi"
  },
  "requests": {
    "cpu": "1",
    "memory": "2Gi"
  }
}

This shows there's only a single container in these pods.

Here's a list of pods:

$ kubectl get -l app=restaurant-monitor pod
NAME                                                              READY   STATUS      RESTARTS   AGE
myapp-67d9c5849d-24qbs                    1/1     Running     0          18h
myapp-67d9c5849d-9l8z4                    1/1     Running     0          18h
myapp-67d9c5849d-bv6sf                    1/1     Running     0          18h
myapp-67d9c5849d-hgqw9                    1/1     Running     0          18h
myapp-67d9c5849d-j5n2r                    1/1     Running     0          18h
myapp-67d9c5849d-kctgn                    1/1     Running     0          18h
myapp-67d9c5849d-ldhmq                    1/1     Running     0          18h
myapp-67d9c5849d-mfrd5                    1/1     Running     0          18h
myapp-67d9c5849d-p8cz4                    1/1     Running     0          18h
myapp-67d9c5849d-rm9nl                    1/1     Running     0          18h
myapp-67d9c5849d-shlj6                    1/1     Running     0          18h
myapp-67d9c5849d-sxs8f                    1/1     Running     0          18h
myapp-67d9c5849d-tpfp8                    1/1     Running     0          17h
myapp-67d9c5849d-vsz78                    1/1     Running     0          18h
myapp-issue-detection-job-15613344fl42z   0/1     Completed   0          2d11h
myapp-issue-detection-job-15614208rmdkj   0/1     Completed   0          35h
myapp-issue-detection-job-1561507268cnr   0/1     Completed   0          11h

And resources from all pods ($ kubectl get -o json -l app=myapp pod | jq '.items[].spec.containers[].resources'):

{
  "limits": {
    "cpu": "2",
    "memory": "2Gi"
  },
  "requests": {
    "cpu": "1",
    "memory": "2Gi"
  }
}
{
  "limits": {
    "cpu": "2",
    "memory": "2Gi"
  },
  "requests": {
    "cpu": "1",
    "memory": "2Gi"
  }
}

That just repeats 14 times, once for each pod. And then the 3 completed pods show as:

{}
{}
{}

@max-rocket-internet
Author

Ahhh, I deleted those Completed pods and suddenly the HPA is back in action:

myapp                Deployment/myapp                12%/70%         14        25        14         25h

@max-rocket-internet
Author

max-rocket-internet commented Jun 26, 2019

But these Completed pods are not from the deployment that is specified in the HPA; they are created from a Job. Sure, they don't have resources set, but they should be ignored by the HPA, right?

Here's the pod JSON from one:

{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "annotations": {
      "checksum/config": "31e32a934d7d95c9399fc8ca8250ca6e6974c543e4ee16397b5dcd04b4399679"
    },
    "creationTimestamp": "2019-06-26T00:00:03Z",
    "generateName": "myapp-issue-detection-job-1561507200-",
    "labels": {
      "controller-uid": "59709b7a-97a5-11e9-b7c2-06c556123efe",
      "env": "prd01",
      "job-name": "myapp-issue-detection-job-1561507200",
      "team": "vendor"
    },
    "name": "myapp-issue-detection-job-1561507268cnr",
    "namespace": "default",
    "ownerReferences": [
      {
        "apiVersion": "batch/v1",
        "blockOwnerDeletion": true,
        "controller": true,
        "kind": "Job",
        "name": "myapp-issue-detection-job-1561507200",
        "uid": "59709b7a-97a5-11e9-b7c2-06c556123efe"
      }
    ],
    "resourceVersion": "56293646",
    "selfLink": "/api/v1/namespaces/default/pods/myapp-issue-detection-job-1561507268cnr",
    "uid": "59733023-97a5-11e9-b7c2-06c556123efe"
  }
}

I will test creating more Completed pods WITHOUT resources set and see if the issue returns. And then test creating more Completed pods WITH resources and see if it's OK.
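
For reference, the kind of thing I'll test is roughly this (illustrative only, not our real manifests): a Job whose pod template carries the same labels that the HPA's target Deployment selects on, but sets no resource requests. The Deployment and HPA selecting app: repro-app (with requests set on the Deployment's own containers) are assumed to already exist:

apiVersion: batch/v1
kind: Job
metadata:
  name: repro-job
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: repro-app        # same labels the target Deployment's selector matches
        team: vendor
    spec:
      restartPolicy: Never
      containers:
      - name: noop
        image: ubuntu
        command: ["sleep", "5"]
        resources: {}         # no requests, so the HPA reports "missing request for cpu"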

@hex108
Contributor

hex108 commented Jun 27, 2019

It is a little weird. How could I reproduce it?

@zq-david-wang

// GetResourceMetric gets the given resource metric (and an associated oldest timestamp)
// for all pods matching the specified selector in the given namespace
func (c *resourceMetricsClient) GetResourceMetric(resource v1.ResourceName, namespace string, selector labels.Selector) (PodMetricsInfo, time.Time, error) {
    metrics, err := c.client.PodMetricses(namespace).List(metav1.ListOptions{LabelSelector: selector.String()})
    if err != nil {
        return nil, time.Time{}, fmt.Errorf("unable to fetch metrics from resource metrics API: %v", err)
    }

The HPA uses only the selector to filter pods, without checking ownership. I think this is a horrible mistake!

@hex108
Contributor

hex108 commented Jun 27, 2019

@zq-david-wang It might be the cause of this issue. Would you like to send a PR for it? If not, I could try to fix it.

@max-rocket-internet Could you share the yaml of Deployment/myapp and Job/myapp-issue-detection-job, especially the labels part?

cc @DirectXMan12

@zq-david-wang

@hex108 I am not working on this. :)

@max-rocket-internet
Author

Here's the job:

apiVersion: batch/v1
kind: Job
metadata:
  creationTimestamp: "2019-06-27T14:37:06Z"
  labels:
    app: app01
    controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
    env: prd01
    grafana: grafana_dashboard_link
    job-name: myapp-runner-job-1561646220
    rps_region: eu01
    team: vendor
  name: myapp-runner-job-1561646220
  namespace: default
  ownerReferences:
  - apiVersion: batch/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: myapp-runner-job
    uid: d47b5438-98e6-11e9-935d-02a07544d854
  resourceVersion: "56786867"
  selfLink: /apis/batch/v1/namespaces/default/jobs/myapp-runner-job-1561646220
  uid: 09662df3-98e9-11e9-b7c2-06c556123efe
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
  template:
    metadata:
      annotations:
        checksum/config: 2177f5ab128ca89f6256ef363e9ea5615352d57fc5f207f614f0bc401d2c2b7e
      creationTimestamp: null
      labels:
        app: app01
        controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
        env: prd01
        grafana: grafana_dashboard_link
        job-name: myapp-runner-job-1561646220
        rps_region: eu01
        team: vendor
    spec:
      containers:
      - args:
        - -c
        - node main-report-most-unreachable.js
        command:
        - /bin/sh
        - -c
        - sleep 5
        env:
          # Deleted
        image: ubuntu
        imagePullPolicy: IfNotPresent
        name: app
        ports:
        - containerPort: 8060
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /rps_rm_service/app/config/parameters
          name: config
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 180
      volumes:
      - configMap:
          defaultMode: 420
          name: myapp
        name: config
status:
  completionTime: "2019-06-27T14:37:13Z"
  conditions:
  - lastProbeTime: "2019-06-27T14:37:13Z"
    lastTransitionTime: "2019-06-27T14:37:13Z"
    status: "True"
    type: Complete
  startTime: "2019-06-27T14:37:06Z"
  succeeded: 1

And here's the deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "224"
    kubernetes.io/change-cause: kubectl patch deployment myapp
      --kubeconfig=/home/spinnaker/.hal/default/staging/dependencies/1354313958-kubeconfig
      --context=eks-cluster01 --namespace=default --record=true --type=strategic
      --patch={"metadata":{"labels":{"app_version":"0.0.1.7293"}},"spec":{"template":{"metadata":{"labels":{"app_version":"0.0.1.7293"}},"spec":{"containers":[{"image":"xxx:0.0.1.7293","name":"app"}]}}}}
    moniker.spinnaker.io/application: myapp
  creationTimestamp: "2019-02-14T09:36:40Z"
  generation: 6399
  labels:
    app: app01
    app_version: 0.0.1.7293
    env: prd01
    grafana: grafana_dashboard_link
    rps_region: eu01
    team: vendor
  name: myapp
  namespace: default
  resourceVersion: "56784691"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/myapp
  uid: 08077060-303c-11e9-9855-0a17475bde48
spec:
  progressDeadlineSeconds: 600
  replicas: 14
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: app01
      env: prd01
      rps_region: eu01
      team: vendor
  strategy:
    rollingUpdate:
      maxSurge: 15%
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      annotations:
        checksum/config: d378dcf69c87a9daa71f2bd8d23e8584f884a181368619886595e99c8b3233a8
      creationTimestamp: null
      labels:
        app: app01
        app_version: 0.0.1.7293
        env: prd01
        grafana: grafana_dashboard_link
        rps_region: eu01
        team: vendor
    spec:
      containers:
      - args:
        - -c
        - node main-migrate.js && node main-start.js
        command:
        - sh
        env:
          # Deleted
        image: xxxx:0.0.1.7293
        imagePullPolicy: IfNotPresent
        livenessProbe:
          # Deleted
        name: app
        ports:
        - containerPort: 8060
          protocol: TCP
        readinessProbe:
          # Deleted
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: "1"
            memory: 2Gi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /rps_rm_service/app/config/parameters
          name: config
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 180
      volumes:
      - configMap:
          defaultMode: 420
          name: myapp
        name: config
status:
  availableReplicas: 14
  conditions:
  - lastTransitionTime: "2019-06-27T13:49:12Z"
    lastUpdateTime: "2019-06-27T13:49:12Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2019-06-05T14:30:55Z"
    lastUpdateTime: "2019-06-27T14:28:12Z"
    message: ReplicaSet "myapp-f88fc9499" has successfully
      progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 6399
  readyReplicas: 14
  replicas: 14
  updatedReplicas: 14

I deleted all the Completed pods and the HPA started working again. And then when new Completed pods are present, I see these events:

107s        Normal    SuccessfulCreate               Job                       Created pod: myapp-runner-job-15616450zpnrz
107s        Normal    SuccessfulCreate               CronJob                   Created job myapp-runner-job-1561645080
106s        Normal    Pulling                        Pod                       pulling image "ubuntu"
103s        Normal    Pulled                         Pod                       Successfully pulled image "ubuntu"
103s        Normal    Created                        Pod                       Created container
103s        Normal    Started                        Pod                       Started container
97s         Normal    SawCompletedJob                CronJob                   Saw completed job: myapp-runner-job-1561645080
87s         Warning   FailedGetResourceMetric        HorizontalPodAutoscaler   missing request for cpu
87s         Warning   FailedComputeMetricsReplicas   HorizontalPodAutoscaler   failed to get cpu utilization: missing request for cpu

@max-rocket-internet
Author

Maybe worth noting: even though I got these events and the HPA is saying failed to get cpu utilization: missing request for cpu, it still didn't transition to <unknown>/70% after 10 minutes. But if I create the HPA while these Completed pods are present, it stays in the <unknown> state.

@max-rocket-internet
Author

Any update @zq-david-wang?

@zq-david-wang

... @max-rocket-internet I am not working on this issue....
@hex108 are you working on this? Hope you did not get the impression that I am working on this.....

@max-rocket-internet
Author

I am not working on this issue...

Ah sorry, my mistake!

@hex108
Contributor

hex108 commented Jul 17, 2019

@zq-david-wang Ah, I missed that "not" :( I'll try to fix it.

@DeanPH

DeanPH commented Jul 25, 2019

Experiencing this same issue 👍

@hex108
Contributor

hex108 commented Jul 30, 2019

I checked the code, but I'm not sure whether it is intended to get the pod list just by labels and not check the owner reference. @DirectXMan12 Could you please help confirm it? Thanks! If it is not intended, I could send a PR to fix it.

@khteh

khteh commented Sep 4, 2019

I get the same issue:

 $ k get hpa
NAME                  REFERENCE                     TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
backend-hpa           StatefulSet/biz4x-backend     0%/75%          2         5         2          42d
frontend-hpa          StatefulSet/biz4x-frontend    0%/75%          2         5         2          42d
identityservice-hpa   StatefulSet/identityservice   <unknown>/75%   2         5         2          4h59m
kibana-hpa            StatefulSet/kibana            1%/75%          2         5         2          42d
send4x-hpa            StatefulSet/send4x-prod       0%/75%          2         5         2          4h59m
$ k describe hpa identityservice-hpa
Name:                                                  identityservice-hpa
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v2beta1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"identityservice-hpa","namespace"...
CreationTimestamp:                                     Wed, 04 Sep 2019 11:24:35 +0800
Reference:                                             StatefulSet/identityservice
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 75%
Min replicas:                                          2
Max replicas:                                          5
StatefulSet pods:                                      2 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: missing request for cpu
Events:
  Type     Reason                   Age                  From                       Message
  ----     ------                   ----                 ----                       -------
  Warning  FailedGetResourceMetric  40s (x1199 over 5h)  horizontal-pod-autoscaler  missing request for cpu

Here is my HPA spec:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: identityservice-hpa
  namespace: default 
spec:
  minReplicas: 2
  maxReplicas: 5
  scaleTargetRef:
    apiVersion: apps/v1 
    kind: StatefulSet
    name: identityservice
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 75

Wonder why this only happens to ONE StatefulSet but not the others?

@nigimaster

Hi,

Unfortunately, I am also facing the same issue:
Warning FailedGetResourceMetric 3m59s (x35656 over 6d5h) horizontal-pod-autoscaler missing request for cpu.

Any idea how to fix this?

@tedyu
Contributor

tedyu commented Dec 8, 2019

@max-rocket-internet
I submitted a tentative PR #86044

@manojnirania

experiencing the same issue

Configured HPA as

root@k8master:~/metrics-server/deploy/1.8+# kubectl describe horizontalpodautoscaler.autoscaling/nginx
Name:                                                  nginx
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Fri, 27 Dec 2019 21:29:07 +0000
Reference:                                             Deployment/nginx
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 80%
Min replicas:                                          2
Max replicas:                                          5
Deployment pods:                                       2 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: missing request for cpu
Events:
  Type     Reason                        Age   From                       Message
  ----     ------                        ----  ----                       -------
  Warning  FailedGetResourceMetric       2s    horizontal-pod-autoscaler  missing request for cpu
  Warning  FailedComputeMetricsReplicas  1s    horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: missing request for cpu

and

root@k8master:~/metrics-server/deploy/1.8+# kubectl get hpa
NAME    REFERENCE          TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   <unknown>/80%   2         5         2          4m3s
root@k8master:~/metrics-server/deploy/1.8+# 

Deployment CPU resources as

root@k8master:~/metrics-server/deploy/1.8+# kubectl get -o json deployment nginx | jq '.spec.template.spec.containers[].resources'
{
  "limits": {
    "cpu": "2"
  },
  "requests": {
    "cpu": "200m",
    "memory": "50Mi"
  }
}

Cluster details

root@k8master:~/metrics-server/deploy/1.8+# kubectl get node -o wide
NAME       STATUS   ROLES    AGE    VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8master   Ready    master   6d9h   v1.16.3   172.168.57.5   <none>        Ubuntu 16.04.6 LTS   4.15.0-72-generic   docker://18.9.7
k8node     Ready    <none>   6d9h   v1.16.3   172.168.57.3   <none>        Ubuntu 16.04.6 LTS   4.15.0-72-generic   docker://18.9.7
k8node2    Ready    <none>   6d9h   v1.16.3   172.168.57.7   <none>        Ubuntu 16.04.6 LTS   4.15.0-72-generic   docker://18.9.7

@alexvaque

I found the same issue, and in my case the reason the pod or pods are failing with the metrics is that the POD is not 100% ready... Check the health checks, security groups, etc.

Here's more info: https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html

@vumdao

vumdao commented Jul 2, 2021

I found the same issue, and in my case the reason the pod or pods are failing with the metrics is that the POD is not 100% ready... Check the health checks, security groups, etc.

Here's more info: https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html

@alexvaque In my case, I had to add the resource request to the deployment to fix the issue.

@ryunchang

ryunchang commented Jul 23, 2021

@welsh

It works for me thanks!!

@oleg-webdeveloper

I've got this issue when all my pods were under one selector, and I had to explicitly fill the resources blocks for each pod affected by this selector.

@vladimir259

Had the same issue with a deployment that could not scale because of the

"failed to get cpu utilization: missing request for cpu"

error that the HPA of the deployment was showing.

Finally got it fixed now.

Here are the reasons & background:

My deployment consists of:

  • a Job that runs at the beginning
  • a regular POD with three containers: two "sidecar" containers and one with the main app

The "main app" container had "resources" set.
Both "sidecar" containers did not.

So the first problem was the missing "resources" specs on both sidecar containers.

Such behavior with multiple containers in the POD is described in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Please note that if some of the Pod's containers do not have the relevant resource request set, CPU utilization for the Pod will not be defined and the autoscaler will not take any action for that metric. See the algorithm details section below for more information about how the autoscaling algorithm works.

The second problem was that the "Job" that ran before the actual app deployment ALSO has to have "resources" defined.

And THAT was really unexpected.

That is something @max-rocket-internet also stumbled upon & what I then tested.
@max-rocket-internet - thanks for the hint 🍺

So, TIL:

  • enable "resources" on ALL containers/jobs in the POD
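
To illustrate, a minimal sketch of what that means in a pod template (container names and values here are made up, not my actual manifests):

# Sketch only: every container that the HPA's label query can match should
# declare CPU requests, sidecars and Job containers included.
spec:
  containers:
  - name: main-app
    image: example/app:1.0
    resources:
      requests:
        cpu: 500m
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 1Gi
  - name: sidecar
    image: example/sidecar:1.0
    resources:
      requests:
        cpu: 50m          # even a small request lets the HPA compute utilization
        memory: 64Mi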

@msonnleitner

Also ran into this. Had some old pods without requests that were in Shutdown state (because of preemptible nodes). Deleting these pods got the HPA working.

@rfatolahzade

I removed the cluster and rebuilt it from scratch. The problem doesn't appear anymore.

Thank you so much <3 In the minikube case, just run "minikube start".

@chirangaalwis

chirangaalwis commented Oct 31, 2021

Encountered the same issue when attempting HPA based on CPU utilisation. Fixed the issue by setting CPU resource request at the Deployment level.

@phuongleeo

It worked for me. The cause is that I used Helm to deploy the app along with a post-install job that had no resource limits.
I confirm the Pod's containers must have the relevant resource requests set.

@mlnj

mlnj commented Nov 15, 2021

Faced this same issue. In my case my pod container had limits set, but my Dapr-injected sidecar container didn't.

@AlexanderYastrebov
Contributor

AlexanderYastrebov commented Dec 29, 2021

I think the #88167 approach (which substitutes #86044, proposed by @tedyu) of checking the GroupVersionKind of pod metrics is suboptimal. E.g. one may have two deployments that use the same selector; the HPA would then get metrics for pods of both deployments, and checking the kind would not tell them apart.

See also #78761 (comment): basically the HPA, although it targets a deployment by name, uses the deployment's selector labels to get pod metrics.

@szuecs
Member

szuecs commented Jan 3, 2022

@AlexanderYastrebov I talked with @arjunrn and the underlying issue is that you can only query metrics-server by label.
So our "label": { "deployment": "my-deployment-7"} workaround is the way to go.
ref: #83878 (comment)
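
A sketch of that workaround (example values only; note that the selector of an existing apps/v1 Deployment is immutable, so applying this to an existing Deployment usually means recreating it): give the Deployment's pod template a label that nothing else in the namespace uses and include it in the selector, so the HPA's metrics query only matches this Deployment's pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment-7
spec:
  replicas: 2
  selector:
    matchLabels:
      deployment: my-deployment-7   # unique label, shared by no Job/CronJob pods
  template:
    metadata:
      labels:
        deployment: my-deployment-7
    spec:
      containers:
      - name: app
        image: example/app:1.0
        resources:
          requests:
            cpu: 500m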

@Mirdrack

My application is running in a pod with a sql-proxy as a sidecar.
I have solved this by adding resource limits to the sidecar definition in the deployment, but now I'm confused about how I should read the metrics.
Should I read the value as an average of my containers?

NAME                                           REFERENCE             TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/demo-app   Deployment/demo-app   31%/70%, 0%/70%   2         5         2          91m

Also, I have tried setting this in my HPA definition, but I have faced two issues:
1. The documentation is not clear about how to refer to the container in the pod
2. It seems like there is a bug on this: #105972

type: ContainerResource
containerResource:
  name: cpu
  container: application
  target:
    type: Utilization
    averageUtilization: 60

How can I set the metric based on my main container ignoring the sidecar?
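
For context, a complete manifest around that fragment would presumably look like the following, assuming the cluster supports ContainerResource metrics (autoscaling/v2; it was gated behind the HPAContainerMetrics feature gate in earlier releases) and that the main container is named application:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  minReplicas: 2
  maxReplicas: 5
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: application   # only this container's usage and request are considered
      target:
        type: Utilization
        averageUtilization: 60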

@TDanielsHL

Wondering if this issue should remain open; it seems what another user pointed out above could be the cause, and it was true in my case:

Such behavior with multiple containers in the POD is described in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Please note that if some of the Pod's containers do not have the relevant resource request set, CPU utilization for the Pod will not be defined and the autoscaler will not take any action for that metric. See the algorithm details section below for more information about how the autoscaling algorithm works.

@habibqureshi

habibqureshi commented Mar 1, 2022

I'm getting the same issue. I have enabled metrics-server in minikube; when I create the HPA it always says
FailedGetResourceMetric 4m15s (x21 over 9m16s) horizontal-pod-autoscaler failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API

My deployment is able to scale, but it does not scale down even after hours.

--------Edited------

I have tried the same deployment with a kind cluster and it's working fine; there is some issue with minikube.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 29, 2022
@mjudeikis
Contributor

/remove-lifecycle rotten
We keep seeing this with PSS on AKS and EKS. I suspect there is something wrong here. I haven't seen any evidence here that this is being fixed/worked on? I might be wrong.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 15, 2022
@pbetkier
Contributor

pbetkier commented Sep 2, 2022

Reading the discussion it seems to me the error message missing request for cpu can have multiple causes, which adds to the confusion. IMO a good action item would be to make the message more detailed, e.g. pointing to which pod and container didn't have the requests set.

@abhijit-dev82
Contributor

/assign
