
Keda not scaling deployment #2724

Closed
denzhel opened this issue Mar 8, 2022 · 12 comments
Labels
bug Something isn't working

Comments

@denzhel

denzhel commented Mar 8, 2022

Report

I'm using this HPA config:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prod-us-west-1-service
spec:
  scaleTargetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: prod-us-west-1-service
  pollingInterval: 5
  minReplicaCount: 12
  maxReplicaCount: 20
  fallback:
    failureThreshold: 3
    replicas: 12
  advanced:
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15
          selectPolicy: Max
  triggers:
  - type: prometheus
    metadata:
      serverAddress: X.Y.V.Z
      metricName: outbound_rabbitmq_notifications_queue_count
      query: 'max(avg_over_time(rabbitmq_queue_messages{queue="someQueue.notifications"}[5m]))'
      threshold: "100"

This is a screenshot from Grafana: [screenshot]
This is a screenshot from Prometheus itself when running the query manually: [screenshot]

Expected Behavior

I expect KEDA to scale my deployment.

Actual Behavior

As you can see, the HPA was not triggered and did not scale up.
No errors were seen in the logs.

Steps to Reproduce the Problem

  1. Use the config I mentioned above
  2. Use a Prometheus metric that RabbitMQ publishes with the number of messages in the queue
  3. Publish messages and turn off the consumers for this queue
  4. See that KEDA does not scale the deployment

Logs from KEDA operator

{"level":"info","ts":1646731800.4123747,"logger":"controller.scaledobject","msg":"Reconciling ScaledObject","reconciler group":"keda.sh","reconciler kind":"ScaledObject","name":"prod-us-west-1-SomeService","namespace":"prod-us-west-1"}

A lot of these ^

KEDA Version

2.6.1

Kubernetes Version

1.20

Platform

Amazon Web Services

Scaler Details

Prometheus

Anything else?

No response

@denzhel denzhel added the bug Something isn't working label Mar 8, 2022
@JorTurFer
Member

Hi @denzhel
Could you share KEDA logs (from operator and metrics-server instances)?
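For context, something like this should grab both sets of logs, assuming KEDA is installed in the keda namespace with the default deployment names:

kubectl logs -n keda deployment/keda-operator
kubectl logs -n keda deployment/keda-operator-metrics-apiserver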

@denzhel
Author

denzhel commented Mar 9, 2022

Hey @JorTurFer

Operator:
repeated over and over:

{"level":"info","ts":1646847856.7035537,"logger":"controller.scaledobject","msg":"Reconciling ScaledObject","reconciler group":"keda.sh","reconciler kind":"ScaledObject","name":"prod-us-west-1-outbound-api","namespace":"prod-us-west-1"}

Metrics server:
repeated over and over:

I0309 17:19:33.033925       1 trace.go:205] Trace[459177578]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/prod-us-west-1/s0-prometheus-outbound_rabbitmq_notifications_queue_count,user-agent:kube-controller-manager/v1.20.11 (linux/amd64) kubernetes/f17b810/system:serviceaccount:kube-system:horizontal-pod-autoscaler,audit-id:6cdd3e4f-c022-4cbc-9eb5-1eed5d76c277,client:172.16.53.41,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (09-Mar-2022 17:19:32.506) (total time: 527ms):
Trace[459177578]: ---"Listing from storage done" 527ms (17:19:33.033)
Trace[459177578]: [527.712687ms] [527.712687ms] END

After searching and reading through the related issues, I think I understand why KEDA is not scaling. Please let me know if I'm right, @JorTurFer.

So ... the query I used is max(avg_over_time(rabbitmq_queue_messages{queue="someQueue.notifications"}[5m])), meaning I'm looking at the average number of messages over a 5-minute window. The threshold is configured to 100, meaning that if the average number of messages goes above 100, I expect KEDA to scale my deployment.

Today I realized that I might have configured the threshold incorrectly ... for example, if the HPA current metric shows 26167m, which after dividing 26167 by 1000 equals 26.167, then 26.167 < 100 and that's why KEDA did not scale my deployment.

Should I set the threshold according to <DesiredThreshold> / <DeploymentsReplicaCount>?

@zroubalik
Member

@denzhel I would definitely recommend setting the threshold to some lower number (5-10?), just to verify that KEDA is working correctly. Then you can set the threshold to a number that fits your needs.
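For instance, the trigger could temporarily look like this for debugging (same query as above, just a lower threshold):

  triggers:
  - type: prometheus
    metadata:
      serverAddress: X.Y.V.Z
      metricName: outbound_rabbitmq_notifications_queue_count
      query: 'max(avg_over_time(rabbitmq_queue_messages{queue="someQueue.notifications"}[5m]))'
      threshold: "10"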

@denzhel
Author

denzhel commented Mar 10, 2022

Hi @zroubalik, thank you for your answer !

Besides lowering my threshold for debugging purposes, can you please explain why I should do that, or whether my theory on how the HPA works is correct?

@JorTurFer
Member

Did you query that metric manually? Just to know what value KEDA is receiving/exposing.
You can find how to do it here.

@denzhel
Author

denzhel commented Mar 10, 2022

@JorTurFer When I run this query in Prometheus:

keda_metrics_adapter_scaler_metrics_value{metric="s0-prometheus-outbound_rabbitmq_notifications_queue_count", namespace="prod-us-west-1"}

I get the following:

keda_metrics_adapter_scaler_metrics_value{container="keda-operator-metrics-apiserver",endpoint="metrics",instance="X.Y.Z.V:12345",job="monitoring/keda-operator-metrics-apiserver",metric="s0-prometheus-outbound_rabbitmq_notifications_queue_count",namespace="prod-us-west-1",pod="keda-operator-metrics-apiserver-6b49b56c89-xkcz5",scaledObject="prod-us-west-1-outbound-api",scaler="prometheusScaler",scalerIndex="0"} 37

However, when I run the following kubectl query:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/prod-us-west-1/s0-prometheus-outbound_rabbitmq_notifications_queue_count/"

I get:

Error from server: exactly one ScaledObject should match label 

@JorTurFer
Member

There is a section at the end of the page explaining how to solve it :)
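If I'm reading that section of the docs right, the raw query needs a labelSelector that pins it to a single ScaledObject, roughly like this (the ScaledObject name here is assumed from the logs above):

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/prod-us-west-1/s0-prometheus-outbound_rabbitmq_notifications_queue_count?labelSelector=scaledobject.keda.sh%2Fname%3Dprod-us-west-1-outbound-api"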

@denzhel
Author

denzhel commented Mar 10, 2022

@JorTurFer

At the moment I checked, both Prometheus and the kubectl get --raw command returned the same value:

"value":"1525"

However, k get hpa returns:

95312m

Divided by 1000 (because of the m unit), that is 95.312.
Multiplied by 16, which is the number of deployment pods, that is roughly 1525 (matching the original metric).

Should I set my threshold lower? 1525 is way above the threshold, yet that doesn't trigger scaling because the threshold is compared against the average across all the pods.

@JorTurFer
Member

Okay, I think I know what's happening...
I think you are expecting the workload to scale based on that "raw" value, but it actually scales based on the per-pod average. This is the expected behaviour because KEDA creates AverageValue metrics, which means the HPA will try to keep the "raw" value divided by the instance count at the threshold.
[screenshot]

If the upstream is returning, for example, 1600 and you have set the threshold to 100, you will end up with approximately 16 instances because 1600 / 16 = 100.
Right now, you can think of the threshold as the desired value per pod.
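Concretely, the HPA that KEDA generates targets an AverageValue, so under the hood the metric spec looks roughly like this (a sketch of the external-metric format, values taken from the ScaledObject above):

metrics:
- type: External
  external:
    metric:
      name: s0-prometheus-outbound_rabbitmq_notifications_queue_count
    target:
      type: AverageValue
      averageValue: "100"

With the metric at ~1525, that works out to ceil(1525 / 100) = 16 desired replicas, clamped between minReplicaCount: 12 and maxReplicaCount: 20.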

@denzhel
Author

denzhel commented Mar 11, 2022

Threshold = desired value per pod

Got it.

It's not that trivial.

@zroubalik
Member

FYI there's an issue to support the Value metric type and not just AverageValue: #2030

@denzhel Can we close this?

@denzhel
Author

denzhel commented Mar 11, 2022

@zroubalik Yes.
Thank you.
