
Keda not scaling deployment #2724

Closed
denzhel opened this issue Mar 8, 2022 · 12 comments
Labels
bug Something isn't working

Comments

@denzhel

denzhel commented Mar 8, 2022

Report

I'm using this HPA config:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: prod-us-west-1-service
spec:
  scaleTargetRef:
    apiVersion: "apps/v1"
    kind: "Deployment"
    name: prod-us-west-1-service
  pollingInterval: 5
  minReplicaCount: 12
  maxReplicaCount: 20
  fallback:
    failureThreshold: 3
    replicas: 12
  advanced:
    restoreToOriginalReplicaCount: false
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
          - type: Percent
            value: 100
            periodSeconds: 15
          selectPolicy: Max
  triggers:
  - type: prometheus
    metadata:
      serverAddress: X.Y.V.Z
      metricName: outbound_rabbitmq_notifications_queue_count
      query: 'max(avg_over_time(rabbitmq_queue_messages{queue="someQueue.notifications"}[5m]))'
      threshold: "100"

This is a screenshot from Grafana: [screenshot]
This is a screenshot from Prometheus itself when running the query manually: [screenshot]

Expected Behavior

I expect KEDA to scale my deployment.

Actual Behavior

As you can see, the HPA was not triggered and did not scale up.
No errors were seen in the logs.

Steps to Reproduce the Problem

  1. Use the config I mentioned above
  2. Use a Prometheus metric that RabbitMQ publishes with the number of messages in the queue
  3. Publish messages and turn off the consumers for this queue
  4. See that KEDA does not scale the deployment

Logs from KEDA operator

{"level":"info","ts":1646731800.4123747,"logger":"controller.scaledobject","msg":"Reconciling ScaledObject","reconciler group":"keda.sh","reconciler kind":"ScaledObject","name":"prod-us-west-1-SomeService","namespace":"prod-us-west-1"}

A lot of these ^

KEDA Version

2.6.1

Kubernetes Version

1.20

Platform

Amazon Web Services

Scaler Details

Prometheus

Anything else?

No response

@denzhel denzhel added the bug Something isn't working label Mar 8, 2022
@JorTurFer
Member

Hi @denzhel
Could you share KEDA logs (from operator and metrics-server instances)?
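For context, something like this should grab both sets of logs, assuming KEDA is installed in the keda namespace with the default deployment names:

kubectl logs -n keda deployment/keda-operator
kubectl logs -n keda deployment/keda-operator-metrics-apiserver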

@denzhel
Author

denzhel commented Mar 9, 2022

Hey @JorTurFer

Operator:
repeated over and over:

{"level":"info","ts":1646847856.7035537,"logger":"controller.scaledobject","msg":"Reconciling ScaledObject","reconciler group":"keda.sh","reconciler kind":"ScaledObject","name":"prod-us-west-1-outbound-api","namespace":"prod-us-west-1"}

Metrics server:
repeated over and over:

I0309 17:19:33.033925       1 trace.go:205] Trace[459177578]: "List" url:/apis/external.metrics.k8s.io/v1beta1/namespaces/prod-us-west-1/s0-prometheus-outbound_rabbitmq_notifications_queue_count,user-agent:kube-controller-manager/v1.20.11 (linux/amd64) kubernetes/f17b810/system:serviceaccount:kube-system:horizontal-pod-autoscaler,audit-id:6cdd3e4f-c022-4cbc-9eb5-1eed5d76c277,client:172.16.53.41,accept:application/vnd.kubernetes.protobuf, */*,protocol:HTTP/2.0 (09-Mar-2022 17:19:32.506) (total time: 527ms):
Trace[459177578]: ---"Listing from storage done" 527ms (17:19:33.033)
Trace[459177578]: [527.712687ms] [527.712687ms] END

After searching and reading through the related issues, I think I understand why KEDA is not scaling. Please let me know if I'm right, @JorTurFer.

So ... the query I used is max(avg_over_time(rabbitmq_queue_messages{queue="someQueue.notifications"}[5m])), meaning I'm looking at the average number of messages over a 5-minute window. The threshold is configured to 100, meaning that if the average number of messages goes above 100, I expect KEDA to scale my deployment.

Today I realized that I might have configured the threshold incorrectly ... for example, if the HPA current metric shows 26167m, which after dividing 26167 by 1000 equals 26.167, then 26.167 < 100 and that's why KEDA did not scale my deployment.

Should I set the threshold according to <DesiredThreshold> / <DeploymentsReplicaCount>?

@zroubalik
Member

@denzhel I would definitely recommend setting the threshold to some lower number (5-10?), just to verify that KEDA is working correctly. Then you can set the threshold to a number that fits your needs.
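For instance, the trigger could temporarily look like this for debugging (same query as above, just a lower threshold):

  triggers:
  - type: prometheus
    metadata:
      serverAddress: X.Y.V.Z
      metricName: outbound_rabbitmq_notifications_queue_count
      query: 'max(avg_over_time(rabbitmq_queue_messages{queue="someQueue.notifications"}[5m]))'
      threshold: "10"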

@denzhel
Author

denzhel commented Mar 10, 2022

Hi @zroubalik, thank you for your answer !

Besides lowering my threshold for debugging purposes, can you please explain why I should do that, or whether my theory on how the HPA works is correct?

@JorTurFer
Member

Did you query that metric manually? Just to know what value KEDA is receiving/exposing.
You can find how to do it here.

@denzhel
Author

denzhel commented Mar 10, 2022

@JorTurFer When I run this query in Prometheus:

keda_metrics_adapter_scaler_metrics_value{metric="s0-prometheus-outbound_rabbitmq_notifications_queue_count", namespace="prod-us-west-1"}

I get the following:

keda_metrics_adapter_scaler_metrics_value{container="keda-operator-metrics-apiserver",endpoint="metrics",instance="X.Y.Z.V:12345",job="monitoring/keda-operator-metrics-apiserver",metric="s0-prometheus-outbound_rabbitmq_notifications_queue_count",namespace="prod-us-west-1",pod="keda-operator-metrics-apiserver-6b49b56c89-xkcz5",scaledObject="prod-us-west-1-outbound-api",scaler="prometheusScaler",scalerIndex="0"} 37

However, when I run the following kubectl query:

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/prod-us-west-1/s0-prometheus-outbound_rabbitmq_notifications_queue_count/"

I get:

Error from server: exactly one ScaledObject should match label 

@JorTurFer
Member

There is a section at the end of the page explaining how to solve it :)
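If I'm reading that section of the docs right, the raw query needs a labelSelector that pins it to a single ScaledObject, roughly like this (the ScaledObject name here is assumed from the logs above):

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/prod-us-west-1/s0-prometheus-outbound_rabbitmq_notifications_queue_count?labelSelector=scaledobject.keda.sh%2Fname%3Dprod-us-west-1-outbound-api"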

@denzhel
Author

denzhel commented Mar 10, 2022

@JorTurFer

At the moment I checked, both Prometheus and the kubectl get --raw command returned the same value:

"value":"1525"

However, k get hpa returns:

95312m

Divided by 1000 (because of the m unit), that is 95.312.
Multiplied by 16, which is the number of deployment pods, that is roughly 1525 (matching the original metric).

Should I set my threshold lower? 1525 is way above the threshold, yet that doesn't trigger scaling because the threshold is compared against the average across all the pods.

@JorTurFer
Member

Okay, I think I know what's happening...
I think you are expecting the workload to scale based on that "raw" value, but it actually scales based on the per-pod average. This is the expected behaviour because KEDA creates AverageValue metrics, which means the HPA will try to keep the "raw" value divided by the instance count at the threshold.
[screenshot]

If the upstream is returning, for example, 1600 and you have set the threshold to 100, you will end up with approximately 16 instances because 1600 / 16 = 100.
Right now, you can think of the threshold as the desired value per pod.
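Concretely, the HPA that KEDA generates targets an AverageValue, so under the hood the metric spec looks roughly like this (a sketch of the external-metric format, values taken from the ScaledObject above):

metrics:
- type: External
  external:
    metric:
      name: s0-prometheus-outbound_rabbitmq_notifications_queue_count
    target:
      type: AverageValue
      averageValue: "100"

With the metric at ~1525, that works out to ceil(1525 / 100) = 16 desired replicas, clamped between minReplicaCount: 12 and maxReplicaCount: 20.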

@denzhel
Author

denzhel commented Mar 11, 2022

Threshold = desired value per pod

Got it.

It's not that trivial.

@zroubalik
Member

FYI there's an issue to support the Value metric type and not just AverageValue: #2030

@denzhel Can we close this?

@denzhel
Author

denzhel commented Mar 11, 2022

@zroubalik Yes.
Thank you.
