SQS scaleOnInFlight not being respected #4276

Closed
miklapko opened this issue Feb 24, 2023 · 5 comments · Fixed by #4358
Labels
bug Something isn't working

Comments

@miklapko

miklapko commented Feb 24, 2023

Report

We have our ScaledObject configured as such:

spec:
  cooldownPeriod: 180
  maxReplicaCount: 2000
  minReplicaCount: 100
  pollingInterval: 5
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $DEPLOYMENT
  triggers:
    - metadata:
        awsRegion: $REGION
        identityOwner: operator
        queueLength: '2'
        queueURL: >-
          $QUEUE_URL
        scaleOnInFlight: 'true'
      type: aws-sqs-queue

KEDA is not scaling the deployment based on the in-flight message count (the AWS API reports the value correctly). It only looks at messages available. We verified this by changing queueLength: setting it to 1 instead of 2 does cause KEDA to scale twice as high, but only if there are messages available.

As soon as the number of messages available reaches 0, no matter how many messages are in-flight (tested with 200-2000), KEDA will not scale until messages available start going up again.

This results in frequent and steep peaks and lows in scaling.

What is also weird is that it was working as expected before, and started behaving incorrectly when we added more ScaledObjects / ScaledJobs (not many, around 8) to this cluster.

We have tried recreating ScaledObject and restarting / redeploying KEDA completely, no dice. We're running EKS K8S 1.22 and are planning to upgrade, however, again, it worked as expected for some time. I haven't seen any specific fixes in newer versions / any similar issues open or closed.

Expected Behavior

We expected KEDA to keep up with the in-flight message count and hold, for example, the deployment at 250 replicas for ~500 messages in-flight and 0 messages available.
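
The arithmetic behind that expectation can be sketched in Go. This is a simplified approximation, not KEDA's or the HPA's actual code: it assumes the target replica count is the message total divided by queueLength, rounded up and clamped to the min/max replica counts, and it ignores HPA tolerance and stabilization. The function name `desiredReplicas` and its parameters are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas approximates the replica count for an SQS trigger.
// Hypothetical helper: in-flight messages are added to the total only
// when scaleOnInFlight is true.
func desiredReplicas(visible, inFlight, queueLength int, scaleOnInFlight bool, minReplicas, maxReplicas int) int {
	total := visible
	if scaleOnInFlight {
		total += inFlight
	}
	// One replica per queueLength messages, rounded up.
	n := int(math.Ceil(float64(total) / float64(queueLength)))
	if n < minReplicas {
		return minReplicas
	}
	if n > maxReplicas {
		return maxReplicas
	}
	return n
}

func main() {
	// Values from the report: queueLength 2, min 100, max 2000,
	// 0 messages available and ~500 in-flight.
	fmt.Println(desiredReplicas(0, 500, 2, true, 100, 2000))  // in-flight counted: 250
	fmt.Println(desiredReplicas(0, 500, 2, false, 100, 2000)) // in-flight ignored: 100 (the minimum)
}
```

Under these assumptions, ignoring in-flight messages drops the deployment to minReplicaCount as soon as the visible count reaches 0, which matches the behaviour described in this report.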

Actual Behavior

KEDA does not keep up with messages in-flight and scales the deployment down to the minimum.

Steps to Reproduce the Problem

  1. Deploy KEDA 2.8.1
  2. Deploy a ScaledObject with aforementioned config and deployment to scale.
  3. Have another ScaledObject / Deployment with scaleOnInFlight set to false.
  4. Have an SQS queue with constant ~200-2000 messages in-flight and coming in consistently.

Logs from KEDA operator

Nothing suspicious / no errors in logs, tried debug level as well.

KEDA Version

2.8.1

Kubernetes Version

< 1.23

Platform

Amazon Web Services

Scaler Details

SQS

Anything else?

No response

@miklapko miklapko added the bug Something isn't working label Feb 24, 2023
@miklapko
Author

After some experiments and looking at the scaler code, it turns out that if even one of many ScaledObjects in the cluster has scaleOnInFlight set to false, true isn't respected for any other ScaledObject and the metrics server doesn't count in-flight messages. I guess it's not the intended behaviour? Our dev tells me scaleOnInFlight is a global variable and that's the cause (I'm not much of a golanger myself).

@JorTurFer
Member

Our dev tells me scaleOnInFlight is a global variable and that's the cause (I'm not much of a golanger myself).

What do you mean? Is it a global variable inside the AWS SDK? Inside the SQS backend? In KEDA we store it as part of the scaler, but each trigger has its own scaler instance. The behaviour you are describing sounds weird :( If you set scaleOnInFlight to true in all your ScaledObjects, does it work? 🤔

@miklapko
Author

Yes, if it's explicitly set to true in all ScaledObjects, it works as expected, same if it's not set at all and uses the default. However, just one trigger having it set to false seemingly overrides all the others. Speaking about "global variable" - I guess it's global for the scaler? Idk how go / go instances work, sorry.

@JorTurFer
Member

JorTurFer commented Feb 26, 2023

I guess it's global for the scaler? Idk how go / go instances work, sorry.

Each trigger inside each ScaledObject generates its own scaler instance, so from the KEDA code's point of view, it isn't global. I can't guarantee that the client doesn't use it globally somehow internally (maybe by storing the value at scope level). I have to dig deeper to be 100% sure, but this is interesting, as it could be a significant limitation worth documenting.

@JorTurFer JorTurFer moved this from To Triage to To Do in Roadmap - KEDA Core Mar 13, 2023
@JorTurFer
Member

JorTurFer commented Mar 13, 2023

Hi,
I think that I have found the problem. I'll open a PR soon, but you were right: once a ScaledObject sets scaleOnInFlight: false, all the ScaledObjects have that value set. Thanks for reporting.
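
The thread does not show the offending code, so the following Go sketch is only a hypothetical illustration of how this kind of cross-instance leak can happen: package-level state that one scaler's constructor overwrites, even though each trigger gets its own scaler instance. All type and function names here are invented for illustration and are not KEDA's actual identifiers. The two attribute names are the real SQS queue attributes for visible and in-flight messages.

```go
package main

import "fmt"

// SQS queue attribute names for visible and in-flight messages.
const (
	attrVisible  = "ApproximateNumberOfMessages"
	attrInFlight = "ApproximateNumberOfMessagesNotVisible"
)

// sharedMetricNames is package-level state (hypothetical):
// every scaler instance reads the same slice.
var sharedMetricNames = []string{attrVisible, attrInFlight}

// buggyScaler is a hypothetical scaler that relies on the shared slice.
type buggyScaler struct{ scaleOnInFlight bool }

func newBuggyScaler(scaleOnInFlight bool) *buggyScaler {
	if !scaleOnInFlight {
		// Overwrites the list for ALL instances, not just this one.
		sharedMetricNames = []string{attrVisible}
	}
	return &buggyScaler{scaleOnInFlight: scaleOnInFlight}
}

func (s *buggyScaler) queueLength(attrs map[string]int) int {
	total := 0
	for _, name := range sharedMetricNames {
		total += attrs[name]
	}
	return total
}

// fixedScaler keeps the attribute list per instance instead.
type fixedScaler struct{ metricNames []string }

func newFixedScaler(scaleOnInFlight bool) *fixedScaler {
	names := []string{attrVisible}
	if scaleOnInFlight {
		names = append(names, attrInFlight)
	}
	return &fixedScaler{metricNames: names}
}

func (s *fixedScaler) queueLength(attrs map[string]int) int {
	total := 0
	for _, name := range s.metricNames {
		total += attrs[name]
	}
	return total
}

func main() {
	attrs := map[string]int{attrVisible: 0, attrInFlight: 500}

	a := newBuggyScaler(true)
	fmt.Println(a.queueLength(attrs)) // 500: in-flight messages are counted
	newBuggyScaler(false)             // a second trigger opts out...
	fmt.Println(a.queueLength(attrs)) // 0: ...and the first one silently loses them

	c := newFixedScaler(true)
	newFixedScaler(false)
	fmt.Println(c.queueLength(attrs)) // 500: per-instance state is unaffected
}
```

A pattern like this would also be consistent with the reported workaround: if no trigger sets scaleOnInFlight to false, the shared list is never overwritten.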
