
KEDA ScaledJob with accurate strategy doesn't consider messages in flight when calculating scale #1323

Closed
kiran-bjn opened this issue Nov 12, 2020 · 4 comments · Fixed by #1391
Assignees
Labels
bug Something isn't working

Comments

@kiran-bjn

kiran-bjn commented Nov 12, 2020


The KEDA job scaler spawns new pods indefinitely if the spawned pods don't consume the messages in the queue before the next polling iteration.

When the strategy specified is accurate, it is expected that the scaler's SDK can return a queue length that excludes messages already being processed (for queues like AWS SQS, etc.), so that the scale depends solely on the queue length. But when there is a resource crunch in the cluster, the spawned pods might not be scheduled onto nodes before the next polling iteration, which means they don't consume messages from the queue either. During the next poll the scaler again spawns new pods equal to the queue length.
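
To make the reported behaviour concrete, here is a simplified sketch (not KEDA's actual implementation; the function and parameter names are illustrative) of the two calculations being contrasted: the current one derives the Job count solely from the reported queue length, while the expected one also subtracts Jobs that were already created but have not picked up a message yet.

```go
package sketch

import "math"

// jobsFromQueueLengthOnly mirrors the behaviour described in this issue:
// with the accurate strategy, the number of Jobs to create is derived
// solely from the queue length reported by the scaler.
func jobsFromQueueLengthOnly(queueLength, targetAverageValue float64) int64 {
	return int64(math.Ceil(queueLength / targetAverageValue))
}

// jobsMinusPending is the expected calculation: Jobs that already exist
// (running or still unscheduled) are subtracted, so an unscheduled backlog
// is not re-created on every polling iteration.
func jobsMinusPending(queueLength, targetAverageValue float64, pendingJobs int64) int64 {
	want := int64(math.Ceil(queueLength/targetAverageValue)) - pendingJobs
	if want < 0 {
		return 0
	}
	return want
}
```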

Expected Behavior

E.g.:

Trigger : SQS
Available Messages : 100
Messages locked : 50
Pods Running : 50
Poll Duration : 60 seconds
Target Average Value = 1

At polling iteration T1, 100 new pods are spawned. None of the pods are successfully scheduled, so no messages are consumed.
At polling iteration T2, 100 messages are still available in the queue, but no new pods are created because TotalPodsCurrentlyActive = messagesAvailable + messagesLocked (the 150 pods that already exist cover the 150 messages).

Actual Behavior

At polling iteration T2, 100 messages are still available in the queue, so 100 new pods are created again, on the assumption that the messages arrived after the last polling iteration.
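
Plugging the numbers from this example into the hypothetical helpers sketched above: jobsFromQueueLengthOnly(100, 1) returns 100 (the actual behaviour), while jobsMinusPending(100, 1, 150) returns 0 (the expected behaviour), where 150 is the 100 unscheduled pods from T1 plus the 50 pods already processing locked messages.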

Specifications

  • KEDA Version: 2.0.0-rc2
  • Scaler(s): SQS
kiran-bjn added the bug (Something isn't working) label Nov 12, 2020
@zroubalik
Member

@TsuyoshiUshio PTAL

@TsuyoshiUshio
Contributor

TsuyoshiUshio commented Nov 19, 2020

As we discussed on the closed PR, a scheduledJobCount might help. I'm not sure how we can implement it yet, so let me investigate. Until then, a longer polling interval, combined with the container quitting when it has 0 messages, might work.

@kiran-bjn
Author

Thanks @TsuyoshiUshio, I am already using a longer polling interval and exiting the job if the queue is empty as a non-blocking workaround. For our use case, we want the scaling to be very responsive and not have any delay in the system due to the polling interval. Will wait for your update.
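
For reference, a minimal sketch of that workaround, assuming a worker written against the aws-sdk-go v1 SQS client (the QUEUE_URL env var and the processing logic are placeholders): the worker long-polls the queue and exits as soon as it sees no messages, so the Job completes instead of sitting idle.

```go
// Minimal sketch of the interim workaround; region and credentials are
// expected to come from the environment (e.g. AWS_REGION, IAM role).
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/sqs"
)

func main() {
	queueURL := os.Getenv("QUEUE_URL") // hypothetical env var set on the Job spec
	svc := sqs.New(session.Must(session.NewSession()))

	for {
		out, err := svc.ReceiveMessage(&sqs.ReceiveMessageInput{
			QueueUrl:            aws.String(queueURL),
			MaxNumberOfMessages: aws.Int64(1),
			WaitTimeSeconds:     aws.Int64(10), // long poll to ride out short gaps
		})
		if err != nil {
			log.Fatal(err)
		}
		if len(out.Messages) == 0 {
			return // queue drained: exit so the Job completes instead of blocking
		}
		for _, m := range out.Messages {
			fmt.Println("processing", aws.StringValue(m.Body)) // placeholder work
			if _, err := svc.DeleteMessage(&sqs.DeleteMessageInput{
				QueueUrl:      aws.String(queueURL),
				ReceiptHandle: m.ReceiptHandle,
			}); err != nil {
				log.Fatal(err)
			}
		}
	}
}
```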

@TsuyoshiUshio
Contributor

Thank you for your feedback. I'm glad people are using ScaledJob for many use cases!

thomas-lamure added a commit to thomas-lamure/keda that referenced this issue Dec 2, 2020
thomas-lamure added a commit to thomas-lamure/keda that referenced this issue Dec 2, 2020
zroubalik pushed a commit that referenced this issue Dec 11, 2020
ycabrer pushed a commit to ycabrer/keda that referenced this issue Mar 1, 2021