
cron-scaler scales higher than expected #5820

Closed
dagvl opened this issue May 22, 2024 · 6 comments
Labels
bug (Something isn't working), stale (All issues that are marked as stale due to inactivity)

Comments


dagvl commented May 22, 2024

Report

When using cron triggers with a scaleDown policy of 5% of pods every 2 seconds, the deployment never scales down to the expected number of pods.

E.g. if I have a cron trigger requesting 20 pods and then edit that cron trigger to 10 pods, the deployment scales down to 11 pods instead of 10.

This is related to the scaleDown policy, because if I set a policy of 100% Pods every 2 seconds, it correctly scales down to 10 pods.

Expected Behavior

I expect the number of replicas to match the desiredReplicas in the cron trigger

Actual Behavior

I get more than 10 replicas.

Steps to Reproduce the Problem

First create a ScaledObject referencing a deployment with a cron trigger requesting 20 pods:

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          policies:
          - periodSeconds: 2
            type: Percent
            value: 5
          - periodSeconds: 2
            type: Pods
            value: 1
          stabilizationWindowSeconds: 2
      name: scaled-object-test-hpa
    scalingModifiers: {}
  cooldownPeriod: 2
  fallback:
    failureThreshold: 3
    replicas: 1
  maxReplicaCount: 200
  minReplicaCount: 10
  pollingInterval: 30
  scaleTargetRef:
    name: scaled-object-test
  triggers:
  - metadata:
      value: "80"
    metricType: Utilization
    type: cpu
  - metadata:
      desiredReplicas: "20"
      end: 59 23 * * 6
      start: 0  0  * * 0
      timezone: Europe/Oslo
    type: cron

(Note that this also has a CPU utilization trigger, just because of the internal tooling we use to generate the ScaledObject, but that trigger is not a factor since average CPU usage is 0% in my pods [it's an idle nginx container].)

Note that the scaleDown setting allows at most max(1, pods*0.05) pods to be removed per period.
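As a rough sketch of how these two policies interact (illustrative Go, not the actual HPA controller code; selectPolicy is not set, so it defaults to Max, meaning the policy allowing the larger change wins each period):

package main

import "fmt"

// allowedScaleDown approximates how the two scaleDown policies above combine:
// with the default selectPolicy Max, the policy permitting the larger
// reduction wins each 2-second period. Integer rounding here is approximate.
func allowedScaleDown(currentReplicas int) int {
	byPercent := currentReplicas * 5 / 100 // "Percent: 5" policy
	byPods := 1                            // "Pods: 1" policy
	if byPercent > byPods {
		return byPercent
	}
	return byPods
}

func main() {
	fmt.Println(allowedScaleDown(20)) // 1: at 20 replicas, both policies allow removing only one pod per period
}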

Apply this ScaledObject and see that the deployment scales up to 20.

Then change desiredReplicas to 10 and reapply.

The deployment starts to slowly scale down, but the scale-down stops at 11 replicas instead of 10.

If you set the policy to 100% Percent and do the same thing, the scale-down ends at 10 pods as expected.

Logs from KEDA operator

No response

KEDA Version

2.14.0

Kubernetes Version

1.29

Platform

Amazon Web Services

Scaler Details

cron

Anything else?

One thought that crossed my mind, but which I can't verify, is that the HPA is scaling to within a tolerance level instead of to an exact value.

E.g. right now, I have the cron desiredReplicas set to 10, but the deployment is stuck at 11.

If I look at the HPA, I see this:

  "s1-cron-Europe-Oslo-00xx0-5923xx6" (target average value):  910m / 1

10/11 ≈ 0.91, which suggests the cron scaler is emitting the correct metric but the HPA is not reacting to it. The production case is similar:
the ScaledObject cron trigger is emitting 220 desiredReplicas, but we currently have 244. Looking at the HPA, we have:

  "s1-cron-Australia-Sydney-01xx1-010xx4" (target average value):  902m / 1

220/244 ≈ 0.902, so again we are within 10% of the target value.
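As a rough simulation of this hypothesis (illustrative Go, not the real controller code), assuming the HPA's default 10% tolerance and a usage ratio of desiredReplicas/currentReplicas, which is what the target average value of 1 above implies:

package main

import (
	"fmt"
	"math"
)

// Default HPA tolerance (--horizontal-pod-autoscaler-tolerance).
const tolerance = 0.10

func main() {
	desired := 10.0 // cron trigger desiredReplicas
	current := 20.0 // replicas before the scale-down starts
	for {
		ratio := desired / current // corresponds to the "target average value" shown by the HPA
		if math.Abs(1.0-ratio) <= tolerance {
			break // within tolerance: the HPA stops adjusting
		}
		current-- // the scaleDown policy above removes roughly one pod per period
	}
	fmt.Println(current) // 11: 10/11 ≈ 0.91 is inside the 10% band, matching this report
}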

dagvl added the bug label on May 22, 2024

JorTurFer (Member) commented May 26, 2024

Hello,
You're right, the problem here is the 10% tolerance, and currently there isn't any solution :(
I don't know if @SpiritZhou will eventually contribute this feature upstream; do you have any extra info, @SpiritZhou?

SpiritZhou (Contributor) commented

I am still working on it.


stale bot commented Jul 26, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Jul 26, 2024

stale bot commented Aug 3, 2024

This issue has been automatically closed due to inactivity.

stale bot closed this as completed on Aug 3, 2024

jpriebe commented Nov 13, 2024

We are also experiencing an issue like this. We use a cron trigger along with an SQS trigger.

The cron trigger brings up the pod count to 500 in anticipation of a daily workload and holds it there throughout the expected workload period.

The workload starts, the SQS trigger scales the pods to some higher number (say 900 pods). As the SQS queue comes under control, the scaledown starts. It never quite gets back to 500. It will land somewhere like 540 pods, which is within 10% of the cron desired value.

But that's 40 too many pods, and this has real cost implications.
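The same default 10% tolerance would explain this plateau; a quick illustrative check (hypothetical numbers from the scenario above, not actual controller code):

package main

import "fmt"

func main() {
	cronTarget, current := 500.0, 540.0
	fmt.Printf("%.3f\n", cronTarget/current) // ~0.926: inside the default 10% tolerance band
	fmt.Println(int(cronTarget / (1 - 0.10))) // 555: the highest replica count the HPA would still leave alone
}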

JorTurFer (Member) commented

I understand the issue, but KEDA relies on the HPA controller, and we can't fix this if upstream doesn't support it. I'd suggest asking about it in the upstream issue -> kubernetes/kubernetes#116984
