Creating a new ScaledObject with the “paused” annotation is not working #6421

Open
Dark0096 opened this issue Dec 13, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@Dark0096

Dark0096 commented Dec 13, 2024

Report

When a ScaledObject is initially created with the “paused” annotation set and is later resumed, the ScaledObject correctly generates the HPA, but a few seconds later the "Paused" condition becomes True again while the HPA remains in place.

Expected Behavior

  1. Init condition(Create the ScaledObject)
    Ready: True
    Active: True
    Fallback: False
    Paused: True
    HPA: None

  2. After resume(Paused Annotation Removed)
    Ready: True
    Active: True
    Fallback: False
    Paused: False
    HPA: New one created by the scaled object

Actual Behavior

  1. Init condition(Create the ScaledObject)
    Ready: True
    Active: Unknown
    Fallback: Unknown
    Paused: True
    HPA: None

  2. After resume(Paused Annotation Removed)

    2-1. Init
      Ready: True
      Active: Unknown
      Fallback: Unknown
      Paused: False
      HPA: New one created by scaled object

    2-2. After a few seconds
      Ready: True
      Active: Unknown
      Fallback: False
      Paused: True
      HPA: New one created by scaled object

Here is a video from the Lens app demonstrating the issue

Steps to Reproduce the Problem

  1. Write a ScaledObject manifest including the CPU and Cron Scalers.
  2. Add the paused annotation to the created manifest.
  3. Apply the created manifest using kubectl apply.
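
The steps above can be sketched as a small script. This is only an illustration: the resource name, namespace, replica counts, and trigger values are placeholders, the helper names are hypothetical, and it assumes pausing is requested with the `autoscaling.keda.sh/paused` annotation set to `"true"`.

```python
import json

PAUSED_ANNOTATION = "autoscaling.keda.sh/paused"


def build_paused_scaledobject(name: str, namespace: str) -> dict:
    """Return a minimal ScaledObject manifest that is paused at creation time."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Step 2: the paused annotation is present from the very first apply.
            "annotations": {PAUSED_ANNOTATION: "true"},
        },
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicaCount": 1,
            "maxReplicaCount": 3,
            # Step 1: one CPU trigger and one cron trigger (placeholder values).
            "triggers": [
                {"type": "cpu", "metricType": "Utilization", "metadata": {"value": "55"}},
                {
                    "type": "cron",
                    "metadata": {
                        "timezone": "Asia/Seoul",
                        "start": "0 9 * * *",
                        "end": "30 9 * * *",
                        "desiredReplicas": "3",
                    },
                },
            ],
        },
    }


def resume(manifest: dict) -> dict:
    """Return a copy of the manifest with the paused annotation removed."""
    resumed = json.loads(json.dumps(manifest))  # deep copy via a JSON round trip
    resumed["metadata"]["annotations"].pop(PAUSED_ANNOTATION, None)
    return resumed


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
    print(json.dumps(build_paused_scaledobject("example-app", "default"), indent=2))
```

Piping the printed manifest to `kubectl apply -f -` covers step 3. Applying the output of `resume()` afterwards corresponds to removing the annotation, which is the point where the unexpected state shows up.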

Logs from KEDA operator

There are no specific logs available.

KEDA Version

2.15.1

Kubernetes Version

1.30

Platform

Amazon Web Services

Scaler Details

cpu, memory, cron

Anything else?

Hi,

I’ve been using KEDA effectively, and I have a question.
Has the pattern of initially creating a ScaledObject resource with a paused annotation by default and resuming it as needed been considered?

If there are any parts of the code you suspect might be relevant, please let me know, and I’ll try making modifications myself.

Thank you!

@Dark0096 Dark0096 added the bug Something isn't working label Dec 13, 2024
@vtomasr5

We have a similar issue with pausing.

If we add the autoscaling.keda.sh/paused-replicas: "10" annotation to the ScaledObject (SO), we see these logs in the keda-operator:

...
2024-12-18T15:49:18Z	ERROR	Reconciler error	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "ScaledObject": {"name":"querier","namespace":"mimir"}, "namespace": "mimir", "name": "querier", "reconcileID": "c931a2a2-0820-4847-9fd3-e88987b09e7e", "error": "ScaledObject paused replicas are being scaled"} 
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/workspace/controllers/keda/scaledobject_controller.go:196
github.com/kedacore/keda/v2/controllers/keda.(*ScaledObjectReconciler).Reconcile
...

Which comes from this code

The Active condition stays Unknown, which is a problem for ArgoCD: its sync process gets stuck forever.

status:
  conditions:
  - message: ScaledObject check failed
    reason: UnknownState
    status: Unknown
    type: Active
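
To spot this state quickly, the conditions can be filtered from the object's JSON (for example the output of `kubectl get scaledobject <name> -o json`). The following is a minimal sketch; the function name is hypothetical, and the `Ready` entry in the sample data is added only for illustration.

```python
from typing import Dict, List


def conditions_with_status(scaled_object: Dict, wanted: str = "Unknown") -> List[str]:
    """Return the types of all status conditions whose status equals `wanted`."""
    conditions = scaled_object.get("status", {}).get("conditions", [])
    return [c.get("type", "") for c in conditions if c.get("status") == wanted]


# Sample data mirroring the status shown above, plus an illustrative Ready condition.
example = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True"},
            {
                "type": "Active",
                "status": "Unknown",
                "reason": "UnknownState",
                "message": "ScaledObject check failed",
            },
        ]
    }
}

print(conditions_with_status(example))  # prints ['Active']
```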

@SpiritZhou
Contributor

I believe there is a bug, but I am unable to reproduce it. Would you be able to provide the ScaledObject YAML?

@rickbrouwer
Contributor

I can only reproduce this when I specify both annotations in a conflicting combination (paused set to false together with paused-replicas), like this:

annotations:
    autoscaling.keda.sh/paused-replicas: "2"
    autoscaling.keda.sh/paused: "false"

I think it would be clearer if autoscaling.keda.sh/paused took precedence. So if it is set to false, the ScaledObject is really not paused, regardless of whether paused-replicas is set?
I am also curious whether Dark0096 and vtomasr5 specify both annotations or not. If not, I would like to see the ScaledObject YAML.
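
The precedence rule proposed here can be written down as a small model. This is a sketch of the proposal only, not of KEDA's current behavior: the function and type names are hypothetical, and it assumes that any `paused` value other than `"true"` means "not paused" and that `paused-replicas` on its own still pauses at the given count.

```python
from typing import Dict, NamedTuple, Optional

PAUSED = "autoscaling.keda.sh/paused"
PAUSED_REPLICAS = "autoscaling.keda.sh/paused-replicas"


class PauseDecision(NamedTuple):
    paused: bool
    replicas: Optional[int]  # None means "no fixed replica count requested"


def resolve_pause(annotations: Dict[str, str]) -> PauseDecision:
    """Decide the pause state, letting the `paused` annotation take precedence."""
    paused_value = annotations.get(PAUSED)
    replicas_value = annotations.get(PAUSED_REPLICAS)
    replicas = int(replicas_value) if replicas_value is not None else None

    if paused_value is not None:
        # Assumption: only the literal "true" (case-insensitive) pauses.
        if paused_value.strip().lower() == "true":
            return PauseDecision(True, replicas)
        # `paused` is explicitly not true: ignore paused-replicas entirely.
        return PauseDecision(False, None)

    # No `paused` annotation: paused-replicas alone decides.
    return PauseDecision(replicas is not None, replicas)
```

With this rule the conflicting pair above (`paused: "false"` plus `paused-replicas: "2"`) resolves to "not paused" instead of an ambiguous state.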

@Dark0096
Author

Dark0096 commented Dec 20, 2024

Hi, @SpiritZhou, @rickbrouwer.

This issue occurred when paused was used during the initial creation of a new resource.
Below is a reproducible example, including the YAML configuration.

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  annotations:
    autoscaling.keda.sh/paused: 'false'
  labels:
    scaledobject.keda.sh/name: resource-name-1
  name: resource-name-1
  namespace: deployment-dev
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          policies:
            - periodSeconds: 60
              type: Percent
              value: 30
          selectPolicy: Min
          stabilizationWindowSeconds: 300
        scaleUp:
          policies:
            - periodSeconds: 15
              type: Pods
              value: 4
            - periodSeconds: 15
              type: Percent
              value: 100
          selectPolicy: Max
          stabilizationWindowSeconds: 0
      name: resource-name-1
    scalingModifiers: {}
  cooldownPeriod: 300
  maxReplicaCount: 3
  minReplicaCount: 1
  pollingInterval: 30
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: resource-name-1
  triggers:
    - metadata:
        desiredReplicas: '1'
        end: 30 1 * * *
        name: lateNight
        start: 0 1 * * *
        timezone: Asia/Seoul
      type: cron
    - metadata:
        desiredReplicas: '3'
        end: 30 9 * * *
        name: earlyMorning
        start: 0 9 * * *
        timezone: Asia/Seoul
      type: cron
    - metadata:
        desiredReplicas: '3'
        end: 0 16 * * *
        name: earlyEvening
        start: 30 15 * * *
        timezone: Asia/Seoul
      type: cron
    - metadata:
        value: '55'
      metricType: Utilization
      type: cpu

Please feel free to let me know if you need any additional information.

@rickbrouwer
Contributor

I use KEDA 2.16.0 (with ArgoCD v2.12.8).

The only thing I can reproduce is the following:

When I initially apply the above ScaledObject with only the paused annotation set to false, I get the following status condition:

- status: Unknown
  type: Paused

When I look at the code, an initial creation with the value false will give an Unknown status because the condition has never been true.

With a little adjustment in the reconcileScaledObject we can give a better status and message.
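
Roughly, the difference looks like this. It is a simplified model of the behavior described in this comment, not a copy of the KEDA code; both function names are hypothetical, and the "adjusted" variant is just one possible shape for the change.

```python
from typing import Optional

UNKNOWN, TRUE, FALSE = "Unknown", "True", "False"


def paused_condition_described(annotation: Optional[str], previous: str = UNKNOWN) -> str:
    """Model of the described behavior: False is only set when leaving a paused state."""
    if annotation == "true":
        return TRUE
    if previous == TRUE:
        return FALSE  # the condition was True before, so it is cleared
    return previous  # never paused before: the initial Unknown is left untouched


def paused_condition_adjusted(annotation: Optional[str]) -> str:
    """Possible adjustment: always report an explicit True or False."""
    return TRUE if annotation == "true" else FALSE
```

For a ScaledObject created with `paused: "false"`, the first function returns `Unknown` while the second returns `False`, which is the status a user would expect to see.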

@Dark0096
Author

Thank you for checking this out, @rickbrouwer.

Unfortunately, if you create a ScaledObject with the paused annotation set to true, the same issue occurs, with the following status and behavior.

  1. Init condition(Create the ScaledObject)
    Ready: True
    Active: Unknown
    Fallback: Unknown
    Paused: True
    HPA: None

  2. After resume(Paused Annotation Removed)
    Ready: True
    Active: Unknown
    Fallback: Unknown
    Paused: False
    HPA: New one created by scaled object

  3. After a few seconds
    Ready: True
    Active: Unknown
    Fallback: False
    Paused: True
    HPA: New one created by scaled object

Here is a video from the Lens app demonstrating the issue.

KEDA_.mov

@SpiritZhou
Contributor

The tricky part is that the Paused status of the SO changes from False to True. However, the KEDA operator does not update the Paused status proactively.

Projects
Status: To Triage
Development

No branches or pull requests

4 participants