Adding Scaling Strategy for ScaledJob #1227
Conversation
Signed-off-by: Tsuyoshi Ushio <[email protected]>
Looking good, minor comments
Signed-off-by: Tsuyoshi Ushio <[email protected]>
Hi @zroubalik
LGTM, thanks @TsuyoshiUshio !
Signed-off-by: Tsuyoshi Ushio <[email protected]>
Hey, I tried using the "accurate" strategy, but because it takes the pods a lot of time to actually get started and consume the message, every polling_time the trigger is creating new pods for messages that pods were already created for. Is that the expected behaviour? Am I missing something here? If that's the way it works, distinguishing between runningJobs and waitingJobs somehow could help calculate the right amount of new pods to schedule.
@TsuyoshiUshio PTAL
Hi @orsher. For example, some customers use this strategy with Service Bus; the Service Bus scaler's queue length includes running jobs. Which scaler do you use? If your scaler's queue length includes the number of running jobs, and your job consumes a message, does some work, and the message is deleted from the queue once the job finishes, then you can use the default strategy. However, we need to learn about more use cases. If your use case is not the one I described, could you share which scaler you use and how your job consumes messages? I'd be happy to hear that and reflect it in KEDA.
Hey @TsuyoshiUshio, we get many more jobs sent than we need, and it will keep sending more jobs until there are resources available for the jobs to actually start running, consume and ack the messages. I hope this flow is clear. Am I missing something here?
I think there is a typo in this documentation that has been confusing me: `maxScale` and `queueLength` should be the same thing.
@orsher my plan to deal with that is to increase the poll interval, and write my consumers to shut themselves down if there is no message to consume. But I agree, it should take pending jobs into account when calculating the new scale.
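A hedged sketch of that idea in Go: subtract both running and still-pending jobs from the queue length before creating new jobs. This is only an illustration of the suggestion; the function and parameter names are hypothetical and this is not KEDA's actual logic.

```go
package main

import "fmt"

// desiredNewJobs illustrates "taking pending jobs into account": messages that
// already have a running or pending job should not trigger another job.
// Hypothetical names; not KEDA's implementation.
func desiredNewJobs(queueLength, runningJobCount, pendingJobCount, maxReplicaCount int64) int64 {
	n := queueLength - runningJobCount - pendingJobCount
	if n < 0 {
		n = 0
	}
	capacity := maxReplicaCount - runningJobCount - pendingJobCount
	if capacity < 0 {
		capacity = 0
	}
	if n > capacity {
		return capacity
	}
	return n
}

func main() {
	// 20 messages, 5 running jobs, 10 pending jobs, cap of 30 -> only 5 new jobs.
	fmt.Println(desiredNewJobs(20, 5, 10, 30))
}
```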
I see the exact same problem as @orsher using Azure Storage Queues. It is kind of independent of the queue type and strategy used right now, as it is a problem with an inner fast control loop scaling up the number of jobs a lot faster than our outer control loops can make resources available (scaling out virtual machines/nodes in our case).

We have already made our jobs poll the queue 10 times and then shut down if there are no messages, so the extra jobs are not a big problem from that standpoint. But since some of our jobs are very resource intensive and long running (which is kind of the point of using Jobs instead of Deployments for us), we have to put some high

There are of course a lot of variables we can tweak, like the polling interval, but we would like to start our long running jobs as fast as possible (i.e. having a short poll interval), and have a design that allows some slower startup only when we have bursty requests.

Edit: I think this issue is covering the problem already: #1323
Sorry for the inconvenience, I'll have a look. However, we should discuss this in #1323 rather than in a merged PR.
Signed-off-by: Thomas Lamure <[email protected]>
Hey @TsuyoshiUshio, I am using Azure Queue Storage and from what I understand it gives the count of all messages (invisible/running and visible/non-running). You have mentioned that in such cases we should use the 'default' strategy, but the official doc recommends the 'accurate' strategy for Azure Queue Storage.
I talked with several customers of ScaledJob. Their use cases are quite different, and their requests for the scaling logic are also different. I also found some limitations in many scalers. Considering all of these, I'd like to introduce new scale logic. We can probably discuss the `custom` scale logic that I describe later.

What do we need to know?
1. `ScaledJob` scaling expectation

The HPA scaler is about Scale-In and Scale-Out; ScaledJob, however, has no Scale-In/Out. If new messages arrive in the queue, it should create new jobs until it reaches `maxReplicaCount`.
2. Limitation of the `queueLength` implementation

Scalers use various SDKs, and their behavior differs. When we fetch the `queueLength`, it usually includes locked messages, and there is no way to fetch only the number of messages that are not locked.

The ideal metric is the number of messages that are not locked. Why? The ideal scale target is the number of newly arrived messages. There may be several jobs already running, but we can ignore them until we reach `maxReplicaCount`. That is the ideal behavior. To get the number of newly arrived messages, the metric should not include locked messages, i.e. messages that are already being consumed by jobs. However, many SDK implementations include locked messages in the `queueLength`. That causes an issue for ScaledJob.

The kind of scalers
a. Some of them can fetch the `queueLength` as the number of `all messages - locked messages`, i.e. the ideal metric, e.g. Azure Storage Queue.
b. Some of them can not, e.g. Azure Service Bus. It will return the `queueLength` as the number of `all messages`.
We need to handle both use cases. In case of a, things are easy: it is the ideal metric, and we can have one firm scale logic for ScaledJob. However, in case of b, we need to compromise on the scale logic, and it might be very different for each use case.
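To make the two cases concrete, here is a small, purely illustrative Go sketch (not KEDA code) of how the "newly arrived messages" value differs depending on whether the reported queue length includes locked messages; all names are illustrative.

```go
package main

import "fmt"

// newlyArrived sketches the difference between the two kinds of scalers.
// Case a: the scaler reports "all messages - locked messages", which is already
// the ideal metric. Case b: the scaler reports all messages, including the ones
// locked by running jobs, so the ideal value can only be approximated.
func newlyArrived(reportedQueueLength, runningJobCount int64, includesLocked bool) int64 {
	if !includesLocked {
		// Case a: already the ideal metric.
		return reportedQueueLength
	}
	// Case b: approximate by assuming each running job holds one locked message.
	n := reportedQueueLength - runningJobCount
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	fmt.Println(newlyArrived(10, 4, false)) // case a: 10
	fmt.Println(newlyArrived(10, 4, true))  // case b: approximately 6
}
```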
I introduce three scaling strategies: `default`, `custom`, and `accurate`. For case a, I provide `accurate`. For case b, I provide `default` and `custom`. If you don't specify anything, it will be `default`.
The logic of the three strategies

`queueLength`: the length of messages on a queue
`runningJobCount`: the number of running jobs

default
The original logic of ScaledJob. It remains the default to avoid a breaking change.
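The PR text does not spell out the default formula, but based on the discussion above (where `maxScale` and `queueLength` are effectively the same thing), it appears to amount to subtracting the running jobs from the queue length. A hedged Go sketch under that assumption, not copied from the KEDA source:

```go
package main

import "fmt"

// defaultScale is an assumption of what the original ("default") logic amounts to:
// subtract running jobs from the queue length and never exceed maxReplicaCount.
// Illustrative only; not the actual KEDA implementation.
func defaultScale(queueLength, runningJobCount, maxReplicaCount int64) int64 {
	target := queueLength - runningJobCount
	if target < 0 {
		target = 0
	}
	if target > maxReplicaCount {
		return maxReplicaCount
	}
	return target
}

func main() {
	fmt.Println(defaultScale(10, 4, 100)) // 6
}
```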
custom

You can configure `customScalingQueueLengthDeduction` and `customScalingRunningJobPercentage` on the ScaledJob definition, so users can adjust the parameters on top of the default logic:

`queueLength - customScalingQueueLengthDeduction - (runningJobCount * customScalingRunningJobPercentage)`

However, it doesn't exceed `MaxReplicaCount`.
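A minimal Go sketch of the `custom` calculation described above, clamped so it never exceeds `MaxReplicaCount`. The parameter names follow the description; the function itself is illustrative rather than the actual KEDA code.

```go
package main

import "fmt"

// customScale sketches the custom strategy:
// queueLength - customScalingQueueLengthDeduction - (runningJobCount * customScalingRunningJobPercentage),
// capped at maxReplicaCount. Illustrative only.
func customScale(queueLength, runningJobCount, maxReplicaCount, deduction int64,
	runningJobPercentage float64) int64 {

	target := queueLength - deduction - int64(float64(runningJobCount)*runningJobPercentage)
	if target < 0 {
		target = 0
	}
	if target > maxReplicaCount {
		return maxReplicaCount
	}
	return target
}

func main() {
	// queueLength=10, 4 running jobs, deduction=1, percentage=0.5 -> 10 - 1 - 2 = 7
	fmt.Println(customScale(10, 4, 100, 1, 0.5))
}
```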
accurate

If the scaler returns the ideal metric, or the customer deletes a message as soon as it is consumed, the queue length is already the ideal number, and you can use this strategy.
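A Go sketch of how `accurate` can behave when the metric is already ideal: use the reported queue length directly and only cap the total of running plus new jobs at `maxReplicaCount`. This mirrors the description above but is not copied from the KEDA source.

```go
package main

import "fmt"

// accurateScale sketches the accurate strategy: the queue length is already the
// number of not-yet-consumed messages, so use it directly and only cap the total
// (running + new) at maxReplicaCount. Illustrative, not the actual KEDA code.
func accurateScale(queueLength, runningJobCount, maxReplicaCount int64) int64 {
	if queueLength+runningJobCount > maxReplicaCount {
		remaining := maxReplicaCount - runningJobCount
		if remaining < 0 {
			return 0
		}
		return remaining
	}
	return queueLength
}

func main() {
	fmt.Println(accurateScale(10, 3, 8))  // capped: 5
	fmt.Println(accurateScale(4, 3, 100)) // plenty of room: 4
}
```

Compared with the default sketch, this does not subtract running jobs unless the cap is hit, which keeps scaling aggressive when the scaler already excludes locked messages.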
Example Configuration
Checklist
Fixes #
#1186
#1207
Documentation
kedacore/keda-docs#279