
Understanding scaledownPeriod #882

Closed

tramperdk opened this issue Jan 18, 2024 · Discussed in #881 · 11 comments · Fixed by #961

tramperdk commented Jan 18, 2024

Hi everyone! 👋🏻

We've been utilizing KEDA successfully for a while now, scaling our deployments and enabling scaling to zero.

Recently, I've been delving into implementing the http-add-on.
I expected that HTTPScaledObjects would mirror the behavior of ScaledObjects, featuring a target value (trigger result or, in this case, pending requests) and a cooldown period that resets with each trigger result / new request.

However, my initial testing suggests that HTTPScaledObjects don't quite work that way. Regardless of the number of requests sent through, it consistently adheres to the scaledownPeriod and scales the deployment down at the moment it scaled up plus scaledownPeriod, even if it received a request just a second ago.

Is this the intended behavior? If so, is there a way to configure the resources to match the behavior of ScaledObjects, where the scaledownPeriod resets with every received request?

Additionally, we have customers with burst periods where they will send high volumes of requests per second. I want to ensure that we do not lose requests here via the interceptor.
I assume that if the interceptor can't keep up, it will act as a sort of queue, and I'd like to avoid that.

Appreciate any insights or tips you can share! 👍🏻

@JorTurFer
Member

Hello,
The HTTPScaledObject generates a ScaledObject pointing to the external scaler, so the behaviour is the same as you expected. The point here is that the add-on exposes the amount of in-flight traffic every time KEDA requests it. So if you have only 1 request per minute, KEDA probably sees 0.
Why? Because the add-on scaler isn't a counter of total requests; it's a counter of in-flight requests. When a request reaches the interceptor, the queue grows by one, and once the request finishes, the queue shrinks by one. If you have a "constant flow", KEDA will see the scaler as active because at any moment there are some requests in flight. If you receive only sporadic requests, KEDA will see the scaler as not active.
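
For reference, these knobs live on the HTTPScaledObject itself. A minimal sketch (resource names are placeholders, and field names can vary between add-on versions, so double-check the docs):

apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: myapp                    # placeholder name
spec:
  hosts:
    - myapp.example.com          # placeholder host
  scaleTargetRef:
    name: myapp
    service: myapp-svc
    port: 8080
  replicas:
    min: 0                       # allows scale to zero
    max: 10
  scaledownPeriod: 300           # seconds of inactivity before scaling to zero
  targetPendingRequests: 100     # target for the in-flight request count described above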

is there a way to configure the resources to match the behavior of ScaledObjects, where the scaledownPeriod resets with every received request?

KEDA doesn't work that way either. As I said above, if a single request arrives at a random moment and, unluckily, KEDA doesn't check the queue while that request is in flight, the scaler will be considered not active.

I want to ensure that we do not lose requests here via the interceptor.
I assume that if the interceptor can't keep up, it will act as a sort of queue, and I'd like to avoid that.

The interceptor is a proxy written in Go and, as you said, it acts as a sort of queue. The point here is that the interceptor also has its own autoscaling, so if the number of requests it's processing grows, the interceptor will scale out. The interceptor acts more like a queue when your backend responds slowly; if your backend responds quickly, the interceptor is just another proxy in the middle, like any other you may already have.
In general, you shouldn't have performance issues, as it's just a proxy that autoscales, but I'd suggest running performance tests against the interceptor to tune it and validate that it fits your requirements.
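
If you deploy via the official Helm chart, the interceptor's autoscaling bounds can be tuned with values along these lines (a sketch; the exact keys can differ between chart versions, so check the chart's values file):

interceptor:
  replicas:
    min: 3     # floor of interceptor pods kept warm for bursts
    max: 50    # ceiling the interceptor fleet can scale out to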

@tramperdk
Author

The HTTPScaledObject generates a ScaledObject pointing to the external scaler, so the behaviour is the same as you expected. The point here is that the add-on exposes the amount of in-flight traffic every time KEDA requests it. So if you have only 1 request per minute, KEDA probably sees 0.
Why? Because the add-on scaler isn't a counter of total requests; it's a counter of in-flight requests. When a request reaches the interceptor, the queue grows by one, and once the request finishes, the queue shrinks by one. If you have a "constant flow", KEDA will see the scaler as active because at any moment there are some requests in flight. If you receive only sporadic requests, KEDA will see the scaler as not active.

I now understand the reason behind my initial misconception.

We utilize two additional scalers, MSSQL and RabbitMQ. Those scalers accumulate data, and entries stay around longer, so I mistakenly perceived that as a "reset" of the timer. In reality, the scaler simply keeps reporting active because data is present.

Regarding in-flight messages: our rapid message handling results in minimal in-flight messages. Multiple customers send similar small requests, so accumulation is minimal, and the consequence is continuous scaling up and down from zero.

While I acknowledge that the http-add-on wasn't explicitly designed for our scenario, I believe our setup is not unique.
Do you have any advice on configuration options, or perhaps recommend another product better suited to our needs?

@JorTurFer
Member

Regarding in-flight messages: our rapid message handling results in minimal in-flight messages. Multiple customers send similar small requests, so accumulation is minimal, and the consequence is continuous scaling up and down from zero.

While I acknowledge that the http-add-on wasn't explicitly designed for our scenario, I believe our setup is not unique.
Do you have any advice on configuration options, or perhaps recommend another product better suited to our needs?

This can happen in scenarios with small loads (I mean, a few random requests over time). For those, I'd suggest increasing the cooldownPeriod from 300 to 900 or 1800; increasing the cooldown mitigates the issue.

I understand that this behaviour may not fit all scenarios, but just returning a counter of total requests introduces other challenges, like aggregation over time (1 min? 5 min? 30 sec?), which can add extra load to the system and which should be configurable.

Currently, the metric returns the real count of requests (which IMHO is the most accurate approach in high-load scenarios), but I guess we can discuss how to adapt this for other cases.

Maybe we could store and propagate the last request timestamp and return whether the scaler is active based on it 🤷
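
Roughly, a hypothetical sketch in Go of what that could look like (names are illustrative, not the add-on's actual code):

package tracker

import (
	"sync/atomic"
	"time"
)

// activityTracker remembers when the last request was seen, so the scaler
// could report "active" even if the in-flight queue happens to be empty
// at the exact moment KEDA polls.
type activityTracker struct {
	lastRequest atomic.Int64  // UnixNano of the most recent request
	window      time.Duration // how long after a request we stay "active"
}

// Touch would be called by the interceptor for every proxied request.
func (t *activityTracker) Touch() {
	t.lastRequest.Store(time.Now().UnixNano())
}

// IsActive reports whether a request was seen within the window.
func (t *activityTracker) IsActive() bool {
	last := time.Unix(0, t.lastRequest.Load())
	return time.Since(last) < t.window
}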
WDYT @tomkerkhove @t0rr3sp3dr0 ?

@tramperdk
Author

This can happen in scenarios with small loads (I mean, a few random requests over time). For those, I'd suggest increasing the cooldownPeriod from 300 to 900 or 1800; increasing the cooldown mitigates the issue.

I created a scenario that's not uncommon for us with the following PowerShell snippet:

$TestUri = 'https://myapp.example.com/endpoint'  # placeholder; point this at the service under test
$RequestsPerBatch = 1000
$BatchCount = 15
$IntervalBetweenBatches = 1  # in milliseconds

# Function to execute a single request
function Execute-Request {
    param (
        [string]$Uri
    )
    
    $StartTime = Get-Date
    try {
        $Result = Invoke-WebRequest -Uri $Uri -UseBasicParsing
        $EndTime = Get-Date
        $TimeTaken = ($EndTime - $StartTime).TotalMilliseconds

        [PSCustomObject]@{
            'Uri'        = $Uri
            'StatusCode' = $Result.StatusCode
            'TimeTaken'  = $TimeTaken
            'Content'    = $Result.Content
        }
    } catch {
        # If Invoke-WebRequest threw, $Result was never assigned, so StatusCode stays $null
        $ErrorDetails = $_.Exception | Select-Object -Property Message, Source, HResult, InnerException
        [PSCustomObject]@{
            'Uri'          = $Uri
            'StatusCode'   = if ($null -ne $Result) { $Result.StatusCode } else { $null }
            'Error'        = $ErrorDetails.Message
            'Source'       = $ErrorDetails.Source
            'HResult'      = $ErrorDetails.HResult
            'InnerMessage' = $ErrorDetails.InnerException.Message
        }
    }
}

# Get the function's definition as a string
$FuncDef = ${function:Execute-Request}.ToString()

$ResultCollection = 1..$BatchCount | ForEach-Object {
    # ForEach-Object -Parallel requires PowerShell 7+
    1..$RequestsPerBatch | ForEach-Object -ThrottleLimit $RequestsPerBatch -Parallel {
        # Re-create the function inside each parallel runspace
        ${function:Execute-Request} = $using:FuncDef
        Execute-Request -Uri $using:TestUri
    }
    Start-Sleep -Milliseconds $IntervalBetweenBatches  # pause between batches
}

$ResultCollection

I've run the above load test script, which sends 15 batches of 1000 requests with a 1-millisecond interval between batches.
It effectively scales up the deployment and sustains the desired traffic. There is some overhead in the script, though, such as creating custom objects and re-creating the function in each runspace; in this scenario, the client becomes the limiting factor.

We observed that it works well: the deployment scales successfully and handles the traffic effectively.
The load sustains "life" on the pods during the test, so they don't scale down.
In some cases, particularly when the client experiences 100% CPU usage, the deployment scales to zero and then back up, but typically it remains at 3 replicas and stays stable throughout the test.

In a closed test environment, this approach proves effective.
However, in real-life scenarios, customers don't always send all their requests in a single stream.
Instead, requests arrive intermittently, sometimes in volumes comparable to the test and other times in smaller quantities, but still multiple requests over a longer period.
Because our hosts handle incoming requests so quickly, there are virtually no in-flight messages, hence the scale to zero.
So we end up with a picture of hosts going up and down continuously.

Having a timestamp for the last request and incorporating a cooldown period based on that would be an ideal setup, at least for us.
This approach aligns more closely with real-world scenarios where customers submit requests as they come in.

Regarding the Polling Interval for the ScaledObject, if the interval is extended, will it cause requests for the initial scale-up to accumulate until the scaler checks for in-flight messages?
This could potentially lead to delayed scaling and therefore lost messages.

@JorTurFer
Member

Regarding the Polling Interval for the ScaledObject, if the interval is extended, will it cause requests for the initial scale-up to accumulate until the scaler checks for in-flight messages?

No no, I didn't mean pollingInterval, I meant scaledownPeriod. It is translated into KEDA's cooldownPeriod, so if you set 900 instead of keeping the default value (300), then instead of scaling to 0 after 5 minutes "without traffic", it will wait 15 minutes to scale to 0, so you have 3 times more time to "detect" any in-flight request.
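
In other words, something like this (a sketch showing only the relevant fields):

# On the HTTPScaledObject:
spec:
  scaledownPeriod: 900   # seconds; the default is 300

# ...which is translated onto the generated ScaledObject as:
spec:
  cooldownPeriod: 900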

@tramperdk
Author

Regarding the Polling Interval for the ScaledObject, if the interval is extended, will it cause requests for the initial scale-up to accumulate until the scaler checks for in-flight messages?

No no, I didn't mean pollingInterval, I meant scaledownPeriod. It is translated into KEDA's cooldownPeriod, so if you set 900 instead of keeping the default value (300), then instead of scaling to 0 after 5 minutes "without traffic", it will wait 15 minutes to scale to 0, so you have 3 times more time to "detect" any in-flight request.

Yeah we could do that, but I still feel like I'm just "pushing the problem" to future me then :)

@JorTurFer
Member

it gives you some time until we decide how to proceed xD

I get your point, and we should probably think about what "being active" means for HTTP workloads, as it's quite different from queue consumers, but I'd also like to hear @tomkerkhove's and @t0rr3sp3dr0's thoughts on this.

@tramperdk
Author

it gives you some time until we decide how to proceed xD

I get your point, and we should probably think about what "being active" means for HTTP workloads, as it's quite different from queue consumers, but I'd also like to hear @tomkerkhove's and @t0rr3sp3dr0's thoughts on this.

I appreciate the openness 🥇. For now, I'll template a longer scaledownPeriod so we at least don't scale down as often during peak traffic periods.

Keep me in the loop and thank you for the work you do 👍🏻

thincal commented Jan 22, 2024

"store and propagate last request timestamp and return if the scaler is active", it sounds like a good solution.


stale bot commented Mar 22, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the "stale" label Mar 22, 2024
@JorTurFer
Member

As a quick update: we have decided to implement some aggregation modes for the metrics instead of following the approach of managing IsActive in the scaler directly.
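
For reference, a sketch of the shape this took (based on the scalingMetric spec added for this; exact field names may differ in your version, so check the docs):

spec:
  scalingMetric:
    # scale on request rate aggregated over a window,
    # instead of the instantaneous in-flight count
    requestRate:
      granularity: 1s
      targetValue: 100
      window: 1m
    # alternatively, concurrency-based scaling can be set explicitly:
    # concurrency:
    #   targetValue: 100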

stale bot removed the "stale" label Mar 24, 2024
github-project-automation bot moved this from To Triage to Done in Roadmap - KEDA HTTP Add-On Apr 10, 2024