Until this PR, we only supported scaling based on "pending" requests. This works in high-load scenarios because it directly measures the number of requests at a given moment, but it doesn't work for other scenarios because the scaler doesn't aggregate the requests in any way.
With this PR, the add-on supports 2 scaling options: based on concurrency (the current approach, just renamed; the reasons for the renaming are below) and based on the request rate over a given time window.
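To make the difference between the two options concrete, here is a minimal sketch, assuming each request is recorded with the time the interceptor received it and the time the backend's response came back. The names `Request`, `concurrency_at` and `request_rate` are hypothetical and are not the add-on's code. Concurrency is a point-in-time count, while the request rate aggregates arrivals over a window.

```python
from dataclasses import dataclass


@dataclass
class Request:
    start: float  # seconds: when the request was received (hypothetical field)
    end: float    # seconds: when the backend's response was returned (hypothetical field)


def concurrency_at(requests, t):
    """Number of requests in flight at instant t (start <= t < end)."""
    return sum(1 for r in requests if r.start <= t < r.end)


def request_rate(requests, t, window):
    """Average requests per second that arrived in the window (t - window, t]."""
    arrived = sum(1 for r in requests if t - window < r.start <= t)
    return arrived / window


sample = [Request(0.0, 0.5), Request(0.2, 3.0), Request(1.0, 1.1), Request(2.5, 2.6)]
print(concurrency_at(sample, 2.0))     # 1: only one request is in flight at t=2s
print(request_rate(sample, 3.0, 4.0))  # 1.0: four arrivals over a 4s window
```

A point-in-time count like `concurrency_at` can miss short requests entirely, whereas `request_rate` still sees every arrival inside the window.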
These options are configured through a new section.
The idea behind this change is to add support (in the future) for using both metrics together: RPS is a good metric for regular scaling because the time window smooths it out, but on the other hand, peaks are better handled by concurrency. With this future change in mind, it makes sense to move the metric configuration to a nested section instead of sharing the `targetPendingRequests` key (which is also not aligned with the new naming).

During the PR I found some small changes (such as updating the release process to include the docs update, or unifying the loggers for better observability) that I've included as part of this PR so as not to forget them (or because they help me directly, like the logger change).
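The nested metric configuration described above could be modeled roughly as follows. This is a sketch only: the class and field names (`ScalingMetric`, `ConcurrencyMetric`, `RequestRateMetric`, `target_value`, `window_seconds`) are hypothetical and are not the add-on's actual schema. It also assumes that, until combined scaling lands, exactly one metric may be configured at a time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConcurrencyMetric:
    target_value: int  # hypothetical: desired in-flight requests per replica


@dataclass
class RequestRateMetric:
    target_value: float    # hypothetical: desired requests per second per replica
    window_seconds: float  # hypothetical: length of the aggregation window


@dataclass
class ScalingMetric:
    # Each metric lives in its own optional sub-section instead of a shared top-level key.
    concurrency: Optional[ConcurrencyMetric] = None
    request_rate: Optional[RequestRateMetric] = None

    def validate(self) -> None:
        configured = [m for m in (self.concurrency, self.request_rate) if m is not None]
        # Assumption: only one metric may be active until combined scaling is supported.
        if len(configured) != 1:
            raise ValueError("exactly one scaling metric must be configured")
        if configured[0].target_value <= 0:
            raise ValueError("target_value must be positive")
        if self.request_rate is not None and self.request_rate.window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
```

With this shape, enabling both metrics later would only require relaxing the single-metric check, rather than reinterpreting what one shared key like `targetPendingRequests` means.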
Why have I renamed the scaling?
Although "pending" may seem like a fitting name, to be accurate we are scaling based on concurrent (or in-flight) requests. "Pending" can sound like requests that haven't been proxied yet, but requests that have already been proxied are also counted while the backend still hasn't answered. Using "concurrent", the name matches the real scaling behavior more accurately.
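The counting behavior described above can be sketched as a wrapper around a proxy call. This assumes a hypothetical `forward(request)` function that proxies to the backend and blocks until it answers. `InFlightCounter` is likewise an illustrative name, not the add-on's implementation. The point is that a request stays counted from arrival until the backend's response returns, including the time after it has already been proxied.

```python
import threading


class InFlightCounter:
    """Counts requests from arrival until the backend's response is returned."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def current(self):
        with self._lock:
            return self._count

    def wrap(self, forward):
        """Wrap a hypothetical forward(request) function that proxies to the backend."""

        def handler(request):
            with self._lock:
                self._count += 1
            try:
                # Still counted while the backend works on the already-proxied request.
                return forward(request)
            finally:
                # Decrement even if the backend call raises.
                with self._lock:
                    self._count -= 1

        return handler
```

Because the decrement happens in `finally`, a failed backend call does not leave the count permanently inflated.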
There is another reason behind this change: aligning with Knative's current scaling naming. At the moment, Knative is "the king" of HTTP scaling, so aligning our naming with theirs can make things easier for end users.
Checklist
README.md
docs/ directory

Fixes #882
Fixes #958