fix(nats-jetstream): correctly count messages that should be redelivered (waiting for ack) towards keda value #3809


toniopelo
Contributor

@toniopelo toniopelo commented Nov 2, 2022

Until now, the KEDA NATS JetStream scaler used only the num_pending value returned by the NATS monitoring endpoint for a specific consumer. The problem is that messages that should be redelivered (retried) are not included in the num_pending counter; they are tracked in a separate counter called num_ack_pending.
This PR uses the sum of these two counters, instead of num_pending alone, to determine the value KEDA scales on.

This fixes two problems:

  • KEDA would never scale up a deployment/job based on a consumer if that consumer only had messages waiting for a retry.
  • KEDA would scale down a deployment/job too quickly: when a consumer pulls a message from NATS, the num_pending counter is immediately decremented and the num_ack_pending counter is incremented. KEDA would then conclude that there is no work left and scale down (after the cooldown period, if one is configured) a deployment/job that is still processing messages.
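
To illustrate the change, here is a minimal Go sketch of the before/after computation. It is not the actual scaler code: the `consumerInfo` struct, the two helper functions, and the sample JSON payload are hypothetical, and the integer types are an assumption. Only the counter names `num_pending` and `num_ack_pending` come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// consumerInfo holds only the two counters discussed in this PR.
// Hypothetical stand-in type: the real scaler's struct and the exact
// integer types returned by the monitoring endpoint may differ.
type consumerInfo struct {
	NumPending    int64 `json:"num_pending"`
	NumAckPending int64 `json:"num_ack_pending"`
}

// pendingBeforeFix returns the value used before this PR:
// only messages that have not been delivered yet.
func pendingBeforeFix(c consumerInfo) int64 {
	return c.NumPending
}

// pendingAfterFix returns the value used after this PR:
// undelivered messages plus messages delivered but not yet acked
// (in flight or waiting for redelivery).
func pendingAfterFix(c consumerInfo) int64 {
	return c.NumPending + c.NumAckPending
}

func main() {
	// Made-up payload: nothing new to deliver, five messages awaiting ack.
	payload := []byte(`{"num_pending": 0, "num_ack_pending": 5}`)

	var c consumerInfo
	if err := json.Unmarshal(payload, &c); err != nil {
		panic(err)
	}

	fmt.Println("before fix:", pendingBeforeFix(c))
	fmt.Println("after fix:", pendingAfterFix(c))
}
```

With a payload like this one, the old value is zero even though five messages are still being processed or waiting for a retry, which corresponds to both problems listed above.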

Checklist

  • When introducing a new scaler, I agree with the scaling governance policy
  • Commits are signed with Developer Certificate of Origin (DCO - learn more)
  • Tests have been added
  • A PR is opened to update our Helm chart (repo) (if applicable, ie. when deployment manifests are modified)
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Changelog has been updated and is aligned with our changelog requirements

Fixes #3787

Relates to #

@toniopelo toniopelo requested a review from a team as a code owner November 2, 2022 21:07
@toniopelo toniopelo force-pushed the fix/nats-jetstream-message-redelivery-are-ignored branch from 0cf7bbc to 1d259f3 Compare November 2, 2022 21:09
@toniopelo toniopelo changed the title fix: count messages that should be retried as pending messages (value used for scaling) fix(nats-jetstream): correctly count messages that should be redelivered (waiting for ack) towards keda value Nov 2, 2022
@toniopelo
Contributor Author

@JorTurFer That was a pretty straightforward fix after all :).
I didn't add tests because this specific line doesn't seem to have been tested before, and I don't have the Go knowledge to set up tests. I hope this is not a no-go. Otherwise, we would have to find somebody to add tests to this PR, or at least give me some guidance on how to do it :). Since it's a single-line change, it should leave the code coverage untouched.

@JorTurFer
Member

JorTurFer commented Nov 4, 2022

/run-e2e nats*
Update: You can check the progress here

@toniopelo
Contributor Author

toniopelo commented Nov 5, 2022

@JorTurFer e2e passed. I checked the last item in the checklist to make the PR checks happy and updated the branch.
Do you think this can be merged and released?

@zroubalik
Member

zroubalik commented Nov 7, 2022

@toniopelo could you please rebase this PR? I think we can merge it then. Thanks!

…t of pending messages used for scaling

Signed-off-by: Antoine Laffargue <[email protected]>
Signed-off-by: Antoine Laffargue <[email protected]>
@toniopelo toniopelo force-pushed the fix/nats-jetstream-message-redelivery-are-ignored branch from 8735888 to 9ed545e Compare November 7, 2022 18:33
@toniopelo
Contributor Author

@zroubalik Nice! Just rebased it :)

@JorTurFer
Member

JorTurFer commented Nov 9, 2022

/run-e2e nats*
Update: You can check the progress here

@JorTurFer JorTurFer enabled auto-merge (squash) November 9, 2022 07:26
@JorTurFer JorTurFer merged commit 971ab94 into kedacore:main Nov 9, 2022
@toniopelo
Contributor Author

When can I expect this to be released, @JorTurFer?
The scaler is unusable in production right now :/

@JorTurFer
Member

Hi @toniopelo,
You can see the expected release dates in the roadmap.md file. Due to KubeCon, we delayed the next release to December. If you need this fix immediately, the only suggestion I can give is to use the main tag directly. The main tag is generated on every commit, so it includes these changes, but it may not be stable. If you use it, I'd suggest pulling the main tag and pushing it to another registry in order to freeze the version and reduce the chance of errors.

@toniopelo
Contributor Author

Hi @JorTurFer, thanks for the information, that's crystal clear!
I'll see what I do, thanks for everything :)

@JorTurFer
Member

You're welcome, happy to help

@JorTurFer JorTurFer mentioned this pull request Jan 17, 2023
1 task
pedro-stanaka pushed a commit to pedro-stanaka/keda that referenced this pull request Jan 18, 2023
…red (waiting for ack) towards keda value (kedacore#3809)

* fix: keda now include the messages that should be retried in the count of pending messages used for scaling

Signed-off-by: Antoine Laffargue <[email protected]>

* chore: update changelog

Signed-off-by: Antoine Laffargue <[email protected]>

Signed-off-by: Antoine Laffargue <[email protected]>
@pedro-stanaka pedro-stanaka mentioned this pull request Jan 18, 2023
7 tasks
pedro-stanaka pushed a commit to pedro-stanaka/keda that referenced this pull request Jan 18, 2023
…red (waiting for ack) towards keda value (kedacore#3809)

* fix: keda now include the messages that should be retried in the count of pending messages used for scaling

Signed-off-by: Antoine Laffargue <[email protected]>

* chore: update changelog

Signed-off-by: Antoine Laffargue <[email protected]>

Signed-off-by: Antoine Laffargue <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>
pedro-stanaka pushed a commit to pedro-stanaka/keda that referenced this pull request Jan 19, 2023
…red (waiting for ack) towards keda value (kedacore#3809)

* fix: keda now include the messages that should be retried in the count of pending messages used for scaling

Signed-off-by: Antoine Laffargue <[email protected]>

* chore: update changelog

Signed-off-by: Antoine Laffargue <[email protected]>

Signed-off-by: Antoine Laffargue <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>
JorTurFer added a commit that referenced this pull request Jan 19, 2023
* fix: CVE-2022-3172 (#3693)

Signed-off-by: Pedro Tanaka <[email protected]>

* fix: Respect optional parameter inside envs for ScaledJobs (#3694)

Signed-off-by: Jorge Turrado <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>

* fix(prometheus scaler): Detect Inf before casting float to int (#3762)

* fix(prometheus scaler): Detect Inf before casting float to int

Signed-off-by: Jorge Turrado <[email protected]>

* Improve the log message

Signed-off-by: Jorge Turrado <[email protected]>

Signed-off-by: Jorge Turrado <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>

* fix(nats-jetstream): correctly count messages that should be redelivered (waiting for ack) towards keda value (#3809)

* fix: keda now include the messages that should be retried in the count of pending messages used for scaling

Signed-off-by: Antoine Laffargue <[email protected]>

* chore: update changelog

Signed-off-by: Antoine Laffargue <[email protected]>

Signed-off-by: Antoine Laffargue <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>

* NewRelic scaler crashes on logging (#3946)

Signed-off-by: Laszlo Kishalmi <[email protected]>

Signed-off-by: Laszlo Kishalmi <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>

* Fix stackdriver client returning 0 for metric types of double (#3788)

* Update stackdriver client to handle metrics of value type double

Signed-off-by: Eric Takemoto <[email protected]>

* move change log note to below general

Signed-off-by: Eric Takemoto <[email protected]>

* parse activation value as float64

Signed-off-by: Eric Takemoto <[email protected]>

* change target value to float64 for GCP pub/sub and stackdriver

Signed-off-by: Eric Takemoto <[email protected]>

Signed-off-by: Eric Takemoto <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>

* Fixing conflicts after cherry-pick

Signed-off-by: Pedro Tanaka <[email protected]>

* fix: Close is called twice on PushScaler's deletion (#3599)

Signed-off-by: ytz <[email protected]>
Signed-off-by: taenyang <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>

* fix/datadog-scaler-null-last-point (#3954)

Signed-off-by: Tony Lee <[email protected]>
Signed-off-by: Tony Lee <[email protected]>
Signed-off-by: Zbynek Roubalik <[email protected]>
Co-authored-by: Tony Lee <[email protected]>
Co-authored-by: Zbynek Roubalik <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>

* fix(mongodb): escape username and password (#3989)

Fixes #3992

Signed-off-by: Pedro Tanaka <[email protected]>

* Hacking generated files to version CI expects

Signed-off-by: Pedro Tanaka <[email protected]>

* Updating aws-sdk and golang packages to fix CVEs

Signed-off-by: Pedro Tanaka <[email protected]>

* Updating golang/text package to fix CVE

Signed-off-by: Pedro Tanaka <[email protected]>

* Using same version of aws sdk as in main

Signed-off-by: Pedro Tanaka <[email protected]>

Signed-off-by: Pedro Tanaka <[email protected]>
Signed-off-by: Jorge Turrado <[email protected]>
Signed-off-by: Antoine Laffargue <[email protected]>
Signed-off-by: Pedro Tanaka <[email protected]>
Signed-off-by: Laszlo Kishalmi <[email protected]>
Signed-off-by: Eric Takemoto <[email protected]>
Signed-off-by: ytz <[email protected]>
Signed-off-by: taenyang <[email protected]>
Signed-off-by: Tony Lee <[email protected]>
Signed-off-by: Tony Lee <[email protected]>
Signed-off-by: Zbynek Roubalik <[email protected]>
Co-authored-by: Jorge Turrado Ferrero <[email protected]>
Co-authored-by: Antoine LAFFARGUE <[email protected]>
Co-authored-by: Laszlo Kishalmi <[email protected]>
Co-authored-by: Eric Takemoto <[email protected]>
Co-authored-by: taenyang <[email protected]>
Co-authored-by: Tony Lee <[email protected]>
Co-authored-by: Tony Lee <[email protected]>
Co-authored-by: Zbynek Roubalik <[email protected]>
Development

Successfully merging this pull request may close these issues.

[NATS Jetstream] Do not take message retries into account when scaling
3 participants