keda-operator memory leak when the prometheus scaler has errors #5248
Comments
The latest KEDA version has the option to enable the profiling port.
Same behavior using KEDA v2.9.2; we'd need the memory dump to check the root cause.
@GoaMind Thanks for reporting! Could you please also share an example Deployment for the workload that you are scaling? @JorTurFer do you have the same configuration of ScaledObjects?
IDK, I hope so:

spec:
  cooldownPeriod: 1
  maxReplicaCount: 2
  minReplicaCount: 0
  pollingInterval: 1
  scaleTargetRef:
    name: prometheus-test-deployment
  triggers:
    - metadata:
        activationThreshold: '20'
        metricName: http_requests_total
        query: >-
          sum(rate(http_requests_total{app="prometheus-test-monitored-app"}[2m]))
        serverAddress: http://20.238.174.237
        threshold: '20'
      type: prometheus

@GoaMind, could you confirm that this is similar to yours? The IP is public (and mine), so you can try the ScaledObject in your cluster if you want.
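For anyone who wants to try this end to end, a minimal Deployment that such a ScaledObject could target might look like the sketch below. Only the name comes from the scaleTargetRef above; the labels, image and port are assumptions, not something shared in this thread.

# Hypothetical workload matching scaleTargetRef.name above.
# Labels, image and port are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-test-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-test-monitored-app
  template:
    metadata:
      labels:
        app: prometheus-test-monitored-app
    spec:
      containers:
        - name: app
          image: nginx:1.25   # placeholder image
          ports:
            - containerPort: 80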
That would be great!
Hey,
Also, here is the behaviour with memory consumption points:

To verify that this reflection is correct, I also checked on the K8s side; at some point:

And after the OOMKill:

When describing the pod:

I will deploy the scaler provided by @JorTurFer and monitor whether KEDA behaves the same.
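For context, an OOMKilled restart typically surfaces in the container status roughly like this; the excerpt below is illustrative standard Kubernetes output, not the reporter's actual pod:

# Illustrative excerpt of the pod status after an OOM kill
# (exit code 137 is the conventional OOMKilled code).
status:
  containerStatuses:
    - name: keda-operator
      restartCount: 3
      lastState:
        terminated:
          reason: OOMKilled
          exitCode: 137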
@GoaMind great, thanks for the update!
I have deployed the trigger proposed by @JorTurFer and there was no memory leak visible. But after changing the serverAddress to a random DNS, the memory started growing again.
@GoaMind what happens if you put a random IP there that returns 404s?
Could you share an example of the random DNS? I'd like to replicate your case as closely as possible.
I was a bit wrong that it is reproducible with a random DNS. In fact, I was able to reproduce it only when calling our internal Prometheus server, which is not available from outside. I have tried to configure my personal server to mimic it. Here is what I checked so far (but without luck):
- Checked the full internal Prometheus server response that causes the memory leak in keda-operator
Full spec of the ScaledObject:

spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 10
    scalingModifiers: {}
  cooldownPeriod: 1
  maxReplicaCount: 2
  minReplicaCount: 0
  pollingInterval: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: onboarding-debug-service
  triggers:
    - metadata:
        activationThreshold: "20"
        query: sum(rate(http_requests_total{app="prometheus-test-monitored-app"}[2m]))
        serverAddress: https://XXXXXXX # With https://kedacore-test.hdo.ee/test it is not reproducible for now
        threshold: "20"
      type: prometheus

To replicate the internal server response, I configured my personal server to return the same payload. But no luck: despite the responses differing only in hosts, IPs and cipher, the leak does not reproduce there. I will try to play with profiling in the next few days and will drop an update once anything is figured out.
We'd appreciate a profile, as it would let us go deeper into the issue.
@JorTurFer thank you for the information, I will check it shortly. Could you please check if you can reproduce it with this trigger:

- metadata:
    activationThreshold: "20"
    customHeaders: Host=abc.pipedrive.tools
    query: sum(rate(http_requests_total{app="prometheus-test-monitored-app"}[2m]))
    serverAddress: https://pimp.pipedrive.tools
    threshold: "20"
  type: prometheus

Key thing here is that you need to get
Let me check your trigger :)
Thank you for the prompt check. 🙇
I think I've found a possible problem. I'll draft a PR later on, but before merging it, would you be willing to test the fix if I build an image that contains it, @GoaMind?
Hey, sure, I can test it once the docker image is available, if it is possible to go this way.
Yeah, I'm preparing the PR and once I open it, I'll give you the docker tag 😄
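For anyone wanting to try such a test build, swapping the operator image is typically just a patch of the keda-operator Deployment; a rough sketch, assuming the default Deployment name and namespace from the kedacore manifests (the image tag is a placeholder, not a tag shared in this thread):

# Hypothetical strategic-merge patch for the keda-operator Deployment;
# replace the tag with whatever test tag gets provided.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator
  namespace: keda
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          image: ghcr.io/kedacore/keda:test-fix   # placeholder tag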
@JorTurFer apologies for the late reply. I can confirm that I do not observe the memory leak with the image you provided.
Nice!
Report
When the prometheus scaler has errors while fetching metrics, memory of the keda-operator starts to grow until it gets OOMKilled. The behaviour is as follows:
Installation of KEDA is done via the plain manifest: https://github.com/kedacore/keda/releases/download/v2.11.2/keda-2.11.2.yaml
Expected Behavior
Memory does not grow when any of the scalers has errors.
Actual Behavior
Memory grows when the prometheus scaler has errors (for example, while fetching metrics from Prometheus).
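A minimal sketch of a prometheus trigger that exercises this error path, assuming a serverAddress that cannot be reached (the address below is a placeholder and not from the original report):

triggers:
  - type: prometheus
    metadata:
      # Placeholder endpoint that never resolves, so every metrics fetch fails
      # and the operator keeps hitting the error path on each polling interval.
      serverAddress: http://prometheus.invalid:9090
      query: sum(rate(http_requests_total[2m]))
      threshold: "20"
      activationThreshold: "20"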
Steps to Reproduce the Problem
- keda-operator will start pushing errors to stderr
- memory usage will start to grow

Logs from KEDA operator
KEDA Version
2.12.1
Kubernetes Version
1.26
Platform
Amazon Web Services
Scaler Details
prometheus
Anything else?
No response