[Datadog scaler] Scaler returns fillValue if the last data point of query is null #3906
Hi @dogzzdogzz,
BTW, I closed the issue by mistake, and I have reopened it.
@JorTurFer Hmm... it's strange that you can't reproduce it; it's happening on all of my clusters. To clarify the reproducible issue, I just want to double check whether you do have the metric. And could you kindly help check whether anything in my ScaledObject manifests could cause this issue?

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels:
    scaledobject.keda.sh/name: query-1
  annotations:
    meta.helm.sh/release-name: foo
    meta.helm.sh/release-namespace: bar
  labels:
    app: foo
    app.kubernetes.io/managed-by: Helm
    helm.toolkit.fluxcd.io/name: foo
    helm.toolkit.fluxcd.io/namespace: bar
    scaledobject.keda.sh/name: foo
  name: query-1
  namespace: bar
spec:
  maxReplicaCount: 1
  minReplicaCount: 1
  scaleTargetRef:
    name: foo
  triggers:
  - authenticationRef:
      kind: ClusterTriggerAuthentication
      name: keda-trigger-auth-datadog-secret
    metadata:
      age: "60"
      metricUnavailableValue: "0"
      query: 'sum:trace.express.request.hits{service:foo,env:bar}.as_rate()'
      queryValue: "250"
    metricType: AverageValue
    type: datadog
```
```yaml
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels:
    scaledobject.keda.sh/name: query-2
  name: query-2
  namespace: bar
spec:
  maxReplicaCount: 1
  minReplicaCount: 1
  scaleTargetRef:
    name: foo
  triggers:
  - authenticationRef:
      kind: ClusterTriggerAuthentication
      name: keda-trigger-auth-datadog-secret
    metadata:
      age: "60"
      metricUnavailableValue: "0"
      query: 'avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)'
      queryValue: "250"
    metricType: AverageValue
    type: datadog
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels:
    scaledobject.keda.sh/name: query-3
  name: query-3
  namespace: bar
spec:
  maxReplicaCount: 1
  minReplicaCount: 1
  scaleTargetRef:
    name: foo
  triggers:
  - authenticationRef:
      kind: ClusterTriggerAuthentication
      name: keda-trigger-auth-datadog-secret
    metadata:
      age: "60"
      metricUnavailableValue: "0"
      query: 'sum:trace.express.request.hits{service:foo,env:bar}.as_rate()/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)'
      queryValue: "250"
    metricType: AverageValue
    type: datadog
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels:
    scaledobject.keda.sh/name: query-4
  name: query-4
  namespace: bar
spec:
  maxReplicaCount: 1
  minReplicaCount: 1
  scaleTargetRef:
    name: foo
  triggers:
  - authenticationRef:
      kind: ClusterTriggerAuthentication
      name: keda-trigger-auth-datadog-secret
    metadata:
      age: "60"
      metricUnavailableValue: "0"
      query: 'sum:trace.express.request.hits{service:foo,env:bar}/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)'
      queryValue: "250"
    metricType: AverageValue
    type: datadog
```

Below is the HPA status for the above manifests. You can see that query-1/2/4 can get the data without problem, but the metric of query-3 is always 0.
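As background on how that metric value turns into replicas: the manifests above pin `minReplicaCount` and `maxReplicaCount` to 1, so these HPAs cannot actually scale, but with a wider range the standard Kubernetes HPA rule (desired = ceil(currentReplicas × currentValue / target), clamped to the min/max bounds) would apply. A minimal sketch using the values from these manifests (the function name is hypothetical, not Kubernetes code):

```python
import math

def desired_replicas(current_replicas, current_value, target_avg_value,
                     min_replicas, max_replicas):
    """Standard HPA scaling rule: desired = ceil(current * value / target),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_value / target_avg_value)
    return max(min_replicas, min(max_replicas, desired))

# With a real query value (~2144) and queryValue "250", the HPA would want
# 9 replicas, but maxReplicaCount: 1 clamps it to 1.
print(desired_replicas(1, 2144.17, 250, 1, 1))  # 1 (clamped from 9)

# With the metricUnavailableValue fallback of 0, it sits at the minimum.
print(desired_replicas(1, 0.0, 250, 1, 1))      # 1
```

This is why a fallback to 0 is easy to miss in a min=max=1 test setup: the replica count looks identical either way, and only the HPA's reported metric value reveals the problem.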
No no, I tried just modifying the client to return the JSON you sent. I haven't tried with Datadog directly because I don't have anything working with Datadog. If you could share all the manifests needed to spin up a scenario that reproduces your issue, I can use our Datadog account to try it. (Sorry, I have zero expertise with Datadog and I don't know how to generate the same scenario.) I can install the Datadog agent on my cluster, and then what do I need to deploy to generate those metrics?
I remember that the original issue was a panic while recovering metrics, not just a fallback to 0. Is it the same issue this time? I mean, a fallback to 0 could mean, for example, that the time window is too small to recover metrics. That's not the same behavior as a panic in the scaler.
Oh ok,
I think it's not the same as issue 3448, because I already used rollup(10) to avoid the null data point.
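For context on what `.rollup(10)` is meant to do: it asks Datadog to aggregate raw points into fixed 10-second buckets server-side. A minimal pure-Python sketch of bucketed rollup averaging (names hypothetical; this is an illustration, not Datadog's implementation):

```python
from collections import defaultdict

def rollup_avg(points, interval_s=10):
    """Group (timestamp_ms, value) points into fixed interval_s-second
    buckets and average each bucket, roughly what .rollup(10) requests."""
    buckets = defaultdict(list)
    for ts_ms, value in points:
        if value is None:
            continue  # null points contribute nothing to their bucket
        bucket_start = (ts_ms // (interval_s * 1000)) * interval_s * 1000
        buckets[bucket_start].append(value)
    return sorted((ts, sum(vs) / len(vs)) for ts, vs in buckets.items())

points = [(1670221161000, 2.0), (1670221164000, 4.0),
          (1670221172000, 6.0), (1670221175000, None)]
print(rollup_avg(points))  # [(1670221160000, 3.0), (1670221170000, 6.0)]
```

Note that a bucket with no non-null points simply produces no entry, so a rollup does not guarantee the *final* returned point is non-null; the trailing bucket can still be empty if it covers a window where data hasn't arrived yet.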
@JorTurFer I updated my curl script and made sure the time range is current:

```shell
TO=$(($(date +%s))) && \
FROM=$(($TO - 60)) && \
curl -X GET "https://api.datadoghq.com/api/v1/query?from=$FROM&to=$TO&query=sum:trace.express.request.hits\{service:foo,env:bar\}.as_rate()/avg:kubernetes.cpu.requests\{service:foo,env:bar\}.rollup(10)"
```

Response:

```json
{
  "status": "ok",
  "resp_version": 1,
  "series": [
    {
      "end": 1670221209000,
      "attributes": {},
      "metric": "(trace.express.request.hits / kubernetes.cpu.requests)",
      "interval": 10,
      "tag_set": [],
      "start": 1670221160000,
      "length": 5,
      "query_index": 0,
      "aggr": "sum",
      "scope": "env:bar,service:foo",
      "pointlist": [
        [1670221160000, 2276.6667175541334],
        [1670221170000, 2467.5000551529242],
        [1670221180000, 2144.1667145925276],
        [1670221190000, null],
        [1670221200000, null]
      ],
      "expression": "(sum:trace.express.request.hits{env:bar,service:foo}.as_rate() / avg:kubernetes.cpu.requests{env:bar,service:foo}.rollup(10))",
      "unit": null,
      "display_name": "(trace.express.request.hits / kubernetes.cpu.requests)"
    }
  ],
  "to_date": 1670221212000,
  "query": "sum:trace.express.request.hits{service:foo,env:bar}.as_rate()/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)",
  "message": "",
  "res_type": "time_series",
  "times": [],
  "from_date": 1670221152000,
  "group_by": [],
  "values": []
}
```
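The response above shows the symptom from the issue title: the last two `pointlist` entries are null, so a scaler that simply takes the final data point ends up returning `metricUnavailableValue` (0 here) even though recent real data exists. A minimal Python sketch of the two strategies, using the pointlist from the response above (function names are hypothetical, not KEDA's actual Go code):

```python
# Pointlist copied from the API response above; Datadog returns
# [timestamp_ms, value] pairs, with null for empty buckets.
pointlist = [
    [1670221160000, 2276.6667175541334],
    [1670221170000, 2467.5000551529242],
    [1670221180000, 2144.1667145925276],
    [1670221190000, None],
    [1670221200000, None],
]

METRIC_UNAVAILABLE_VALUE = 0.0  # the ScaledObject's metricUnavailableValue

def last_point(points):
    """Naive strategy: always take the final data point."""
    value = points[-1][1]
    return value if value is not None else METRIC_UNAVAILABLE_VALUE

def last_non_null_point(points):
    """Robust strategy: walk backwards to the most recent non-null value."""
    for _, value in reversed(points):
        if value is not None:
            return value
    return METRIC_UNAVAILABLE_VALUE

print(last_point(pointlist))           # 0.0 (falls back to fillValue)
print(last_non_null_point(pointlist))  # 2144.1667145925276
```

The difference explains the observed behavior: query-3's `as_rate()` arithmetic leaves trailing null buckets, so the naive strategy reports 0 while the curl output clearly contains usable values.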
For anyone who encounters this issue as well: there are some additional issues and details with the Datadog API response mentioned in this comment.
Report
I reported a similar issue, 3448, before; the root cause there was a null latest data point causing an exception. Recently I finally had time to test the query again. This time, I made sure all data points have data, with the same number of data points within the period.
Query 1:

```
sum:trace.express.request.hits{service:foo,env:bar}.as_rate()
```

KEDA can get the metrics of this query without any problem.

Query 2:

```
avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)
```

KEDA can get the metrics of this query without any problem.

If I combine the above two queries into "query 1 / query 2":

Query 3:

```
sum:trace.express.request.hits{service:foo,env:bar}.as_rate()/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)
```

The curl result looks OK, but KEDA fails to get the metrics.

HPA events

If I remove `as_rate()` from Query 3:

Query 4:

```
sum:trace.express.request.hits{service:foo,env:bar}/avg:kubernetes.cpu.requests{service:foo,env:bar}.rollup(10)
```

KEDA can get the metrics without any problem.
Expected Behavior
KEDA can get the metrics without any problem as long as the same query works with curl.
Actual Behavior
Explained above
Steps to Reproduce the Problem
Explained above
Logs from KEDA operator
KEDA Version
2.8.1
Kubernetes Version
< 1.23
Platform
Amazon Web Services
Scaler Details
Datadog
Anything else?
No response