many "Client.Timeout Exceeded while waiting header" on keda operator log #3610

tshaiman · 2022-08-29T20:08:22Z

Report

When the KEDA pod launches, it starts normally and manages to reconcile all queues and all definitions correctly.
There are a lot of "healthy" info reports.

Then, after a few minutes, the log is flooded with "Client.Timeout exceeded while awaiting headers".

Expected Behavior

We want to understand what this error represents.
We want to see how and where those timeouts can be configured.
We want to understand why a "healthy" reconciled object turned into a timeout error.

Actual Behavior

Those errors do not seem to affect the scale-up mechanism, as it seems to continue working.
We don't want to see so many error logs without being able to understand what the issue is and how to fine-tune the timeouts.

Steps to Reproduce the Problem

Nothing special: we just deployed KEDA on a relatively large cluster with many namespaces, and each namespace has 10-13 queues.
It could be that the developer created a TriggerAuthentication object per queue instead of per namespace, which caused around 130 TriggerAuthentication objects to be created. Could that be the problem here?
The cluster is configured with Managed Identity support (aad-pod-identity), not with Workload Identity.
The issue happens on both KEDA 2.7.1 and 2.8.0.

Logs from KEDA operator

2022-08-29T19:28:24Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"fc","namespace":"vi-be-map-dev6"}, "namespace": "vi-be-map-dev6", "name": "fc", "reconcileID": "e1df9a0e-a6bd-4558-bc4e-c07c94aed5d0"}
2022-08-29T19:28:24Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"tm","namespace":"vi-be-map-dev12"}, "namespace": "vi-be-map-dev12", "name": "tm", "reconcileID": "184cd058-cee7-4975-840b-3f58b8903305"}
2022-08-29T19:28:24Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"cb","namespace":"vi-be-map-dev6"}, "namespace": "vi-be-map-dev6", "name": "cb", "reconcileID": "ce4c60c5-2772-4be6-a902-785b50af41fa"}
2022-08-29T19:28:25Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"dp","namespace":"vi-be-map-dev8"}, "namespace": "vi-be-map-dev8", "name": "dp", "reconcileID": "8f3d16c4-6589-44f0-90ee-881eba5aa0d1"}
2022-08-29T19:32:09Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"aed","namespace":"vi-be-map-dev8"}, "namespace": "vi-be-map-dev8", "name": "aed", "reconcileID": "24b8af3c-c2c4-49c7-ac8b-77fe6123b73c"}
2022-08-29T19:32:09Z	INFO	Initializing Scaling logic according to ScaledObject Specification	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"aed","namespace":"vi-be-map-dev8"}, "namespace": "vi-be-map-dev8", "name": "aed", "reconcileID": "24b8af3c-c2c4-49c7-ac8b-77fe6123b73c"}
2022-08-29T19:32:10Z	INFO	scaleexecutor	Successfully set ScaleTarget replicas count to ScaledObject minReplicaCount	{"scaledobject.Name": "aed", "scaledObject.Namespace": "vi-be-map-dev8", "scaleTarget.Name": "vi-aed", "Original Replicas Count": 1, "New Replicas Count": 0}
2022-08-29T19:32:21Z	ERROR	azure_servicebus_scaler	error	{"type": "ScaledObject", "namespace": "vi-be-map-dev6", "name": "rc", "error": "Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fservicebus.azure.net%2F\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).IsScaledObjectActive
	/workspace/pkg/scaling/cache/scalers_cache.go:89
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:149
2022-08-29T19:32:23Z	INFO	Reconciling ScaledObject	{"controller": "scaledobject", "controllerGroup": "keda.sh", "controllerKind": "ScaledObject", "scaledObject": {"name":"aed","namespace":"vi-be-map-dev8"}, "namespace": "vi-be-map-dev8", "name": "aed", "reconcileID": "25e7b3df-cf95-42ef-b170-05fe46187017"}
2022-08-29T19:32:28Z	ERROR	azure_servicebus_scaler	error	{"type": "ScaledObject", "namespace": "vi-be-map-dev12", "name": "frameextraction", "error": "Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fservicebus.azure.net%2F\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).IsScaledObjectActive
	/workspace/pkg/scaling/cache/scalers_cache.go:89
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:149
2022-08-29T19:32:29Z	ERROR	azure_servicebus_scaler	error	{"type": "ScaledObject", "namespace": "vi-be-map-dev9", "name": "frameextraction", "error": "Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fservicebus.azure.net%2F\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).IsScaledObjectActive
	/workspace/pkg/scaling/cache/scalers_cache.go:89
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).checkScalers
	/workspace/pkg/scaling/scale_handler.go:278
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).startScaleLoop
	/workspace/pkg/scaling/scale_handler.go:149

KEDA Version

2.7.1

Kubernetes Version

1.23

Platform

Microsoft Azure

Scaler Details

Azure Service Bus

Anything else?

Happens on both 2.7.1 and 2.8.0 clusters.
Happens only on relatively big clusters.


JorTurFer · 2022-08-30T10:29:49Z

The URL seems to point to the aad-pod-identity instance. Could you check the logs you have there?
Did you have the same issue with previous versions?

tshaiman · 2022-08-30T11:44:28Z

@JorTurFer: that makes sense, as there is a limit of 20 concurrent calls to the IMDS, and we are making many more.
We will move to Workload Identity soon ;-)
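
For readers hitting the same limit, one client-side way to stay under a concurrency cap is a counting semaphore around the token requests. The following is a minimal sketch and not KEDA's code: `fakeTokenRequest` and `runBounded` are hypothetical names, the limit of 20 is the figure quoted in this comment, and 130 matches the number of TriggerAuthentication objects mentioned in the report.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// fakeTokenRequest is a hypothetical stand-in for one token request.
func fakeTokenRequest(id int) {
	time.Sleep(5 * time.Millisecond)
}

// runBounded runs n jobs with at most limit of them in flight at once
// and returns the highest concurrency it observed.
func runBounded(n, limit int, job func(id int)) int {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var wg sync.WaitGroup
	var inFlight, peak int64

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks while limit jobs are running
			defer func() { <-sem }() // release the slot

			cur := atomic.AddInt64(&inFlight, 1)
			defer atomic.AddInt64(&inFlight, -1)
			// Record the peak with a compare-and-swap loop.
			for {
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			job(id)
		}(i)
	}
	wg.Wait()
	return int(peak)
}

func main() {
	peak := runBounded(130, 20, fakeTokenRequest)
	fmt.Println("peak concurrent requests:", peak)
}
```

The observed peak never exceeds the limit, at the cost of queued requests waiting longer, so a per-request timeout would still need to account for that wait.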

JorTurFer · 2022-08-30T12:31:20Z

So, could this issue be related more to that than to KEDA itself? I mean, do you think it is KEDA related? We can keep this open until you move from pod identity to workload identity, but I'd like to know whether we need to go deeper or not.

tshaiman · 2022-08-30T12:38:29Z

I think the ask from the KEDA team is maybe to catch this exception and ignore it if the 2nd or 3rd attempt succeeds
(in other words: don't throw if the retry mechanism is working),
and also to be more verbose: what the timeout is, on which component/scaler it occurred, how we can configure longer timeouts, etc.

tomkerkhove · 2022-08-30T13:02:38Z

We should also check if there is a way to optimize the integration to reduce the number of calls, what do you think?

JorTurFer · 2022-08-30T13:04:20Z

I don't think that we should ignore the error, because we don't retry it in the same cycle. The reconciliation loop fails and it will be retried, but not just the request: the whole reconciliation loop. I mean, inside a cycle we don't know whether another cycle will be executed. We could assume it, but we can't be 100% sure, so we cannot rely on future executions.

WRT timeouts, this uses the default timeout for every HTTP request inside KEDA (3 seconds). This value (and how to change it) is described in the docs; you can modify it by setting the environment variable KEDA_HTTP_DEFAULT_TIMEOUT, or by modifying the Helm value if you use Helm.
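
To illustrate what the error represents, here is a minimal sketch of how a Go `http.Client` with a total `Timeout` produces this message when the server does not send response headers in time. It is not KEDA's code: the helper names are hypothetical, the slow server is a local `httptest` stand-in for a throttled metadata endpoint, and treating the `KEDA_HTTP_DEFAULT_TIMEOUT` value as milliseconds is an assumption that should be checked against the docs.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
	"time"
)

// parseTimeoutMillis converts a KEDA_HTTP_DEFAULT_TIMEOUT-style value
// (assumed here to be milliseconds) into a duration, falling back to the
// 3-second default mentioned in this thread.
func parseTimeoutMillis(raw string) time.Duration {
	ms, err := strconv.Atoi(raw)
	if err != nil || ms <= 0 {
		return 3 * time.Second
	}
	return time.Duration(ms) * time.Millisecond
}

// requestSlowServer sends a GET to a local test server that waits for delay
// before answering, using a client whose total timeout is timeout.
func requestSlowServer(timeout, delay time.Duration) error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(delay): // simulate a slow backend
		case <-r.Context().Done(): // the client gave up
		}
	}))
	defer srv.Close()

	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(srv.URL)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// isClientTimeout reports whether err carries the Client.Timeout message.
func isClientTimeout(err error) bool {
	return err != nil && strings.Contains(err.Error(), "Client.Timeout exceeded")
}

// demoTimeout triggers the error: a 50 ms budget against a 1 s delay.
func demoTimeout() error {
	return requestSlowServer(50*time.Millisecond, time.Second)
}

func main() {
	err := demoTimeout()
	fmt.Println("error:", err)
	fmt.Println("is client timeout:", isClientTimeout(err))
	fmt.Println("timeout for value 10000:", parseTimeoutMillis("10000"))
}
```

Raising the timeout only helps when the endpoint eventually answers; if requests are being throttled, a longer timeout mostly delays the same failure.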

Finally, regarding verbosity, the log already prints the scaler type (azure_servicebus_scaler), the SO (frameextraction) where it happened, and the namespace (vi-be-map-dev9).

ERROR	azure_servicebus_scaler	error	{"type": "ScaledObject", "namespace": "vi-be-map-dev9", "name": "frameextraction", "error": "Get \"http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fservicebus.azure.net%2F\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}

What do you think we could append to be clearer? I mean, the timeout is already known, but maybe something like "error getting aad token: CURRENT MESSAGE" would point more directly at the actual issue? 🤔

tshaiman · 2022-08-30T13:06:45Z

Yes, those were my thoughts, as the Go stack trace is almost meaningless.
Thanks for pointing out the env variable to increase the timeout, I will definitely try it.
And yes, I do see the log is quite verbose, thank you.

JorTurFer · 2022-08-30T13:06:48Z

"We should also check if there is a way to optimize the integration to reduce the number of calls, what do you think?"

@zroubalik and I have been talking recently about how to apply this. It'll reduce the number of calls for sure, but maybe we can try to improve it further, IDK.
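
One generic way to reduce the number of token calls is to share a cached token per resource until it expires. The sketch below only illustrates that idea and is not the approach discussed by the maintainers: `tokenCache`, `fetchFunc`, and `countFetches` are hypothetical names, and the fetch function returns a fake token rather than calling any identity endpoint.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchFunc obtains a fresh token for a resource and reports how long it is valid.
type fetchFunc func(resource string) (token string, ttl time.Duration, err error)

type cachedToken struct {
	value     string
	expiresAt time.Time
}

// tokenCache shares one token per resource among all callers.
type tokenCache struct {
	mu     sync.Mutex
	fetch  fetchFunc
	tokens map[string]cachedToken
}

func newTokenCache(fetch fetchFunc) *tokenCache {
	return &tokenCache{fetch: fetch, tokens: make(map[string]cachedToken)}
}

// Get returns the cached token while it is still valid and fetches a new one
// otherwise. Holding the lock during the fetch also collapses concurrent
// misses into a single upstream call, at the cost of serializing fetches.
func (c *tokenCache) Get(resource string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if t, ok := c.tokens[resource]; ok && time.Now().Before(t.expiresAt) {
		return t.value, nil
	}
	value, ttl, err := c.fetch(resource)
	if err != nil {
		return "", err
	}
	c.tokens[resource] = cachedToken{value: value, expiresAt: time.Now().Add(ttl)}
	return value, nil
}

// countFetches simulates lookups callers asking for the same resource and
// returns how many upstream fetches were needed (-1 on error).
func countFetches(lookups int) int {
	calls := 0
	cache := newTokenCache(func(resource string) (string, time.Duration, error) {
		calls++ // only runs while the cache lock is held
		return "fake-token", time.Hour, nil
	})
	for i := 0; i < lookups; i++ {
		if _, err := cache.Get("https://servicebus.azure.net/"); err != nil {
			return -1
		}
	}
	return calls
}

func main() {
	fmt.Println("upstream fetches for 130 lookups:", countFetches(130))
}
```

With a one-hour validity, 130 lookups for the same resource need a single upstream fetch in this sketch. A real implementation would also have to decide how identities map to cache keys.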

JorTurFer · 2022-08-30T13:09:35Z

Do you think that adding something like "error getting aad-pod-identity token:" to the error message would help to detect this in the future? It's a simple thing to do if it helps.
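
As a rough illustration of the proposal, wrapping the underlying error with `fmt.Errorf` and `%w` adds the prefix while keeping the original error inspectable. This is a sketch under assumptions, not the change that was merged: `requestToken`, `tokenWithContext`, and `errTimeout` are hypothetical, and the timeout is simulated with a fixed error value.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout mimics the message seen in the operator log.
var errTimeout = errors.New("context deadline exceeded (Client.Timeout exceeded while awaiting headers)")

// requestToken is a hypothetical token call that always times out.
func requestToken() (string, error) {
	return "", errTimeout
}

// tokenWithContext adds a prefix naming the failing step while keeping the
// original error reachable through errors.Is and errors.As.
func tokenWithContext() (string, error) {
	token, err := requestToken()
	if err != nil {
		return "", fmt.Errorf("error getting aad-pod-identity token: %w", err)
	}
	return token, nil
}

// stillTimeout reports whether the wrapped error still matches the original.
func stillTimeout(err error) bool {
	return errors.Is(err, errTimeout)
}

func main() {
	_, err := tokenWithContext()
	fmt.Println(err)
	fmt.Println("original preserved:", stillTimeout(err))
}
```

The prefix tells the reader which step failed, and callers that match on the underlying error keep working because `%w` preserves it.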

tomkerkhove · 2022-08-30T14:57:22Z

I think that would be helpful, yes

tshaiman added the bug label Aug 29, 2022

tomkerkhove removed the bug label Aug 30, 2022

JorTurFer self-assigned this Sep 1, 2022

JorTurFer mentioned this issue Sep 1, 2022 in "chore: improve aad-pod-identity errors #3640" (merged)

JorTurFer closed this as completed in #3640 Sep 2, 2022
