What happened: When kubelet returns a 401 or a 500, the timer doesn't recover from the exception and stops reporting Kubernetes metrics.
What you expected to happen: Metrics should continue to be published on schedule even when an error occurs.
How to reproduce it (as minimally and precisely as possible):
1. Stop kubelet on the Kubernetes node for a few seconds, then start it again: `systemctl stop kubelet && sleep 30 && systemctl start kubelet`
2. Observe errors in the splunk-metrics-splunk-kubernetes-metrics pod logs: an exception is thrown and the timer is detached, after which no further metrics are reported.
Anything else we need to know?:
This issue is similar to splunk/splunk-connect-for-kubernetes#493. It was worked around there by adding a healthcheck to the pod, but the right fix is for the fluent plugin to recover from the exception in its HTTP client, as sketched below.
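A minimal sketch of what that recovery could look like, assuming the plugin polls kubelet from a Fluentd `timer_execute` callback (Fluentd's timer helper detaches the timer when the callback raises an unhandled exception, which matches the behavior described above). `scrape_metrics` and `@interval` are hypothetical stand-ins for the plugin's actual polling method and configured interval:

```ruby
require 'fluent/plugin/input'

module Fluent
  module Plugin
    class KubernetesMetricsInput < Fluent::Plugin::Input
      Fluent::Plugin.register_input('kubernetes_metrics', self)

      helpers :timer

      # ... config_param declarations and kubelet HTTP client setup omitted ...

      def start
        super
        # Fluentd's timer helper detaches the timer if the callback raises,
        # which is what stops metrics reporting today. Rescuing inside the
        # callback keeps the timer alive across transient kubelet failures.
        timer_execute(:kubernetes_metrics_scraper, @interval) do
          begin
            scrape_metrics # hypothetical: fetch kubelet stats and emit events
          rescue StandardError => e
            # kubelet returned 401/500 or was restarting: log and retry on the
            # next tick instead of letting the exception detach the timer.
            log.error 'failed to scrape kubelet metrics; will retry', error: e
          end
        end
      end
    end
  end
end
```

With the rescue inside the callback, a kubelet restart only costs the ticks during the outage; polling resumes automatically on the next interval once kubelet is healthy again.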
Environment:
- Kubernetes version (use `kubectl version`): v1.27.13
- Ruby version (use `ruby --version`): 2.6.10p210
- OS (e.g. `cat /etc/os-release`): RHEL 9.2
- Splunk version: splunk-connect-for-kubernetes 1.5.4 and fluent-plugin-kubernetes-metrics 1.2.3
- Others: