filebeat_up metric issue #44

Open
Nilubkal opened this issue Jan 5, 2021 · 1 comment

Nilubkal commented Jan 5, 2021

Hi guys, first of all, thanks for the great work and support you've put into this project!

I want to report an issue I ran into while deploying beat-exporter as a sidecar container in a pod next to filebeat in a Kubernetes environment:

If the filebeat pod, which exposes its metrics on port 5066, is in a failed state other than CrashLoopBackOff, filebeat_up returns 0, which is the expected behavior, and everything works fine. If the filebeat pod enters CrashLoopBackOff, however, beat-exporter doesn't register anything related to the pod, so filebeat_up and all other metrics for that pod are absent.
beat-exporter logs while the filebeat pod is in CrashLoopBackOff:

{"level":"error","message":"Could not load beat type, with error: Get http://localhost:5066: dial tcp 127.0.0.1:5066: connect: connection refused, retrying in 1s","time":"2021-01-05T09:58:45Z"}
{"level":"error","message":"Could not load beat type, with error: Get http://localhost:5066: dial tcp 127.0.0.1:5066: connect: connection refused, retrying in 1s","time":"2021-01-05T09:58:46Z"}

And here is the case where the pod is not in CrashLoopBackOff/Error but in a different failed state, and filebeat_up correctly evaluates to 0:

{"level":"error","message":"Failed getting /stats endpoint of target: Get http://localhost:5066/stats: dial tcp 127.0.0.1:5066: connect: connection refused","time":"2021-01-05T09:59:04Z"}
{"level":"error","message":"Could not fetch stats endpoint of target: http://localhost:5066","time":"2021-01-05T09:59:25Z"}
{"level":"error","message":"Failed getting /stats endpoint of target: Get http://localhost:5066/stats: dial tcp 127.0.0.1:5066: connect: connection refused","time":"2021-01-05T09:59:25Z"}
{"level":"error","message":"Could not fetch stats endpoint of target: http://localhost:5066","time":"2021-01-05T09:59:34Z"}

shivas (Contributor) commented Jan 22, 2021

The issue here is that in the first case ☝️ your beat never reached the "ready" state, so beat-exporter doesn't know what type of beat to expect. In the second case, it looks like the beat crashed after previously being healthy: beat-exporter managed to get the beat type and initialize itself against it, and then returned 0 once the beat crashed and became unreachable.

I'm referring to the initialization loop at https://github.com/trustpilot/beat-exporter/blob/master/main.go#L93: in the first case beat-exporter is stuck in this loop, while in the second case it is already past that loop and running the main "proxy" loop.
