Filebeat stops sending logs when load balancing is enabled #1829
Comments
After some debugging, we have discovered what seems to be a deadlock in load balancer mode. This occurs when Logstash is rejecting connections due to some error condition, e.g. Elasticsearch being down, as described in the first comment. At balance.go#L167 each worker reads retry attempts from When At this point everything is blocked as each worker is blocking at balance.go#L170, waiting for I noticed that the first select in
I haven't checked it in detail yet, but that sounds like it's related to one of these?
@ruflin Not sure, I can try reproducing with master.
The first select has no timeout on purpose, as events/messages are not allowed to be dropped. It's used if the Guaranteed flag or max_retries<0 is set. One potential problem I see with the code (receive method) is beats not enforcing any order of queue reads. The I will try to reproduce the issue with a unit test. Thanks for the very detailed debugging information. It helps a ton in identifying these kinds of issues. Current workaround in filebeat is: disable
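For readers unfamiliar with the point about queue read order: a plain Go `select` over two ready channels picks one at random, so nothing makes a worker drain retries before taking new work. A common idiom for enforcing an order is a non-blocking check on the preferred channel followed by a blocking select on both. The sketch below assumes hypothetical `retries` and `work` channels and a hypothetical helper `next`; it is not taken from libbeat and is not necessarily how the issue was addressed.

```go
package main

import "fmt"

// next returns the next message to process, preferring pending retries over
// new work. The channels are assumed to stay open for the lifetime of the
// worker in this sketch.
func next(retries, work <-chan string) string {
	// Non-blocking check: if a retry is already queued, take it first.
	select {
	case msg := <-retries:
		return msg
	default:
	}
	// Nothing to retry right now: block until either queue has a message.
	select {
	case msg := <-retries:
		return msg
	case msg := <-work:
		return msg
	}
}

func main() {
	retries := make(chan string, 2)
	work := make(chan string, 2)

	work <- "new-1"
	retries <- "retry-1"
	work <- "new-2"

	for i := 0; i < 3; i++ {
		fmt.Println(next(retries, work))
	}
}
```

Here `retry-1` is returned before `new-1` and `new-2` even though it was queued after `new-1`, because the first select only looks at the retry queue.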
@urso Thanks, we'll try the workaround and see how it goes.
Thanks for the suggested workaround @urso. I've tried it in our environment and it looks acceptable both in terms of throughput and the fact that reconnect seems to work properly.
I have a basic elastic setup where logs are harvested and published to logstash/elasticsearch using filebeat.
On several occasions I have observed that no logs are sent even though everything looks ok on elasticsearch and logstash ends.
The log pipeline will only return to a good state by restarting the filebeat process (restarting logstash or elasticsearch has no effect).
We have seen this issue in correlation with a network issue on or reboot of the elasticsearch nodes. (logstash logs "Beats input: the pipeline is blocked, temporary refusing new connection.")
After some experimenting I have been able to reproduce the issue:
This seems to be related to or even the same issue as #878
I have tested this with filebeat versions 1.2.3 and 5.0.0-alpha3. The outputs below are from filebeat 1.2.3.
The filebeat config:
Logstash logs:
Filebeat logs: