fluentd v0.14: "Unexpected error raised. Stopping the timer." on heartbeat timeout #1596
Comments
Error said "Errno::ETIMEDOUT". You need to investigate why this error happens in your network environment.
Experiencing the same issue: in an HA setup, when the aggregator nodes are unavailable, the error is raised and the timer is stopped.
Hmm... one way is ignoring `Errno::ETIMEDOUT` when sending heartbeats:

```diff
diff --git a/lib/fluent/plugin/out_forward.rb b/lib/fluent/plugin/out_forward.rb
index 712447b..aa2b32c 100644
--- a/lib/fluent/plugin/out_forward.rb
+++ b/lib/fluent/plugin/out_forward.rb
@@ -402,7 +402,7 @@ module Fluent::Plugin
           log.trace "sending heartbeat", host: n.host, port: n.port, heartbeat_type: @heartbeat_type
           n.usock = @usock if @usock
           n.send_heartbeat
-        rescue Errno::EAGAIN, Errno::EWOULDBLOCK, Errno::EINTR, Errno::ECONNREFUSED
+        rescue Errno::EAGAIN, Errno::EWOULDBLOCK, Errno::EINTR, Errno::ECONNREFUSED, Errno::ETIMEDOUT
           log.debug "failed to send heartbeat packet", host: n.host, port: n.port, heartbeat_type: @heartbeat_type, error: $!
         end
       }
```
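The diff above keeps the heartbeat loop alive by adding `Errno::ETIMEDOUT` to the errors rescued for each node. A standalone sketch of the same pattern outside fluentd follows; `FakeNode`, `TRANSIENT_ERRORS` and `send_heartbeats` are hypothetical names, and the failing node is simulated instead of using a real socket.

```ruby
# Hypothetical stand-in for a forward node; not fluentd's real Node class.
FakeNode = Struct.new(:host, :port, :error, :sent) do
  def send_heartbeat
    raise error if error   # simulate a network failure for this node
    self.sent = true
  end
end

# Errors treated as transient, mirroring the rescue list in the diff.
TRANSIENT_ERRORS = [
  Errno::EAGAIN, Errno::EWOULDBLOCK, Errno::EINTR,
  Errno::ECONNREFUSED, Errno::ETIMEDOUT
].freeze

def send_heartbeats(nodes, log)
  nodes.each do |n|
    begin
      n.send_heartbeat
    rescue *TRANSIENT_ERRORS => e
      # Rescued per node: one unreachable node does not abort the others.
      log << "failed to send heartbeat packet to #{n.host}:#{n.port}: #{e.class}"
    end
  end
end

nodes = [
  FakeNode.new("node-a", 24224, nil, false),
  FakeNode.new("node-b", 24224, Errno::ETIMEDOUT, false),
  FakeNode.new("node-c", 24224, nil, false)
]
log = []
send_heartbeats(nodes, log)
p nodes.map(&:sent)  # => [true, false, true]
```

Placing the `rescue` inside the loop means a timed-out aggregator only costs its own heartbeat; an error that escapes the loop would skip every remaining node and, in fluentd's case, reach the timer itself.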
No, network errors are going to happen on any network. The bug I'm
reporting is that fluentd doesn't handle these errors correctly.
> On Mon, Jun 19, 2017, 05:14 Masahiro Nakagawa wrote:
> Error said "Errno::ETIMEDOUT". You need to investigate why this error
> happens in your network environment.
Patch: #1602
Released v0.14.18 with the fix.
That sounds good, but should any network error cause the heartbeat to stop? Even if I unplug a machine from the network entirely, and end up with … Or, at least, fluentd should exit with a non-zero exit code so Kubernetes / Docker can restart it for me.
No, see the patch. With it, the heartbeat timer continues to work.
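The question above is whether any network error should be able to stop the heartbeat. One general way to guarantee a periodic timer survives a failing tick is to rescue inside the tick, as in the following sketch. `ResilientTimer` is a hypothetical class, not fluentd's timer implementation, and this is not necessarily what #1602 does.

```ruby
# Minimal periodic timer (hypothetical; not fluentd's timer class).
class ResilientTimer
  def initialize(interval, log, &callback)
    @interval = interval
    @log = log
    @callback = callback
  end

  # Runs the callback `ticks` times. An error in one tick is logged and the
  # timer keeps going instead of stopping for good.
  def run(ticks)
    ticks.times do |i|
      begin
        @callback.call(i)
      rescue StandardError => e
        @log << "tick #{i}: #{e.class}: #{e.message}"
      end
      sleep @interval
    end
  end
end

log = []
calls = []
timer = ResilientTimer.new(0.01, log) do |i|
  calls << i
  raise Errno::ETIMEDOUT if i == 1   # simulate one failed heartbeat round
end
timer.run(3)
p calls  # => [0, 1, 2]
```

Rescuing `StandardError` keeps the timer alive for any ordinary error, including every `Errno::*` class, while still letting `Interrupt` and other signals through. The trade-off is that real bugs are only logged, so the log entry matters.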
Fluentd's worker exits with status 1 for an unexpected error and 2 for an unrecoverable error, e.g. a configuration error.
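A process supervisor can act on the exit statuses described in the comment above. The sketch assumes a hypothetical policy of restarting after status 1 and not after status 2, on the reasoning that a bad configuration will fail again. Docker's `on-failure` restart policy restarts on any non-zero status, so this finer split is only an illustration. A child Ruby process stands in for the worker.

```ruby
require "rbconfig"

# Exit statuses described in the thread.
UNEXPECTED_ERROR    = 1
UNRECOVERABLE_ERROR = 2

# Hypothetical policy: retry after an unexpected error only.
def restart?(status)
  return false if status.success?
  status.exitstatus == UNEXPECTED_ERROR
end

# Spawn a child Ruby process that exits with `code`, then wait for it.
def run_child(code)
  pid = Process.spawn(RbConfig.ruby, "-e", "exit #{code}")
  _, status = Process.wait2(pid)
  status
end

[0, UNEXPECTED_ERROR, UNRECOVERABLE_ERROR].each do |code|
  status = run_child(code)
  puts "exit #{status.exitstatus}: restart=#{restart?(status)}"
end
# prints:
# exit 0: restart=false
# exit 1: restart=true
# exit 2: restart=false
```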
It seems like a timeout during a heartbeat caused the `Unexpected error raised. Stopping the timer.` part, but rather than trying to reconnect to `192.168.1.202`, it just keeps failing with `no nodes are available`. I've checked that fluentd on `192.168.1.202` is running and even restarted it, but the fluentd instance that is forwarding logs to it refuses to reconnect.

Config of the forwarder:

`fluentd:v0.14` with image id `860efe232d88`
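One plausible reading of the symptoms above, which is an assumption and not confirmed in the thread: if a node marked unavailable is revived only by a successful heartbeat, then once the heartbeat timer has stopped, restarting the aggregator cannot help, and every send fails with "no nodes are available". The simplified model below illustrates that; `ModelNode`, `pick_node` and `NoNodesAvailable` are hypothetical names, not fluentd's API.

```ruby
# Hypothetical model; names are illustrative, not fluentd's API.
NoNodesAvailable = Class.new(StandardError)

class ModelNode
  attr_accessor :reachable

  def initialize
    @reachable = true
    @available = true
  end

  def available?
    @available
  end

  # One heartbeat round: availability follows reachability.
  # In this model, heartbeats are the ONLY thing that can revive a node.
  def heartbeat
    @available = @reachable
  end
end

# Pick the first available node, or fail like the forwarder in this issue.
def pick_node(nodes)
  nodes.find(&:available?) or raise NoNodesAvailable, "no nodes are available"
end

node = ModelNode.new
timer_running = true

node.reachable = false            # the aggregator times out...
node.heartbeat if timer_running   # ...and is marked unavailable
timer_running = false             # then an unrescued error stops the timer

node.reachable = true             # the aggregator is restarted and reachable
node.heartbeat if timer_running   # but no heartbeat runs to revive it

error = begin
  pick_node([node])
  nil
rescue NoNodesAvailable => e
  e.message
end
p error  # => "no nodes are available"
```

In this model, a single successful heartbeat after the aggregator comes back would be enough to recover, which is why keeping the timer alive matters more than which error stopped it.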