Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple errors cause backend to exit #5

Open
dimas opened this issue Nov 21, 2017 · 2 comments
Open

Multiple errors cause backend to exit #5

dimas opened this issue Nov 21, 2017 · 2 comments

Comments

@dimas
Copy link

dimas commented Nov 21, 2017

Short version of the story is that we are using RabbitMQ server and from time to time it loses all connections to all clients and these events accompanied with exceptions that have references to lager_logstash_backend.

I have evidences of different exceptions all but all of them seem to cause exit of logstash backend.

2017-11-16 01:54:56.774 [error] <0.152.0> Lager event handler lager_logstash_backend exited with reason {'EXIT',{{badmatch,{error,closed}},[{lager_logstash_backend,send_log,2,[{file,"src/lager_logstash_backend.erl"},{line,125}]},{lager_logstash_backend,handle_event,2,[{file,"src/lager_logstash_backend.erl"},{line,104}]},{gen_event,server_update,4,[{file,"gen_event.erl"},{line,533}]},{gen_event,server_notify,4,[{file,"gen_event.erl"},{line,515}]},{gen_event,handle_msg,5,[{file,"gen_event.erl"},{line,256}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}}
2017-11-16 01:54:56.774 [error] <0.145.0>@lager_logstash_backend:send_log:125 gen_event lager_logstash_backend installed in lager_event terminated with reason: no match of right hand value {error,closed} in lager_logstash_backend:send_log/2 line 125

or

2017-11-16 12:13:40.650 [error] <0.152.0> Lager event handler lager_logstash_backend exited with reason {'EXIT',{badarg,[{jsx_parser,value,4,[{file,"src/jsx_parser.erl"},{line,139}]},{lager_logstash_json_formatter,format,2,[{file,"src/lager_logstash_json_formatter.erl"},{line,39}]},{lager_logstash_backend,handle_log,2,[{file,"src/lager_logstash_backend.erl"},{line,116}]},{lager_logstash_backend,handle_event,2,[{file,"src/lager_logstash_backend.erl"},{line,104}]},{gen_event,server_update,4,[{file,"gen_event.erl"},{line,533}]},{gen_event,server_notify,4,[{file,"gen_event.erl"},{line,515}]},{gen_event,...},...]}}

We are using It is lager-3.5.1, jsx-2.7.0 and lager_logstash-0.1.3

Raised it with RabbitMQ (https://groups.google.com/forum/#!topic/rabbitmq-users/dK7r10t9jYI) because it does not feel right that any issue with logging subsystem should have such a big impact on normal operations.

Please see the lager issue I raised with logstash - erlang-lager/lager#423 - there is a very interesting discussion there.

@Vagabond
Copy link

I would suggest adding {send_timeout, 500}, {send_timeout_close, true} on https://github.com/rpt/lager_logstash/blob/master/src/lager_logstash_backend.erl#L75

Additionally, this backend doesn't handle the {tcp_closed, Sock} or the '{tcp_error, Socket, Reason}` messages, it probably should.

Here's an (untested) diff:

diff --git a/src/lager_logstash_backend.erl b/src/lager_logstash_backend.erl
index 059317a..57a79ad 100644
--- a/src/lager_logstash_backend.erl
+++ b/src/lager_logstash_backend.erl
@@ -72,7 +72,8 @@ arg(Name, Args, Default) ->
     end.

 connect({tcp, Host, Port}) ->
-    Opts = [binary, {active, false}, {keepalive, true}],
+    Opts = [binary, {active, false}, {keepalive, true},
+            {send_timeout, 500}, {send_timeout_close, true}],
     case gen_tcp:connect(Host, Port, Opts) of
         {ok, Socket} -> Socket;
         {error, _}   -> undefined
@@ -128,6 +129,10 @@ send_log(Payload, #state{output = {udp, Host, Port}, handle = Socket}) ->
 send_log(Payload, #state{output = {file, _}, handle = Fd}) ->
     ok = file:write(Fd, Payload).

+handle_info({tcp_closed, Socket}, #state{handle = Socket}) ->
+    {error, tcp_closed};
+handle_info({tcp_error, Socket, Reason}, #state{handle = Socket}) ->
+    {error, {tcp_error, Reason}};
 handle_info(_Info, State) ->
     {ok, State}.

Lager will reinitialize the handler if it crashes like this, rather than hanging on a dead TCP connection where the keepalive timeout has not fired yet.

@dimas
Copy link
Author

dimas commented Nov 22, 2017

Thanks a lot, @Vagabond! We will try patching as you suggested and create a PR or a fork if it works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants