@num_errors is not zero cleared after a successful retry. #1379
Comments
I fixed |
Hello, right now you are able to check real retry count for each plugin by installing fluentd from master branch. See #1387 |
Hi! @CSharpRU |
@tagomoris san |
I'm wondering whether we should re-fix |
I'm not sure whether I may want to +1 not to re-fix I'd also appreciate more comments and feedback. |
p.s. |
I think that overall num of errors is useful too. |
My 2 cents here is that datadog integration breaks because it is using monitor agent. You can't recover an alert on retries because the metric never goes back to 0. |
I'm facing the same issue as mentioned by @wimnat. The alerts never recover because the metric never goes back to 0, and the only option is to restart the process. Is there a reliable metric to know whether there are errors at any given instant? PS: If |
I'm not sure about the datadog integration, but if it uses in_monitor_agent's code directly, it should be fixed. https://docs.fluentd.org/v1.0/articles/in_monitor_agent#in-retry They can use
Internal code uses |
BTW, this issue is old and we are going with the current implementation, so I'm closing it. |
No one sent a patch to datadog, so I sent one. |
Same issue with the elasticsearch plugin (td-agent 3.7.1). "retry_count" is not reset to zero after a successful retry, so the monitoring system always raises an alert (it triggers when retry_count > 0). A manual service restart is not a "production" solution, imho. |
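For anyone alerting on retry_count > 0: one possible workaround on the monitoring side, given that the counter only goes up, is to alert on how much it grew between two samples rather than on its absolute value. The following is only a rough sketch of that idea. CounterDelta and the sample values are hypothetical and not part of fluentd, monitor_agent, or any monitoring integration.

```ruby
# Hypothetical helper, not part of fluentd or any monitoring integration.
# It turns a counter that only increases into a per-interval delta.
class CounterDelta
  def initialize
    @last = nil
  end

  # Returns how much the counter grew since the previous sample.
  # The first sample yields 0; a drop (for example after a process
  # restart) is treated as a fresh start from zero.
  def update(value)
    delta = if @last.nil?
              0
            elsif value < @last
              value
            else
              value - @last
            end
    @last = value
    delta
  end
end

tracker = CounterDelta.new
samples = [3, 3, 5, 5, 5] # made-up retry_count readings
deltas = samples.map { |v| tracker.update(v) }
puts deltas.inspect
```

An alert on delta > 0 fires only in the interval where new errors appeared and recovers once the counter stops growing, without restarting the process.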
I'm trying to monitor retry_count with the monitor_agent plugin. This was working as expected in 0.12.x.

However, in 0.14.x retry_count behaves differently. It just seems to count up and is not cleared to zero after a successful flush.

retry_count is actually the value of @num_errors in output plugins.
https://github.com/fluent/fluentd/blob/be7ffb2/lib/fluent/plugin/in_monitor_agent.rb#L259

In the 0.12 code, @num_errors is cleared to zero after a successful retry.
https://github.com/fluent/fluentd/blob/v0.12.31/lib/fluent/output.rb#L353

In 0.14, there is no corresponding implementation.
https://github.com/fluent/fluentd/blob/be7ffb2/lib/fluent/plugin/output.rb

Will this be the new behavior of @num_errors in 0.14? Or shall we say it is a bug or a TODO to be fixed?
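To make the difference concrete, here is a toy model of the two behaviors described above. It is only an illustration under my own assumptions: ToyOutput, its flush method, and the reset_on_success flag are made up for this example and are not fluentd's actual output plugin code.

```ruby
# Toy model only; this is not fluentd's actual output plugin code.
class ToyOutput
  attr_reader :num_errors

  def initialize(reset_on_success:)
    @reset_on_success = reset_on_success
    @num_errors = 0
  end

  # Simulate one flush attempt; `ok` says whether it succeeded.
  def flush(ok)
    if ok
      # 0.12-like behavior: clear the counter after a success.
      @num_errors = 0 if @reset_on_success
    else
      @num_errors += 1
    end
    ok
  end
end

attempts = [false, false, true] # two failures, then a successful retry
old_style = ToyOutput.new(reset_on_success: true)
new_style = ToyOutput.new(reset_on_success: false)
attempts.each do |ok|
  old_style.flush(ok)
  new_style.flush(ok)
end

puts "reset on success: #{old_style.num_errors}"
puts "cumulative:       #{new_style.num_errors}"
```

With the reset the counter ends at 0 after the successful retry; without it the counter stays at 2, which matches what monitor_agent reports as retry_count in 0.14.x.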