Drops event after spooler is blocked for a certain time #3091

Closed
wants to merge 1 commit into from

Commits on Dec 1, 2016

  1. Drops event after spooler is blocked for a certain time

    In case Logstash or Elasticsearch is not available, the Filebeat output gets stuck. As a side effect, the close_* options no longer apply, so files are kept open. Under Linux, file rotation continues but the disk space of the rotated files is still in use. Under Windows, this can mean that file rotation does not work and a single file grows very large.
    
    The goal of this change is to keep the client node sane in case the output is blocked. Filebeat follows the at-least-once principle and tries to make sure all log lines arrive at the receiver whenever possible. Using `drop_after` breaks this principle and will lead to data loss!
    
    When setting `drop_after` to 5 minutes, for example, events start to be dropped once the output has been unavailable for 5 minutes. All events are dropped until the output is available again. As soon as the output becomes available again, the last batch in the publisher is sent, and Filebeat continues sending all new events written to the log files from then on. All events between the batch that was still in the publisher and the first event sent again are lost.
    
    The registry file is not updated when events are dropped. So if Filebeat is stopped and started again, it continues at the position where it was before the output became unavailable. As soon as the output becomes available again, the registry file is updated. But for files where no new events appear, the old position stays in the registry.
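
To illustrate this registry behaviour, here is a sketch that assumes the registry only records offsets for events the output acknowledged. `Registry`, `Ack` and `ResumeOffset` are hypothetical names; the real registry is persisted to disk and keeps more state per file.

```go
package main

// Registry is a hypothetical model of the registry: the last acknowledged
// offset per file. It is not Filebeat's on-disk format.
type Registry struct {
	offsets map[string]int64
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{offsets: make(map[string]int64)}
}

// Ack records that the output confirmed all events of file up to endOffset.
// Dropped events never reach Ack, so their offsets are never stored.
func (r *Registry) Ack(file string, endOffset int64) {
	if endOffset > r.offsets[file] {
		r.offsets[file] = endOffset
	}
}

// ResumeOffset returns where a harvester would start reading file after a
// restart: the last acknowledged position, or 0 for an unknown file.
func (r *Registry) ResumeOffset(file string) int64 {
	return r.offsets[file]
}
```

In this model, a file whose events were dropped and that receives no new lines keeps its old offset, so a restart re-reads everything after it.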
    
    Dropping all events has the advantage that when the publisher starts sending events again, it does not overwhelm the receiver with all the queued events.
    
    This implementation differs from turning guaranteed off or using UDP in that events only start to be dropped after a certain time. If the output is blocked for less than `drop_after`, all events are still sent by Filebeat and no events are dropped. `drop_after` is only an "emergency switch" in case the output is unavailable for a longer time.
    
    Alternative implementations
    
    1. Drop only events older than
    
    An alternative implementation could be to only drop events whose timestamp is older than the predefined period. The advantage of this would be that not necessarily all events are dropped until the output becomes available again, but only the oldest ones. This implementation is a bit more complex and I don't think it is needed.
    
    2. Drop events in publisher
    
    Instead of dropping events in the spooler, events could be dropped in the publisher. The advantage of having it in the publisher is that the registry file would also be updated, so no events would be resent on restart.
    
    Questions
    * Should the registry also be updated so no resending happens? -> publisher implementation needed
    
    TODO:
    * Add tests
    * Add stats for events which were dropped
    * Add docs
    ruflin committed Dec 1, 2016
    e5472a4