Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File output plugin cannot resume with path placeholders #2582

Closed
cflee opened this issue Aug 20, 2019 · 2 comments · Fixed by #2594
Closed

File output plugin cannot resume with path placeholders #2582

cflee opened this issue Aug 20, 2019 · 2 comments · Fixed by #2594
Assignees
Labels
bug Something isn't working

Comments

@cflee
Copy link

cflee commented Aug 20, 2019

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug

When using file output plugin with path containing placeholders like ${key1}, and default file buffer plugin without explicitly configuring path, fluentd does not resume existing buffers on start. Therefore, events that are buffered persistently are missing from the flushed file output, and there are accumulated buffer files that are never removed.

To Reproduce

Using config with file output plugin path containing placeholders like ${log_source_address}, start fluentd, and ensure that at least one buffer file at /fluentd/log/forward_${log_source_address}_%Y-%m-%dT%H:%M/buffer.*.log is present. Check the file to see which messages are inside.

After that, restart fluentd and then keep it running. After timekey (2m), check the number and last modified timestamps of buffer files in the buffer path /fluentd/log/forward_${log_source_address}_%Y-%m-%dT%H:%M/, and check the flushed output file for previous timekey to see if the messages inside the buffer file earlier are present in its appropriate file.

Expected behavior

Fluentd should resume from the buffered events, so that the flushed output file contains the events that were buffered before fluentd was restarted. Also, there should not be any old orphaned buffer files that are kept forever.

Instead, observed behaviour is that the buffered events from before restart are missing from the flushed final file, and there are orphaned files that are more than timekey old remaining inside the buffer directory forever.

Your Environment

  • Fluentd or td-agent version: fluentd 1.6.3
  • Operating system: using the official Debian docker image (fluent/fluentd:v1.6-debian = sha256:9576074ac4c0f72099c94f56746ce68ee37ba7021257653565e0a9c200d8fe83)
  • Kernel version: 4.15.0-54-generic

Your Configuration

<system>
  log_level debug
</system>

<source>
  @type dummy
  tag dummy
  dummy {"log_source_address":"1.2.3.4","message":"abc"}
</source>

<match fluent.**>
  @type stdout
</match>

<match **>
  @type file
  path /fluentd/log/forward_${log_source_address}_%Y-%m-%dT%H:%M
  #path /fluentd/log/forward_%Y-%m-%dT%H:%M
  append true
  compress gzip
  <buffer time,log_source_address>
  #<buffer>
    timekey 2m
    timekey_wait 0
  </buffer>
</match>

Your Error Log

2019-08-20 20:21:33 +0800 [info]: using configuration file: <ROOT>
  <system>
    log_level debug
  </system>
  <source>
    @type dummy
    tag "dummy"
    dummy {"log_source_address":"1.2.3.4","message":"abc"}
  </source>
  <match fluent.**>
    @type stdout
  </match>
  <match **>
    @type file
    path "/fluentd/log/forward_${log_source_address}_%Y-%m-%dT%H:%M"
    append true
    compress gzip
    <buffer time,log_source_address>
      timekey 2m
      timekey_wait 0
      path "/fluentd/log/forward_${log_source_address}_%Y-%m-%dT%H:%M"
    </buffer>
  </match>
</ROOT>
2019-08-20 20:21:33 +0800 [info]: starting fluentd-1.6.3 pid=6 ruby="2.6.3"
2019-08-20 20:21:33 +0800 [info]: spawn command to main:  cmdline=["/usr/local/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/local/bundle/bin/fluentd", "-c", "/fluentd/etc/fluent.conf", "-p", "/fluentd/plugins", "--under-supervisor"]
2019-08-20 20:21:33 +0800 [info]: gem 'fluentd' version '1.6.3'
2019-08-20 20:21:33 +0800 [info]: adding match pattern="fluent.**" type="stdout"
2019-08-20 20:21:33 +0800 [info]: adding match pattern="**" type="file"
2019-08-20 20:21:33 +0800 [info]: adding source type="dummy"
2019-08-20 20:21:33 +0800 [warn]: #0 both of Plugin @id and path for <storage> are not specified. Using on-memory store.
2019-08-20 20:21:33 +0800 [warn]: #0 both of Plugin @id and path for <storage> are not specified. Using on-memory store.
2019-08-20 20:21:33 +0800 [info]: #0 starting fluentd worker pid=13 ppid=6 worker=0
2019-08-20 20:21:33 +0800 [debug]: #0 buffer started instance=70263954869520 stage_size=0 queue_size=0
2019-08-20 20:21:33 +0800 [info]: #0 fluentd worker is now running worker=0
2019-08-20 20:21:33 +0800 [debug]: #0 flush_thread actually running
2019-08-20 20:21:33 +0800 [debug]: #0 enqueue_thread actually running
2019-08-20 20:21:33.798741502 +0800 fluent.info: {"worker":0,"message":"fluentd worker is now running worker=0"}
2019-08-20 20:21:33.798991161 +0800 fluent.debug: {"message":"flush_thread actually running"}
2019-08-20 20:21:33.800278236 +0800 fluent.debug: {"message":"enqueue_thread actually running"}
2019-08-20 20:21:34 +0800 [debug]: #0 Created new chunk chunk_id="5908b7f4a97beecdc426128ff7da2744" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1566303600, tag=nil, variables={:log_source_address=>"1.2.3.4"}>
2019-08-20 20:21:34.003534921 +0800 fluent.debug: {"chunk_id":"5908b7f4a97beecdc426128ff7da2744","metadata":"#<struct Fluent::Plugin::Buffer::Metadata timekey=1566303600, tag=nil, variables={:log_source_address=>\"1.2.3.4\"}>","message":"Created new chunk chunk_id=\"5908b7f4a97beecdc426128ff7da2744\" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1566303600, tag=nil, variables={:log_source_address=>\"1.2.3.4\"}>"}
2019-08-20 20:22:00 +0800 [debug]: #0 Created new chunk chunk_id="5908b80d8804200f521045aacaa9530c" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1566303720, tag=nil, variables={:log_source_address=>"1.2.3.4"}>
2019-08-20 20:22:00.080503264 +0800 fluent.debug: {"chunk_id":"5908b80d8804200f521045aacaa9530c","metadata":"#<struct Fluent::Plugin::Buffer::Metadata timekey=1566303720, tag=nil, variables={:log_source_address=>\"1.2.3.4\"}>","message":"Created new chunk chunk_id=\"5908b80d8804200f521045aacaa9530c\" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=1566303720, tag=nil, variables={:log_source_address=>\"1.2.3.4\"}>"}

Additional context

For example, when fluentd is first started with empty buffer dir, these buffer files are seen:

-rw-r--r-- 1 999 docker  972 Aug 20 20:20 buffer.b5908b7bc65b58119a238a839fd5c819f.log
-rw-r--r-- 1 999 docker   95 Aug 20 20:20 buffer.b5908b7bc65b58119a238a839fd5c819f.log.meta

After fluentd is restarted, these buffer files are seen:

-rw-r--r-- 1 999 docker 4455 Aug 20 20:21 buffer.b5908b7bc65b58119a238a839fd5c819f.log
-rw-r--r-- 1 999 docker   95 Aug 20 20:21 buffer.b5908b7bc65b58119a238a839fd5c819f.log.meta
-rw-r--r-- 1 999 docker  486 Aug 20 20:21 buffer.b5908b7f4a97beecdc426128ff7da2744.log
-rw-r--r-- 1 999 docker   95 Aug 20 20:21 buffer.b5908b7f4a97beecdc426128ff7da2744.log.meta

and more buffer files are created and retained forever every time fluentd is restarted:

-rw-r--r-- 1 999 docker 1232 Aug 20 19:11 buffer.b5908a8488e9e9872d07618224e57d9a6.log
-rw-r--r-- 1 999 docker   99 Aug 20 19:11 buffer.b5908a8488e9e9872d07618224e57d9a6.log.meta
-rw-r--r-- 1 999 docker  616 Aug 20 19:36 buffer.b5908adc7399d1b1e72639886afcf3618.log
-rw-r--r-- 1 999 docker   99 Aug 20 19:36 buffer.b5908adc7399d1b1e72639886afcf3618.log.meta
-rw-r--r-- 1 999 docker  704 Aug 20 19:38 buffer.b5908ae39aaf16b5e68ab53544b8c1847.log
-rw-r--r-- 1 999 docker   99 Aug 20 19:38 buffer.b5908ae39aaf16b5e68ab53544b8c1847.log.meta
-rw-r--r-- 1 999 docker 5104 Aug 20 20:00 buffer.b5908b3248422abed0a8a20adb5ded43f.log
-rw-r--r-- 1 999 docker   99 Aug 20 20:00 buffer.b5908b3248422abed0a8a20adb5ded43f.log.meta
-rw-r--r-- 1 999 docker 4455 Aug 20 20:21 buffer.b5908b7bc65b58119a238a839fd5c819f.log
-rw-r--r-- 1 999 docker   95 Aug 20 20:21 buffer.b5908b7bc65b58119a238a839fd5c819f.log.meta
-rw-r--r-- 1 999 docker  486 Aug 20 20:21 buffer.b5908b7f4a97beecdc426128ff7da2744.log
-rw-r--r-- 1 999 docker   95 Aug 20 20:21 buffer.b5908b7f4a97beecdc426128ff7da2744.log.meta

By inserting my own debug statement into buf_file.rb, confirm that the path passed to Dir.glob() is /fluentd/log/forward_${log_source_address}_%Y-%m-%dT%H:%M/buffer.*.log. It seems that {} is special character to Dir.glob(), so there are no files returned to be resumed, which explains why there is no debug log restoring buffer file: path = ....

@cflee cflee added the bug Something isn't working label Aug 20, 2019
@repeatedly
Copy link
Member

Good catch. I confirmed this behaviour. buffer path should work with placeholders.
Workaround is set path in <buffer> to avoid placeholder based path.

@repeatedly
Copy link
Member

Patch: #2594

repeatedly added a commit that referenced this issue Sep 2, 2019
buf_file/buf_file_single: fix to ignore placeholder based path. fix #2582
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants