Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stage size computation in buffer plugin is not thread safe #2712

Closed
hlakshmi opened this issue Dec 3, 2019 · 3 comments
Closed

stage size computation in buffer plugin is not thread safe #2712

hlakshmi opened this issue Dec 3, 2019 · 3 comments

Comments

@hlakshmi
Copy link

hlakshmi commented Dec 3, 2019

Check CONTRIBUTING guideline first and here is the list to help us investigate the problem.

Describe the bug
The computation of stage_size in buffer plugin is not thread safe. This is the line which can cause synchronization issues because the after the value staged_bytesize is computed, a chunk could be enqueued by the enqueue thread causing incorrect computation of stage_size

To Reproduce

Expected behavior

Your Environment

  • Fluentd or td-agent version: 1.4.1, 1.6.3
  • Operating system: CentOS 7
  • Kernel version: 3.10.0-1062.4.3.el7.x86_64

If you hit the problem with older fluentd version, try latest version first.

Your Configuration

<system>
# process_name fluentd-data
# log_level info
  log_level debug
# log_level trace
  suppress_repeated_stacktrace
</system>

#
# datastore pipeline
#
<source>
  @type monitor_agent
  bind 0.0.0.0
  port 25220
</source>

<source>
  @type forward
  port 25224
</source>

<match fluent.**>
# @type stdout
  @type null
</match>

#
# references.*
# flows.*
#
<match *.*>
  @type file
  path /xx/xxx/../${tag[1]}/${tag[0]}/%Y/%m/%d/%H
  append false
  compress gzip
  <buffer tag,time>
    @type file
    path <path-to-buffer>
    flush_mode interval
    flush_interval 10m
    flush_at_shutdown false
    timekey 86400
    timekey_use_utc
    chunk_limit_size 10mb
    chunk_limit_records 100000
    flush_thread_count 5
  </buffer>
  <format>
    @type json
  </format>
</match>

Your Error Log
As the computation of stage_size is not thread safe, it results in BufferOverflowError after a while even though the actual buffer directory has not reached the limit.

3602  2019-11-09 14:10:29 +0000 [warn]: failed to write data into buffer by buffer overflow action=:throw_exception
    1 2019-11-09 14:10:29 +0000 [warn]: emit transaction failed: error_class=Fluent::Plugin::Buffer::BufferOverflowError error="buffer space has too many data" location="/var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/bu
    2   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/buffer.rb:269:in `write'
    3   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:954:in `block in handle_stream_with_custom_format'
    4   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:880:in `write_guard'
    5   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:953:in `handle_stream_with_custom_format'
    6   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:872:in `execute_chunking'
    7   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/output.rb:793:in `emit_buffered'
    8   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/event_router.rb:96:in `emit_stream'
    9   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:300:in `on_message'
   10   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:211:in `block in handle_connection'
   11   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:248:in `block (3 levels) in read_messages'
   12   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:247:in `feed_each'
   13   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:247:in `block (2 levels) in read_messages'
   14   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin/in_forward.rb:256:in `block in read_messages'
   15   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin_helper/server.rb:612:in `on_read_without_connection'
   16   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/io.rb:123:in `on_readable'
   17   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/io.rb:186:in `on_readable'
   18   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/loop.rb:88:in `run_once'
   19   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/cool.io-1.5.3/lib/cool.io/loop.rb:88:in `run'
   20   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
   21   2019-11-09 14:10:29 +0000 [warn]: /var/external/lib/ruby/gems/2.4.0/gems/fluentd-1.4.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'

Additional context
Here is the snippet of the monitoring agent output and the disk usage of buffer directory

curl http://127.0.0.1:25220/api/plugins.json
{"plugins":[{"plugin_id":"object:c626f4","plugin_category":"input","type":"monitor_agent","config":{"@type":"monitor_agent","bind":"0.0.0.0","port":"25220"},"output_plugin":false,"retry_count":null},{"plugin_id":"object:ca2c40","plugin_category":"input","type":"forward","config":{"@type":"forward","port":"25224"},"output_plugin":false,"retry_count":null},{"plugin_id":"object:47a018","plugin_category":"output","type":"null","config":{"@type":"null"},"output_plugin":true,"retry_count":0,"retry":{}},{"plugin_id":"object:819890","plugin_category":"output","type":"file","config":{"@type":"file","path":"/var/lib/datastore/fluentd/${tag[1]}/${tag[0]}/%Y/%m/%d/%H","append":"false","compress":"gzip"},"output_plugin":true,"buffer_queue_length":0,"buffer_total_queued_size":418693164,"retry_count":0,"retry":{}}]}

Disk Utilization of buffer directory:

du -sh /var/lib/datastore/fluentd/buffer
93M       /var/lib/datastore/fluentd/buffer

As you can see from the above, the monitoring agent says the buffer_total_queued_size is ~400M but the actual utilization on disk is only 93M

@hlakshmi
Copy link
Author

hlakshmi commented Dec 3, 2019

The fix here is to keep the adding_bytesize for each chunk separately and add the size to stage_size only if the chunk is in staged state.

@cosmo0920
Copy link
Contributor

#2734 is already merged.
Can we close this issue, @ganmacs ?

@ganmacs
Copy link
Member

ganmacs commented Jan 24, 2020

you're right. thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants