Always lock chunks first to avoid deadlock #1721

abicky · 2017-10-20T06:44:26Z

This PR resolves #1549.

For example, deadlock occurs by the following steps before this commit:

input plugin thread receives multiple events
(metadata_and_data.size > 1)
input plugin thread processes the first metadata_and_data element
and aquires the lock of chunk1 (chunk.mon_exit in Buffer#write)
enqueue thread aquires the lock of buffer
(synchronize in Buffer#enqueue_chunk)
enqueue thread tries to aquire the lock of chunk1
(chunk.synchronize in Buffer#enqueue_chunk)
input plugin thread processes the second metadata_and_data element
and tries to aquire the lock of buffer (synchronize in Buffer#write_once)

All chunk classes which inherit Chunk have enqueued! method.

abicky · 2017-10-20T06:47:17Z

@repeatedly Could you review the change? I want to try the latest version (v0.14.22.rc2?) which includes this change in our production environment if it looks good to you.

repeatedly · 2017-10-20T07:56:33Z

Thanks! I will review it!

abicky · 2017-10-20T09:30:17Z

I found there were errors like unexpected error while checking flushed chunks. ignored. error_class=RuntimeError error="can't enqueue buffer file: path = /fluentd/log/..., error = 'closed stream'" and I'll investigate the cause 🙇

This commit resolves fluent#1549. For example, deadlock occurs by the following steps before this commit: 1. input plugin thread receives multiple events (metadata_and_data.size > 1) 2. input plugin thread processes the first metadata_and_data element and aquires the lock of chunk1 (chunk.mon_exit in Buffer#write) 3. enqueue thread aquires the lock of buffer (synchronize in Buffer#enqueue_chunk) 4. enqueue thread tries to aquire the lock of buffer (chunk.synchronize in Buffer#enqueue_chunk) 5. input plugin thread processes the second metadata_and_data element and tries to aquire the lock of buffer (synchronize in Buffer#write_once)

@buffer

In flush threads @buffer could call destructive methods of chunks in @Queue without chunk lock, so buffer lock is required.

abicky · 2017-10-21T07:25:57Z

I fixed "closed stream" error in 1a6f16b.

@repeatedly Please review. I've checked the behavior in our development environment and it works fine. I want your review before I try it in our production environment.
Some tests failed in Travis CI, but these tests seem to sometimes fails also in master branch, aren't they?

repeatedly · 2017-10-23T05:48:57Z

lib/fluent/plugin/buffer.rb

-            end
+        if block_given?
+          synchronize{ @stage.keys }.each do |metadata|
+            chunk = @stage[metadata]


No need synchronized guard here?

Hmm... deleting metadata from @stage happens in only enqueue_chunk so it seems safe in CRuby.

I think data race doesn't occur in CRuby thanks to GIL and synchronized is not necessary, but I'll change the code like below for other Ruby implementations:

diff --git a/lib/fluent/plugin/buffer.rb b/lib/fluent/plugin/buffer.rb index 337b3b79..61a904ed 100644 --- a/lib/fluent/plugin/buffer.rb +++ b/lib/fluent/plugin/buffer.rb @@ -351,11 +351,15 @@ module Fluent def enqueue_all log.trace "enqueueing all chunks in buffer", instance: self.object_id if block_given? - synchronize{ @stage.keys }.each do |metadata| - chunk = @stage[metadata] - next unless chunk - v = yield metadata, chunk - enqueue_chunk(metadata) if v + metadata_array = [] + synchronize do + @stage.each do |metadata, chunk| + v = yield metadata, chunk + metadata_array << metadata if v + end + end + metadata_array.each do |metadata| + enqueue_chunk(metadata) end else synchronize{ @stage.keys }.each do |metadata|

Yes and currently fluentd focus on CRuby so I think code change is not needed for now.
But adding comment is better for the future.

Okey, I added a comment 2261428

repeatedly · 2017-10-23T06:38:48Z

I re-run failed tests and all tests are passed.
So your patch doesn't break existing expected behaviour.

cf. fluent#1721 (comment)

Refactoring: Remove unnecessary conditions

e845897

All chunk classes which inherit Chunk have enqueued! method.

abicky mentioned this pull request Oct 20, 2017

Fluentd 0.14.11 deadlocked in file output #1549

Closed

repeatedly self-requested a review October 20, 2017 07:46

repeatedly self-assigned this Oct 20, 2017

repeatedly added bug Something isn't working v0.14 labels Oct 20, 2017

abicky changed the title ~~Always lock chunks first to avoid deadlock~~ [WIP] Always lock chunks first to avoid deadlock Oct 20, 2017

abicky force-pushed the hotfix/lock-chunks-first branch from 93088bf to 723038c Compare October 20, 2017 13:38

Process chunks with buffer lock in enqueue_chunk

1a6f16b

In flush threads @buffer could call destructive methods of chunks in @Queue without chunk lock, so buffer lock is required.

abicky force-pushed the hotfix/lock-chunks-first branch from 723038c to 1a6f16b Compare October 21, 2017 06:54

abicky changed the title ~~[WIP] Always lock chunks first to avoid deadlock~~ Always lock chunks first to avoid deadlock Oct 21, 2017

repeatedly reviewed Oct 23, 2017

View reviewed changes

abicky added a commit to abicky/fluentd that referenced this pull request Oct 23, 2017

Lock the buffer during selecting metadata

990cbea

cf. fluent#1721 (comment)

[ci skip] Add a comment

2261428

repeatedly merged commit 0848224 into fluent:master Oct 23, 2017

abicky deleted the hotfix/lock-chunks-first branch October 24, 2017 03:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Always lock chunks first to avoid deadlock #1721

Always lock chunks first to avoid deadlock #1721

abicky commented Oct 20, 2017 •

edited

Loading

abicky commented Oct 20, 2017

repeatedly commented Oct 20, 2017

abicky commented Oct 20, 2017

abicky commented Oct 21, 2017

repeatedly Oct 23, 2017

repeatedly Oct 23, 2017 •

edited

Loading

abicky Oct 23, 2017

repeatedly Oct 23, 2017 •

edited

Loading

abicky Oct 23, 2017

repeatedly commented Oct 23, 2017

Always lock chunks first to avoid deadlock #1721

Always lock chunks first to avoid deadlock #1721

Conversation

abicky commented Oct 20, 2017 • edited Loading

abicky commented Oct 20, 2017

repeatedly commented Oct 20, 2017

abicky commented Oct 20, 2017

abicky commented Oct 21, 2017

repeatedly Oct 23, 2017

Choose a reason for hiding this comment

repeatedly Oct 23, 2017 • edited Loading

Choose a reason for hiding this comment

abicky Oct 23, 2017

Choose a reason for hiding this comment

repeatedly Oct 23, 2017 • edited Loading

Choose a reason for hiding this comment

abicky Oct 23, 2017

Choose a reason for hiding this comment

repeatedly commented Oct 23, 2017

abicky commented Oct 20, 2017 •

edited

Loading

repeatedly Oct 23, 2017 •

edited

Loading

repeatedly Oct 23, 2017 •

edited

Loading