process of determining 'index' is not thread safe #326
Yes. This problem is similar to the process-wide conflict case.
I encountered this bug and consulted with tatsu-yama.
Changing the default value affects existing users, so it needs two steps.
I certainly forgot about the impact on existing users ;)
Signed-off-by: Masahiro Nakagawa <[email protected]>
Patch for #327
Add warning for object conflict case. ref #326
The s3 plugin uses a default object key that is problematic in a few ways:

1. It makes HEAD requests for each chunk it uploads, starting from 1 each time. If you have uploaded 2000 log files within the same time slice, it will make 2001 HEAD requests to figure out whether each key exists. fluent/fluent-plugin-s3#160
2. The above check is not thread-safe, and two threads can race and decide to use the same `%{index}` value, with the loser of the race overwriting the chunk from the winner. fluent/fluent-plugin-s3#326

This is planned to change for v2, but there's no clear path to v2 right now. The plugin does already warn if you use multiple threads and don't use either `%{chunk_id}` or `%{uuid_hash}` in the object key.
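The check-then-act pattern described above can be sketched in Ruby. This is a hypothetical illustration, not the plugin's actual code: a `Set` stands in for S3, and each membership check stands in for one HEAD request. The gap between finding a free key and uploading to it is where two threads can race.

```ruby
require "set"

# Simulated bucket contents; each membership check below stands in
# for one HEAD request against S3.
existing_keys = Set.new

# Hypothetical sketch of the index-probing logic: start from index 0
# and issue one existence check per candidate key until a free one
# is found. Nothing "claims" the key, so between returning here and
# actually uploading, another thread can pick the same key.
find_free_key = lambda do |prefix|
  i = 0
  i += 1 while existing_keys.include?("#{prefix}_#{i}")  # one "HEAD request" per iteration
  "#{prefix}_#{i}"
end

# With 2000 objects already present in the time slice, the probe
# performs 2001 existence checks before settling on a key.
2000.times { |i| existing_keys.add("20200512_#{i}") }
key = find_free_key.call("20200512")
puts key  # => "20200512_2000"
```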
As mentioned in a warning, as well as fluent#326 and fluent#160, the process of determining the index added to the default object key is not thread-safe. This adds some thread-safety until version 2.x is out where chunk_id is used instead of an index value. This is not a perfect implementation, since there can still be races between different workers if workers are enabled in fluentd, or if there are multiple fluentd instances uploading to the same bucket. This commit is just to resolve this problem short-term in a way that's backwards compatible.
Signed-off-by: William Orr <[email protected]>
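A backwards-compatible mitigation along these lines can be sketched as follows. This is a minimal illustration of the idea (names like `claim_key` and `KEY_LOCK` are hypothetical, not the actual patch): a process-wide `Mutex` makes the probe-and-claim step atomic for threads in one process, while still leaving the cross-worker and cross-instance races the commit message mentions.

```ruby
require "set"

# Process-wide lock guarding the probe-and-claim step.
KEY_LOCK = Mutex.new

# Probe for a free index and claim it atomically with respect to other
# threads in this process. This does NOT help across fluentd workers or
# across separate fluentd instances uploading to the same bucket.
def claim_key(claimed, prefix)
  KEY_LOCK.synchronize do
    i = 0
    i += 1 while claimed.include?("#{prefix}_#{i}")
    key = "#{prefix}_#{i}"
    claimed.add(key)  # claim inside the critical section, closing the race window
    key
  end
end

claimed = Set.new  # stands in for keys already chosen/uploaded
keys = 8.times.map { Thread.new { claim_key(claimed, "20200512") } }.map(&:value)
puts keys.uniq.length  # 8 — every thread got a distinct key
```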
This issue has been automatically marked as stale because it has been open for 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
This issue was automatically closed because it was stale for 30 days.
In fluent-plugin-s3, if the value of `flush_thread_count` is greater than 1, data can go missing on S3. I think this is because the `Fluent::Plugin::S3Output#write` method is not thread safe.

td-agent.conf:

Test data (`/tmp/td-agent-failure-sample/tmp/1500000.log`) was transferred to S3 with td-agent. As a result, the number of records in the original data and in the data on S3 do not match. Of course, there are no errors in td-agent.log.

So I made the following modifications to fluentd and fluent-plugin-s3 and transferred the data again.

The resulting td-agent.log shows that two threads uploaded to the same filename, `20200512.0.dat.gz`. I think the process of determining `%{index}` is not thread safe.

I think using `uuid_flush` will probably work around this problem. However, since the default value of `s3_object_key_format` is `"%{path}%{time_slice}_%{index}.%{file_extension}"`, I think this will affect a lot of users.

I think this issue is relevant: #315
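The workaround mentioned above can be expressed as a config change: putting a unique placeholder such as `%{uuid_flush}` into the object key makes each uploaded object name unique, so two threads can no longer collide on the same key. A minimal sketch of such a match section (the bucket name and path are placeholders, and other required parameters are omitted):

```
<match **>
  @type s3
  s3_bucket YOUR_BUCKET
  path logs/
  # Default is "%{path}%{time_slice}_%{index}.%{file_extension}",
  # which relies on the racy index probe described in this issue.
  # %{uuid_flush} gives every flushed chunk a unique object key.
  s3_object_key_format %{path}%{time_slice}_%{uuid_flush}.%{file_extension}
</match>
```

The trade-off, as noted in the discussion, is that changing the key format alters object names for existing users, which is why it cannot simply become the new default in one step.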