Prevent thread switching in the interval between seek and write operations to pos_file #2118
We're migrating to Fluentd to transfer web-server logs to HDFS, and I'm surprised that we seem to be the first to hit this issue.
After deploying to one server under production load (read_from_head true), we restarted td-agent and, after a while, found duplicated log records in HDFS. A quick examination turned up corrupted entries in pos_file, such as this line:
/data/htlogs/00000000b2ebab5f/data/htlogs/nginx/w11009.log 00000000029ba830 000000000000601a9/data00000000000332a15875.log 0000000000000000 0000000000421e96
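I added debug logging and found that seek and write operations from different threads were interleaving. One thread was updating a log file's current position (set file.pos, write) while a second thread was adding new records to pos_file (set file.pos, write, write, read file.pos, write, read file.pos). When a thread switch happens between a seek and the write that follows it, the write lands at the offset set by the other thread, which produces lines like the one above.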
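To illustrate the idea, here is a minimal sketch of serializing each seek-and-write pair with a mutex. It is not the actual patch: `SafePosFile`, `update_pos`, and `add_entry` are hypothetical names, a `StringIO` stands in for the real pos_file, and the tab-separated path/position/inode layout is an assumption.

```ruby
require "stringio"

# Hypothetical stand-in for the pos_file writer; not Fluentd's actual class.
class SafePosFile
  def initialize(io)
    @io = io
    @lock = Mutex.new
  end

  # Overwrite the 16-hex-digit position field that starts at `offset`.
  # The seek and the write happen under one lock, so no other thread
  # can move the file position in between.
  def update_pos(offset, pos)
    @lock.synchronize do
      @io.seek(offset)
      @io.write(format("%016x", pos))
    end
  end

  # Append a new "path\tpos\tinode\n" entry and return the offset
  # of its position field for later updates.
  def add_entry(path, pos, inode)
    @lock.synchronize do
      @io.seek(0, IO::SEEK_END)
      @io.write("#{path}\t")
      pos_offset = @io.pos
      @io.write(format("%016x\t%016x\n", pos, inode))
      pos_offset
    end
  end
end

# Four threads add an entry each and then update its position repeatedly.
io = StringIO.new
pf = SafePosFile.new(io)
threads = 4.times.map do |i|
  Thread.new do
    off = pf.add_entry("/var/log/example#{i}.log", 0, i + 1)
    50.times { |n| pf.update_pos(off, n) }
  end
end
threads.each(&:join)
puts io.string
```

Every method that touches the shared file position takes the same lock, so an append can no longer be split by a position update or the other way round.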
Also, I added a warning for unparsable lines in pos_file.
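A rough sketch of what such a check can look like, assuming each valid line is a path followed by two 16-digit hex fields (position and inode). The pattern `POS_LINE`, the helper `parse_pos_lines`, and the sample path `/var/log/example.log` are made up for illustration and are not taken from the Fluentd source; the bad line is the corrupted entry quoted above.

```ruby
require "logger"
require "stringio"

# Assumed line layout: path, 16 hex digits (position), 16 hex digits (inode).
POS_LINE = /\A([^\t ]+)[\t ]+(\h{16})[\t ]+(\h{16})\z/

# Parse pos_file lines into a hash; warn about lines that do not match
# instead of silently skipping them.
def parse_pos_lines(lines, log)
  entries = {}
  lines.each do |line|
    m = POS_LINE.match(line.chomp)
    if m
      entries[m[1]] = { pos: m[2].to_i(16), inode: m[3].to_i(16) }
    else
      log.warn("unparsable line in pos_file: #{line.chomp}")
    end
  end
  entries
end

log_out = StringIO.new
log = Logger.new(log_out)
good = "/var/log/example.log\t00000000029ba830\t0000000000421e96"
bad = "/data/htlogs/00000000b2ebab5f/data/htlogs/nginx/w11009.log 00000000029ba830 000000000000601a9/data00000000000332a15875.log 0000000000000000 0000000000421e96"
entries = parse_pos_lines([good, bad], log)
puts entries.inspect
puts log_out.string
```

The corrupted line has five fields instead of three, so it fails the anchored pattern and is reported in the log rather than being dropped unnoticed.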