crashes cause bad chunks that fail to send #1638
Does this mean the server running fluentd hit an unexpected machine shutdown?
Yes - the power cut off, so nothing had time to shut down. I don't know whether the problem is in the metadata file or the actual chunk file. However, it shouldn't be too hard to simply discard whatever incomplete data was written to the end of the file, right? I don't expect fluentd to save data that was submitted between the last successful flush to the file buffer and the crash. But incomplete data shouldn't cause the entire chunk to fail, and the error/warning it prints when it encounters bad data should be better. If I didn't know about that power failure I would be really confused.
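For what it's worth, the "discard the incomplete tail" idea is a standard technique for crash recovery in append-only files. Here is a minimal sketch in Python, assuming a hypothetical length-prefixed record layout (this is NOT fluentd's actual chunk format, just an illustration of truncating at the last complete record):

```python
import struct

def recover_records(path):
    """Read length-prefixed records, truncating any partial trailing write.

    Assumed (hypothetical) on-disk layout: each record is a 4-byte
    big-endian length followed by that many payload bytes. A crash
    mid-write leaves a short header or short payload at the end of
    the file; we truncate the file back to the last complete record
    instead of rejecting the whole chunk.
    """
    records = []
    with open(path, "r+b") as f:
        while True:
            pos = f.tell()
            header = f.read(4)
            if len(header) < 4:          # missing or partial header: stop here
                f.truncate(pos)
                break
            (length,) = struct.unpack(">I", header)
            payload = f.read(length)
            if len(payload) < length:    # record body cut off by the crash
                f.truncate(pos)
                break
            records.append(payload)
    return records
```

Everything flushed before the crash survives; only the torn tail is dropped, which matches the expectation above.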
I got this error again after running out of disk space on a node.
I'm experiencing this issue as well. It's occurring on fluentd restart, not a machine crash. It doesn't happen consistently, but is happening frequently. |
@mchesler It means you hit a disk space issue and it caused a broken chunk? BTW, I will add a backup feature to move bad chunks to another place.
@repeatedly None of the machines where this has happened to me have come anywhere close to filling their disks; they've simply had the fluentd process restarted.
Experiencing the same; the fluentd processes were OOMKilled beforehand. @repeatedly did you have any chance to work on the "move bad chunks to other place" feature, or could you point me to the code handling this?
I am using the Elasticsearch plugin to send data to ES with a file buffer. When an exception occurs while indexing a document into Elasticsearch, the chunk is retained for retry, but that chunk can never succeed. Because of the repeated retry attempts, the other chunks are not processed, which blocks the data flow.
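The failure mode described here is the classic "poison message" problem: one permanently failing chunk at the head of the queue starves everything behind it. A common mitigation is a retry cap with a dead-letter directory. A minimal sketch, assuming a hypothetical `send` callable that raises on indexing errors (fluentd does not expose this exact hook):

```python
import shutil
from pathlib import Path

MAX_RETRIES = 5  # illustrative cap, not a fluentd setting

def flush_chunk(chunk: Path, send, retries: dict, dead_letter_dir: Path):
    """Try to send one buffered chunk; quarantine it after repeated failures.

    `send` is a hypothetical callable that raises on delivery errors.
    `retries` tracks per-chunk failure counts across calls. Once a chunk
    exceeds MAX_RETRIES it is moved aside so later chunks can proceed.
    """
    try:
        send(chunk.read_bytes())
        chunk.unlink()                       # success: drop the chunk
        retries.pop(chunk.name, None)
    except Exception:
        retries[chunk.name] = retries.get(chunk.name, 0) + 1
        if retries[chunk.name] >= MAX_RETRIES:
            dead_letter_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(chunk), str(dead_letter_dir / chunk.name))
            retries.pop(chunk.name, None)    # unblock the rest of the queue
```

The "backup feature to move bad chunks to another place" mentioned above is essentially this dead-letter step applied to unparsable chunks.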
+1, we experience the exact same issues as @kumaravel29. The problem is actually so pervasive that we're in the process of ripping td-agent out of our infrastructure.
FYI: since v1.2.0, https://github.com/fluent/fluentd/blob/master/CHANGELOG.md#release-v120---20180430 |
I had a power failure and upon restarting the server, this happened:
Given that this hasn't happened in the last month, and that the chunk it's referencing, `buffer.q5541970ab3b8c59d8f012cd3222cc93c.log`, was created right around the time the power failed, I think it was somehow corrupted by an incomplete write. This is a problem because the error message makes no sense, and crashes shouldn't cause corruption or data loss for data that was already written to disk. Worse, fluentd doesn't even start sending the other chunks that aren't corrupted: I need to shut down fluentd, remove the bad chunk, and restart fluentd to get it to continue sending.
I can send you the bad chunk and meta file if you want.
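For anyone hitting the same wall, the manual workaround above (stop fluentd, move the bad chunk and its metadata file aside, restart) can be scripted. A minimal sketch, assuming the file buffer keeps chunk metadata in a sibling file with a `.meta` suffix (verify the naming in your own buffer directory before relying on it), to be run only while fluentd is stopped:

```python
import shutil
from pathlib import Path

def quarantine_chunk(chunk: Path, quarantine_dir: Path):
    """Move a suspect chunk and its sibling metadata file out of the buffer dir.

    The `<chunk>.meta` sibling-file convention is an assumption about the
    buffer layout. Run this only while fluentd is stopped, so the buffer
    directory is not being written concurrently.
    """
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    for path in (chunk, Path(str(chunk) + ".meta")):
        if path.exists():
            shutil.move(str(path), str(quarantine_dir / path.name))
```

Quarantining rather than deleting preserves the bad chunk for later inspection, e.g. to attach to a bug report like this one.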