Fluentd not picking new log files #3239
Comments
We are facing the same issue after upgrading to 1.12: some files are not being read. |
Doesn't it reproduce with v1.11.x? @indrajithgihan @pawankkamboj |
Recently test_rotate_file_with_open_on_every_update sometimes (often?) fails: https://travis-ci.org/github/fluent/fluentd/jobs/759131293 |
@ashie @repeatedly 2021-02-16 10:08:05 +0545 [info]: #0 Timeout flush: ms-logs-application:default |
Had the same issue here. I found that fluentd randomly stops picking up logs after a deployment rolling update, and restarting fluentd helps. Version 1.12. |
Confirmed we were seeing this with 1.12.1 across many different kubernetes clusters. Rolling back to 1.11.x has fixed it |
We are facing the same issue using fluentd version 1.12.1 on Red Hat Enterprise Linux 7.9 with kernel version 3.10.0-1160.2.2.el7.x86_64, running dockerd. It seems that the positions in the pos file for some pods randomly get stuck at the max file size of ~20 MiB.
If we follow the symbolic link and check the inode and the file size of the target, we can compare them against what is recorded in the pos file; the pos-file excerpt, the command output, and the in_tail input configuration from that check are not reproduced in this excerpt.
|
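For anyone who wants to repeat this kind of check, here is a minimal shell sketch. The paths and the inode value are hypothetical placeholders, not taken from the report above; each line in an in_tail position file holds the watched path, the byte offset in hex, and the inode in hex, separated by tabs.
# Hypothetical pod log symlink and pos file location
POS_FILE=/var/log/fluentd-containers.log.pos
LOG_LINK=/var/log/containers/mypod_default_app-0123456789abcdef.log
# Show the pos-file entry for that log (path, offset in hex, inode in hex)
grep -F "$LOG_LINK" "$POS_FILE"
# Resolve the symlink and print the real file's inode and size
REAL_FILE=$(readlink -f "$LOG_LINK")
stat -c 'inode=%i size=%s bytes' "$REAL_FILE"
# Convert a hex inode taken from the pos file to decimal for comparison
printf '%d\n' 0x0000000000a1b2c3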
Can you guys provide info about your settings on the nodes:
|
|
root@fluentd-2656g:/fluentd# sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 524288
root@fluentd-2656g:/fluentd# sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 8192
root@fluentd-2656g:/fluentd# |
Although I'm not sure whether all of your problems are the same, #3274, #3224, or #3292 may be the same issue. They are already fixed in the master branch (#3275 and #3294) but not released yet. |
sysctl fs.inotify.max_user_instances
fs.inotify.max_user_instances = 128
sysctl fs.inotify.max_user_watches
fs.inotify.max_user_watches = 8192
I had the same issue with v1.11.5 as well. I am using the fluentd-concat plugin and observed high CPU usage within the fluentd pod with lots of timeout flushes in the log. Could this be a reason for not detecting new log files? |
@ashie I think the issues may be related: if someone leaves the sysctl inotify parameters at the default values for a given operating system and has a lot of containers per node (hitting the per-instance limits as pods come and go, or pods run multiple sidecars), then log tailing can also stop, though AFAIR there should be another error message in that case. We had the same problem with a different log shipper, promtail. |
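If hitting the default inotify limits is suspected, raising them is a sysctl change along these lines; the values below are illustrative and should be sized for the number of containers and sidecars per node.
sudo sysctl -w fs.inotify.max_user_instances=1024
sudo sysctl -w fs.inotify.max_user_watches=524288
# Persist the settings across reboots (assumes a distro that reads /etc/sysctl.d)
printf 'fs.inotify.max_user_instances = 1024\nfs.inotify.max_user_watches = 524288\n' | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system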
@ashie we rolled back fluentd to 1.11.5 yesterday and so far haven't encountered any issues. |
We also had to roll back to 1.11.x due to this error. 🙁 |
It seems that some different problems are mixed in this issue. The original report by @indrajithgihan isn't 1.12 specific, so we don't treat 1.12-specific problems in this issue.
Reproduces with both 1.11 and 1.12: we'll continue to investigate it in this issue. @indrajithgihan @nvtkaszpir
1.12 specific: probably it will be resolved by 1.12.2 (we'll release it next week). Please file a new issue if your problem still reproduces with 1.12.2. @joshbranham @snorwin @TomasKohout
TBD (probably 1.12 specific?): rolling back to 1.11 may resolve your issue. @pawankkamboj @hulucc |
Yes, it is working fine with version 1.11.
|
Can this be closed now? Has anyone tested with 1.12.2 to confirm? If nobody can report back, we will build and test today as well. |
We have received no 1.12-specific bug reports yet after releasing 1.12.2.
We don't close this issue yet because the original report is probably a different issue, one that has existed for a while and occurs only rarely. |
@ashie we received reports from an internal user managing many nodes that the problem still occurs. Any guidelines for the minimum kernel version and what dependencies are required for fluentd v1.12.2? |
Again, we don't treat 1.12-specific problems in this issue anymore. |
I'm removing the bug label because we haven't yet identified the cause of the original report of this issue. |
I'm having this issue in 1.11.5 running on k8s. The issue only seems to occur (or it might just be a coincidence) after a few weeks of uptime and only on nodes that start temporary pods frequently. Could this be caused by large .pos files, or has that been ruled out already? I've enabled compacting now, but it might be a few weeks before I can know if it helped. |
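For reference, compaction of the position file is controlled by a single in_tail parameter. A minimal sketch follows; the paths and tag are hypothetical, not taken from any configuration in this thread.
<source>
  @type tail
  # hypothetical path; adjust to your environment
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  # compact stale entries (e.g. for files that no longer exist) when fluentd starts
  pos_file_compaction true
  tag kubernetes.*
  read_from_head true
  <parse>
    @type none
  </parse>
</source>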
@ashie, yes, this is not a 1.12.2-specific problem. We verified yesterday. It is reproducible with 1.11.2 and 1.12.2 for us. We tried both of those versions on top of nodes with Ubuntu 20.04, kernel 5.4. This is why I am asking if this problem is related to dependencies or kernel versions. |
@Cryptophobia We don't have strong clues for this issue yet; some different issues may still be mixed. If you have an environment which doesn't reproduce this issue, please let me know the differences in as much detail as possible. |
#3357 might be the same issue. It seems to be caused by a big log file (350 MB).
|
Hmm, I've confirmed that the in_tail plugin cannot run |
We've not encountered the issue since Apr 9th, when we enabled compacting of the position file (so ~30 days uptime on fluentd without issues). |
Good. |
Probably #2478 is the same issue as this one. |
I don't know whether disabling the stat/inotify watcher matters (it was one of the first things we tried and we've never reverted it, even though it didn't solve the issue, at least not on its own). Also, if this is an issue with large files in general and not just large pos files, it might not help for everyone. (We have a lot of short-lived containers, so our pos file got significantly larger than our log files, and we're still monitoring things since we're not sure this helped or if we've just been lucky for the past month.) |
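For anyone who wants to try the same thing, disabling the inotify-based watcher is done per in_tail source. A minimal sketch with hypothetical paths follows; with the stat watcher off, in_tail falls back to its timer-based watcher, and refresh_interval controls how often the path glob is rescanned for new files.
<source>
  @type tail
  # hypothetical path; adjust to your environment
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  # skip inotify and rely on the timer-based watcher
  enable_stat_watcher false
  # rescan the path glob for new files every 60 seconds
  refresh_interval 60
  tag kubernetes.*
  <parse>
    @type none
  </parse>
</source>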
Usage: https://github.com/fluent/fluentd-docs-gitbook/pull/259/files |
@ashie Does using this feature require disabling stat watchers (enable_stat_watcher false)? |
No, enabling it is also supported. |
The effectiveness of this feature is confirmed at #3423. |
Describe the bug
I have a situation where fluentd, running as a daemonset in a Kubernetes cluster, does not pick up new log files, and this happens randomly. Sometimes restarting fluentd works. My config is below. The app.log.pos file is not being updated either. I'd appreciate it if somebody could help me with this.
To Reproduce
Run fluentd as a daemonset in a K8s cluster and create the log file directory /data/logs; logs will be generated by pods under multiple subdirectories.
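The reporter's actual configuration is not included in this excerpt. Purely as an illustration of the described setup (a tail source watching per-pod subdirectories under /data/logs with an app.log.pos position file), such a source might look roughly like the sketch below; every path and the tag here are guesses, not the original configuration.
<source>
  @type tail
  # hypothetical glob over the per-pod subdirectories
  path /data/logs/**/*.log
  # hypothetical location for the position file mentioned in the report
  pos_file /data/logs/app.log.pos
  tag app.*
  read_from_head true
  <parse>
    @type none
  </parse>
</source>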
Expected behavior
Fluentd should be able to pick up new log files and update the app.log.pos file.
Your Environment
Your Configuration
Your Error Log
Additional context