Env & Setting
We are using the fluentd in_tail plugin to parse logs from pods running on k8s.
k8s log rotation policy: the latest log is always written to a file with the same name, say 'container.log'. When the file reaches a certain size and needs to be rotated, k8s renames the original to 'container.backup.log' as a backup (the inode stays the same) and creates a new file, still named 'container.log' (with a different inode).
In our environment, follow_inode is enabled.
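The rename-and-recreate rotation described above can be reproduced with plain Ruby file operations. This is only a minimal sketch, assuming a POSIX filesystem; the `simulate_rotation` helper and the temporary directory are illustrative and not part of fluentd or k8s.

```ruby
require "tmpdir"

# Simulates the rotation scheme: rename the live file to a backup name,
# then create a fresh file under the original name.
# Returns the inode numbers observed before and after the rotation.
def simulate_rotation(dir)
  current = File.join(dir, "container.log")
  backup  = File.join(dir, "container.backup.log")

  File.write(current, "line 1\n")
  ino_before = File.stat(current).ino

  File.rename(current, backup)     # the backup keeps the original inode
  File.write(current, "line 2\n")  # the new file gets a different inode

  {
    before:  ino_before,
    backup:  File.stat(backup).ino,
    current: File.stat(current).ino,
  }
end

RESULT = Dir.mktmpdir { |dir| simulate_rotation(dir) }
puts RESULT
```

The `backup` inode matches `before`, while `current` differs, which is why a watcher keyed only by path can end up pointing at a different file than the one it started with.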
Issue Behavior
When a log file is rotated, fluentd in_tail sometimes stops tracking changes to the new file: the offset in the pos file no longer changes, although a new entry is created with the new inode number.
RC Analysis
After updating in_tail.rb to print more troubleshooting logs, we located the potential root cause, which is a race condition.
When a file is rotated, only the old watcher for the rotated file is expected to be detached. But in some cases, both the old watcher and the newly created watcher are detached.
Here is a description of the race:
1. The file is rotated: the previous log file 'container.log' is renamed to 'container.backup.log' (inode_old), and a new 'container.log' is created (inode_new).
2. refresh_watchers, which is triggered regularly, marks the old watcher for inode_old to be detached and creates a new watcher for inode_new (@tails[path] now points to the new watcher).
3. The old watcher's on_rotate triggers update_watcher because watcher_needs_update is true.
4. In update_watcher, "rotated_tw = @tails[path]" returns the new watcher as the one to be detached, because the entry for this path was already replaced in step 2.
5. After rotate_wait, both the old and the new watchers are detached (scheduled in steps 2 and 4), so the new watcher gets stuck.
The main problem is in step 4: before detaching a watcher, we need to determine whether rotated_tw still has the old inode, in which case it can be detached, or has already been refreshed to the new inode, in which case the detach must be skipped (the old watcher is detached by the other event).
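The race and the inode check suggested above can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual in_tail.rb code or the fix under discussion: `Watcher`, `pick_rotated_watcher_unguarded`, `pick_rotated_watcher_guarded`, and `run_race` are hypothetical names, and the inode numbers 100 and 200 are made up.

```ruby
# Toy model only: these are hypothetical stand-ins, not classes or
# methods from in_tail.rb.
Watcher = Struct.new(:path, :ino, :detached)

# Unguarded lookup: trusts whatever watcher is registered for the path.
def pick_rotated_watcher_unguarded(tails, old_watcher)
  tails[old_watcher.path]
end

# Guarded lookup: returns the registered watcher only if it still has
# the old inode; otherwise returns nil so the detach is skipped.
def pick_rotated_watcher_guarded(tails, old_watcher)
  tw = tails[old_watcher.path]
  tw if tw && tw.ino == old_watcher.ino
end

def run_race(guarded:)
  path   = "container.log"
  old_tw = Watcher.new(path, 100, false)
  tails  = { path => old_tw }

  # Step 2: the periodic refresh schedules the old watcher for detach
  # and registers a new watcher under the same path.
  pending_detach = [old_tw]
  new_tw = Watcher.new(path, 200, false)
  tails[path] = new_tw

  # Steps 3-4: the old watcher's rotate handler looks up the watcher by path.
  rotated =
    if guarded
      pick_rotated_watcher_guarded(tails, old_tw)
    else
      pick_rotated_watcher_unguarded(tails, old_tw)
    end
  pending_detach << rotated if rotated

  # Step 5: after rotate_wait, everything scheduled gets detached.
  pending_detach.each { |tw| tw.detached = true }

  { old_detached: old_tw.detached, new_detached: new_tw.detached }
end

p run_race(guarded: false)
p run_race(guarded: true)
```

In the unguarded run the path lookup hands back the new watcher, so both watchers end up detached; with the inode comparison, only the old one is.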
Thank you for describing the cause in detail, it's very clear to me!
It's indeed one of the causes of the in_tail stall issue we are discussing in #3614.
It should be fixed.
Thanks @ashie! I'm also testing that fix under heavy log load to see whether any other stall issue shows up.
From a code perspective, it is ideal (when follow_inode is enabled, tails are tracked by inode number rather than by path).
A minor concern is that this adds 3 more "if @follow_inode else" branches; I feel we will need a refactor one day to keep the code from filling up with this condition :)
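One way to picture tracking by inode while keeping the condition in a single place is a small key helper. This is purely an illustrative sketch: `tail_key` and `register` are hypothetical and do not exist in in_tail.rb, and the inode numbers are made up.

```ruby
# Hypothetical helper, not part of in_tail.rb: picks the hash key once,
# so callers do not need their own follow_inode branch.
def tail_key(path, ino, follow_inode)
  follow_inode ? ino : path
end

# Registers a watcher-like record under the chosen key.
def register(tails, path, ino, follow_inode)
  tails[tail_key(path, ino, follow_inode)] = { path: path, ino: ino }
  tails
end

# Keyed by path: the new file overwrites the old file's entry.
BY_PATH = {}
register(BY_PATH, "container.log", 100, false)
register(BY_PATH, "container.log", 200, false)

# Keyed by inode: the old and new files keep separate entries.
BY_INODE = {}
register(BY_INODE, "container.log", 100, true)
register(BY_INODE, "container.log", 200, true)

puts BY_PATH.size
puts BY_INODE.size
```

With path keys the second registration replaces the first, which is the situation that lets a lookup by path return the wrong watcher; with inode keys both entries coexist.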
To Reproduce
Here is the detailed trace log that reproduces the race:
2023-05-31 03:09:08 +0000 [info]: #0 stop_watchers path="<log_path>" ino=<old_ino>
2023-05-31 03:09:08 +0000 [info]: #0 detach_watcher_after_rotate_wait path="<log_path>" ino=<old_ino>
2023-05-31 03:09:08 +0000 [info]: #0 construct_watcher path="<log_path>"
2023-05-31 03:09:08 +0000 [info]: #0 construct_watcher2 path="<log_path>" ino=<new_ino>
2023-05-31 03:09:08 +0000 [info]: #0 setup_watcher path="<log_path>" ino=<new_ino> pe_inode=<new_ino>
2023-05-31 03:09:08 +0000 [info]: #0 TailWatcher initialize path="<log_path>" ino=<new_ino>
2023-05-31 03:09:08 +0000 [info]: #0 on_rotate path="<log_path>" ino=<new_ino>
2023-05-31 03:09:08 +0000 [info]: #0 following tail of <log_path>
2023-05-31 03:09:08 +0000 [info]: #0 on_rotate path="<log_path>" ino=<old_ino>
2023-05-31 03:09:08 +0000 [info]: #0 watcher_needs_update path="<log_path>" ino=<old_ino>
2023-05-31 03:09:08 +0000 [info]: #0 detected rotation of <log_path> ino <old_ino> -> <new_ino>; waiting 5.0 seconds
2023-05-31 03:09:08 +0000 [info]: #0 rotated_tw <log_path> <new_ino> //unexpected
2023-05-31 03:09:08 +0000 [info]: #0 new_position_entry ino=<new_ino>
2023-05-31 03:09:08 +0000 [info]: #0 detach_watcher_after_rotate_wait path="<log_path>" ino=<old_ino>
2023-05-31 03:09:13 +0000 [info]: #0 detach_watcher path="<log_path>" tw_ino=<old_ino> expect_ino=<old_ino>
2023-05-31 03:09:13 +0000 [info]: #0 detaching a watcher path="<log_path>" ino=<old_ino>
2023-05-31 03:09:13 +0000 [info]: #0 detach_watcher path="<log_path>" tw_ino=<new_ino> expect_ino=<old_ino>
2023-05-31 03:09:13 +0000 [warn]: #0 detaching a watcher path="<log_path>" ino=<new_ino> //unexpected
Here is the detailed trace log of a normal rotation:
2023-05-31 03:26:01 +0000 [info]: #0 on_rotate path="<log_path>" ino=<old_ino>
2023-05-31 03:26:01 +0000 [info]: #0 watcher_needs_update path="<log_path>" ino=<old_ino>
2023-05-31 03:26:01 +0000 [info]: #0 detected rotation of <log_path> ino <old_ino> -> <new_ino>; waiting 5.0 seconds
2023-05-31 03:26:01 +0000 [info]: #0 rotated_tw <log_path> <old_ino> //expected
2023-05-31 03:26:01 +0000 [info]: #0 new_position_entry ino=0
2023-05-31 03:26:01 +0000 [info]: #0 setup_watcher path="<log_path>" ino=<new_ino> pe_inode=0
2023-05-31 03:26:01 +0000 [info]: #0 TailWatcher initialize path="<log_path>" ino=<new_ino>
2023-05-31 03:26:01 +0000 [info]: #0 on_rotate path="<log_path>" ino=<new_ino>
2023-05-31 03:26:01 +0000 [info]: #0 following tail of <log_path>
2023-05-31 03:26:01 +0000 [info]: #0 detach_watcher_after_rotate_wait path="<log_path>" ino=<old_ino>
2023-05-31 03:26:06 +0000 [info]: #0 detach_watcher path="<log_path>" tw_ino=<old_ino> expect_ino=<old_ino>
2023-05-31 03:26:06 +0000 [info]: #0 detaching a watcher path="<log_path>" ino=<old_ino> //expected
Expected behavior
Logs do not get stuck; the new watcher keeps monitoring changes to the new file.