Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail: prevent wrongly unwatching with follow_inodes #4221

Closed

Conversation

daipom
Copy link
Contributor

@daipom daipom commented Jun 29, 2023

Which issue(s) this PR fixes:

What this PR does / why we need it:
We must not unwatch targets that still exist.
If unwatching an existing target, it causes log duplication.
This problem exists from v1.13.3 (the following fix).

This can occur when enabling follow_inodes.

The mechanism is as follows.

The TailWatcher for rotated old file can be newly created because the path is changed (the inode is the same). (file.log -> file.log.1)
If unwatching the inode before creating that new TailWatcher, it can't refer to the previous position and it reads logs in duplicate.

Docs Changes:
No need.

Release Note:
Same as the title.

How to Reproduce

Config:

<source>
  @type tail
  tag test
  path /test/fluentd/input4/test.log*
  pos_file /test/fluentd/pos/pos
  read_from_head true
  follow_inodes true
  refresh_interval 15s
  enable_stat_watcher false
  <parse>
    @type none
  </parse>
</source>

<match test.**>
  @type stdout
</match>
  • note about enable_stat_watcher
    • enable_stat_watcher false is for making sure to detect rotation.
      • I think this is problematic too... If stat_watcher notifies too quickly before the new current file is created, TailWatcher sets nil to it's rotate_handler. This causes some problems. (I will create an issue or PR about this later.)
      • This may have prevented this problem from occurring in a real environment. If in_tail can't detect the rotation of the current file because of this, then this problem does not occur. The focus is on whether the rotation of the current file can be detected when enable_stat_watcher is enabled. If not, then this can occur only when disabling enable_stat_watcher.

Files:

$ ls /test/fluentd/input4
test.log

$ cat /test/fluentd/input4/test.log
foooooaaa
Barrrrraaa

Script to rotate:

mv /test/fluentd/input4/test.log /test/fluentd/input4/test.log.1
echo "aaa" > /test/fluentd/input4/test.log

Run:

  • Run Fluentd
  • Run the script to rotate

Result:

2023-06-29 09:55:07 +0900 [info]: #0 starting fluentd worker pid=686256 ppid=686236 worker=0
2023-06-29 09:55:07 +0900 [info]: #0 following tail of /test/fluentd/input4/test.log
2023-06-29 09:55:07.232909021 +0900 test: {"message":"foooooaaa"}
2023-06-29 09:55:07.232911541 +0900 test: {"message":"Barrrrraaa"}
2023-06-29 09:55:07 +0900 [info]: #0 fluentd worker is now running worker=0
2023-06-29 09:55:12 +0900 [info]: #0 detected rotation of /test/fluentd/input4/test.log; waiting 5 seconds
2023-06-29 09:55:12 +0900 [info]: #0 following tail of /test/fluentd/input4/test.log
2023-06-29 09:55:12.234238467 +0900 test: {"message":"aaa"}
2023-06-29 09:55:22 +0900 [info]: #0 following tail of /test/fluentd/input4/test.log.1
2023-06-29 09:55:22.234924294 +0900 test: {"message":"foooooaaa"}
2023-06-29 09:55:22.234935622 +0900 test: {"message":"Barrrrraaa"}
  • The old rotated file was read in duplicate.
  • This occurs if detach is done before refresh_watcher.
    • It is because detach of update_watcher wrongly unwatches the existing inode.

We must not unwatch targets that still exist.
If unwatching an existing target, it causes log duplication.

The mechanism is as follows.

The TailWatcher for rotated old file can be newly created
because the path is changed (the inode is the same).
(file.log -> file.log.1)

If unwatching the inode before creating that new TailWatcher,
it can't refer to the previous position and it reads logs in
duplicate.

Signed-off-by: Daijiro Fukuda <[email protected]>
# (It is because the path of `@tails` will be overwritten here.)
# (It is not a big problem if we don't unwatch here, as long as we make
# sure to detach it. The position will be cleared on restart eventually.)
tail_watcher.unwatched = true unless expand_paths.key?(tail_watcher.ino)
Copy link
Member

@ashie ashie Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worry calling expand_paths here, because it's relatively high cost.
It might be too high cost on heavy rotated environment.
In addition, the file might be disappear soon just after calling expand_paths, as described in

# Even we just checked for existence, there is a race condition here as
# of which stat() might fail with ENOENT. See #3224.
begin
target_info = TargetInfo.new(path, Fluent::FileWrapper.stat(path).ino)
if @follow_inodes
hash[target_info.ino] = target_info
else
hash[target_info.path] = target_info
end
rescue Errno::ENOENT, Errno::EACCES => e
$log.warn "expand_paths: stat() for #{path} failed with #{e.class.name}. Skip file."
end

(Although it might not so big problem as you described in the comment).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review!

I'm worry calling expand_paths here, because it's relatively high cost.
It might be too high cost on heavy rotated environment.

I see.
I thought it would be okay since it was only called on rotation, but certainly, this is worrisome considering the case of so many files being rotated so often...

It may be better to cache the list in refresh_watcher and use it here...
However, I'm unsure if I should go this far for unwatching here.

(Although it might not so big problem as you described in the comment).

What we should be concerned about is that, without unwatching here, the memory and file for PositionFile would be larger.
(It will be cleared only when restarting. We can't use pos_file_compaction_interval to clear this because it doesn't check the target existence.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible fix:

  • A: Don't unwatch here.
  • B: Fix refresh_watcher to refresh PositionFile as well.
  • C: Cache the list in refresh_watcher and use it in update_watcher
  • D: Change the key of @tails to inode when enabling follow_inodes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* A: Don't unwatch here.

We want to avoid this if possible since the memory and file for PositionFile would be larger.

* B: Fix `refresh_watcher` to refresh PositionFile as well.

It could be possible.

* C: Cache the list in `refresh_watcher` and use it in `update_watcher`

This is not good. It depends on the interval of refresh_watcher.

* D: Change the key of `@tails` to inode when enabling `follow_inodes`

It could be possible.
We should consider the following PR for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I close this PR since it should be fixed by B or D.
Are you OK for closing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I close this! Thanks!

@daipom
Copy link
Contributor Author

daipom commented Jul 7, 2023

We should do this in another way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants