Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[in_tail] when watched files are deleted, logged Error: undefined method `pe=' for #<Fluent::TailInput::TailWatcher::NullIOHandler #920

Closed
tako774 opened this issue Apr 27, 2016 · 18 comments
Assignees
Labels
bug Something isn't working

Comments

@tako774
Copy link

tako774 commented Apr 27, 2016

On Windows, in_tail plugin causes error when watched files are deleted.
Error message is like below.

undefined method `pe=' for #<Fluent::TailInput::TailWatcher::NullIOHandler:::0x0000000XXXXXXXX>"`

(I have not tested on Linux, so I can't say it is OK on Linux...)

Steps to Reproduce

  1. start fluentd with using in_tail
  2. make a watched file
  3. write "hoge" to the file and save
  4. delete file on Windows Explorer

fluentd or td-agent version.

both of them

  • fluentd.0.14.0.pre.1 revision: @c874f9521b0679c8cd4f1853002dcab6b422c454 (clone on Apr 27)
  • fluentd.0.14.0.pre.1 revision: @aa9b87d931be2b014a86c5aa87ecf0bdddef7351 (clone on Mar 7)
  • it doesn't reproduce with 0.14.0.pre.1 from rubygems.org

Environment information, e.g. OS.

  • OS: Windows 2012
  • Ruby: 2.2.4 x64 (Rubyinstaller)

Configuration and Logs

It is at revision @c874f9521b0679c8cd4f1853002dcab6b422c454 log.

2016-04-27 14:23:27 +0900 [info]: reading config file path=".\\fluent.conf"
2016-04-27 14:23:27 +0900 [info]: starting fluentd-0.14.0.pre.1
2016-04-27 14:23:27 +0900 [info]: spawn command to main (windows) : C:/app/Ruby22-x64/bin/ruby.exe 'C:/app/Ruby22-x64/bin/fluentd' -c .\fluent.conf -o fluentd.log --no-supervisor
2016-04-27 14:23:28 +0900 [info]: reading config file path=".\\fluent.conf"
2016-04-27 14:23:28 +0900 [info]: starting fluentd-0.14.0.pre.1 without supervision
2016-04-27 14:23:28 +0900 [debug]: install_main_process_winsigint_handler***** installed main winsiginthandler
2016-04-27 14:23:28 +0900 [info]: gem 'fluentd' version '0.14.0.pre.1'
2016-04-27 14:23:28 +0900 [info]: adding match pattern="**" type="stdout"
2016-04-27 14:23:28 +0900 [info]: adding source type="tail"
2016-04-27 14:23:28 +0900 [info]: using configuration file: <ROOT>
  <system>
    log_level debug
  </system>
  <source>
    @type tail
    @id tail
    path "C:\\app\\fluentd\\test.log"
    tag "test.log"
    format none
    pos_file "test.pos"
    read_from_head true
  </source>
  <match **>
    @type stdout
  </match>
</ROOT>
2016-04-27 14:23:35 +0900 [info]: following tail of C:\app\fluentd\test.log
2016-04-27 14:23:35 +0900 fluent.info: {"message":"following tail of C:\\app\\fluentd\\test.log"}
2016-04-27 14:23:40 +0900 test.log: {"message":"hoge"}
2016-04-27 14:23:44 +0900 [info]: detected rotation of C:\app\fluentd\test.log; waiting 5 seconds
2016-04-27 14:23:44 +0900 fluent.info: {"message":"detected rotation of C:\\app\\fluentd\\test.log; waiting 5 seconds"}
2016-04-27 14:23:50 +0900 [info]: detected rotation of C:\app\fluentd\test.log
2016-04-27 14:23:50 +0900 [error]: undefined method `pe=' for #<Fluent::TailInput::TailWatcher::NullIOHandler:0x000000027030d0>
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/fluentd-0.14.0.pre.1/lib/fluent/plugin/in_tail.rb:481:in `swap_state'
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/fluentd-0.14.0.pre.1/lib/fluent/plugin/in_tail.rb:471:in `on_rotate'
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/fluentd-0.14.0.pre.1/lib/fluent/plugin/in_tail.rb:639:in `call'
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/fluentd-0.14.0.pre.1/lib/fluent/plugin/in_tail.rb:639:in `on_notify'
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/fluentd-0.14.0.pre.1/lib/fluent/plugin/in_tail.rb:406:in `on_notify'
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/fluentd-0.14.0.pre.1/lib/fluent/plugin/in_tail.rb:494:in `call'
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/fluentd-0.14.0.pre.1/lib/fluent/plugin/in_tail.rb:494:in `on_timer'
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/cool.io-1.4.3/lib/cool.io/loop.rb:88:in `run_once'
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/cool.io-1.4.3/lib/cool.io/loop.rb:88:in `run'
  2016-04-27 14:23:50 +0900 [error]: C:/app/Ruby22-x64/lib/ruby/gems/2.2.0/gems/fluentd-0.14.0.pre.1/lib/fluent/plugin/in_tail.rb:264:in `run'
2016-04-27 14:23:50 +0900 fluent.info: {"message":"detected rotation of C:\\app\\fluentd\\test.log"}
2016-04-27 14:23:50 +0900 fluent.error: {"message":"undefined method `pe=' for #<Fluent::TailInput::TailWatcher::NullIOHandler:0x000000027030d0>"}
@repeatedly repeatedly self-assigned this Apr 27, 2016
@repeatedly repeatedly added the bug Something isn't working label Apr 27, 2016
@repeatedly
Copy link
Member

We changed in_tail mechanizm in #788 and #811.
If you have a time, could you check these commits affect this issue or not?

@tako774
Copy link
Author

tako774 commented May 13, 2016

I confirmed this occurs at current master branch on my windows10 PC with Ruby2.2.

Then I unmerged #788 (@64c5d0c) from in_tail by text editor, it didn't reproduce.
Maybe change in #788 affects this issue.

Finally, I unmerge this section from current master, it didn't reproduce.

  -        io = FileWrapper.open(@path)
  -        stat = io.stat
  +       stat = FileWrapper.stat(@path)

open -> stat is good, but stat only is bad, maybe...

@repeatedly
Copy link
Member

Thanks for the investigation!

@nurse Do you know why this situation happens? open and stat have different error handling?

@nurse
Copy link
Contributor

nurse commented May 19, 2016

FileWrapper.open keeps given file opening forever, but FileWrapper.stat closes the file after create stat object.
By keeping the file open, it can't be deleted (or deleted from explorer but still exist on filesystem like Unix; this depends FILE_SHARE_DELETE flag).

@repeatedly
Copy link
Member

@nurse Do you have a good idea to resole this problem? Keeping same behaviour with Linux is better.

@nurse
Copy link
Contributor

nurse commented May 30, 2016

I don't understand why above difference affect this issue.

@tako774
Copy link
Author

tako774 commented May 31, 2016

I explored this issue with pry.

  • io = FileWrapper.open(@path) -> stat = io.stat pattern sequence
  1. "make file"
  2. => inode = <file index no.>, size = <file size> from stat = io.stat
  3. "delete file"
  4. => io = FileWrapper.open(@path) throws Errno::ENOENT
  5. => inode = nil, size = 0 from stat = io.stat
  6. => inode is changed, so on_rotate is called
  7. => logs detected rotation; wait 5 seconds
  8. => @rotate_handler = nil is executed
  • stat = FileWrapper.stat(@path) pattern sequence
  1. "make file"
  2. => inode = <file index no.>, size = <file size> from stat = FileWrapper.stat(@path)
  3. "delete file"
  4. => stat = FileWrapper.stat(@path) does not throws Errno::ENOENT
  5. => inode = 3484713384184393820 (=0x305c305c305c305c), size = file size from stat = FileWrapper.stat(@path)
  6. => inode is changed, so on_rotate is called
  7. => logs detected rotation; wait 5 seconds
  8. => @rotate_handler = nil is executed
  9. => `on_notify` is called in `on_rotate` > `update_watcher` > `setup_watcher` > `attach(loop)`
    
  10. => in on_notify, @inode = nil and @fsize = -1, but inode = 3484713384184393820, size = file size
  11. => then on_rotate is called (2nd time), and @io_handler is nil, so @io_handler = NullIOHandler.new is executed.
  12. on_notify, next~before 5 seconds elapse after No.6
  13. => inode and size not changed from No.5.
  14. on_notify, more than 5 seconds elapsed after No.6
  15. => stat = FileWrapper.stat(@path) throws exception Errno::ENOENT
  16. => inode = nil, size = 0
  17. => inode is changed, so on_rotate is called (3rd time)
  18. => logs detected rotation
  19. => undefined methodpe=' for #<Fluent::TailInput::TailWatcher::NullIOHandleroccurs inon_rotate>swap_state>@io_handler.pe`

@tako774
Copy link
Author

tako774 commented May 31, 2016

Maybe, fluentd expects FileWrapper.stat throws Errno::ENOENT, but it successfully returns with inode = 3484713384184393820 (=0x305c305c305c305c).

In FileWrapper.stat, GetFileInformationByHandle is called.
https://msdn.microsoft.com/ja-jp/library/cc429317.aspx
When file deleted at stat = FileWrapper.stat(@path) pattern, GetFileInformationByHandle returns 0x305c * 18.

@nurse
Copy link
Contributor

nurse commented May 31, 2016

As far as I checked, FileWrapper.stat(nonexistent) raised Errno::ENOENT as expected because the begginng of FileWrapper.stat calls File.stat(nonexistent_path), which raises Errno::ENOENT.
Even if it may raise other than Errno::ENOENT, when the path is invalid (EINVAL) or it doesn't have permission...

@tako774
Copy link
Author

tako774 commented Jun 9, 2016

FileWrapper.stat doesn't raise Errno::ENOENT or other exceptions only in 5 seconds from when the file is deleted (with explorer, cmd and so on) to when 5 seconds elapsed after "detected rotation; wait 5 seconds". After 5 seconds elapsed, FileWrapper.stat raises Errno::ENOENT as expected.

In that 5 seconds, deleted file can be seen from explorer or dir, but to open or access to the security descriptor by cacls is blocked with permission error.

FileWrapper.stat(deleted_but_existing_file_path) raises no exception, so this issue happens.

@nurse
Copy link
Contributor

nurse commented Jun 10, 2016

Thank your for inspection.

FileWrapper.stat doesn't raise Errno::ENOENT or other exceptions only in 5 seconds from when the file is deleted (with explorer, cmd and so on) to when 5 seconds elapsed after "detected rotation; wait 5 seconds". After 5 seconds elapsed, FileWrapper.stat raises Errno::ENOENT as expected.

Ah, it seems because of Lazy Write Back of Windows.
In that case, is it always returns 0x305c305c305c305c?
If so it can be avoided by following patch:

diff --git a/lib/fluent/plugin/file_wrapper.rb b/lib/fluent/plugin/file_wrapper.rb
index b6ecd41..6a7a39c 100644
--- a/lib/fluent/plugin/file_wrapper.rb
+++ b/lib/fluent/plugin/file_wrapper.rb
@@ -107,7 +107,11 @@ def ino
         return 0
       end

-      by_handle_file_information.unpack("I11Q1")[11] # fileindex
+      fileindex = by_handle_file_information.unpack("I11Q1")[11]
+      if fileindex == 0x305c305c305c305c
+        raise  Errno::ENOENT, @path
+      end
+      fileindex
     end

     def stat

@tako774
Copy link
Author

tako774 commented Jun 10, 2016

I checked with 32bit ruby (2.2.4p230 (2015-12-16 revision 53155) [i386-mingw32]), this issue does not happen.

Difference between 32bit and 64bit occurs in code of `WindowsFile.initialize' showed below.

      @file_handle = CreateFile.call(@path, access, sharemode,
                     0, creationdisposition, FILE_ATTRIBUTE_NORMAL, 0)
      if @file_handle == INVALID_HANDLE_VALUE
        err = GetLastError.call
        if err == ERROR_FILE_NOT_FOUND || err == ERROR_ACCESS_DENIED
          raise SystemCallError.new(2)
        end
        raise SystemCallError.new(err)
      end

When @path is deleted but existing file path,

  • with 32bit ruby: @file_handle = 0xFFFFFFFF, so @file_handle == INVALID_HANDLE_VALUE is satisfied and GetLastError.call returns 5, so err == ERROR_ACCESS_DENIED is staisfied. Then it does raise SystemCallError.new(2).
  • with 64bit ruby: @file_handle = 0xFFFFFFFFFFFFFFFF, so no error raised here.

I replaced all INVALID_HANDLE_VALUE to INVALID_HANDLE_VALUE64 and add INVALID_HANDLE_VALUE64 = 0xFFFFFFFFFFFFFFFF, then 64bit version works as 32bit.
But I don't know what is the best way to check invalid file handle with 64bit ruby (to check first 4 bytes, or last 4 bytes or all 8 bytes, or other ways...).

@tako774
Copy link
Author

tako774 commented Jun 10, 2016

In windows-pr library CreateFile is defined as return value L = Long
https://github.com/djberg96/windows-pr/blob/windows-pr-1.2.4/lib/windows/file.rb#L206

API.new('CreateFile', 'SLLPLLL', 'L')

Though INVALID_FILE_HANDLE is defined 0xFFFFFFFF.
https://github.com/djberg96/windows-pr/blob/windows-pr-1.2.4/lib/windows/handle.rb#L12

INVALID_HANDLE_VALUE = 0xFFFFFFFF

In 64bit environment it causes problem.

@nurse
Copy link
Contributor

nurse commented Jun 13, 2016

Hmm, it sounds windows-pr's issue. On mswin64, it must define INVALID_HANDLE_VALUE as 0xFFFFFFFFFFFFFFFF.

@repeatedly
Copy link
Member

After released new windows-pr, I will close this issue.

@cosmo0920
Copy link
Contributor

I've released windows-pr 1.2.5.
Please confirm this issue has been fixed.

@tako774
Copy link
Author

tako774 commented Jun 22, 2016

I updated window-pr and confirmed this issue didn't happen with both ruby2.2.4 x86/x64 for the scenario which is 2 times loop of make file -> delete file.
Thanks for your great job! 👍🐱👍

repeatedly added a commit that referenced this issue Jun 23, 2016
Update windows-pr lowest dependency to 1.2.5. fix #920
@repeatedly
Copy link
Member

I updated dependency version to v1.2.5.
Thanks for hard work!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants