Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot make to rotate log file of fluentd application on windows #2446

Closed
ISChetverikov opened this issue Jun 5, 2019 · 20 comments · Fixed by #2663
Closed

Cannot make to rotate log file of fluentd application on windows #2446

ISChetverikov opened this issue Jun 5, 2019 · 20 comments · Fixed by #2663
Labels

Comments

@ISChetverikov
Copy link

ISChetverikov commented Jun 5, 2019

fluentd or td-agent version.

td-agent-3.4.1-0-x64

Environment information:

Operating system:
Microsoft Windows 10 Enterprise 1703
BuildNumber: 15063
Version: 10.0.15063
OSArchitecture: 64-bit
Kernel version:

Your configuration

C:\opt\td-agent\c

# listen http port
<source>
  @type http
  port 9880
  bind 0.0.0.0
</source>

# send handled messages to elastic
<match **>
	@type copy
	<store>                                                 
		@type stdout
	</store> 
	<store>
	@type elasticsearch_dynamic
	host my.remote.host
	port 9200
	logstash_prefix buffer_test
	logstash_format true
	type_name logevent
	<buffer>
		@type file
		path buffer/Handled
		total_limit_size 10MB
		overflow_action drop_oldest_chunk
		queued_chunks_limit_size 10
		flush_interval 4s
		retry_forever true
		flush_thread_count 2
	</buffer>
	</store>
</match>

Your problem explanation. If you have an error logs, write it together.

Hello! I cannot make to rotate logs file of fluentd application. I use the following command under administrator user to launch program:

fluentd -c c -o C:\opt\td-agent\log --log-rotate-age 3 --log-rotate-size 1000000

But when file C:\opt\td-agent\log reaches the limit of 1000000 bytes, I get messages:

log shifting failed. Permission denied @ rb_file_s_rename - (C:\opt\td-agent\log, C:\opt\td-agent\log.0)
log writing failed. closed stream
log shifting failed. closed stream
log writing failed. closed stream
log shifting failed. closed stream
log writing failed. closed stream
log shifting failed. closed stream
log writing failed. closed stream
log shifting failed. closed stream
log writing failed. closed stream
log shifting failed. closed stream
log writing failed. closed stream
.... (after that many of these lines)
@repeatedly
Copy link
Member

Hmm... Does windows have different permission for file creation and file rename?

@repeatedly
Copy link
Member

rename requires delete permission. Could you check the permission?

@ISChetverikov
Copy link
Author

ISChetverikov commented Jun 12, 2019

Thank you for your response. I have checked file's ("C:\opt\td-agent\log") permissions and it has Full Control permissions including Delete.

@ISChetverikov
Copy link
Author

ISChetverikov commented Jun 12, 2019

I've tried to set Full Control permissions for Everyone on C:\opt folder, but log rotation still does not work

@ganmacs ganmacs added the windows label Jul 1, 2019
@Coldln
Copy link

Coldln commented Sep 16, 2019

I'm facing the same issue with Windows 7

@repeatedly
Copy link
Member

@cosmo0920 If you have a time, could you check this? fluentd log depends on serverengine.

@cosmo0920
Copy link
Contributor

cosmo0920 commented Sep 26, 2019

Yep, I'll check this.

I guess that Windows API complains that opened files cannot delete from other process or same process.
In Windows, emulating unlink behavior on Unix like OS is very hard.

@cosmo0920
Copy link
Contributor

cosmo0920 commented Sep 26, 2019

I checked the serverengine and Ruby logger library code.
ServerEngine delegates logrotate feature to Ruby logger library.

I found that Ruby logger does not use File::SHARE_DELETE flag which had been introduced in Ruby 2.3:

I guess that it's a bug for Ruby logger library.

@repeatedly
Copy link
Member

repeatedly commented Sep 26, 2019

Awesome! On windows, logger should set SHARE_DELETE by default or use parameter at initialize from user code?

@cosmo0920
Copy link
Contributor

On windows, logger should set SHARE_DELETE by default or use parameter at initialize from user code?

Yes!

@Coldln
Copy link

Coldln commented Sep 26, 2019

Unfourtantly, it isn't helping to add File::SHARE_DELETE in logger.rb

    def open_logfile(filename)
      begin
        open(filename, (File::WRONLY | File::APPEND | File::SHARE_DELETE))
      rescue Errno::ENOENT
        create_logfile(filename)
      end
    end
    def create_logfile(filename)
      begin
        logdev = open(filename, (File::WRONLY | File::APPEND | File::CREAT | File::EXCL | File::SHARE_DELETE))

@cosmo0920
Copy link
Contributor

Oops. Windows also complains lack of Permission in MoveFileEx (inside File.rename):

https://github.com/ruby/ruby/blob/e4be2fda3dbbfdb1f2ace697c96cf6bdd7dfef21/win32/win32.c#L5368

@cosmo0920
Copy link
Contributor

cosmo0920 commented Sep 26, 2019

Simple script does not complains permission denied....

require "logger"

logger = Logger.new('test.log', 5, 30*1024)

loop do
  logger.error "lost connection"
  logger.debug "got new connection"
end

It seems that the above script is working as expected.
It's curious....

@Ntzafrir
Copy link

Ntzafrir commented Sep 26, 2019

It seems that the above script is working as expected.
It's curious....

Try this one instead

require "logger"

logger = Logger.new('test.log', 5, 30*1024)
myOtherLogger = Logger.new('test.log', 5, 30*1024)

loop do
  logger.error "lost connection"
  logger.debug "got new connection"
end

A logger is created 3 times with same file name in the supervisor module

@Ntzafrir
Copy link

Unfourtantly, it isn't helping to add File::SHARE_DELETE in logger.rb

Quoting from the Release notes
new mode flag File::SHARE_DELETE is available. this flag means to permit deleting opened file on Windows, but currently this affect only files opened as binary. [Feature #11218]

@ShironB
Copy link

ShironB commented Oct 3, 2019

@Ntzafrir and I managed to isolate the problem, the problem is actually in fluentd code.
According to what we were able to understand is that the same log file (td-agent.log) is opened 3 times while the log shifting part closes only one of them before trying to rotate the log. When working with linux it doesn't affect the rotation as Linux OS doesn't hold the files in order to write to it, while Windows does.

In the code there are 3 places where the same log is being initialized:

  • 2 times in the fluentd.rb - one for the supervisor and one for the worker (while creating a new supervisor object) - lines 309-319
  • 1 time in the supervisor.rb - lines 279-283

As a workaround in our production system, we separated the logs to 3 different files via the following method:
in supervisor.rb:

      logger_initializer = Supervisor::LoggerInitializer.new(
        'c:\opt\td-agent\General.log', log_level, chuser, chgroup, log_opts,
        log_rotate_age: log_rotate_age,
        log_rotate_size: log_rotate_size
      )

in fluentd.rb:

if opts[:supervise]
  opts[:log_path] = 'c:\opt\td-agent\Supervisor.log'
  Fluent::Supervisor.new(opts).run_supervisor
else
  if opts[:standalone_worker] && opts[:workers] && opts[:workers] > 1
    puts "Error: multi workers is not supported with --no-supervisor"
    exit 2
  end
  opts[:log_path] = 'c:\opt\td-agent\td-agent.log'
  Fluent::Supervisor.new(opts).run_worker
end

That way most of the logs lines which are written by the worker are still written to td-agent.log and the rotation is able to perform.

@cosmo0920
Copy link
Contributor

I've reported this in Ruby language bug tracker.

@cosmo0920
Copy link
Contributor

cosmo0920 commented Oct 18, 2019

Log rotation relies on renaming the current log file (app.log -> app.log.0) and the reopening the log file (app.log). This does not work on Windows in a multi-process environment because you cannot rename a file open by another process on Windows. I think it should work in a multi-thread single-process environment on Windows.

I don't think this problem is fixable in logger. You should probably use an alternative logger implementation, maybe one that logs to another single process that can handle log rotation. Alternatively, you can have each worker process use its own log file, and then combine the log files later.

From the ticket, this issue seems to be Fluentd (or ServerEngine) logging strategy issue.
In Windows multi-process environment, we cannot rename a file which is opened by another process. This should be Windows system limitation.

Then, we should do the following:

  • Separate log file for each process on Windows
  • Create log process and aggregate log for other processes and other processes don't write log on Windows

Which is the better solution for this?

@repeatedly
Copy link
Member

I see... former seems to easy to implement because we can use SERVERENGINE_WORKER_ID to seperate log files.

@cosmo0920
Copy link
Contributor

cosmo0920 commented Oct 18, 2019

How about this patch?
In my Windows 10 pro environment, logrotation works but still a bit unstable....

diff --git a/lib/fluent/command/fluentd.rb b/lib/fluent/command/fluentd.rb
index 93e2c35c..53883946 100644
--- a/lib/fluent/command/fluentd.rb
+++ b/lib/fluent/command/fluentd.rb
@@ -311,6 +311,12 @@ exit 0 if early_exit
 
 require 'fluent/supervisor'
 if opts[:supervise]
+  if Fluent.windows?
+    require 'pathname'
+
+    log_path = Pathname(opts[:log_path]).sub_ext("-supervisor#{Pathname(opts[:log_path]).extname}").to_s
+    opts[:log_path] = log_path
+  end
   Fluent::Supervisor.new(opts).run_supervisor
 else
   if opts[:standalone_worker] && opts[:workers] && opts[:workers] > 1
diff --git a/lib/fluent/supervisor.rb b/lib/fluent/supervisor.rb
index 44715b55..35f443f0 100644
--- a/lib/fluent/supervisor.rb
+++ b/lib/fluent/supervisor.rb
@@ -354,13 +354,21 @@ module Fluent
         @log_rotate_size = log_rotate_size
       end
 
+      def worker_id_suffixed_path(worker_id, path)
+        require 'pathname'
+
+        Pathname(path).sub_ext("-#{worker_id}#{Pathname(path).extname}").to_s
+      end
+
       def init(process_type, worker_id)
         @opts[:process_type] = process_type
         @opts[:worker_id] = worker_id
 
         if @path && @path != "-"
           @logdev = if @log_rotate_age || @log_rotate_size
-                     Fluent::LogDeviceIO.new(@path, shift_age: @log_rotate_age, shift_size: @log_rotate_size)
+                     Fluent::LogDeviceIO.new(Fluent.windows? ?
+                                               worker_id_suffixed_path(worker_id, @path) : @path,
+                                             shift_age: @log_rotate_age, shift_size: @log_rotate_size)
                    else
                      File.open(@path, "a")
                    end

Result

The above patch works as bellow:

PS> bundle exec fluentd -c .\some_fluent.conf -o fluentd.log --log-rotate-size 2048 --log-rotate-age 20 -v
> ls fluentd-*.log*                                                                                                                                                                 
# ...
-a----        2019/10/18    18:23            150 fluentd-0.log
-a----        2019/10/18    18:23           2059 fluentd-0.log.0                                                        
-a----        2019/10/18    18:23           2059 fluentd-0.log.1                                                       
-a----        2019/10/18    18:21           2059 fluentd-0.log.10                                                       
-a----        2019/10/18    18:20           2101 fluentd-0.log.11                                                       
-a----        2019/10/18    18:20           2064 fluentd-0.log.12                                                       
-a----        2019/10/18    18:20           2120 fluentd-0.log.13                                                       
-a----        2019/10/18    18:20           2087 fluentd-0.log.14                                                       
-a----        2019/10/18    18:20           2059 fluentd-0.log.15                                                       
-a----        2019/10/18    18:19           2059 fluentd-0.log.16                                                       
-a----        2019/10/18    18:19           2059 fluentd-0.log.17                                                       
-a----        2019/10/18    18:19           2059 fluentd-0.log.18                                                       
-a----        2019/10/18    18:22           2059 fluentd-0.log.2                                                        
-a----        2019/10/18    18:22           2114 fluentd-0.log.3                                                        
-a----        2019/10/18    18:22           2136 fluentd-0.log.4                                                        
-a----        2019/10/18    18:22           2064 fluentd-0.log.5                                                        
-a----        2019/10/18    18:22           2059 fluentd-0.log.6                                                        
-a----        2019/10/18    18:21           2059 fluentd-0.log.7                                                        
-a----        2019/10/18    18:21           2059 fluentd-0.log.8                                                        
-a----        2019/10/18    18:21           2059 fluentd-0.log.9                                                        
-a----        2019/10/18    18:22           1724 fluentd-supervisor-0.log                                               
-a----        2019/10/18    18:22           2052 fluentd-supervisor-0.log.0                                            
-a----        2019/10/18    18:20           2051 fluentd-supervisor-0.log.2                                            
-a----        2019/10/18    18:16           2051 fluentd-supervisor-0.log.4 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants