Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of SIGCONT after SIGTERM #4034

Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions lib/fluent/supervisor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -190,6 +190,12 @@ def install_supervisor_signal_handlers
$log.debug 'fluentd supervisor process got SIGUSR2'
supervisor_sigusr2_handler
end

term_handler = trap :TERM do
# After SIGTERM, disable the sigdump file by ignoring SIGCONT.
trap(:CONT, nil)
term_handler.call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This term_handler.call() will call Server.stop() in ServerEngine.
I think this is a nice way.

Copy link
Contributor

@daipom daipom Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safer to consider the possibility that term_hanler is nil?
(It seems to be usually impossible though. So I understand the idea of not needing a nil check here.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, term_handler may be a string.
I will add a check to see if it is a Proc.

Copy link
Contributor

@daipom daipom Feb 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! This can be String or Proc or nil according to the following document.

https://docs.ruby-lang.org/ja/latest/method/Signal/m/trap.html

I will add a check to see if it is a Proc.

So this looks good!

end
end

if Fluent.windows?
Expand Down Expand Up @@ -922,6 +928,9 @@ def install_main_process_signal_handlers
end

trap :TERM do
# After SIGTERM, disable the sigdump file by ignoring SIGCONT.
trap(:CONT, nil)

$log.debug "fluentd main process get SIGTERM"
unless @finished
@finished = true
Expand Down
88 changes: 86 additions & 2 deletions test/test_supervisor.rb
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ def setup
@tmp_dir = tmp_dir
@tmp_root_dir = File.join(@tmp_dir, 'root')
FileUtils.mkdir_p(@tmp_dir)
@sigdump_path = "/tmp/sigdump-#{$$}.log"
end

def teardown
Expand Down Expand Up @@ -191,7 +192,7 @@ def test_system_config
end
end

def test_main_process_signal_handlers
def test_usr1_in_main_process_signal_handlers
omit "Windows cannot handle signals" if Fluent.windows?

create_info_dummy_logger
Expand All @@ -213,6 +214,52 @@ def test_main_process_signal_handlers
$log.out.reset if $log && $log.out && $log.out.respond_to?(:reset)
end

def test_cont_in_main_processsignal_handlers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_cont_in_main_processsignal_handlers
def test_cont_in_main_process_signal_handlers

omit "Windows cannot handle signals" if Fluent.windows?

opts = Fluent::Supervisor.default_options
sv = Fluent::Supervisor.new(opts)
sv.send(:install_main_process_signal_handlers)

begin
Process.kill :CONT, $$
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Process.pid is preferred than $$ for readability.

Suggested change
Process.kill :CONT, $$
Process.kill :CONT, Process.pid

rescue
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this begin ... rescue ... end?


sleep 1

assert_true File.exist?(@sigdump_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Power assert style is preferred for better failure message:

Suggested change
assert_true File.exist?(@sigdump_path)
assert do
File.exist?(@sigdump_path)
end

assert_true ... shows just File.exist?(@sigdump_path) is not true on failure.
But assert {...} shows not only File.exist?(@sigdump_path) value but also @sigdump_path and so on.

ensure
File.delete(@sigdump_path) if File.exist?(@sigdump_path)
end

def test_term_cont_in_main_processsignal_handlers
omit "Windows cannot handle signals" if Fluent.windows?

create_debug_dummy_logger

opts = Fluent::Supervisor.default_options
sv = Fluent::Supervisor.new(opts)
sv.send(:install_main_process_signal_handlers)

begin
Process.kill :TERM, $$
Process.kill :CONT, $$
rescue
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this begin ... rescue ... end?


sleep 1

debug_msg = '[debug]: fluentd main process get SIGTERM' + "\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
debug_msg = '[debug]: fluentd main process get SIGTERM' + "\n"
debug_msg = "[debug]: fluentd main process get SIGTERM\n"

logs = $log.out.logs
assert{ logs.any?{|log| log.include?(debug_msg) } }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert{ logs.any?{|log| log.include?(debug_msg) } }
assert{ logs.any?{|log| log.include?(debug_msg) } }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be the same code.
I would like to know if there is a better way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry. This is garbage.


assert_false File.exist?(@sigdump_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert_false File.exist?(@sigdump_path)
assert do
not File.exist?(@sigdump_path)
end

ensure
$log.out.reset if $log && $log.out && $log.out.respond_to?(:reset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$log.out.reset if $log && $log.out && $log.out.respond_to?(:reset)
$log.out.reset if $log&.out&.respond_to?(:reset)

File.delete(@sigdump_path) if File.exist?(@sigdump_path)
end

def test_main_process_command_handlers
omit "Only for Windows, alternative to UNIX signals" unless Fluent.windows?

Expand All @@ -239,7 +286,7 @@ def test_main_process_command_handlers
$log.out.reset if $log && $log.out && $log.out.respond_to?(:reset)
end

def test_supervisor_signal_handler
def test_usr1_in_supervisor_signal_handler
omit "Windows cannot handle signals" if Fluent.windows?

create_debug_dummy_logger
Expand All @@ -260,6 +307,43 @@ def test_supervisor_signal_handler
$log.out.reset if $log && $log.out && $log.out.respond_to?(:reset)
end

def test_cont_in_supervisor_signal_handler
omit "Windows cannot handle signals" if Fluent.windows?

server = DummyServer.new
server.install_supervisor_signal_handlers
begin
Process.kill :CONT, $$
rescue
end

sleep 1

assert_true File.exist?(@sigdump_path)
ensure
File.delete(@sigdump_path) if File.exist?(@sigdump_path)
end

# TODO Sending SIGTERM terminates the test itself.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it seems to depend on the order of the test execution.

I haven't been able to check it clearly, but it looks like the proc obj of term_handler is set to that of the previous test.
This may cause unexpected test results.

We may need some measures for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I was mistaken.

# I think we have to figure out a good way.
# def test_term_cont_in_supervisor_signal_handler
# omit "Windows cannot handle signals" if Fluent.windows?
#
# server = DummyServer.new
# server.install_supervisor_signal_handlers
# begin
# Process.kill :TERM, $$
# Process.kill :CONT, $$
# rescue
# end
#
# sleep 1
#
# assert_not File.exist?(@sigdump_path)
# ensure
# File.delete(@sigdump_path) if File.exist?(@sigdump_path)
# end

def test_windows_shutdown_event
omit "Only for Windows platform" unless Fluent.windows?

Expand Down