in_exec: Can't handle non-ASCII characters output #4460

Closed
daipom opened this issue Apr 3, 2024 · 1 comment · Fixed by #4533
Labels
bug Something isn't working

Comments

daipom commented Apr 3, 2024

Describe the bug

in_exec cannot handle command output that contains non-ASCII characters.

This is caused by the default encoding settings of child_process_execute:

  • external_encoding: ascii-8bit
  • internal_encoding: utf-8
  • encoding_options: invalid: :replace, undef: :replace

This always breaks non-ASCII characters: every byte outside the ASCII range has no mapping from ascii-8bit to utf-8, so it is replaced.

def child_process_execute(
    title, command,
    arguments: nil, subprocess_name: nil, interval: nil, immediate: false, parallel: false,
    mode: [:read, :write], stderr: :discard, env: {}, unsetenv: false, chdir: nil,
    internal_encoding: 'utf-8', external_encoding: 'ascii-8bit', scrub: true, replace_string: nil,
    wait_timeout: nil, on_exit_callback: nil,
    &block
)

encoding_options = {}
if scrub
  encoding_options[:invalid] = encoding_options[:undef] = :replace
  if replace_string
    encoding_options[:replace] = replace_string
  end
end

readio.set_encoding(external_encoding, internal_encoding, **encoding_options)
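
The effect of these defaults can also be reproduced at the string level, without any child process. The following is only a minimal sketch, assuming that the IO-level transcoding behaves like String#encode with the same options; the variable names are made up for illustration and are not part of Fluentd.

```ruby
# Bytes as they would arrive from the command, tagged as binary (ASCII-8BIT).
raw = "こんにちは\n".b

# The conversion the defaults ask for: binary -> UTF-8,
# replacing every byte that has no mapping.
converted = raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

# Relabeling the same bytes as UTF-8 (no conversion) keeps them intact.
relabeled = raw.dup.force_encoding(Encoding::UTF_8)

puts converted.inspect
puts relabeled.inspect
```

Each of the 15 bytes of the Japanese text is above 0x7F, so each one is replaced individually, while the trailing newline survives.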

We can easily confirm this IO behavior in irb:

irb(main):001:0> require "open3"
=> true
irb(main):002:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 6>, #<IO:fd 7>, #<Process::Waiter:0x00007f7d942fea40 run>]
irb(main):003:0> r_io.read
=> "こんにちは\n"
irb(main):004:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 8>, #<IO:fd 9>, #<Process::Waiter:0x00007f7d942d45b0 run>]
irb(main):005:0> r_io.set_encoding(Encoding::ASCII_8BIT, Encoding::UTF_8, invalid: :replace, undef: :replace)
=> #<IO:fd 9>
irb(main):006:0> r_io.read
=> "���������������\n"
irb(main):007:0> 

I'm wondering if we should fix the implementation of in_exec as follows:

diff --git a/lib/fluent/plugin/in_exec.rb b/lib/fluent/plugin/in_exec.rb
index c2851366..ab514957 100644
--- a/lib/fluent/plugin/in_exec.rb
+++ b/lib/fluent/plugin/in_exec.rb
@@ -74,9 +74,9 @@ module Fluent::Plugin
       super

       if @run_interval
-        child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], &method(:run))
+        child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], internal_encoding: nil, &method(:run))
       else
-        child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], &method(:run))
+        child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], internal_encoding: nil, &method(:run))
       end
     end

By specifying internal_encoding: nil, we can stop the automatic encoding conversion in child_process_execute.
This allows in_exec to handle non-ASCII characters.

Does the current automatic encoding conversion make any sense?
One possible reason for it is that the data is required to be utf-8 encoded.
Even so, I believe it is wrong for in_exec to always convert the command's output to utf-8, regardless of its actual encoding.

To Reproduce

Run Fluentd with the sample config shown under "Your Configuration".

Expected behavior

in_exec handles command output containing non-ASCII characters correctly.

Your Environment

- Fluentd version: 1.16.5
- Operating system: Ubuntu 20.04.6 LTS, Windows 10
- Kernel version: 5.15.0-101-generic

Your Configuration

<source>
  @type exec
  command "echo こんにちは"
  tag test
  <parse>
    @type none
  </parse>
</source>

<match test>
  @type stdout
</match>

Your Error Log

(No error, but I put the stdout output here.)

2024-04-03 16:51:59 +0900 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: parsing config file is succeeded path="/test/fluentd/config/in_exec/1.conf"
2024-04-03 16:51:59 +0900 [info]: gem 'fluentd' version '1.16.5'
2024-04-03 16:51:59 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type exec
    command "echo こんにちは"
    tag "test"
    <parse>
      @type "none"
    </parse>
  </source>
  <match test>
    @type stdout
  </match>
</ROOT>
2024-04-03 16:51:59 +0900 [info]: starting fluentd-1.16.5 pid=439655 ruby="3.2.2"
2024-04-03 16:51:59 +0900 [info]: spawn command to main:  cmdline=["/home/daipom/.rbenv/versions/3.2.2/bin/ruby", "-r/home/daipom/.rbenv/versions/3.2.2/lib/ruby/site_ruby/3.2.0/bundler/setup", "-Eascii-8bit:ascii-8bit", "/home/daipom/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/bin/fluentd", "-c", "/test/fluentd/config/in_exec/1.conf", "--under-supervisor"]
2024-04-03 16:51:59 +0900 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: adding match pattern="test" type="stdout"
2024-04-03 16:51:59 +0900 [info]: adding source type="exec"
2024-04-03 16:51:59 +0900 [info]: #0 starting fluentd worker pid=439675 ppid=439655 worker=0
2024-04-03 16:51:59 +0900 [info]: #0 fluentd worker is now running worker=0
2024-04-03 16:51:59.808444702 +0900 test: {"message":"���������������"}

Additional context

No response

daipom commented Apr 3, 2024

#4058 describes Ruby 3.3 behavior.
The new specification of Ruby 3.3 will fix this issue.
