Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add from_encoding parameter to in_tail plugin #1067

Merged
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 23 additions & 9 deletions lib/fluent/plugin/in_tail.rb
Original file line number Diff line number Diff line change
Expand Up @@ -64,14 +64,10 @@ def initialize
config_param :multiline_flush_interval, :time, default: nil
desc 'Enable the additional watch timer.'
config_param :enable_watch_timer, :bool, default: true
desc 'The encoding after conversion of the input.'
config_param :encoding, default: nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify type of this parameter.

desc 'The encoding of the input.'
config_param :encoding, default: nil do |encoding_name|
begin
Encoding.find(encoding_name)
rescue ArgumentError => e
raise ConfigError, e.message
end
end
config_param :from_encoding, default: nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

desc 'Add the log path being tailed to records. Specify the field name to be used.'
config_param :path_key, :string, default: nil

Expand All @@ -92,6 +88,8 @@ def configure(conf)

configure_parser(conf)
configure_tag
@encoding = parse_encoding_param(conf['encoding'])
@from_encoding = parse_encoding_param(conf['from_encoding'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's bad practice to get values by conf[] reference, because it's not affected from type conversion and default value configurations. It might be done by subclasses (3rd party plugin which inherits in_tail).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@from_encoding looks to require that @encoding is configured.
It should be checked at here.


@multiline_mode = conf['format'] =~ /multiline/
@receive_handler = if @multiline_mode
Expand All @@ -117,6 +115,14 @@ def configure_tag
end
end

def parse_encoding_param(encoding_name)
begin
Encoding.find(encoding_name) if encoding_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that it's invalid to call this method with nil argument, and there are no code to do such things.
So if in postposition looks nonsense.

rescue ArgumentError => e
raise ConfigError, e.message
end
end

def start
super

Expand Down Expand Up @@ -251,7 +257,11 @@ def close_watcher_after_rotate_wait(tw)
def flush_buffer(tw)
if lb = tw.line_buffer
lb.chomp!
lb.force_encoding(@encoding) if @encoding
if @encoding && @from_encoding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is microbenchmark result. I think default should be fast. So f2 implementation is better for almost users.

% ruby b_enc_perf.rb
Rehearsal --------------------------------------------------------
f1 with nil / nil      0.840000   0.000000   0.840000 (  0.842853)
f2 with nil / nil      0.760000   0.000000   0.760000 (  0.769146)
f1 with true / true    0.870000   0.000000   0.870000 (  0.871896)
f2 with true / true    0.920000   0.000000   0.920000 (  0.917469)
f1 with true / false   0.990000   0.000000   0.990000 (  0.989552)
f2 with true / false   0.920000   0.010000   0.930000 (  0.926458)
----------------------------------------------- total: 5.310000sec

                           user     system      total        real
f1 with nil / nil      0.890000   0.000000   0.890000 (  0.890333)
f2 with nil / nil      0.820000   0.000000   0.820000 (  0.822715)
f1 with true / true    0.890000   0.000000   0.890000 (  0.885648)
f2 with true / true    0.940000   0.000000   0.940000 (  0.938831)
f1 with true / false   1.000000   0.000000   1.000000 (  1.003493)
f2 with true / false   0.950000   0.000000   0.950000 (  0.956331)

Code:

require 'benchmark'

class Foo
  def initialize(enc, from_enc)
    @enc = enc
    @from_enc = from_enc
  end

  def f1(num)
    num.times {
      if @enc && @from_enc
        1
      elsif @enc
        2
      end
    }
  end

  def f2(num)
    num.times {
      if @enc
        if @from_enc
          1
        else
          2
        end
      end
    }
  end
end

n = 20000000

Benchmark.bmbm do |x|
  x.report('f1 with nil / nil') {
    foo = Foo.new(nil, nil)
    foo.f1(n)
  }
  x.report('f2 with nil / nil') {
    foo = Foo.new(nil, nil)
    foo.f2(n)
  }

  x.report('f1 with true / true') {
    foo = Foo.new(true, true)
    foo.f1(n)
  }
  x.report('f2 with true / true') {
    foo = Foo.new(true, true)
    foo.f2(n)
  }

  x.report('f1 with true / false') {
    foo = Foo.new(true, false)
    foo.f1(n)
  }
  x.report('f2 with true / false') {
    foo = Foo.new(true, false)
    foo.f2(n)
  }
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for microbenchmark result!

lb.encode!(@encoding, @from_encoding)
elsif @encoding
lb.force_encoding(@encoding)
end
@parser.parse(lb) { |time, record|
if time && record
tag = if @tag_prefix || @tag_suffix
Expand Down Expand Up @@ -300,7 +310,11 @@ def receive_lines(lines, tail_watcher)
def convert_line_to_event(line, es, tail_watcher)
begin
line.chomp! # remove \n
line.force_encoding(@encoding) if @encoding
if @encoding && @from_encoding
line.encode!(@encoding, @from_encoding)
elsif @encoding
line.force_encoding(@encoding)
end
@parser.parse(line) { |time, record|
if time && record
record[@path_key] ||= tail_watcher.path unless @path_key.nil?
Expand Down
58 changes: 58 additions & 0 deletions test/plugin/test_in_tail.rb
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,17 @@ def test_configure_encoding
end
end

def test_configure_from_encoding
# valid from_encoding
d = create_driver(SINGLE_LINE_CONFIG + 'from_encoding utf-8')
assert_equal Encoding::UTF_8, d.instance.from_encoding

# invalid from_encoding
assert_raise(Fluent::ConfigError) do
create_driver(SINGLE_LINE_CONFIG + 'from_encoding no-such-encoding')
end
end

# TODO: Should using more better approach instead of sleep wait

def test_emit
Expand Down Expand Up @@ -403,6 +414,28 @@ def test_encoding(data)
assert_equal(encoding, emits[0][2]['message'].encoding)
end

def test_from_encoding
d = create_driver %[
format /(?<message>.*)/
read_from_head true
from_encoding cp932
encoding utf-8
]

d.run do
sleep 1

File.open("#{TMP_DIR}/tail.txt", "w:cp932") {|f|
f.puts "\x82\xCD\x82\xEB\x81\x5B\x82\xED\x81\x5B\x82\xE9\x82\xC7".force_encoding(Encoding::CP932)
}
sleep 1
end

emits = d.emits
assert_equal("\x82\xCD\x82\xEB\x81\x5B\x82\xED\x81\x5B\x82\xE9\x82\xC7".force_encoding(Encoding::CP932).encode(Encoding::UTF_8), emits[0][2]['message'])
assert_equal(Encoding::UTF_8, emits[0][2]['message'].encoding)
end

# multiline mode test

def test_multiline
Expand Down Expand Up @@ -507,6 +540,31 @@ def test_multiline_encoding_of_flushed_record(data)
end
end

def test_multiline_from_encoding_of_flushed_record
d = create_driver %[
format multiline
format1 /^s (?<message1>[^\\n]+)(\\nf (?<message2>[^\\n]+))?(\\nf (?<message3>.*))?/
format_firstline /^[s]/
multiline_flush_interval 2s
read_from_head true
from_encoding cp932
encoding utf-8
]

d.run do
sleep 1
File.open("#{TMP_DIR}/tail.txt", "w:cp932") { |f|
f.puts "s \x82\xCD\x82\xEB\x81\x5B\x82\xED\x81\x5B\x82\xE9\x82\xC7".force_encoding(Encoding::CP932)
}

sleep 4
emits = d.emits
assert_equal(1, emits.length)
assert_equal("\x82\xCD\x82\xEB\x81\x5B\x82\xED\x81\x5B\x82\xE9\x82\xC7".force_encoding(Encoding::CP932).encode(Encoding::UTF_8), emits[0][2]['message1'])
assert_equal(Encoding::UTF_8, emits[0][2]['message1'].encoding)
end
end

def test_multiline_with_multiple_formats
File.open("#{TMP_DIR}/tail.txt", "wb") { |f| }

Expand Down