Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add from_encoding parameter to in_tail plugin #1067

Merged

Conversation

footaku
Copy link
Contributor

@footaku footaku commented Jun 23, 2016

Encoding parameter was added in_tail on v.0.12.24 and I used it.
When I reading logs encoded in CP932+LF, fluentd output warn log like a below.

2016-06-01 05:14:40 +0000 [warn]: ... error="invalid byte sequence in UTF-8"
<source>
  @type tail
  path /path/to/log/foo.log
  pos_file /path/to/log/foo.log.pos
  tag foo.var
  encoding utf-8
  format /^ something to read $/
</source>
<match **>
  @type stdout
</match>

footaku added 2 commits June 23, 2016 21:53
Add configuration check new param "from_encoding",
and test convert from input source is "Hello world" in japanese Hiragana
that is encoded in cp932 to UTF-8.
Add new param "from_encoding".
If "encoding" param is only specified, process is same way as ever
to keep backword compatibility.
If two params, "encoding" and "from_encoding" are specified, process uses
```String.encode!(to, from)```.
@footaku footaku changed the title [WIP] Add from_encoding parameter to in_tail plugin Add from_encoding parameter to in_tail plugin Jun 26, 2016
@@ -64,14 +64,10 @@ def initialize
config_param :multiline_flush_interval, :time, default: nil
desc 'Enable the additional watch timer.'
config_param :enable_watch_timer, :bool, default: true
desc 'The encoding after conversion of the input.'
config_param :encoding, default: nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify type of this parameter.

@footaku
Copy link
Contributor Author

footaku commented Jun 27, 2016

@tagomoris
Thank you for reviewing. I would like to fix them.

end
else
if @from_encoding
$log.warn "'from_encoding' parameter must be specified in conjunction with 'encoding' paramter."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use log instead of $log to enable plugin logger (log_level for each plugin sections).
And I think it's ok to say just must be specified with "encoding" parameter.

Use log instead of $log.
Change the log messages so that the users are easy to understand.
Return as soon as possible, if encoding parameters are not specified.
@tagomoris
Copy link
Member

LGTM!

@tagomoris
Copy link
Member

@repeatedly Please check just once more.

@repeatedly repeatedly self-assigned this Aug 3, 2016
unless @encoding
if @from_encoding
log.warn "'from_encoding' parameter must be specified with 'encoding' parameter."
log.warn "'from_encoding' is ignored"
Copy link
Member

@repeatedly repeatedly Aug 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log is confusing.
If from_encoding must be specified with encoding, we should raise an ConfigError.

In addition, no need two line logs. 2nd line should be merged into 1st.

footaku added 2 commits August 4, 2016 06:24
I deleted 2nd log message.
Changed to raise error if 'encoding' and 'from_encoding' paramters are bad cofiguration.
Corresponding to ConfigError
@@ -251,7 +267,11 @@ def close_watcher_after_rotate_wait(tw)
def flush_buffer(tw)
if lb = tw.line_buffer
lb.chomp!
lb.force_encoding(@encoding) if @encoding
if @encoding && @from_encoding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is microbenchmark result. I think default should be fast. So f2 implementation is better for almost users.

% ruby b_enc_perf.rb
Rehearsal --------------------------------------------------------
f1 with nil / nil      0.840000   0.000000   0.840000 (  0.842853)
f2 with nil / nil      0.760000   0.000000   0.760000 (  0.769146)
f1 with true / true    0.870000   0.000000   0.870000 (  0.871896)
f2 with true / true    0.920000   0.000000   0.920000 (  0.917469)
f1 with true / false   0.990000   0.000000   0.990000 (  0.989552)
f2 with true / false   0.920000   0.010000   0.930000 (  0.926458)
----------------------------------------------- total: 5.310000sec

                           user     system      total        real
f1 with nil / nil      0.890000   0.000000   0.890000 (  0.890333)
f2 with nil / nil      0.820000   0.000000   0.820000 (  0.822715)
f1 with true / true    0.890000   0.000000   0.890000 (  0.885648)
f2 with true / true    0.940000   0.000000   0.940000 (  0.938831)
f1 with true / false   1.000000   0.000000   1.000000 (  1.003493)
f2 with true / false   0.950000   0.000000   0.950000 (  0.956331)

Code:

require 'benchmark'

class Foo
  def initialize(enc, from_enc)
    @enc = enc
    @from_enc = from_enc
  end

  def f1(num)
    num.times {
      if @enc && @from_enc
        1
      elsif @enc
        2
      end
    }
  end

  def f2(num)
    num.times {
      if @enc
        if @from_enc
          1
        else
          2
        end
      end
    }
  end
end

n = 20000000

Benchmark.bmbm do |x|
  x.report('f1 with nil / nil') {
    foo = Foo.new(nil, nil)
    foo.f1(n)
  }
  x.report('f2 with nil / nil') {
    foo = Foo.new(nil, nil)
    foo.f2(n)
  }

  x.report('f1 with true / true') {
    foo = Foo.new(true, true)
    foo.f1(n)
  }
  x.report('f2 with true / true') {
    foo = Foo.new(true, true)
    foo.f2(n)
  }

  x.report('f1 with true / false') {
    foo = Foo.new(true, false)
    foo.f1(n)
  }
  x.report('f2 with true / false') {
    foo = Foo.new(true, false)
    foo.f2(n)
  }
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for microbenchmark result!

For almost users, default setting should be faster than
the setting is specified encoding.
@tagomoris tagomoris merged commit 579fbf4 into fluent:master Aug 18, 2016
@tagomoris
Copy link
Member

Merged. Thank you for contribution!

ganmacs pushed a commit to ganmacs/fluentd that referenced this pull request Aug 31, 2016
* Add test code to check from_encoding param

Add configuration check new param "from_encoding",
and test convert from input source is "Hello world" in japanese Hiragana
that is encoded in cp932 to UTF-8.

* Add from_encoding param to in_tail plugin

Add new param "from_encoding".
If "encoding" param is only specified, process is same way as ever
to keep backword compatibility.
If two params, "encoding" and "from_encoding" are specified, process uses
```String.encode!(to, from)```.

* Specify type of encoding and from_encoding

* Fix configuration encoding and from_encoding

* Fix test code

Add test pattern from_encoding is only specified.

* Fix configure_encoding

Use log instead of $log.
Change the log messages so that the users are easy to understand.
Return as soon as possible, if encoding parameters are not specified.

* Use ConfigError instead of warnning-log

I deleted 2nd log message.
Changed to raise error if 'encoding' and 'from_encoding' paramters are bad cofiguration.

* Fix test code

Corresponding to ConfigError

* Change implementation of encode statement

For almost users, default setting should be faster than
the setting is specified encoding.
repeatedly pushed a commit that referenced this pull request Sep 5, 2016
* Add test code to check from_encoding param

Add configuration check new param "from_encoding",
and test convert from input source is "Hello world" in japanese Hiragana
that is encoded in cp932 to UTF-8.

* Add from_encoding param to in_tail plugin

Add new param "from_encoding".
If "encoding" param is only specified, process is same way as ever
to keep backword compatibility.
If two params, "encoding" and "from_encoding" are specified, process uses
```String.encode!(to, from)```.

* Specify type of encoding and from_encoding

* Fix configuration encoding and from_encoding

* Fix test code

Add test pattern from_encoding is only specified.

* Fix configure_encoding

Use log instead of $log.
Change the log messages so that the users are easy to understand.
Return as soon as possible, if encoding parameters are not specified.

* Use ConfigError instead of warnning-log

I deleted 2nd log message.
Changed to raise error if 'encoding' and 'from_encoding' paramters are bad cofiguration.

* Fix test code

Corresponding to ConfigError

* Change implementation of encode statement

For almost users, default setting should be faster than
the setting is specified encoding.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants