Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix for RFC5424 parsing #2816

Merged
merged 4 commits into from
Feb 12, 2020
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 7 additions & 4 deletions lib/fluent/plugin/parser_syslog.rb
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,11 @@ class SyslogParser < Parser
REGEXP = /^(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[^ :\[]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
# From in_syslog default pattern
REGEXP_WITH_PRI = /^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[^ :\[]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
REGEXP_RFC5424 = /\A(?<time>[^ ]+) (?<host>[!-~]{1,255}) (?<ident>[!-~]{1,48}) (?<pid>[!-~]{1,128}) (?<msgid>[!-~]{1,32}) (?<extradata>(?:\-|\[(.*)\]))(?: (?<message>.+))?\z/m
REGEXP_RFC5424_WITH_PRI = /\A\<(?<pri>[0-9]{1,3})\>[1-9]\d{0,2} (?<time>[^ ]+) (?<host>[!-~]{1,255}) (?<ident>[!-~]{1,48}) (?<pid>[!-~]{1,128}) (?<msgid>[!-~]{1,32}) (?<extradata>(?:\-|\[(.*)\]))(?: (?<message>.+))?\z/m
REGEXP_RFC5424 = <<~'EOS'.chomp
(?<time>[^ ]+) (?<host>[!-~]{1,255}) (?<ident>[!-~]{1,48}) (?<pid>[!-~]{1,128}) (?<msgid>[!-~]{1,32}) (?<extradata>(?:\-|(?:\[.*?(?<!\\)\])+))(?: (?<message>.+))?
EOS
REGEXP_RFC5424_NO_PRI = Regexp.new('\\A' + REGEXP_RFC5424 + '\\z', Regexp::MULTILINE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\ for escaping seems not to be necessary since p Regexp.new('\\A', Regexp::MULTILINE) == Regexp.new('\A', Regexp::MULTILINE) reutrns true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They work the same. I preferring always escape it for readability, so people who come from other languages (like me) can distinguish string literal from control chars easily. What do you think of it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's IMO but I prefer the one without escape because the current value is so.
I can't think that (?:\\-|(?:\\[.*?(?<!\\\\)\\])+)) is readeable...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I tried to make it more readable by using Heredoc :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to change (?<extradata>(?:\-|(?:\[.*?(?<!\\)\])+)) ?
it makes sense to me for adding + after \]). but I can't see the point of adding the part of ?(?<!\\).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ applies for the whole group, because the message can carry mutiple SD-ELEMENT .

(?<!...) is a negative lookbehind to not match \] as an end of SD-ELEMENT since ] is allowed in SD PARAM-VALUE by escaping with a \ prefix

REGEXP_RFC5424_WITH_PRI = Regexp.new('\\A<(?<pri>[0-9]{1,3})\\>[1-9]\\d{0,2} ' + REGEXP_RFC5424 + '\\z', Regexp::MULTILINE)
ZigZagT marked this conversation as resolved.
Show resolved Hide resolved
REGEXP_DETECT_RFC5424 = /^\<.*\>[1-9]\d{0,2}/

config_set_default :time_format, "%b %d %H:%M:%S"
Expand Down Expand Up @@ -73,7 +76,7 @@ class << self
end
@time_format = @rfc5424_time_format unless conf.has_key?('time_format')
@support_rfc5424_without_subseconds = true
@with_priority ? REGEXP_RFC5424_WITH_PRI : REGEXP_RFC5424
@with_priority ? REGEXP_RFC5424_WITH_PRI : REGEXP_RFC5424_NO_PRI
when :auto
class << self
alias_method :parse, :parse_auto
Expand All @@ -96,7 +99,7 @@ def parse(text)

def parse_auto(text, &block)
if REGEXP_DETECT_RFC5424.match(text)
@regexp = @with_priority ? REGEXP_RFC5424_WITH_PRI : REGEXP_RFC5424
@regexp = @with_priority ? REGEXP_RFC5424_WITH_PRI : REGEXP_RFC5424_NO_PRI
@time_parser = @time_parser_rfc5424
@support_rfc5424_without_subseconds = true
parse_plain(text, &block)
Expand Down
12 changes: 6 additions & 6 deletions test/plugin/test_parser_syslog.rb
Original file line number Diff line number Diff line change
Expand Up @@ -237,7 +237,7 @@ def test_parse_with_rfc5424_message_and_without_priority
assert_equal "-", record["extradata"]
assert_equal "Hi, from Fluentd!", record["message"]
end
assert_equal(Fluent::Plugin::SyslogParser::REGEXP_RFC5424,
assert_equal(Fluent::Plugin::SyslogParser::REGEXP_RFC5424_NO_PRI,
@parser.instance.patterns['format'])
end

Expand All @@ -254,7 +254,7 @@ def test_parse_with_rfc5424_empty_message_and_without_priority
assert_equal "-", record["extradata"]
assert_nil record["message"]
end
assert_equal(Fluent::Plugin::SyslogParser::REGEXP_RFC5424,
assert_equal(Fluent::Plugin::SyslogParser::REGEXP_RFC5424_NO_PRI,
@parser.instance.patterns['format'])
end

Expand Down Expand Up @@ -294,14 +294,14 @@ def test_parse_with_rfc5424_structured_message
'message_format' => 'rfc5424',
'with_priority' => true,
)
text = '<16>1 2017-02-06T13:14:15.003Z 192.168.0.1 fluentd 11111 ID24224 [exampleSDID@20224 iut="3" eventSource="Application" eventID="11211"] Hi, from Fluentd!'
text = '<16>1 2017-02-06T13:14:15.003Z 192.168.0.1 fluentd 11111 ID24224 [exampleSDID@20224 iut="3" eventSource="Application" eventID="11211"] [Hi] from Fluentd!'
ZigZagT marked this conversation as resolved.
Show resolved Hide resolved
@parser.instance.parse(text) do |time, record|
assert_equal(event_time("2017-02-06T13:14:15.003Z", format: '%Y-%m-%dT%H:%M:%S.%L%z'), time)
assert_equal "11111", record["pid"]
assert_equal "ID24224", record["msgid"]
assert_equal "[exampleSDID@20224 iut=\"3\" eventSource=\"Application\" eventID=\"11211\"]",
record["extradata"]
assert_equal "Hi, from Fluentd!", record["message"]
assert_equal "[Hi] from Fluentd!", record["message"]
end
end

Expand All @@ -328,14 +328,14 @@ def test_parse_with_rfc5424_message_includes_right_bracket
'message_format' => 'rfc5424',
'with_priority' => true,
)
text = '<16>1 2017-02-06T13:14:15.003Z 192.168.0.1 fluentd 11111 ID24224 [exampleSDID@20224 iut="3" eventSource="Application" eventID="11211"] Hi, from Fluentd]!'
text = '<16>1 2017-02-06T13:14:15.003Z 192.168.0.1 fluentd 11111 ID24224 [exampleSDID@20224 iut="3" eventSource="Application" eventID="11211"] [Hi] from Fluentd]!'
@parser.instance.parse(text) do |time, record|
assert_equal(event_time("2017-02-06T13:14:15.003Z", format: '%Y-%m-%dT%H:%M:%S.%L%z'), time)
assert_equal "11111", record["pid"]
assert_equal "ID24224", record["msgid"]
assert_equal "[exampleSDID@20224 iut=\"3\" eventSource=\"Application\" eventID=\"11211\"]",
record["extradata"]
assert_equal "Hi, from Fluentd]!", record["message"]
assert_equal "[Hi] from Fluentd]!", record["message"]
end
end

Expand Down