-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement new parser for parser_syslog #2599
Conversation
39ec522
to
ede1ff1
Compare
Signed-off-by: Masahiro Nakagawa <[email protected]>
ede1ff1
to
68f3ec7
Compare
Signed-off-by: Masahiro Nakagawa <[email protected]>
if @with_priority | ||
if text.start_with?(PRI_START_CHAR) | ||
i = text.index('>'.freeze, 1) | ||
pri = text.slice(1, i - 1).to_i |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pri
can be 0
if i < 2
or text.slice(1, i - 1)
contains not number charactors. Is it acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://tools.ietf.org/html/rfc3164#section-4.1.1
The PRI part MUST have three, four, or five characters
Need it check i
is less than 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so because the purpose of this parser is for supporting more format, not strict parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so because the purpose of this parser is for supporting more format, not strict parser.
What kind of format do we want to support?
If this parser does not follow the rfc3164, it's probably better to use new message_format
name (rfc3164_ext or like that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this parser does not follow the rfc3164, it's probably better to use new message_format name (rfc3164_ext or like that).
The problem is existing products uses rfc3164
for it. rfc3164 describes the collection of existing message format and BSD spec but many existing tools doesn't follow rfc3164 strictly. This parser can support strict rfc3164 format, so it doesn't block user's usecase for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pri can be 0 if i < 2 or text.slice(1, i - 1) contains not number charactors. Is it acceptable?
Is it okay?
if time_end == SPLIT_CHAR | ||
time_str = text.slice(start, diff) | ||
start += 16 # time + ' ' | ||
elsif time_end == '.'.freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rfc3164 seems not to support subsecond time https://tools.ietf.org/html/rfc3164#section-4.1.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but some products send rfc3164 syslog message with subsecond time.
regexp version also supports subsecond time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then, why don't you change the method name? this is not parse_rfc3164
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment it on above.
lib/fluent/plugin/parser_syslog.rb
Outdated
end | ||
text.slice(i + 1, text.bytesize) | ||
else | ||
text.slice(i - diff, text.bytesize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, ident(TAG in rfc) does not exist, does message follow the rfc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure but some product sends such message. And this is good for capturing all MSG broken syslog messages.
lib/fluent/plugin/parser_syslog.rb
Outdated
msg.chomp! | ||
record['message'] = msg | ||
|
||
time = @time_parser.parse(time_str.squeeze(SPLIT_CHAR)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
time_str was sliced with size(15), right? why do we need to call String#squeeze?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember the actual product name but users hit the problem with the number of ' '.
regexp version also calls squeeze
for avoiding this problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value.gsub(/ +/, "")
is introduced by #68 ce1c410
But we don't need using String#squeeze
to parse time string including redundant spaces.
Now we can remove String#squeeze
here.
>> Time.strptime("Feb 5 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> Time.strptime("Feb 5 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> Time.strptime("Feb 5 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> Time.strptime("Feb 5 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> Time.strptime("Feb 05 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> strptime = Strptime.new("%b %d %H:%M:%S")
=> #<Strptime:0x000055e8adc01110>
>> strptime.exec("Feb 5 11:11:11")
=> 2019-02-05 11:11:11 +0900
>> strptime.exec("Feb 5 11:11:11")
=> 2019-02-05 11:11:11 +0900
>> strptime.exec("Feb 05 11:11:11")
=> 2019-02-05 11:11:11 +0900
>> strptime.exec("Feb 05 11:11:11")
=> 2019-02-05 11:11:11 +0900
>> strptime.exec("Feb 5 11:11:11")
=> 2019-02-05 11:11:11 +0900
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing regex does not specify the size of chars. but here specifies the fixed size(15).
So if the character which needs to call squeeze
is passed to this filter plugin, the parser will fail to parse it in any case, right?.
lib/fluent/plugin/parser_syslog.rb
Outdated
host = text.slice(start, diff) | ||
start += (diff + 1) | ||
|
||
i = text.index(SPLIT_CHAR, start) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the text is ${TIMESTAMP} ${HOST} test:val
, this parser won't work. and I think test:val
is valid syslog message.
ident
should be test
and msg
should be val
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very difficult problem because rfc3164 doesn't define how ident and message are separated. For the experience, here is an actual examples:
HEADER ident msg # No problem
HEADER ident: msg # No problem
HEADER ident:msg # Existing products send this as only msg, not ident:msg.
So to support both cases, adding option is better.
Signed-off-by: Masahiro Nakagawa <[email protected]>
Signed-off-by: Masahiro Nakagawa <[email protected]>
Signed-off-by: Masahiro Nakagawa <[email protected]>
lib/fluent/plugin/parser_syslog.rb
Outdated
end | ||
|
||
# header part | ||
time_diff = 15 # skip Mmm dd hh:mm:ss |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about time_width
or time_size
or time_str_size
?
lib/fluent/plugin/parser_syslog.rb
Outdated
cursor += 16 # time + ' ' | ||
elsif time_end == '.'.freeze | ||
# support subsecond time | ||
i = text.index(SPLIT_CHAR, time_diff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about time_end_pos
or something instead of i
?
return | ||
end | ||
|
||
i = text.index(SPLIT_CHAR, cursor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about host_end_pos
instead of i
?
lib/fluent/plugin/parser_syslog.rb
Outdated
yield nil, nil | ||
return | ||
end | ||
host_diff = i - cursor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about host_width
or host_size
?
lib/fluent/plugin/parser_syslog.rb
Outdated
end | ||
host_diff = i - cursor | ||
host = text.slice(cursor, host_diff) | ||
cursor += (host_diff + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant brackets?
lib/fluent/plugin/parser_syslog.rb
Outdated
else | ||
if text[i - 1] == ':'.freeze | ||
if text[i - 2] == ']'.freeze | ||
j = text.index('['.freeze, cursor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about left_bracket_pos
instead of j
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why .freeze
for ":", "[", "]"
? Shoud we extract them to constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can extract it. This is used at only one point so I use literal instead.
lib/fluent/plugin/parser_syslog.rb
Outdated
if text[i - 1] == ':'.freeze | ||
if text[i - 2] == ']'.freeze | ||
j = text.index('['.freeze, cursor) | ||
record['ident'] = text.slice(cursor, j - cursor) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a local value ident_size = j - cursor
?
lib/fluent/plugin/parser_syslog.rb
Outdated
if text[i - 2] == ']'.freeze | ||
j = text.index('['.freeze, cursor) | ||
record['ident'] = text.slice(cursor, j - cursor) | ||
record['pid'] = text.slice(j + 1, i - j - 3) # remove '[' / ']:' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding local values pid_start_pos = j + 1
and pid_size = i - j - 3
?
j
may be changed to left_bracket_pos
.
record['ident'] = text.slice(cursor, j - cursor) | ||
record['pid'] = text.slice(j + 1, i - j - 3) # remove '[' / ']:' | ||
else | ||
record['ident'] = text.slice(cursor, i - cursor - 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
lib/fluent/plugin/parser_syslog.rb
Outdated
msg.chomp! | ||
record['message'] = msg | ||
|
||
time = @time_parser.parse(time_str.squeeze(SPLIT_CHAR)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value.gsub(/ +/, "")
is introduced by #68 ce1c410
But we don't need using String#squeeze
to parse time string including redundant spaces.
Now we can remove String#squeeze
here.
>> Time.strptime("Feb 5 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> Time.strptime("Feb 5 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> Time.strptime("Feb 5 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> Time.strptime("Feb 5 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> Time.strptime("Feb 05 11:11:11", "%b %d %H:%M:%S")
=> 2019-02-05 11:11:11 +0900
>> strptime = Strptime.new("%b %d %H:%M:%S")
=> #<Strptime:0x000055e8adc01110>
>> strptime.exec("Feb 5 11:11:11")
=> 2019-02-05 11:11:11 +0900
>> strptime.exec("Feb 5 11:11:11")
=> 2019-02-05 11:11:11 +0900
>> strptime.exec("Feb 05 11:11:11")
=> 2019-02-05 11:11:11 +0900
>> strptime.exec("Feb 05 11:11:11")
=> 2019-02-05 11:11:11 +0900
>> strptime.exec("Feb 5 11:11:11")
=> 2019-02-05 11:11:11 +0900
@parser.configure('parser_type' => 'string') | ||
@parser.instance.parse("1990 Oct 22 10:52:01 TZ-6 scapegoat.dmz.example.org 10.1.2.32 sched[0]: That's All Folks!") { |time, record| | ||
expected = {'host' => 'scapegoat.dmz.example.org', 'ident' => 'sched', 'pid' => '0', 'message' => "That's All Folks!"} | ||
assert_not_equal(expected, record) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what the result of parsing it? I think assert_equal
should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actual generated record is not important for this case.
This test shows parser can't parse non-standard HEADER/MSG syslog message correctly, so assert_not_equal
seems no problem for me.
Signed-off-by: Masahiro Nakagawa <[email protected]>
Signed-off-by: Masahiro Nakagawa <[email protected]>
Signed-off-by: Masahiro Nakagawa <[email protected]>
If no other concerns, I will merge this patch. |
Signed-off-by: Masahiro Nakagawa [email protected]
Which issue(s) this PR fixes:
Fixes #2585
What this PR does / why we need it:
Changing regexp is difficult because it may break existing user environment, so
add non-regexp based parser to support more syslog message.
This parser is also 2x faster.
Docs Changes:
Add new parameters and log example.
Release Note:
Same as title.