Implement new parser for parser_syslog #2599

Merged
merged 8 commits into from
Sep 9, 2019
109 changes: 106 additions & 3 deletions lib/fluent/plugin/parser_syslog.rb
@@ -38,6 +38,10 @@ class SyslogParser < Parser
       config_param :message_format, :enum, list: [:rfc3164, :rfc5424, :auto], default: :rfc3164
       desc 'Specify time format for event time for rfc5424 protocol'
       config_param :rfc5424_time_format, :string, default: "%Y-%m-%dT%H:%M:%S.%L%z"
+      desc 'The parser type used to parse syslog message'
+      config_param :parser_type, :enum, list: [:regexp, :string], default: :regexp
ganmacs marked this conversation as resolved.
+      desc 'support colonless ident in string parser'
+      config_param :support_colonless_ident, :bool, default: true

       def initialize
         super
@@ -50,10 +54,17 @@ def configure(conf)
         @time_parser_rfc3164 = @time_parser_rfc5424 = nil
         @time_parser_rfc5424_without_subseconds = nil
         @support_rfc5424_without_subseconds = false
+        @regexp_parser = @parser_type == :regexp
         @regexp = case @message_format
                   when :rfc3164
-                    class << self
-                      alias_method :parse, :parse_plain
+                    if @regexp_parser
+                      class << self
+                        alias_method :parse, :parse_plain
+                      end
+                    else
+                      class << self
+                        alias_method :parse, :parse_rfc3164
+                      end
                     end
                     @with_priority ? REGEXP_WITH_PRI : REGEXP
                   when :rfc5424
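
The `configure` hunk selects the `parse` entry point per instance by aliasing inside the singleton class. A minimal standalone sketch of that dispatch pattern is shown here; `TinyParser` and its return values are hypothetical stand-ins for `SyslogParser`, not code from this PR.

```ruby
# Hypothetical stand-in for SyslogParser: only the dispatch pattern is real.
class TinyParser
  def initialize(parser_type)
    # `class << self` opens this instance's singleton class, so the alias
    # applies to this object only and other instances are unaffected.
    if parser_type == :regexp
      class << self
        alias_method :parse, :parse_plain
      end
    else
      class << self
        alias_method :parse, :parse_rfc3164
      end
    end
  end

  def parse_plain(text)
    [:regexp, text]
  end

  def parse_rfc3164(text)
    [:string, text]
  end
end

p TinyParser.new(:regexp).parse('x') # prints [:regexp, "x"]
p TinyParser.new(:string).parse('x') # prints [:string, "x"]
```

Because the alias is resolved once at configuration time, `parse` pays no per-call branch on `parser_type`.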
@@ -88,11 +99,16 @@ def parse_auto(text, &block)
           @regexp = @with_priority ? REGEXP_RFC5424_WITH_PRI : REGEXP_RFC5424
           @time_parser = @time_parser_rfc5424
           @support_rfc5424_without_subseconds = true
+          parse_plain(text, &block)
         else
           @regexp = @with_priority ? REGEXP_WITH_PRI : REGEXP
           @time_parser = @time_parser_rfc3164
+          if @regexp_parser
+            parse_plain(text, &block)
+          else
+            parse_rfc3164(text, &block)
+          end
         end
-        parse_plain(text, &block)
       end

       def parse_plain(text, &block)
@@ -137,6 +153,93 @@ def parse_plain(text, &block)

         yield time, record
       end
+
+      SPLIT_CHAR = ' '.freeze
+
+      def parse_rfc3164(text, &block)
+        pri = nil
+        cursor = 0
+        if @with_priority
+          if text.start_with?('<'.freeze)
+            i = text.index('>'.freeze, 1)
+            if i < 2
+              yield nil, nil
+              return
+            end
+            pri = text.slice(1, i - 1).to_i
Member

pri can be 0 if i < 2 or if text.slice(1, i - 1) contains non-numeric characters. Is that acceptable?

Member

https://tools.ietf.org/html/rfc3164#section-4.1.1

The PRI part MUST have three, four, or five characters

Do we need to check that i is less than 5?

Member Author

I don't think so, because the purpose of this parser is to support more formats, not to be a strict parser.

Member

> I don't think so, because the purpose of this parser is to support more formats, not to be a strict parser.

What kind of format do we want to support?
If this parser does not follow RFC 3164, it's probably better to use a new message_format name (rfc3164_ext or something like that).

Member Author

> If this parser does not follow RFC 3164, it's probably better to use a new message_format name (rfc3164_ext or something like that).

The problem is that existing products use the name rfc3164 for this. RFC 3164 describes a collection of existing message formats and the BSD spec, but many existing tools don't follow it strictly. This parser can still parse the strict RFC 3164 format, so in my view it doesn't block any user's use case.

Member

> pri can be 0 if i < 2 or if text.slice(1, i - 1) contains non-numeric characters. Is that acceptable?

Is it okay?
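
To make the question in this thread concrete, here is a standalone sketch of the PRI extraction step. `extract_pri` is a hypothetical helper, not part of the PR. It mirrors the diff's slicing and `to_i` conversion, and it additionally guards against a missing `>` (my assumption, since `String#index` returns `nil` in that case and the diff only checks `i < 2`).

```ruby
# Hypothetical helper mirroring the PRI step of parse_rfc3164.
# Returns [pri, cursor], or nil when the header is rejected.
def extract_pri(text)
  return nil unless text.start_with?('<')

  i = text.index('>', 1)
  # The nil guard is an addition for this sketch; the diff checks only `i < 2`.
  return nil if i.nil? || i < 2

  # String#to_i is lenient: non-numeric content converts to 0.
  [text.slice(1, i - 1).to_i, i + 1]
end

p extract_pri('<34>Oct 11 22:14:15 host su: msg') # prints [34, 4]
p extract_pri('<>Oct 11 22:14:15 host su: msg')   # prints nil
p extract_pri('<ab>Oct 11 22:14:15 host su: msg') # prints [0, 4]
p extract_pri('<123456>Oct 11 22:14:15 host')     # prints [123456, 8]
```

The last two calls show the lenient behavior under discussion: a non-numeric PRI becomes 0, and a PRI longer than the five characters RFC 3164 allows is still accepted.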

+            cursor = i + 1
+          else
+            yield nil, nil
+            return
+          end
+        end
+
+        # header part
+        time_size = 15 # skip Mmm dd hh:mm:ss
+        time_end = text[cursor + time_size]
+        if time_end == SPLIT_CHAR
+          time_str = text.slice(cursor, time_size)
+          cursor += 16 # time + ' '
+        elsif time_end == '.'.freeze
Member

RFC 3164 does not seem to support subsecond time: https://tools.ietf.org/html/rfc3164#section-4.1.2

Member Author

Yes, but some products send RFC 3164 syslog messages with subsecond time.
The regexp version also supports subsecond time.

Member

Then why not change the method name? This is not parse_rfc3164.

Member Author

I commented on this above.
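
For readers following the subsecond discussion, this is a standalone sketch of the timestamp slicing under review. `extract_time` is a hypothetical helper, not code from the PR. One deliberate difference, labeled as my assumption: it searches for the separator from `cursor + 15`, whereas the diff searches from `time_size`, and it returns `nil` when no separator is found.

```ruby
# Hypothetical helper mirroring the header-time step of parse_rfc3164.
# Returns [time_str, cursor], or nil when the timestamp is not recognized.
def extract_time(text, cursor = 0)
  time_size = 15 # length of "Mmm dd hh:mm:ss"
  time_end = text[cursor + time_size]
  if time_end == ' '
    # Strict RFC 3164 timestamp, followed by a single space.
    [text.slice(cursor, time_size), cursor + time_size + 1]
  elsif time_end == '.'
    # Non-standard subsecond part: read up to the next space.
    i = text.index(' ', cursor + time_size)
    return nil if i.nil? # guard added for this sketch

    [text.slice(cursor, i - cursor), i + 1]
  end
end

p extract_time('Feb 28 12:00:00 host')           # prints ["Feb 28 12:00:00", 16]
p extract_time('Feb 28 12:00:00.456 host')       # prints ["Feb 28 12:00:00.456", 20]
p extract_time('<6>Feb 28 12:00:00.456 host', 3) # prints ["Feb 28 12:00:00.456", 23]
p extract_time('garbage')                        # prints nil
```

The second and third calls are the case the thread is about: input that strict RFC 3164 does not define but that some senders emit.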

+          # support subsecond time
+          i = text.index(SPLIT_CHAR, time_size)
+          time_str = text.slice(cursor, i - cursor)
+          cursor = i + 1
+        else
+          yield nil, nil
+          return
+        end
+
+        i = text.index(SPLIT_CHAR, cursor)
Contributor

How about host_end_pos instead of i?

+        if i.nil?
+          yield nil, nil
+          return
+        end
+        host_size = i - cursor
+        host = text.slice(cursor, host_size)
+        cursor += host_size + 1
+
+        record = {'host' => host}
ganmacs marked this conversation as resolved.
+        record['pri'] = pri if pri
+
+        i = text.index(SPLIT_CHAR, cursor)
Contributor

How about ident_and_pid_end instead of i?

Member Author

This value is sometimes not the end of ident/pid, so i seems fine.
i is a common name for the current index.


+        # message part
+        msg = if i.nil? # for 'only non-space content case'
+                text.slice(cursor, text.bytesize)
+              else
+                if text[i - 1] == ':'.freeze
+                  if text[i - 2] == ']'.freeze
+                    left_braket_pos = text.index('['.freeze, cursor)
+                    record['ident'] = text.slice(cursor, left_braket_pos - cursor)
+                    record['pid'] = text.slice(left_braket_pos + 1, i - left_braket_pos - 3) # remove '[' / ']:'
+                  else
+                    record['ident'] = text.slice(cursor, i - cursor - 1)
Contributor

ditto.

+                  end
+                  text.slice(i + 1, text.bytesize)
Member

ditto

+                else
+                  if @support_colonless_ident
+                    if text[i - 1] == ']'.freeze
+                      left_braket_pos = text.index('['.freeze, cursor)
+                      record['ident'] = text.slice(cursor, left_braket_pos - cursor)
+                      record['pid'] = text.slice(left_braket_pos + 1, i - left_braket_pos - 2) # remove '[' / ']'
+                    else
+                      record['ident'] = text.slice(cursor, i - cursor)
+                    end
+                    text.slice(i + 1, text.bytesize)
+                  else
+                    text.slice(cursor, text.bytesize)
+                  end
+                end
+              end
+        msg.chomp!
ganmacs marked this conversation as resolved.
+        record['message'] = msg
+
+        time = @time_parser.parse(time_str)
+        record['time'] = time_str if @keep_time_key
+
+        yield time, record
+      end
     end
   end
 end
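
As a reading aid for the message part of `parse_rfc3164`, here is a standalone sketch of the ident/pid split. `split_ident` is a hypothetical helper, not code from the PR. It takes only the text after the host field, uses ranges instead of `slice` with lengths, and shares the diff's unhandled edge case of a `]` with no matching `[`.

```ruby
# Hypothetical helper mirroring the ident/pid/message split of parse_rfc3164.
# `rest` is the text after the host field. Returns [ident, pid, message].
def split_ident(rest, support_colonless_ident: true)
  i = rest.index(' ')
  return [nil, nil, rest] if i.nil? # only non-space content

  if rest[i - 1] == ':'
    if rest[i - 2] == ']'
      lb = rest.index('[')
      [rest[0...lb], rest[(lb + 1)...(i - 2)], rest[(i + 1)..-1]] # "ident[pid]: msg"
    else
      [rest[0...(i - 1)], nil, rest[(i + 1)..-1]]                 # "ident: msg"
    end
  elsif support_colonless_ident
    if rest[i - 1] == ']'
      lb = rest.index('[')
      [rest[0...lb], rest[(lb + 1)...(i - 1)], rest[(i + 1)..-1]] # "ident[pid] msg"
    else
      [rest[0...i], nil, rest[(i + 1)..-1]]                       # "ident msg"
    end
  else
    [nil, nil, rest] # treat everything as the message
  end
end

p split_ident('sshd[1234]: Accepted') # prints ["sshd", "1234", "Accepted"]
p split_ident('su: failed')           # prints ["su", nil, "failed"]
p split_ident('cron[42] job ran')     # prints ["cron", "42", "job ran"]
p split_ident('kernel panic now')     # prints ["kernel", nil, "panic now"]
p split_ident('kernel panic now', support_colonless_ident: false)
# prints [nil, nil, "kernel panic now"]
```

The last two calls show the trade-off behind `support_colonless_ident`: with it enabled, the first word of a message with no ident is taken as the ident.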