
FluentD syslog parser problem #2856

Closed
radekgs opened this issue Mar 3, 2020 · 5 comments · Fixed by #2886
Assignees
Labels
bug Something isn't working

Comments


radekgs commented Mar 3, 2020

Describe the bug
The syslog parser is including the hostname and ident as part of the time field

To Reproduce
We are sending syslog messages in rfc3164 format but with timestamps in "%Y-%m-%dT%H:%M:%SZ" format eg:

<14>2020-02-25T10:37:41Z podname-v1-5cc4869f45-sd7fr servicename[1089]: {\"level\":\"info\",\"msg\":\"http: TLS handshake error from 127.0.0.1:59100: tls: failed to verify client's certificate: x509: certificate signed by unknown authority\\n\",\"Version\":\"0.0.1\",\"Caller File\":\"/usr/local/go/src/log/log.go\",\"Caller Line\":172}

The logs as they appear in stdout include the hostname and ident as part of the date field. The syslog parser does not appear to be parsing the date correctly. eg:
logs we are seeing:

2020-03-03 10:14:29.000000000 +0000 syslog.user.info: {"time":"2020-03-03T10:14:29Z podname-v1-5cc4869f45-kqf2g servicename[260]:","host":"{\"level\":\"info\",\"msg\":\"http:","ident":"TLS","message":"41282: tls: failed to verify client's certificate: x509: certificate signed by unknown authority\\n\",\"Version\":\"0.0.1\",\"Caller File\":\"/usr/local/go/src/log/log.go\",\"Caller Line\":172}"}

--

Expected behavior
The delimiter at the end of the timestamp field should be recognised, and the timestamp, host, and ident fields should be parsed separately.

Your Environment
Running in Kubernetes pods

fluentd-1.3.3 from the gcr.io/google-containers/fluentd-elasticsearch:v2.4.0 image, which is the latest version available

Your Configuration

<match fluent.**>
    @type null
  </match>
  <source>
    @type syslog
    port 10514
    bind "0.0.0.0"
    tag "syslog"
    <parse>
      @type "syslog"
      keep_time_key true
      with_priority true
      message_format rfc3164
      time_format "%Y-%m-%dT%H:%M:%SZ"
    </parse>
  </source>
  <source>
    @type syslog
    port 24224
    bind "0.0.0.0"
    protocol_type tcp
    tag "syslog"
    <parse>
      @type "syslog"
      keep_time_key true
      with_priority true
      message_format rfc3164
      time_format "%Y-%m-%dT%H:%M:%SZ"
    </parse>
  </source>
  <match **>
    @type stdout
    <buffer>
      @type "memory"
      flush_interval 1s
    </buffer>
  </match>
  <match syslog.**>
    @type elasticsearch
    log_level info
    include_tag_key true
    host "elasticsearch"
    port 9200
    logstash_format true
    buffer_chunk_limit 2M
    buffer_queue_limit 8
    flush_interval 5s
    max_retry_wait 30
    disable_retry_limit
    num_threads 2
    <buffer>
      flush_thread_count 2
      flush_interval 5s
      retry_forever
      retry_max_interval 30
      chunk_limit_size 2M
      queue_limit_length 8
    </buffer>
  </match>

Your Error Log

$ kubectl logs -n logging fluentd-es-6f9d5c58dc-twxwz
2020-03-03 10:14:28 +0000 [info]: starting fluentd-1.3.3 without supervision pid=1 ruby="2.3.3"
2020-03-03 10:14:28 +0000 [info]: gem 'fluent-plugin-concat' version '2.3.0'
2020-03-03 10:14:28 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.11'
2020-03-03 10:14:28 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '3.0.2'
2020-03-03 10:14:28 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.1.6'
2020-03-03 10:14:28 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2020-03-03 10:14:28 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.3.0'
2020-03-03 10:14:28 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.1'
2020-03-03 10:14:28 +0000 [info]: gem 'fluentd' version '1.3.3'
2020-03-03 10:14:28 +0000 [info]: adding match pattern="fluent.**" type="null"
2020-03-03 10:14:28 +0000 [info]: adding match pattern="**" type="stdout"
2020-03-03 10:14:28 +0000 [info]: adding match pattern="syslog.**" type="elasticsearch"
2020-03-03 10:14:28 +0000 [warn]: 'log_level' is deprecated parameter name. use '@log_level' instead.
2020-03-03 10:14:28 +0000 [warn]: 'log_level' is deprecated parameter name. use '@log_level' instead.
2020-03-03 10:14:28 +0000 [info]: 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
2020-03-03 10:14:28 +0000 [warn]: Could not connect Elasticsearch or obtain version. Assuming Elasticsearch 5.
2020-03-03 10:14:28 +0000 [info]: adding source type="syslog"
2020-03-03 10:14:28 +0000 [info]: adding source type="syslog"
2020-03-03 10:14:28 +0000 [info]: using configuration file: <ROOT>
  <match fluent.**>
    @type null
  </match>
  <source>
    @type syslog
    port 10514
    bind "0.0.0.0"
    tag "syslog"
    <parse>
      @type "syslog"
      keep_time_key true
      with_priority true
      message_format rfc3164
      time_format "%Y-%m-%dT%H:%M:%SZ"
    </parse>
  </source>
  <source>
    @type syslog
    port 24224
    bind "0.0.0.0"
    protocol_type tcp
    tag "syslog"
    <parse>
      @type "syslog"
      keep_time_key true
      with_priority true
      message_format rfc3164
      time_format "%Y-%m-%dT%H:%M:%SZ"
    </parse>
  </source>
  <match **>
    @type stdout
    <buffer>
      @type "memory"
      flush_interval 1s
    </buffer>
  </match>
  <match syslog.**>
    @type elasticsearch
    log_level info
    include_tag_key true
    host "elasticsearch"
    port 9200
    logstash_format true
    buffer_chunk_limit 2M
    buffer_queue_limit 8
    flush_interval 5s
    max_retry_wait 30
    disable_retry_limit
    num_threads 2
    <buffer>
      flush_thread_count 2
      flush_interval 5s
      retry_forever
      retry_max_interval 30
      chunk_limit_size 2M
      queue_limit_length 8
    </buffer>
  </match>
</ROOT>
2020-03-03 10:14:28 +0000 [info]: starting fluentd worker pid=1 ppid=0 worker=0
2020-03-03 10:14:28 +0000 [info]: listening syslog socket on 0.0.0.0:24224 with tcp
2020-03-03 10:14:28 +0000 [info]: listening syslog socket on 0.0.0.0:10514 with udp
2020-03-03 10:14:28 +0000 [info]: fluentd worker is now running worker=0
2020-03-03 10:14:29.000000000 +0000 syslog.user.info: {"time":"2020-03-03T10:14:29Z podname-v1-5cc4869f45-kqf2g servicename[260]:","host":"{\"level\":\"info\",\"msg\":\"http:","ident":"TLS","message":"41282: tls: failed to verify client's certificate: x509: certificate signed by unknown authority\\n\",\"Version\":\"0.0.1\",\"Caller File\":\"/usr/local/go/src/log/log.go\",\"Caller Line\":172}"}

Additional context

repeatedly (Member) commented:

The problem is that your log doesn't follow the RFC 3164 timestamp format.
See: https://tools.ietf.org/html/rfc3164#section-4.1.2
The built-in rfc3164 parser assumes the Mmm dd hh:mm:ss format to detect the hostname/ident fields, so if the timestamp doesn't follow this format, the built-in parser doesn't work.

If you want to parse a non-standard format, you need to use the regexp parser or write a parser for your non-standard format.
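
For illustration only (fluentd's own rfc3164 detection is regex-based, not strptime-based): an RFC 3164 timestamp like "Mar 3 10:14:29" has a different shape from the ISO-8601 timestamp in the report. A minimal Ruby sketch of the mismatch using the standard library's `Date._strptime`, which returns a fields hash on success and `nil` on mismatch:

```ruby
require "date"

RFC3164_FORMAT = "%b %d %H:%M:%S" # Mmm dd hh:mm:ss, e.g. "Mar 3 10:14:29"

# A timestamp in RFC 3164 shape parses into a fields hash:
p Date._strptime("Mar 3 10:14:29", RFC3164_FORMAT)

# The ISO-8601 timestamp from the report does not match the format:
p Date._strptime("2020-03-03T10:14:29Z", RFC3164_FORMAT) # => nil
```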


radekgs commented Mar 4, 2020

Shouldn't the time_format parameter work in this case (https://docs.fluentd.org/parser/syslog#time_format)?

I have it set to time_format "%Y-%m-%dT%H:%M:%SZ", but it doesn't seem to recognise the end of the date in each log message.


ganmacs commented Mar 4, 2020

It's a bug in fluentd. time_format and rfc5424_time_format don't work, because fluentd uses this fixed regex

REGEXP = /^(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[^ :\[]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/

to match input texts.

Fluentd applies the regex first, and time_format is only applied to the already-matched text:

m = @regexp.match(text)
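
Running that fixed regex against the reported message reproduces the mis-split: the `<time>` group is written to consume three space-separated tokens (the RFC 3164 Mmm dd hh:mm:ss timestamp), so a single-token ISO-8601 timestamp lets it swallow the hostname and ident as well. A quick check, with the sample line shortened from the report and the priority already stripped:

```ruby
# The fixed regexp quoted above, from fluentd's built-in rfc3164 syslog parser:
REGEXP = /^(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[^ :\[]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/

# ISO-8601 timestamp is a single token, not three:
line = '2020-03-03T10:14:29Z podname-v1-5cc4869f45-kqf2g servicename[260]: ' \
       '{"level":"info","msg":"http: TLS handshake error from 127.0.0.1:41282: tls: bad cert"}'

m = REGEXP.match(line)
# <time> greedily takes three tokens: timestamp + hostname + ident
puts m[:time]  # => 2020-03-03T10:14:29Z podname-v1-5cc4869f45-kqf2g servicename[260]:
puts m[:host]  # => {"level":"info","msg":"http:
puts m[:ident] # => TLS
```

This matches the mangled record shown in the error log above.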

ganmacs added the bug label Mar 4, 2020

radekgs commented Mar 10, 2020

I have updated my config to use the regexp parser, but now I get "failed to parse message data" error messages.

I have tried the regexp in Fluentular and it worked as expected.

The full error log is:

2020-03-10 07:32:59 +0000 [warn]: failed to parse message data="<14>2020-03-10T07:32:59Z podname-v1-5cc4869f45-kqf2g servicename[260]: {\"level\":\"info\",\"msg\":\"http: TLS handshake error from 127.0.0.1:41060: tls: failed to verify client's certificate: x509: certificate signed by unknown authority\\n\",\"Version\":\"0.0.1\",\"Caller File\":\"/usr/local/go/src/log/log.go\",\"Caller Line\":172}"

The updated config file is:

2020-03-10 07:31:37 +0000 [info]: starting fluentd-1.3.3 without supervision pid=1 ruby="2.3.3"
2020-03-10 07:31:37 +0000 [info]: gem 'fluent-plugin-concat' version '2.3.0'
2020-03-10 07:31:37 +0000 [info]: gem 'fluent-plugin-detect-exceptions' version '0.0.11'
2020-03-10 07:31:37 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '3.0.2'
2020-03-10 07:31:37 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '2.1.6'
2020-03-10 07:31:37 +0000 [info]: gem 'fluent-plugin-multi-format-parser' version '1.0.0'
2020-03-10 07:31:37 +0000 [info]: gem 'fluent-plugin-prometheus' version '1.3.0'
2020-03-10 07:31:37 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.1'
2020-03-10 07:31:37 +0000 [info]: gem 'fluentd' version '1.3.3'
2020-03-10 07:31:37 +0000 [info]: adding match pattern="fluent.**" type="null"
2020-03-10 07:31:37 +0000 [info]: adding match pattern="**" type="stdout"
2020-03-10 07:31:37 +0000 [info]: adding match pattern="syslog.**" type="elasticsearch"
2020-03-10 07:31:37 +0000 [warn]: 'log_level' is deprecated parameter name. use '@log_level' instead.
2020-03-10 07:31:37 +0000 [warn]: 'log_level' is deprecated parameter name. use '@log_level' instead.
2020-03-10 07:31:37 +0000 [info]: 'flush_interval' is configured at out side of <buffer>. 'flush_mode' is set to 'interval' to keep existing behaviour
2020-03-10 07:31:37 +0000 [warn]: Could not connect Elasticsearch or obtain version. Assuming Elasticsearch 5.
2020-03-10 07:31:37 +0000 [info]: adding source type="syslog"
2020-03-10 07:31:37 +0000 [info]: adding source type="syslog"
2020-03-10 07:31:37 +0000 [info]: using configuration file: <ROOT>
  <match fluent.**>
    @type null
  </match>
  <source>
    @type syslog
    port 10514
    bind "0.0.0.0"
    tag "syslog"
    <parse>
      @type "regexp"
      expression /^\<(?<pri>[0-9]+)\>(?<time>[^ ]*) (?<host>[^ ]*) (?<ident>[^ :\[]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
      time_format "%Y-%m-%dT%H:%M:%S"
    </parse>
  </source>
  <source>
    @type syslog
    port 24224
    bind "0.0.0.0"
    protocol_type tcp
    tag "syslog"
    <parse>
      @type "regexp"
      expression /^\<(?<pri>[0-9]+)\>(?<time>[^ ]*) (?<host>[^ ]*) (?<ident>[^ :\[]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
      time_format "%Y-%m-%dT%H:%M:%S"
    </parse>
  </source>
  <match **>
    @type stdout
    <buffer>
      @type "memory"
      flush_interval 1s
    </buffer>
  </match>
  <match syslog.**>
    @type elasticsearch
    log_level info
    include_tag_key true
    host "elasticsearch"
    port 9200
    logstash_format true
    buffer_chunk_limit 2M
    buffer_queue_limit 8
    flush_interval 5s
    max_retry_wait 30
    disable_retry_limit
    num_threads 2
    <buffer>
      flush_thread_count 2
      flush_interval 5s
      retry_forever
      retry_max_interval 30
      chunk_limit_size 2M
      queue_limit_length 8
    </buffer>
  </match>
</ROOT>
2020-03-10 07:31:37 +0000 [info]: starting fluentd worker pid=1 ppid=0 worker=0
2020-03-10 07:31:37 +0000 [info]: listening syslog socket on 0.0.0.0:24224 with tcp
2020-03-10 07:31:37 +0000 [info]: listening syslog socket on 0.0.0.0:10514 with udp
2020-03-10 07:31:37 +0000 [info]: fluentd worker is now running worker=0


radekgs commented Mar 10, 2020

OK, it turns out the problem was caused by having the priority capture group in the regexp: according to the documentation, the syslog input plugin strips the priority from the message before sending it to the parser.

Changing the regexp to the following solved the problem:

<parse>
        @type regexp
        expression /^(?<time>[^ ]*) (?<host>[^ ]*) (?<ident>[^ :\[]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
        time_format %Y-%m-%dT%H:%M:%S
</parse>
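
A quick sanity check of that expression in plain Ruby, against a sample line (adapted from the report) whose <14> priority has already been stripped, the way the syslog input plugin delivers it to the parser; time, host, and ident now split correctly:

```ruby
# The working expression from the <parse> block above:
REGEXP = /^(?<time>[^ ]*) (?<host>[^ ]*) (?<ident>[^ :\[]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/

# Priority already removed by the syslog input plugin:
line = '2020-03-10T07:32:59Z podname-v1-5cc4869f45-kqf2g servicename[260]: {"level":"info"}'

m = REGEXP.match(line)
puts m[:time]    # => 2020-03-10T07:32:59Z
puts m[:host]    # => podname-v1-5cc4869f45-kqf2g
puts m[:ident]   # => servicename
puts m[:pid]     # => 260
puts m[:message] # => {"level":"info"}
```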
