Support containerd log format #412

Open
byrnedo opened this issue Feb 23, 2020 · 34 comments

@byrnedo

byrnedo commented Feb 23, 2020

Hi, I'm running k3s using containerd instead of Docker.
The log format is different from Docker's.
AFAIK it would just involve changing the @type json to a regex for the container logs, see k3s-io/k3s#356 (comment).
Would anyone be up for doing this? Maybe with some kind of env var to switch on the containerd support, e.g. CONTAINER_RUNTIME=docker as the default, with containerd as an alternative.
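For illustration, here is the same log line as each runtime writes it (sample values, based on the examples later in this thread; the containerd format is timestamp, stream, tag, message):

    docker:     {"log":"hello\n","stream":"stdout","time":"2020-02-23T10:00:00.000000000Z"}
    containerd: 2020-02-23T10:00:00.000000000Z stdout F hello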

@arthurdarcet

You can add the env variable FLUENT_CONTAINER_TAIL_PARSER_TYPE with the value /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/ - it will make this daemonset work with the containerd logs
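For reference, set on the daemonset container that would look something like this (a sketch; the placement in the manifest is an assumption):

    env:
      - name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
        value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/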

@strigazi

The above regex worked for me! thanks!

Could we make it work for both containerd and docker without setting the type?

@faust64

faust64 commented May 31, 2020

Hi,

It may look like it works, but having dealt with OpenShift a lot lately, I can tell you're missing something: eventually, you'll see log messages being split into several records.

I've had to patch the /fluentd/etc/kubernetes.conf file.

We could indeed set FLUENT_CONTAINER_TAIL_PARSER_TYPE to /^(?<time>.+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.+)$/.

However we also need to add the following:

    <filter kubernetes.**>
      @type concat
      key log
      partial_key logtag
      partial_value P
      separator ""
    </filter>

Note that I'm capturing a logtag field, from the F and P values, which @arthurdarcet drops with [^ ]*.
We actually need those to reconstruct multi-line messages (P means a partial log line, while F marks the final part of a message).
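To illustrate (sample values), a long message split by the runtime shows up as:

    2020-05-31T12:00:00.000000000Z stdout P first chunk of a long line,
    2020-05-31T12:00:00.000000001Z stdout F  ended by an F record

The filter above concatenates every P record onto the records that follow it, up to the F record that closes the message.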

@vfolk30

vfolk30 commented Aug 11, 2020

I have enabled Rancher logging with fluentd for containerd, but I am still getting the issue. Below are the env variables I have set in the daemonset:
https://rancher.com/docs/rancher/v2.x/en/cluster-admin/tools/logging/

env:
- name: FLUENT_CONTAINER_TAIL_EXCLUDE_PATH
value: /var/log/containers/fluentd*
- name: FLUENTD_SYSTEMD_CONF
value: disable
- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/

output:

2020-08-11 16:07:29 +0000 [warn]: #0 pattern not matched: "2020-08-11T18:07:28.606198265+02:00 stdout F 2020-08-11 16:07:28 +0000 [warn]: #0 pattern not matched: \"2020-08-11T18:07:27.620512318+02:00 stdout F 2020-08-11 16:07:27 +0000 [warn]: #0 pattern not matched: \\\"2020-08-11T18:07:26.541424158+02:00 stdout F 2020-08-11 16:07:26 +0000 [warn]: #0 pattern not matched: \\\\\\\"2020-08-11T18:07:25.531461018+02:00 stdout F 2020-08-11 16:07:25 +0000 [warn]: #0 pattern not matched: \\\\\\\\\\\\\\\"2020-08-11T18:07:24.528268248+02:00 stdout F 2020-08-11 16:07:24 +0000 [warn]: #0 pattern not matched: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"2020-08-11T18:07:23.524149263+02:00 stdout F 2020-08-11 16:07:23 +0000 [warn]: #0 pattern not matched: \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"2020-08-11T18:07:23.187045754+02:00 stdout F 2020-08-11 16:07:23.186 [INFO][57] int_dataplane.go 976: Finished applying updates to dataplane. msecToApply=1.434144\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"\\\\\\\\\\\\\\\"\\\\\\\"\\\"\""

@arren-ru

@arthurdarcet @faust64 How is the regex string supposed to work in FLUENT_CONTAINER_TAIL_PARSER_TYPE if that variable is translated to the @type value in the parser configuration?
kubernetes.conf inside the container contains:

…
<source>
  @type tail
  …
  <parse>
    @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>
…

Allowed @type values: https://docs.fluentd.org/configuration/parse-section#type
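If that is the case, a regex cannot go straight into @type; a working regexp parser section would instead look something like this (a sketch using the regex from above):

    <parse>
      @type regexp
      expression /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </parse>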

@DarkBlaez

Maybe this should just be addressed with a flag. The issue has been present for a long time and impacts other vendors that choose to build value-added products around this. Word to the wise: Docker is not the only front-end to containers, and container evolution continues. Addressing this properly now, instead of with sloppy regex workarounds and manipulation, would be a good thing. Better to get in front of the issue than lag behind.

DB

@repeatedly
Member

We can put an additional plugin into the plugins directory, e.g. https://github.com/fluent/fluentd-kubernetes-daemonset/tree/master/docker-image/v1.11/debian-elasticsearch7/plugins
So if anyone provides a containerd log format parser, we can configure it via FLUENT_CONTAINER_TAIL_PARSER_TYPE.
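For example, a minimal parser dropped into that directory might look roughly like this (a hypothetical sketch, not an existing file; the plugin name cri and the defaults are assumptions):

    # parser_cri.rb - hypothetical sketch of a CRI log format parser
    require "fluent/plugin/parser_regexp"

    module Fluent
      module Plugin
        class CRIParser < RegexpParser
          # registered under the assumed name "cri"
          Fluent::Plugin.register_parser("cri", self)

          # CRI line layout: <time> <stream> <logtag> <log>
          config_set_default :expression,
                             /^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.*)$/
          config_set_default :time_format, "%Y-%m-%dT%H:%M:%S.%N%:z"
        end
      end
    end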

@DarkBlaez

DarkBlaez commented Aug 18, 2020

That would work, as I am willing to write a custom parser for this to contribute and save others the same issues. Rephrased: that is perhaps the best option, an additional plugin for this specific use case.

@faust64

faust64 commented Aug 18, 2020

@arren-ru: you are right, my mistake. FLUENT_CONTAINER_TAIL_PARSER_TYPE should be set to regexp, and then you'd set an expression with your actual regexp.

Either way, that's not something you can currently configure only using environment variables.
You're looking for something like this: https://github.com/faust64/kube-magic/blob/master/custom/roles/logging/templates/fluentd.j2#L31-L51

@arren-ru

@faust64 I solved this by overriding kubernetes.conf with a configmap mounted in place of the original configuration with modified content; this gives a basic working solution:

      <source>
        @type tail
        @id in_tail_container_logs
        path /var/log/containers/*.log
        pos_file /var/log/fluentd-containers.log.pos
        tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
        exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
        read_from_head true
        <parse>
          @type regexp
          expression /^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<flags>[^ ]+) (?<message>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
        </parse>
      </source>

      <filter kubernetes.**>
        @type kubernetes_metadata
        @id filter_kube_metadata
        kubernetes_url "#{'https://' + ENV.fetch('KUBERNETES_SERVICE_HOST') + ':' + ENV.fetch('KUBERNETES_SERVICE_PORT') + '/api'}"
      </filter>
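Assuming the modified file is saved locally as kubernetes.conf, the ConfigMap could be created along these lines (the namespace is an assumption):

    kubectl create configmap fluentd-config --from-file=kubernetes.conf -n kube-system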

@i300543

i300543 commented Sep 23, 2020

Solutions of the kind presented here will cause a JSON log to be parsed as a string, so no fields defined in the JSON itself will be recognized as Elasticsearch fields, correct?

@arren-ru

arren-ru commented Oct 6, 2020

Solutions of the kind presented here will cause a JSON log to be parsed as a string, so no fields defined in the JSON itself will be recognized as Elasticsearch fields, correct?

I'm not sure I understood you, but CRI logs are plain text lines, not Docker's JSON logs, so if you want to parse JSON further you may want to add a pipelined parser or filter.

@mickdewald

I had an issue where my log file was filled with backslashes. I am using containerd instead of docker. I solved it by putting in the following configuration:

- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
  value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/

@m-usmanayub

I had an issue where my log file was filled with backslashes. I am using containerd instead of docker. I solved it by putting in the following configuration:

- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
  value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/

This did not work for me on 1.20.1 hosted on VMs. Still the same errors full of backslashes.

@vipinjn24

I am using containerd as the container runtime for Kubernetes and used the FLUENT_CONTAINER_TAIL_PARSER_TYPE env var.
Now the logs are somewhat readable, but the time format is incorrect, so an error is shown for that.

Any solution to this problem or can we change the time format by any env var?

@vipinjn24

vipinjn24 commented Jan 8, 2021

Ok, got it, here is how to fix this one.

First, we know that we need to change the logging format, as containerd does not use the JSON format but plain text,
so we add the below environment variable to the daemonset:

- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
  value: /^(?<time>.+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.*)$/

Now when we do this, it still shows an error with the time format.
To solve this, we extract the kubernetes.conf file from a running fluentd container, copy the contents to a config map, and mount that value at the kubernetes.conf location, i.e. /fluentd/etc/kubernetes.conf.

  volumeMounts:
  - name: fluentd-config
    mountPath: /fluentd/etc/kubernetes.conf
    subPath: kubernetes.conf
volumes:
- name: fluentd-config
  configMap:
    name: fluentd-config
    items:
    - key: kubernetes.conf
      path: kubernetes.conf

So, to fix the error, we update the following value inside the source:

<source>
      @type tail
      @id in_tail_container_logs
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag "#{ENV['FLUENT_CONTAINER_TAIL_TAG'] || 'kubernetes.*'}"
      exclude_path "#{ENV['FLUENT_CONTAINER_TAIL_EXCLUDE_PATH'] || use_default}"
      read_from_head true
      <parse>
        @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
        time_format %Y-%m-%dT%H:%M:%S.%N%:z
      </parse>
</source>

i.e. change time_format %Y-%m-%dT%H:%M:%S.%NZ to time_format %Y-%m-%dT%H:%M:%S.%N%:z.
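The reason for the change (illustrative sample values): CRI timestamps carry a numeric UTC offset rather than a literal Z suffix, as seen in the error output earlier in this thread:

    docker (JSON):    2021-01-08T10:00:00.123456789Z       -> %Y-%m-%dT%H:%M:%S.%NZ
    containerd (CRI): 2021-01-08T10:00:00.123456789+05:30  -> %Y-%m-%dT%H:%M:%S.%N%:z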

Now deploy the daemonset, it will work.

@cosmo0920
Contributor

cosmo0920 commented Mar 5, 2021

I've published new images that include the parser: #521, 2736b68

With FLUENT_CONTAINER_TAIL_PARSER_TYPE, we can now specify the cri type parser for parsing CRI format logs.

ref: https://github.com/fluent/fluentd-kubernetes-daemonset#use-cri-parser-for-containerdcri-o-logs
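Per the linked README section, selecting it should be a matter of (a sketch):

    env:
      - name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
        value: cri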

@cosmo0920 cosmo0920 self-assigned this Mar 5, 2021
@hari819

hari819 commented Apr 22, 2021

We are facing this issue with slashes ("\\"); we use the v1.12-debian-elasticsearch7-1 version of the daemonset and are currently testing the workarounds mentioned in this issue.

We would like to know if there will be a newer version of the daemonset that fixes the issue, or whether we need to use the workarounds permanently.

Thanks,

@optimus-kart

Is this issue fixed with BDRK-3386?

@faust64

faust64 commented Sep 4, 2021

From what I can see, there's still no way to concatenate partial logs coming from containerd or cri-o, nor to pass a regular expression when FLUENT_CONTAINER_TAIL_PARSER_TYPE is set to regexp.

containerd and cri-o require something like the following, reconstructing logs split into multiple lines (partials):

    <filter kubernetes.**>
      @type concat
      key log
      partial_key logtag
      partial_value P
      separator ""
    </filter>

The filter above relies on a logtag field, defined as follows:

      <parse>
         @type regexp
         expression /^(?<logtime>.+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.*)$/i
         time_key logtime
         time_format %Y-%m-%dT%H:%M:%S.%N%:z
      </parse>

I'm not sure how to make that filter block and regexp conditional,
nor that we can come up with a configuration that would suit both containerd/cri-o and Docker.
You would also need to change some input sources (systemd units, the docker log file).
I've given up and written my own ConfigMap, based on the configuration shipping in this image, fixing the few bits I need.

@huangzixun123

You can add the env variable FLUENT_CONTAINER_TAIL_PARSER_TYPE with the value /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/ - it will make this daemonset work with the containerd logs

I was stuck on this question all day until I saw your answer! Love this answer and the author!

@ethanhallb

As per discussion and this change, make sure to turn off greedy parsing for the timestamp, e.g. change

^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$

to the non-greedy

^(?<time>.+?) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$

With greedy parsing, there's a chance of runaway logging (log errors caused by scraping log errors). Context:

fluent/fluent-bit#5078
fluent/fluent-bit@cf239c2

@nhnam6

nhnam6 commented Nov 13, 2022

Ok, got it, here is how to fix this one. […] Now deploy the daemonset, it will work.

Cool, it works well.


@helloxk617

Ok, got it, here is how to fix this one. […] Now deploy the daemonset, it will work.

It works for me! You are amazing, @vipinjn24.

@maitza

maitza commented Jan 18, 2023

I have a separate file outside kubernetes.conf, named tail_container_parse.conf, containing:

<parse>
  @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
  time_format "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT'] || '%Y-%m-%dT%H:%M:%S.%NZ'}"
</parse>

Just setting the env FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT in the daemonset with the above time_format fixed the problem for me.


@vipinjn24

I have a separate file outside kubernetes.conf, named tail_container_parse.conf, containing:

<parse>
  @type "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TYPE'] || 'json'}"
  time_format "#{ENV['FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT'] || '%Y-%m-%dT%H:%M:%S.%NZ'}"
</parse>

Just setting the env FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT in the daemonset with the above time_format fixed the problem for me.

Hmm let me see this one.

@QkiZMR

QkiZMR commented Feb 27, 2023

After reading the whole thread and experimenting with different settings posted here, I managed to get fluentd working with OKD4.

- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
  value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
- name: FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT
  value: '%Y-%m-%dT%H:%M:%S.%N%:z'

I set these two env vars and it works without overwriting any config files in the container.

@faust64

faust64 commented Feb 27, 2023

For the record, as it's now the 7th answer suggesting this ...
At some point, I gave the following sample: #412 (comment)
This is still valid.

With something like /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/, your [^ ]* drops a character that may be a P (partial) or an F (final).
If you do not concat partial lines up until the next final one, you will eventually have some logs broken down into several records. At which point: good luck finding anything in Kibana/Elasticsearch.
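A sketch combining the two approaches from this thread (assuming the env-var regex works in your image, as reported above, and that the concat plugin is installed): keep the logtag, then concatenate partials:

    - name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
      value: /^(?<time>.+) (?<stream>stdout|stderr) (?<logtag>[FP]) (?<log>.*)$/
    - name: FLUENT_CONTAINER_TAIL_PARSER_TIME_FORMAT
      value: '%Y-%m-%dT%H:%M:%S.%N%:z'

together with the concat filter shown earlier:

    <filter kubernetes.**>
      @type concat
      key log
      partial_key logtag
      partial_value P
      separator ""
    </filter>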

@kfirfer

kfirfer commented Mar 18, 2023

Hi

I don't know why the logs are not parsed as JSON first;
currently all my logs in Elasticsearch, for example, are under the "log" field.
This is my containers.input.conf configuration:

    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/containers.log.pos
      tag raw.kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S.%NZ
        </pattern>
        <pattern>
          format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
        </pattern>
      </parse>
    </source>

    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>

    # Concatenate multi-line logs
    <filter **>
      @id filter_concat
      @type concat
      key log
      use_first_timestamp true
      multiline_end_regexp /\n$/
      separator ""
      timeout_label @NORMAL
      flush_interval 5
    </filter>

    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @id filter_kubernetes_metadata
      @type kubernetes_metadata
      skip_labels true
    </filter>

    # Fixes json fields in Elasticsearch
    <filter kubernetes.**>
      @id filter_parser
      @type parser
      key_name log
      reserve_time true
      reserve_data true
      remove_key_name_field true
      <parse>
        @type multi_format
        <pattern>
          format json
        </pattern>
        <pattern>
          format none
        </pattern>
      </parse> 
    </filter>


@kfirfer

kfirfer commented Mar 18, 2023

Hi

I don't know why the logs are not parsed as JSON first […]

Never mind, I succeeded with this config:

    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/containers.log.pos
      tag raw.kubernetes.*
      #read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S.%NZ
        </pattern>
        <pattern>
          format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
          time_format %Y-%m-%dT%H:%M:%S.%N%:z
        </pattern>
      </parse>
    </source>

    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>

    ## Concatenate multi-line logs
    #<filter **>
    #  @id filter_concat
    #  @type concat
    #  key log
    #  use_first_timestamp true
    #  multiline_end_regexp /\n$/
    #  separator ""
    #  timeout_label @NORMAL
    #  flush_interval 5
    #</filter>

    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @id filter_kubernetes_metadata
      @type kubernetes_metadata
      skip_labels true
    </filter>

    # Fixes json fields in Elasticsearch
    <filter kubernetes.**>
      @id filter_parser
      @type parser
      key_name log
      reserve_time true
      reserve_data true
      remove_key_name_field true
      <parse>
        @type multi_format
        <pattern>
          format json
        </pattern>
        <pattern>
          format none
        </pattern>
      </parse>
    </filter>

htquach added a commit to htquach/amazon-cloudwatch-container-insights that referenced this issue May 3, 2023
The `containerd` runtime generates logs as a non-JSON string. When switched to the `containerd` runtime, `fluentd` will fail to parse any non-JSON log message and produce a large number of parse error messages in its own container logs.

Here is an open issue at `fluentd` repo:  fluent/fluentd-kubernetes-daemonset#412

**docker** runtime (a valid JSON string)
`{"log":"2023-05-02 20:17:16 +0000 [info]: #0 [filter_kube_metadata_host] stats - namespace_cache_size: 0, pod_cache_size: 0\n","stream":"stdout","time":"2023-05-02T20:17:16.666667387Z"}`

**containerd** runtime (just a string)
`2023-05-02T20:17:28.143532061Z stdout F 2023-05-02 20:17:28 +0000 [info]: #0 [filter_kube_metadata_host] stats - namespace_cache_size: 0, pod_cache_size: 0`

Here is an example of a short entry from a `fluentd` container log.
```
2023-05-02 19:51:40 +0000 [warn]: #0 [in_tail_fluentd_logs] pattern not matched: "2023-05-02T19:51:17.411234908Z stdout F \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\""
```
@rahulpandit21

We are getting an issue in the cri parser of fluent-bit after upgrading EKS to 1.24.

With the parser below, the log: prefix is missing when it forwards logs to Splunk:

[PARSER]
    Name        cri
    Format      regex
    Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L%z
