Multiline log guidance #100

Closed
HaroonSaid opened this issue Nov 8, 2020 · 97 comments
Labels
enhancement Feature request or enhancement on existing features

Comments

@HaroonSaid

HaroonSaid commented Nov 8, 2020

We have the following configuration:

{
  "essential": true,
  "name": "log_router",
  "firelensConfiguration": {
    "type": "fluentbit",
    "options": {
      "enable-ecs-log-metadata": "true"
    }
  },
  "memoryReservation": 50,
  "image": "906394416424.dkr.ecr.${AWS_REGION}.amazonaws.com/aws-for-fluent-bit:latest"
}

We want multiline logs for stack traces, etc.
How should I configure Fluent Bit?

@PettitWesley
Contributor

Fluent Bit unfortunately does not yet have generic multiline logging support that can be used with FireLens. We are planning to work on it. For now, you must use Fluentd: https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluentd/multiline-logs

@zhonghui12 zhonghui12 added the enhancement Feature request or enhancement on existing features label Nov 20, 2020
@belangovan

@zhonghui12, @PettitWesley we are using a FireLens configuration with aws-for-fluent-bit for multi-destination log routing, with CloudWatch as one of the destinations. We need multiline log grouping to make the most of our logs. If there is a custom parser we can use to achieve it, that is also fine.

@PettitWesley
Contributor

@belangovan there has been no change in guidance since my last comment on this issue. Fluent Bit still only has multiline support that works when tailing a log file. It does not have generic multiline support that works with FireLens. We are planning to work on that sometime in the next few months. Until then, you have to use Fluentd for multiline.

@HaroonSaid
Author

Is this a feature you are planning to work on soon?

We just want to know how to plan for our organization.

Do we switch to Fluentd or wait? If we wait, how long?

@PettitWesley
Contributor

@HaroonSaid We have begun investigation for this project. We hope to get it launched within 2 months, however, there are no guarantees.

@corleyma

@PettitWesley Any updates re: whether this project is launching as intended? Debating whether we have to change an internal logging system to support fluentd or if we can wait for fluentbit multiline support to land.

@PettitWesley
Contributor

@corleyma The upstream maintainers are apparently working on it; I've been told that it should be ready/launched sometime in May.

@silvervest

@PettitWesley May... this year? Any update on this? It'd be a very useful feature for us.

@PettitWesley
Contributor

@silvervest Yeah it was supposed to be May of this year. Progress has been made upstream but the launch is delayed till sometime in June.

@PettitWesley
Contributor

This is launching very soon: fluent/fluent-bit#337 (comment)

@aaronrl95

Just to clarify, is the multi-line support now available for use in this image? Or are we still awaiting that implementation?

@hossain-rayhan
Contributor

Hi @aaronrl95, it was included in v2.18.0.

@aaronrl95

Ah great, thank you. Could you point me to the documentation on implementing that feature in our FireLens configuration? I'm struggling to find any.

@hossain-rayhan
Copy link
Contributor

You can follow this Firelens example.

@aaronrl95

@hossain-rayhan thank you for that, that's just what I'm looking for

@vinaykrish25aws

@hossain-rayhan Is this solution also applicable to JSON-format logs produced by Docker containers?

@hossain-rayhan
Contributor

@hossain-rayhan Is this solution also applicable to JSON-format logs produced by Docker containers?

@zhonghui12 or @PettitWesley can you answer this?

@zhonghui12
Contributor

zhonghui12 commented Aug 4, 2021

@hossain-rayhan Is this solution also applicable to JSON-format logs produced by Docker containers?

@zhonghui12 or @PettitWesley can you answer this?

I assume that if the JSON-format logs are split into multiple lines, they can be concatenated, as there is no obvious limit here: https://docs.fluentbit.io/manual/pipeline/filters/multiline-stacktrace. But maybe @PettitWesley can give a more certain answer.

Or maybe we should help to test it out.

@StasKolodyuk

@hossain-rayhan @zhonghui12 @PettitWesley hi guys, I've been trying to use multiline support to concatenate partial messages split by containerd (AWS Fargate), but it didn't work. I've been using the approach described by @hossain-rayhan with the following config:

[SERVICE]
    Flush 1
    Grace 30
    Log_Level debug

[FILTER]
    name                  multiline
    match                 *
    multiline.key_content log
    multiline.parser      cri, docker

Could you please take a look, thanks!

More details on my setup and what I'm trying to achieve:
I have a Spring Boot app that logs to stdout in JSON format using logstash-logback-encoder (one JSON log entry per line). There's a JSON field called "stack_trace" that may be very long. When a log line is longer than 16K characters (which usually happens with a stack trace), containerd (the AWS Fargate 1.4 runtime) splits it into several parts. Fluent Bit then receives those JSON fragments. At this point I'd like Fluent Bit to merge them and parse the result as JSON. However, as I said, I can't get this working right now.
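For the containerd case, the raw CRI log file format is "<timestamp> <stream> <tag> <content>". The tag is P for a partial fragment and F for the final one, and as far as I understand this is what the built-in cri multiline parser keys on.

Below is a minimal Python sketch of the reassembly. It assumes that raw file format; records arriving at FireLens over forward may not carry the tag at all. It buffers fragments per stream and only parses the JSON once the F fragment arrives. The function name and sample lines are hypothetical, and this is not Fluent Bit's implementation.

```python
import json
import re

# Assumed CRI file format: "<timestamp> <stream> <P|F> <content>".
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<tag>[PF]) (?P<content>.*)$"
)


def reassemble_cri(lines):
    """Join CRI partial (P) fragments up to and including the final (F) one.

    Returns (stream, content) tuples, one per logical log line. Fragments are
    buffered per stream so stdout and stderr never mix. Fragments still
    waiting for their F line at the end of input are not returned here.
    """
    pending = {}  # stream -> list of content pieces seen so far
    records = []
    for line in lines:
        m = CRI_LINE.match(line)
        if m is None:
            continue  # not a CRI-formatted line; skipped in this sketch
        stream, tag = m.group("stream"), m.group("tag")
        pending.setdefault(stream, []).append(m.group("content"))
        if tag == "F":
            records.append((stream, "".join(pending.pop(stream))))
    return records


# Hypothetical (shortened) JSON entry split into three fragments by the runtime.
sample = [
    '2021-08-05T10:00:00.000000001Z stdout P {"level":"ERROR","stack_trace":"java.lang.',
    '2021-08-05T10:00:00.000000002Z stdout P RuntimeException: boom\\n\\tat Foo.bar',
    '2021-08-05T10:00:00.000000003Z stdout F (Foo.java:1)"}',
]

for stream, content in reassemble_cri(sample):
    entry = json.loads(content)  # only valid JSON once all fragments are joined
    print(stream, entry["level"])
```

None of the individual fragments is valid JSON on its own. That is why the JSON parse has to come after concatenation.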

@PettitWesley
Contributor

@StasKolodyuk I think you need to create a custom multiline parser. I don't know exactly how to solve this use case with the new multiline support, but I suspect it should be possible with a custom parser that uses a custom regex.

https://docs.fluentbit.io/manual/pipeline/filters/multiline-stacktrace

@PettitWesley
Contributor

@vinaykrish25aws Yes, the new filter will work with JSON logs from Docker. In that case, the log content is in the log key, and you specify that key in the filter:

    multiline.key_content log

If the content of that key is itself nested JSON that needs to be recombined, that's a more complicated use case which might need a custom parser and/or additional parsing steps.
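To illustrate the nested case, here is a sketch using Docker's json-file format. This is an assumption: FireLens records arrive over forward instead. In that format a line longer than 16K is split across several records, and only the last chunk's log value ends with a newline.

The minimal Python sketch below concatenates on the log key and then does the extra step of parsing the application's own JSON. The function name and sample data are hypothetical. It models the idea only; it is not how the multiline filter's docker parser is implemented.

```python
import json


def reassemble_docker(json_lines, key="log"):
    """Concatenate Docker json-file records that the engine split apart.

    Assumption: a record whose `key` value does not end in a newline is a
    partial chunk; the next chunk that ends in a newline closes the message.
    Chunks still buffered at the end of input are dropped in this sketch.
    """
    buffered = []
    merged = []
    for raw in json_lines:
        record = json.loads(raw)
        buffered.append(record[key])
        if record[key].endswith("\n"):
            merged.append("".join(buffered))
            buffered = []
    return merged


# Hypothetical (shortened) input: one application JSON entry split into two records.
raw_records = [
    json.dumps({"log": '{"msg": "hello", "stack', "stream": "stdout"}),
    json.dumps({"log": '_trace": "at Foo.bar"}\n', "stream": "stdout"}),
]

for message in reassemble_docker(raw_records):
    # The extra "nested JSON" step: parse the application's JSON inside the log key.
    app_entry = json.loads(message)
    print(app_entry["stack_trace"])
```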

@opteemister

Hi, I have a similar problem.
We also have JSON logs split by Docker running on an AWS Fargate cluster.
I don't think JSON really matters here, because it is just a string. But even with the multiline filter, Fluent Bit can't concatenate such logs.
I double-checked that our logs have the log key and that all configuration is the same as in the documentation.

@shijupaul

The following configuration is not working for me: it does not merge Java stack traces into a single entry. Any thoughts?

Attached screenshots (not reproduced here): the Dockerfile, parsers_multiline.conf, extra.conf, and the relevant section from the task definition.

@PettitWesley
Contributor

@shijupaul Unfortunately, since this feature is new, we are still learning and understanding it as well, and we have very few working examples ourselves... so right now everyone is figuring it out.

So actually, if you or anyone in this thread gets a working example for a use case you think is reasonably common, please share it. This will benefit the community. I'm also slowly working on improving our FireLens/Fluent Bit FAQ/examples, and this data can be used for that.

Can you share what these java stack traces look like? And I recommend that you (and everyone) test their own logs with the regular expressions that you write in the multiline parser using the rubular website: https://rubular.com/

If the regexes don't work there with your logs, then that's the problem. That should be your first debugging step.
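Besides rubular, it can help to dry-run a whole rule table locally. The following is a rough Python model of how a regex-type multiline parser's start_state/cont rules group lines. It is based on my reading of the rule semantics, not on Fluent Bit's source.

The RULES table and sample lines are hypothetical. Python's re syntax also differs from the Onigmo engine Fluent Bit uses: for example, named groups are written (?P<name>...) rather than (?<name>...).

```python
import re

# Rule table shaped like [MULTILINE_PARSER] "rule" lines: (state, regex, next state).
# Hypothetical rules: a record starts with a date; continuation lines start with "at".
RULES = [
    ("start_state", r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", "cont"),
    ("cont", r"^\s+at\s", "cont"),
]


def group_lines(lines, rules):
    """Rough model of a regex multiline parser's state machine.

    A line matching a start_state rule opens a new record. While a record is
    open, a line matching a rule of the current state is appended and the
    state advances. Any other line flushes the open record; if it does not
    match start_state either, it is emitted on its own.
    """
    compiled = [(state, re.compile(rx), nxt) for state, rx, nxt in rules]
    records, current, state = [], None, None
    for line in lines:
        if current is not None:
            next_state = next(
                (nxt for st, rx, nxt in compiled if st == state and rx.search(line)),
                None,
            )
            if next_state is not None:
                current.append(line)
                state = next_state
                continue
            records.append("\n".join(current))  # flush the open record
            current, state = None, None
        start = next(
            (nxt for st, rx, nxt in compiled if st == "start_state" and rx.search(line)),
            None,
        )
        if start is not None:
            current, state = [line], start
        else:
            records.append(line)  # matched no start rule: passed through alone
    if current is not None:
        records.append("\n".join(current))
    return records


sample = [
    "2021-08-18 17:30:59 ERROR Request failed",
    "    at com.example.Foo.bar(Foo.java:10)",
    "    at com.example.Main.main(Main.java:5)",
    "2021-08-18 17:31:00 INFO Next request",
]

for record in group_lines(sample, RULES):
    print(repr(record))
```

Watch for traces whose first line is an un-indented exception, such as "java.lang.RuntimeException: bad request". In this model that line does not match a cont rule like ^\s+at, so it and the "at" lines after it come out as separate records. Such lines would need a cont rule of their own.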

@lbunschoten

Hello 👋 I thought I'd share my attempts here as well, as they might be useful to someone. I've been trying to get this to work for a couple of days now, but so far without any luck. I have a nearly identical setup to @shijupaul (I don't have the grep filter). I've been playing around with these regexes quite a bit, but they don't seem to have any effect at all. Even if I put in a regex like /.*/ for both rules, there is no difference in the end result. To be honest, I'm getting the feeling that the problem is elsewhere.

To verify my hypothesis, I have been trying a couple of things:

  • I've verified that my custom image is actually picked up -> Hash of the image on fargate matches my local version
  • Tried breaking the conf file on purpose by removing the [SERVICE] block -> task failed to start, so the conf file is picked up
  • Tried using a ton of different regexes, including crazy things like /.*/, no change in the outcome
  • Tried removing the multiline.key_content, no change either
  • Tried setting a larger duration for the flush -> no change either

I also ran it locally using fluent-bit -c multiline-parser.conf. I tried to mimic the fargate config, but used a tail input instead:

[SERVICE]
    Parsers_File parsers.conf
    Flush 1
    Grace 30

[INPUT]
    name              tail
    path              log.txt
    read_from_head    true

[FILTER]
    name                  multiline
    match                 *
    multiline.key_content log
    multiline.parser      multiline-regex-test

[FILTER]
    Name                  parser
    Match                 *
    Key_Name              log
    Parser                json
    Reserve_Data          True

[OUTPUT]
    name                  stdout
    match                 *

The interesting thing is that locally I do see an effect: multiple log lines are combined. I have a couple of theories now:

  • The multiline.key_content field is not supposed to be log, but something else. I don't have access to the raw logs yet, so it is a bit hard to verify.
  • The multiline parser does not work with the forward input for some reason.

Any tips or tricks are appreciated! In the meantime, I'll keep debugging

@f0o

f0o commented Aug 19, 2021

@lbunschoten @PettitWesley This is what I experienced as well...

Correct me if I'm wrong but I believe that the issue is the Source of the logs - Our images only get it as Forwarded messages from the emitter (https://github.com/aws/aws-for-fluent-bit/blob/mainline/fluent-bit.conf#L1-L4).

This might make it pointless to try to concat it through the use of metadata (like CRI's logtag or Docker's partial_message) because those could be filtered out or not forwarded to us in the first place.

That would match our experienced behavior here.

@magg

magg commented Mar 6, 2022

@PettitWesley can you share how to configure buffer mode in Fluent Bit with multiline support? I know the AWS image already has it preconfigured. According to this thread and your link I should enable buffer mode, but the official Fluent Bit documentation does not say what the valid values for buffer are: is it on/off or true/false?

[MULTILINE_PARSER]
    name          multiline-regex-test
    type          regex
    flush_timeout 1000
    Time_Key time
    Time_Format %d-%m-%y %H:%M:%S.%L
    #
    # Regex rules for multiline parsing
    # ---------------------------------
    #
    # configuration hints:
    #
    #  - first state always has the name: start_state
    #  - every field in the rule must be inside double quotes
    #
    # rules |   state name  | regex pattern                  | next state
    # ------|---------------|--------------------------------------------
    rule      "start_state"   "/^\[(?<time>[^\]]*)\] \[(?<source>[^\]]*)\] \[(?<level>[^\]]*)\] \[(?<uuid>[^\]]*)\] (?<log>.+)$/"  "cont"
    rule      "cont"          "/^\s+at.*/"                     "cont"

[SERVICE]
    HTTP_Server  On
    Parsers_File fluent-bit-parsers.conf
    Flush 5

[INPUT]
    Name forward

[FILTER]
    # multiline.* options belong to the multiline filter, not the parser filter
    Name multiline
    Match *
    multiline.key_content log
    multiline.parser      java, multiline-regex-test
    buffer on

Do you think this configuration will work?

@7nwwrkdnakht3

I have a very strange issue.

  1. I run an ECS container with supervisord as the root user. Supervisord launches Fluent Bit, which ships logs to S3 and Logz.io. When I tested the multiline configuration locally with stdout output, it worked fine. However, multiline does not work when Fluent Bit is started as the initial command.
  2. Logging into the ECS container and running the same command, without altering any configuration files, makes multiline work.

@PettitWesley
Contributor

@magg

valid values for buffer is it on/off or true/false?

On/Off/True/False (case-insensitive) are all supported for Fluent Bit boolean config values.

Your configuration looks correct to me. Looks like my doc changes only got deployed for 1.9 pre release: https://docs.fluentbit.io/manual/v/1.9-pre/pipeline/filters/multiline-stacktrace

@7nwwrkdnakht3 Can you please open a separate issue to troubleshoot your problem and please provide us with more details like the FB config.

@magg

magg commented Mar 18, 2022

hi @PettitWesley I have successfully tested the multiline parser with both images (the AWS image and the Fluent Bit one), using ECS containers on Fargate that send logs to OpenSearch

but the downside is that when I used a regular parser, I was able to match the fields declared in my regex (i.e. time, source, level, uuid) and see them in OpenSearch

this regular parser uses the same regex as the multiline parser in my comment above

[PARSER]
    Name main_parser
    Format regex
    Regex ^\[(?<time>[^\]]*)\] \[(?<source>[^\]]*)\] \[(?<level>[^\]]*)\] \[(?<uuid>[^\]]*)\] (?<log>.+)$
    Time_Key time
    Time_Format %d-%m-%y %H:%M:%S.%L

results in

{
  "_index": "okay-logs-2022.03.06",
  "_type": "_doc",
  "_id": "Pw2JXX8BrnHC10AfjuOW",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2022-03-06T04:42:49.699Z",
    "source": "multi-tenant-gateway",
    "level": "INFO",
    "uuid": "4a12f88f-2f85-44aa-aaa6-81998551da82",
    "log": "[http-nio-9080-exec-2] c.p.g.s.t.i.AsyncTenantCallbackService - Posting message to Tenant#10000."
  },
  "fields": {
    "@timestamp": [
      "2022-03-06T04:42:49.699Z"
    ]
  },
  "sort": [
    1646541769699
  ]
}

but now with multiline parsers I get this in OpenSearch

{
  "_index": "okay-logs-2022.03.11",
  "_type": "_doc",
  "_id": "Ww1TeX8BrnHC10AfP-zR",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2022-03-11T14:12:49.000Z",
    "level": "INFO",
    "logger": "com.protectoria.gateway.service.tenant.impl.AsyncTenantCallbackService",
    "thread": "http-nio-9080-exec-6",
    "message": "[11-03-22 14:12:49.703] [multi-tenant-gateway] [INFO] [bb973f29-3954-4d94-b7b4-d477f046c3a9] [http-nio-9080-exec-6] c.p.g.s.t.i.AsyncTenantCallbackService - Posting message to Tenant#10000.\n"
  },
  "fields": {
    "@timestamp": [
      "2022-03-11T14:12:49.000Z"
    ]
  },
  "sort": [
    1647007969000
  ]
}

Do you know whether field matching is not supported by multiline parsers, or why my configuration can't extract the fields defined in my regex?

@PettitWesley
Contributor

@magg I think your use case is parsing the log to split the data out into separate fields? That's a use case for the "normal" parsers and the normal parser filter. I don't think this is a use case for multiline: multiline is for joining multiple events into one event.

Also, you can use both. You can have two filters: concatenate events with multiline first, and then split fields out of the log using a normal parser, or the reverse.
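Here is a minimal Python sketch of the join-then-extract order. It reuses magg's regex, translated to Python's (?P<name>...) syntax.

The following parts are hypothetical: the start rule (a record begins with "["), the sample lines, and the function names. Using re.DOTALL is a choice made here so that the log field keeps the whole joined trace; check how your own parser regex treats newlines.

```python
import re

# magg's [PARSER] regex, translated to Python's (?P<name>...) group syntax.
MAIN_PARSER = re.compile(
    r"^\[(?P<time>[^\]]*)\] \[(?P<source>[^\]]*)\] \[(?P<level>[^\]]*)\] "
    r"\[(?P<uuid>[^\]]*)\] (?P<log>.+)$",
    re.DOTALL,  # let (?P<log>.+) span the continuation lines joined in stage 1
)
START = re.compile(r"^\[")  # hypothetical start rule: a record begins with "["


def concat_stage(lines):
    """Stage 1 (multiline): append lines that don't look like a new record."""
    events = []
    for line in lines:
        if START.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


def parser_stage(event):
    """Stage 2 (parser): split the joined event into fields."""
    m = MAIN_PARSER.match(event)
    return m.groupdict() if m else {"log": event}  # keep unparsed events as-is


lines = [
    "[11-03-22 14:12:49.703] [multi-tenant-gateway] [INFO] [bb973f29-3954-4d94-b7b4-d477f046c3a9] "
    "[http-nio-9080-exec-6] c.p.g.s.t.i.AsyncTenantCallbackService - Posting message to Tenant#10000.",
    # Hypothetical ERROR entry followed by a stack trace.
    "[11-03-22 14:12:50.001] [multi-tenant-gateway] [ERROR] [00000000-0000-0000-0000-000000000000] "
    "[http-nio-9080-exec-7] Request failed",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.Foo.bar(Foo.java:10)",
]

for event in concat_stage(lines):
    fields = parser_stage(event)
    print(fields["level"], fields["uuid"], fields["log"].splitlines()[0])
```

Running the field extraction first would only ever see single lines, so it could not attach the trace to the ERROR entry's fields.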

@magg

magg commented Mar 18, 2022

@PettitWesley yes I get the idea, I'm able to see the full Java stack trace thanks to the multiline parser.

But all my log lines, including the first line of each stack trace, have the format [timestamp][service][level][thread][class]. I wanted to extract those into fields for a better experience in OpenSearch. I wanted both options.

Do you have any example of using both filters? I cannot find it anywhere :(

@trallnag

trallnag commented Mar 29, 2022

So from what I understand it is currently not possible to use the input plugin tail with multiline.parser cri at the same time with the filter plugin multiline?

I'm using AWS EKS and I want to merge Java stack trace records into a single log record in CloudWatch, but I'm failing to do so.

[INPUT]
  name              tail
  tag               application.*
  path              /var/log/containers/*.log
  multiline.parser  cri
  skip_long_lines   on
  refresh_interval  10
  rotate_wait       30
  db                /var/fluent-bit/state/input-kube.db
  db.locking        true
  storage.type      filesystem
  mem_buf_limit     32MB

[FILTER]
  name                 kubernetes
  match                application.*
  kube_tag_prefix      application.var.log.containers.
  merge_log            on
  keep_log             off
  k8s-logging.parser   on
  k8s-logging.exclude  on
  labels               on
  annotations          off
  use_kubelet          true
  buffer_size          0

@PettitWesley
Contributor

@magg

Using two filters is just a matter of having two filter definitions, like:

[FILTER]
     Name multiline
     Match *
     ....

[FILTER]
     Name parser
     Match *
     ....

@PettitWesley
Copy link
Contributor

@trallnag

So from what I understand it is currently not possible to use the input plugin tail with multiline.parser cri at the same time with the filter plugin multiline?

You should be able to use both the tail multiline functionality and the filter. Don't use the same multiline parser for both; use a different parser for each, and it should work.

@trallnag

trallnag commented Mar 31, 2022

Yes! It works! Thanks a lot @PettitWesley, no idea what I did wrong last time I tried it. I will add my config here, maybe it can help someone in the future

Logs before getting it to work:

{"log": "com.netflix.discovery.shared.transport.TransportException: Cannot execute request on any known server"}
{"log": "\tat com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute(RetryableEurekaHttpClient.java:112) ~[eureka-client-1.9.17.jar!/:1.9.17]"}

Now I add the multiline filter to my config:

[INPUT]
    name              tail
    tag               kube.*
    path              /var/log/containers/*.log
    multiline.parser  cri
    skip_long_lines   on
    refresh_interval  10
    rotate_wait       30
    db                /var/fluent-bit/state/input-kube.db
    db.locking        true
    storage.type      filesystem
    mem_buf_limit     32MB
    buffer_max_size   128k

[FILTER]
    name multiline
    match kube.*
    multiline.parser java
    multiline.key_content log

And it merges the stack traces (CloudWatch screenshot not reproduced here).

Relevant line from fluent-bit logs:

[2022/03/31 21:59:02] [ info] [filter:multiline:multiline.0] created new multiline stream for tail.0_kube.var.log.containers[...]

Edit: As a "micro-optimization" you can also set buffer off on the multiline filter as it is not necessary for tail input.

@PettitWesley
Contributor

Note please see: fluent/fluent-bit#5235

If you have more than one multiline filter definition and they match the same records, it can cause all sorts of trouble. I am trying to figure out how to fix this.

@tatsuo48

tatsuo48 commented Apr 7, 2022

I want to combine docker logs.

https://docs.fluentbit.io/manual/v/1.9-pre/pipeline/filters/multiline-stacktrace

If you wish to concatenate messages read from a log file, it is highly recommended to use the multiline support in the Tail plugin itself. This is because performing concatenation while reading the log file is more performant. Concatenating messages originally split by Docker or CRI container engines, is supported in the Tail plugin.

Because of this note, I want to set multiline.parser docker in the INPUT, but my INPUT is forward.
Is there any plan to make multiline.parser docker available in the forward INPUT?

@PettitWesley
Contributor

@tatsuo48 that statement only applies to tail, because tail reads the logs in chunks from the file, so it's most efficient to do the multiline processing there. For the forward plugin, based on my understanding of the underlying code, there shouldn't be much of a difference between implementing buffering and multiline concatenation directly in forward versus in my filter. Filters effectively run in the same context as the input; they're a bit like extensions attached to an input, if that makes sense. So please just use the filter :)

@tatsuo48

tatsuo48 commented Apr 8, 2022

@PettitWesley

that statement only applies to tail, because tail gets the logs all at once in chunks read from the file and so it's most efficient to do the multiline processing there.

I understand what you are saying.
I will try it with FILTER. thank you!

@James96315

@trallnag If you set "Path_Key" in the input, can you still collect logs?

@trallnag

Btw I'm having issues with multiline atm:


@James96315, I don't see why it shouldn't work. But I haven't tried it.

@PettitWesley
Contributor

@James96315 yea Path_Key should work

@James96315

I'm using "amazon/aws-for-fluent-bit:2.24.0". If I add "Path_Key" to the "[INPUT]", it doesn't work. I got this from the Fluent Bit log:

2022-05-25T14:04:37.249061747Z stderr F [2022/05/25 14:04:37] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/app-spring-boot-demo-6d5dbf7b55-zm5jm_spring-boot-ns_app-spring-boot-demo-4178b6d80bd2adaeee57038e894318cbb7ba81971a75bd3cccd861eca856baa6.log, inode 31464022
2022-05-25T14:04:37.249067397Z stderr F [2022/05/25 14:04:37] [debug] [input:tail:tail.0] 0 new files found on path '/var/log/containers/app-spring-boot-demo*'
2022-05-25T14:04:40.544133604Z stderr F [2022/05/25 14:04:40] [debug] [input:tail:tail.0] inode=31464022 events: IN_MODIFY
2022-05-25T14:04:40.54419313Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes
2022-05-25T14:04:40.545447254Z stderr F [2022/05/25 14:04:40] [debug] [input:tail:tail.0] inode=31464022 events: IN_MODIFY
2022-05-25T14:04:40.545687975Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes
2022-05-25T14:04:40.545724571Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes
2022-05-25T14:04:40.545737097Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes
2022-05-25T14:04:40.545739314Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes
2022-05-25T14:04:40.545741563Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes
2022-05-25T14:04:40.545768163Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes
2022-05-25T14:04:40.545776386Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes
2022-05-25T14:04:40.545779263Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes
2022-05-25T14:04:40.545788355Z stderr F [2022/05/25 14:04:40] [debug] [input chunk] skip ingesting data with 0 bytes

Here is my config:

  fluent-bit.conf: |
    [SERVICE]
        Flush                       5
        Daemon                      off
        Log_level                   Debug
        Http_server                 On
        Http_listen                 0.0.0.0
        Http_port                   2022
        Parsers_File                parsers.conf
        storage.path                /var/fluent-bit/state/flb-storage/
        storage.sync                normal
        storage.checksum            Off
        storage.backlog.mem_limit   5M
    
    [INPUT]
        Name                tail
        Tag                 kube.var.log.containers.spring-boot.*
        Path                /var/log/containers/app-spring-boot-demo*
        Path_Key            file_name
        Skip_Long_Lines     On
        multiline.parser    cri
        DB                  /var/fluent-bit/state/flb_container-spring-boot.db
        DB.locking          true
        Docker_Mode         Off
        
        Mem_Buf_Limit       50MB
        Buffer_max_size     64K
        Refresh_Interval    10
        Rotate_Wait         30
        Storage.type        filesystem
        Read_from_Head      True

    [FILTER]
        Name                    multiline
        Match                   kube.var.log.containers.spring-boot.*
        multiline.key_content   log
        Multiline.parser        java

    [OUTPUT]
        Name                kinesis_streams
        Match               kube.var.log.containers.spring-boot.*
        Region              ap-south-1
        Stream              kds-spring-boot
        Retry_Limit         False
    

The sample log is as follows:

2022-05-23T16:01:30.941156659Z stdout F 2022-05-23 16:01:30.940 INFO  [http-nio-0.0.0.0-8080-exec-6] com.demo.petstore.PetstoreApplication : nginx forward
2022-05-23T16:01:30.944619148Z stdout F 2022-05-23 16:01:30.943 ERROR [http-nio-0.0.0.0-8080-exec-6] com.demo.petstore.PetstoreApplication : hello processing failed
2022-05-23T16:01:30.944654786Z stdout F java.lang.RuntimeException: bad request
2022-05-23T16:01:30.94465975Z stdout F  at com.demo.petstore.PetstoreApplication.hello(PetstoreApplication.java:24)
2022-05-23T16:01:30.944663547Z stdout F         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-05-23T16:01:30.944666992Z stdout F         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2022-05-23T16:01:30.944673364Z stdout F         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2022-05-23T16:01:30.944677207Z stdout F         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
2022-05-23T16:01:30.94468074Z stdout F  at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
2022-05-23T16:01:30.944683631Z stdout F         at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:150)
2022-05-23T16:01:30.944687277Z stdout F         at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:117)
2022-05-23T16:01:30.944699432Z stdout F         at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895)
2022-05-23T16:01:30.944702574Z stdout F         at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808)

If the parsing were correct, there would be 2 output log records, but I haven't been able to get that result.

@PettitWesley
Contributor

@James96315 Thanks for the report, just to be clear, when you remove Path_Key, you see these records concatenated? Can you share an example of concatenated/unconcatenated results please.

@James96315

James96315 commented May 26, 2022

concatenated

  1. If I remove "Path_Key", I also have to set "Buffer False", otherwise the pod will crash.
    [FILTER]
        Name                    multiline
        Match                   kube.var.log.containers.spring-boot.*
        multiline.key_content   log
        Multiline.parser        java
        Buffer                  False

  2. Even if I set Buffer to False and remove Path_Key, the log parsing is not right. The number of parsed records is sometimes 3, sometimes 5, which seems random. The logs keep getting split and are not parsed correctly.
  3. The correct parsing result should be as follows, just 2 records:

2022-05-23 16:01:30.940 INFO  [http-nio-0.0.0.0-8080-exec-6] com.demo.petstore.PetstoreApplication : nginx forward

2022-05-23 16:01:30.943 ERROR [http-nio-0.0.0.0-8080-exec-6] com.demo.petstore.PetstoreApplication : hello processing failed
java.lang.RuntimeException: bad request
at com.demo.petstore.PetstoreApplication.hello(PetstoreApplication.java:24)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
         at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:150)
         at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:117)
         at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:895)
         at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808)


@clouddev-code

Is it possible to pass first through the multiline filter and then through the rewrite_tag filter?
It looks like the multiline stream is created twice in the Fluent Bit container.

[2022/08/30 03:03:28] [ info] [filter:multiline:multiline.1] created new multiline stream for forward.1_app-firelens-0955bd55b48543688b97f84fd50df999
[2022/08/30 03:03:38] [ info] [filter:multiline:multiline.1] created new multiline stream for emitter.7_application_app-firelens-0955bd55b48543688b97f84fd50df999

aws-for-fluent-bit version 2.24.0

[FILTER]
   name                  multiline
   match                 *
   multiline.key_content log
   multiline.parser      multiline-regex-test

[FILTER]
   Name rewrite_tag
   Match *-firelens-*
   Rule  $log ^(.*\[access\]).*$ access_$TAG false

@PettitWesley
Contributor

@clouddev-code https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#rewrite_tag-filter-and-cycles-in-the-log-pipeline

The rewrite_tag filter re-emits records at the head of the pipeline, so they pass through the filters again under the new tag.
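
A toy Python model of that cycle, under simplifying assumptions: each filter matches tags with shell-style globs (fnmatch, which only approximates Fluent Bit's wildcard matching), and a rewrite_tag-like step with keep=false drops the original record and re-emits a re-tagged copy at the head of the pipeline. To keep the model finite it also assumes the rewrite step never re-processes its own output. It counts how often a multiline filter sees one log line. The run helper and the sample log line are hypothetical; the tag follows the FireLens "<container>-firelens-<task id>" shape seen in the logs above.

```python
import fnmatch
import re

# Rule regex from the rewrite_tag [FILTER] in the comment above
ACCESS = re.compile(r"^(.*\[access\]).*$")


def run(tag, log, multiline_match):
    """Push one record through [multiline, rewrite_tag] and return the tags multiline saw."""
    seen_by_multiline = []
    # Each queue entry: (tag, log, emitted_by_rewrite_tag)
    queue = [(tag, log, False)]
    while queue:
        cur_tag, cur_log, emitted = queue.pop(0)
        # Filter 1: multiline only processes records whose tag matches its Match pattern
        if fnmatch.fnmatch(cur_tag, multiline_match):
            seen_by_multiline.append(cur_tag)
        # Filter 2: rewrite_tag with keep=false. Assumption: it never re-processes its
        # own re-emitted records, which keeps this toy model from looping forever.
        if not emitted and fnmatch.fnmatch(cur_tag, "*-firelens-*") and ACCESS.match(cur_log):
            # The original record is dropped; a re-tagged copy goes back to the head
            queue.append(("access_" + cur_tag, cur_log, True))
        # Records that are not rewritten would continue on to the outputs here
    return seen_by_multiline


tag = "app-firelens-0955bd55b48543688b97f84fd50df999"
line = "10.0.0.1 [access] GET /health 200"

print(len(run(tag, line, "*")))      # 2: original tag, then access_<tag>
print(len(run(tag, line, "app-*")))  # 1: the rewritten tag no longer matches
```

In this model, a multiline filter with Match * sees the record under both tags, while a Match pattern that excludes the rewritten tag sees it only once.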

@TWCM

TWCM commented Dec 29, 2022

Hi, I'm looking for guidance on multiline logging for EKS Fargate. Here is the official documentation.

Is it possible to apply a multiline parser in the built-in log router? I checked the documentation, and adding a multiline filter to filters.conf is not allowed.
Here is the relevant content from the doc:

When creating the ConfigMap, take into account the following rules that Fargate uses to validate fields:

[FILTER], [OUTPUT], and [PARSER] are supposed to be specified under each corresponding key. For example, [FILTER] must be under filters.conf. You can have one or more [FILTER]s under filters.conf. The [OUTPUT] and [PARSER] sections should also be under their corresponding keys. By specifying multiple [OUTPUT] sections, you can route your logs to different destinations at the same time.

Fargate validates the required keys for each section. Name and match are required for each [FILTER] and [OUTPUT]. Name and format are required for each [PARSER]. The keys are case-insensitive.

Environment variables such as ${ENV_VAR} aren't allowed in the ConfigMap.

The indentation has to be the same for either directive or key-value pair within each filters.conf, output.conf, and parsers.conf. Key-value pairs have to be indented more than directives.

Fargate validates against the following supported filters: grep, parser, record_modifier, rewrite_tag, throttle, nest, modify, and kubernetes.

Fargate validates against the following supported output: es, firehose, kinesis_firehose, cloudwatch, cloudwatch_logs, and kinesis.

At least one supported Output plugin has to be provided in the ConfigMap to enable logging. Filter and Parser aren't required to enable logging.
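
A rough Python sketch of the filter check quoted above, assuming the validator simply compares each [FILTER]'s Name against the listed plugins. It shows why a multiline filter in filters.conf is rejected. This is not AWS's actual validation code; unsupported_filters is a hypothetical helper.

```python
import re

# Filters listed as supported in the EKS Fargate documentation quoted above
SUPPORTED_FILTERS = {
    "grep", "parser", "record_modifier", "rewrite_tag",
    "throttle", "nest", "modify", "kubernetes",
}

# Matches a "Name <plugin>" line; keys are case-insensitive per the quoted rules
NAME_LINE = re.compile(r"^[ \t]*name[ \t]+(\S+)", re.IGNORECASE | re.MULTILINE)


def unsupported_filters(filters_conf):
    """Return the plugin names in a filters.conf string that are not on the supported list."""
    return [name for name in NAME_LINE.findall(filters_conf)
            if name.lower() not in SUPPORTED_FILTERS]


filters_conf = """
[FILTER]
    Name                  multiline
    Match                 *
    multiline.key_content log
    multiline.parser      java

[FILTER]
    Name   grep
    Match  *
    Regex  log ERROR
"""

print(unsupported_filters(filters_conf))  # ['multiline']
```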

@PettitWesley
Contributor

@TWCM Supporting custom multiline parsers on EKS Fargate would be a new feature. Can you please open a feature request for it here: https://github.com/aws/containers-roadmap/issues

Please @-mention me in it.

@PettitWesley
Contributor

I'm going to close this issue, as it is very old and full multiline support was launched last year. Please open a new issue for any new multiline problems or requests.

@svrviny1324

@PettitWesley
hi here iam trying to use multiline parser and trying to merge logs which are related to same pod
below is my log formate before using multiline parser i can view logs in cloudwatch in below formate

{
"log": "2023-04-28T09:42:39.72883634Z stderr F [2023/04/28 09:42:39] [ info] [input:tail:tail.2] multiline core started",
"kubernetes": {
"pod_name": "fca-de-green-kafka-consumer-offset-resetter-6cf9856b8-5jffb",
"namespace_name": "",
"pod_id": "",
"host": "",
"container_name": "resetter",
"docker_id": "",
"container_hash": "",
"container_image": ""
}
}
{
"log": "2023-04-28T09:42:39.729908443Z stderr F [2023/04/28 09:42:39] [ info] [input:systemd:systemd.3] seek_cursor=s=7f2279a6a5d640418ee14fca72a59e8a;i=f38... OK",
"kubernetes": {
"pod_name": "fca-de-green-kafka-consumer-offset-resetter-6cf9856b8-5jffb",
"namespace_name": "",
"pod_id": "",
"host": "",
"container_name": "resetter",
"docker_id": "",
"container_hash": "",
"container_image": ""
}
}

Here I modified my config by adding a multiline filter to merge the logs.
ConfigMap:
[SERVICE]
    Flush               2
    Log_Level           info
    Daemon              Off
    Parsers_File        parsers.conf
    Parsers_File        custom_parsers.conf
    HTTP_Server         On
    HTTP_Listen         0.0.0.0
    HTTP_Port           2020

[INPUT]
    Name                tail
    Tag                 application.*
    Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
    Path                /var/log/containers/*.log
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  container_firstline
    #Parser              docker
    multiline.parser    docker
    DB                  /var/fluent-bit/state/flb_container.db
    Mem_Buf_Limit       50MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Rotate_Wait         30
    Read_from_Head      Off

[INPUT]
    Name                tail
    Tag                 application.*
    Path                /var/log/containers/fluent-bit*
    #Parser              docker
    multiline.parser    docker
    DB                  /var/fluent-bit/state/flb_log.db
    Mem_Buf_Limit       5MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Read_from_Head      Off

[INPUT]
    Name                tail
    Tag                 application.*
    Path                /var/log/containers/cloudwatch-agent*
    Docker_Mode         On
    Docker_Mode_Flush   5
    Docker_Mode_Parser  cwagent_firstline
    #Parser              docker
    multiline.parser    docker
    DB                  /var/fluent-bit/state/flb_cwagent.db
    rotate_wait         15
    Mem_Buf_Limit       15MB
    Skip_Long_Lines     On
    Refresh_Interval    25
    #Read_from_Head      true

[FILTER]
    Name                kubernetes
    Match               application.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Kube_Tag_Prefix     application.var.log.containers.
    Merge_Log           On
    Merge_Log_Trim      Off
    Merge_Log_Key       log
    K8S-Logging.Parser  On
    K8S-Logging.Exclude Off
    Labels              Off
    Annotations         Off
    
[FILTER]
    name                  multiline
    match                 application.*
    multiline.key_content log
    multiline.parser      docker
    
[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              ${REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
    log_stream_prefix   ${APP_POD_NAMESPACE}-
    auto_create_group   true
    extra_user_agent    container-insights

custom_parsers.conf: |
[PARSER]
    Name                docker
    Format              json
    Time_Key            time
    Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

[PARSER]
    Name                cwagent_firstline
    Format              regex
    Regex               (?<log>(?<="log":")\d{4}[\/-]\d{1,2}[\/-]\d{1,2}[ T]\d{2}:\d{2}:\d{2}(?!\.).?)(?<!\\)".(?<stream>(?<="stream":").?)".(?<time>\d{4}-\d{1,2}-\d{1,2}T\d{2}:\d{2}:\d{2}\.\w*).*(?=})
    Time_Key            time
    Time_Format         %Y-%m-%dT%H:%M:%S.%LZ

The logs got merged, but I want to insert a newline (\n) between the original log lines. How can I do that? @PettitWesley, can you please help with this?

After using multiline:

{
"log": "2023-04-27T10:13:27.942291886Z stdout F Kafka Mirror Maker consumer configuration:2023-04-27T10:13:27.950593993Z stdout F # Bootstrap servers2023-04-27T10:13:27.950616934Z stdout F bootstrap.servers=adept-ingress-kafka-bootstrap.adept:90922023-04-27T10:13:27.950621654Z stdout F # Consumer group2023-04-27T10:13:27.950625174Z stdout F group.id=adept-rtti-mirrormaker2023-04-27T10:13:27.950628244Z stdout F # Provided configuration2023-04-27T10:13:27.950631164Z stdout F auto.commit.interval.ms=30002023-04-27T10:13:27.950633984Z stdout F auto.offset.reset=latest2023-04-27T10:13:27.950636494Z stdout F enable.auto.commit=true2023-04-27T10:13:27.950638424Z stdout F isolation.level=read_committed2023-04-27T10:13:27.950640334Z stdout F 2023-04-27T10:13:27.950642154Z stdout F 2023-04-27T10:13:27.950644074Z stdout F security.protocol=PLAINTEXT2023-04-27T10:13:27.950645995Z stdout F 2023-04-27T10:13:27.950647745Z stdout F 2023-04-27T10:13:27.950649485Z stdout F 2023-04-27T10:13:27.950730916Z stdout F 2023-04-27T10:13:27.950740377Z stdout F Kafka Mirror Maker producer configuration:2023-04-27T10:13:27.957949969Z stdout F # Bootstrap servers2023-04-27T10:13:27.95796353Z stdout F bootstrap.servers=rtti-ingress-kafka-bootstrap:90922023-04-27T10:13:27.95796602Z stdout F # Provided configuration2023-04-27T10:13:27.95796868Z stdout F batch.size=327682023-04-27T10:13:27.95797073Z stdout F compression.type=snappy2023-04-27T10:13:27.95797277Z stdout F linger.ms=1002023-04-27T10:13:27.95797469Z stdout F 2023-04-27T10:13:27.95797654Z stdout F 2023-04-27T10:13:27.95797891Z stdout F security.protocol=PLAINTEXT2023-04-27T10:13:27.9579808Z stdout F 2023-04-27T10:13:27.95798263Z stdout F 2023-04-27T10:13:27.95798444Z stdout F 2023-04-27T10:13:27.958075052Z stdout F 2023-04-27T10:13:33.105771564Z stdout F 2023-04-27 10:13:33,105 INFO Starting readiness poller (io.strimzi.mirrormaker.agent.MirrorMakerAgent) [main]",
"kubernetes": {
"pod_name": "green-mirror-maker-5df8d48f89-6ngsw",
"namespace_name": "realtimetraffic",
"pod_id": "",
"host": "",
"container_name": "green-mirror-maker",
"docker_id": "",
"container_hash": "",
"container_image": ""
}
}
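
The merged record above still carries the CRI prefix ("<timestamp> stdout F") of every original line, and the lines are concatenated with no separator. A minimal Python sketch of the desired result, assuming the raw lines follow the containerd/CRI format "<time> <stream> <P|F> <content>" (P marks a partial line, F a full one): strip each prefix, stitch partial lines together, and join full lines with "\n". This only illustrates the target output; it is not how Fluent Bit's multiline filter works internally, and join_cri_lines is a hypothetical helper. Fluent Bit also ships a built-in cri multiline parser for this log format, which may be worth comparing against the docker parser used in the config above.

```python
import re

# CRI (containerd) log line: "<RFC3339Nano time> <stdout|stderr> <P|F> <content>"
CRI_LINE = re.compile(r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<flag>[PF]) ?(?P<content>.*)$")


def join_cri_lines(raw_lines, sep="\n"):
    """Strip CRI prefixes, rebuild partial (P) lines, and join full lines with sep."""
    messages, partial = [], ""
    for raw in raw_lines:
        m = CRI_LINE.match(raw)
        if not m:
            continue  # not a CRI-formatted line; skipped in this sketch
        partial += m.group("content")
        if m.group("flag") == "F":  # F marks the end of one logical line
            messages.append(partial)
            partial = ""
    return sep.join(messages)


raw = [
    "2023-04-27T10:13:27.942291886Z stdout F Kafka Mirror Maker consumer configuration:",
    "2023-04-27T10:13:27.950593993Z stdout F # Bootstrap servers",
    "2023-04-27T10:13:27.950616934Z stdout F bootstrap.servers=adept-ingress-kafka-bootstrap.adept:9092",
]

print(join_cri_lines(raw))
```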
