Feature request: partial_message concat #25

Closed
gbleu opened this issue Feb 1, 2020 · 35 comments

@gbleu

gbleu commented Feb 1, 2020

Docker logs are split into several parts with partial_message when they exceed a certain size, so JSON logs end up broken and are not indexed properly.
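
To illustrate the problem, here is a minimal Python sketch. It assumes a 16 KiB split size (the 16k limit mentioned later in this thread), splits on characters rather than bytes for simplicity, and assumes that only the non-final chunks carry `partial_message`; the helper names are hypothetical and this is not Docker's actual code.

```python
import json

CHUNK_LIMIT = 16 * 1024  # assumed split size, based on the 16k limit discussed in this thread


def split_like_docker(line, limit=CHUNK_LIMIT):
    """Split one log line into records; non-final chunks are flagged as partial (assumption)."""
    chunks = [line[i:i + limit] for i in range(0, len(line), limit)] or [""]
    records = []
    for idx, chunk in enumerate(chunks):
        record = {"log": chunk}
        if idx < len(chunks) - 1:
            record["partial_message"] = "true"
        records.append(record)
    return records


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# One long JSON log line, as an application might emit it.
original = json.dumps({"message": "x" * 40000, "another_field": "content"})
records = split_like_docker(original)

print(len(records))                                       # number of chunks
print([is_valid_json(r["log"]) for r in records])         # each chunk on its own
print(is_valid_json("".join(r["log"] for r in records)))  # chunks joined back together
```

None of the individual chunks parses as JSON, which is why they are not indexed properly; only the re-joined text does.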

@gbleu changed the title from "Feature request: concat plugin" to "Feature request: partial_message concat" on Feb 1, 2020
@PettitWesley
Contributor

@gbleu This is a Fluent Bit core feature request; I believe it's covered by this issue: fluent/fluent-bit#337

@kaisermario

Hi @PettitWesley, we are currently facing the same issue that @gbleu reported.
We are using the aws-fluent-bit image as a log router to get logs into Sumologic, as described here: aws/containers-roadmap#39
This basically works well, but we also have the problem of logs being split by the Docker daemon because of the 16k limit.
fluent-bit#337 seems to cover merging multiline logs in general.

I have learned that there is a docker_mode for exactly our problem when using the tail input in Fluent Bit.
https://docs.fluentbit.io/manual/pipeline/inputs/tail#docker_mode
Unfortunately, we cannot use tail, according to:
https://help.sumologic.com/03Send-Data/Collect-from-Other-Data-Sources/AWS_Fargate_log_collection#A._Create_a_Fargate_launch_type

In the meantime, is there a solution I can pick up?
Thanks in advance.

@PettitWesley
Contributor

@kaisermario Unfortunately I am not aware of a solution with Fluent Bit right now.

Fluentd does have a concat filter, though, which can be used to concatenate messages and could solve this use case: https://github.com/fluent-plugins-nursery/fluent-plugin-concat

@kaisermario

Hi @PettitWesley,
thank you for your feedback.
Do you know of a suitable Fluentd image that could replace the aws-for-fluent-bit Docker image?
Thanks.

@PettitWesley
Contributor

@kaisermario

Hi @PettitWesley, we do not use Kubernetes, but it should be possible to easily build an image on our own.
Thanks.

@PettitWesley
Contributor

@kaisermario I think you can probably use the Kubernetes images outside of Kubernetes; just change the configuration file.

@kaisermario

kaisermario commented Aug 27, 2020

Hi @PettitWesley,
I am trying to use a self-built Docker image with Fluentd and the plugins installed.
Locally I can start this Docker container.
When I run it in ECS Fargate with memoryReservation: 300, the task is killed after a few seconds.

The logs show:
2020-08-27T14:23:26.990+02:00 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/lib/fluent/config/literal_parser.rb:141:in `scan_nonquoted_string': stack level too deep (SystemStackError)
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/lib/fluent/config/literal_parser.rb:86:in `scan_string'
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/lib/fluent/config/literal_parser.rb:75:in `parse_literal'
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/lib/fluent/config/v1_parser.rb:115:in `parse_element'
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/lib/fluent/config/v1_parser.rb:95:in `parse_element'
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/lib/fluent/config/v1_parser.rb:168:in `block in eval_include'
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/lib/fluent/config/v1_parser.rb:162:in `each'
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/lib/fluent/config/v1_parser.rb:162:in `eval_include'
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/lib/fluent/config/v1_parser.rb:145:in `parse_include'
2020-08-27T14:23:26.991+02:00 ... 8178 levels...
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/2.5.0/rubygems/core_ext/kernel_require.rb:59:in `require'
2020-08-27T14:23:26.991+02:00 from /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.10.4/bin/fluentd:8:in `<top (required)>'
2020-08-27T14:23:26.991+02:00 from /usr/bin/fluentd:23:in `load'
2020-08-27T14:23:26.991+02:00 from /usr/bin/fluentd:23:in `<main>'

Do you have an idea? I am desperate...
Sorry for using this issue again.

Thanks.

@PettitWesley
Contributor

@kaisermario Can you share your full Fluentd configuration and Dockerfile?

@PettitWesley
Contributor

PettitWesley commented Aug 28, 2020

I am not sure if partial_message is set in FireLens. We use the Fluentd Docker log driver under the hood, and I am not sure it sets that: https://aws.amazon.com/blogs/containers/under-the-hood-firelens-for-amazon-ecs-tasks/

Still, I see lots of examples online on how to do this:

@kaisermario

kaisermario commented Aug 28, 2020

Hi @PettitWesley,
thanks for responding so quickly.

In the meantime I got past this step.
The error above was caused by unsuitable memory settings on the ECS task / container and by not observing this note from the docs:
Important: When using a custom configuration file, you must specify a different path than the one FireLens uses. Amazon ECS reserves the /fluent-bit/etc/fluent-bit.conf filepath for Fluent Bit and /fluentd/etc/fluent.conf for Fluentd.
...
Yes, I have already studied the docs for Fluentd and the concat plugin, along with the corresponding examples.

Currently this is my Fluentd conf:

<filter **>
  @type stdout
</filter>

<label @NORMAL>
  <match **>
    @type stdout
  </match>
</label>

<filter **>
  @type concat
  @log_level debug
  key log
  partial_key partial_message
  partial_value true
  flush_interval 5
  separator ""
</filter>

<match **>
  @type sumologic
  @log_level debug
  endpoint "https://endpoint1.collection.eu.sumologic.com/receiver/v1/http/#{ENV['SUMOLOGIC_TOKEN']}"
  log_format json
  source_category "#{ENV['STAGE']}-#{ENV['REGION']}"
  source_name "#{ENV['SERVICENAME']}"
  open_timeout 10
</match>

The goal was to concatenate logs that were split by the Docker daemon.

Environment: Fargate 1.3, Firelens
Unfortunately, I receive a {"timestamp":1598648063867,"error":"#<Fluent::Plugin::ConcatFilter::TimeoutError: Timeout flush: ... error when Fluentd processes split messages ("partial_message":"true").
In Sumologic I can now see the message with the error from the Fluentd plugin, but concatenation does not seem to work.
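
For readers following along, this is a minimal Python sketch of what the concat filter configuration above is being asked to do, as far as the parameters suggest: buffer records whose `partial_message` equals `true`, emit the joined text once an unflagged record arrives, and flush whatever is buffered after `flush_interval` seconds of silence. The class and method names are hypothetical, and this is not the plugin's code. The assumption that the final chunk arrives without the flag is only that, an assumption; if it does not hold, every message would end in a timeout flush, which may or may not be related to the error above.

```python
from dataclasses import dataclass, field


@dataclass
class PartialConcat:
    """Hypothetical stand-in mirroring the concat filter parameters used above."""
    key: str = "log"
    partial_key: str = "partial_message"
    partial_value: str = "true"
    flush_interval: float = 5.0
    separator: str = ""
    _parts: list = field(default_factory=list, init=False)
    _last_seen: float = field(default=0.0, init=False)

    def feed(self, record, now):
        """Buffer flagged chunks; return the merged record when an unflagged one arrives."""
        if record.get(self.partial_key) == self.partial_value:
            self._parts.append(record[self.key])
            self._last_seen = now
            return None
        # An unflagged record closes the current message (assumption).
        parts, self._parts = self._parts + [record[self.key]], []
        merged = {k: v for k, v in record.items() if k != self.partial_key}
        merged[self.key] = self.separator.join(parts)
        return merged

    def flush_if_stale(self, now):
        """Return the buffered text if no chunk arrived within flush_interval, else None."""
        if self._parts and now - self._last_seen >= self.flush_interval:
            text, self._parts = self.separator.join(self._parts), []
            return text
        return None


f = PartialConcat()
f.feed({"log": '{"a": ', "partial_message": "true"}, now=0.0)
print(f.feed({"log": "1}"}, now=0.1))   # {'log': '{"a": 1}'}

# A flagged chunk whose closing record never arrives is only released by the timeout.
f.feed({"log": "dangling", "partial_message": "true"}, now=1.0)
print(f.flush_if_stale(now=3.0))        # None
print(f.flush_if_stale(now=6.5))        # dangling
```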

Reading all issues here https://github.com/fluent-plugins-nursery/fluent-plugin-concat I think this issue is poorly understood by many.
fluent-plugins-nursery/fluent-plugin-concat#63
fluent-plugins-nursery/fluent-plugin-concat#82

Any ideas?
I am a little desperate.

Cheers

@PettitWesley
Contributor

I haven't looked at your config yet, but I checked and (as you also saw) the partial_message field from Docker is supported in FireLens.

@PettitWesley
Contributor

I think it's worth re-opening this issue to track the feature request for concat in Fluent Bit, and for debugging @kaisermario's issue.

@PettitWesley reopened this on Aug 29, 2020
@kaisermario

By the way, the fluentd-concat-plugin does not seem to work properly, and it also does not seem to be actively maintained...

@bpottier

Any update on this?

@PettitWesley
Contributor

No hard timeline or promise, but I suspect we will get around to implementing and releasing this in the first quarter of next year.

@bpottier

Thanks for the response. In the meantime, do you know if there is a working example for joining partial messages from ECS using a custom Fluentd config?

@PettitWesley
Contributor

@bpottier Here's the FireLens example which uses fluent-plugin-concat: https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluentd/multiline-logs

It shows how to use a regular expression to match multiline logs. You'd want to use the partial_message field instead.

By the way, the fluentd-concat-plugin does not seem to work properly, and it also does not seem to be actively maintained...

If this is true, though, then we're all out of luck. I'd be very surprised if the Fluentd concat plugin didn't work; the Fluentd community is huge, and its usage is still larger than Fluent Bit's. I haven't had time to play with it myself yet and check whether it works.

@bpottier

bpottier commented Dec 4, 2020

@PettitWesley I was never able to get the logs to re-join on the partial_message key. Thanks for the help though.

It's probably worth mentioning for anyone who stumbles across this issue that AWS Fargate platform version 1.4.0 uses containerd instead of Docker and the partial_message key is no longer added to the log message when the size exceeds 16K.

@iiitong

iiitong commented Jun 10, 2021

Any update on this?

@gbleu
Author

gbleu commented Jul 20, 2021

Good news: this is now available in Fluent Bit v1.8. See fluent/fluent-bit/issues/337#issuecomment-882953961

@opteemister

So @gbleu, does the new Fluent Bit multiline filter work for you?

I tried to use it for our containers with long logs, but nothing changed. Logs are still split into multiple messages.
We use Fargate with FireLens and forward logs from Fluent Bit to Fluentd.
Before that, we had ECS containers (not Fargate) that forwarded logs using the Docker fluentd log driver, and those logs had the partial_message field. I don't see it now with Fargate. Do you know whether that is because of containerd replacing Docker, or because of FireLens?

We run Java services in a container, and they write logs in JSON format, so the Docker logs should look like this:
{ "time" : "", "log" : "{"message" : "some long......long message", "another_field" : "content" }" }

And I used the following Fluent Bit config:

[SERVICE]
    Flush     1
    Grace     30
    log_level debug

[FILTER]
    name                  multiline
    match                 *
    multiline.key_content log
    multiline.parser      docker

Fluent Bit v1.8.3
Fargate 1.4.0

I expected this to concatenate the message, but instead we got 2 logs:
{ "time" : "", "log" : "{"message" : "some long... " }
{ "time" : "", "log" : "...long message", "another_field" : "content" }" }

Could someone help me with this? Am I doing something wrong or missing something? Or is the multiline parser not working correctly for our case?

Thanks in advance

@opteemister

opteemister commented Aug 8, 2021

I also just tried this on a simple setup:
a Docker container writing long logs to a Fluent Bit container, which outputs to a file.
I again used a FILTER block with multiline and the docker parser, but nothing changed. I still see split logs.

However, if we send logs from a Fargate service via FireLens to Fluent Bit (with the multiline filter), then to a second Fluent Bit (with the multiline filter), and then somewhere else, the logs are concatenated after the second Fluent Bit.

@StasKolodyuk

@opteemister Same issue for me. Fargate logs are not concatenated using the multiline filter. Note that if you use Fargate 1.4+, it uses the containerd runtime and you should use multiline.parser cri instead. However, that didn't work for me either.
#100 (comment)

@f0o

f0o commented Aug 19, 2021

@kaisermario

How did your journey with fluentd end?

Have you tried with CRI's logtag instead of docker's partial_message? (Fargate 1.4+ seems to be CRI)

@PettitWesley
Contributor

@f0o Fargate 1.4 should not be using CRI format logs in ECS; we use this package, which wraps the Docker log driver code: https://github.com/aws/amazon-ecs-shim-loggers-for-containerd

@opteemister

@PettitWesley
Interesting. If Fargate 1.4 is wrapping Docker logs, then it should be able to concatenate the logs internally and send the full log, shouldn't it?
The wrapper should have all the necessary fields from Docker to do that, like the partial_message and is_last_message fields.

That would solve the problem for everyone.

@PettitWesley
Contributor

On the Fluent Bit side, I have started working on this. ETA is ~1 month.

On the Fargate side, I opened a public issue to track (no ETA available): aws/containers-roadmap#1550

@PettitWesley
Contributor

@opteemister

If Fargate 1.4 is wrapping Docker logs, then it should be able to concatenate the logs internally and send the full log, shouldn't it?

Yes, this is true. However, I suspect we will find that it's much easier to implement using the standard method of having the runtime emit a partial message identifier and then use a log agent (Fluent Bit) to concatenate them.

This is better from a buffering standpoint. I don't want the container runtime to buffer huge logs and then try to send them to Fluent Bit. I'd rather stream small logs to Fluent Bit and have Fluent Bit worry about the buffering.
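
A minimal Python sketch of that division of labor, under stated assumptions: the runtime emits small chunks tagged with a partial message identifier, and the log agent holds the buffer and joins them. The field names `partial_id`, `partial_ordinal`, and `partial_last` are assumptions for illustration and are not taken from this thread, and the `Reassembler` class is hypothetical; it is not Fluent Bit's implementation.

```python
from collections import defaultdict


class Reassembler:
    """Agent-side buffer: joins chunks that share a partial id once the last one arrives."""

    def __init__(self):
        # partial_id -> {ordinal: chunk text}
        self._pending = defaultdict(dict)

    def feed(self, record):
        """Return a complete log line, or None while a split message is still buffered."""
        if record.get("partial_message") != "true":
            return record["log"]  # unsplit record, pass through unchanged
        pid = record["partial_id"]
        self._pending[pid][int(record["partial_ordinal"])] = record["log"]
        if record.get("partial_last") != "true":
            return None
        chunks = self._pending.pop(pid)
        return "".join(chunks[i] for i in sorted(chunks))


def chunk(text, pid, ordinal, last):
    """Build a flagged chunk record using the assumed field names."""
    return {
        "log": text,
        "partial_message": "true",
        "partial_id": pid,
        "partial_ordinal": str(ordinal),
        "partial_last": "true" if last else "false",
    }


# Chunks from two split messages interleaved with one short, unsplit line.
stream = [
    chunk("AAA", "a", 1, last=False),
    {"log": "short line"},
    chunk("BBB", "b", 1, last=False),
    chunk("aaa", "a", 2, last=True),
    chunk("bbb", "b", 2, last=True),
]

r = Reassembler()
joined = [line for line in map(r.feed, stream) if line is not None]
print(joined)  # ['short line', 'AAAaaa', 'BBBbbb']
```

The runtime side stays stateless here; all buffering, and any policy for messages that never complete, lives in the agent.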

@PettitWesley
Contributor

Hey folks! Great News! I finally got around to implementing this!

PR: fluent/fluent-bit#5037

And here's a pre-release image you can use:

144718711470.dkr.ecr.us-west-2.amazonaws.com/partial_message:pre-release

Here's an example config:

[FILTER]
    name                  multiline
    match                 *
    multiline.key_content log
    mode                  partial_message

Basically, you use the multiline filter, which now has two modes: partial_message and parser (which is the old regex parsing functionality). These two are mutually exclusive, but remember that you can have more than one filter in your conf.

Let me know what you think. Thanks!

@gschelter

@PettitWesley I'm unable to pull the image. Is it public?
Error response from daemon: Head "https://144718711470.dkr.ecr.us-west-2.amazonaws.com/v2/partial_message/manifests/pre-release": no basic auth credentials

@f0o

f0o commented Apr 22, 2022

Any guesstimate on when partial_message becomes "stable" for firelens?

@PettitWesley
Contributor

@f0o The partial message Fluent Bit support should be released in AWS for Fluent Bit soon, once we uptake 1.9.3. As far as the PV 1.4 work goes, that is currently a work in progress and should release very soon, but I can't publicly give any exact date.

@PettitWesley
Contributor

@gschelter Sorry, I missed your comment. You need to be logged into ECR; I would recommend using ECS-CLI. However, you can now just use the 1.9.3 upstream Fluent Bit image from Docker Hub to test this feature. This will be released in AWS for Fluent Bit soon.

@PettitWesley
Contributor

This is implemented in the latest release: https://github.com/aws/aws-for-fluent-bit/releases/tag/v2.24.0

FireLens example will be merged shortly: aws-samples/amazon-ecs-firelens-examples#65

The feature request is implemented, so closing this issue. We prefer you open a new issue if you have problems using the feature.
