Recombine partial messages of docker/fluentd log driver logs ingested via fluentforwardreceiver #33816

otbe · 2024-06-29T12:49:32Z


Hello,

I am trying to use the collector to receive logs via fluentforwardreceiver, process them a bit, and send them off to Loki. In my setup the collector is the log router of an AWS ECS Fargate task with a firelens log configuration, so it receives all stdout logs from all containers in the same task. This makes it necessary to handle partial messages whenever the fluentd log driver decides to split a log message (at 16KB). After a bit of reading I came up with the following solution:

receivers:
  fluentforward:
    endpoint: unix:///var/run/fluent.sock

processors:
  logstransform:
    operators:
      - type: router
        default: noop
        routes:
          - output: recombine
            expr: 'attributes["partial_message"] == "true"'
      - type: recombine
        combine_field: body
        combine_with: ""
        force_flush_period: 300s # added for debugging purposes
        max_unmatched_batch_size: 0
        is_last_entry: attributes["partial_last"] == "true"
        source_identifier: attributes["partial_id"]
      - type: noop

exporters:
  debug:
    verbosity: detailed
    sampling_initial: 10000

service:
  pipelines:
    logs:
      receivers: [fluentforward]
      processors: [logstransform]
      exporters: [debug]

This seems to work in about 80% of the cases. When it does not work, I get 2 messages that were originally one message.

The first one is emitted immediately and contains the parts from the middle to the end. The second one is emitted 5 minutes later (the 300s force flush) and contains the parts from the beginning to the middle.

Let's consider a message that the log driver split like this. All parts belong to the same partial_id:

log1
log2
log3
log4
log5
log6
log7
log8 isLast=true
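
To make the example concrete, the eight parts can be modeled as records carrying the attributes used in the config above. This is only a minimal Python sketch of the result I expect, not how the collector implements it; the partial_id value "abc", the "false" value on non-final parts, and the helper name recombine_in_order are assumptions for illustration.

```python
# Hypothetical model of the eight partial records. Attribute names follow the
# config above; the partial_id value "abc" and the "false" value for
# non-final parts are assumptions made for this illustration.
parts = [
    {
        "body": f"log{i}",
        "attributes": {
            "partial_message": "true",
            "partial_id": "abc",
            "partial_ordinal": str(i),
            "partial_last": "true" if i == 8 else "false",
        },
    }
    for i in range(1, 9)
]


def recombine_in_order(records):
    """Join bodies per partial_id and emit once partial_last == "true"."""
    buffers = {}
    emitted = []
    for rec in records:
        pid = rec["attributes"]["partial_id"]
        buffers.setdefault(pid, []).append(rec["body"])
        if rec["attributes"]["partial_last"] == "true":
            # combine_with is "" in the config, so parts are concatenated.
            emitted.append("".join(buffers.pop(pid)))
    return emitted


print(recombine_in_order(parts))  # ['log1log2log3log4log5log6log7log8']
```

When the parts arrive in order, exactly one message comes out, which is what I see in the cases that work.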

What seems to happen is the following:
Emitted message number 1 from above:

log5
log6
log7
log8 isLast=true

This message carries a partial_ordinal attribute with the value "5" which means partial message number 5 is the first one in the batch.

Emitted message number 2 from above, after 5 minutes:

log1
log2
log3
log4

This message carries a partial_ordinal attribute with the value "1" which means partial message number 1 is the first one in the batch.

The only way this can happen (in my mind) is that the order of messages received from the fluentforwardreceiver is not correct/stable. As I said it works in 80% of the cases, sometimes it works for similar sized/the same log message and sometimes not.
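
The hypothesis can be checked with a toy model. The sketch below is not the collector's recombine operator, just a simplified stand-in that buffers per partial_id and flushes on the last part; the function names, the partial_id value "abc", and the arrival order are all assumptions. Feeding it the second half of the message before the first half reproduces exactly the two fragments described above.

```python
def merge(batch, sep=""):
    """Combine a batch; keep the ordinal of the first buffered part."""
    return {
        "body": sep.join(r["body"] for r in batch),
        "partial_ordinal": batch[0]["partial_ordinal"],
    }


def recombine(records):
    """Toy recombine: buffer per partial_id, flush when partial_last is set.

    Returns (emitted, leftover). leftover models what a later force flush
    would emit for batches that never saw a last entry.
    """
    buffers = {}
    emitted = []
    for rec in records:
        buffers.setdefault(rec["partial_id"], []).append(rec)
        if rec["partial_last"]:
            emitted.append(merge(buffers.pop(rec["partial_id"])))
    leftover = [merge(batch) for batch in buffers.values()]
    return emitted, leftover


def part(i):
    # Hypothetical record for partial number i of an 8-part message.
    return {
        "body": f"log{i}",
        "partial_id": "abc",
        "partial_ordinal": i,
        "partial_last": i == 8,
    }


# Assumed arrival order: parts 5..8 reach the operator before parts 1..4.
arrival = [part(i) for i in (5, 6, 7, 8, 1, 2, 3, 4)]
emitted, leftover = recombine(arrival)
print(emitted)   # [{'body': 'log5log6log7log8', 'partial_ordinal': 5}]
print(leftover)  # [{'body': 'log1log2log3log4', 'partial_ordinal': 1}]
```

The first result matches the message emitted immediately, and the leftover batch matches the one that only appears after the force flush.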

While debugging, it also made a difference whether I used a single pipeline with the fluentforwardreceiver (like the example config above) or added a second debug pipeline. In the latter case the likelihood that it works increases to almost 100%, which made me think it's a timing/load issue inside the fluentforwardreceiver.

Other apps in our landscape use a fluentbit sidecar container to ship stdout logs to Loki and there we use multiline+partial_mode which works perfectly fine.

We're in the middle of switching a lot of stuff to otelcol for shipping our observability events.
I would really appreciate help or any other input here. :)
