Generic Multiline Filter Re-Design #4309
Comments
@nokute78 This is my design to fix the generic multiline filter, which I wrote to help my team understand. I have discussed most of this design with @edsiper, except for the partial message key part. My main question for the community is whether this new design should be gated behind an option or not. My opinion is that we should introduce the fixed filter in v1.9.0 and should not gate this new version behind an option. This is not a new feature; it's a fix to the existing plugin, which is entirely broken for most use cases. What do you guys think? What is the rough date for 1.9.0? |
For the AWS ECS FireLens use case the
The main difference here is that the last message is denoted by
So with this metadata in mind, I can define the following test cases that would not work with the proposed design:
Partial followed by non partial
Multiple interspersed partials
Out of order partials
So in addition to your configuration options I could imagine the following:
To be honest I don't know if the 'Multiple interspersed partials' and 'Out of order partials' are actually valid use cases (I haven't encountered them so far), but the last partial of a message is definitely denoted by the specific |
@PettitWesley Thank you for summarizing.
FYI: We should take care of the intermediate buffer which is described
I agree if there is no breaking change. |
@tsalzinger Thank you for information. @PettitWesley Note: |
Some important notes: Original multiline filter aims to work at Chunk level context only. So I'd say this is a feature extension to use buffers, which is ok. If there are some perf penalties by using buffering (which is expected) we might consider offering buffering functionality under a new configurable option.
I got confused here: is this way of splitting messages "AWS specific"? Reassembling this kind of message and considering out-of-order delivery might be problematic; it should be solved "before" hitting the Fluent Bit pipeline. |
But if there's no valid use case for the non-buffered mode, then what's the point in being able to disable it? It will only confuse users. I am willing to add a config option, but I want it to default to buffered mode.
No. This is not AWS specific nor does the splitting happen in an AWS owned component. Container runtimes that process container stdout/stderr logs buffer them in chunks and split lines over 16KB. Docker does this. This is a well known use case, which the Fluent plugin concat filter can solve: https://github.com/fluent-plugins-nursery/fluent-plugin-concat |
Is there a time estimation for solving this issue? |
@Juliahahaha I can't provide any hard estimate publicly but I am actively working on implementing this design. Please see here for a fully working prototype that you can test: aws/aws-for-fluent-bit#100 (comment) |
Regarding the configuration of Example output (prefix with log id and n/m is added by me: the CRI-O output starts with the timestamp):
This illustrates:
Hope to see this working out of the box in the new solution. |
@mdraijer My understanding was that I do not need to cover the CRI use case in the filter, since that is already solved in tail plugin with the new multiline feature and built-in CRI parser. https://docs.fluentbit.io/manual/pipeline/inputs/tail#multiline-core-v1.8 CRI logs would always be sent to log files right? This is a pattern in the logs, not a separate key also right? The filter is mainly targeting the fluentd docker log driver or other log streaming use cases. If you have log files, you should always just use the multiline support in tail. It will be more performant. |
@PettitWesley I checked again and for many log messages it is indeed working fine. However there is one specific use case that is breaking the concatenation: Lines in different streams (stdout, stderr) are mixed in the CRI-O logfiles. See my example for messages C and D: the parts of C are not successively printed in the CRI-O log because one line from D is in between. This can only happen because C is printed in a different stream than D. |
@PettitWesley I understand now. The text in the issue description has a lot about the parsing of log files, which made me miss the point in the lines:
which is of course that the problem exists only in the filter 'multiline' and not in the input 'tail-with-multiline-parser'. I have now filed a bug for this specific issue: #4387 Sorry for the confusion. |
@mdraijer Thank you for this thorough investigation. Also this is useful, since I think the mixing of ordering of messages that come from stderr/stdout is what @tsalzinger was bringing up. (Thanks for that post too, I am behind on diving deep into it myself, but will get to it soon). |
@linuxxin We need more info than this picture to help you (like configuration and deployment details). Also, this issue is for the multiline filter re-design discussion, not for debugging user issues, so please open a new issue. |
#4448 (comment) Submitted please help to check thank you |
Hello. It might simplify many scenarios, for example by specifying the first line pattern with a timestamp format, plus a second statement that simply defines that everything that isn't a timestamp belongs to the previous line |
@tsalzinger I have updated the design for the partial_message support, which I am working on implementing now. Feel free to check the update, and thank you for your feedback. |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the |
No stale please! |
@aslafy-z Both the buffer option and the partial_message support have been released: https://fluentbit.io/announcements/v1.9.3/ |
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the |
This issue was closed because it has been stalled for 5 days with no activity. |
New Fluent Bit Multiline Filter Design
Background
In this section, you will learn the following key background information which is necessary to understand the plan and design:
Understanding this design also requires a basic understanding of Fluent Bit and its log pipeline and routing and how customers configure plugins. Please read the Fluent Bit docs for that background.
How Container Logs are processed in K8s
In k8s, container stdout logs are written to files. Here is the data-flow when a customer uses Fluent Bit:
The key take-away here is that logs do not go straight to a log file, they pass through the container runtime first.
How Container Logs are processed with fluentd docker log driver
When the fluentd log driver is used (ex: Amazon ECS FireLens), the following is the data-flow:
The key take-away here is that logs do not go straight to Fluent Bit, they pass through the container runtime and log driver first.
Multiline Log Use Case 1: Stack traces and logs with newlines
The main use case when folks talk about multiline is to concatenate log events that were produced with newlines. This means that the application produces a single log statement that contains newlines. There are many cases where this can happen; one important example is stack traces. Stack traces are almost always split into a series of lines.
For example, a stack trace might look like this:
When a user views the stacktrace in a monitoring destination like Amazon CloudWatch, they want a single log event that is the entire stack trace.
Remember from above that in container use cases logs always pass through the container runtime first. The runtime uses newlines as a delimiter for logs: logs are split into lines. In k8s, the lines are written one by one to the log file. With the fluentd log driver, each line is a new event that is sent by the log driver across the TCP or unix socket connection.
Remember that Fluent Bit does not understand what a user’s logs mean. When it’s reading a log file, it just sees a series of lines. And when it’s receiving events over the FireLens unix socket, it also sees a series of events. Thus, without a multiline log concatenation feature, Fluent Bit will treat each line as a separate event.
Here is an example of what the above stack trace might look like after processing with the Fluentd Docker Log Driver (some default fields, like container name and source, omitted for brevity):
AWS for Fluent Bit issue: aws/aws-for-fluent-bit#100
Multiline Log Use Case 2: Long Log Lines
The other major multiline logging use case is long log lines. Monitoring is critical to modern containerized applications, and we have seen many customers that produce very large and verbose log events. Structured logging, where logs are generated by the app with a set schema/format (usually JSON), is also very common. Many customers will log huge amounts of information in a single JSON log event; 1 MB log events are not unheard of.
These large logs are emitted by the application in a single line of code/single print statement. However, the container runtime must process them, and most container runtimes, including Docker and Containerd, will split log lines when they are greater than 16KB. In the case of log files, each 16KB chunk of a split log is written to a new log line. In the case of the Fluentd Docker Log driver, each 16KB chunk of data is a separate event.
For the Fluentd Docker Log Driver, a key will be set to note that the message is partial. Below are real split docker logs. A large log line was sent to both stdout and stderr, which are separate pipes, so each is split and creates a series of messages. Notice that stderr and stdout have different values for partial_id.
The following fields should be present in split docker logs:
partial_message: boolean to signal if the message is partial or not.
partial_id: all parts of a split message get the same unique ID.
partial_ordinal: counts up from 1, ordering the parts of the split message.
partial_last: is this the last split message or not.
AWS for Fluent Bit issue: aws/aws-for-fluent-bit#25
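As an illustration of how these fields fit together, here is a minimal Python sketch that simulates the split and then rejoins the parts. It is not the log driver's or Fluent Bit's code. The helper names (split_log, reassemble), the "log" key, and the choice to encode the field values as strings are assumptions made for this example.

```python
from collections import defaultdict

CHUNK_SIZE = 16 * 1024  # split threshold described above


def split_log(line, partial_id, chunk_size=CHUNK_SIZE):
    """Simulate how a long line could be split into partial records.

    Field values are strings here; that encoding is an assumption.
    """
    if len(line) <= chunk_size:
        return [{"log": line}]
    chunks = [line[i:i + chunk_size] for i in range(0, len(line), chunk_size)]
    return [
        {
            "log": chunk,
            "partial_message": "true",
            "partial_id": partial_id,
            "partial_ordinal": str(n),
            "partial_last": "true" if n == len(chunks) else "false",
        }
        for n, chunk in enumerate(chunks, start=1)
    ]


def reassemble(records):
    """Join partial records that share a partial_id, ordered by partial_ordinal."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["partial_id"]].append(rec)
    joined = {}
    for pid, parts in groups.items():
        parts.sort(key=lambda r: int(r["partial_ordinal"]))
        joined[pid] = "".join(p["log"] for p in parts)
    return joined
```

Because the grouping is keyed on partial_id, parts from stdout and stderr that arrive interleaved still end up in separate messages.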
Fluent Bit Multiline Support
Fluent Bit currently has partial support for multiline log use cases.
There is a new multiline API: https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/multiline-parsing
There is support in key plugins:
Technical Design: Fluent Bit support for multiline log use cases
Technical Background: how filters work
To understand why the generic multiline filter is broken, we need to first understand how filters fit into the Fluent Bit log pipeline.
This is how the log pipeline is presented to a user. However, internally, from the point of view of Fluent Bit’s concurrency model, filters are implemented as an extension of inputs. Filters are invoked serially in the same context as an input that is ingesting a log record. Filters can not take advantage of concurrency or perform asynchronous operations. They are expected to return quickly when sent log records.
As shown in the diagram above, log records are processed by a filter as soon as the input ingests them. This is important, because what records the filter sees will be determined by the behavior of the input.
Some inputs (Tail) ingest many records all at once, and thus a filter that comes after the tail input will see an entire chunk of records. However, other inputs (Forward), especially inputs that use network connections, will only ingest a single record at a time. This means the filter will receive each record one by one.
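To make the difference concrete, here is a small Python sketch of a filter callback that keeps no state between invocations. It is a toy model, not Fluent Bit code. The continuation rule and the sample stack trace are made up for illustration.

```python
import re

# Hypothetical rule: indented "at ..." lines continue the previous line.
CONTINUATION = re.compile(r"^\s+at ")


def stateless_filter(records):
    """Concatenate continuation lines, seeing only the records in this call."""
    out = []
    for line in records:
        if out and CONTINUATION.match(line):
            out[-1] += "\n" + line
        else:
            out.append(line)
    return out


lines = [
    "Exception in thread main java.lang.RuntimeException: boom",
    "    at com.example.App.run(App.java:10)",
    "    at com.example.App.main(App.java:5)",
    "another line",
]

# Tail-like input: the whole chunk arrives in one invocation.
chunked = stateless_filter(lines)

# Forward-like input: one invocation per record.
one_by_one = [rec for line in lines for rec in stateless_filter([line])]
```

With the chunked call the stack trace comes back as one record. With one call per record nothing is ever joined, because each invocation sees only a single line.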
This can be seen in the call chain for an input to ingest records:
Technical Background: how the multiline core library works
The new multiline core library is fairly simple and straightforward to use. Code is here.
Here’s how it works:
Why the multiline filter is broken: It can’t do asynchronous buffering
The multiline filter is broken because filters are expected to be serial code that can perform their work exclusively on the records they were sent in a single invocation. This means that when a filter is sent records one by one, it must return something for each record. However, if the record is a multiline, then a single record is useless. The filter needs to receive all records that compose a single multiline log, and then concatenate them together, and then return a single new record. This is not possible with the current architecture.
A reasonable question to ask is: why can’t the filter simply buffer each line until it has a completed multiline record? Why can’t it simply receive each record one by one, returning nothing each time and buffering until it has all records composing a multiline message, and then when the multiline is complete, return the whole concatenated message?
The filter could be implemented with this simple buffering. However, there is a problem that would make it fail in some cases. A multiline can not be known to be completed until a new record is received that is not part of the currently buffered multiline. That is how the multiline library works: it looks at each incoming record, applying its parsers, and the multiline is only known to be complete once a new line does not match a continuation parser.
This is easy to understand with an example:
Imagine processing this input line by line. Until the “another line...” line is received, you would have no way of knowing whether the next line will be another part of the stack trace.
So for the simple buffering design in the filter, imagine that only the stack trace is received, and after that, no new log records are received and Fluent Bit is shut down. The filter would never know the multiline was completed, and thus could not flush it. And furthermore, filters do not have any way to asynchronously flush data, and they have no way of flushing buffered data on shut down. A filter is code that should not have any context shared between invocations, it’s just a simple callback that receives data, processes it serially, and then returns.
Thus, the filter could get “stuck” and never emit some records if it used a simple buffering design.
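The stuck case can be reproduced with a minimal Python sketch of the simple buffering design. This is a toy model, not the multiline library. The start-of-record rule is a hypothetical stand-in for a real multiline parser.

```python
import re

# Hypothetical rule: a line that does not begin with whitespace starts a new log.
START = re.compile(r"^\S")


class SimpleBuffer:
    """Buffers lines and only emits once the next record has begun."""

    def __init__(self):
        self.pending = []

    def feed(self, line):
        """Return a completed multiline record, or None while still buffering."""
        if START.match(line) and self.pending:
            done = "\n".join(self.pending)
            self.pending = [line]
            return done
        self.pending.append(line)
        return None
```

Feeding only the stack trace lines returns None every time and leaves them in pending. They are released only when a later, unrelated line arrives, and that later line then waits in the buffer in turn.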
Solution: Use in_emitter and a flush timer
The solution is to use the in_emitter plugin. This is an internal plugin introduced when the rewrite_tag filter was created. It’s an internal API that allows plugins to register an input plugin that they can emit records to, and the input will then emit the records into the beginning of the Fluent Bit pipeline. This allows us to get around the limitation that filters can not asynchronously flush data outside of the synchronous filter callback. The multiline library also supports an asynchronous flush timer, which can be used to set a maximum time that it can buffer data.
Together, these can be used to fix the multiline filter. When the filter callback is run with some data, it will return no records. Instead, the records are ingested into the multiline library and buffered. Once the multiline record is complete and concatenated, or the flush time expires, the multiline flush callback will be triggered, which will write to the in_emitter instance. The new concatenated records will be re-ingested into the log pipeline under the same tag.
To prevent a cycle in the log pipeline, the filter will not process/ingest records from its own emitter input. (Remember, they are re-ingested with the same tag, so the filter match pattern would match the new records too). Changes will be made to the filter callback to pass it the input instance that ingested the log records, allowing the filter to check whether records came from its own in_emitter input plugin. If they came from the emitter instance, processing was already completed and the records will be passed down the pipeline as is.
This has a side effect that other filters that come before the multiline filter will be applied twice. Thus, the documentation will call out that customers should always configure the multiline filter to be the first filter.
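The flow described in this section can be modelled in a few lines of Python. This is a single-stream toy model under stated assumptions, not the Fluent Bit implementation: the class names, the emitter name, the start-of-record rule, and the 2000 ms timeout are all hypothetical, and time is passed in explicitly instead of using a real timer.

```python
import re

# Hypothetical rule: a line that does not begin with whitespace starts a new log.
START = re.compile(r"^\S")


class Emitter:
    """Stand-in for the in_emitter input: collects records to re-ingest."""

    def __init__(self, name):
        self.name = name
        self.queue = []

    def emit(self, tag, record):
        self.queue.append((tag, record))


class BufferedMultilineFilter:
    def __init__(self, flush_ms=2000):
        self.flush_ms = flush_ms
        self.emitter = Emitter("emitter_for_multiline")  # hypothetical name
        self.pending = []
        self.tag = None
        self.last_append_ms = None

    def filter(self, input_name, tag, records, now_ms):
        # Records from our own emitter were already processed: pass them through.
        if input_name == self.emitter.name:
            return records
        for line in records:
            if START.match(line) and self.pending:
                self._flush()  # a new record began, so the old one is complete
            self.pending.append(line)
            self.tag = tag
            self.last_append_ms = now_ms
        return []  # nothing is returned inline; output goes via the emitter

    def on_timer(self, now_ms):
        """Flush buffered data once it has been idle for flush_ms."""
        if self.pending and now_ms - self.last_append_ms >= self.flush_ms:
            self._flush()

    def _flush(self):
        self.emitter.emit(self.tag, "\n".join(self.pending))
        self.pending = []
```

The timer is what rescues the last buffered record when no further line arrives, and the input name check is what keeps re-ingested records from being buffered a second time.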
Multiline filter Stream Design
Remember that the multiline library supports multiple streams of data. Each stream is basically a buffer for data that can be processed together. The filter will use the combination of the unique full input instance name + the tag to uniquely identify a stream of data.
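As a rough illustration of the stream key, the following Python sketch keeps one buffer per (input instance name, tag) pair. The class and the sample input names are hypothetical, not the multiline library's API.

```python
from collections import defaultdict


class StreamBuffers:
    """One buffer per (input instance name, tag) pair."""

    def __init__(self):
        self.streams = defaultdict(list)

    def append(self, input_name, tag, line):
        # The tuple keeps records from different inputs or tags apart,
        # even when they arrive interleaved.
        self.streams[(input_name, tag)].append(line)

    def lines(self, input_name, tag):
        return list(self.streams[(input_name, tag)])
```

Using the pair as the key means two containers with different tags on the same input never have their lines concatenated together.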
Partial Message Key Support
The design we have discussed so far only covers multiline Use Case 1. There is also Use Case 2, long messages that were split and must be re-joined. The generic filter only needs to support the type of partial messages ingested by the forward input; the tail plugin already supports partial messages written to log files. In the forward input case, the partial messages are noted by a key in the record.
Recall the example from earlier:
The multiline library does not support this multiline use case. Instead, the filter will be implemented with new partial message key buffering code. The code will use a similar design as explained for the multiline library; the same in_emitter instance will be used to re-ingest concatenated records into the pipeline. A timer callback will be used to implement a max timeout for buffered data.
The algorithm for buffering and emitting partial records is very simple:
If the partial_message key is present and its value is “true” (case insensitive), create a new buffer for the log using its partial_id if one does not already exist. If one does, append the data to the buffer. Each buffer will store a timestamp for when it was last added to, and will be flushed when that timestamp is older than the configured flush timeout.
When the partial_last key is found to be true for a unique partial_id, the log is complete and will be flushed.
New Configuration Options and Filter CX
Please see the existing filter configuration options as a refresher: https://docs.fluentbit.io/manual/pipeline/filters/multiline-stacktrace#configuration-parameters
The following new configuration options will be added.
Design Limitations
flush_ms which is a good bit lower than the service Grace setting, which controls the grace period for Fluent Bit to send all logs on shutdown.
Multiline Filter Launch Proposal
These changes will fix the multiline filter and make it function as users expect. However, it is also a very significant change to how the filter currently works, and this change is not transparent: users will see the new in_emitter input and will incur some slight overhead from the buffering. Thus, this change should ideally be introduced in v1.9.0, the next minor version.
A discussion will need to be had with the community on whether this “fixed” mode can be the only experience post-launch, or if it should be gated behind a configuration option that would allow users to return to the old (broken) behavior if desired. Our recommendation is that no such option should be added: this is a bug fix to a broken component, not a new feature, and there is no reason why a user would prefer the old broken behavior.
Prototype
A fully functional prototype has been created and shared with customers; it can solve Multiline Use Case 1: aws/aws-for-fluent-bit#100 (comment)