[Question] Partial Retries Support #1911
Comments
Sorry for the delayed response. What does 'bad' mean? I assume bad entries can't be flushed to the destination, right? For bad chunks, we will add a backup feature to avoid useless retries: #1856
The … Seems like we can use …
Currently, we have a default retry mechanism for it. Hmm... I didn't understand the situation correctly.
I assume your problem is …
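For reference, the built-in retry mechanism mentioned above is configured per buffered output. A minimal sketch with illustrative values, using the standard v1 buffer parameters:

```
<match example.**>
  @type google_cloud                 # or any buffered output plugin
  <buffer>
    flush_interval 5s
    retry_type exponential_backoff   # default strategy
    retry_wait 1s                    # wait before the first retry
    retry_max_interval 60s           # cap on the backoff interval
    retry_timeout 72h                # give up on the chunk after this long
  </buffer>
</match>
```

Note that these retries always operate on a whole buffer chunk; there is no per-record granularity, which is the limitation discussed in this thread.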
@repeatedly - Yes! Our problem is …
I see... I understood the situation.
@repeatedly - We have a local binary running that provides metadata for log entries based on a label. Our Fluentd output plugin makes one request per log entry if the metadata for that label is not cached locally. Depending on whether the metadata is available, we might choose to retry a subset of log entries. The reason behind this: …
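To make that flow concrete, a rough Ruby sketch of the lookup-and-partition pattern just described; all names (MetadataResolver, fetch_metadata, the 'label' field) are illustrative, not taken from the actual plugin:

```ruby
# Hypothetical sketch of the lookup-and-partition pattern; all names are
# illustrative and the metadata service call is stubbed out.
class MetadataResolver
  def initialize
    @metadata_cache = {}   # label => metadata
  end

  # Cached labels are served locally; uncached labels cost one request each.
  # A nil result is not cached, so the entry can be retried later.
  def metadata_for(record)
    label = record['label']
    @metadata_cache[label] ||= fetch_metadata(label)
  end

  # Split a batch into entries that can be flushed now ("good") and entries
  # whose metadata is still missing and should be retried later ("bad").
  def partition(entries)
    entries.partition { |_time, record| metadata_for(record) }
  end

  private

  # Stand-in for the real call to the local metadata binary.
  def fetch_metadata(_label)
    nil
  end
end
```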
@portante and his co-workers reported a similar issue. Below is his explanation:
ref: uken/fluent-plugin-elasticsearch#398 (comment). Originally reported in …
@qingling128 I see. Does your record have a specific field for your metadata? @cosmo0920 I saw the patch. Why …
According to Red Hat's Bugzilla, it seems that there is a performance issue.
@repeatedly @cosmo0920 As I stated in the original issue, there was not necessarily a performance issue but a problem with the way chunks were consumed by fluent-plugin-elasticsearch. If any part of deserializing and processing records in the chunk failed, an exception would bubble up and the chunk would be retried indefinitely; there was no way to get past bad processing. Do we have an example of using the router with the error label that would negate the need for #1948 so I could close it?
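A minimal sketch of what such an example might look like, assuming a v1 buffered output with `tag` configured as a chunk key; `send_to_destination` is a hypothetical stand-in for the real per-record delivery. Records emitted with `router.emit_error_event` are routed to the built-in `@ERROR` label instead of failing the whole chunk:

```ruby
require 'fluent/plugin/output'

module Fluent
  module Plugin
    # Illustrative plugin, not an existing one.
    class PartialRetryExampleOutput < Output
      Fluent::Plugin.register_output('partial_retry_example', self)
      helpers :event_emitter   # provides `router` inside an output plugin

      def write(chunk)
        tag = chunk.metadata.tag   # assumes 'tag' is a chunk key
        chunk.msgpack_each do |time, record|
          begin
            send_to_destination(record)
          rescue => e
            # Route only the failing record to <label @ERROR> instead of
            # raising, so the rest of the chunk is not retried.
            router.emit_error_event(tag, time, record, e)
          end
        end
      end

      private

      # Hypothetical stand-in for the real delivery call.
      def send_to_destination(record)
        record
      end
    end
  end
end
```

A `<label @ERROR>` section in the configuration can then buffer, retag, or re-route the records emitted this way.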
@repeatedly - Yes, our records do have a specific field … The problem with deciding the chunk by …: in a container world, this means that if a Fluentd instance is handling logs from 50 containers, the number of requests we make to the Logging API will be 50 times the original request number. Previously, we could bundle log entries from 50 containers into one Logging API request. Now we have to make 50 requests (from 50 flushes). This will significantly slow us down.
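To illustrate the trade-off: chunking by a record field means one buffer chunk, and therefore one flush and one Logging API request, per distinct value of that field. A hedged config sketch with a hypothetical `container_label` chunk key:

```
<match containers.**>
  @type google_cloud
  # 'container_label' is a hypothetical record field used as a chunk key:
  # entries are grouped into a separate chunk per label value.
  <buffer container_label>
    flush_interval 5s
  </buffer>
</match>
```

With 50 containers (50 distinct label values), each flush interval now produces 50 chunks and 50 API requests instead of one bundled request.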
Hi, any news about this feature request?
Hi,
Any chance there is a way to do partial retries in Fluentd? Right now we maintain an output plugin, https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud. For each buffer chunk the plugin processes, we might have some "good" entries and some "bad" entries. We'd like to enable users to flush the "good" entries right away but retry the "bad" entries later.
In the current implementation, it seems that an output plugin can only raise an exception to indicate that the chunk was not processed and should be retried. The problem is that we either block the "good" entries as well, or we retry the "good" entries again and generate duplicate logs.
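For concreteness, a minimal Ruby sketch of this all-or-nothing behaviour; `client` and its `write_entries` / `partial_failure?` response are hypothetical:

```ruby
# Inside a buffered output plugin's write method. If any entry in the chunk
# is rejected, the only built-in signal is to raise, which retries the WHOLE
# chunk: the "good" entries are resent (duplicates) along with the "bad" ones.
def write(chunk)
  entries = []
  chunk.msgpack_each { |time, record| entries << [time, record] }

  response = client.write_entries(entries)   # hypothetical bulk API call
  raise "chunk contained rejected entries" if response.partial_failure?
end
```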
Is there a known way to do partial retries correctly with the built-in support?
Alternatively, we are thinking about writing a new filter plugin to handle this case, or digging into the core code and contributing to / maintaining a Fluentd fork with partial retries.
We'd like a sanity check before we go too far down an unnecessary path.
Thanks,
Ling