-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add upload timeout when using disk buffer to maintain fresh logs on s3. #8
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the goroutines it appears error returns were replaces with logs or panic calls. Lets go back to returning nil/errors through a channel (we can keep all the logs). One event manager shouldn't be able to bring down everything by panicing.
| `use_disk_buffer` | Buffer logs on disk prior to sending to S3. See [Disk Buffering](#disk-buffering) for more info. | `TRUE` | | ||
| `disk_buffer_path` | Directory for disk buffer. Path should be unique for each output. | `tmp/out_clp_s3/1/` | | ||
| `upload_size_mb` | Set upload size in MB when disk buffer is enabled. Size refers to the compressed size. | `16` | | ||
| `timeout` | Upload timeout if upload size is not met. For use when disk buffer is enabled. Valid time units are s, m, h. | `15m` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets point people to time.ParseDuration
as it is a more complete description of what is/isn't valid.
if m.Writer.GetUseDiskBuffer() { | ||
m.diskUploadListener(config, uploader) | ||
} else { | ||
m.memoryUploadListener(config, uploader) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets refactor this. I could be wrong, but it looks like all of the upload path should be merged. We should still have a timeout in the memory buffered version and we can call CheckEmpty
which should cover the differences.
Essentially, we shouldn't need to know if it is memory or disk buffering in the manager code.
I guess this PR is waiting for changes from @davemarco? |
Description
Adds a timeout when using disk buffer to maintain fresh logs on s3. Previously only disk buffer size was considered for upload criteria.
Smaller changes include cleaning up the format of the plugin's logs, and adding newlines in IR messages to make decompressed output more readable.
Next PR will add a guard preventing the same disk buffer path for two output instances. For now I just put a warning in the README.
Timeout goroutines
Adding a timeout is non-trivial. For example, consider a scenario where an application writes a few logs and never writes again. The logs will be stored in the disk buffer, and then the plugin will return execution to Fluent Bit engine. Since there are no more writes, the plugin never regains execution to send these logs. I could not think of an easy way to solve this problem.
The solution in this PR spawns goroutines for each tag. The goroutine will listen for a timeout event, and when received will send logs to s3.
Unfortunately, this multithreaded structure introduces synchronization problems. Mutexes were also added to prevent an upload while writing.
Next, I added wait groups to ensure goroutines are closed by plugin when program receives a kill signal. Without wait groups, goroutines will be terminated by OS. Could help avoid scenario where IR/Zstd postamble is added, then plugin is terminated and stream not yet uploaded to S3. On restart, the stream would have a double postamble. Probably overkill..., but trying to be safe.
Lastly, this PR should also bring some performance improvement. Not planned, just a result of design. Since the uploading is done in its own goroutine, the plugin can pass execution back to FluentBit engine sooner. This is one reason why disk buffer size is also using goroutine. Similarly why the goroutine is also used with memory buffer even though there is no timeout so not required.
Log format
Made the log format consistent for all logs. They all now start with "[out_clp_s3] ".
IR msg changes
Decompressed IR streams appear as one concatenated string. I added a space separating the log from the timestamp. I also added a new line that separates logs from each other. I spoke about newline with Kirk earlier but only added now.
Validation performed
Tested that timeout worked correctly