S3 log storage #5482
Conversation
Your basic idea seems fine to me. Thanks! I'd rather use a file instead of the DB as temp store but I don't have a strong preference there.
I'm wondering whether we could provide a more generic way for log storage by calling one API for log-append and one for step finished. For S3, one could then easily have a second service that receives those calls. This way a kinda "hacky" implementation detail, like having to store intermediate logs in the DB, could be abstracted away by the service. This could also be something we implement similarly to the addon forge 🤔
Ok, I tested this code extensively and found a race condition. Sometimes, in the last chunk of logs, the
So I wanted to check if we can tell that the step is over from the logs themselves. I saw that the last log from each step has NULL data:
{"id":234196,"step_id":71,"time":4,"line":13,"data":null,"type":0}
This is a result of the EOF in this line. But this is not intentional, so I don't want to rely on that. After that, I thought about adding a new line at the end of every log file with
So for now, the best solution I can think of is to add that to the interface:
type LogStorageAddon interface {
	// Current operations
	AppendLogs(stepID int64, entries []*LogEntry) error
	ReadLogs(stepID int64) ([]*LogEntry, error)
	DeleteLogs(stepID int64) error
	// NEW: Explicit lifecycle events
	StepStarted(stepID int64, metadata StepMetadata) error
	StepFinished(stepID int64, metadata StepMetadata) error
}
I will try to think of a better way to indicate that this is the last log append. If you have suggestions, feel free to comment here.
Regarding the temp log store, I used the database and not the file store because it is already initialized, and if you use it, the server is stateless and doesn't need a persistent volume. But it could also be a temp file, depending on what solution we go with in the end.
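To make the lifecycle idea concrete, here is a minimal sketch of how explicit `StepStarted`/`StepFinished` calls could replace guessing the end of a step from the log stream. It assumes entries are buffered per step and uploaded once when the step finishes. `ObjectUploader`, `bufferedStore`, `memUploader`, the `logs/<id>.log` key layout and the simplified `LogEntry`/`StepMetadata` fields are hypothetical stand-ins, not the code in this PR; `ReadLogs` and `DeleteLogs` are left out for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for the real types (assumption).
type LogEntry struct {
	Line int
	Data []byte
}

type StepMetadata struct{ Name string }

// ObjectUploader is a hypothetical "put whole object" abstraction over S3.
type ObjectUploader interface {
	Put(key string, body []byte) error
}

// memUploader is an in-memory uploader used only for this example.
type memUploader struct{ objects map[string][]byte }

func (m *memUploader) Put(key string, body []byte) error {
	m.objects[key] = body
	return nil
}

// bufferedStore buffers entries of running steps and uploads them exactly
// once, when StepFinished signals that no more logs will arrive.
type bufferedStore struct {
	mu       sync.Mutex
	pending  map[int64][]*LogEntry
	uploader ObjectUploader
}

func newBufferedStore(u ObjectUploader) *bufferedStore {
	return &bufferedStore{pending: map[int64][]*LogEntry{}, uploader: u}
}

func (s *bufferedStore) StepStarted(stepID int64, _ StepMetadata) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[stepID] = nil
	return nil
}

func (s *bufferedStore) AppendLogs(stepID int64, entries []*LogEntry) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.pending[stepID]; !ok {
		return fmt.Errorf("step %d is not running", stepID)
	}
	s.pending[stepID] = append(s.pending[stepID], entries...)
	return nil
}

func (s *bufferedStore) StepFinished(stepID int64, _ StepMetadata) error {
	s.mu.Lock() // held during upload for simplicity; a real store may not want that
	defer s.mu.Unlock()
	entries, ok := s.pending[stepID]
	if !ok {
		return fmt.Errorf("step %d is not running", stepID)
	}
	var body []byte
	for _, e := range entries {
		body = append(body, e.Data...)
		body = append(body, '\n')
	}
	if err := s.uploader.Put(fmt.Sprintf("logs/%d.log", stepID), body); err != nil {
		return err // keep the buffer so the caller can retry
	}
	delete(s.pending, stepID)
	return nil
}

func main() {
	up := &memUploader{objects: map[string][]byte{}}
	s := newBufferedStore(up)
	_ = s.StepStarted(71, StepMetadata{Name: "build"})
	_ = s.AppendLogs(71, []*LogEntry{{Line: 0, Data: []byte("hello")}})
	_ = s.StepFinished(71, StepMetadata{Name: "build"})
	fmt.Printf("%q\n", up.objects["logs/71.log"])
}
```

With this shape, the final upload no longer depends on whether the last chunk happens to arrive before or after some other signal: the caller states the end of the step explicitly.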
Or of course, a solution similar to the addon forge. |
Add S3 Log Storage Backend
Draft PR - I'd love to hear your thoughts on this approach before I finalize it.
Planning to test this with multiple S3-compatible storage providers over the next few days.
I wanted to make my Woodpecker deployment completely ephemeral. With S3 log storage, I can run Woodpecker without any persistent volumes - logs go to S3, everything else stays in the database. Plus I get excellent control over log retention using S3 lifecycle policies.
How It Works
As I explained in #2278, S3 unfortunately doesn't support appending to an object.
So in logappend we can't simply fetch the log file from S3, add the new content, and push it back to S3.
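A rough way to see why fetch-modify-push scales badly: every append has to download and re-upload the whole object, so total traffic grows quadratically with the number of chunks. The sketch below only does the arithmetic; `rewriteCost` and the chunk numbers are illustrative assumptions, not measurements from this PR.

```go
package main

import "fmt"

// rewriteCost returns the total bytes downloaded and uploaded when a log
// object is grown by fetching and re-uploading it on every append.
func rewriteCost(chunks int, chunkSize int64) (down, up int64) {
	var size int64 // current size of the object in the bucket
	for i := 0; i < chunks; i++ {
		down += size      // fetch the current object
		size += chunkSize // add the new chunk locally
		up += size        // push the whole object back
	}
	return down, up
}

func main() {
	// 1000 appends of 1 KiB each produce a log of about 1 MB.
	down, up := rewriteCost(1000, 1024)
	fmt.Println(down, up) // 511488000 512512000
}
```

So a log of roughly 1 MB would cost about 1 GB of transfer in this model. Two concurrent appends could also overwrite each other's result unless something serializes them.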
I implemented it in a hybrid approach that keeps the user experience smooth:
Both uploads and downloads use buffered streaming, so even large log files don't eat up memory.
For authentication, I'm using the AWS SDK's standard environment variables. This works great with IAM roles on AWS where you don't need any explicit credentials, but the SDK supports tons of config methods so users can pick what works for them.
Configuration
It works with any S3-compatible storage - AWS, MinIO, DigitalOcean Spaces, whatever.
Migration is seamless - existing logs stay where they are, new ones automatically go to S3 when you enable it.
I'm curious about your thoughts on this approach and whether there are any compatibility concerns I should address.