S3 log storage #5482

Closed
Levy-Tal wants to merge 9 commits into woodpecker-ci:main from Levy-Tal:s3-log-storage

Conversation

@Levy-Tal
Contributor

@Levy-Tal Levy-Tal commented Sep 4, 2025

Add S3 Log Storage Backend

Draft PR - I'd love to hear your thoughts on this approach before I finalize it.
I'm planning to test this with multiple S3-compatible storage services over the next few days.

I wanted to make my Woodpecker deployment completely ephemeral. With S3 log storage, I can run Woodpecker without any persistent volumes - logs go to S3, everything else stays in the database. Plus I get excellent control over log retention using S3 lifecycle policies.

How It Works

As I explained in #2278, S3 unfortunately doesn't support appending to an object.
So in logAppend we can't simply fetch the log file from S3, add the new content, and push it back to S3.

I implemented it in a hybrid approach that keeps the user experience smooth:

  1. During pipeline execution, while a step is still running, chunks of logs go to the database, the same way the default database log store handles them.
  2. When a step completes, we upload all its logs to S3 and delete them from the database.
  3. When someone views logs later, we try S3 first and fall back to the database if needed.

Both uploads and downloads use buffered streaming, so even large log files don't eat up memory.
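The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `HybridStore`, `memObjects`, `objectKey`, and the `logs/<id>.log` key layout are all hypothetical names, the temporary store is a map instead of the database, and the in-memory bucket reads the whole stream at once where a real S3 client would consume it in parts.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"io"
)

var errNotFound = errors.New("object not found")

// memObjects is an in-memory stand-in for an S3 bucket (hypothetical).
type memObjects map[string][]byte

func (m memObjects) Put(key string, body io.Reader) error {
	b, err := io.ReadAll(body) // a real client would upload in parts instead
	if err != nil {
		return err
	}
	m[key] = b
	return nil
}

func (m memObjects) Get(key string) (io.ReadCloser, error) {
	b, ok := m[key]
	if !ok {
		return nil, errNotFound
	}
	return io.NopCloser(bytes.NewReader(b)), nil
}

// HybridStore buffers log lines of running steps in temp (the database in
// the PR) and moves them to the object store once the step is done.
type HybridStore struct {
	objects memObjects
	temp    map[int64][]string
}

func objectKey(stepID int64) string { return fmt.Sprintf("logs/%d.log", stepID) }

// Append is step 1: chunks of a running step go to the temporary store.
func (h *HybridStore) Append(stepID int64, lines ...string) {
	h.temp[stepID] = append(h.temp[stepID], lines...)
}

// Finish is step 2: stream the lines to the object store, then clean up temp.
func (h *HybridStore) Finish(stepID int64) error {
	lines := h.temp[stepID]
	pr, pw := io.Pipe()
	go func() {
		w := bufio.NewWriter(pw)
		for _, l := range lines {
			if _, err := w.WriteString(l + "\n"); err != nil {
				pw.CloseWithError(err)
				return
			}
		}
		pw.CloseWithError(w.Flush()) // a nil error closes the pipe with EOF
	}()
	if err := h.objects.Put(objectKey(stepID), pr); err != nil {
		pr.CloseWithError(err) // unblock the writer goroutine
		return err
	}
	delete(h.temp, stepID)
	return nil
}

// Read is step 3: try the object store first, fall back to temp.
func (h *HybridStore) Read(stepID int64) ([]string, error) {
	rc, err := h.objects.Get(objectKey(stepID))
	if errors.Is(err, errNotFound) {
		return h.temp[stepID], nil
	}
	if err != nil {
		return nil, err
	}
	defer rc.Close()
	var out []string
	sc := bufio.NewScanner(rc)
	for sc.Scan() {
		out = append(out, sc.Text())
	}
	return out, sc.Err()
}

func main() {
	s := &HybridStore{objects: memObjects{}, temp: map[int64][]string{}}
	s.Append(71, "step started", "step done")
	running, _ := s.Read(71) // served from the temporary store
	if err := s.Finish(71); err != nil {
		panic(err)
	}
	archived, _ := s.Read(71) // served from the object store
	fmt.Println(len(running), len(archived), len(s.temp[71]))
}
```

The `io.Pipe` plus `bufio.Writer` pair is one way to keep the producer side of an upload from holding the whole log in memory; whether it matches the buffering used in this PR is an assumption.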

For authentication, I'm using the AWS SDK's standard environment variables. This works great with IAM roles on AWS where you don't need any explicit credentials, but the SDK supports tons of config methods so users can pick what works for them.

Configuration

WOODPECKER_LOG_STORE=s3
WOODPECKER_LOG_STORE_S3_BUCKET=my-bucket
WOODPECKER_LOG_STORE_S3_BUCKET_FOLDER=logs  # optional
WOODPECKER_LOG_STORE_S3_PATH_STYLE=true     # for MinIO/S3-compatible services

# AWS SDK handles auth via env vars, IAM roles, credential files, etc.
# For basic setup:
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_REGION=us-east-1

It works with any S3-compatible storage - AWS, MinIO, DigitalOcean Spaces, whatever.

Migration is seamless - existing logs stay where they are, new ones automatically go to S3 when you enable it.

I'm curious about your thoughts on this approach and whether there are any compatibility concerns I should address.

@qwerty287
Contributor

qwerty287 commented Sep 11, 2025

Your basic idea seems fine to me. Thanks! I'd rather use a file instead of the DB as temp store but I don't have a strong preference there.

@anbraten
Member

anbraten commented Sep 11, 2025

I'm wondering whether we could provide a more generic way to handle log storage by calling one API for log-append and one for step-finished. For S3, one could then easily have a second service that receives those calls. That way, a kinda "hacky" implementation detail like having to store intermediate logs in the DB could be abstracted away by the service. This could also be something we implement similarly to the addon forge 🤔

@Levy-Tal
Contributor Author

Ok, I tested this code extensively and found a race condition.

Sometimes, in the last chunk of logs, the logAppend function gets to this line before step.Finished is set to true. This makes the logAppend function skip the upload to S3, and the log is never uploaded.

So I wanted to check if we can tell that the step is over from the logs themselves. I saw that the last log from each step has NULL data:

{"id":234196,"step_id":71,"time":4,"line":13,"data":null,"type":0}

This is a result of the EOF in this line. But this is not intentional, so I don't want to rely on that.

After that, I thought about adding a new line at the end of every log file with type=2 (exitCode) here, but there's no easy way of adding it.

So for now, I think the best solution is to add explicit lifecycle events to the interface:

type LogStorageAddon interface {
    // Current operations
    AppendLogs(stepID int64, entries []*LogEntry) error
    ReadLogs(stepID int64) ([]*LogEntry, error)
    DeleteLogs(stepID int64) error
    
    // NEW: Explicit lifecycle events
    StepStarted(stepID int64, metadata StepMetadata) error
    StepFinished(stepID int64, metadata StepMetadata) error
}
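A minimal sketch of how an implementation could use the explicit `StepFinished` event follows. The interface is copied from above; the `LogEntry` and `StepMetadata` fields and the `archivingStore` type are assumptions, and a second map stands in for S3. The sketch also assumes the server calls `StepFinished` only after the final `AppendLogs` for that step.

```go
package main

import "fmt"

// LogEntry and StepMetadata are assumed shapes; the thread does not define them.
type LogEntry struct {
	StepID int64
	Line   int
	Data   []byte
}

type StepMetadata struct {
	ExitCode int
}

type LogStorageAddon interface {
	AppendLogs(stepID int64, entries []*LogEntry) error
	ReadLogs(stepID int64) ([]*LogEntry, error)
	DeleteLogs(stepID int64) error
	StepStarted(stepID int64, metadata StepMetadata) error
	StepFinished(stepID int64, metadata StepMetadata) error
}

// archivingStore is a hypothetical implementation: pending plays the role
// of the temporary store, archived plays the role of S3.
type archivingStore struct {
	pending  map[int64][]*LogEntry
	archived map[int64][]*LogEntry
}

// Compile-time check that archivingStore satisfies the interface.
var _ LogStorageAddon = (*archivingStore)(nil)

func newArchivingStore() *archivingStore {
	return &archivingStore{pending: map[int64][]*LogEntry{}, archived: map[int64][]*LogEntry{}}
}

func (a *archivingStore) StepStarted(stepID int64, _ StepMetadata) error {
	a.pending[stepID] = nil
	return nil
}

// AppendLogs never decides whether the step is over; it only buffers.
func (a *archivingStore) AppendLogs(stepID int64, entries []*LogEntry) error {
	a.pending[stepID] = append(a.pending[stepID], entries...)
	return nil
}

// StepFinished is the single place where logs move to long-term storage.
func (a *archivingStore) StepFinished(stepID int64, _ StepMetadata) error {
	a.archived[stepID] = append(a.archived[stepID], a.pending[stepID]...)
	delete(a.pending, stepID)
	return nil
}

func (a *archivingStore) ReadLogs(stepID int64) ([]*LogEntry, error) {
	if logs, ok := a.archived[stepID]; ok {
		return logs, nil
	}
	return a.pending[stepID], nil
}

func (a *archivingStore) DeleteLogs(stepID int64) error {
	delete(a.pending, stepID)
	delete(a.archived, stepID)
	return nil
}

func main() {
	var store LogStorageAddon = newArchivingStore()
	_ = store.StepStarted(71, StepMetadata{})
	_ = store.AppendLogs(71, []*LogEntry{{StepID: 71, Line: 0, Data: []byte("hello")}})
	_ = store.AppendLogs(71, []*LogEntry{{StepID: 71, Line: 1}})
	_ = store.StepFinished(71, StepMetadata{ExitCode: 0})
	logs, _ := store.ReadLogs(71)
	fmt.Println("entries readable after StepFinished:", len(logs))
}
```

Because the upload is triggered by the lifecycle call rather than by a flag checked inside `AppendLogs`, the outcome no longer depends on when `step.Finished` is set relative to the last chunk.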

I will try to think if there is a better way to indicate that this is the last log append. If you have suggestions, feel free to comment here.

Regarding the temp log store, I used the database and not the file store because it is already initialized, and if you use it, the server is stateless and doesn't need a persistent volume. But it could also be a temp file, depending on what solution we go with in the end.

@Levy-Tal
Contributor Author

> I'm wondering whether we could provide a more generic way to handle log storage by calling one API for log-append and one for step-finished. For S3, one could then easily have a second service that receives those calls. That way, a kinda "hacky" implementation detail like having to store intermediate logs in the DB could be abstracted away by the service. This could also be something we implement similarly to the addon forge 🤔

Or of course, a solution similar to the addon forge.

This was referenced Sep 14, 2025
@qwerty287
Contributor

I'm closing this PR in favor of #5507 and #5530. Once these two PRs are merged, this should be easily possible with an addon that lives in a separate repository. We'll also include a list of known addons, so it can be listed there.

@qwerty287 qwerty287 closed this Sep 21, 2025