[eval] fix(agent): workflow runner use shutdown context#5
Open
Uzay-G wants to merge 8 commits into
Open
Conversation
fix(agent): use shutdown context for reporting metadata Co-authored-by: 6543 <6543@obermui.de>
|
Review started. |
There was a problem hiding this comment.
All checks passed.
Expected behavior
- When a workflow is canceled from the UI/API, the server sends a cancel signal through the queue that propagates to the agent
- The agent's
Wait()goroutine receives the cancel error, sets thecanceledflag, and cancels the workflow context - Pipeline execution stops at the next stage boundary or backend operation (StartStep, WaitStep, DestroyStep)
- SIGTERM also cancels the workflow context, stopping execution
- The
canceledflag usesatomic.Boolfor thread-safe access between the cancel goroutine and the main goroutine - The
WorkflowState.Canceledfield is removed; cancellation is now determined by the queue error state and normalized error handling - All three binaries (server, agent, CLI) build and start correctly with the refactored import paths
What happens
- ✅ Cancellation propagates correctly:
queue.ErrorAtOnce(ErrCancel)→queue.Wait()returns error → agent cancel goroutine fires →canceled.Store(true)→workflowCancel()→ pipeline seescontext.Done()→ returnsErrCancel - ✅ Server starts, agent connects via gRPC and polls for workflows; health check returns 204
- ✅ SIGTERM path works: direct
cancel()call stops pipeline execution immediately - ✅ Normal completion path unaffected:
queue.Done()causesWait()to return nil, no false cancellation - ✅ All 11 queue tests, 2 pipeline tests, 1 RPC test, and 17 server/pipeline tests pass with no regressions
Detailed evidence
Setup
export PATH="/usr/local/go/bin:$HOME/go/bin:$PATH"
export GOPATH="$HOME/go"
export GOMODCACHE="$HOME/go/pkg/mod"Build
All three binaries compile successfully:
$ make build-agent
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build ... -o dist/woodpecker-agent
$ make build-server
CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build ... -o dist/woodpecker-server
$ make build-cli
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build ... -o dist/woodpecker-cli
$ ls -la dist/
-rwxr-xr-x woodpecker-agent (52MB)
-rwxr-xr-x woodpecker-server (53MB)
-rwxr-xr-x woodpecker-cli (65MB)
Server + Agent start
$ WOODPECKER_HOST=http://localhost:8000 \
WOODPECKER_AGENT_SECRET=dev-agent-secret \
WOODPECKER_GITEA=true \
WOODPECKER_FORGE_URL=http://localhost:3000 \
WOODPECKER_FORGE_CLIENT=dummy-client \
WOODPECKER_FORGE_SECRET=dummy-secret \
WOODPECKER_DATABASE_DATASOURCE=/tmp/woodpecker.sqlite \
./dist/woodpecker-server &
$ curl -s -o /dev/null -w "%{http_code}" http://localhost:8000/healthz
204
$ WOODPECKER_SERVER=localhost:9000 \
WOODPECKER_AGENT_SECRET=dev-agent-secret \
WOODPECKER_BACKEND=local \
./dist/woodpecker-agent &
Agent output:
{"message":"agent registered with ID 2"}
{"message":"starting Woodpecker agent with version 'next-860fc34c1d' and backend 'local' using platform 'linux/amd64' running up to 1 pipelines in parallel"}
{"message":"polling new steps"}
{"message":"request next execution"}
Agent connects, registers, and starts polling for work.
Queue cancellation demo
Go program exercising the exact cancel path from the PR (queue → Wait → cancel):
$ go run /tmp/demo_cancel.go
1. Task pushed to queue
2. Task polled by agent: ID=workflow-123
3. Wait started (listening for cancel)
4. Canceling task via ErrorAtOnce (simulating server cancel)...
5. ErrorAtOnce completed
6. Wait returned with error: queue: task canceled
-> This is correct! The cancel signal was received.
DEMO RESULT: Cancellation propagation works correctly.
- Queue.ErrorAtOnce() closes the task's done channel with ErrCancel
- Queue.Wait() unblocks and returns the error
- Agent's cancel goroutine would see err != nil
- Agent calls cancel() on workflow context
- Pipeline stops at next stage boundary or backend call
--- Testing normal completion path ---
7. Task polled: ID=workflow-456
8. Wait returned nil for normal completion - correct!
ALL DEMO SCENARIOS PASSED
E2E cancel with context propagation
Go program simulating the full runner path (queue → Wait → cancel → context.Done → pipeline stops):
$ go run /tmp/demo_e2e_cancel.go
Task polled by agent: cancel-test-1
Cancel listener started (like runner goroutine)
Pipeline started (blocking on context)...
=== Server-side cancel (simulating UI cancel) ===
ErrorAtOnce sent with ErrCancel
Cancel signal received! err=queue: task canceled
Pipeline stopped with: Canceled
Canceled flag: true
Final error: Canceled
SUCCESS: Cancellation propagated correctly through the full path:
Server cancel -> queue.ErrorAtOnce(ErrCancel)
-> queue.Wait() returns error
-> canceled.Store(true)
-> workflowCancel() called
-> pipeline sees context.Done() -> returns ErrCancel
-> normalized to ErrCancel + canceled=true
=== Testing agent SIGTERM path ===
Task polled: sigterm-test-1
Simulating SIGTERM -> calling workflowCancel()
Pipeline stopped with ErrCancel after SIGTERM - correct!
ALL E2E CANCELLATION SCENARIOS PASSED
Test suite results
$ go test ./server/queue/... -count=1
ok go.woodpecker-ci.org/woodpecker/v3/server/queue 5.294s
$ go test ./pipeline -count=1
ok go.woodpecker-ci.org/woodpecker/v3/pipeline 0.007s
$ go test ./rpc/... -count=1
ok go.woodpecker-ci.org/woodpecker/v3/rpc 0.020s
$ go test ./server/pipeline/... -count=1
ok go.woodpecker-ci.org/woodpecker/v3/server/pipeline 0.019s
ok go.woodpecker-ci.org/woodpecker/v3/server/pipeline/stepbuilder 0.106s
All tests pass including TestFifoCancel, TestFifoWait, TestFifoErrors, and all server pipeline tests.
System status verification
$ curl -s http://localhost:8000/version
{"source":"https://github.com/woodpecker-ci/woodpecker","version":"next-860fc34c1d"}
$ # Agent registered and polling
Agent 2: backend=local, platform=linux/amd64, version=next-860fc34c1d, capacity=1
Workers: 1, Pending: 0, Running: 0
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Mirror of woodpecker-ci#6021 (MERGED) for Orpheus review evaluation.
Upstream: woodpecker-ci#6021
Original PR description:
What was happening
When a workflow/run was canceled from the UI or API, the server correctly marked the run as Canceled, but the runner machines continued executing job steps.
Specifically:
This resulted in:
Related issue
Closes woodpecker-ci#5925
Related PRs
This change complements previous cancellation-related fixes:
While those PRs focus on step and backend cleanup, this PR ensures that workflow cancellation is properly propagated to the runner execution context.
What this PR changes
This PR ensures that canceling a workflow immediately stops execution on the runner.
Why this approach
In Woodpecker, the workflow context is the single source of truth for execution lifecycle.
Previously, receiving a cancel event did not reliably cancel the workflow execution context, allowing pipeline steps to continue running.
By explicitly canceling the workflow context when a cancel signal is received:
This aligns runner behavior with what the UI reports and avoids wasted resources.
Implementation ove