fix(agent): workflow runner use shutdown context#6021

Merged
6543 merged 8 commits into woodpecker-ci:main from Pnkcaht:fix/cancel-run-stops-runners on Jan 27, 2026

Conversation

@Pnkcaht

@Pnkcaht Pnkcaht commented Jan 26, 2026

Contributor

What was happening

When a workflow/run was canceled from the UI or API, the server correctly marked the run as Canceled, but the runner machines continued executing job steps.

Specifically:

  • The cancel signal was received by the runner
  • The workflow state was updated on the server
  • Pipeline execution on the runner was not interrupted
  • Tests and other long-running steps kept running and consuming runner capacity

This resulted in:

  • Wasted compute time
  • Runner capacity remaining blocked
  • Confusing UX (UI shows Canceled, but jobs keep running)

Related issue

Closes #5925

Related PRs

This change complements previous cancellation-related fixes:

While those PRs focus on step and backend cleanup, this PR ensures that workflow cancellation is properly propagated to the runner execution context.

What this PR changes

This PR ensures that canceling a workflow immediately stops execution on the runner.

  • Propagates server-side cancel events to the runner workflow context
  • Cancels the workflow execution context as soon as a cancel signal is received
  • Ensures pipeline execution and backend steps are interrupted
  • Normalizes cancellation handling so the workflow consistently ends as Canceled

Why this approach

In Woodpecker, the workflow context is the single source of truth for execution lifecycle.

Previously, receiving a cancel event did not reliably cancel the workflow execution context, allowing pipeline steps to continue running.
By explicitly canceling the workflow context when a cancel signal is received:

  • The pipeline runtime receives context.Done()
  • Backend implementations (Docker, Kubernetes, etc.) can terminate running steps
  • Runner capacity is freed promptly

This aligns runner behavior with what the UI reports and avoids wasted resources.

Implementation overview

sequenceDiagram
    participant UI as Web UI / API
    participant Server as Woodpecker Server
    participant Runner as Agent Runner
    participant Pipeline as Pipeline Runtime
    participant Backend as Backend Engine

    UI->>Server: Cancel workflow
    Server->>Runner: Cancel signal (Wait)
    Runner->>Runner: cancel workflow context
    Runner->>Pipeline: ctx.Done()
    Pipeline->>Backend: stop execution
    Backend-->>Pipeline: terminate running steps
    Pipeline-->>Runner: execution aborted
    Runner->>Server: report canceled state

Comment thread agent/runner.go
@6543

6543 commented Jan 26, 2026

Member

@Pnkcaht since you are new to the project, I just documented a bunch of dev stuff for you:

#6012
#6019

-> https://woodpecker-ci-woodpecker-pr-6019.surge.sh/docs/next/development/architecture

@6543 added the bug and agent labels Jan 27, 2026
Comment thread agent/runner.go Outdated
@codecov

codecov Bot commented Jan 27, 2026


Codecov Report

❌ Patch coverage is 0% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 21.85%. Comparing base (b1cbd96) to head (860fc34).
⚠️ Report is 306 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| agent/runner.go | 0.00% | 46 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6021      +/-   ##
==========================================
- Coverage   21.86%   21.85%   -0.01%     
==========================================
  Files         432      432              
  Lines       39254    39258       +4     
==========================================
  Hits         8581     8581              
- Misses      29865    29869       +4     
  Partials      808      808              


@6543

6543 commented Jan 27, 2026

Member

before the patch:

if you cancel the agent while it is running a workflow, the steps and the workflow are marked as failed

now:

the workflow is correctly reported as failed, but the step is marked as success

@Pnkcaht

Pnkcaht commented Jan 27, 2026

Contributor Author

@6543 What do you think? :)

image

@6543

6543 commented Jan 27, 2026

Member

sorry, that won't do ... the steps are marked as success; the state of the whole workflow is not the problem

Comment thread agent/runner.go Outdated
Comment thread agent/runner.go Outdated
fix(agent): use shutdown context for reporting metadata

Co-authored-by: 6543 <6543@obermui.de>
@Pnkcaht

Pnkcaht commented Jan 27, 2026

Contributor Author

Can you explain the mistakes you made? I can fix them.

Comment thread agent/runner.go Outdated
Comment thread agent/runner.go Outdated
Comment thread agent/runner.go Outdated
@6543 changed the title from "fix(agent): properly cancel workflow execution on server cancel" to "fix(agent): workflow runner use shutdown context" Jan 27, 2026
@6543 added the refactor label and removed the bug label Jan 27, 2026

@6543 6543 left a comment

Member


LGTM as a small refactor pull; context handling is now correct

@6543 6543 merged commit f202470 into woodpecker-ci:main Jan 27, 2026
7 of 9 checks passed
@woodpecker-bot woodpecker-bot mentioned this pull request Jan 27, 2026
1 task
@6543

6543 commented Apr 4, 2026

Member

this introduced a regression; the proper fix is #6361


Labels

agent, refactor, skip-changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cancelling a workflow/run does not stop execution on runners (machines keep running)

2 participants