Fix pipeline cancellation status handling and step state synchronization#6011

Merged
6543 merged 106 commits into
woodpecker-ci:mainfrom
6543-forks:rework-step-status-signaling
Feb 5, 2026
Conversation

@6543

@6543 6543 commented Jan 25, 2026

Member

Summary

Fixes critical issues with pipeline status updates when pipelines are cancelled or workflows are killed by the agent.
The agent's gRPC client error handling during cancellation was causing incorrect status propagation, leading to cancelled pipelines being marked as either "failed" or "success" instead of "killed".

Problems Fixed

  1. Cancelled running pipelines show incorrect status: When cancelling a running pipeline, the gRPC Wait() error causes the pipeline to be marked as "failed" instead of "killed"
  2. Status updates overwrite cancellation state: Cancelled pipelines initially show correct "killed" status but get asynchronously updated to "success"
  3. Implicit cancellation signaling: Cancellations were inferred from indirect signals (such as exit codes or context errors); an explicit canceled response property is now used instead
  4. Pending pipelines start after cancellation: Race condition allows cancelled pending pipelines to begin execution when agents become available
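Problem 2 above can be illustrated with a minimal sketch of a "sticky" killed status. This is not the code from this PR: `StatusValue`, `StepUpdate`, and `nextStatus` are hypothetical names chosen for illustration, and the real server-side logic is likely more involved.

```go
package main

import "fmt"

// StatusValue is a hypothetical stand-in for a step status type.
type StatusValue string

const (
	StatusPending StatusValue = "pending"
	StatusRunning StatusValue = "running"
	StatusSuccess StatusValue = "success"
	StatusFailure StatusValue = "failure"
	StatusKilled  StatusValue = "killed"
)

// StepUpdate is a hypothetical report sent by an agent.
type StepUpdate struct {
	Exited   bool
	ExitCode int
	Canceled bool // explicit signal, not inferred from the exit code
}

// nextStatus computes the new status from the current one and an update.
// A step that is already killed stays killed, so a late "exited with 0"
// report cannot turn it into a success.
func nextStatus(current StatusValue, u StepUpdate) StatusValue {
	switch {
	case current == StatusKilled:
		return StatusKilled
	case u.Canceled:
		return StatusKilled
	case !u.Exited:
		return StatusRunning
	case u.ExitCode == 0:
		return StatusSuccess
	default:
		return StatusFailure
	}
}

func main() {
	s := StatusRunning
	s = nextStatus(s, StepUpdate{Canceled: true})
	// A late update arrives after the cancellation.
	s = nextStatus(s, StepUpdate{Exited: true, ExitCode: 0})
	fmt.Println(s) // killed
}
```

The point of the sketch is only the ordering of the checks: the terminal killed state is tested before anything derived from the exit code.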

[Image: WebUI screenshot showing working canceled steps and workflow]

Changes

  • Add explicit Canceled field to gRPC protocol (WorkflowState and StepState) - bumps protocol version to 15
  • Update agent's Wait() to return cancellation status explicitly instead of relying on errors
  • Properly handle gRPC client cancellation using context.WithCancelCause() to distinguish cancellation reasons
  • Prevent status updates from overwriting cancellation state in step status handling
  • Ensure cancelled workflows remain in "killed" state throughout their lifecycle
  • Add context error checking in local backend's WaitStep for cancelled workflows
  • Fix defer placement in logger and tracer to ensure proper cleanup
  • Improve step status state machine to handle all transitions correctly (pending→running→success/failure/killed)
  • Update backend State to include Started timestamp for accurate timing
  • Fix queue cleanup to handle already-deleted tasks gracefully

Testing

Tested on both the local and docker backends

read more at #2875


close #833
close #3848
close #2062
close #2911
close #4349


block #6056
block #6039

@6543 6543 added the wip label Jan 25, 2026
@6543

6543 commented Jan 25, 2026

Member Author

Also, when Wait() does not return an err, the step does not get canceled in the background either until it has finished?!
-> We need to properly handle the special case where the wait err is nil but the ctx is canceled.

@6543

6543 commented Jan 25, 2026

Member Author

related work #3850

@qwerty287

Contributor

That's nothing new…

There's a summary issue for the whole pipeline canceling: #2875

@6543

6543 commented Jan 25, 2026

Member Author

https://github.com/6543-forks/woodpecker/blob/1b74c2baa6dd8b67577510d3dd85246ffd24d60c/agent/runner.go#L102-L113

@qwerty287 I'll work on that now ... it will take some time, I guess, to get through all of it :)

@6543 6543 mentioned this pull request Jan 25, 2026
…hed and refactored by us a lot and now is mosty ours
@6543

6543 commented Feb 2, 2026

Member Author

@qwerty287 ☝️ updated the code comment and the code to make it clearer and more error-resistant

@6543 6543 requested a review from lafriks February 2, 2026 15:05

@qwerty287 qwerty287 left a comment

Contributor

Seems fine to me, but it's untested from my side (I guess you did that…) and I'm also not deeply familiar with this code. But as @lafriks's comment is resolved, I guess it's fine

@6543 6543 merged commit 8a8f9ad into woodpecker-ci:main Feb 5, 2026
8 of 9 checks passed
@6543 6543 deleted the rework-step-status-signaling branch February 5, 2026 20:41
@woodpecker-bot woodpecker-bot mentioned this pull request Feb 5, 2026
1 task
@6543

6543 commented Feb 5, 2026

Member Author

@qwerty287

Contributor

@6543 there seems to be another issue: https://ci.woodpecker-ci.org/repos/3780/pipeline/31596

I canceled that manually but it's showing as succeeded

@6543

6543 commented Feb 12, 2026

Member Author

hmm, that's another issue, unrelated to the agent rpc; it comes from the queue implementation ... would you mind creating an issue?

samoli added a commit to samoli/woodpecker that referenced this pull request Mar 28, 2026
When a workflow includes service steps, the workflow and pipeline status remains permanently stuck as "running" in the database after all steps have finished.

completeChildrenIfParentCompleted was called after UpdateWorkflowStatusToDone, so the status calculation still saw service steps as running. The in-memory step state also wasn't updated after the database update.

The fix moves child completion before the status calculation and syncs the in-memory state so WorkflowStatus sees the finalized step states.

I think this bug was introduced in the refactoring in woodpecker-ci#6011
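The ordering problem described in that commit message can be shown with a simplified model. This is only a sketch: `Step`, `newSteps`, `completeChildren`, and `workflowStatus` are hypothetical stand-ins, not the Woodpecker functions `completeChildrenIfParentCompleted` or `UpdateWorkflowStatusToDone`.

```go
package main

import "fmt"

// Step is a hypothetical in-memory step record.
type Step struct {
	Name    string
	Service bool
	Running bool
}

// newSteps returns a finished build step and a still-running service step.
func newSteps() []*Step {
	return []*Step{
		{Name: "build"},
		{Name: "db", Service: true, Running: true},
	}
}

// completeChildren marks every still-running step as finished.
func completeChildren(steps []*Step) {
	for _, s := range steps {
		s.Running = false
	}
}

// workflowStatus reports "running" while any step is still running.
func workflowStatus(steps []*Step) string {
	for _, s := range steps {
		if s.Running {
			return "running"
		}
	}
	return "success"
}

func main() {
	// Buggy order: status is computed before children are completed.
	buggy := newSteps()
	status := workflowStatus(buggy)
	completeChildren(buggy)
	fmt.Println(status) // running

	// Fixed order: complete children first, then compute the status.
	fixed := newSteps()
	completeChildren(fixed)
	fmt.Println(workflowStatus(fixed)) // success
}
```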
@woodpecker-bot woodpecker-bot mentioned this pull request Apr 1, 2026
1 task
@woodpecker-bot woodpecker-bot mentioned this pull request Apr 15, 2026
1 task
@woodpecker-bot woodpecker-bot mentioned this pull request Apr 27, 2026
1 task
Labels

agent, bug (Something isn't working), highlight, server

5 participants