Don't unblock run-level-concurrency-blocked runs in the resolver#37461
Don't unblock run-level-concurrency-blocked runs in the resolver#37461silverwind wants to merge 5 commits intogo-gitea:mainfrom
Conversation
checkJobsOfCurrentRunAttempt's resolver only considered needs and job-level concurrency when transitioning jobs out of Blocked. When something drove the resolver against a run blocked solely by workflow-level concurrency (for example, a sibling run in the same group entering the queue and triggering EmitJobsIfReadyByRun), the run's job silently became Waiting while another run still held the group, and the runner could pick it up. Bail out of the resolver when the run's latest attempt is still blocked by run-level concurrency. checkRunConcurrency re-evaluates when the holding run finishes. Fixes go-gitea#37446 Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Prevents the job-status resolver from transitioning jobs out of Blocked when the overall run is still blocked by workflow-level (run-level) concurrency, closing a gap where queued resolver runs could defeat concurrency guarantees.
Changes:
- Add an early-bail in
checkJobsOfCurrentRunAttemptwhen the latest attempt is still blocked by run-level concurrency. - Add an integration test ensuring schedule-triggered runs remain blocked (and don’t emit runnable jobs) while a sibling run holds the concurrency group.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/integration/actions_concurrency_test.go | Adds an integration test covering the run-level concurrency-blocked resolver regression scenario. |
| services/actions/job_emitter.go | Bails out of the job resolver when the run is blocked due to run-level concurrency, preventing jobs from becoming runnable prematurely. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Per review feedback on go-gitea#37461. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
The Len(blockedRuns, 1) assertion already proves the surviving schedule run is not Waiting, so the runner could not pick anything up. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
|
I believe it can be tested in a unit test.
|
Took 2.7s locally. Slow yes, but not extremely. |
These tests will be extremely slow in CI, the more added, the slower. |
Share some findings I have got about CI time: Operating a git repo via API/web is slow, due to it needs to execute the Gitea's git hook. It might be slower in CI due to limited resources. To keep CI fast and optimize the speed: avoid unnecessary git repo operations via Gitea's API or web (avoid the Gitea's git hook) as much as possible
I think it can save at least many minutes if |
|
Yes, with ~4 times slower CI, we are looking at 12s+, which is borderline. |
|
Another concern is that the integration tests are abused. Actually, for a function level logic, it should be clearly tested in unit tests, including various edge cases. Integration test should focus on "the whole thing overall works together", it's difficult to use it to cover edge cases, and usually it is not informative when writing a integration test for a speical case (a lot of unrelated code, maintenance burden) |
|
Yeah, what can be asserted in a unit test should be. I didn't tell Claude to write integration test, it decided itself. Likely a good pointer to add to AGENTS.md. My attack points for fast CI are:
|
Per @wxiaoguang's feedback on go-gitea#37461: the run-level concurrency guard in checkJobsOfCurrentRunAttempt is function-level logic and is better covered by a unit test. The unit test sets up a Running holder attempt and a Blocked sibling attempt in the same concurrency group directly in the DB, calls checkJobsOfCurrentRunAttempt, and asserts the blocked job stays Blocked. ~0.3s vs ~3.7s for the integration version, and no API/git-hook overhead. Co-Authored-By: Claude (Opus 4.7) <noreply@anthropic.com>
|
@wxiaoguang good call — replaced the integration test with a unit test in This response was written with the help of Claude Opus 4.7 |
|
The new test looks good to me |
Fixes #37446.
The job-status resolver in
checkJobsOfCurrentRunAttemptonly consideredneedsand job-level concurrency when transitioning jobs out ofBlocked. When something drove the resolver against a run blocked solely by workflow-level concurrency — for example, a sibling run in the same group entering the queue and triggeringEmitJobsIfReadyByRun— the run's job silently becameWaitingwhile another run still held the concurrency group, and the runner could pick it up, defeating the concurrency guarantee.The fix bails out of the resolver when the run's latest attempt is still blocked by run-level concurrency.
checkRunConcurrencyre-evaluates when the holding run finishes.Covered by a unit test (
Test_checkJobsOfCurrentRunAttempt_RunLevelConcurrencyKeepsJobsBlockedinservices/actions/job_emitter_test.go) that sets up a Running holder attempt and a Blocked sibling attempt in the same concurrency group directly in the DB, callscheckJobsOfCurrentRunAttempt, and asserts the blocked job staysBlocked. Fails on master, passes with the fix.This PR was written with the help of Claude Opus 4.7