fix: prevent dual-agent conflict for codex in consumer template by stranske · Pull Request #701 · stranske/Workflows

stranske · 2026-01-09T15:54:34Z

Problem

PR #695 introduced a change that causes both CLI and UI agents to run simultaneously for codex-labeled issues:

Issue gets labeled with agent:codex
agents-issue-intake.yml creates PR with post_agent_comment: 'true'
Bridge posts @codex start comment → triggers UI agent (chatgpt-codex-connector)
PR has agent:codex label → triggers CLI keepalive loop
*Both agents conflict+x /tmp/create_missing_labels.sh && /tmp/create_missing_labels.sh

This was identified by the P2 bot comment on PR #700:

This change forces post_agent_comment to 'true' for all non-workflow_dispatch runs... causing the UI agent to run alongside the CLI keepalive loop and potentially conflict.

Solution

Changed the consumer template logic to:

Skip post_agent_comment for agent:codex (CLI keepalive handles it)
Keep post_agent_comment for other agents like agent:claude
Preserve workflow_dispatch behavior for manual overrides

Logic Change

# Before (always true for non-dispatch):
post_agent_comment: ${{ github.event_name == 'workflow_dispatch' && (...) || 'true' }}

# After (skip for codex):
post_agent_comment: >-
  ${{ github.event_name == 'workflow_dispatch'
      && (inputs.post_codex_comment && 'true' || 'false')
      || (needs.check_labels.outputs.agent != 'codex' && 'true' || 'false') }}

Context

See docs/keepalive/Agents.md for the CLI vs UI agent architecture:

CLI Agent: Triggered by workflows (keepalive loop), does NOT need @codex comments
UI Agent: Triggered by @codex comments (backup/manual mode)

For codex, the CLI keepalive loop is the primary mechanism - posting comments would duplicate/conflict.

Impact

Consumer repos using this template will correctly:

Use CLI-only automation for agent:codex labels
Use comment-based triggering for other agents
Allow manual override via workflow_dispatch

- Added missing PYTHONPATH env var to agents-issue-optimizer.yml (Phase 1, 2, dedup) - Copied agents-capability-check.yml, agents-decompose.yml, agents-dedup.yml from template - Adapted all workflows for Workflows repo (removed self-checkout, fixed paths) - Fixed agents-auto-label.yml checkout and paths - Created SHORT_TERM_PLAN.md and SHORT_TERM_PLAN_SUMMARY.md Fixes: agents:optimize failing with 'No module named tools' error on issue #691

Remove ../ prefix from file paths in capability-check, decompose, and dedup workflows. Consumer template uses ../ because workflows-repo is a subdirectory, but Workflows repo itself doesn't need the prefix.

- Add agents-capability-check.yml to workflow docs - Add agents-decompose.yml to workflow docs - Add agents-dedup.yml to workflow docs Fixes remaining test failures in PR #694

Per Copilot review comment - this was missing PYTHONPATH which could cause the same ModuleNotFoundError this PR is fixing. All other workflows in this PR have PYTHONPATH set, this one was overlooked. Co-authored-by: copilot-pull-request-reviewer

- agents-capability-check.yml - agents-decompose.yml - agents-dedup.yml Fixes test_canonical_workflow_names_match_expected_mapping failure

**Problem:** - PRs stuck indefinitely when Gate cancelled due to API rate limits - Keepalive orchestrator defers with no automatic retry mechanism - All agent work stops until manual intervention **Solution:** 1. **Automatic retry after timeout**: Keepalive now retries after 5 minutes instead of deferring indefinitely 2. **State tracking**: Track last defer timestamp to calculate retry timing 3. **Automated recovery workflow**: New workflow runs every 10 minutes to find and retry deferred PRs automatically **Changes:** - keepalive_loop.js: Add retry-after-rate-limit logic with 5min timeout - keepalive_loop.js: Track last_defer_timestamp in state - agents-keepalive-retry-deferred.yml: New scheduled workflow to auto-retry Fixes systematic stoppage when GitHub API rate limits are exceeded.

@codex

…ment When PRs are created from issues labeled with agent:codex, the template was posting @codex start comments which triggered the UI agent. However, the CLI keepalive loop already handles codex PRs through the workflow system. Having both agents run simultaneously causes conflicts. This fix: - Skips post_agent_comment for agent:codex (CLI keepalive handles it) - Keeps post_agent_comment for other agents like agent:claude - Preserves workflow_dispatch behavior for manual overrides Addresses P2 issue raised in PR #700 review comment. See docs/keepalive/Agents.md for CLI vs UI agent distinction.

chatgpt-codex-connector · 2026-01-09T15:57:44Z

Summary

Adjusted the consumer issue-intake workflow to keep manual-dispatch auto-comments enabled for non-codex agents while only honoring post_codex_comment for codex selections, preventing unintended codex UI triggers on manual dispatch runs. templates/consumer-repo/.github/workflows/agents-issue-intake.ymlL170-L189

Testing

Not run (workflow template change).

View task →

Copilot

Pull request overview

This PR addresses a critical dual-agent conflict issue where both CLI and UI agents were running simultaneously for codex-labeled issues, introduced in PR #695. The primary fix prevents the agents-issue-intake.yml workflow from posting @codex comments for codex agents, since the CLI keepalive loop handles agent activation directly.

Key Changes

Modified consumer template logic to skip post_agent_comment for agent:codex while preserving it for other agents and workflow_dispatch overrides
Added rate limit retry logic to the keepalive system with automatic deferral and retry mechanisms
Introduced four new Phase 3 workflows (capability check, task decomposition, duplicate detection, and retry deferred) plus PYTHONPATH fixes

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
templates/consumer-repo/.github/workflows/agents-issue-intake.yml	Core fix: prevents dual-agent conflict by skipping comment posts for codex agents
.github/scripts/keepalive_loop.js	Adds rate limit detection and deferral logic with timestamp tracking
.github/workflows/agents-keepalive-retry-deferred.yml	New workflow to automatically retry rate-limited PRs after delay
.github/workflows/agents-capability-check.yml	New pre-flight check workflow to identify blockers before agent assignment
.github/workflows/agents-decompose.yml	New workflow for breaking down large issues into actionable sub-tasks
.github/workflows/agents-dedup.yml	New workflow for semantic duplicate issue detection
.github/workflows/agents-issue-optimizer.yml	Adds missing PYTHONPATH configuration to fix ModuleNotFoundError
.github/workflows/agents-auto-label.yml	Removes self-checkout logic and fixes PYTHONPATH for Workflows repo
topics.json	Adds `scope` and `non_goals` fields to issue template schema
tests/workflows/test_workflow_naming.py	Adds expected names for three new workflows
docs/ci/WORKFLOW_SYSTEM.md	Documents the three new Phase 3 workflows in the system inventory
docs/ci/WORKFLOWS.md	Adds descriptions for new agent workflows
docs/plans/SHORT_TERM_PLAN_SUMMARY.md	New file documenting 2-week Phase 3 testing plan
docs/plans/SHORT_TERM_PLAN.md	New detailed planning document for Phase 3 completion
docs/WORKFLOW_AUDIT_2026-01-09.md	New audit results documenting workflow issues and fixes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-09T15:59:26Z

docs/plans/SHORT_TERM_PLAN_SUMMARY.md

@@ -0,0 +1,119 @@
+# Short-Term Plan Summary


This PR includes numerous unrelated changes beyond the dual-agent conflict fix described in the title and description. The PR bundles together:

The primary fix (agents-issue-intake.yml post_agent_comment logic)

Four new workflow files (agents-keepalive-retry-deferred.yml, agents-capability-check.yml, agents-decompose.yml, agents-dedup.yml)

PYTHONPATH fixes across multiple workflows

Rate limit retry logic in keepalive_loop.js

Three new documentation files (SHORT_TERM_PLAN.md, SHORT_TERM_PLAN_SUMMARY.md, WORKFLOW_AUDIT_2026-01-09.md)

Schema changes to topics.json

Test file updates

This makes the PR difficult to review and increases the risk of introducing issues. Consider either:

Splitting these into separate, focused PRs

Updating the PR description to comprehensively document all changes and their relationships

Copilot · 2026-01-09T15:59:26Z

tests/workflows/test_workflow_naming.py

+    "agents-capability-check.yml": "Capability Check",
+    "agents-decompose.yml": "Task Decomposition",
+    "agents-dedup.yml": "Duplicate Detection",


The new workflow file agents-keepalive-retry-deferred.yml is missing from the EXPECTED_NAMES mapping in this test file. The test test_workflow_names_match_filename_convention() will fail with "Missing expected name mapping for agents-keepalive-retry-deferred.yml" when it encounters this workflow.

Add an entry to the EXPECTED_NAMES dictionary with the workflow's display name (which should be "Agents Keepalive Retry Deferred" based on line 1 of the workflow file).

Copilot · 2026-01-09T15:59:26Z

.github/scripts/keepalive_loop.js

+        const lastDeferTime = previousState?.last_defer_timestamp || 0;
+        const timeSinceLastDefer = now - lastDeferTime;
+        const RATE_LIMIT_RETRY_INTERVAL = 5 * 60 * 1000; // 5 minutes
+
+        if (timeSinceLastDefer >= RATE_LIMIT_RETRY_INTERVAL) {
+          // Enough time has passed, retry with reduced API usage
+          action = 'run';
+          reason = 'retry-after-rate-limit';
+          if (core) core.info(`Rate limit defer timeout reached (${Math.floor(timeSinceLastDefer / 60000)}min since last defer), retrying with reduced API calls`);
+        } else {
+          action = 'defer';
+          reason = 'gate-cancelled-rate-limit-transient';
+          if (core) core.info(`Rate limit detected, deferring. Will retry in ${Math.floor((RATE_LIMIT_RETRY_INTERVAL - timeSinceLastDefer) / 60000)}min`);


Logic issue: The rate limit retry logic will always execute on the first defer because last_defer_timestamp will be 0 (falsy) for the first occurrence. This means timeSinceLastDefer will equal the current timestamp (potentially millions of milliseconds), which will always exceed the 5-minute threshold (300,000ms).

This effectively bypasses the deferral mechanism on the first rate limit encounter. The code should initialize the defer timestamp on the first defer, not check against it. Consider either:

Checking if lastDeferTime is 0 and treating that as the first defer (set the timestamp but don't retry yet)

Setting the defer timestamp immediately and only retry on subsequent checks

Suggested change

const lastDeferTime = previousState?.last_defer_timestamp || 0;

const timeSinceLastDefer = now - lastDeferTime;

const RATE_LIMIT_RETRY_INTERVAL = 5 * 60 * 1000; // 5 minutes

if (timeSinceLastDefer >= RATE_LIMIT_RETRY_INTERVAL) {

// Enough time has passed, retry with reduced API usage

action = 'run';

reason = 'retry-after-rate-limit';

if (core) core.info(`Rate limit defer timeout reached (${Math.floor(timeSinceLastDefer / 60000)}min since last defer), retrying with reduced API calls`);

} else {

action = 'defer';

reason = 'gate-cancelled-rate-limit-transient';

if (core) core.info(`Rate limit detected, deferring. Will retry in ${Math.floor((RATE_LIMIT_RETRY_INTERVAL - timeSinceLastDefer) / 60000)}min`);

const lastDeferTime = previousState?.last_defer_timestamp ?? 0;

const RATE_LIMIT_RETRY_INTERVAL = 5 * 60 * 1000; // 5 minutes

if (!lastDeferTime) {

// First time we've seen a rate limit in this state: defer and initialize timestamp

action = 'defer';

reason = 'gate-cancelled-rate-limit-transient';

if (core) core.info('Rate limit detected for the first time in this state, deferring and initializing defer timestamp');

} else {

const timeSinceLastDefer = now - lastDeferTime;

if (timeSinceLastDefer >= RATE_LIMIT_RETRY_INTERVAL) {

// Enough time has passed, retry with reduced API usage

action = 'run';

reason = 'retry-after-rate-limit';

if (core) core.info(`Rate limit defer timeout reached (${Math.floor(timeSinceLastDefer / 60000)}min since last defer), retrying with reduced API calls`);

} else {

action = 'defer';

reason = 'gate-cancelled-rate-limit-transient';

if (core) core.info(`Rate limit detected, deferring. Will retry in ${Math.floor((RATE_LIMIT_RETRY_INTERVAL - timeSinceLastDefer) / 60000)}min`);

}

Copilot · 2026-01-09T15:59:27Z

.github/workflows/agents-keepalive-retry-deferred.yml

+            const agentPRs = pulls.filter(pr =>
+              pr.labels.some(label =>
+                label.name.startsWith('agents:') ||
+                label.name.startsWith('codex:')


The label filter includes codex: prefix but based on the codebase search, the standard label format is agent:codex not codex:*. The filter label.name.startsWith('codex:') will not match agent:codex labels.

Either remove this filter line if it's incorrect, or clarify if there are indeed codex:* labels used in the system. Based on the PR description and code review, only agent:codex and agents:* patterns are used.

Suggested change

label.name.startsWith('codex:')

label.name === 'agent:codex'

@codex

…ment When PRs are created from issues labeled with agent:codex, the template was posting @codex start comments which triggered the UI agent. However, the CLI keepalive loop already handles codex PRs through the workflow system. Having both agents run simultaneously causes conflicts. This fix: - Skips post_agent_comment for agent:codex (CLI keepalive handles it) - Keeps post_agent_comment for other agents like agent:claude - Preserves workflow_dispatch behavior for manual overrides Supersedes PR #701 (which had unrelated commits mixed in) Addresses P2 issue raised in PR #700 review comment. See docs/keepalive/Agents.md for CLI vs UI agent distinction.

stranske · 2026-01-09T17:33:02Z

Closing - Superseded by PR #705

This PR accumulated unrelated commits from the fix/workflows-pythonpath-phase3 branch, causing confusion:

The dual-agent conflict fix (1 commit)
PYTHONPATH additions already merged in fix: Add PYTHONPATH and Phase 3 workflows to Workflows repo #694
Phase 3 workflows already merged in fix: Add PYTHONPATH and Phase 3 workflows to Workflows repo #694
Rate limit handling changes superseded by fix: bypass rate-limit-only Gate cancellations - proceed with work #702

The actual fix (skipping post_agent_comment for codex) has been cleanly extracted to PR #705.

Bot Comments Addressed:

Unrelated changes - Fixed by clean extraction to fix: prevent dual-agent conflict for codex by skipping post_agent_comment #705
Missing workflow mapping test - Not needed in fix: prevent dual-agent conflict for codex by skipping post_agent_comment #705 (no new workflows)
Rate limit retry logic issue - Superseded by fix: bypass rate-limit-only Gate cancellations - proceed with work #702 (bypass instead of defer/retry)
Label filter issue - Removed in fix: prevent dual-agent conflict for codex by skipping post_agent_comment #705 (no new workflow file)

Closing in favor of #705.

@codex

…ment (#705) When PRs are created from issues labeled with agent:codex, the template was posting @codex start comments which triggered the UI agent. However, the CLI keepalive loop already handles codex PRs through the workflow system. Having both agents run simultaneously causes conflicts. This fix: - Skips post_agent_comment for agent:codex (CLI keepalive handles it) - Keeps post_agent_comment for other agents like agent:claude - Preserves workflow_dispatch behavior for manual overrides Supersedes PR #701 (which had unrelated commits mixed in) Addresses P2 issue raised in PR #700 review comment. See docs/keepalive/Agents.md for CLI vs UI agent distinction.

stranske added 9 commits January 9, 2026 14:00

docs: Add comprehensive workflow audit results

ecabcab

fix: Correct file paths in Phase 3 workflows

e249015

Remove ../ prefix from file paths in capability-check, decompose, and dedup workflows. Consumer template uses ../ because workflows-repo is a subdirectory, but Workflows repo itself doesn't need the prefix.

docs: Add Phase 3 workflows to documentation inventory

0cf42e8

- Add agents-capability-check.yml to workflow docs - Add agents-decompose.yml to workflow docs - Add agents-dedup.yml to workflow docs Fixes remaining test failures in PR #694

Merge branch 'main' into fix/workflows-pythonpath-phase3

a940df1

test: Add Phase 3 workflows to EXPECTED_NAMES mapping

d91670b

- agents-capability-check.yml - agents-decompose.yml - agents-dedup.yml Fixes test_canonical_workflow_names_match_expected_mapping failure

Copilot AI review requested due to automatic review settings January 9, 2026 15:54

Copilot started reviewing on behalf of stranske January 9, 2026 15:55 View session

Copilot AI reviewed Jan 9, 2026

View reviewed changes

stranske mentioned this pull request Jan 9, 2026

fix: prevent dual-agent conflict for codex by skipping post_agent_comment #705

Merged

5 tasks

stranske closed this Jan 9, 2026

stranske deleted the fix/prevent-dual-agent-conflict branch January 9, 2026 22:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent dual-agent conflict for codex in consumer template#701

fix: prevent dual-agent conflict for codex in consumer template#701
stranske wants to merge 9 commits intomainfrom
fix/prevent-dual-agent-conflict

stranske commented Jan 9, 2026

Uh oh!

chatgpt-codex-connector bot commented Jan 9, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Jan 9, 2026

Uh oh!

Copilot AI Jan 9, 2026

Uh oh!

Copilot AI Jan 9, 2026

Uh oh!

Copilot AI Jan 9, 2026

Uh oh!

stranske commented Jan 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

stranske commented Jan 9, 2026

Problem

Solution

Logic Change

Context

Impact

Uh oh!

chatgpt-codex-connector bot commented Jan 9, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Reviewed changes

Uh oh!

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

stranske commented Jan 9, 2026

Closing - Superseded by PR #705

Bot Comments Addressed:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants