Fix/workflow startup failure real fix by stranske · Pull Request #606 · stranske/Workflows

stranske · 2026-01-06T17:06:42Z

No description provided.

* fix: correct YAML syntax in agents-issue-intake.yml template

The 'if' condition in the check_labels job was improperly formatted, causing the line to wrap incorrectly with 'runs-on' ending up on the same line. This resulted in startup_failure errors when the workflow was deployed to consumer repos.

Changes:
- Use multiline scalar (|) for complex if condition
- Properly indent continuation lines
- Ensure runs-on is on its own line

Fixes workflow failures in stranske/Travel-Plan-Permission and other consumer repositories using this template.

* fix: add validation safeguards for template changes

Problem: Template changes sync to 4+ consumer repos. A syntax error in agents-issue-intake.yml caused startup_failure in all consumer repos because there was no validation preventing bad templates.

Changes:
1. Fix YAML syntax error in check_labels job (multiline if condition)
2. Add validate_workflow_yaml.py script to catch YAML/style issues
3. Add pre-commit hook to validate templates before commit
4. Add CRITICAL section to CLAUDE.md about template changes

Safeguards added:
- Pre-commit hook blocks template commits with validation errors
- Script checks: YAML syntax, line length (100), runs-on placement
- Clear warning in CLAUDE.md with validation commands
- Enforces repo standards before sync

Related: Travel-Plan-Permission#253, Workflows#602
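As a rough illustration of two of the checks listed above (line length and runs-on placement), a minimal Python sketch might look like the following. It is not the contents of validate_workflow_yaml.py, which this page does not show: the function names `check_workflow_text` and `check_workflow_file` are hypothetical, and the YAML syntax check is left out because it would need a third-party parser.

```python
import re
from typing import List

MAX_LENGTH = 100  # line-length limit quoted in the commit message


def check_workflow_text(text: str, max_length: int = MAX_LENGTH) -> List[str]:
    """Return human-readable problems found in a workflow file's text."""
    problems: List[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_length:
            problems.append(f"line {lineno}: {len(line)} chars exceeds {max_length}")
        if line.lstrip().startswith("#"):
            continue  # ignore comments when looking for misplaced keys
        # 'runs-on:' should be preceded only by indentation; anything else
        # suggests it was fused onto the end of a previous key such as 'if:'.
        match = re.search(r"\bruns-on:", line)
        if match and line[: match.start()].strip():
            problems.append(f"line {lineno}: 'runs-on:' is not on its own line")
    return problems


def check_workflow_file(path: str) -> List[str]:
    """Read a workflow file as UTF-8 (hypothetical helper) and check it."""
    with open(path, encoding="utf-8") as handle:
        return check_workflow_text(handle.read())
```

A pre-commit hook could call `check_workflow_file` on each changed template and fail when the returned list is non-empty; how the real hook is wired up is not stated here.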

The workflow now uses the CODESPACES_WORKFLOWS secret which has merge permissions, falling back to GITHUB_TOKEN if not available. Successfully merged sync PRs in Manager-Database, Template, and trip-planner using this token.

- Parse multiline REGISTERED_CONSUMER_REPOS env var instead of hardcoded list
- Add stale PR cleanup: close and delete branches for older sync PRs
- Process repos in order from REGISTERED_CONSUMER_REPOS (7 repos total)
- Increase per_page to 20 to catch multiple stale PRs
- Add stale_closed status tracking in summary
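The multiline parsing mentioned in the first bullet can be sketched in Python as below. This is an assumption-laden illustration, not the workflow's own code: it assumes one repository per line, and the handling of blank lines, `#` comments, and duplicates is a guess. The helper name `parse_registered_repos` is hypothetical.

```python
import os
from typing import List


def parse_registered_repos(raw: str) -> List[str]:
    """Split a multiline value into repo names, preserving first-seen order."""
    repos: List[str] = []
    for line in raw.splitlines():
        name = line.strip()
        if not name or name.startswith("#"):
            continue  # assumed: blank lines and comments are ignored
        if name not in repos:
            repos.append(name)  # assumed: duplicates are dropped
    return repos


if __name__ == "__main__":
    # Reads the env var named in the commit message; empty if unset.
    for repo in parse_registered_repos(os.environ.get("REGISTERED_CONSUMER_REPOS", "")):
        print(repo)
```

Keeping the input order matters because the commit processes repos in the order they appear in REGISTERED_CONSUMER_REPOS.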

- Extract consumer repo list from maint-68-sync-consumer-repos.yml at runtime
- Use yq to parse the authoritative REGISTERED_CONSUMER_REPOS env var
- Remove duplicated hardcoded list to maintain single source of truth

- Change default max_length from 150 to 100 to match repo standards (black, ruff, isort)
- Add explicit encoding='utf-8' to all file operations for cross-platform compatibility
- Remove redundant condition check (already verified by elif condition)

- Add critical section to CLAUDE.md about checking new workflows for file artifacts
- Create comprehensive WORKFLOW_ARTIFACT_CHECKLIST.md with decision trees and examples
- Document common artifact patterns that cause merge conflicts in consumer repos
- Provide recovery procedures for artifact pollution
- Emphasize template workflows sync to 7+ repos (one mistake = 7+ conflicts)

- Require addressing ALL bot comments before merging PRs
- Document that bot comments are mandatory fixes, not suggestions
- Provide process for evaluating and resolving bot feedback
- Emphasize impact: ignored comments → bugs in 7+ consumer repos
- Add examples of critical issues bots catch (encoding, defaults, logic)

- Add workflow to EXPECTED_NAMES test mapping
- Document in docs/ci/WORKFLOWS.md with description
- Add to docs/ci/WORKFLOW_SYSTEM.md workflow table
- Fixes test failures: test_canonical_workflow_names_match_expected_mapping, test_workflow_names_match_filename_convention, test_inventory_docs_list_all_workflows

- Quote $repos variable in yq pipeline to prevent word splitting (SC2086)
- Quote $GITHUB_OUTPUT and $GITHUB_STEP_SUMMARY variables
- Fixes shellcheck warnings in actionlint

The fallback to GITHUB_TOKEN causes merge failures since GITHUB_TOKEN lacks merge permissions. Require CODESPACES_WORKFLOWS secret explicitly.

Keep CODESPACES_WORKFLOWS without fallback to fix merge permissions.

Prevents merge conflicts and wasted CI resources by requiring git fetch/merge before gh pr create.

- Consumer repos should automatically create bootstrap PRs when issues are labeled with agent:codex or similar labels
- Previously used 'invite' mode which only waits for humans to create PRs
- Changed template to 'create' mode to enable automatic PR creation
- This will propagate to all consumer repos via sync workflow

Also fixed line length issues to pass validation.

Root cause: The reusable issue bridge workflow was hardcoded to always use 'invite' mode for issue events, ignoring the mode input parameter. This prevented automatic PR creation when issues are labeled.

The logic at line 257-268 always overrides mode to 'invite' when eventName === 'issues', with the rationale that 'the human post lands on the issue'. However, this breaks the desired workflow of auto-creating bootstrap PRs when issues are labeled with agent:codex.

Solution:
- Add force_mode boolean input to reusable workflow
- When force_mode=true, respect the mode input regardless of event type
- Update consumer template to pass force_mode: true
- This allows mode: create to work for issue events while maintaining backward compatibility (default force_mode=false preserves old behavior)

This is the correct fix after 5 attempts - the previous attempts only changed the mode input but didn't account for the hardcoded override.
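The override described above can be modeled in a few lines of Python. This is only a sketch of the decision rule as the commit message states it; the real logic lives in the reusable workflow (the message quotes a JavaScript-style `eventName === 'issues'` check), and the function name `resolve_mode` is hypothetical.

```python
def resolve_mode(event_name: str, requested_mode: str, force_mode: bool = False) -> str:
    """Return the mode the bridge would use for a given triggering event.

    Without force_mode, issue events always fall back to 'invite' (the old
    hardcoded behavior). With force_mode, the requested mode is respected.
    """
    if event_name == "issues" and not force_mode:
        return "invite"
    return requested_mode


# Old behavior: the requested 'create' mode is ignored for issue events.
print(resolve_mode("issues", "create"))
# With the new input set, 'create' is honored.
print(resolve_mode("issues", "create", force_mode=True))
```

Defaulting `force_mode` to `False` mirrors the backward-compatibility note in the commit: callers that never set the input keep the previous behavior.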

Root cause: PR #89 removed the permissions block from the sync job, breaking the chatgpt_sync workflow that processes topic files to create issues.

The sync job calls agents-63-issue-intake.yml which needs:
- contents: read (to checkout repo and read topic files)
- issues: write (to create/update issues)
- id-token: write (for GitHub OIDC token)
- models: read (for LangChain formatting with GitHub Models)

Without these permissions, the workflow cannot process files or create issues from topic files. This fixes the actual issue - file processing in chatgpt_sync mode.

COMPLETE ROOT CAUSE ANALYSIS:

The format_created_issues job in agents-63-issue-intake.yml uses:

      GH_TOKEN: ${{ github.token }}
      GITHUB_TOKEN: ${{ github.token }}

When a reusable workflow is called:
- Without secrets: inherit → github.token has NO permissions
- With explicit secrets → github.token still has NO permissions
- With secrets: inherit → github.token gets caller's permissions

The consumer template was passing explicit secrets (SERVICE_BOT_PAT, OWNER_PR_PAT) but NOT using 'secrets: inherit'. This meant:
1. The sync job in the reusable workflow couldn't use github.token
2. gh CLI and GitHub API calls failed with permission errors
3. Files were processed but issues couldn't be created/updated

The permissions block on the sync job sets what github.token CAN have, but secrets: inherit is what actually PASSES that token to the reusable workflow with those permissions. This is the actual fix.

Testing flow:
1. User triggers workflow_dispatch with chatgpt_sync mode
2. route job determines mode → should_run_sync=true
3. sync job calls agents-63-issue-intake.yml with secrets: inherit
4. chatgpt_sync job has contents:read, issues:write permissions
5. format_created_issues job has those + id-token:write + models:read
6. Both jobs can use github.token with proper permissions
7. Files are processed, issues created, LangChain formatting applied

…mode

agents-workflows-bot · 2026-01-06T17:08:31Z

⚠️ Action Required: Unable to determine source issue for PR #606. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

Copilot

Pull request overview

This PR addresses workflow startup failures by modifying the agent issue intake workflow behavior and adding process documentation. The primary changes involve switching from "invite" mode to "create" mode with a force mode override, adjusting secret handling, and documenting best practices for branch synchronization.

Key Changes

Changed agent bridge workflow from "invite" to "create" mode with force_mode: true to override event-based mode selection
Added force_mode input parameter to the reusable agent bridge workflow to allow bypassing event-driven mode logic
Modified secret handling in sync job from explicit secrets to secrets: inherit and added permissions block

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
templates/consumer-repo/.github/workflows/agents-issue-intake.yml	Updated bridge job mode to "create" with force_mode, reformatted comments, added permissions to sync job, changed to inherited secrets
.github/workflows/reusable-agents-issue-bridge.yml	Added force_mode input parameter and conditional logic to override event-based mode selection
.github/workflows/maint-71-merge-sync-prs.yml	Removed fallback to GITHUB_TOKEN, now only uses CODESPACES_WORKFLOWS secret
CLAUDE.md	Added documentation section emphasizing the importance of syncing with main before creating PRs

Comments suppressed due to low confidence (1)

templates/consumer-repo/.github/workflows/agents-issue-intake.yml:168

The mode has been changed from "invite" to "create" with force_mode enabled. This represents a significant behavior change that will affect how agent assignment works. Ensure this is the intended behavior and that all calling workflows and downstream systems expect this mode change. The "create" mode may have different side effects compared to "invite" mode.

      mode: "create"
      post_agent_comment: ${{ inputs.post_codex_comment && 'true' || 'false' }}


github-actions · 2026-01-06T17:10:10Z

Automated Status Summary

Head SHA: 54cc3ec
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

Workflow / Job	Result	Logs
(no jobs reported)	⏳ pending	—

Coverage Overview

Coverage history entries: 1

Coverage Trend

Metric	Value
Current	92.21%
Baseline	85.00%
Delta	+7.21%
Minimum	70.00%
Status	✅ Pass

Top Coverage Hotspots (lowest coverage)

File	Coverage	Missing
`scripts/workflow_health_check.py`	62.6%	28
`scripts/classify_test_failures.py`	62.9%	37
`scripts/ledger_validate.py`	65.3%	63
`scripts/mypy_return_autofix.py`	82.6%	11
`scripts/ledger_migrate_base.py`	85.5%	13
`scripts/fix_cosmetic_aggregate.py`	92.3%	1
`scripts/coverage_history_append.py`	92.8%	2
`scripts/workflow_validator.py`	93.3%	4
`scripts/update_autofix_expectations.py`	93.9%	1
`scripts/pr_metrics_tracker.py`	95.7%	3
`scripts/generate_residual_trend.py`	96.6%	1
`scripts/build_autofix_pr_comment.py`	97.0%	2
`scripts/aggregate_agent_metrics.py`	97.2%	0
`scripts/fix_numpy_asserts.py`	98.1%	0
`scripts/sync_test_dependencies.py`	98.3%	1

Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope

No scope information available

Tasks

No tasks defined

Acceptance criteria

No acceptance criteria defined

github-actions · 2026-01-06T17:10:32Z

🤖 Keepalive Loop Status

PR #606 | Agent: Codex | Iteration 0/5

Current State

Metric	Value
Iteration progress	[----------] 0/5
Action	wait (missing-agent-label)
Disposition	skipped (transient)
Gate	success
Tasks	0/0 complete
Keepalive	❌ disabled
Autofix	❌ disabled

🔍 Failure Classification

Root cause: Consumer repos were using mode: 'invite' without force_mode, causing the reusable workflow to ignore the mode and prevent automatic bootstrap PR creation when issues are labeled with agent:codex.

Changes:
- Change mode from 'invite' to 'create' in bridge job template
- Add force_mode: true to override issue event defaults

This template change will sync to all consumer repos:
- Travel-Plan-Permission
- Trend_Model_Project
- Manager-Database
- trip-planner
- Template
- And others

When synced, all consumer repos will support automatic PR creation when issues are labeled with agent:* labels, fixing startup_failure issues.

Related: Trend_Model_Project#4185, PR #606

stranske added 18 commits January 6, 2026 13:24

fix: use CODESPACES_WORKFLOWS token for merge permissions

0aad109

The workflow now uses the CODESPACES_WORKFLOWS secret which has merge permissions, falling back to GITHUB_TOKEN if not available. Successfully merged sync PRs in Manager-Database, Template, and trip-planner using this token.

fix: dynamically read REGISTERED_CONSUMER_REPOS from source file

80329a4

- Extract consumer repo list from maint-68-sync-consumer-repos.yml at runtime
- Use yq to parse the authoritative REGISTERED_CONSUMER_REPOS env var
- Remove duplicated hardcoded list to maintain single source of truth

Merge main into fix/workflow-startup-failure-real-fix

fcaa421

fix: quote shell variables in maint-71-merge-sync-prs.yml

4f61f04

- Quote $repos variable in yq pipeline to prevent word splitting (SC2086)
- Quote $GITHUB_OUTPUT and $GITHUB_STEP_SUMMARY variables
- Fixes shellcheck warnings in actionlint

fix: Remove GITHUB_TOKEN fallback in merge workflow

0b3697e

The fallback to GITHUB_TOKEN causes merge failures since GITHUB_TOKEN lacks merge permissions. Require CODESPACES_WORKFLOWS secret explicitly.

Merge main and resolve conflict

41c1d17

Keep CODESPACES_WORKFLOWS without fallback to fix merge permissions.

docs: Add critical section on syncing with main before PRs

f986d3a

Prevents merge conflicts and wasted CI resources by requiring git fetch/merge before gh pr create.

Copilot AI review requested due to automatic review settings January 6, 2026 17:06

Copilot started reviewing on behalf of stranske January 6, 2026 17:07

Merge main and resolve conflicts - keep main's version without force_…

ef23020

…mode

stranske temporarily deployed to agent-high-privilege January 6, 2026 17:08 — with GitHub Actions Inactive

Copilot AI reviewed Jan 6, 2026


stranske merged commit bd48949 into main Jan 6, 2026
37 checks passed

stranske deleted the fix/workflow-startup-failure-real-fix branch January 6, 2026 17:10

This was referenced Jan 7, 2026

fix: enable automatic PR creation for agent:codex labels stranske/Trend_Model_Project#4266

Closed

[NL-7] Build LangChain NL-to-ConfigPatch Chain stranske/Trend_Model_Project#4185

Closed

stranske mentioned this pull request Jan 7, 2026

fix: enable automatic PR creation in consumer repo template #646

Merged

This was referenced Jan 7, 2026

fix: use expression syntax for force_mode boolean #647

Merged

fix: simplify force_mode - default to true, remove from template #648

Merged

fix: restore force_mode default to true for auto-create #673

Merged

stranske merged 19 commits into main from fix/workflow-startup-failure-real-fix
