Fix/workflow startup failure real fix #605

Merged
stranske merged 16 commits into main from fix/workflow-startup-failure-real-fix on Jan 6, 2026

@stranske (Owner) commented Jan 6, 2026

No description provided.

* fix: correct YAML syntax in agents-issue-intake.yml template

The 'if' condition in the check_labels job was improperly formatted,
causing the line to wrap incorrectly with 'runs-on' ending up on the
same line. This resulted in startup_failure errors when the workflow
was deployed to consumer repos.

Changes:
- Use multiline scalar (|) for complex if condition
- Properly indent continuation lines
- Ensure runs-on is on its own line

Fixes workflow failures in stranske/Travel-Plan-Permission and other
consumer repositories using this template.

* fix: add validation safeguards for template changes

Problem: Template changes sync to 4+ consumer repos. A syntax error
in agents-issue-intake.yml caused startup_failure in all consumer
repos because there was no validation preventing bad templates.

Changes:
1. Fix YAML syntax error in check_labels job (multiline if condition)
2. Add validate_workflow_yaml.py script to catch YAML/style issues
3. Add pre-commit hook to validate templates before commit
4. Add CRITICAL section to CLAUDE.md about template changes

Safeguards added:
- Pre-commit hook blocks template commits with validation errors
- Script checks: YAML syntax, line length (100), runs-on placement
- Clear warning in CLAUDE.md with validation commands
- Enforces repo standards before sync

Related: Travel-Plan-Permission#253, Workflows#602
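
For illustration, the two style checks named above (line length and runs-on placement) could look roughly like the following. This is a minimal sketch under stated assumptions, not the contents of validate_workflow_yaml.py: the function names, the sample snippets, and the exact `if` expression are hypothetical, and the YAML syntax check is left out because it would need a YAML parser.

```python
from __future__ import annotations

from pathlib import Path

MAX_LENGTH = 100  # line-length limit named in the commit message

# Hypothetical reproduction of the failure: 'runs-on' wrapped onto the 'if' line.
BROKEN = """\
jobs:
  check_labels:
    if: github.event.action == 'opened' || github.event.action == 'labeled' runs-on: ubuntu-latest
"""

# Same (assumed) condition as a multiline scalar, with runs-on on its own line.
FIXED = """\
jobs:
  check_labels:
    if: |
      github.event.action == 'opened' ||
      github.event.action == 'labeled'
    runs-on: ubuntu-latest
"""


def check_text(text: str, max_length: int = MAX_LENGTH) -> list[str]:
    """Return human-readable problems found in one workflow file's text."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > max_length:
            problems.append(f"line {number}: {len(line)} chars exceeds {max_length}")
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # comments may mention runs-on freely
        if "runs-on:" in stripped and not stripped.startswith("runs-on:"):
            problems.append(f"line {number}: runs-on must start its own line")
    return problems


def check_file(path: Path) -> list[str]:
    """Check a file on disk, reading it with an explicit encoding."""
    return check_text(path.read_text(encoding="utf-8"))
```

Calling `check_text(BROKEN)` reports the misplaced runs-on key, while `check_text(FIXED)` returns an empty list. A pre-commit hook could fail the commit whenever any template yields a non-empty list.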

The workflow now uses the CODESPACES_WORKFLOWS secret which has
merge permissions, falling back to GITHUB_TOKEN if not available.

Successfully merged sync PRs in Manager-Database, Template, and
trip-planner using this token.

- Parse multiline REGISTERED_CONSUMER_REPOS env var instead of hardcoded list
- Add stale PR cleanup: close and delete branches for older sync PRs
- Process repos in order from REGISTERED_CONSUMER_REPOS (7 repos total)
- Increase per_page to 20 to catch multiple stale PRs
- Add stale_closed status tracking in summary
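
The parsing and stale-PR selection described in this list can be sketched as follows. This is only an illustration: the helper names, the support for trailing `#` comments, and the use of the PR number to decide which sync PR is newest are assumptions, and the repo names in the usage note are placeholders apart from stranske/Travel-Plan-Permission.

```python
from __future__ import annotations


def parse_registered_repos(raw: str) -> list[str]:
    """Split a multiline env value into repo names, preserving order."""
    repos: list[str] = []
    for line in raw.splitlines():
        # Tolerate blank lines and trailing comments (assumed convention).
        name = line.split("#", 1)[0].strip()
        if name and name not in repos:
            repos.append(name)
    return repos


def split_stale(prs: list[dict]) -> tuple[dict | None, list[dict]]:
    """Keep the newest open sync PR; every older one is treated as stale."""
    ordered = sorted(prs, key=lambda pr: pr["number"], reverse=True)
    if not ordered:
        return None, []
    return ordered[0], ordered[1:]
```

For example, `parse_registered_repos("stranske/Travel-Plan-Permission\n\nexample/other-repo\n")` yields the two names in their original order, and `split_stale` on PRs numbered 3, 9, and 5 keeps 9 and marks 5 and 3 as stale candidates to close.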

- Extract consumer repo list from maint-68-sync-consumer-repos.yml at runtime
- Use yq to parse the authoritative REGISTERED_CONSUMER_REPOS env var
- Remove duplicated hardcoded list to maintain single source of truth

- Change default max_length from 150 to 100 to match repo standards (black, ruff, isort)
- Add explicit encoding='utf-8' to all file operations for cross-platform compatibility
- Remove redundant condition check (already verified by elif condition)

- Add critical section to CLAUDE.md about checking new workflows for file artifacts
- Create comprehensive WORKFLOW_ARTIFACT_CHECKLIST.md with decision trees and examples
- Document common artifact patterns that cause merge conflicts in consumer repos
- Provide recovery procedures for artifact pollution
- Emphasize template workflows sync to 7+ repos (one mistake = 7+ conflicts)

- Require addressing ALL bot comments before merging PRs
- Document that bot comments are mandatory fixes, not suggestions
- Provide process for evaluating and resolving bot feedback
- Emphasize impact: ignored comments → bugs in 7+ consumer repos
- Add examples of critical issues bots catch (encoding, defaults, logic)

- Add workflow to EXPECTED_NAMES test mapping
- Document in docs/ci/WORKFLOWS.md with description
- Add to docs/ci/WORKFLOW_SYSTEM.md workflow table
- Fixes test failures: test_canonical_workflow_names_match_expected_mapping, test_workflow_names_match_filename_convention, test_inventory_docs_list_all_workflows

- Quote $repos variable in yq pipeline to prevent word splitting (SC2086)
- Quote $GITHUB_OUTPUT and $GITHUB_STEP_SUMMARY variables
- Fixes shellcheck warnings in actionlint

The fallback to GITHUB_TOKEN causes merge failures since GITHUB_TOKEN
lacks merge permissions. Require CODESPACES_WORKFLOWS secret explicitly.

Keep CODESPACES_WORKFLOWS without fallback to fix merge permissions.
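
The "require the secret, no fallback" rule can be modeled in a few lines. This is a hedged sketch only: in the real workflow the secret is presumably wired in through the workflow YAML, whereas here it is modeled as an environment mapping, and `resolve_merge_token` and `MissingTokenError` are hypothetical names.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


class MissingTokenError(RuntimeError):
    """Raised when the merge-capable token is not configured."""


def resolve_merge_token(env: Mapping[str, str] | None = None) -> str:
    """Return the CODESPACES_WORKFLOWS token, never falling back to GITHUB_TOKEN."""
    env = os.environ if env is None else env
    token = env.get("CODESPACES_WORKFLOWS", "").strip()
    if not token:
        # Failing fast is the point: GITHUB_TOKEN would pass this step
        # and then fail later at merge time.
        raise MissingTokenError(
            "CODESPACES_WORKFLOWS is required; GITHUB_TOKEN lacks merge permissions"
        )
    return token
```

Failing at token resolution surfaces the misconfiguration before any sync PR is created, instead of at the merge step.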

Prevents merge conflicts and wasted CI resources by requiring
git fetch/merge before gh pr create.

- Consumer repos should automatically create bootstrap PRs when issues
  are labeled with agent:codex or similar labels
- Previously used 'invite' mode which only waits for humans to create PRs
- Changed template to 'create' mode to enable automatic PR creation
- This will propagate to all consumer repos via sync workflow

Also fixed line length issues to pass validation.

Copilot AI review requested due to automatic review settings January 6, 2026 15:53

@agents-workflows-bot (Contributor)

⚠️ Action Required: Unable to determine source issue for PR #605. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

@stranske temporarily deployed to agent-high-privilege on January 6, 2026 15:54 with GitHub Actions (Inactive)

@github-actions bot (Contributor) commented Jan 6, 2026

Automated Status Summary

Head SHA: fc7ed8f
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
|---|---|---|
| (no jobs reported) | ⏳ pending | |

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
|---|---|
| Current | 92.21% |
| Baseline | 85.00% |
| Delta | +7.21% |
| Minimum | 70.00% |
| Status | ✅ Pass |

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
|---|---|---|
| scripts/workflow_health_check.py | 62.6% | 28 |
| scripts/classify_test_failures.py | 62.9% | 37 |
| scripts/ledger_validate.py | 65.3% | 63 |
| scripts/mypy_return_autofix.py | 82.6% | 11 |
| scripts/ledger_migrate_base.py | 85.5% | 13 |
| scripts/fix_cosmetic_aggregate.py | 92.3% | 1 |
| scripts/coverage_history_append.py | 92.8% | 2 |
| scripts/workflow_validator.py | 93.3% | 4 |
| scripts/update_autofix_expectations.py | 93.9% | 1 |
| scripts/pr_metrics_tracker.py | 95.7% | 3 |
| scripts/generate_residual_trend.py | 96.6% | 1 |
| scripts/build_autofix_pr_comment.py | 97.0% | 2 |
| scripts/aggregate_agent_metrics.py | 97.2% | 0 |
| scripts/fix_numpy_asserts.py | 98.1% | 0 |
| scripts/sync_test_dependencies.py | 98.3% | 1 |

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

Scope

No scope information available

Tasks

  • No tasks defined

Acceptance criteria

  • No acceptance criteria defined

@github-actions bot (Contributor) commented Jan 6, 2026

🤖 Keepalive Loop Status

PR #605 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
|---|---|
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 0/0 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

@stranske merged commit 87b7f1c into main on Jan 6, 2026
37 checks passed
@stranske deleted the fix/workflow-startup-failure-real-fix branch on January 6, 2026 15:56

Copilot AI (Contributor) left a comment

Pull request overview

This PR addresses a workflow startup failure by changing the PR creation mode in the agents issue intake workflow template. The changes include code formatting improvements and a critical mode parameter update.

Key changes:

  • Changed the PR creation mode from "invite" to "create" for the reusable issue bridge workflow
  • Reformatted multi-line comments for better readability
  • Split long agent label extraction logic across multiple lines for clarity

  agent: ${{ needs.check_labels.outputs.agent }}
  issue_number: ${{ needs.check_labels.outputs.issue_number }}
- mode: "invite"
+ mode: "create"

Copilot AI Jan 6, 2026

Changing the mode from "invite" to "create" will only affect manual workflow_dispatch runs. For issue-triggered events (opened, reopened, labeled), the reusable workflow's "Select PR mode" step forces the mode back to "invite" regardless of the input value (see lines 257-269 in reusable-agents-issue-bridge.yml). If this fix is intended to apply to issue-triggered events, the override logic in the reusable workflow needs to be adjusted as well.

Suggested change
- mode: "create"
+ mode: "invite"
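
To make the override described in this review comment concrete, here is a small model of it. It is a sketch based only on the comment's description, not the actual "Select PR mode" step in reusable-agents-issue-bridge.yml. Both function names are hypothetical, and the second function is just one possible adjustment, not a fix the author adopted.

```python
def select_pr_mode(event_name: str, requested_mode: str) -> str:
    """Behavior as described: issue-triggered runs always end up in 'invite'."""
    if event_name == "issues":  # opened, reopened, labeled
        return "invite"
    return requested_mode


def select_pr_mode_honoring_input(
    event_name: str, requested_mode: str, default: str = "invite"
) -> str:
    """Possible adjustment: use the caller's mode and only default when it is empty."""
    # event_name is kept in the signature for parity with the function above.
    return requested_mode or default
```

Under this model, passing mode "create" from the template changes nothing for an `issues` event, but does take effect for `workflow_dispatch`. That is the gap the comment points out.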
