fix: add coverage verification rules to Codex prompts #379

Merged
stranske merged 2 commits into main from fix/codex-coverage-verification
Dec 31, 2025

@stranske
Owner

@stranske stranske commented Dec 31, 2025

Automated Status Summary

Scope

Test coverage is at 71.60% with 17 scripts below 95% coverage. Low coverage makes scripts risky to modify and harder to maintain.

Tasks

  • Increase test coverage for scripts/sync_tool_versions.py from 0.00% to 95%
  • Increase test coverage for scripts/update_residual_history.py from 0.00% to 95%
  • Increase test coverage for scripts/validate_version_pins.py from 0.00% to 95%
  • Increase test coverage for scripts/sync_test_dependencies.py from 15.32% to 95%
  • Increase test coverage for scripts/auto_type_hygiene.py from 34.78% to 95%
  • Increase test coverage for scripts/keepalive_metrics_collector.py from 46.48% to 95%
  • Increase test coverage for scripts/keepalive_metrics_dashboard.py from 56.67% to 95%
  • Increase test coverage for scripts/workflow_health_check.py from 62.62% to 95%
  • Increase test coverage for scripts/classify_test_failures.py from 62.87% to 95%
  • Increase test coverage for scripts/mypy_autofix.py from 63.08% to 95%
  • Increase test coverage for scripts/ledger_validate.py from 65.32% to 95%
  • Increase test coverage for scripts/mypy_return_autofix.py from 82.55% to 95%
  • Increase test coverage for scripts/ledger_migrate_base.py from 85.48% to 95%
  • Increase test coverage for scripts/ci_failure_analyzer.py from 87.35% to 95%
  • Increase test coverage for scripts/fix_cosmetic_aggregate.py from 92.31% to 95%
  • Increase test coverage for scripts/coverage_history_append.py from 92.75% to 95%
  • Increase test coverage for scripts/workflow_validator.py from 93.27% to 95%

Acceptance criteria

  • Overall coverage ≥95% (verify with pytest tests/ --cov=scripts --cov-report=term-missing)
  • Each script in scripts/ has ≥95% coverage
  • All 592+ existing tests pass
  • New tests in tests/scripts/ directory
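
The acceptance criteria above can be checked mechanically. The following is a minimal sketch, assuming the same pytest run is given an extra `--cov-report=json` option so that a JSON report is written alongside the terminal output. The function names are hypothetical and are not part of this repository; the `totals` and `files[...]["summary"]["percent_covered"]` keys follow the coverage.py JSON report layout.

```python
import json
from pathlib import Path


def load_report(path: str = "coverage.json") -> dict:
    """Load a coverage JSON report (the path is an assumption)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def scripts_below_target(report: dict, target: float = 95.0) -> dict:
    """Map each file whose coverage is below `target` to its percentage."""
    below = {}
    for path, data in report.get("files", {}).items():
        pct = data["summary"]["percent_covered"]
        if pct < target:
            below[path] = round(pct, 2)
    # Lowest coverage first, so the riskiest scripts are listed at the top.
    return dict(sorted(below.items(), key=lambda item: item[1]))


def meets_acceptance(report: dict, target: float = 95.0) -> bool:
    """True only if overall coverage and every individual file reach `target`."""
    overall = report["totals"]["percent_covered"]
    return overall >= target and not scripts_below_target(report, target)
```

A caller would typically run `meets_acceptance(load_report())` after the pytest run and treat `False` as "keep adding tests".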

Head SHA: 1dfdfdb
Latest Runs: ❔ in progress — Gate
Required: gate: ❔ in progress

| Workflow / Job | Result | Logs |
|---|---|---|
| Agents PR meta manager | ❔ in progress | View run |
| CI Autofix Loop | ✅ success | View run |
| Copilot code review | ❔ in progress | View run |
| Gate | ❔ in progress | View run |
| Health 40 Sweep | ✅ success | View run |
| Health 44 Gate Branch Protection | ❔ in progress | View run |
| Health 45 Agents Guard | ✅ success | View run |
| Health 50 Security Scan | ❔ in progress | View run |
| Maint 52 Validate Workflows | ✅ success | View run |
| PR 11 - Minimal invariant CI | ✅ success | View run |
| Selftest CI | ❔ in progress | View run |
| Validate Sync Manifest | ✅ success | View run |

Codex was marking coverage tasks complete after adding tests without
verifying actual coverage reached the target percentage. This adds a
COVERAGE TASKS - SPECIAL RULES section to all keepalive prompt templates
that requires Codex to:

1. Run pytest with --cov after adding tests
2. Find the specific script in the coverage output
3. Verify coverage meets the target before marking complete
4. Continue adding tests if below target
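
The four steps above amount to a simple check on the coverage report. As a rough sketch (not the text added to the prompts), the helper below parses the terminal output of `pytest tests/ --cov=scripts --cov-report=term-missing`, finds the row for one script, and compares its percentage with the target. The function names are hypothetical, and the sample report uses made-up statement counts purely for illustration.

```python
import re
from typing import Optional

# Matches a coverage percentage token such as "63%" or "63.08%".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)%")


def script_coverage(report: str, script: str) -> Optional[float]:
    """Return the Cover percentage for `script` from a term-missing report."""
    for line in report.splitlines():
        parts = line.split()
        if parts and parts[0] == script:
            # The first token that looks like "NN%" is the Cover column;
            # this also works when branch columns are present.
            for token in parts[1:]:
                match = PERCENT.fullmatch(token)
                if match:
                    return float(match.group(1))
    return None  # script not present in the report


def task_is_complete(report: str, script: str, target: float = 95.0) -> bool:
    """A coverage task counts as complete only if the measured value meets the target."""
    pct = script_coverage(report, script)
    return pct is not None and pct >= target


# Illustrative report text; the counts and line ranges are invented.
SAMPLE_REPORT = """\
Name                            Stmts   Miss   Cover   Missing
--------------------------------------------------------------
scripts/mypy_autofix.py            65     24  63.08%   10-20, 33
scripts/workflow_validator.py     104      3  97.12%   5, 8-9
"""
```

With this sample, `task_is_complete(SAMPLE_REPORT, "scripts/mypy_autofix.py")` is false, so step 4 applies and more tests are needed before the task may be checked off.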

This addresses the issue where PR #378 marked all 17 coverage tasks as
complete despite only adding one test file.

Files updated:
- .github/templates/keepalive-instruction.md
- .github/codex/prompts/keepalive_next_task.md
- templates/consumer-repo/.github/templates/keepalive-instruction.md
- templates/consumer-repo/.github/codex/prompts/keepalive_next_task.md
- agents/codex-prompt.md
Copilot AI review requested due to automatic review settings December 31, 2025 12:06
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@github-actions
Contributor

Gate fast-pass: docs-only change detected; heavy checks skipped.

@agents-workflows-bot
Contributor

⚠️ Action Required: Unable to determine source issue for PR #379. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).
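
For illustration only, a lookup of this kind could be sketched as follows. The bot's actual matching rules are not shown in this thread, so the regular expressions, the search order, and the function name are all assumptions; the hidden-marker form is left out because its syntax is not visible here.

```python
import re
from typing import Optional

# Assumed patterns, based only on the examples in the bot message.
ISSUE_HASH = re.compile(r"#(\d+)")
ISSUE_BRANCH = re.compile(r"issue-(\d+)")


def find_source_issue(title: str, branch: str, body: str) -> Optional[int]:
    """Return the first issue number found in title, branch, or body, else None."""
    candidates = (
        (ISSUE_HASH, title),
        (ISSUE_BRANCH, branch),
        (ISSUE_HASH, body),
    )
    for pattern, text in candidates:
        match = pattern.search(text or "")
        if match:
            return int(match.group(1))
    return None
```

Under these assumptions, a title and branch like the ones on this PR yield no match, while a branch named `issue-123` resolves to 123.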

@github-actions
Contributor

github-actions bot commented Dec 31, 2025

Automated Status Summary

Head SHA: 06dc563
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
|---|---|---|
| (no jobs reported) | ⏳ pending | |

Updated automatically; will refresh on subsequent CI/Docker completions.

@github-actions
Contributor

github-actions bot commented Dec 31, 2025

🤖 Keepalive Loop Status

PR #379 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
|---|---|
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Gate | success |
| Tasks | 19/21 complete |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

Contributor

Copilot AI left a comment

Pull request overview

This PR addresses a critical issue where Codex was prematurely marking coverage tasks as complete without verifying that actual coverage met the target percentage. The fix adds explicit verification rules to all keepalive prompt templates to ensure Codex runs coverage commands and validates the results before marking tasks complete.

  • Adds a new "COVERAGE TASKS - SPECIAL RULES" section to all keepalive prompt templates
  • Specifies a 5-step verification process requiring Codex to run pytest coverage commands and validate output
  • Ensures consistency across both main repository and consumer-repo template files

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| .github/templates/keepalive-instruction.md | Adds coverage verification rules requiring pytest execution and output validation |
| .github/codex/prompts/keepalive_next_task.md | Adds identical coverage verification rules to the next task prompt template |
| templates/consumer-repo/.github/templates/keepalive-instruction.md | Adds coverage verification rules to the consumer repo template |
| templates/consumer-repo/.github/codex/prompts/keepalive_next_task.md | Adds coverage verification rules to the consumer repo next task prompt template |
| agents/codex-prompt.md | Adds coverage verification rules to the main Codex agent prompt |

@stranske stranske merged commit ea5207f into main Dec 31, 2025
31 checks passed
@stranske stranske deleted the fix/codex-coverage-verification branch December 31, 2025 12:09
stranske added a commit that referenced this pull request Jan 12, 2026
Variables GITHUB_MODELS_BASE_URL and GITHUB_DEFAULT_MODEL violated N806
naming convention. The sync workflow only validates E,W,F,I,B but consumer
repos run full lint including naming checks, causing sync PRs to fail CI.

Fixes sync PRs:
- Template #167
- Travel-Plan-Permission #379
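
For context, N806 is the pep8-naming rule "variable in function should be lowercase". The snippet below is a hedged illustration of that kind of violation and one common way to resolve it; the function names and the model string are placeholders, and the actual change in the referenced commit is not shown here.

```python
def build_config_flagged() -> dict:
    # An UPPER_CASE name assigned inside a function is what N806 reports.
    GITHUB_DEFAULT_MODEL = "example-model"  # placeholder value; flagged as N806
    return {"model": GITHUB_DEFAULT_MODEL}


def build_config_clean() -> dict:
    # Same behavior with a lowercase local name, which satisfies N806.
    default_model = "example-model"  # placeholder value
    return {"model": default_model}
```

Both functions return the same dictionary; only the local variable name differs, which is why a lint run limited to the E, W, F, I, and B rule sets would not catch the first version.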
@stranske stranske mentioned this pull request Jan 12, 2026
43 tasks