Merged
Agent worker (codex) activated for branch @codex start Implement only this task in your first commit. |
43 tasks
🤖 Keepalive Loop Status
PR #228 | Agent: Codex | Iteration 5+9 🚀 extended
Current State
🔍 Failure Classification
Error type: infrastructure
Keep both main's canonicalize_name/safe_display_name and the PR branch's registry-based resolve_counterparty/NameResolution architecture. The import in pipeline/run.py now includes all three symbols. https://claude.ai/code/session_012WnYCcttvFEY3FETnhVcNL
Provider Comparison Report
Provider Summary
📋 Full Provider Details
openai
anthropic
Agreement
Disagreement
Unique Insights
🔍 LangSmith Traces
44 tasks
stranske pushed a commit that referenced this pull request Feb 24, 2026
Addresses 6 root causes identified in the PR #228 post-mortem, where the coding agent claimed 42/42 tasks complete when multiple acceptance criteria were unmet:

Fix 1 - Require verification PASS before stopping: The stop decision now requires the verifier to return PASS. If verification fails, the agent is re-run to fix gaps (up to 2 attempts). Previously, verification was attempted once and ignored on failure.

Fix 2 - Raise confidence thresholds in analyzeTaskCompletion: Keyword match threshold raised from 0.35 to 0.50 for HIGH confidence. Now requires 2+ matching words (not just percentage) to avoid single-word false positives. fileMatch tightened to require 2+ keywords or explicit file references. commitMatch requires 2+ substantive words.

Fix 3 - Gate cascade logic for acceptance criteria: cascadeParentCheckboxes now detects acceptance criteria section headings and disables cascading within them. Each acceptance criterion must be independently checked; a checked parent no longer auto-checks children in acceptance sections.

Fix 5 - Different verifier context: Verification steps now switch to the alternate agent (codex→claude or claude→codex) to avoid the structural problem where the same model that produced the work also verifies it. Configurable via verifier_agent.

Fix 6 - Mechanical scope enforcement: New extractScopePatterns/validateScopeCompliance functions parse file patterns from the scope section and validate the PR diff against them. Scope violations block the tasks-complete stop decision. The verifier prompt now includes a mandatory Scope Check section.

Fix 7 - Separate task/acceptance criteria tracking: Tasks and acceptance criteria are now counted independently. The stop decision requires BOTH allTasksDone AND allCriteriaMet. Auto-reconciliation only operates on task checkboxes, never acceptance criteria.

Also fixes pre-existing duplicate fixAttemptMax declaration.

https://claude.ai/code/session_01VtzHmRoYTL2kcxaacDgSqQ
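To illustrate the Fix 2 gating described in the commit message, here is a minimal Python sketch of a keyword-overlap check that requires both a 0.50 match ratio and at least two matching words. The actual analyzeTaskCompletion code is not shown in this conversation, so the function names, the tokenization rule, and the way the ratio is computed are all assumptions made for illustration.

```python
import re

HIGH_CONFIDENCE_RATIO = 0.50  # raised from 0.35, per the commit message
MIN_MATCHING_WORDS = 2        # "2+ matching words", per the commit message


def keywords(text: str) -> set:
    """Lowercased tokens of 3+ characters (an assumed tokenization)."""
    return set(re.findall(r"[a-z0-9_]{3,}", text.lower()))


def is_high_confidence(task: str, evidence: str) -> bool:
    """Hypothetical stand-in for the HIGH-confidence keyword check."""
    task_words = keywords(task)
    if not task_words:
        return False
    matched = task_words & keywords(evidence)
    ratio = len(matched) / len(task_words)
    # Both conditions must hold: a single shared word can reach the
    # ratio on a short task, so the absolute count is checked as well.
    return ratio >= HIGH_CONFIDENCE_RATIO and len(matched) >= MIN_MATCHING_WORDS
```

With these assumptions, a two-word task such as "Add tests" matched only on "tests" reaches a ratio of 0.50 but is still rejected, which is the single-word false positive the commit message refers to.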
This was referenced Feb 25, 2026
Automated Status Summary
Scope
PR #208 addressed issue #48 but verification identified concerns (verdict: FAIL). This follow-up addresses the remaining gaps: add a stable CLI entrypoint, implement a deterministic mapping diff report generator, ensure registry-first name resolution is wired into normalization/reconciliation with mapping source attribution, and add unit/integration tests plus documentation.
Context for Agent
Related Issues/PRs
Tasks
CLI Entrypoint
- `mapping_diff_report` console script entrypoint to `pyproject.toml` under `[project.scripts]` pointing to `<package>.cli.mapping_diff_report:main`
- `src/<package>/cli/mapping_diff_report.py` with argument parser that supports `--help` flag
- `config/name_registry.yml` path when registry is missing or unreadable
Report Generator
- `src/<package>/reports/mapping_diff.py` with a callable report generator function signature that accepts registry path and input sources
- `UNMAPPED` section generation that lists raw input names not present in registry
- `FALLBACK_MAPPED` section generation that lists input names resolved via fallback with their canonical names
- `SUGGESTIONS` section generation that provides canonical name suggestions for every unmapped entry using title-case transformation
Registry-First Resolution
- `source` field indicating `registry` or `fallback` origin
- `source` field in the return value
Unit Tests
- `tests/test_mapping_diff_report_cli.py` that verifies `mapping_diff_report --help` exits with status zero and prints usage text containing `mapping_diff_report` string
- `config/name_registry.yml` causes non-zero exit and single-line stderr message containing the registry path
- `config/name_registry.yml` causes non-zero exit and appropriate stderr message
Integration Tests
- `tests/test_normalization_registry_first.py` using `name_registry_before.yml` that verifies at least one fixture input resolves via fallback and appears in `FALLBACK_MAPPED` section
- `name_registry_after.yml` with same inputs that verifies previously fallback-mapped name now resolves via registry
- `name_registry_after.yml` and asserts no warning messages contain the previously fallback-mapped raw name
- `mapping_diff_report` output changes between before and after registry states for the same input set
Fixtures & Documentation
- `tests/fixtures/name_registry_before.yml` with at least one missing alias that will trigger fallback resolution
- `tests/fixtures/name_registry_after.yml` with the previously missing alias added
- `config/name_registry.yml`)
- `config/name_registry.yml`, (2) run `mapping_diff_report`, (3) interpret `UNMAPPED`, `FALLBACK_MAPPED`, `SUGGESTIONS` sections
Acceptance criteria
CLI Entrypoint
- `pyproject.toml` defines a `[project.scripts]` console entrypoint named `mapping_diff_report`
- `mapping_diff_report --help` exits with status code `0` and prints usage text that includes the string `mapping_diff_report`
- `config/name_registry.yml` is missing or unreadable, `mapping_diff_report` exits non-zero and writes a single-line error message to stderr that includes `config/name_registry.yml`
Report Generator
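The CLI Entrypoint criteria above can be sketched as a small `main` function of the kind a `[project.scripts]` entry would point at. This is a minimal illustration, not the code merged in this PR: the `--registry` flag, the wording of the error message, and the exit code `2` are assumptions; only the `mapping_diff_report` name and the `config/name_registry.yml` default come from the criteria.

```python
import argparse
import sys
from pathlib import Path

DEFAULT_REGISTRY = "config/name_registry.yml"  # path named in the criteria


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        prog="mapping_diff_report",  # ensures the usage text contains this string
        description="Report unmapped and fallback-mapped names.",
    )
    # Hypothetical flag: the criteria only name the default registry path.
    parser.add_argument("--registry", default=DEFAULT_REGISTRY,
                        help="path to the name registry YAML file")
    args = parser.parse_args(argv)  # --help prints usage and exits with status 0

    try:
        Path(args.registry).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        # A single stderr line that includes the registry path, then non-zero exit.
        print(f"error: cannot read registry file {args.registry}", file=sys.stderr)
        return 2
    # Report generation would follow here; omitted in this sketch.
    return 0
```

A module using this shape would typically end with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard so that the return value becomes the process exit status.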
- `src/<package>/reports/mapping_diff.py` exists and can be imported without performing IO at import time
- `mapping_diff_report` output is deterministic and contains three labeled sections: `UNMAPPED`, `FALLBACK_MAPPED`, and `SUGGESTIONS`
- `UNMAPPED` section lists each input name not present in the registry fixture one per line and prints the raw input name exactly as encountered
- `FALLBACK_MAPPED` section lists each input name resolved by fallback logic (not registry alias) and includes both the raw input name and resolved canonical name on each line
- `SUGGESTIONS` section includes a non-empty suggested canonical name for every entry in `UNMAPPED`, and each suggestion line follows the format `<raw_input_name> -> <suggested_canonical_name>` where `suggested_canonical_name` is generated using title-case transformation
Registry-First Resolution
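As a rough illustration of the Report Generator criteria above, the sketch below builds the three labeled sections from in-memory inputs. Several details are assumptions, not taken from the PR: the function name `build_mapping_diff_report`, passing the registry as an alias-to-canonical mapping, supplying the fallback as a callable, using `->` in `FALLBACK_MAPPED` lines, and treating `FALLBACK_MAPPED` as a subset of the names absent from the registry.

```python
from typing import Callable, Iterable, Mapping, Optional


def build_mapping_diff_report(
    registry_aliases: Mapping[str, str],
    input_names: Iterable[str],
    fallback: Callable[[str], Optional[str]],
) -> str:
    """Return the report text; performs no IO, so importing is side-effect free."""
    # Deduplicate and sort so identical inputs always give identical output.
    names = sorted(set(input_names))
    unmapped = [n for n in names if n not in registry_aliases]
    resolved = [(n, fallback(n)) for n in unmapped]
    fallback_mapped = [(n, c) for n, c in resolved if c]

    lines = ["UNMAPPED"]
    lines += unmapped  # raw names exactly as encountered, one per line
    lines.append("FALLBACK_MAPPED")
    lines += [f"{raw} -> {canonical}" for raw, canonical in fallback_mapped]
    lines.append("SUGGESTIONS")
    # Title-case suggestion for every UNMAPPED entry.
    lines += [f"{raw} -> {raw.strip().title()}" for raw in unmapped]
    return "\n".join(lines) + "\n"
```

Sorting a deduplicated set is one simple way to get determinism; a whitespace-only input would yield an empty suggestion, so a real implementation would need to reject or special-case such names to satisfy the non-empty requirement.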
- `registry` or `fallback` per mapped name
- `tests/fixtures/name_registry_before.yml`, at least one fixture input resolves via fallback and `mapping_diff_report` lists it under `FALLBACK_MAPPED`
- `tests/fixtures/name_registry_after.yml` (same inputs), the previously fallback-mapped name resolves via registry and does not appear in `FALLBACK_MAPPED` or `UNMAPPED` in `mapping_diff_report` output
- `tests/fixtures/name_registry_after.yml`, the normalization/reconciliation run emits no warning log messages containing the previously fallback-mapped raw name
Documentation & Scope
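The Registry-First Resolution criteria above amount to "look in the registry first, fall back otherwise, and record which path was taken". The sketch below shows that shape with hypothetical names (`Resolution`, `resolve_name`); it is not the `resolve_counterparty`/`NameResolution` code from the PR branch, whose signatures do not appear in this conversation. The title-case fallback and the warning on fallback are also assumptions, the latter inferred from the criterion about warning log messages.

```python
import logging
from dataclasses import dataclass
from typing import Mapping

logger = logging.getLogger("name_registry_sketch")  # hypothetical logger name


@dataclass(frozen=True)
class Resolution:
    raw: str
    canonical: str
    source: str  # "registry" or "fallback"


def resolve_name(raw: str, registry_aliases: Mapping[str, str]) -> Resolution:
    """Registry-first lookup with source attribution (illustrative only)."""
    canonical = registry_aliases.get(raw)
    if canonical is not None:
        return Resolution(raw, canonical, "registry")
    # Assumed fallback: collapse whitespace and title-case the raw name.
    logger.warning("name %r not in registry; using fallback", raw)
    return Resolution(raw, " ".join(raw.split()).title(), "fallback")
```

Under these assumptions, adding the missing alias to the registry moves a name from `source="fallback"` to `source="registry"` and removes the warning, which mirrors the before/after fixture scenario.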
- `config/name_registry.yml`, (2) run `mapping_diff_report`, (3) interpret `UNMAPPED`, `FALLBACK_MAPPED`, and `SUGGESTIONS` sections
- `src/<package>/cli/*`, `src/<package>/reports/*`, `src/<package>/name_registry.py`, `tests/test_*registry*.py`, `tests/test_*mapping_diff*.py`, `tests/fixtures/name_registry*.yml`, `config/name_registry.yml`, `pyproject.toml` (scripts section only), `README.md` or `docs/*.md`
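The scope patterns listed above lend themselves to a mechanical check of the changed-file list, in the spirit of the scope enforcement described in the referenced commit. The sketch below is a Python approximation using `fnmatch`, not the extractScopePatterns/validateScopeCompliance implementation. The package name `mypkg` is a hypothetical stand-in for the `<package>` placeholder, and the function name is invented for illustration.

```python
from fnmatch import fnmatchcase

# "mypkg" is a hypothetical substitute for the <package> placeholder.
SCOPE_PATTERNS = [
    "src/mypkg/cli/*",
    "src/mypkg/reports/*",
    "src/mypkg/name_registry.py",
    "tests/test_*registry*.py",
    "tests/test_*mapping_diff*.py",
    "tests/fixtures/name_registry*.yml",
    "config/name_registry.yml",
    "pyproject.toml",
    "README.md",
    "docs/*.md",
]


def scope_violations(changed_files, patterns=SCOPE_PATTERNS):
    """Return the sorted changed paths that match no scope pattern."""
    return sorted(
        path for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    )
```

Two caveats apply: `fnmatch`'s `*` also matches `/`, so `src/mypkg/cli/*` accepts nested paths, and a path-level check cannot enforce the "scripts section only" restriction on `pyproject.toml`, which would need inspection of the diff itself.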