fix: detect saved work on cancelled keepalive runs by stranske · Pull Request #255 · stranske/Counter_Risk

stranske · 2026-02-25T21:58:28Z

Source: Issue #249

Automated Status Summary

Scope

PR #228 (issue #227) was merged with all 42 task checkboxes marked complete and the in-process verifier returning PASS. However, post-merge verification by both OpenAI (gpt-5.2, 83% confidence) and Anthropic (claude-sonnet-4-5, 95% confidence) returned FAIL, identifying concrete gaps in the implementation. This follow-up addresses the remaining unmet acceptance criteria.

Root Cause Analysis

The keepalive loop logs reveal a cascading failure across multiple systems:

Codex agent checked PR body checkboxes aggressively. The .agents/issue-227-ledger.yml file shows only task-01 (of 42) was marked done in the structured ledger. Yet the PR body went from 42 unchecked → 0 unchecked over 14 iterations. The agent edited the PR body checkboxes directly without updating the ledger, bypassing the structured tracking that was designed to prevent exactly this.
autoReconcileTasks amplified the problem. Each keepalive iteration ran an LLM-based auto-reconciliation step that auto-checked ~1 additional task per run based on loose commit-to-task matching. Over 14 iterations this added up. The reconciler used "high-confidence" matching that was insufficiently discriminating — e.g., a commit touching mapping_diff.py matched multiple task descriptions simultaneously.
cascadeParentCheckboxes may have inflated counts. The keepalive loop's cascade logic automatically checks all indented child checkboxes when a parent is checked. If the agent or reconciler checked a section-level parent, all sub-tasks under it would cascade to checked.
Codex-as-verifier was self-grading. Iteration 14 ran a verify-acceptance prompt using the same Codex agent that did the implementation work. Despite the prompt explicitly saying "Do NOT trust checkbox states as evidence of completion," the verifier returned success. This is a known LLM bias: the agent that produced the work is predisposed to confirm its own output.
The keepalive loop trusted checkbox state as the primary completion signal. When all 42 boxes were checked AND the verifier returned success AND the CI gate was green, the loop issued stop (tasks-complete). There was no independent mechanical verification of the acceptance criteria.
Scope violation was not mechanically enforced. The acceptance criterion "PR diff contains only files matching these patterns" was a text checkbox, not an automated check. The agent could (and did) check it off despite the diff containing 15+ out-of-scope files.

Context for Agent

Related Issues/PRs

Context for Agent

Related Issues/PRs

Tasks

Registry-First Resolution for Clearing House

Create resolve_clearing_house() function in src/counter_risk/normalize.py that returns NameResolution with source field using registry-first lookup before _CLEARING_HOUSE_FALLBACK_MAPPINGS
Update normalize_clearing_house() in src/counter_risk/normalize.py to delegate internally to resolve_clearing_house()
Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='registry' when name exists in registry
Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='fallback' when name is not in registry
Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() consults registry before checking fallback mappings
Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() handles missing or empty registry files without raising exceptions

Public API Source Attribution

Create normalize_counterparty_with_source() function in src/counter_risk/normalize.py that wraps resolve_counterparty() and returns NameResolution
Update docstring for normalize_counterparty_with_source() documenting the source field in returned NameResolution object
Update pipeline/run.py reconciliation logic to call normalize_counterparty_with_source() instead of resolve_counterparty() directly
Add unit test in tests/test_normalization_registry_first.py verifying normalize_counterparty_with_source() returns object with accessible .source attribute

Missing Config File

Create config/name_registry.yml with minimal valid registry containing at least the entries used by existing test fixtures
Add smoke test in tests/test_mapping_diff_report_cli.py that imports mapping_diff_report CLI and runs it against config/name_registry.yml without raising exceptions

Testing Gaps - CLI Parameter Passing

Add unit test in tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct registry_path parameter
Add unit test in tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct output_format parameter

Testing Gaps - End-to-End Report Sections

Create test fixture file tests/fixtures/unmapped_names.csv with known unmapped counterparty names
Create test fixture file tests/fixtures/fallback_mapped_names.csv with known fallback-mapped counterparty names
Add integration test in tests/test_mapping_diff_report_cli.py that runs mapping_diff_report CLI with fixtures and captures stdout
Add assertion in integration test verifying UNMAPPED section header appears in captured output
Add assertion in integration test verifying FALLBACK_MAPPED section header appears in captured output
Add assertion in integration test verifying SUGGESTIONS section header appears in captured output

Testing Gaps - Pipeline Integration

Create fixture file tests/fixtures/name_registry_before.yml representing initial registry state for pipeline testing
Create fixture file tests/fixtures/name_registry_after.yml representing updated registry state with additional mappings
Add integration test in tests/test_normalization_registry_first.py that loads pipeline with name_registry_before.yml
Add assertion verifying pipeline reconciliation uses registry-first resolution by checking NameResolution.source values
Add assertion verifying pipeline produces different NameResolution.source values when run with name_registry_after.yml

Acceptance criteria

Registry-First Resolution

normalize_clearing_house() consults the name registry before _CLEARING_HOUSE_FALLBACK_MAPPINGS and the resolution path records source as registry or fallback
resolve_clearing_house() exists in src/counter_risk/normalize.py and returns a NameResolution object with source field set to registry or fallback
Calling normalize_counterparty_with_source().source succeeds without AttributeError

Config File

config/name_registry.yml exists in repository root
config/name_registry.yml parses without YAML syntax errors
load_name_registry('config/name_registry.yml') executes without raising exceptions

Testing - CLI

Running mapping_diff_report --registry config/name_registry.yml completes with exit code 0
Unit test in tests/test_mapping_diff_report_cli.py mocks generate_mapping_diff_report and asserts it receives expected parameters
All tests in tests/test_mapping_diff_report_cli.py pass

Testing - Report Sections

Integration test runs mapping_diff_report CLI with fixtures and captures output to string
Test assertions verify string contains UNMAPPED substring
Test assertions verify string contains FALLBACK_MAPPED substring
Test assertions verify string contains SUGGESTIONS substring

Testing - Pipeline Integration

Integration test in tests/test_normalization_registry_first.py runs pipeline reconciliation with two different registry files
Test assertions verify at least one NameResolution.source value differs between the two pipeline runs
All tests in tests/test_normalization_registry_first.py pass

Scope Constraint

PR diff contains only files matching these patterns: src/counter_risk/normalize.py, src/counter_risk/cli/*, src/counter_risk/reports/*, config/name_registry.yml, tests/test_*registry*.py, tests/test_*mapping_diff*.py, tests/fixtures/*.yml, tests/fixtures/*.csv, pyproject.toml, README.md, docs/*.md

Overall

All new tests pass in CI
All existing tests continue to pass in CI

Head SHA: 5982550
Latest Runs: ✅ success — Gate
Required: gate: ✅ success

Workflow / Job	Result	Logs
.github/workflows/autofix.yml	❌ failure	View run
Agents PR Event Hub	⏭️ skipped	View run
Agents PR Meta	❔ in progress	View run
Dependabot Auto-merge	⏭️ skipped	View run
Gate	✅ success	View run
Health 45 Agents Guard	✅ success	View run

When a keepalive run is cancelled (typically timeout), agent outputs are lost. This adds branch-level commit detection: when runResult is 'cancelled' and agent outputs are empty, the summary function checks the PR branch for recent commits (pre-timeout checkpoints, codex-keepalive commits) to determine if work was saved. If saved work is detected, the summary shows "Timed out (work saved)" instead of "Cancelled" and informs the user that the next iteration will continue from where the agent left off. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

stranske-keepalive · 2026-02-25T21:58:44Z

⚠️ Action Required: Unable to determine source issue for PR #255. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

Copilot

Pull request overview

Updates the keepalive loop summary logic so cancelled (typically timed-out) runs can detect whether the agent/watchdog saved work via recent commits on the PR branch, and surfaces that in the PR summary comment UX.

Changes:

Add branch commit inspection on cancelled runs to infer whether work was saved despite missing agent outputs.
Introduce a new summary reason (agent-run-cancelled-with-saved-work) and adjust the cancelled-run status messaging accordingly.
Add guidance text in the summary comment when saved work is detected.

Comments suppressed due to low confidence (1)

.github/scripts/keepalive_loop.js:3390

The savedWork signal is set when any recent commit matches (pre-timeout checkpoint, codex-keepalive, or apply updates), but the note text claims specifically that the pre-timeout watchdog committed work. This can be misleading when the matching commit was created by the agent itself. Consider rewording the note to be attribution-agnostic, or tightening detection to only watchdog-specific commit messages if you want to keep this wording.

            '**Note:** The pre-timeout watchdog committed work before the job was cancelled.',
            'The next keepalive iteration will continue from where the agent left off.',
          );

.github/scripts/keepalive_loop.js

stranske-keepalive · 2026-02-25T22:05:40Z

🤖 Keepalive Loop Status

PR #255 | Agent: Codex | Iteration 0/5

Current State

Metric	Value
Iteration progress	[----------] 0/5
Action	wait (missing-agent-label)
Disposition	skipped (transient)
Gate	success
Tasks	42/44 complete
Timeout	45 min (default)
Timeout usage	7m elapsed (17%, 38m remaining)
Keepalive	❌ disabled
Autofix	❌ disabled

🔍 Failure Classification

stranske-keepalive · 2026-02-25T22:05:41Z

Keepalive Work Log (click to expand)

Time (UTC)	Agent	Action	Result	Files	Tasks	Progress	Commit	Gate
2026-02-25 22:05:40	Codex	wait (missing-agent-label-transient)	skipped	—	0	0/6	—	success
2026-02-25 23:24:02	Codex	wait (missing-agent-label-transient)	skipped	—	+4	42/44	—	failure
2026-02-26 00:29:15	Codex	wait (missing-agent-label-transient)	skipped	—	0	42/44	—	success

…etion-concerns-I1gRT

Three issues identified by independent verification of issue #239 work: 1. pipeline/run.py now calls normalize_counterparty_with_source() instead of resolve_counterparty() directly, fulfilling the public API source attribution task. The test monkeypatch is updated to match. 2. Added two missing tests for resolve_clearing_house() handling missing and empty registry files without raising exceptions. 3. Fixed test_mapping_diff_report_unreadable_registry_exits_nonzero which failed when running as root (chmod(0) is a no-op for root; the empty entries list triggers a validation error instead of a permission error). All 21 tests in the target files pass. Full suite: 787 passed, 5 skipped (6 pre-existing sitecustomize failures unrelated to these changes). https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

1. Remove unused resolve_counterparty import from pipeline/run.py (F401 lint failure in Gate CI). 2. Fix fetchPullRequestCached call: use options object ({ github, context, prNumber, core }) instead of positional args. The previous call would always return null, preventing saved-work detection. 3. Use previousState.updated_at instead of current_iteration_at for the since parameter in branch commit detection. loadKeepaliveState calls applyIterationTracking which resets current_iteration_at to "now", making it useless as a time range bound for listCommits. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6

Copilot AI review requested due to automatic review settings February 25, 2026 21:58

Copilot started reviewing on behalf of stranske February 25, 2026 21:58 View session

Copilot AI reviewed Feb 25, 2026

View reviewed changes

.github/scripts/keepalive_loop.js Outdated Show resolved Hide resolved

.github/scripts/keepalive_loop.js Outdated Show resolved Hide resolved

claude added 3 commits February 25, 2026 23:14

Merge remote-tracking branch 'origin/main' into claude/fix-task-compl…

c7ff977

…etion-concerns-I1gRT

stranske merged commit 23e5647 into main Feb 26, 2026
487 checks passed

stranske mentioned this pull request Feb 26, 2026

fix: remove unused typing.Any import (lint F401) #257

Merged

51 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: detect saved work on cancelled keepalive runs#255

fix: detect saved work on cancelled keepalive runs#255
stranske merged 4 commits intomainfrom
claude/fix-task-completion-concerns-I1gRT

stranske commented Feb 25, 2026 •

edited by agents-workflows-bot bot

Loading

Uh oh!

stranske-keepalive bot commented Feb 25, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

stranske-keepalive bot commented Feb 25, 2026 •

edited by agents-workflows-bot bot

Loading

Uh oh!

stranske-keepalive bot commented Feb 25, 2026 •

edited by agents-workflows-bot bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

stranske commented Feb 25, 2026 • edited by agents-workflows-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Automated Status Summary

Scope

Root Cause Analysis

Context for Agent

Related Issues/PRs

Context for Agent

Related Issues/PRs

Tasks

Registry-First Resolution for Clearing House

Public API Source Attribution

Missing Config File

Testing Gaps - CLI Parameter Passing

Testing Gaps - End-to-End Report Sections

Testing Gaps - Pipeline Integration

Acceptance criteria

Registry-First Resolution

Config File

Testing - CLI

Testing - Report Sections

Testing - Pipeline Integration

Scope Constraint

Overall

Uh oh!

stranske-keepalive bot commented Feb 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

stranske-keepalive bot commented Feb 25, 2026 • edited by agents-workflows-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 Keepalive Loop Status

Current State

🔍 Failure Classification

Uh oh!

stranske-keepalive bot commented Feb 25, 2026 • edited by agents-workflows-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stranske commented Feb 25, 2026 •

edited by agents-workflows-bot bot

Loading

stranske-keepalive bot commented Feb 25, 2026 •

edited by agents-workflows-bot bot

Loading

stranske-keepalive bot commented Feb 25, 2026 •

edited by agents-workflows-bot bot

Loading