fix: detect saved work on cancelled keepalive runs #255
Merged
When a keepalive run is cancelled (typically timeout), agent outputs are lost. This adds branch-level commit detection: when runResult is 'cancelled' and agent outputs are empty, the summary function checks the PR branch for recent commits (pre-timeout checkpoints, codex-keepalive commits) to determine if work was saved. If saved work is detected, the summary shows "Timed out (work saved)" instead of "Cancelled" and informs the user that the next iteration will continue from where the agent left off. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
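The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in keepalive_loop.js: the function name summarizeCancelledRun, the shape of its input, and the two regular expressions are assumptions made for the example. Only the reason string and the two labels are taken from this PR.

```javascript
// Hypothetical commit-message patterns; the real script may match differently.
const SAVED_WORK_PATTERNS = [/pre-timeout/i, /codex-keepalive/i];

// Decide how a run should be labelled in the summary comment.
// `recentCommits` is assumed to be an array of { message } objects
// already fetched from the PR branch.
function summarizeCancelledRun({ runResult, agentOutputs, recentCommits }) {
  const outputsEmpty = !agentOutputs || Object.keys(agentOutputs).length === 0;
  if (runResult !== 'cancelled' || !outputsEmpty) {
    // Not the case this sketch covers: leave the normal summary path alone.
    return { savedWork: false, reason: null, label: null };
  }
  const savedWork = recentCommits.some((commit) =>
    SAVED_WORK_PATTERNS.some((pattern) => pattern.test(commit.message)),
  );
  return savedWork
    ? {
        savedWork: true,
        reason: 'agent-run-cancelled-with-saved-work',
        label: 'Timed out (work saved)',
      }
    : { savedWork: false, reason: null, label: 'Cancelled' };
}
```

Keeping the check a pure function of already-fetched commits makes it easy to unit test without calling the GitHub API.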
Pull request overview
Updates the keepalive loop summary logic so cancelled (typically timed-out) runs can detect whether the agent/watchdog saved work via recent commits on the PR branch, and surfaces that in the PR summary comment UX.
Changes:
- Add branch commit inspection on cancelled runs to infer whether work was saved despite missing agent outputs.
- Introduce a new summary reason (agent-run-cancelled-with-saved-work) and adjust the cancelled-run status messaging accordingly.
- Add guidance text in the summary comment when saved work is detected.
Comments suppressed due to low confidence (1)
.github/scripts/keepalive_loop.js:3390
- The savedWork signal is set when any recent commit matches (pre-timeout checkpoint, codex-keepalive, or apply updates), but the note text claims specifically that the pre-timeout watchdog committed work. This can be misleading when the matching commit was created by the agent itself. Consider rewording the note to be attribution-agnostic, or tightening detection to only watchdog-specific commit messages if you want to keep this wording.
'**Note:** The pre-timeout watchdog committed work before the job was cancelled.',
'The next keepalive iteration will continue from where the agent left off.',
);
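One way to act on this review comment is to pick the note text based on which kind of commit matched. The sketch below is only an illustration: buildSavedWorkNote, the two pattern constants, and the wording of the attribution-agnostic line are hypothetical. The watchdog line and the closing line are copied from the snippet above.

```javascript
// Assumed patterns: which messages count as watchdog vs. agent commits
// is a guess for illustration, not the script's actual matching rules.
const WATCHDOG_PATTERN = /pre-timeout/i;
const AGENT_PATTERNS = [/codex-keepalive/i, /apply updates/i];

// Return the note lines for the summary comment, or [] if nothing matched.
function buildSavedWorkNote(commitMessages) {
  const fromWatchdog = commitMessages.some((msg) => WATCHDOG_PATTERN.test(msg));
  const fromAgent = commitMessages.some((msg) =>
    AGENT_PATTERNS.some((pattern) => pattern.test(msg)),
  );
  if (!fromWatchdog && !fromAgent) return [];
  const firstLine = fromWatchdog
    ? '**Note:** The pre-timeout watchdog committed work before the job was cancelled.'
    : // Hypothetical attribution-agnostic wording.
      '**Note:** Commits on the PR branch show that work was saved before the job was cancelled.';
  return [
    firstLine,
    'The next keepalive iteration will continue from where the agent left off.',
  ];
}
```

With this split, the watchdog-specific sentence only appears when a watchdog-style commit is actually present.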
🤖 Keepalive Loop Status
PR #255 | Agent: Codex | Iteration 0/5
Current State
🔍 Failure Classification
| Error type | infrastructure |
…etion-concerns-I1gRT
Three issues identified by independent verification of issue #239 work:
1. pipeline/run.py now calls normalize_counterparty_with_source() instead of resolve_counterparty() directly, fulfilling the public API source attribution task. The test monkeypatch is updated to match.
2. Added two missing tests for resolve_clearing_house() handling missing and empty registry files without raising exceptions.
3. Fixed test_mapping_diff_report_unreadable_registry_exits_nonzero which failed when running as root (chmod(0) is a no-op for root; the empty entries list triggers a validation error instead of a permission error).
All 21 tests in the target files pass. Full suite: 787 passed, 5 skipped (6 pre-existing sitecustomize failures unrelated to these changes).
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
1. Remove unused resolve_counterparty import from pipeline/run.py (F401
lint failure in Gate CI).
2. Fix fetchPullRequestCached call: use options object ({ github,
context, prNumber, core }) instead of positional args. The previous
call would always return null, preventing saved-work detection.
3. Use previousState.updated_at instead of current_iteration_at for the
since parameter in branch commit detection. loadKeepaliveState calls
applyIterationTracking which resets current_iteration_at to "now",
making it useless as a time range bound for listCommits.
https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
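Fix 3 above comes down to choosing the right lower bound for the commit query. A rough sketch follows, assuming the previous state is a plain object with an ISO 8601 updated_at string and that the resulting parameters are passed to the GitHub "list commits" endpoint (for example github.rest.repos.listCommits), which accepts a since timestamp. The helper name buildListCommitsParams and the per_page value are made up for the example.

```javascript
// Build query parameters for listing commits on the PR branch.
// Returns null when there is no trustworthy lower bound, so the
// caller can skip saved-work detection instead of guessing.
function buildListCommitsParams({ owner, repo, branch, previousState }) {
  // Use the timestamp persisted by the previous iteration, not
  // current_iteration_at, which has already been reset to "now".
  const since = previousState && previousState.updated_at;
  if (!since || Number.isNaN(Date.parse(since))) {
    return null;
  }
  return {
    owner,
    repo,
    sha: branch, // branch name whose history is listed
    since: new Date(since).toISOString(),
    per_page: 20, // arbitrary illustrative page size
  };
}
```

If since were taken from a field that always equals the current time, every commit pushed during the cancelled run would fall before the bound and the query would come back empty.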
51 tasks
Automated Status Summary
Scope
PR #228 (issue #227) was merged with all 42 task checkboxes marked complete and the in-process verifier returning PASS. However, post-merge verification by both OpenAI (gpt-5.2, 83% confidence) and Anthropic (claude-sonnet-4-5, 95% confidence) returned FAIL, identifying concrete gaps in the implementation. This follow-up addresses the remaining unmet acceptance criteria.
Root Cause Analysis
The keepalive loop logs reveal a cascading failure across multiple systems:
1. Codex agent checked PR body checkboxes aggressively. The .agents/issue-227-ledger.yml file shows only task-01 (of 42) was marked done in the structured ledger. Yet the PR body went from 42 unchecked → 0 unchecked over 14 iterations. The agent edited the PR body checkboxes directly without updating the ledger, bypassing the structured tracking that was designed to prevent exactly this.
2. autoReconcileTasks amplified the problem. Each keepalive iteration ran an LLM-based auto-reconciliation step that auto-checked ~1 additional task per run based on loose commit-to-task matching. Over 14 iterations this added up. The reconciler used "high-confidence" matching that was insufficiently discriminating: for example, a commit touching mapping_diff.py matched multiple task descriptions simultaneously.
3. cascadeParentCheckboxes may have inflated counts. The keepalive loop's cascade logic automatically checks all indented child checkboxes when a parent is checked. If the agent or reconciler checked a section-level parent, all sub-tasks under it would cascade to checked.
4. Codex-as-verifier was self-grading. Iteration 14 ran a verify-acceptance prompt using the same Codex agent that did the implementation work. Despite the prompt explicitly saying "Do NOT trust checkbox states as evidence of completion," the verifier returned success. This is a known LLM bias: the agent that produced the work is predisposed to confirm its own output.
5. The keepalive loop trusted checkbox state as the primary completion signal. When all 42 boxes were checked AND the verifier returned success AND the CI gate was green, the loop issued stop (tasks-complete). There was no independent mechanical verification of the acceptance criteria.
6. Scope violation was not mechanically enforced. The acceptance criterion "PR diff contains only files matching these patterns" was a text checkbox, not an automated check. The agent could (and did) check it off despite the diff containing 15+ out-of-scope files.
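The last point, a scope criterion that exists only as a checkbox, is the easiest to turn into a mechanical check. The sketch below shows one possible shape, under the assumption that the list of changed file paths is already available and that * in a pattern matches within a single path segment only. The helper names globToRegExp and findOutOfScopeFiles are hypothetical. The patterns in the usage example are a subset of those listed under Scope Constraint.

```javascript
// Convert a simple glob (only `*` is special) into an anchored RegExp.
// Assumption: `*` does not cross directory separators.
function globToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '[^/]*')}$`);
}

// Return the changed files that match none of the allowed patterns.
function findOutOfScopeFiles(changedFiles, allowedPatterns) {
  const matchers = allowedPatterns.map(globToRegExp);
  return changedFiles.filter((file) => !matchers.some((re) => re.test(file)));
}

// Usage example with a few of the patterns from the scope constraint.
const allowed = [
  'src/counter_risk/normalize.py',
  'src/counter_risk/cli/*',
  'tests/test_*registry*.py',
  'tests/fixtures/*.yml',
  'docs/*.md',
];
const changed = [
  'src/counter_risk/normalize.py',
  'tests/test_normalization_registry_first.py',
  'pipeline/run.py',
  '.github/scripts/keepalive_loop.js',
];
const outOfScope = findOutOfScopeFiles(changed, allowed);
// outOfScope → ['pipeline/run.py', '.github/scripts/keepalive_loop.js']
```

A gate step could fail the run whenever the returned list is non-empty, so the criterion no longer depends on anyone ticking a box.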
Context for Agent
Related Issues/PRs
Tasks
Registry-First Resolution for Clearing House
- resolve_clearing_house() function in src/counter_risk/normalize.py that returns NameResolution with source field using registry-first lookup before _CLEARING_HOUSE_FALLBACK_MAPPINGS
- normalize_clearing_house() in src/counter_risk/normalize.py to delegate internally to resolve_clearing_house()
- tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='registry' when name exists in registry
- tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='fallback' when name is not in registry
- tests/test_normalization_registry_first.py verifying resolve_clearing_house() consults registry before checking fallback mappings
- tests/test_normalization_registry_first.py verifying resolve_clearing_house() handles missing or empty registry files without raising exceptions
Public API Source Attribution
- normalize_counterparty_with_source() function in src/counter_risk/normalize.py that wraps resolve_counterparty() and returns NameResolution
- normalize_counterparty_with_source() documenting the source field in returned NameResolution object
- pipeline/run.py reconciliation logic to call normalize_counterparty_with_source() instead of resolve_counterparty() directly
- tests/test_normalization_registry_first.py verifying normalize_counterparty_with_source() returns object with accessible .source attribute
Missing Config File
- config/name_registry.yml with minimal valid registry containing at least the entries used by existing test fixtures
- tests/test_mapping_diff_report_cli.py that imports mapping_diff_report CLI and runs it against config/name_registry.yml without raising exceptions
Testing Gaps - CLI Parameter Passing
- tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct registry_path parameter
- tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct output_format parameter
Testing Gaps - End-to-End Report Sections
- tests/fixtures/unmapped_names.csv with known unmapped counterparty names
- tests/fixtures/fallback_mapped_names.csv with known fallback-mapped counterparty names
- tests/test_mapping_diff_report_cli.py that runs mapping_diff_report CLI with fixtures and captures stdout
- UNMAPPED section header appears in captured output
- FALLBACK_MAPPED section header appears in captured output
- SUGGESTIONS section header appears in captured output
Testing Gaps - Pipeline Integration
- tests/fixtures/name_registry_before.yml representing initial registry state for pipeline testing
- tests/fixtures/name_registry_after.yml representing updated registry state with additional mappings
- tests/test_normalization_registry_first.py that loads pipeline with name_registry_before.yml
- NameResolution.source values
- NameResolution.source values when run with name_registry_after.yml
Acceptance criteria
Registry-First Resolution
- normalize_clearing_house() consults the name registry before _CLEARING_HOUSE_FALLBACK_MAPPINGS and the resolution path records source as registry or fallback
- resolve_clearing_house() exists in src/counter_risk/normalize.py and returns a NameResolution object with source field set to registry or fallback
- normalize_counterparty_with_source().source succeeds without AttributeError
Config File
- config/name_registry.yml exists in repository root
- config/name_registry.yml parses without YAML syntax errors
- load_name_registry('config/name_registry.yml') executes without raising exceptions
Testing - CLI
- mapping_diff_report --registry config/name_registry.yml completes with exit code 0
- tests/test_mapping_diff_report_cli.py mocks generate_mapping_diff_report and asserts it receives expected parameters
- tests/test_mapping_diff_report_cli.py pass
Testing - Report Sections
- mapping_diff_report CLI with fixtures and captures output to string
- UNMAPPED substring
- FALLBACK_MAPPED substring
- SUGGESTIONS substring
Testing - Pipeline Integration
- tests/test_normalization_registry_first.py runs pipeline reconciliation with two different registry files
- NameResolution.source value differs between the two pipeline runs
- tests/test_normalization_registry_first.py pass
Scope Constraint
- src/counter_risk/normalize.py, src/counter_risk/cli/*, src/counter_risk/reports/*, config/name_registry.yml, tests/test_*registry*.py, tests/test_*mapping_diff*.py, tests/fixtures/*.yml, tests/fixtures/*.csv, pyproject.toml, README.md, docs/*.md
Overall
Head SHA: 5982550
Latest Runs: ✅ success — Gate
Required: gate: ✅ success