Skip to content

fix: add exponential backoff retry to setup-api-client npm install#253

Merged
stranske merged 2 commits intomainfrom
claude/fix-task-completion-concerns-I1gRT
Feb 25, 2026
Merged

fix: add exponential backoff retry to setup-api-client npm install#253
stranske merged 2 commits intomainfrom
claude/fix-task-completion-concerns-I1gRT

Conversation

@stranske
Copy link
Copy Markdown
Owner

@stranske stranske commented Feb 25, 2026

Source: Issue #249

Automated Status Summary

Scope

PR #228 (issue #227) was merged with all 42 task checkboxes marked complete and the in-process verifier returning PASS. However, post-merge verification by both OpenAI (gpt-5.2, 83% confidence) and Anthropic (claude-sonnet-4-5, 95% confidence) returned FAIL, identifying concrete gaps in the implementation. This follow-up addresses the remaining unmet acceptance criteria.

Root Cause Analysis

The keepalive loop logs reveal a cascading failure across multiple systems:

  1. Codex agent checked PR body checkboxes aggressively. The .agents/issue-227-ledger.yml file shows only task-01 (of 42) was marked done in the structured ledger. Yet the PR body went from 42 unchecked → 0 unchecked over 14 iterations. The agent edited the PR body checkboxes directly without updating the ledger, bypassing the structured tracking that was designed to prevent exactly this.

  2. autoReconcileTasks amplified the problem. Each keepalive iteration ran an LLM-based auto-reconciliation step that auto-checked ~1 additional task per run based on loose commit-to-task matching. Over 14 iterations this added up. The reconciler used "high-confidence" matching that was insufficiently discriminating — e.g., a commit touching mapping_diff.py matched multiple task descriptions simultaneously.

  3. cascadeParentCheckboxes may have inflated counts. The keepalive loop's cascade logic automatically checks all indented child checkboxes when a parent is checked. If the agent or reconciler checked a section-level parent, all sub-tasks under it would cascade to checked.

  4. Codex-as-verifier was self-grading. Iteration 14 ran a verify-acceptance prompt using the same Codex agent that did the implementation work. Despite the prompt explicitly saying "Do NOT trust checkbox states as evidence of completion," the verifier returned success. This is a known LLM bias: the agent that produced the work is predisposed to confirm its own output.

  5. The keepalive loop trusted checkbox state as the primary completion signal. When all 42 boxes were checked AND the verifier returned success AND the CI gate was green, the loop issued stop (tasks-complete). There was no independent mechanical verification of the acceptance criteria.

  6. Scope violation was not mechanically enforced. The acceptance criterion "PR diff contains only files matching these patterns" was a text checkbox, not an automated check. The agent could (and did) check it off despite the diff containing 15+ out-of-scope files.

Context for Agent

Related Issues/PRs

Context for Agent

Related Issues/PRs

Tasks

Registry-First Resolution for Clearing House

  • Create resolve_clearing_house() function in src/counter_risk/normalize.py that returns NameResolution with source field using registry-first lookup before _CLEARING_HOUSE_FALLBACK_MAPPINGS
  • Update normalize_clearing_house() in src/counter_risk/normalize.py to delegate internally to resolve_clearing_house()
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='registry' when name exists in registry
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='fallback' when name is not in registry
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() consults registry before checking fallback mappings
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() handles missing or empty registry files without raising exceptions

Public API Source Attribution

  • Create normalize_counterparty_with_source() function in src/counter_risk/normalize.py that wraps resolve_counterparty() and returns NameResolution
  • Update docstring for normalize_counterparty_with_source() documenting the source field in returned NameResolution object
  • Update pipeline/run.py reconciliation logic to call normalize_counterparty_with_source() instead of resolve_counterparty() directly
  • Add unit test in tests/test_normalization_registry_first.py verifying normalize_counterparty_with_source() returns object with accessible .source attribute

Missing Config File

  • Create config/name_registry.yml with minimal valid registry containing at least the entries used by existing test fixtures
  • Add smoke test in tests/test_mapping_diff_report_cli.py that imports mapping_diff_report CLI and runs it against config/name_registry.yml without raising exceptions

Testing Gaps - CLI Parameter Passing

  • Add unit test in tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct registry_path parameter
  • Add unit test in tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct output_format parameter

Testing Gaps - End-to-End Report Sections

  • Create test fixture file tests/fixtures/unmapped_names.csv with known unmapped counterparty names
  • Create test fixture file tests/fixtures/fallback_mapped_names.csv with known fallback-mapped counterparty names
  • Add integration test in tests/test_mapping_diff_report_cli.py that runs mapping_diff_report CLI with fixtures and captures stdout
  • Add assertion in integration test verifying UNMAPPED section header appears in captured output
  • Add assertion in integration test verifying FALLBACK_MAPPED section header appears in captured output
  • Add assertion in integration test verifying SUGGESTIONS section header appears in captured output

Testing Gaps - Pipeline Integration

  • Create fixture file tests/fixtures/name_registry_before.yml representing initial registry state for pipeline testing
  • Create fixture file tests/fixtures/name_registry_after.yml representing updated registry state with additional mappings
  • Add integration test in tests/test_normalization_registry_first.py that loads pipeline with name_registry_before.yml
  • Add assertion verifying pipeline reconciliation uses registry-first resolution by checking NameResolution.source values
  • Add assertion verifying pipeline produces different NameResolution.source values when run with name_registry_after.yml

Acceptance criteria

Registry-First Resolution

  • normalize_clearing_house() consults the name registry before _CLEARING_HOUSE_FALLBACK_MAPPINGS and the resolution path records source as registry or fallback
  • resolve_clearing_house() exists in src/counter_risk/normalize.py and returns a NameResolution object with source field set to registry or fallback
  • Calling normalize_counterparty_with_source().source succeeds without AttributeError

Config File

  • config/name_registry.yml exists in repository root
  • config/name_registry.yml parses without YAML syntax errors
  • load_name_registry('config/name_registry.yml') executes without raising exceptions

Testing - CLI

  • Running mapping_diff_report --registry config/name_registry.yml completes with exit code 0
  • Unit test in tests/test_mapping_diff_report_cli.py mocks generate_mapping_diff_report and asserts it receives expected parameters
  • All tests in tests/test_mapping_diff_report_cli.py pass

Testing - Report Sections

  • Integration test runs mapping_diff_report CLI with fixtures and captures output to string
  • Test assertions verify string contains UNMAPPED substring
  • Test assertions verify string contains FALLBACK_MAPPED substring
  • Test assertions verify string contains SUGGESTIONS substring

Testing - Pipeline Integration

  • Integration test in tests/test_normalization_registry_first.py runs pipeline reconciliation with two different registry files
  • Test assertions verify at least one NameResolution.source value differs between the two pipeline runs
  • All tests in tests/test_normalization_registry_first.py pass

Scope Constraint

  • PR diff contains only files matching these patterns: src/counter_risk/normalize.py, src/counter_risk/cli/*, src/counter_risk/reports/*, config/name_registry.yml, tests/test_*registry*.py, tests/test_*mapping_diff*.py, tests/fixtures/*.yml, tests/fixtures/*.csv, pyproject.toml, README.md, docs/*.md

Overall

  • All new tests pass in CI
  • All existing tests continue to pass in CI

Head SHA: 08a6c4e
Latest Runs: ❔ in progress — Gate
Required: gate: ❔ in progress

Workflow / Job Result Logs
.github/workflows/autofix.yml ❌ failure View run
Agents PR Event Hub ⏭️ skipped View run
Agents PR Meta ❔ in progress View run
Dependabot Auto-merge ⏭️ skipped View run
Gate ❔ in progress View run
Health 45 Agents Guard ✅ success View run

Port the retry-with-backoff logic from Workflows repo to Counter_Risk's
local copy of setup-api-client. The guard check on PR #249 failed with
a transient npm registry 403 on safe-buffer because the old code only
had a single --legacy-peer-deps fallback with no backoff.

Changes:
- 3 retry attempts with exponential backoff (5s, 10s)
- --legacy-peer-deps fallback on first failure
- Log stderr from all failed attempts for diagnosability
- Pin lru-cache@10.4.3 (was ^10.0.0) for consistency with Workflows

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Copilot AI review requested due to automatic review settings February 25, 2026 20:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Ports the Workflows repo’s retry-with-backoff npm install logic into Counter_Risk’s local .github/actions/setup-api-client composite action to reduce transient npm registry failures during dependency setup.

Changes:

  • Add up to 3 install attempts with exponential backoff and a first-failure --legacy-peer-deps fallback.
  • Capture and emit stderr from failed install attempts for easier diagnosis.
  • Pin lru-cache to 10.4.3 for consistent hoisting.

@stranske-keepalive
Copy link
Copy Markdown
Contributor

stranske-keepalive bot commented Feb 25, 2026

🤖 Keepalive Loop Status

PR #253 | Agent: Codex | Iteration 0/5

Current State

Metric Value
Iteration progress [----------] 0/5
Action wait (missing-agent-label)
Disposition skipped (transient)
Gate success
Tasks 42/44 complete
Timeout 45 min (default)
Timeout usage 8m elapsed (18%, 37m remaining)
Keepalive ❌ disabled
Autofix ❌ disabled

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

@stranske-keepalive
Copy link
Copy Markdown
Contributor

stranske-keepalive bot commented Feb 25, 2026

Keepalive Work Log (click to expand)
# Time (UTC) Agent Action Result Files Tasks Progress Commit Gate
0 2026-02-25 21:01:42 Codex wait (missing-agent-label-transient) skipped 0 42/44 success
0 2026-02-25 21:13:39 Codex wait (missing-agent-label-transient) skipped 0 42/44 success

GitHub Actions ::warning:: commands truncate/mangle multi-line content.
Emit a short annotation message and print full npm stderr in a
collapsible ::group:: instead, so logs stay readable.

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
@stranske stranske merged commit 9c1a784 into main Feb 25, 2026
305 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants