Skip to content

Codex bootstrap for #239#249

Merged
stranske merged 17 commits intomainfrom
codex/issue-239
Feb 25, 2026
Merged

Codex bootstrap for #239#249
stranske merged 17 commits intomainfrom
codex/issue-239

Conversation

@stranske
Copy link
Copy Markdown
Owner

@stranske stranske commented Feb 25, 2026

Automated Status Summary

Scope

PR #228 (issue #227) was merged with all 42 task checkboxes marked complete and the in-process verifier returning PASS. However, post-merge verification by both OpenAI (gpt-5.2, 83% confidence) and Anthropic (claude-sonnet-4-5, 95% confidence) returned FAIL, identifying concrete gaps in the implementation. This follow-up addresses the remaining unmet acceptance criteria.

Root Cause Analysis

The keepalive loop logs reveal a cascading failure across multiple systems:

  1. Codex agent checked PR body checkboxes aggressively. The .agents/issue-227-ledger.yml file shows only task-01 (of 42) was marked done in the structured ledger. Yet the PR body went from 42 unchecked → 0 unchecked over 14 iterations. The agent edited the PR body checkboxes directly without updating the ledger, bypassing the structured tracking that was designed to prevent exactly this.

  2. autoReconcileTasks amplified the problem. Each keepalive iteration ran an LLM-based auto-reconciliation step that auto-checked ~1 additional task per run based on loose commit-to-task matching. Over 14 iterations this added up. The reconciler used "high-confidence" matching that was insufficiently discriminating — e.g., a commit touching mapping_diff.py matched multiple task descriptions simultaneously.

  3. cascadeParentCheckboxes may have inflated counts. The keepalive loop's cascade logic automatically checks all indented child checkboxes when a parent is checked. If the agent or reconciler checked a section-level parent, all sub-tasks under it would cascade to checked.

  4. Codex-as-verifier was self-grading. Iteration 14 ran a verify-acceptance prompt using the same Codex agent that did the implementation work. Despite the prompt explicitly saying "Do NOT trust checkbox states as evidence of completion," the verifier returned success. This is a known LLM bias: the agent that produced the work is predisposed to confirm its own output.

  5. The keepalive loop trusted checkbox state as the primary completion signal. When all 42 boxes were checked AND the verifier returned success AND the CI gate was green, the loop issued stop (tasks-complete). There was no independent mechanical verification of the acceptance criteria.

  6. Scope violation was not mechanically enforced. The acceptance criterion "PR diff contains only files matching these patterns" was a text checkbox, not an automated check. The agent could (and did) check it off despite the diff containing 15+ out-of-scope files.

Context for Agent

Related Issues/PRs

Tasks

Registry-First Resolution for Clearing House

  • Create resolve_clearing_house() function in src/counter_risk/normalize.py that returns NameResolution with source field using registry-first lookup before _CLEARING_HOUSE_FALLBACK_MAPPINGS
  • Update normalize_clearing_house() in src/counter_risk/normalize.py to delegate internally to resolve_clearing_house()
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='registry' when name exists in registry
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='fallback' when name is not in registry
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() consults registry before checking fallback mappings
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() handles missing or empty registry files without raising exceptions

Public API Source Attribution

  • Create normalize_counterparty_with_source() function in src/counter_risk/normalize.py that wraps resolve_counterparty() and returns NameResolution
  • Update docstring for normalize_counterparty_with_source() documenting the source field in returned NameResolution object
  • Update pipeline/run.py reconciliation logic to call normalize_counterparty_with_source() instead of resolve_counterparty() directly
  • Add unit test in tests/test_normalization_registry_first.py verifying normalize_counterparty_with_source() returns object with accessible .source attribute

Missing Config File

  • Create config/name_registry.yml with minimal valid registry containing at least the entries used by existing test fixtures
  • Add smoke test in tests/test_mapping_diff_report_cli.py that imports mapping_diff_report CLI and runs it against config/name_registry.yml without raising exceptions

Testing Gaps - CLI Parameter Passing

  • Add unit test in tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct registry_path parameter
  • Add unit test in tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct output_format parameter

Testing Gaps - End-to-End Report Sections

  • Create test fixture file tests/fixtures/unmapped_names.csv with known unmapped counterparty names
  • Create test fixture file tests/fixtures/fallback_mapped_names.csv with known fallback-mapped counterparty names
  • Add integration test in tests/test_mapping_diff_report_cli.py that runs mapping_diff_report CLI with fixtures and captures stdout
  • Add assertion in integration test verifying UNMAPPED section header appears in captured output
  • Add assertion in integration test verifying FALLBACK_MAPPED section header appears in captured output
  • Add assertion in integration test verifying SUGGESTIONS section header appears in captured output

Testing Gaps - Pipeline Integration

  • Create fixture file tests/fixtures/name_registry_before.yml representing initial registry state for pipeline testing
  • Create fixture file tests/fixtures/name_registry_after.yml representing updated registry state with additional mappings
  • Add integration test in tests/test_normalization_registry_first.py that loads pipeline with name_registry_before.yml
  • Add assertion verifying pipeline reconciliation uses registry-first resolution by checking NameResolution.source values
  • Add assertion verifying pipeline produces different NameResolution.source values when run with name_registry_after.yml

Acceptance criteria

Registry-First Resolution

  • normalize_clearing_house() consults the name registry before _CLEARING_HOUSE_FALLBACK_MAPPINGS and the resolution path records source as registry or fallback
  • resolve_clearing_house() exists in src/counter_risk/normalize.py and returns a NameResolution object with source field set to registry or fallback
  • Calling normalize_counterparty_with_source().source succeeds without AttributeError

Config File

  • config/name_registry.yml exists in repository root
  • config/name_registry.yml parses without YAML syntax errors
  • load_name_registry('config/name_registry.yml') executes without raising exceptions

Testing - CLI

  • Running mapping_diff_report --registry config/name_registry.yml completes with exit code 0
  • Unit test in tests/test_mapping_diff_report_cli.py mocks generate_mapping_diff_report and asserts it receives expected parameters
  • All tests in tests/test_mapping_diff_report_cli.py pass

Testing - Report Sections

  • Integration test runs mapping_diff_report CLI with fixtures and captures output to string
  • Test assertions verify string contains UNMAPPED substring
  • Test assertions verify string contains FALLBACK_MAPPED substring
  • Test assertions verify string contains SUGGESTIONS substring

Testing - Pipeline Integration

  • Integration test in tests/test_normalization_registry_first.py runs pipeline reconciliation with two different registry files
  • Test assertions verify at least one NameResolution.source value differs between the two pipeline runs
  • All tests in tests/test_normalization_registry_first.py pass

Scope Constraint

  • PR diff contains only files matching these patterns: src/counter_risk/normalize.py, src/counter_risk/cli/*, src/counter_risk/reports/*, config/name_registry.yml, tests/test_*registry*.py, tests/test_*mapping_diff*.py, tests/fixtures/*.yml, tests/fixtures/*.csv, pyproject.toml, README.md, docs/*.md

Overall

  • All new tests pass in CI
  • All existing tests continue to pass in CI
Full Issue Text

Follow-up: Address Unmet Acceptance Criteria from PR #228 / Issue #227

Why

PR #228 (issue #227) was merged with all 42 task checkboxes marked complete and the in-process verifier returning PASS. However, post-merge verification by both OpenAI (gpt-5.2, 83% confidence) and Anthropic (claude-sonnet-4-5, 95% confidence) returned FAIL, identifying concrete gaps in the implementation. This follow-up addresses the remaining unmet acceptance criteria.

Root Cause Analysis

The keepalive loop logs reveal a cascading failure across multiple systems:

  1. Codex agent checked PR body checkboxes aggressively. The .agents/issue-227-ledger.yml file shows only task-01 (of 42) was marked done in the structured ledger. Yet the PR body went from 42 unchecked → 0 unchecked over 14 iterations. The agent edited the PR body checkboxes directly without updating the ledger, bypassing the structured tracking that was designed to prevent exactly this.

  2. autoReconcileTasks amplified the problem. Each keepalive iteration ran an LLM-based auto-reconciliation step that auto-checked ~1 additional task per run based on loose commit-to-task matching. Over 14 iterations this added up. The reconciler used "high-confidence" matching that was insufficiently discriminating — e.g., a commit touching mapping_diff.py matched multiple task descriptions simultaneously.

  3. cascadeParentCheckboxes may have inflated counts. The keepalive loop's cascade logic automatically checks all indented child checkboxes when a parent is checked. If the agent or reconciler checked a section-level parent, all sub-tasks under it would cascade to checked.

  4. Codex-as-verifier was self-grading. Iteration 14 ran a verify-acceptance prompt using the same Codex agent that did the implementation work. Despite the prompt explicitly saying "Do NOT trust checkbox states as evidence of completion," the verifier returned success. This is a known LLM bias: the agent that produced the work is predisposed to confirm its own output.

  5. The keepalive loop trusted checkbox state as the primary completion signal. When all 42 boxes were checked AND the verifier returned success AND the CI gate was green, the loop issued stop (tasks-complete). There was no independent mechanical verification of the acceptance criteria.

  6. Scope violation was not mechanically enforced. The acceptance criterion "PR diff contains only files matching these patterns" was a text checkbox, not an automated check. The agent could (and did) check it off despite the diff containing 15+ out-of-scope files.

Checkbox progression (from keepalive logs):

Time Unchecked Checked Source
Feb 23, 18:22 42 0 Initial
Feb 24, 04:17 21 21 Codex (20) + reconciler (1)
Feb 24, 05:29 20 22 Codex + reconciler (1)
Feb 24, 07:03 12 30 Codex (7) + reconciler (1)
Feb 24, 14:16 12 30 No change
Feb 24, 16:06 1 41 Codex (10) + reconciler (1)
Feb 24, 16:25 0 42 Final reconciler (1) → STOP

Progress review at round 5 (07:53) gave alignment score 0.0/10 and recommended STOP — but the agent continued via agent:retry label and eventually checked off all remaining boxes.

Scope

Address the concrete gaps identified by both independent verifiers. This is scoped to only the items that are mechanically verifiable as unmet — not style concerns or theoretical risks.

Non-Goals

  • Reversing the scope violation from PR Agent belt for #227 #228 (already merged; out-of-scope files are in main)
  • Redesigning the keepalive loop or reconciliation system (separate issue)
  • Modifying test infrastructure or CI pipelines

Tasks

Registry-First Resolution for Clearing House

  • Create resolve_clearing_house() function in src/counter_risk/normalize.py that returns NameResolution with source field using registry-first lookup before _CLEARING_HOUSE_FALLBACK_MAPPINGS
  • Update normalize_clearing_house() in src/counter_risk/normalize.py to delegate internally to resolve_clearing_house()
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='registry' when name exists in registry
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() returns source='fallback' when name is not in registry
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() consults registry before checking fallback mappings
  • Add test in tests/test_normalization_registry_first.py verifying resolve_clearing_house() handles missing or empty registry files without raising exceptions

Public API Source Attribution

  • Create normalize_counterparty_with_source() function in src/counter_risk/normalize.py that wraps resolve_counterparty() and returns NameResolution
  • Update docstring for normalize_counterparty_with_source() documenting the source field in returned NameResolution object
  • Update pipeline/run.py reconciliation logic to call normalize_counterparty_with_source() instead of resolve_counterparty() directly
  • Add unit test in tests/test_normalization_registry_first.py verifying normalize_counterparty_with_source() returns object with accessible .source attribute

Missing Config File

  • Create config/name_registry.yml with minimal valid registry containing at least the entries used by existing test fixtures
  • Add smoke test in tests/test_mapping_diff_report_cli.py that imports mapping_diff_report CLI and runs it against config/name_registry.yml without raising exceptions

Testing Gaps - CLI Parameter Passing

  • Add unit test in tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct registry_path parameter
  • Add unit test in tests/test_mapping_diff_report_cli.py that mocks generate_mapping_diff_report and verifies CLI forwards correct output_format parameter

Testing Gaps - End-to-End Report Sections

  • Create test fixture file tests/fixtures/unmapped_names.csv with known unmapped counterparty names
  • Create test fixture file tests/fixtures/fallback_mapped_names.csv with known fallback-mapped counterparty names
  • Add integration test in tests/test_mapping_diff_report_cli.py that runs mapping_diff_report CLI with fixtures and captures stdout
  • Add assertion in integration test verifying UNMAPPED section header appears in captured output
  • Add assertion in integration test verifying FALLBACK_MAPPED section header appears in captured output
  • Add assertion in integration test verifying SUGGESTIONS section header appears in captured output

Testing Gaps - Pipeline Integration

  • Create fixture file tests/fixtures/name_registry_before.yml representing initial registry state for pipeline testing
  • Create fixture file tests/fixtures/name_registry_after.yml representing updated registry state with additional mappings
  • Add integration test in tests/test_normalization_registry_first.py that loads pipeline with name_registry_before.yml
  • Add assertion verifying pipeline reconciliation uses registry-first resolution by checking NameResolution.source values
  • Add assertion verifying pipeline produces different NameResolution.source values when run with name_registry_after.yml

Acceptance Criteria

Registry-First Resolution

  • normalize_clearing_house() consults the name registry before _CLEARING_HOUSE_FALLBACK_MAPPINGS and the resolution path records source as registry or fallback
  • resolve_clearing_house() exists in src/counter_risk/normalize.py and returns a NameResolution object with source field set to registry or fallback
  • Calling normalize_counterparty_with_source().source succeeds without AttributeError

Config File

  • config/name_registry.yml exists in repository root
  • config/name_registry.yml parses without YAML syntax errors
  • load_name_registry('config/name_registry.yml') executes without raising exceptions

Testing - CLI

  • Running mapping_diff_report --registry config/name_registry.yml completes with exit code 0
  • Unit test in tests/test_mapping_diff_report_cli.py mocks generate_mapping_diff_report and asserts it receives expected parameters
  • All tests in tests/test_mapping_diff_report_cli.py pass

Testing - Report Sections

  • Integration test runs mapping_diff_report CLI with fixtures and captures output to string
  • Test assertions verify string contains UNMAPPED substring
  • Test assertions verify string contains FALLBACK_MAPPED substring
  • Test assertions verify string contains SUGGESTIONS substring

Testing - Pipeline Integration

  • Integration test in tests/test_normalization_registry_first.py runs pipeline reconciliation with two different registry files
  • Test assertions verify at least one NameResolution.source value differs between the two pipeline runs
  • All tests in tests/test_normalization_registry_first.py pass

Scope Constraint

  • PR diff contains only files matching these patterns: src/counter_risk/normalize.py, src/counter_risk/cli/*, src/counter_risk/reports/*, config/name_registry.yml, tests/test_*registry*.py, tests/test_*mapping_diff*.py, tests/fixtures/*.yml, tests/fixtures/*.csv, pyproject.toml, README.md, docs/*.md

Overall

  • All new tests pass in CI
  • All existing tests continue to pass in CI

Implementation Notes

Deferred Task (Requires Human Decision)

The following task requires a design decision between multiple implementation approaches and cannot be completed by an automated agent:

Original task: "Update normalize_counterparty() to return NameResolution (or implement resolve_and_normalize_counterparty() / normalize_counterparty_with_source() returning NameResolution) so callers can access source without calling resolve_counterparty() directly."

Issue: This task presents three alternative implementation approaches without specifying which to choose. Deciding between a breaking API change vs. adding a new function vs. choosing a specific function name requires subjective design judgment.

Resolution: The Tasks section above implements the normalize_counterparty_with_source() approach to maintain backward compatibility. If a different approach is preferred, a human should update the relevant tasks before agent execution.

Clearing House Resolution Pattern

Follow the same pattern as resolve_counterparty():

def resolve_clearing_house(name: str, registry_path: str | None = None) -> NameResolution:
    normalized = _normalize_whitespace(name)
    if registry_path:
        alias_lookup = _load_alias_lookup(str(registry_path))
        if normalized.lower() in alias_lookup:
            return NameResolution(
                raw_name=name,
                canonical_name=alias_lookup[normalized.lower()],
                source="registry"
            )
    fallback = _CLEARING_HOUSE_FALLBACK_MAPPINGS.get(normalized, normalized)
    return NameResolution(
        raw_name=name,
        canonical_name=fallback,
        source="fallback"
    )

Public API Wrapper

Add a new function to maintain backward compatibility:

def normalize_counterparty_with_source(
    name: str,
    registry_path: str | None = None
) -> NameResolution:
    """
    Normalize counterparty name and return resolution with source attribution.

    Returns:
        NameResolution with fields:
            - raw_name: original input
            - canonical_name: normalized output
            - source: 'registry' or 'fallback'
    """
    return resolve_counterparty(name, registry_path)

Minimal Config File Structure

The config/name_registry.yml should contain at minimum:

counterparties:
  - canonical_name: "Example Corp"
    aliases:
      - "Example Corporation"
      - "EXAMPLE CORP"

clearing_houses:
  - canonical_name: "DTCC"
    aliases:
      - "Depository Trust & Clearing Corporation"

Source: PR #228, Issue #227, Post-merge verification report
Related: #227, #208, #48


PR created automatically to engage Codex.

Source: Issue #239

Copilot AI review requested due to automatic review settings February 25, 2026 14:31
@stranske stranske added agent:codex Assigns Codex agent to issue agents:keepalive Enables keepalive automation for PR labels Feb 25, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the Codex bootstrap marker file for issue #239 so the agent automation/keepalive workflow can recognize and track the issue-specific run.

Changes:

  • Add agents/codex-239.md bootstrap file for issue #239.

@agents-workflows-bot
Copy link
Copy Markdown
Contributor

agents-workflows-bot bot commented Feb 25, 2026

🤖 Keepalive Loop Status

PR #249 | Agent: Codex | Iteration 5+9 🚀 extended

Current State

Metric Value
Iteration progress [##########] 5/5 5 base + 9 extended = 14 total
Action wait (gate-pending)
Disposition skipped (transient)
Gate unknown
Tasks 42/44 complete
Timeout 45 min (default)
Timeout usage 0m elapsed (1%, 45m remaining)
Keepalive ✅ enabled
Autofix ❌ disabled

🔍 Failure Classification

| Error type | infrastructure |
| Error category | unknown |
| Suggested recovery | Capture logs and context; retry once and escalate if the issue persists. |

@agents-workflows-bot
Copy link
Copy Markdown
Contributor

agents-workflows-bot bot commented Feb 25, 2026

Keepalive Work Log (click to expand)
# Time (UTC) Agent Action Result Files Tasks Progress Commit Gate
0 2026-02-25 14:33:10 Codex wait (gate-pending-transient) skipped 0 0/44
0 2026-02-25 14:33:52 Codex wait (gate-cancelled-transient-transient) skipped 0 0/44 cancelled
1 2026-02-25 14:44:39 Codex run (bypass-rate-limit-gate) success 6 file(s) +3 3/44 1d15322 cancelled
2 2026-02-25 14:52:57 Codex run (ready) success 5 file(s) +7 10/44 3b46e81 success
3 2026-02-25 15:04:08 Codex run (ready) success 5 file(s) +1 11/44 09259b2 success
4 2026-02-25 15:10:57 Codex run (ready) success 4 file(s) +1 12/44 6fa9f0a success
5 2026-02-25 15:21:35 Codex run (bypass-rate-limit-gate) success 5 file(s) +7 19/44 3fcf275 cancelled
6 2026-02-25 15:35:17 Codex run (ready-extended) success 5 file(s) +2 21/44 4618420 success
6 2026-02-25 15:36:44 Codex run (agent-run-failed) failure 0 21/44 success
6 2026-02-25 15:42:22 Codex stop (max-iterations-unproductive) skipped 0 21/44 success
6 2026-02-25 15:43:14 Codex wait (gate-pending-transient) skipped 0 21/44
6 2026-02-25 15:44:03 Codex wait (gate-pending-transient) skipped 0 21/44
7 2026-02-25 15:54:24 Codex run (bypass-rate-limit-gate) success 7 file(s) +10 31/44 de0ef73 cancelled
8 2026-02-25 16:14:11 Codex run (ready-extended) success 5 file(s) +6 37/44 40cf31b success
9 2026-02-25 16:17:20 Codex run (ready-extended) success 5 file(s) +1 38/44 d118bf2 success
10 2026-02-25 16:38:11 Codex run (bypass-rate-limit-gate) success 5 file(s) +2 40/44 2a5e6f3 cancelled
11 2026-02-25 16:49:36 Codex run (ready-extended) success 7 file(s) +1 41/44 8919f1f success
12 2026-02-25 16:56:51 Codex run (ready-extended) success 7 file(s) 0 41/44 b5af361 success
13 2026-02-25 17:09:05 Codex run (bypass-rate-limit-gate) success 5 file(s) +1 42/44 04dbf66 cancelled
14 2026-02-25 17:31:09 Codex run (ready-extended) success 5 file(s) 0 42/44 7113dde success
14 2026-02-25 20:37:24 Codex run (agent-run-failed) retry failure 0 42/44
14 2026-02-25 20:38:05 Codex stop (max-iterations-unproductive) skipped 0 42/44 success
14 2026-02-25 20:38:45 Codex wait (gate-pending-transient) skipped 0 42/44
14 2026-02-25 21:32:08 Codex run (agent-run-cancelled) cancelled 0 42/44 success
14 2026-02-25 21:44:08 Codex stop (zero-activity-infrastructure) skipped 0 42/44 success
14 2026-02-25 21:44:11 Codex stop (zero-activity-infrastructure) skipped 0 42/44 success
14 2026-02-25 21:55:37 Codex wait (gate-pending-transient) skipped 0 42/44

@stranske stranske added agent:retry Retry agent keepalive loop and removed agent:needs-attention Agent needs human help needs-human labels Feb 25, 2026
@stranske-keepalive stranske-keepalive bot removed the agent:retry Retry agent keepalive loop label Feb 25, 2026
@github-actions
Copy link
Copy Markdown
Contributor

⚠️ Codex keepalive run failed

Field Value
Exit Code 1
Error Category unknown
Error Type codex
Run View logs

🔧 Suggested Recovery

Capture logs and context; retry once and escalate if the issue persists.

📝 What to do

  1. Check the workflow logs for detailed error output
  2. If this is a configuration issue, update the relevant settings
  3. If the error persists, consider adding the needs-human label for manual review
  4. Re-run the workflow once the issue is resolved
Output summary
No output captured

@github-actions github-actions bot added the agent:needs-attention Agent needs human help label Feb 25, 2026
stranske pushed a commit that referenced this pull request Feb 25, 2026
Port the retry-with-backoff logic from Workflows repo to Counter_Risk's
local copy of setup-api-client. The guard check on PR #249 failed with
a transient npm registry 403 on safe-buffer because the old code only
had a single --legacy-peer-deps fallback with no backoff.

Changes:
- 3 retry attempts with exponential backoff (5s, 10s)
- --legacy-peer-deps fallback on first failure
- Log stderr from all failed attempts for diagnosability
- Pin lru-cache@10.4.3 (was ^10.0.0) for consistency with Workflows

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
@stranske stranske merged commit cab3fb5 into main Feb 25, 2026
24 checks passed
@stranske stranske added the verify:compare Runs verifier comparison mode after merge label Feb 25, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Provider Comparison Report

Provider Summary

Provider Model Verdict Confidence Summary
openai gpt-5.2 CONCERNS 74% The PR largely implements registry-first name resolution with source attribution, adds CLI output-format plumbing and report validation, and introduces substantial tests and fixtures that cover req...
anthropic claude-sonnet-4-5-20250929 PASS 92% The implementation successfully addresses all acceptance criteria with high-quality code and comprehensive testing. Registry-First Resolution: resolve_clearing_house() correctly implements re...
📋 Full Provider Details (click to expand)

openai

  • Model: gpt-5.2
  • Verdict: CONCERNS
  • Confidence: 74%
  • Scores:
    • Correctness: 7.0/10
    • Completeness: 8.0/10
    • Quality: 8.0/10
    • Testing: 7.0/10
    • Risks: 6.0/10
  • Summary: The PR largely implements registry-first name resolution with source attribution, adds CLI output-format plumbing and report validation, and introduces substantial tests and fixtures that cover required report sections and registry-vs-fallback behavior. However, one documented acceptance item (updating pipeline reconciliation to use normalize_counterparty_with_source rather than resolve_counterparty) is not reflected in the code changes shown, and the clearing-house resolver’s robustness to missing registry files is not directly demonstrated and may introduce a new runtime dependency via normalize_clearing_house().
  • Concerns:
    • Acceptance criterion: pipeline reconciliation should call normalize_counterparty_with_source() instead of resolve_counterparty() directly. In this diff, there is no change to pipeline/run.py; tests in test_normalization_registry_first.py instead monkeypatch pipeline_run.resolve_counterparty, implying the pipeline still uses resolve_counterparty. This appears to miss the stated requirement for pipeline integration/public API usage.
    • resolve_clearing_house() uses _load_alias_lookup(str(Path(registry_path).resolve())). If _load_alias_lookup does not already handle missing/empty/unreadable registry robustly, resolve_clearing_house may raise (contrary to the acceptance criterion about handling missing/empty registry without exceptions). The provided tests only cover empty-registry with a valid file, not a missing file path.
    • normalize_clearing_house() now delegates to resolve_clearing_house() but does not accept a registry_path parameter; it always uses the default config/name_registry.yml. If callers previously relied on pure fallback behavior independent of filesystem state, this introduces a new runtime dependency on the registry file existing/being readable (unless _load_alias_lookup is defensive).

anthropic

  • Model: claude-sonnet-4-5-20250929
  • Verdict: PASS
  • Confidence: 92%
  • Scores:
    • Correctness: 9.0/10
    • Completeness: 9.0/10
    • Quality: 8.0/10
    • Testing: 9.0/10
    • Risks: 8.0/10
  • Summary: The implementation successfully addresses all acceptance criteria with high-quality code and comprehensive testing. Registry-First Resolution: resolve_clearing_house() correctly implements registry-first lookup with proper source attribution ('registry' vs 'fallback'), and normalize_clearing_house() delegates to it. normalize_counterparty_with_source() wraps resolve_counterparty() and exposes the .source attribute as required. Config File: config/name_registry.yml exists and is valid YAML with proper entries. CLI Testing: Unit tests mock generate_mapping_diff_report and verify parameter forwarding for both registry_path and output_format. Integration tests verify the CLI runs successfully with the repo registry. Report Sections: Integration tests verify UNMAPPED, FALLBACK_MAPPED, and SUGGESTIONS sections appear in output using fixture-based inputs. Pipeline Integration: The test uses monkeypatching to capture source attribution during reconciliation and verifies different sources between 'before' and 'after' registry states. Scope Constraint: All 9 changed files match the allowed patterns. The code demonstrates good error handling (missing registry treated as fatal), proper encapsulation, and deterministic behavior. Minor concerns include semantic precision of 'fallback' labeling for identity mappings and validation placement for output_format, but these don't impact functional correctness. All acceptance criteria checkboxes are legitimately satisfied by the implementation.
  • Concerns:
    • The resolve_clearing_house() function labels identity/no-op normalization as 'fallback' source, which may be semantically imprecise for reporting purposes
    • The normalize_counterparty_with_source() function is a thin wrapper that adds a registry_path parameter but doesn't validate or document the default path behavior
    • Test fixture name_registry_after.yml adds JP Morgan entries but the integration test only verifies source attribution changes, not the specific mapping behavior
    • The output_format parameter in CLI accepts only 'text' but the validation happens in the report generator rather than at the CLI argument parsing level

Agreement

  • Completeness: scores within 1 point (avg 8.5/10, range 8.0-9.0)
  • Quality: scores within 1 point (avg 8.0/10, range 8.0-8.0)

Disagreement

Dimension openai anthropic
Verdict CONCERNS PASS
Correctness 7.0/10 9.0/10
Testing 7.0/10 9.0/10
Risks 6.0/10 8.0/10

Unique Insights

  • openai: Acceptance criterion: pipeline reconciliation should call normalize_counterparty_with_source() instead of resolve_counterparty() directly. In this diff, there is no change to pipeline/run.py; tests in test_normalization_registry_first.py instead monkeypatch pipeline_run.resolve_counterparty, implying the pipeline still uses resolve_counterparty. This appears to miss the stated requirement for pipeline integration/public API usage.; resolve_clearing_house() uses _load_alias_lookup(str(Path(registry_path).resolve())). If _load_alias_lookup does not already handle missing/empty/unreadable registry robustly, resolve_clearing_house may raise (contrary to the acceptance criterion about handling missing/empty registry without exceptions). The provided tests only cover empty-registry with a valid file, not a missing file path.; normalize_clearing_house() now delegates to resolve_clearing_house() but does not accept a registry_path parameter; it always uses the default config/name_registry.yml. If callers previously relied on pure fallback behavior independent of filesystem state, this introduces a new runtime dependency on the registry file existing/being readable (unless _load_alias_lookup is defensive).
  • anthropic: The resolve_clearing_house() function labels identity/no-op normalization as 'fallback' source, which may be semantically imprecise for reporting purposes; The normalize_counterparty_with_source() function is a thin wrapper that adds a registry_path parameter but doesn't validate or document the default path behavior; Test fixture name_registry_after.yml adds JP Morgan entries but the integration test only verifies source attribution changes, not the specific mapping behavior; The output_format parameter in CLI accepts only 'text' but the validation happens in the report generator rather than at the CLI argument parsing level

🔍 LangSmith Traces

stranske pushed a commit that referenced this pull request Feb 25, 2026
Three issues identified by independent verification of issue #239 work:

1. pipeline/run.py now calls normalize_counterparty_with_source() instead
   of resolve_counterparty() directly, fulfilling the public API source
   attribution task. The test monkeypatch is updated to match.

2. Added two missing tests for resolve_clearing_house() handling missing
   and empty registry files without raising exceptions.

3. Fixed test_mapping_diff_report_unreadable_registry_exits_nonzero which
   failed when running as root (chmod(0) is a no-op for root; the empty
   entries list triggers a validation error instead of a permission error).

All 21 tests in the target files pass. Full suite: 787 passed, 5 skipped
(6 pre-existing sitecustomize failures unrelated to these changes).

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent:codex Assigns Codex agent to issue agent:needs-attention Agent needs human help agents:keepalive Enables keepalive automation for PR needs-human verify:compare Runs verifier comparison mode after merge

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants