fix(ledger): #358 — get_context_for_ready_decisions preserves decision.status (closes silent ValidationError swallow)#378
Merged
Conversation
…n.status The bug discovered by #357 sub-task 1's sociable-test de-mock: get_context_for_ready_decisions (ledger/queries.py:1804) hardcoded status="context_pending" in the returned dicts, but BriefDecision.status is Literal['reflected','drifted','pending','ungrounded']. Constructing BriefDecision(status="context_pending") raised ValidationError; the handler's outer try/except at handlers/preflight.py:783 silently swallowed it. Net effect: context_pending_ready on the preflight response always returned empty in production, suppressing the entire "decisions ready for ratification" surface added in v0.8.0 (commit 0988e4b). Pre-#357 every test that exercised this code path was a solitary mock that returned a valid status, short-circuiting the validation — the bug was only surfaced by the eval-harness de-mock in PR #359 Phase B. The fix per the issue body's Option A: change line 1813 from a hardcoded literal to `str(r.get("status", "ungrounded"))`. The SELECT at line 1796 already pulls `status` from the row; this just uses it. Pattern matches the sibling get_collision_pending_decisions at line 1781. At schema v10+ (current is v17) decision.status ASSERT is narrowed to exactly the BriefDecision.status Literal set (ledger/schema.py:728), so any row that satisfies the WHERE clause returns a status compatible with BriefDecision. The signoff field still carries the 'context_pending' signoff state separately — signoff_state on BriefDecision is the right surface for that, not status. Coverage: - tests/test_preflight_hitl.py (NEW) — sociable defense-in-depth pin at the ledger query layer. Three tests: 1. test_get_context_for_ready_decisions_returns_brief_decision_compatible_status — directly checks the regression class 2. test_get_context_for_ready_decisions_preserves_underlying_decision_status — asserts the returned status mirrors decision.status, not overrides it 3. test_handle_preflight_surfaces_context_pending_ready_end_to_end — confirms the user-visible surface actually populates now - tests/eval/preflight_dataset.jsonl — FF4 xfail removed. The strict-xfail mechanism on this row would have flipped FF4 to failure once the underlying bug got fixed; removing the xfail simultaneously is mandatory per the pre-#357 commit message. Verified: - pytest tests/test_preflight_hitl.py: 3/3 passing (0.71s) - pytest tests/eval/run_preflight_eval.py: 10/10 passing (FF4 PASS, no xfail) (1.03s) - pytest tests/test_preflight_* tests/eval/run_preflight_eval.py: 46/46 passing, 0 regressions - ruff check + format clean; mypy clean Closes #358. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #358. One-line fix in `ledger/queries.py` plus two layers of regression coverage (a new sociable test file + removal of the FF4 xfail in the eval dataset).
The bug discovered by #357 sub-task 1's de-mock: `get_context_for_ready_decisions` hardcoded `status="context_pending"` in returned dicts, but `BriefDecision.status` is `Literal['reflected','drifted','pending','ungrounded']`. Constructing `BriefDecision(status="context_pending")` raised `ValidationError`; `handlers/preflight.py:783`'s outer try/except silently swallowed it. Net production effect: `context_pending_ready` on the preflight response always returned empty, suppressing the entire "decisions ready for ratification" surface added in v0.8.0 (commit `0988e4b`).
Pre-#357 every test that exercised this code path was a solitary mock that returned a valid status, short-circuiting validation. The bug shipped silently for months.
The fix (Option A from the issue body)
`ledger/queries.py:1813`:
```python
BEFORE
"status": "context_pending",
AFTER
"status": str(r.get("status", "ungrounded")),
```
The `SELECT` at line 1796 already pulls `status` from the row; this just uses it. Pattern matches the sibling `get_collision_pending_decisions` at line 1781.
Safety: at schema v10+ (current is v17), `decision.status ASSERT` is narrowed to exactly the BriefDecision Literal set (`ledger/schema.py:728`). Any row satisfying the WHERE clause returns a status compatible with BriefDecision. The signoff field still carries the `context_pending` signal — `signoff_state` on BriefDecision is the right surface for that, not `status`.
Coverage
`tests/test_preflight_hitl.py` (NEW) — sociable defense-in-depth pin at the ledger query layer. Three tests:
`tests/eval/preflight_dataset.jsonl` — FF4 xfail removed. The xfail-strict mechanism on this row would have flipped FF4 to failure once the bug got fixed; removing the xfail simultaneously is mandatory per the pre-#357 commit message.
Verification
```
$ pytest tests/test_preflight_hitl.py
3 passed in 0.71s
$ pytest tests/eval/run_preflight_eval.py
10 passed in 1.03s # FF4 now PASS, no xfail
$ pytest tests/test_preflight_*.py tests/eval/run_preflight_eval.py
46 passed, 1 skipped, 0 regressions
$ ruff check . && ruff format --check .
All clean
$ mypy .
no issues
```
Test plan
What this doesn't do
🤖 Generated with Claude Code