fix(collator): briefing surfaces manually-fired intentions until actioned#393
Conversation
…oned `generate_briefing` only reported intentions it auto-transitioned from `pending` to `fired` on the current call. Intentions set to `fired` by another agent — the documented compaction-handoff convention — were invisible to `get_briefing`, so handoffs written at session end never reached the receiving agent. Prod owner currently has 20+ fired handoff intentions accumulated across five projects since 2026-04-14, none ever surfaced. Collator now queries all `state='fired'` intentions for the owner, merged with the auto-fired set, sorted by urgency desc then most-recent update, capped at 10 listed entries (full count in `evaluation.intentions_fired`). The receiving agent transitions off `fired` (→ `active`/`completed`/`cancelled`/`snoozed`) to stop re-surfacing. `fired_intentions` briefing entries now carry `urgency` and `updated` fields. `test_fired_intention_not_surfaced_on_second_briefing` encoded the broken design — renamed to `test_fired_intention_persists_until_actioned` with inverted second-briefing assertion plus a fired → active transition covering the stop condition. New `test_briefing_surfaces_manually_fired_intention` exercises the handoff path directly; new `test_briefing_caps_fired_intention_list` locks the 10-entry cap and high-urgency-first ordering.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
cmeans
left a comment
There was a problem hiding this comment.
[QA] Round 1 — QA Failed
Core fix is correct — manually-fired intentions surface, persist across briefings, and clear on transition. Verified end-to-end against a fresh awareness-qa instance built from 417369c. Two substantive findings, both in adjacent code/tests rather than the main fix.
Verification
Automated (local .[dev] install, Py3.11):
| Check | Result |
|---|---|
pytest |
1005 passed, 7 skipped (all 7 skips are Ollama-dependent, no Ollama locally — matches main) |
ruff check src tests |
All checks passed |
mypy src |
Success, 21 source files, no issues |
Manual MCP tests — all 6 PR-body steps, run against awareness-qa on :8421 (PR #393 image, fresh Postgres):
- Step 1 (baseline empty):
intentions_fired: 0,fired_intentionsabsent,attention_needed: false✅ - Step 2 (manual fire surfaces):
remind → update_intention(state="fired") → get_briefing:intentions_fired: 1, entry carriesurgency: "high",updatedISO timestamp of the transition,attention_needed: true, newurgency/updatedfields present as claimed ✅ - Step 3 (persistence + clear): Two consecutive briefings both surface the same fired entry;
update_intention(state="active")→ third briefing is empty (fired_intentionsabsent,intentions_fired: 0) ✅ - Step 4 (auto-fire path still works):
remind(deliver_at=now−1h) → get_briefingreportsintentions_fired: 1; DB confirmsstate="fired"post-briefing; second briefing still surfaces (persistence) ✅ - Step 5 (cap 10 of 15, high-urgency first): Seeded 15 (3 high, 12 normal) directly in Postgres →
intentions_fired: 15,len(fired_intentions): 10, slots 0–2 allurgency: "high", slots 3–9normal(strict ordering observed) ✅ - Step 6 (transitions exclude from fired): From the 15-fired state, transitioned one
high → active, onenormal → active, onenormal → completed→intentions_fired: 12, cap still 10, top-10 urgencies:[high×2, normal×8], no active/completed entries present ✅
Findings
1. compose_summary undercounts fired intentions when the full count exceeds the cap (substantive).
src/mcp_awareness/collator.py:222-224 still uses the length of the (capped) list for the summary line:
fired = briefing.get("fired_intentions", [])
if fired:
parts.append(f"{len(fired)} intention{'s' if len(fired) != 1 else ''} ready")Reproduced in-session: seeded 15 fired, briefing returned evaluation.intentions_fired: 15 but summary: "10 intentions ready." — the summary silently drops 5. This matters because the deployer note on this PR flags that production owners have 20+ fired handoffs accumulated; the summary will report "10 intentions ready" for every such owner on first post-deploy briefing, understating the backlog the receiving agent is supposed to triage.
Fix: pull the total from the evaluation block rather than the capped list, e.g.:
fired_displayed = briefing.get("fired_intentions", [])
if fired_displayed:
total = briefing.get("evaluation", {}).get("intentions_fired", len(fired_displayed))
parts.append(f"{total} intention{'s' if total != 1 else ''} ready")2. test_briefing_caps_fired_intention_list doesn't lock the ordering contract (substantive — test coverage gap).
The new test asserts 3 high-urgency entries appear somewhere in the top 10:
high_urgency_in_list = sum(1 for f in fired if f.get("urgency") == "high")
assert high_urgency_in_list == 3This would also pass if the sort key flipped direction (the 3 high-urgency items would still be in the top 10, just at slots 7–9 instead of 0–2). The PR body Step 5 claim is stronger — "the first 3 entries in fired_intentions all carry urgency == "high"" — and my manual test confirms the code does this correctly; but the test doesn't lock it. Regression in the sort key (e.g. accidental sign flip on -_URGENCY_RANK.get(...)) would not be caught by the current unit test, only by manual/integration review.
Fix: replace the count check with a strict slot check, e.g.:
assert [f["urgency"] for f in fired[:3]] == ["high", "high", "high"]Not blocking, noted
- Unknown
urgencyvalues silently default tonormalvia_URGENCY_RANK.get(..., 1). Deliberate and reasonable (keeps the briefing robust against dirty data). No action. - Stale fired intentions accumulate without auto-cleanup — if an agent never transitions a fired handoff to active/completed, it surfaces forever. Deployer note acknowledges the 20+ production backlog; triage is the receiving-agent's job per design. Out of scope for this PR.
Verdict: QA Failed. Two substantive findings, both small edits (one 3-line change in compose_summary, one stronger assertion in test_briefing_caps_fired_intention_list). Core fix is sound; happy to fast-track round 2.
|
Applying QA Failed — 2 substantive findings in the review above (summary-undercount + test-coverage gap). Core fix is correct and all 6 manual MCP steps pass. |
Addresses QA round 1 findings on PR #393. Finding #1 (substantive — undercount): compose_summary used len(fired_intentions) for the summary line. fired_intentions is capped at 10 by generate_briefing, so any backlog of 11+ fired intentions silently reported as "10 intentions ready" regardless of the actual count. Reproduced in QA session with 15 seeded: summary said "10" while evaluation.intentions_fired correctly said 15. Impact at deploy time: the PR #392 deployer note flags 20+ accumulated fired handoffs on the production owner; every post-deploy briefing would have under-reported by 10. Fix: pull the total from briefing["evaluation"]["intentions_fired"], fall back to list length only when evaluation block is absent. Finding #2 (substantive — test coverage gap): the cap test asserted `sum(f.urgency == "high") == 3` which would also pass if the sort key flipped sign (high-urgency entries still in the top 10, just at slots 7-9). Replaced with a strict slot check — `[f.urgency for f in fired[:3]] == ["high", "high", "high"]` plus an `all(... "normal")` for slots 3-9. A sign flip on the urgency sort key now fails the test immediately. Also adds test_briefing_summary_reports_total_fired_count_not_capped as an explicit regression guard for finding #1 — seeds 15 fired, asserts the summary contains "15 intentions ready" and not "10 intentions ready".
[Dev] Round 2 — addressing QA findingsBoth findings fixed in commit Finding resolutions1. fired = briefing.get("fired_intentions", [])
if fired:
total = briefing.get("evaluation", {}).get("intentions_fired", len(fired))
parts.append(f"{total} intention{'s' if total != 1 else ''} ready")Regression guard: new 2. Cap-test ordering weak — Fixed. Replaced the count-only check with strict slot assertions: assert [f["urgency"] for f in fired[:3]] == ["high", "high", "high"]
assert all(f["urgency"] == "normal" for f in fired[3:])A sign flip on Verification
Push verification
CI running. Awaiting automation promotion to Ready for QA on green. |
cmeans
left a comment
There was a problem hiding this comment.
[QA] Round 2 — Ready for QA Signoff
Both round-1 findings fixed in commit 6f50cb4 with the exact changes suggested, plus a dedicated regression test for the summary-undercount case.
Fix verification
Finding 1 — compose_summary undercount: collator.py:221-225 now pulls the total from evaluation.intentions_fired with a fallback to len(fired):
fired = briefing.get("fired_intentions", [])
if fired:
total = briefing.get("evaluation", {}).get("intentions_fired", len(fired))
parts.append(f"{total} intention{'s' if total != 1 else ''} ready")Re-ran the failing scenario end-to-end against a freshly-built round-2 image (mcp-awareness-qa:393-r2): seeded 15 fired intentions → evaluation.intentions_fired: 15, len(fired_intentions): 10, summary: "15 intentions ready." (was "10 intentions ready." on round 1). Capped list and full count are now consistent.
Finding 2 — test slot-ordering gap: test_briefing_caps_fired_intention_list now locks the contract strictly:
assert [f["urgency"] for f in fired[:3]] == ["high", "high", "high"]
assert all(f["urgency"] == "normal" for f in fired[3:])Extra-strong — asserts both the high-urgency prefix AND that the remaining 7 slots are all normal. A sign-flip on the sort key now fails this test.
Bonus: new regression test. test_briefing_summary_reports_total_fired_count_not_capped seeds 15 fired and asserts "15 intentions ready" in summary and "10 intentions ready" not in summary. Locks the fix mechanically.
Round-2 checks (this session, PR head 6f50cb4)
| Check | Result |
|---|---|
pytest full suite |
1006 passed, 7 skipped (Ollama) — up from 1005, the +1 being the new regression test |
pytest tests/test_collator.py |
57 passed — up from 56, accounting for the new test |
ruff check src tests |
All checks passed |
mypy src |
Success, 21 source files, no issues |
CI on 6f50cb4 |
All 15 non-skipped checks SUCCESS (lint, typecheck, test 3.10–3.14, CodeQL, gitleaks, semgrep, pip-audit, codecov, CLA, QA Gate) |
| Manual: summary text with 15 fired | "15 intentions ready." ✅ |
| Manual: top-3 slot ordering | [high, high, high] ✅ |
| Manual: slot 3–9 urgencies | all normal ✅ |
Not new, not blocking — minor code smell
compose_summaryre-uses the nametotalfor two meanings (source count on line 202, fired count on line 224). Currently safe because theelse f"All clear across {total} sources."fallback is only reached whenpartsis empty, andfiredis only non-empty when parts will populate. Worth noting in case future edits change the control flow.
Verdict: Ready for QA Signoff. Zero new findings, both round-1 fixes verified in-session at code, test, and MCP-integration layers. Over to maintainer for QA Approved.
|
Applying Ready for QA Signoff — both round-1 findings fixed in |
…#394) ## Linked issue Fixes # none — version-stamp release, not tracked by a feature issue. ## Summary Version stamp release for v0.18.3 (patch, 0.18.2 → 0.18.3). Renames `[Unreleased]` → `[0.18.3] - 2026-04-24`, adds comparison link, bumps `pyproject.toml`. No code changes. Scope delta since v0.18.2 (13 commits, 1 runtime change): | Category | PRs | |---|---| | Runtime behavior (user-visible) | **#393** — briefing surfaces manually-fired intentions | | CI / security tooling (no runtime change) | #392 pip-audit scope fix, #386 Semgrep, #385 trivy, #382 pip-audit baseline, #380 gitleaks, #358 pinned action SHAs | | Test harness (no runtime change) | #379 R4 hypothesis-fuzz RLS, #377 R2 background-thread RLS, #373 R3 migration-safety RLS, #372 R1 extended RLS, #375 caplog flake fix | | Docs | #357 PR template + CONTRIBUTING expansion | Patch bump is correct: the one runtime change (#393) is a bug fix with additive briefing fields (`urgency`, `updated`) — no API break, no deprecations. ## Scope ``` CHANGELOG.md | 5 ++++- pyproject.toml | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) ``` Version stamp only. Zero source code changes. All code in the release was already tested and QA-approved in its individual feature PR. ## AI-assistance disclosure - [ ] No AI used in producing this PR - [x] AI assisted with code generation (e.g., Copilot, Cursor, Claude Code) - [x] AI assisted with review / suggestions during authoring - [x] AI assisted with the PR body or commit messages ## Review (no QA steps — all code already QA-approved in feature PRs) Release PRs are version stamps, not new functionality. Reviewer checks: 1. - [x] `pyproject.toml` version bumped correctly (0.18.2 → 0.18.3). 2. - [x] `CHANGELOG.md` heading renamed `[Unreleased]` → `[0.18.3] - 2026-04-24` with today's date. 3. - [x] Empty `## [Unreleased]` section preserved above `[0.18.3]` for future work. 4. - [x] Comparison links at the bottom: `[Unreleased]` now points at `v0.18.3...HEAD`, new `[0.18.3]` link at `v0.18.2...v0.18.3`. 5. - [x] Scope delta table in this PR body matches `git log v0.18.2..release/v0.18.3 --oneline`. 6. - [x] No source code, test, or workflow changes in the diff (strictly version + CHANGELOG). ## Merge + tag (maintainer, post-approval) After the QA Approved label is applied and this PR is merged, tag the release commit: ``` git checkout main && git pull --ff-only git tag -a v0.18.3 -m "v0.18.3 — briefing surfaces manually-fired intentions" git push origin v0.18.3 ``` The `docker-publish.yml` workflow fires on tag push and publishes `mcp-awareness:0.18.3` + `mcp-awareness:latest`. Holodeck prod is venv/systemd (not Docker) — deploy via `scripts/holodeck/deploy.sh` on the operator workstation (git pull + pip + restart + HAProxy drain). `docker-compose.yaml` uses `:latest` so no update needed there. ## Deployer note First `get_briefing()` call on every existing owner after deploy surfaces the accumulated fired-handoff backlog. For the production owner that's 20+ entries since 2026-04-14. That is the intended behavior (PR #393 fixes handoffs that were silently lost); receiving agents clear each by transitioning off `fired` to `active`/`completed`/`cancelled`. ## Checklist - [x] `CHANGELOG.md` heading renamed and comparison links updated - [x] `pyproject.toml` version bumped - [x] `README.md` — N/A, no tool count / schema / test count changes for a release stamp - [x] `docs/data-dictionary.md` — N/A, no schema change - [x] `docker-compose.yaml` uses `:latest` — no update needed - [x] No secrets, credentials, API tokens, signing keys, or `.env` contents included - [x] I have read and will sign the [CLA](../CLA.md) via the `cla-assistant` bot Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
Linked issue
Fixes # none — replaces #389 (closed) with a fresh branch created bot-first so
require_last_push_approvalbranch protection can proceed.Summary
generate_briefingonly reported intentions it auto-transitioned frompendingtofiredon the current call (the time-based auto-fire path). Intentions set tostate="fired"by another agent — the documented compaction-handoff convention — were counted correctly byget_intentions(state="fired")but invisible toget_briefing, so handoffs written at session end never reached the receiving agent. Production owner has 20+ fired handoff intentions accumulated since 2026-04-14, none surfaced.generate_briefingnow queries allstate="fired"intentions for the owner (including those the current call just transitioned), sorts by urgency desc then most-recent update, caps the listed entries at 10 to bound briefing size, and reports the full count inevaluation.intentions_fired. The receiving agent transitions the intention offfired(→active/completed/cancelled/snoozed) once acknowledged; until then, the briefing keeps surfacing it.fired_intentionsbriefing entries now carryurgencyandupdatedfields.Scope
Single concern: briefing surfacing logic for fired intentions. No schema change, no tool surface change (the briefing JSON adds
urgencyandupdatedfields to eachfired_intentionsentry — additive).AI-assistance disclosure
QA
Prerequisites
pip install -e ".[dev]"AWARENESS_PORT=8421) connected to an empty or scratch Postgres database so pre-existing fired intentions from the real owner don't confuse the cap/ordering test. A fresh docker-compose test profile or an alternateAWARENESS_DATABASE_URLboth work.Manual tests (via MCP tools)
Expected:
intentions_fired: 0,fired_intentionsabsent from response,attention_needed: false.deliver_atsurfaces in the next briefing (the handoff path that was broken)Expected:
evaluation.intentions_fired == 1,fired_intentions[0].id == "<id>",fired_intentions[0].goal == "Resume test handoff",fired_intentions[0].urgency == "high",fired_intentions[0].updatedis the ISO timestamp of theupdate_intentioncall,attention_needed: true.Expected: still
intentions_fired == 1, same entry present. (This is the behaviour change from the old testtest_fired_intention_not_surfaced_on_second_briefing.)Expected:
intentions_fired == 0,fired_intentionsabsent.deliver_at)Expected:
intentions_fired == 1, entry present infired_intentions, and the underlying intention'sstateis now"fired"(verify withget_intentions(state="fired")). Secondget_briefing()still surfaces it (persistent until actioned).Expected:
evaluation.intentions_fired == 15,len(fired_intentions) == 10, the first 3 entries infired_intentionsall carryurgency == "high"(the high-urgency ones are ranked ahead of the normal ones; ties within a band fall back to most-recentupdated).active/completed/cancelled/snoozeddo not countExpected:
evaluation.intentions_fired == 12(15 − 3), cap still enforced if applicable, only entries currently in"fired"state appear.Checklist
CHANGELOG.mdentry added under[Unreleased]in Keep-a-Changelog format (under### Fixed)README.mdanddocs/data-dictionary.mdupdated if affected — N/A, no new tool / no schema change.envcontents included in the diffruff check,mypy, andpytestpass locally (1005 passed, 7 skipped)cla-assistantbotNotes for the deployer
After this deploys, the next
get_briefing()call on every existing owner will surface whatever fired intentions have accumulated — in this repo's production owner that is 20+ entries spanning five projects, some since 2026-04-14. That is the intended behaviour (those handoffs were supposed to surface; the bug was that they didn't). The receiving agents should triage them (transition toactive/completed/cancelledas appropriate) to clear the backlog.