test(rls): R2 — background-thread + connection-pool edge-case coverage#377
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
263250e to
0a7a443
Compare
|
New commits pushed while QA was active. QA review invalidated — resetting to Awaiting CI. |
557a66b to
8968c9b
Compare
cmeans
left a comment
There was a problem hiding this comment.
QA review — PR #377
R2 background-thread + pool edge-case RLS coverage. Test code runs cleanly and catches real regressions in aggregate, but the three meta-verification procedures the PR body prescribes don't produce the claimed failures — same class of issue I flagged (and Dev fixed) on R1/#372.
Verification performed
| Step | Result |
|---|---|
| Scope | ✅ git diff --shortstat origin/main → 2 files, +441/-0. Per-file: tests/test_rls_background.py +440 (new), CHANGELOG.md +1. Matches claim exactly. |
| Test file review | ✅ 9 tests across 3 classes (4 cleanup + 2 embedding + 3 pool-guarantee); AGPL header, module docstring with audit summary and deferred-refactor note, local-duplicate rls_store fixture explicitly acknowledged. Each test is well-shaped in terms of what it does; the weakness is in what the PR body claims they prove (see finding). |
| Local test run | ✅ pytest tests/test_rls_background.py -v → 9 passed in 2.73s (PR body claimed ~2.6s). |
| Full suite | ✅ pytest → 998 passed, 7 skipped, 4 warnings in 36.73s (989 → 998, matches claim; same 7 pre-existing Ollama skips). |
Meta-verification #1 (cleanup _set_rls_context) |
|
Meta-verification #2 (embedding _set_rls_context) |
|
Meta-verification #3 (session-scoped set_config) |
|
| CHANGELOG | ✅ Single bullet under ### Security in [Unreleased], carries full technical context, closes #362 and R2 of #359. Same caveat applies to the "codify the transaction-local guarantee" wording. |
| CI rollup | ✅ All green across 3.10–3.14, lint, typecheck, codecov/patch, CodeQL, license/cla; docker-smoke correctly skipped. |
Findings
-
[substantive] All three meta-verifications in the PR body pass instead of failing, which means the tests don't prove the single-layer invariants the PR body claims they codify. I ran each of the three meta-verifications in the current session:
Meta-verification PR body expected Actual behavior #1: Comment out self._set_rls_context(cur, owner_id)atpostgres_store.py:225(inside cleanup loop) → re-runTestRLSBackgroundCleanup"cleanup tests fail — DELETE either fails WITH CHECK or cross-deletes" 4/4 cleanup tests pass. Pool's default role is SUPERUSER with BYPASSRLSand the SQL already has an explicitWHERE owner_id = %s AND expires <= %s— the DELETE is correctly scoped regardless of_set_rls_context.#2: Comment out self._set_rls_context(cur, owner_id)atpostgres_store.py:1244(insideupsert_embedding) → re-runTestRLSBackgroundEmbedding"embedding tests fail — insert violates WITH CHECK" 2/2 embedding tests pass. Same root cause: BYPASSRLSon the pool default + the explicitowner_id = %sparameter inupsert_embedding.sqlcovers it on their own.#3: Change set_config('app.current_user', %s, true)→..., false(session-scoped). Tried both (a) productionpostgres_store.py:147and (b) the test's fixture monkeypatch attest_rls_background.py:132→ re-runtest_rls_context_does_not_persist_between_transactions"pool-guarantee test fails — session-scoped setting leaks across transactions on the same pool connection" Test passes in both variants. psycopg_pool's default resetrunsRESET ALLon connection check-in, which clearsapp.current_userregardless of whether it was set withis_local=trueorfalse. The test is actually codifying the pool's check-in reset behavior, not the transaction-local semantic ofset_config(..., true).Implication:
- Tests #1–#4 (cleanup) and #5–#6 (embedding) catch regressions where all three defenses fail simultaneously (SQL owner_id filter +
_set_rls_context+ pool role has BYPASSRLS revoked). Weaker than "cross-tenant safe at the background-thread level" as described. - Test #7 (
test_rls_context_does_not_persist_between_transactions) is the one with the largest gap between claim and mechanism — it doesn't distinguishis_local=truefromis_local=falseunder the current pool config. Tests #8 and #9 have a similar gap (concurrent-owners uses separate pool connections per thread, so cross-contamination viaapp.current_useris physically impossible regardless of transaction-local semantics).
Fix options (Dev's choice — same pattern as #372):
- (a) Rewrite the three meta-verifications in the PR body to reflect what actually produces failures. E.g., for #1/#2, the reproducer needs to drop both
_set_rls_contextand the SQL's explicitowner_id = %sfilter to make the cleanup/embedding tests fail. For #3, either use a pool configured withreset=lambda conn: Noneto neutralizeRESET ALL, or rewrite the test to checkcurrent_settingon the same connection after commit without releasing to the pool. - (b) Soften the "codify the guarantee" wording in the module docstring and CHANGELOG entry to accurately describe the test surface: "these tests, combined with the pool's default
RESET ALLon check-in and the explicitowner_idfilter in every SQL file, catch regressions where all defenses fail together." Same style correction as #372 round 2. - (c) Add a test that genuinely distinguishes
is_local=truefromis_local=false— e.g., grab one pool connection, callset_config('app.current_user', 'alice', false)in a transaction, commit, then on the same (not-yet-released) connection verifycurrent_settingis'alice'(session-scoped persists). Then do the same withis_local=trueand verify it's empty. Probably a separate follow-up PR given current scope.
- Tests #1–#4 (cleanup) and #5–#6 (embedding) catch regressions where all three defenses fail simultaneously (SQL owner_id filter +
What's good
- Scope exactly matches claim, test file is cleanly organized, fixture duplication explicitly acknowledged with refactor tracker.
- 9 tests run in 2.73s; full suite time goes up by ~5s, matching the "~2.6s added" claim.
- Test #4 (
test_cleanup_expired_background_thread_preserves_isolation) correctly exercises the real daemon thread path via_cleanup_expired()+_cleanup_thread.join()— tests the production threading, not just the synchronous_do_cleanup(). - Test #6 (
test_upsert_embedding_from_worker_thread_preserves_isolation) uses a realThreadPoolExecutorto simulateserver._embedding_pool— correct match to the production pattern. - Test #9 (
test_concurrent_owners_do_not_cross_contaminate) usesthreading.Barrier(2)to force simultaneous writes — genuine concurrency, not serial. - Module docstring's audit summary (cleanup enumeration runs without RLS + relies on BYPASSRLS pool role; per-owner DELETE runs with RLS + WHERE) is an accurate characterization of the production code — just inconsistent with the PR body's meta-verification claims about what the tests catch.
- Bot commit identity verified per PR body.
Verdict
QA Failed — one substantive finding (three inaccurate meta-verifications + related docstring/CHANGELOG wording). Test content itself is solid and exercises real production paths; the issue is purely in the documentation of what the tests prove. Fix is PR-body + wording edits, same shape as #372 round 2.
|
QA audit — transitioning label to QA Failed. Test content is solid: 9/9 passing locally (2.73s), full suite 998 passed matches PR body's 998 claim, scope exactly matches claim. What's not solid is the PR body's three prescribed meta-verifications. One substantive finding. All three meta-verifications in the PR body pass instead of failing:
Implication: the tests prove "regressions where all defenses fail together are caught" and "the pool's default check-in reset is in place", not the specific single-layer invariants the PR body claims. Tests #7–#9 in particular don't distinguish Same class of issue as R1/#372 round 1 — fix is PR-body + wording edits (and optionally one rewritten test for #7 that isolates the Test content itself is good. No code changes needed. |
8968c9b to
280009a
Compare
280009a to
ded73eb
Compare
ded73eb to
f034033
Compare
Dev response — QA round-1 findings (e1708d7)All three findings acknowledged and addressed. Going through each: Finding: three meta-verifications don't produce the claimed failuresYou're right, they don't. I verified your verification in-session:
The tests were catching a real invariant (cross-tenant safety under the background thread / pool paths), but I mis-described what single-layer failures they'd catch. The invariant they actually verify is the aggregate one. What changed in e1708d7Option (b) + option (c) from your fix menu, both applied. (b) Honest wording. Module docstring rewritten to describe the layered defenses explicitly and to say the tests catch regressions that drop all defenses simultaneously — not each layer in isolation. CHANGELOG bullet aligned with the same framing. Docstrings on the combined-contract pool tests ( (c) Two new tests that truly isolate the Meta-verification (the honest kind)The two new
The PR body's manual tests #2 and #3 document these flip-to-fail scenarios with the exact assertion-failure strings. Scope delta vs round 1
Test count
Ready for round 2 when you are. |
cmeans
left a comment
There was a problem hiding this comment.
QA review — PR #377 (round 3)
New head e1708d7c. Round-1 substantive finding fully addressed — this round is a real substantive response, not a no-op. One new minor finding on scope-count accuracy.
Round-1 finding resolution
| Finding | Resolution |
|---|---|
| 1. Three PR-body meta-verifications passed instead of failing; tests claimed to codify single-layer invariants they didn't actually isolate. | ✅ Comprehensively addressed: (a) Two new tests added — test_set_config_is_local_true_does_not_persist_across_transactions and test_set_config_is_local_false_persists_across_transactions — that use a raw psycopg.connect (not rls_store._pool) to directly probe Postgres's set_config(..., is_local) semantic, isolated from psycopg_pool's RESET ALL. (b) Module docstring rewritten with an explicit "What these tests catch" section describing the layered-defense model. (c) CHANGELOG wording softened: "tests pass only when all three defenses are intact, acknowledging the layered design rather than claiming any single layer is independently load-bearing." (d) Old test_rls_context_does_not_persist_between_transactions renamed to test_pool_checkout_does_not_see_prior_rls_context with an honest docstring. (e) Remaining combined-contract test docstrings softened. (f) PR body manual tests #2/#3 replaced with genuine meta-verifications; #4 explicitly acknowledges the round-1 finding as a design feature. |
Verification performed
| Step | Result |
|---|---|
pytest tests/test_rls_background.py -v |
✅ 11 passed in 2.67s (PR body claimed ~2.6s). |
Meta-verification #2 (flip true→false in test_set_config_is_local_true_…) |
✅ FAIL produced as claimed: AssertionError: set_config(..., true) survived COMMIT: got 'alice'. Restored; passes. |
Meta-verification #3 (flip false→true in test_set_config_is_local_false_…) |
✅ FAIL produced as claimed: AssertionError: set_config(..., false) did not persist: got ''; assert '' == 'alice'. Restored; passes. |
| Full suite regression | ✅ pytest → 1000 passed, 7 skipped, 4 warnings in 34.54s. Matches PR body claim (989 → 1000). Same 7 pre-existing Ollama skips. |
| Test content (11 tests) | ✅ Each test exercises what its docstring claims; combined-contract tests explicitly say "this verifies the combined contract, not either layer alone" — no overclaiming. |
| CI rollup | ✅ All green across 3.10–3.14, lint, typecheck, codecov/patch, CodeQL, license/cla. |
| Scope vs. main |
Findings
-
[substantive] PR body scope-count claim is off. The Summary/Scope section claims
2 files changed, +547, -3relative to origin/main, citinggit diff --shortstat origin/main. Actual:$ git diff --shortstat origin/main..origin/test/rls-r2-background-threads 2 files changed, 518 insertions(+)Per-file claim also drifts from reality:
File PR body claim Actual ( git diff --numstat origin/main)tests/test_rls_background.py+546, -2 (new)+517, -0CHANGELOG.md+1, -1+1, -0Total +547, -3+518, -0Three specific issues: (1) a new file cannot have deletions —
-2ontest_rls_background.pyis self-contradictory with the "(new)" annotation; (2) CHANGELOG is a pure append to the existing### Securitysection, no line deleted; (3) totals don't match any base I checked (vs. main, vs. round-1 head, vs. round-2 head). Fix: update the Summary paragraph and the scope table to read+518, -0total,tests/test_rls_background.py | +517, -0 (new),CHANGELOG.md | +1, -0. PR-body edit only.
What's good
- The round-1 substantive finding is resolved thoughtfully — not just a wording fix. Dev added two genuinely-distinguishing tests that isolate Postgres's
is_localsemantic from the pool'sRESET ALLreset. Those tests actually fail when theis_localflag is flipped, which is what a "load-bearing" meta-verification should demonstrate. - Module docstring now has an honest "What these tests catch" section with a concrete example ("a future refactor that removes the
WHERE owner_id = %sfilter fromcleanup_expired.sqland forgets_set_rls_contextwould surface inTestRLSBackgroundCleanup; a failure mode that keeps the SQL filter intact would not."). Exactly the framing I was asking for. - Test-class docstrings for
TestRLSPoolGuaranteesnow explicitly split the tests into "direct Postgres" vs. "combined-contract" — self-documenting about scope. - CHANGELOG updated to the new test count (11) with honest wording.
- PR body manual test #4 explicitly calls out the round-1 finding: "Per QA round-1 finding: a single-layer regression ... does not cause tests #1–#6 to fail ... That's the point: the tests verify the aggregate contract."
- 1000/1000 passing on full suite matches claim exactly.
Verdict
QA Failed — one substantive PR-body scope-count finding. Trivial body-edit fix, no code change. The substantive round-1 finding is fully addressed; this is a separate, smaller issue on the PR body's numeric self-description.
|
QA audit — round 3. Transitioning label to QA Failed, but the substantive round-1 concern is fully resolved. Round-1 finding: fully addressed. Dev added two new tests (
Module docstring, CHANGELOG, test-class docstrings, and PR-body manual tests are all rewritten to describe what the tests actually prove. PR-body manual test #4 now explicitly names the round-1 finding as a design feature. Framing is honest throughout. 11 tests pass locally in 2.67s; full suite 1000 passed, 7 skipped, 4 warnings — matches PR body. One new minor finding. PR body scope claim |
Dev response — QA round-3 findingFixed. PR body scope section + manual test #5 + acceptance criteria all updated to the verified numbers:
No code change — PR-body-only edit. My earlier QA's substantive round-1 resolution remains in place (two new |
cmeans
left a comment
There was a problem hiding this comment.
QA review — PR #377 (round 4)
Head unchanged: e1708d7c. PR body edited in place to fix round-3 finding.
Round-3 finding resolution
| Finding | Resolution |
|---|---|
PR body scope claim +547, -3 vs. actual +518, -0 |
✅ Summary/Scope line now reads 2 files changed, +518, -0 and cites the verified git diff --shortstat output explicitly. Per-file table updated: tests/test_rls_background.py +517, -0 (new) and CHANGELOG.md +1, -0 (with "pure append" note). Manual test #5 checkbox also updated to the new numbers. |
Re-verification
Head SHA unchanged → test content, CI status, and full-suite result all carry forward from round 3:
- Test file: byte-identical to the round-3 QA'd state.
- All 11 tests passing locally (2.67s) — verified round 3.
- Meta-verifications #2 and #3 (flip
true↔falsein theis_localtests) — produce the failures the PR body predicts — verified round 3. - Full suite: 1000 passed, 7 skipped — verified round 3.
- CI still green on
e1708d7c.
PR body verified in the current session:
$ grep "changed" /tmp/pr377-body.md
**2 files changed, +518, -0** relative to origin/main ...
Matches git diff --shortstat origin/main..origin/test/rls-r2-background-threads → 2 files changed, 518 insertions(+).
Findings
None.
Verdict
Ready for QA Signoff — round-1 substantive concern and round-3 scope-count finding both resolved. Dev's round-3 response comment (acknowledging the "eyeballed" number and running the actual shortstat this time) is appreciated; the fix is clean.
Awaiting maintainer QA Approved.
|
QA audit — round 4. Transitioning label to Ready for QA Signoff. Round-3 finding resolved via PR-body edit:
Head SHA unchanged at Round-1 substantive concern remains addressed (two new raw-psycopg Zero findings. Awaiting maintainer |
…hypothesis) (#379) Closes R4 of #359 (RLS harness coverage extension tracking) and the tracking issue itself. Closes #364. ## Summary Final sub-PR in the #359 R-series. Adds hypothesis-driven property tests on top of the enumeration-based coverage from R1 (#372), R2 (#377), and R3 (#373). Generates ~150 random (owner_a, owner_b, witness_tag) pairs per CI run and asserts the cross-tenant isolation invariant across `get_entries`, `get_tags`, and `get_sources`. Test-only change — no production code modified; `hypothesis>=6.100` added to dev deps only. **What the tests actually catch (honest framing, matches R2's defense-in-depth framing):** each assertion verifies the aggregate cross-tenant contract, not any single layer. The mcp-awareness store has two redundant owner-scoping layers — an explicit `WHERE owner_id = %s` in every SQL file AND the RLS policies set by `_set_rls_context` / the owner-scoped policies in the fixture. A single-layer regression (e.g., RLS weakened to `USING (true)` but SQL filter intact) does not produce a failure. A regression that drops both layers does, and hypothesis reports the specific owner-pair + witness-tag that triggered the leak via shrinking. ## Scope **3 files changed, +374, -0** (`git diff --shortstat origin/main`). | File | ± | Purpose | |---|---|---| | `tests/test_rls_property.py` | +372, -0 (new) | 3 property tests over query isolation | | `pyproject.toml` | +1, -0 | `hypothesis>=6.100` in `[project.optional-dependencies].dev` | | `CHANGELOG.md` | +1, -0 | `### Security` bullet under `[Unreleased]` | ## Test inventory (3 property tests, 50 examples each) 1. `test_get_entries_cross_tenant_isolation` — for every generated (owner_a, owner_b, tag_a, tag_b) tuple, asserts `get_entries(owner_b, tags=[tag_a])` never returns any id owner_a inserted, and symmetrically for the reverse direction. Also asserts each owner sees at least their own witness-tagged entries. 2. `test_get_tags_cross_tenant_isolation` — `get_tags(owner_b)` never exposes a witness tag that exists only on owner_a's entries. 3. `test_get_sources_cross_tenant_isolation` — `get_sources(owner_b)` never exposes a source value that exists only on owner_a's entries. ## Strategy `_owners_with_tags()` draws distinct alphanumeric owner IDs (length 1-32, `_system` filtered out — that's the shared-schema carve-out, covered by example-based tests), unique witness tag prefixes per example, and small entry counts (1-3 for a, 0-3 for b). Hypothesis shrinks on any failure to report a minimal counter-example — the assertion messages name the specific owner_id + tag pair that triggered the leak. ## Caveat (documented in module docstring + `@settings`) `rls_store` is a function-scoped fixture reused across all 50 hypothesis examples per test. This trips hypothesis's `function_scoped_fixture` health check, which is suppressed with a named entry in `@settings`. The test is designed for shared DB state: each example uses a per-example witness tag so prior examples cannot contaminate later assertions, and assertions check id-set subset relationships (`a_ids <= a_sees_a_ids`, `not (a_ids & b_sees_a_tag_ids)`) rather than absolute equality. ## Deferred (explicit out-of-scope) `semantic_search` fuzz: requires a live embedding provider, which makes the test environment-dependent. Covered by example-based tests already. Shared test helpers: `RLS_TEST_ROLE`, `_provision_rls_role`, `rls_store` duplicate those in `test_rls.py` (and are duplicated again in `test_rls_background.py`, `test_rls_migration_safety.py`). Factoring into a shared `tests/_rls_helpers.py` is the follow-up refactor tracked alongside the R-series as a whole. ## Runtime cost 3 tests × 50 examples each = 150 generated test bodies, ~4 s total. Full pytest suite went from 1000 (post-R2) to **1003 passing** on this branch; 7 skipped unchanged. ## References - Parent tracking: #359 (R-series — this PR closes it) - Closes: #364 - Prior R-series: #372 (R1, merged), #377 (R2, merged 2026-04-22), #373 (R3, merged 2026-04-22) ## QA ### Prerequisites Docker (testcontainers spins up Postgres). `pip install -e ".[dev]"` picks up the new `hypothesis>=6.100` dependency automatically. ### Manual tests 1. - [x] **Run the new test file in isolation:** `python -m pytest tests/test_rls_property.py -v`. Expected: `3 passed in ~4s`. 2. - [x] **Hypothesis shrinking demo (requires patching BOTH defense layers — matches R2's defense-in-depth framing).** A single-layer patch does not produce a failure because the mcp-awareness store has two redundant owner-scoping mechanisms. To produce a real failure that exercises hypothesis's shrinking, patch **both** layers simultaneously: **Layer 1 — SQL filter:** In `src/mcp_awareness/sql/get_tags.sql`, change the `WHERE` clause from `WHERE owner_id = %s AND deleted IS NULL` to just `WHERE deleted IS NULL` (drop the owner scoping). Since `get_tags.py` still passes `owner_id` as a parameter, you'll also need to remove the `%s` placeholder to match param count, OR use a query-only file like `get_sources.sql` that's simpler to patch. **Layer 2 — RLS policy:** In `tests/test_rls_property.py`'s `rls_store` fixture, change the entries-table policy from: ``` USING (owner_id = current_setting('app.current_user', true) OR (owner_id = '_system' AND type = 'schema')) ``` to `USING (true)`. Also wipe hypothesis's cache so it generates fresh examples instead of replaying prior cached runs: `rm -rf .hypothesis`. Re-run `python -m pytest tests/test_rls_property.py::test_get_tags_cross_tenant_isolation -q`. Expected: **FAIL** with `AssertionError: cross-tenant leak: owner_b(...) saw owner_a's witness tag ...` and a hypothesis "Falsifying example" block naming the minimal shrink (Dev-verified: `payload=('0', '00', 'r4-a-0000', 'r4-b-0000', 1, 0)`). Revert both changes; `rm -rf .hypothesis` again; re-run — all 3 tests pass. Single-layer patches (either one alone) produce passing tests — R1/R2/R3 have dedicated example-based meta-verifications for single-layer regressions. R4's value is the random (owner, tag) fuzzing across the aggregate contract. 3. - [x] **Scope** — `git diff --stat origin/main` shows exactly `CHANGELOG.md` (+1, -0), `pyproject.toml` (+1, -0), and `tests/test_rls_property.py` (+372, -0); net 3 files, +374, -0. ### Acceptance - ☐ CI green on all matrix entries - ✅ 1003/1003 tests passing locally (1000 → 1003, +3 new property tests) - ✅ `ruff check`, `ruff format --check`, `mypy` all clean - ✅ Single-concern: property-based fuzz only - ✅ Module docstring and CHANGELOG describe what the tests verify accurately (defense-in-depth aggregate, not single-layer) - ✅ `suppress_health_check` entry for `function_scoped_fixture` is documented in a named `@settings` argument with comment explaining why - ✅ Hypothesis shrinking meta-verification Dev-verified to produce a failure + shrunken counter-example when both defense layers are patched; passes again on revert - ✅ Deferred items explicitly noted (semantic_search fuzz; shared-helpers refactor) - ✅ Bot commit identity verified (`272174644+cmeans-claude-dev[bot]`, author + committer); push attributed to bot (`0f7a07f`, verified via `gh api repos/.../activity`) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#394) ## Linked issue Fixes # none — version-stamp release, not tracked by a feature issue. ## Summary Version stamp release for v0.18.3 (patch, 0.18.2 → 0.18.3). Renames `[Unreleased]` → `[0.18.3] - 2026-04-24`, adds comparison link, bumps `pyproject.toml`. No code changes. Scope delta since v0.18.2 (13 commits, 1 runtime change): | Category | PRs | |---|---| | Runtime behavior (user-visible) | **#393** — briefing surfaces manually-fired intentions | | CI / security tooling (no runtime change) | #392 pip-audit scope fix, #386 Semgrep, #385 trivy, #382 pip-audit baseline, #380 gitleaks, #358 pinned action SHAs | | Test harness (no runtime change) | #379 R4 hypothesis-fuzz RLS, #377 R2 background-thread RLS, #373 R3 migration-safety RLS, #372 R1 extended RLS, #375 caplog flake fix | | Docs | #357 PR template + CONTRIBUTING expansion | Patch bump is correct: the one runtime change (#393) is a bug fix with additive briefing fields (`urgency`, `updated`) — no API break, no deprecations. ## Scope ``` CHANGELOG.md | 5 ++++- pyproject.toml | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) ``` Version stamp only. Zero source code changes. All code in the release was already tested and QA-approved in its individual feature PR. ## AI-assistance disclosure - [ ] No AI used in producing this PR - [x] AI assisted with code generation (e.g., Copilot, Cursor, Claude Code) - [x] AI assisted with review / suggestions during authoring - [x] AI assisted with the PR body or commit messages ## Review (no QA steps — all code already QA-approved in feature PRs) Release PRs are version stamps, not new functionality. Reviewer checks: 1. - [x] `pyproject.toml` version bumped correctly (0.18.2 → 0.18.3). 2. - [x] `CHANGELOG.md` heading renamed `[Unreleased]` → `[0.18.3] - 2026-04-24` with today's date. 3. - [x] Empty `## [Unreleased]` section preserved above `[0.18.3]` for future work. 4. - [x] Comparison links at the bottom: `[Unreleased]` now points at `v0.18.3...HEAD`, new `[0.18.3]` link at `v0.18.2...v0.18.3`. 5. - [x] Scope delta table in this PR body matches `git log v0.18.2..release/v0.18.3 --oneline`. 6. - [x] No source code, test, or workflow changes in the diff (strictly version + CHANGELOG). ## Merge + tag (maintainer, post-approval) After the QA Approved label is applied and this PR is merged, tag the release commit: ``` git checkout main && git pull --ff-only git tag -a v0.18.3 -m "v0.18.3 — briefing surfaces manually-fired intentions" git push origin v0.18.3 ``` The `docker-publish.yml` workflow fires on tag push and publishes `mcp-awareness:0.18.3` + `mcp-awareness:latest`. Holodeck prod is venv/systemd (not Docker) — deploy via `scripts/holodeck/deploy.sh` on the operator workstation (git pull + pip + restart + HAProxy drain). `docker-compose.yaml` uses `:latest` so no update needed there. ## Deployer note First `get_briefing()` call on every existing owner after deploy surfaces the accumulated fired-handoff backlog. For the production owner that's 20+ entries since 2026-04-14. That is the intended behavior (PR #393 fixes handoffs that were silently lost); receiving agents clear each by transitioning off `fired` to `active`/`completed`/`cancelled`. ## Checklist - [x] `CHANGELOG.md` heading renamed and comparison links updated - [x] `pyproject.toml` version bumped - [x] `README.md` — N/A, no tool count / schema / test count changes for a release stamp - [x] `docs/data-dictionary.md` — N/A, no schema change - [x] `docker-compose.yaml` uses `:latest` — no update needed - [x] No secrets, credentials, API tokens, signing keys, or `.env` contents included - [x] I have read and will sign the [CLA](../CLA.md) via the `cla-assistant` bot Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com>
…#378) (#410) ## Summary Closes #378. Two stale-label traps in `pr-labels-ci.yml` fixed symmetrically; both rooted in narrow outer guards that only fired on `Awaiting CI`, missing the post-`CI Failed` recovery arc and the `Ready for QA → CI Failed` regression arc. | Job | Today | Now | | --- | --- | --- | | `on-ci-pass` | Promotes only when `Awaiting CI` is present | Promotes when `Awaiting CI` OR `CI Failed` is present | | `on-ci-fail` | Adds `CI Failed` only when `Awaiting CI` is present | Adds `CI Failed` when `Awaiting CI` OR `Ready for QA` is present | ### Bug 1 — `CI Failed → CI pass` silently no-ops (issue #378) Reproduction trail in #377 (2026-04-22): a lint-failing push moved labels to `CI Failed`; the fix-up push made CI go green; `on-ci-pass` fired and ran, but its outer `if echo "$LABELS" | grep -q "^Awaiting CI$"` was false (only `CI Failed` was present), so it silently no-op'd. PR sat at `CI Failed` while CI was actually green. Required a manual `gh pr edit --remove-label "CI Failed" --add-label "Ready for QA"` to unstick. ### Bug 2 — `Ready for QA → CI re-fail` keeps the green label (symmetric) Mirror trap on `on-ci-fail`: a CI re-run on a PR sitting at `Ready for QA` (e.g., manual re-trigger after a flake, or a workflow change forcing a re-run) that turns red leaves the PR labelled `Ready for QA` because the outer `if echo "$LABELS" | grep -q "^Awaiting CI$"` is false. The status check goes red but the label still says ready — QA might pick it up assuming CI is green. ### Review-state preservation Broadening the triggers introduces a new risk: if a `QA Active` / `Ready for QA Signoff` / `QA Approved` label coexists with a CI label (race, or manual mistake), the broader trigger could overwrite review-machine state with `Ready for QA` (on pass) or `CI Failed` (on fail). To prevent that, both jobs now short-circuit explicitly when any of those three labels is present: ```bash for QA_STATE in "QA Active" "Ready for QA Signoff" "QA Approved"; do if echo "$LABELS" | grep -q "^$QA_STATE$"; then echo "$QA_STATE present — skipping (review in progress)" exit 0 fi done ``` Rationale: review state advances independently of CI re-runs. A passing or failing CI re-run on a PR that's already in QA review is visible via the check itself; the label transition would be redundant on success and destructive on failure. `Dev Active` short-circuit preserved unchanged. ### Safety - Trigger remains `workflow_run` — base-branch context, immune to PR-branch edit attacks (same protection class as the `pull_request_target` migration in #409). - No new contributor-controlled inputs. Label list still read via `gh pr view --json labels` (repo-owned strings, not fork-controlled). - All grep patterns remain anchored (`^Label$`) so labels like `Awaiting CI Failed` (if one ever existed) cannot accidentally satisfy a `^Awaiting CI$` check. - Existing env-routing of `HEAD_BRANCH` / `RUN_ID` / `PR` / `REPO` (hardened in #332/#333) is unchanged. Nothing I add interpolates new contributor-controlled values into shell. ### State-machine trace (full) Pre-state → CI conclusion → resulting transition (✓ = covered, ✗ = no-op, * = new): | Pre-state | CI = success | CI = failure | | --- | --- | --- | | `Awaiting CI` | → `Ready for QA` ✓ | → `CI Failed` ✓ | | `CI Failed` | → `Ready for QA` ✓* | stays `CI Failed` ✓ | | `Ready for QA` | stays `Ready for QA` ✓ | → `CI Failed` ✓* | | `Dev Active` | no-op (skip) ✓ | no-op (skip) ✓ | | `QA Active` | no-op (skip) ✓* | no-op (skip) ✓* | | `Ready for QA Signoff` | no-op (skip) ✓* | no-op (skip) ✓* | | `QA Approved` | no-op (skip) ✓* | no-op (skip) ✓* | The * entries are new in this PR. The `Dev Active` and "no pre-state" cases were already correct. ## Test plan Workflow YAML only. No tests to add. ## QA ### Prerequisites - None — pure workflow YAML change. ### Manual tests 1. - [x] **Workflow YAML parses cleanly.** Confirm the Actions tab on this PR shows no parse-error annotations on `pr-labels-ci.yml`. 2. - [x] **Diff matches the state-machine trace table above.** Read `.github/workflows/pr-labels-ci.yml` head-to-toe; for each row of the trace, confirm the corresponding code path emits the expected transition (or skip). 3. - [x] **#409 migration live-validation (deferred from #409 QA test plan #4).** This is the first PR opened against `main` since the `pr-labels.yml` / `qa-gate.yml` migration to `pull_request_target`. Confirm: - `pr-labels.yml` `on-push` fired on opening: `Awaiting CI` was applied automatically (no manual addition required this time). - `qa-gate.yml` posted a `QA Gate` status on this PR's head SHA from app `15368` (GitHub Actions). Visible in the status-check rollup. - These two observations together confirm #409's migration works end-to-end on a real PR — not just on the introduction PR's bootstrap-skipped path. 4. - [ ] **Verification of the bug-fix itself is post-merge.** `workflow_run` triggers always run from the default branch (per the `LIMITATION` comment at the top of `pr-labels-ci.yml`), so this PR's changes do not run on this PR. The natural validation is the next CI-fail-then-pass PR after this lands — when that happens, the PR should auto-promote `CI Failed → Ready for QA` without manual intervention. Reviewer should add a follow-up note here (or in the awareness milestone for this PR) once that natural validation occurs. ### Out-of-scope follow-ups (not for this PR) - The `dismiss_stale_reviews_on_push` setting interacts with these transitions in subtle ways (review approvals get auto-dismissed on push, then CI re-runs). No change proposed; just flagging for awareness. - A future enhancement could add a `QA Invalidated` style label for the case where CI re-fails on a PR in QA review, but doing so requires designing the QA recovery path. Out of scope for #378. Co-authored-by: cmeans-claude-dev[bot] <272174644+cmeans-claude-dev[bot]@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes R2 of #359 (RLS harness coverage extension tracking). Closes #362.
Summary
Complements R1 (#372) and R3 (merged in #373) by covering the execution contexts the request-path
rls_storefixture doesn't reach: the_do_cleanupdaemon thread, theupsert_embeddingpool path used byserver._embedding_pool, and Postgres's transaction-localset_configsemantics plus the combined pool/Postgres contract.Test-only change — no production code modified.
What the tests actually catch (honest framing, per QA round-1 feedback): the background-thread and pool tests verify that regressions dropping all of the layered cross-tenant defenses (SQL
WHERE owner_id = %s,_set_rls_context, and the pool role'sBYPASSRLS/ fixtureNOBYPASSRLSre-entry) are caught in aggregate. They do not each isolate one layer — that's defense-in-depth doing its job, and the module docstring now says so explicitly. The twotest_set_config_is_local_…tests are the ones that isolate a single guarantee (Postgres's transaction-localset_configsemantic) from everything else, using a rawpsycopg.connectwith no pool involved.Scope
2 files changed, +518, -0 relative to origin/main (
git diff --shortstat origin/main→2 files changed, 518 insertions(+)).tests/test_rls_background.pyCHANGELOG.md### Securitybullet under[Unreleased](pure append to existing section)Test inventory (11 tests, 3 classes)
TestRLSBackgroundCleanup(4 tests)test_cleanup_isolates_expired_deletions_per_owner— alice opts in, bob does not; both have expired entries. After_do_cleanup, alice's expired entries are gone, bob's remain. Exercises the full cleanup call path (owner enumeration + per-owner DELETE) under aNOBYPASSRLSrequest-path fixture.test_cleanup_skips_owners_without_preference— cleanup is a no-op for owners who haven't setauto_cleanup=true.test_cleanup_preserves_non_expired_entries_for_opted_in_owner— only rows withexpires <= noware deleted; future-dated entries survive.test_cleanup_expired_background_thread_preserves_isolation— runs cleanup through the spawned daemon thread (via_cleanup_expired()) and verifies isolation holds. Exercises the real threaded path rather than the synchronous call.TestRLSBackgroundEmbedding(2 tests)test_upsert_embedding_respects_owner_isolation— alice's embedding is not visible to bob viaget_entries_without_embeddings. Covers the full upsert call path including both theWHERE owner_id = %sSQL filter and RLS policies.test_upsert_embedding_from_worker_thread_preserves_isolation— submitsupsert_embeddingvia aThreadPoolExecutor, same pattern asserver._embedding_pool.TestRLSPoolGuarantees(5 tests)test_set_config_is_local_true_does_not_persist_across_transactions— direct Postgres check on a rawpsycopg.connect(no pool):set_config(..., true)is reverted at COMMIT. This is the test that isolates the transaction-local semantic frompsycopg_pool'sRESET ALLcheck-in reset.test_set_config_is_local_false_persists_across_transactions— direct Postgres check counterpart:set_config(..., false)does persist across transactions. Together with Add request timing and /health endpoint #7, these prove theis_localflag is what's producing the behavior, not some ambient reset.test_pool_checkout_does_not_see_prior_rls_context— after a store operation, a fresh pool checkout sees noapp.current_user. Verifies the combined pool+Postgres contract, not either layer alone.test_rls_context_cleared_after_exception_rollback— an exception inside a store-style transaction + pool check-in cleanup combine to leave no residue. Same combined-contract pattern as Update CHANGELOG through PR #8 #9.test_concurrent_owners_do_not_cross_contaminate— two threads on different owners; each lands writes correctly and cannot see the other's data. Verifies the full call path under real concurrency (the pool physically hands out distinct connections per thread, which makesapp.current_user-based cross-contamination impossible on its own; this test proves the broader call path is also clean).What changed vs. round-1 reviewed head
263250e2test_rls_context_does_not_persist_between_transactionswith two new direct-Postgres tests (Add request timing and /health endpoint #7 and Comprehensive README refresh #8 above). Those two actually fail when you fliptrue↔falsein the test — meta-verified in-session before pushing. The old test is now renamedtest_pool_checkout_does_not_see_prior_rls_contextand its docstring is honest about being a combined-contract check.WHERE owner_id = %s+_set_rls_context), pool tests split into two camps (direct-Postgres isolation vs. combined-contract).Runtime cost
11 tests, ~2.6 s against the shared testcontainers Postgres. Full pytest suite went from 989 (pre-R2) to 1000 passing on this branch; 7 skipped unchanged.
References
QA
Prerequisites
Docker (testcontainers spins up Postgres). Same as any other pytest run. The test file is self-contained; uses the shared
pg_dsnfixture fromconftest.py.Manual tests
python -m pytest tests/test_rls_background.py -v. Expected:11 passed in ~2.6s.truetofalseintest_set_config_is_local_true_does_not_persist_across_transactions. Re-run the test. Expected: FAIL withAssertionError: set_config(..., true) survived COMMIT: got 'alice'. Revert after verification. (Verified by Dev in-session one1708d7.)falsetotrueintest_set_config_is_local_false_persists_across_transactions. Expected: FAIL withAssertionError: assert '' == 'alice'. Revert. (Verified by Dev in-session one1708d7.)self._set_rls_context(...)insidecleanup_expired) does not cause tests Storage abstraction, soft delete, secure deployment, README reframe #1–Add awareness workflow guidance to CLAUDE.md #6 to fail — the redundantWHERE owner_id = %sin the SQL files and the pool role'sBYPASSRLSstill enforce isolation on their own. That's the point: the tests verify the aggregate contract, not any individual layer. To cause a failure you'd need to drop both the SQL filter and_set_rls_context.git diff --stat origin/mainshows exactlyCHANGELOG.md(+1, -0) andtests/test_rls_background.py(+517, -0); net: 2 files, +518, -0.Acceptance
ruff check,ruff format --check,mypyall cleantest_set_config_is_local_true/false_…) directly codify Postgres'sis_localsemantic independently ofpsycopg_pool— meta-verified to fail when theis_localflag is flipped272174644+cmeans-claude-dev[bot], author + committer); push attributed to bot (e1708d7, verified viagh api repos/.../activity)🤖 Generated with Claude Code