From a305833b4a3df6db72e949310f01931df36366c5 Mon Sep 17 00:00:00 2001 From: WulfForge Date: Wed, 6 May 2026 18:50:04 -0400 Subject: [PATCH 1/3] docs(plan): plan-A bundle ingest-middleware followups (#230 + #232 + #233) Plan for the three devil's-advocate followups from PR #229's review pass. All three touch handlers/ingest.py middleware + adjacent surfaces; bundling into one PR keeps the test-isolation hoist (#233) atomic with the boundary-detail scrub (#230) and the truthy+malformed_payload widenings (#232). Audit: round 1 PASS (L1, mechanical follow-ups; positive A04 contribution from the malformed_payload translation closing a fail-open path). Substrate observations folded into implementation per audit guidance: - Use private cross-module import `from context import _GUIDED_MODE_TRUTHY` with explicit comment (avoid widening scope to public alias). - Hoist BOTH `_reset_rate_limit_registry` AND `_ensure_rate_limit_enabled` to conftest.py (the second autouse fixture has identical future-flake risk). Co-Authored-By: Claude Opus 4.7 (1M context) --- plan-A-ingest-followups-230-232-233.md | 152 +++++++++++++++++++++++++ 1 file changed, 152 insertions(+) create mode 100644 plan-A-ingest-followups-230-232-233.md diff --git a/plan-A-ingest-followups-230-232-233.md b/plan-A-ingest-followups-230-232-233.md new file mode 100644 index 00000000..178ad953 --- /dev/null +++ b/plan-A-ingest-followups-230-232-233.md @@ -0,0 +1,152 @@ +# Plan A: Ingest middleware devil's-advocate followups (#230 + #232 + #233) + +**change_class**: feature + +**doc_tier**: minimal + +**high_risk_target**: false + +**terms_introduced**: +- term: malformed_payload + home: handlers/ingest.py (new `_IngestRefused.reason` literal) + server.py `_ACTION_FOR_REFUSAL_REASON` entry + +**boundaries**: +- limitations: + - Aggregate-rate worst case (~86 GiB/day on default config) is documented but not enforced — sliding-window aggregate enforcement is deferred to the team-server-activation track. + - `BICAMERAL_INGEST_*_DISABLE` truthy vocabulary unification covers the rate-limit env var only in this PR; broader `BICAMERAL_*` env-read audit is logged as a substrate observation but not wired (other env vars use distinct semantics like `!= "0"` that need per-site review). +- non_goals: + - Rewriting the bucket implementation (stays token-bucket per #229's design dialogue choice). + - Adding cross-session aggregate enforcement. + - Telemetry-side privacy review of OTHER `_IngestRefused.detail` strings (canary, sensitive-data, size — those don't leak session UUIDs and are out of scope here). +- exclusions: + - #209 regex refinement (separate plan). + - #199 install repro (deferred, /qor-debug surface). + +## Open Questions + +None at plan time. Three issues are well-scoped with explicit acceptance criteria; this plan bundles them because they all touch `handlers/ingest.py` middleware + adjacent surfaces. + +## Phase 1: Source-fix all three findings + doc updates + +### Affected Files + +- `tests/test_ingest_rate_limit.py` — update `test_check_rate_limit_raises_when_session_bucket_empty` assertion to verify scrubbed-detail shape; new test `test_check_rate_limit_disabled_via_truthy_variants` covering `=true`/`=yes`/`=on`; remove the file-local `_reset_rate_limit_registry` fixture (hoisted to conftest) +- `tests/test_ingest_size_limit.py` — new test `test_check_payload_size_handles_unserializable_payload` (circular-ref dict raises `_IngestRefused('malformed_payload')` not `ValueError`) +- `tests/conftest.py` — host the autouse `_reset_rate_limit_registry` fixture so any ingest-touching test in any file gets registry-clear behavior automatically (#233 fix) +- `handlers/ingest.py` — + - `_check_rate_limit`: replace `f"session {session_id} bucket empty (...)"` with `f"bucket empty (burst=N, refill=R/s)"` (no UUID in detail) — #230 Finding 1 fix + - `_check_rate_limit`: replace `os.getenv("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", "").strip() == "1"` with `os.getenv("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", "").strip().lower() in _GUIDED_MODE_TRUTHY` — #232 Finding 1 fix (import `_GUIDED_MODE_TRUTHY` from `context`) + - `_check_payload_size`: wrap `json.dumps` in `try/except (ValueError, TypeError, RecursionError)` and translate to `_IngestRefused("malformed_payload", detail="payload is not JSON-serializable")` — #232 Finding 2 fix + - Module-level docstring: add a "Limitations: aggregate-rate worst case" subsection per #230 Finding 2 (worst-case math, runaway-agent scenario, cross-link to team-server-activation track for the stricter-bound future work) +- `server.py` — `_ACTION_FOR_REFUSAL_REASON` adds `"malformed_payload"` → operator-actionable guidance pointing at the MCP request shape (#232 Finding 2 surface) +- `docs/research-brief-compliance-audit-2026-05-06.md` — § 2.4 LLM-08 walk gets a "Limitations" subsection documenting the ~86 GiB/day worst case (#230 Finding 2 doctrine update) +- `plan-216-ingest-size-and-rate-limit.md` — boundaries section gets a new bullet under `limitations` documenting the aggregate-rate worst case (#230 Finding 2 plan amendment for traceability) + +### Changes + +#### #230 Finding 1 — scrub session_id from `_IngestRefused.detail` + +Source-fix in `handlers/ingest.py:_check_rate_limit`. Replace the f-string body: + +```python +# Before: +detail=( + f"session {session_id} bucket empty (burst={burst}, refill={refill_per_sec}/s)" +), + +# After: +detail=f"bucket empty (burst={burst}, refill={refill_per_sec}/s)", +``` + +Drops the `session {uuid}` segment entirely. Operators get the bucket params (which they need to tune `.bicameral/config.yaml`); the session UUID is not action-relevant at the MCP boundary. + +`server.py:_ACTION_FOR_REFUSAL_REASON["rate_limit_exceeded"]` already names the env var bypass — no change needed there. + +#### #230 Finding 2 — document aggregate-rate worst case + +Doctrine + plan + handler-docstring cross-references. No code change. + +Worst-case math (default config burst=10, refill=1/s, size cap=1 MiB): +- 60-second window: 10 (burst) + 60 (refill) = 70 ingests × 1 MiB = 70 MiB +- Sustained: 1 MiB/s = ~86 GiB/day + +Runaway agent scenarios that hit this: model regression producing infinite-loop tool calls, prompt-injection-hijacked agent in a re-ingest cycle, dev-time infinite-loop bug. Not a security crisis (size cap bounds per-payload damage), but a real operator-side disaster (ledger writer churn + disk pressure). + +Each of the three target docs gets the math + scenario list + a forward-pointer to the team-server-activation track (which gates by sliding-window aggregate, not single-session token-bucket). + +#### #232 Finding 1 — `_GUIDED_MODE_TRUTHY` vocabulary in rate-limit disable + +Import `_GUIDED_MODE_TRUTHY` from `context` at the top of `handlers/ingest.py`. Replace: + +```python +# Before: +if os.getenv("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", "").strip() == "1": + return + +# After: +if os.getenv("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", "").strip().lower() in _GUIDED_MODE_TRUTHY: + return +``` + +`_GUIDED_MODE_TRUTHY = frozenset({"1", "true", "yes", "on"})` — already canonical at `context.py:14`. After this change, operators setting `=true`, `=yes`, `=on`, or `=1` all disable consistently. + +The action-string at `server.py:_ACTION_FOR_REFUSAL_REASON["rate_limit_exceeded"]` mentions `=1` specifically; that's still valid (the canonical example) and doesn't need updating — the wider vocabulary is operator-discoverable via the truthy frozenset, not a documented contract change. + +#### #232 Finding 2 — circular-ref edge case in `_check_payload_size` + +Wrap the `json.dumps` call: + +```python +def _check_payload_size(payload: dict, max_bytes: int) -> None: + """...""" + try: + size = len(json.dumps(payload, default=str).encode("utf-8")) + except (ValueError, TypeError, RecursionError) as exc: + raise _IngestRefused( + "malformed_payload", + detail=f"payload is not JSON-serializable: {type(exc).__name__}", + ) from exc + if size > max_bytes: + raise _IngestRefused( + "size_limit_exceeded", + detail=f"{size} bytes > {max_bytes} cap", + ) +``` + +Translates the unhandled-exception fail-open path to a structured refusal at the same MCP boundary as the other gates. The catch list is the documented set of exceptions `json.dumps` can raise; explicit `RecursionError` covers circular-reference shapes (which raise `ValueError` on most Pythons but `RecursionError` under deep nesting). + +`server.py:_ACTION_FOR_REFUSAL_REASON` adds: + +```python +"malformed_payload": ( + "the payload could not be JSON-serialized. Verify the request body " + "is a plain dict of JSON-compatible primitives — no circular refs, " + "no opaque objects, no non-serializable types." +), +``` + +#### #233 — hoist `_reset_rate_limit_registry` to `conftest.py` + +Move the autouse fixture from `tests/test_ingest_rate_limit.py:29-34` to `tests/conftest.py`. Same body, same scope (function-level autouse). After hoist, any test file that imports `handlers.ingest` (directly or transitively) gets registry-clear behavior automatically. + +Removes the implicit dependency in `test_handle_ingest_size_check_runs_before_rate_check` and prevents future test-isolation flakes from registry leakage. + +### Unit Tests + +- `tests/test_ingest_rate_limit.py::test_check_rate_limit_raises_when_session_bucket_empty` — UPDATED: invokes `_check_rate_limit` after exhausting the bucket; asserts `exc.detail == "bucket empty (burst=1, refill=0.01/s)"` (no UUID present); negative-substring check `"session " not in exc.detail` confirms the scrub. + +- `tests/test_ingest_rate_limit.py::test_check_rate_limit_disabled_via_truthy_variants` — NEW: parametrized over `["1", "true", "yes", "on", "TRUE", "Yes"]`; for each, sets the env var via `monkeypatch.setenv`, invokes `_check_rate_limit("sid", burst=0, refill_per_sec=0.0)`, asserts no exception (the `burst=0, refill=0` config would normally refuse immediately; the env-var bypass short-circuits before bucket allocation, so success proves the truthy-set vocabulary works). + +- `tests/test_ingest_size_limit.py::test_check_payload_size_handles_unserializable_payload` — NEW: constructs a circular dict (`d = {}; d["self"] = d`); invokes `_check_payload_size(d, max_bytes=1024)`; asserts `_IngestRefused` raised with `reason == "malformed_payload"` and detail mentions the exception type. Companion test with a `RecursionError`-raising mock (deep-nested object) asserts the same translation path. + +- `tests/conftest.py` — NEW autouse fixture `_reset_rate_limit_registry`: clears `handlers.ingest._RATE_LIMIT_REGISTRY` before and after each test. Same body as the file-local version being removed. + +- `tests/test_ingest_rate_limit.py::test_handle_ingest_size_check_runs_before_rate_check` — UNCHANGED: still asserts `"sid-order" not in _RATE_LIMIT_REGISTRY` to prove the rate-check never ran (size-check refused first). Now relies on the conftest fixture instead of the file-local one — same contract, just hoisted. + +## CI Commands + +- `python -m pytest tests/test_ingest_rate_limit.py tests/test_ingest_size_limit.py -v` — runs the new + updated tests +- `python -m pytest -v` — full regression (138+ ingest tests + the broader suite — the conftest hoist must not break any test outside ingest) +- `ruff check .` — lint gate +- `ruff format --check .` — format gate +- `python -c "from handlers.ingest import _check_rate_limit, _GUIDED_MODE_TRUTHY; print('imports ok')"` — smoke test that the new `_GUIDED_MODE_TRUTHY` import resolves From a95acffa1b12d2e2d4d9cc7d898cb21de250017e Mon Sep 17 00:00:00 2001 From: WulfForge Date: Wed, 6 May 2026 18:50:24 -0400 Subject: [PATCH 2/3] fix(ingest): scrub session_id from boundary detail + widen truthy + close malformed_payload fail-open + hoist fixtures (#230 #232 #233) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit #230 Finding 1 — scrub session_id from `_IngestRefused('rate_limit_exceeded').detail`: - handlers/ingest.py:_check_rate_limit drops `session {uuid}` segment from the refusal-detail; emits `bucket empty (burst=N, refill=R/s)` only. - The raw session UUID is process-fingerprinting state that surrounding telemetry writers hash via per-install salt; emitting it raw at the MCP boundary (which the agent then relays into operator-visible context) is inconsistent with that posture. #232 Finding 1 — widen truthy vocabulary on rate-limit env-var disable: - handlers/ingest.py imports `_GUIDED_MODE_TRUTHY` from context (private cross-module use; explicit comment in source; the alternative — public alias or hoist — would widen scope unnecessarily). - `BICAMERAL_INGEST_RATE_LIMIT_DISABLE` now accepts 1/true/yes/on case-insensitively, matching the canonical truthy frozenset. #232 Finding 2 — translate non-JSON-serializable payloads to `malformed_payload`: - handlers/ingest.py:_check_payload_size wraps `json.dumps(...)` in try/except `(ValueError, TypeError, RecursionError)`, raises `_IngestRefused('malformed_payload', ...)` with the underlying exception type in detail. Closes a pre-existing OWASP A04 fail-open path: a circular-ref or non-serializable payload would previously leak ValueError past the gate to the MCP boundary's generic exception path. - server.py:_ACTION_FOR_REFUSAL_REASON adds operator-actionable guidance for `malformed_payload`. #233 — hoist autouse fixtures to conftest.py: - `_reset_rate_limit_registry` and `_ensure_rate_limit_enabled` (companion fixture flagged by audit substrate observation 2) move from tests/test_ingest_rate_limit.py to tests/conftest.py. Eliminates implicit-fixture-dependency flake risk for any future ingest-touching test in any file. Tests: - 7 new functional tests across the two findings (parametrized truthy variants, circular-ref, RecursionError, TypeError, scrubbed-detail shape lock). - Existing assertions updated to verify scrubbed detail (no UUID leak). - 113/113 ingest+server tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) --- handlers/ingest.py | 53 +++++++++++++++++++++++++++++---- server.py | 5 ++++ tests/conftest.py | 36 ++++++++++++++++++++++ tests/test_ingest_rate_limit.py | 37 ++++++++++++----------- tests/test_ingest_size_limit.py | 51 +++++++++++++++++++++++++++++++ 5 files changed, 158 insertions(+), 24 deletions(-) diff --git a/handlers/ingest.py b/handlers/ingest.py index 36bb9708..a90ef9c5 100644 --- a/handlers/ingest.py +++ b/handlers/ingest.py @@ -2,6 +2,19 @@ Thin orchestration: validate payload, resolve symbols, ingest into ledger, then sync. Auto-grounding removed in caller-LLM binding flow (v0.5.1+). + +Limitations — aggregate-rate worst case (#230 Finding 2): + The token-bucket rate gate slows BURST consumption per session but does + not bound aggregate throughput across time. Default config (burst=10, + refill=1/s, size cap=1 MiB) admits ~70 ingests in any 60-second window + and 1 MiB/s sustained, which works out to ~86 GiB/day in the worst case + (runaway agent loop, model regression producing infinite tool calls, + prompt-injection-hijacked re-ingest cycle, dev-time infinite-loop bug). + Not a security crisis — the size cap bounds per-payload damage and the + in-process registry is reset on server restart — but it IS an operator- + side disaster (ledger writer churn + disk pressure). Stricter aggregate + enforcement (sliding-window cross-session bound) is deferred to the + team-server-activation track. """ from __future__ import annotations @@ -14,6 +27,13 @@ from datetime import UTC import preflight_telemetry + +# #232 Finding 1: cross-module use of context.py's private truthy frozenset +# is intentional — it's the canonical vocabulary for BICAMERAL_* env-var +# toggles (1/true/yes/on, case-insensitive). Renaming to a public alias is +# out of scope here; #232 acceptance only requires vocabulary parity across +# the existing toggle reads. +from context import _GUIDED_MODE_TRUTHY from contracts import ( BriefEnvelope, BriefGap, @@ -62,8 +82,21 @@ def _check_payload_size(payload: dict, max_bytes: int) -> None: every field the agent might supply, language-agnostic, single comparison. Pure: no telemetry side-effect; the wrapping try/except in ``handle_ingest`` records the refusal event before re-raising. + + #232 Finding 2: a payload that is not JSON-serializable (circular ref, + deeply nested object, opaque type whose ``__str__`` raises) would + previously leak ``ValueError`` / ``TypeError`` / ``RecursionError`` + past the gate to the MCP boundary's generic exception handler. + Translate to ``_IngestRefused('malformed_payload', ...)`` at the same + boundary as the other refusals — closes the fail-open path. """ - size = len(json.dumps(payload, default=str).encode("utf-8")) + try: + size = len(json.dumps(payload, default=str).encode("utf-8")) + except (ValueError, TypeError, RecursionError) as exc: + raise _IngestRefused( + "malformed_payload", + detail=f"payload is not JSON-serializable: {type(exc).__name__}", + ) from exc if size > max_bytes: raise _IngestRefused( "size_limit_exceeded", @@ -109,9 +142,19 @@ def take(self) -> bool: def _check_rate_limit(session_id: str, burst: int, refill_per_sec: float) -> None: """Raise ``_IngestRefused('rate_limit_exceeded', ...)`` when the bucket for ``session_id`` has no tokens. Disabled entirely by setting - ``BICAMERAL_INGEST_RATE_LIMIT_DISABLE=1`` (local debugging knob). + ``BICAMERAL_INGEST_RATE_LIMIT_DISABLE`` to a truthy value (1/true/yes/on, + case-insensitive — see ``context._GUIDED_MODE_TRUTHY``). + + #230 Finding 1: the refusal detail does NOT include ``session_id``. The + raw session UUID is process-fingerprinting state that surrounding + telemetry writers hash via per-install salt; emitting it raw at the + MCP boundary (which the agent then relays into operator-visible context) + is inconsistent with that posture. Operators get the bucket params + they need to tune ``.bicameral/config.yaml``; the session UUID is not + action-relevant here. """ - if os.getenv("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", "").strip() == "1": + env_val = os.getenv("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", "").strip().lower() + if env_val in _GUIDED_MODE_TRUTHY: return with _RATE_LIMIT_REGISTRY_LOCK: bucket = _RATE_LIMIT_REGISTRY.get(session_id) @@ -121,9 +164,7 @@ def _check_rate_limit(session_id: str, burst: int, refill_per_sec: float) -> Non if not bucket.take(): raise _IngestRefused( "rate_limit_exceeded", - detail=( - f"session {session_id} bucket empty (burst={burst}, refill={refill_per_sec}/s)" - ), + detail=f"bucket empty (burst={burst}, refill={refill_per_sec}/s)", ) diff --git a/server.py b/server.py index d1290098..4795020a 100644 --- a/server.py +++ b/server.py @@ -104,6 +104,11 @@ "an obvious order_id context), set BICAMERAL_INGEST_SECRET_DISABLE=1 " "for the controlled test." ), + "malformed_payload": ( + "the payload could not be JSON-serialized. Verify the request body " + "is a plain dict of JSON-compatible primitives — no circular refs, " + "no opaque objects, no non-serializable types. Re-serialize and retry." + ), } diff --git a/tests/conftest.py b/tests/conftest.py index 4042b11f..d77a295b 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -50,6 +50,42 @@ def pytest_configure(config): ) +@pytest.fixture(autouse=True) +def _reset_rate_limit_registry(): + """Clear the in-process rate-limit bucket registry between every + test (#233 hoist). + + Hoisted from ``tests/test_ingest_rate_limit.py`` so any test in the + suite that imports ``handlers.ingest`` gets registry-clear behavior + automatically. Eliminates the implicit fixture dependency in + ``test_handle_ingest_size_check_runs_before_rate_check`` and + prevents future test-isolation flakes from registry leakage across + files. Registry is in-process module state on + ``handlers.ingest._RATE_LIMIT_REGISTRY``. + """ + from handlers.ingest import _RATE_LIMIT_REGISTRY + + _RATE_LIMIT_REGISTRY.clear() + yield + _RATE_LIMIT_REGISTRY.clear() + + +@pytest.fixture(autouse=True) +def _ensure_rate_limit_enabled(monkeypatch): + """Default to rate-limit enabled across every test (#233 hoist + companion). + + Hoisted from ``tests/test_ingest_rate_limit.py`` alongside the + registry-clear fixture (per audit substrate observation 2). Without + this, a test in any file that sets ``BICAMERAL_INGEST_RATE_LIMIT_DISABLE`` + via the shell environment could leak the disable into a subsequent + ingest-touching test in another file. Tests that intentionally + exercise the env-disable path still work — pytest's ``monkeypatch.setenv`` + overrides this delete on a per-test basis. + """ + monkeypatch.delenv("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", raising=False) + + @pytest.fixture(autouse=True) def _default_authoritative_ref_to_current_branch(monkeypatch): """v0.4.6 pollution guard default: treat whatever branch the test diff --git a/tests/test_ingest_rate_limit.py b/tests/test_ingest_rate_limit.py index ce9063ee..0b2e08ae 100644 --- a/tests/test_ingest_rate_limit.py +++ b/tests/test_ingest_rate_limit.py @@ -26,21 +26,6 @@ ) -@pytest.fixture(autouse=True) -def _reset_rate_limit_registry(): - """Tests share module state; clear it between cases.""" - _RATE_LIMIT_REGISTRY.clear() - yield - _RATE_LIMIT_REGISTRY.clear() - - -@pytest.fixture(autouse=True) -def _ensure_rate_limit_enabled(monkeypatch): - """Default to enabled so individual tests don't have to fight the - debug-disable env var if it leaks in from the shell.""" - monkeypatch.delenv("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", raising=False) - - def _ctx_with_rate_limit( burst: int = 10, refill: float = 1.0, @@ -122,7 +107,22 @@ def test_check_rate_limit_raises_when_session_bucket_empty() -> None: with pytest.raises(_IngestRefused) as exc_info: _check_rate_limit("sid-empty", burst=1, refill_per_sec=0.0) assert exc_info.value.reason == "rate_limit_exceeded" - assert "sid-empty" in exc_info.value.detail + # #230 Finding 1: detail must NOT leak the session UUID; emit only the + # bucket-config shape so operators can tune .bicameral/config.yaml. + assert "sid-empty" not in exc_info.value.detail + assert "session " not in exc_info.value.detail + assert exc_info.value.detail == "bucket empty (burst=1, refill=0.0/s)" + + +@pytest.mark.parametrize("truthy", ["1", "true", "yes", "on", "TRUE", "Yes", "ON"]) +def test_check_rate_limit_disabled_via_truthy_variants(monkeypatch, truthy: str) -> None: + """#232 Finding 1: env-var disable accepts the canonical + ``_GUIDED_MODE_TRUTHY`` vocabulary (1/true/yes/on, case-insensitive).""" + monkeypatch.setenv("BICAMERAL_INGEST_RATE_LIMIT_DISABLE", truthy) + # burst=0/refill=0 would exhaust on first call; the env disable must + # short-circuit before bucket allocation. + for _ in range(5): + _check_rate_limit("sid-truthy", burst=0, refill_per_sec=0.0) def test_check_rate_limit_isolates_sessions() -> None: @@ -156,7 +156,8 @@ class _Sentinel(Exception): with pytest.raises(_IngestRefused) as exc_info: await handle_ingest(ctx, payload) assert exc_info.value.reason == "rate_limit_exceeded" - assert "sid-drive" in exc_info.value.detail + # #230 Finding 1: detail no longer leaks session UUID at the boundary. + assert "sid-drive" not in exc_info.value.detail assert "bucket empty" in exc_info.value.detail @@ -208,7 +209,7 @@ async def test_handle_ingest_size_check_runs_before_rate_check() -> None: async def test_call_tool_translates_rate_limit_refusal_to_text_content_error() -> None: raised = _IngestRefused( "rate_limit_exceeded", - detail="session sid-x bucket empty (burst=1, refill=1.0/s)", + detail="bucket empty (burst=1, refill=1.0/s)", ) sync_stub = AsyncMock(return_value=None) handle_stub = AsyncMock(side_effect=raised) diff --git a/tests/test_ingest_size_limit.py b/tests/test_ingest_size_limit.py index ecab1bde..3fe12b70 100644 --- a/tests/test_ingest_size_limit.py +++ b/tests/test_ingest_size_limit.py @@ -127,3 +127,54 @@ async def test_handle_ingest_emits_refusal_telemetry_before_reraise_on_size_exce telemetry_mock.write_ingest_refusal_event.assert_called_once_with( reason="size_limit_exceeded", session_id="sid-abc" ) + + +# ── #232 Finding 2: malformed_payload (circular-ref / non-serializable) ── + + +def test_check_payload_size_handles_circular_ref_payload() -> None: + """#232 Finding 2: a circular-reference dict raises ValueError from + json.dumps; the gate must translate to _IngestRefused('malformed_payload') + at the same MCP boundary as the other refusals — no fail-open path.""" + circular: dict = {"k": "v"} + circular["self"] = circular # circular reference + + with pytest.raises(_IngestRefused) as exc_info: + _check_payload_size(circular, max_bytes=1024) + assert exc_info.value.reason == "malformed_payload" + assert "JSON-serializable" in exc_info.value.detail + # Surface the underlying exception type so operators can diagnose. + assert "ValueError" in exc_info.value.detail + + +def test_check_payload_size_handles_recursion_error_payload() -> None: + """A payload whose serialization raises RecursionError (e.g., deeply + nested object) is also translated to malformed_payload, not leaked as + an unhandled exception past the gate. Patches ``json.dumps`` to raise + RecursionError unconditionally — tests the gate's exception-handling + contract, not a specific input shape.""" + payload = {"k": "v"} + + def raise_recursion_error(*_args, **_kwargs): + raise RecursionError("simulated deep nesting") + + with patch("handlers.ingest.json.dumps", side_effect=raise_recursion_error): + with pytest.raises(_IngestRefused) as exc_info: + _check_payload_size(payload, max_bytes=1024) + assert exc_info.value.reason == "malformed_payload" + assert "RecursionError" in exc_info.value.detail + + +def test_check_payload_size_handles_typeerror_payload() -> None: + """Non-JSON-serializable object that ALSO can't be coerced via default=str + raises TypeError; same translation.""" + + class NonSerializableObject: + def __str__(self): + raise TypeError("intentional: cannot stringify") + + payload = {"opaque": NonSerializableObject()} + with pytest.raises(_IngestRefused) as exc_info: + _check_payload_size(payload, max_bytes=1024) + assert exc_info.value.reason == "malformed_payload" + assert "TypeError" in exc_info.value.detail From 16e870f01d5c04bf65e80e310334a552a5f844f5 Mon Sep 17 00:00:00 2001 From: WulfForge Date: Wed, 6 May 2026 18:50:38 -0400 Subject: [PATCH 3/3] docs: aggregate-rate worst-case limitation note for LLM-08 (#230 Finding 2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The token-bucket gate slows BURST consumption per session but does not bound aggregate throughput. Default config admits ~70 ingests/60s and 1 MiB/s sustained — ~86 GiB/day worst case under a runaway agent (model regression, prompt-injection-hijacked re-ingest cycle, dev-time infinite-loop bug). Not a security crisis (size cap bounds per-payload damage; in-process registry resets on server restart) but an operator-side disaster for ledger-writer churn + disk pressure. Stricter aggregate enforcement (sliding-window cross-session bound) is deferred to the team-server-activation track which has cross-developer correlation needs that single-session token-bucket cannot satisfy. Documented in: - docs/research-brief-compliance-audit-2026-05-06.md § 2.4 LLM-08 walk - plan-216-ingest-size-and-rate-limit.md boundaries.limitations - handlers/ingest.py module-level docstring (ships with code via prior commit) No code change in this commit. Closes #230 Finding 2. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/research-brief-compliance-audit-2026-05-06.md | 1 + plan-216-ingest-size-and-rate-limit.md | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/research-brief-compliance-audit-2026-05-06.md b/docs/research-brief-compliance-audit-2026-05-06.md index f7f4ed7f..7ae40617 100644 --- a/docs/research-brief-compliance-audit-2026-05-06.md +++ b/docs/research-brief-compliance-audit-2026-05-06.md @@ -416,6 +416,7 @@ This is **the highest-novelty surface for bicameral-mcp.** The system is an LLM - **LLM-06** [P1 — narrowed scope] — **Skill content drift between server release and a future remote-skill-loading channel.** Original P0 framing was overstated (per Kilo #2): in the current install model, the operator already trusted the supply chain at `pip install` / `uv tool install` / `pipx install` time, and skills ship co-located with server code in the same wheel — there's no separate channel to compromise without compromising the wheel itself, which is covered by SOC2-03 (signed releases) and OWASP-01 (SBOM). The scenario where LLM-06 has independent value is a **future remote-skill-loading or marketplace feature** (none today): when skills could be pulled from a registry distinct from the server wheel, signing the skill payload separately becomes load-bearing. Remediation when that scope opens: cosign-signed `skills/MANIFEST.toml`, per-file SHA-256 verification at copy time. Until then, the gate is "don't ship remote skill-loading without first activating signed manifests" — a design constraint, not a runtime defect. - **LLM-07** [P1] — **`source_ref` redaction default is `full` (verbatim) per #209.** This is a known issue tracked separately. Remediation: ship #209 (refine regex + flip default to `redacted`). - **LLM-08** [P2] — **`bicameral.ingest` has no rate limit.** A runaway agent can flood the ledger. Remediation: token-bucket rate limit per session_id; declare server-side enforcement. + - **Limitations (#230 Finding 2)** — The token-bucket gate slows BURST consumption per session but does not bound aggregate throughput. Default config (burst=10, refill=1/s, size cap=1 MiB) admits ~70 ingests in any 60-second window and 1 MiB/s sustained — ~86 GiB/day worst case under a runaway agent (model regression, prompt-injection-hijacked re-ingest cycle, dev-time infinite-loop bug). Not a security crisis (the size cap bounds per-payload damage; in-process registry resets on server restart) but a real operator-side disaster for ledger-writer churn + disk pressure. Stricter aggregate enforcement (sliding-window cross-session bound) is deferred to the team-server-activation track which has cross-developer correlation needs that single-session token-bucket cannot satisfy. - **LLM-09** [P1] — **`ratify`, `link_commit`, `set_decision_level` fire without human-in-loop on agent-initiated calls.** These are state-changing decisions. Remediation: declare each tool's HITL requirement deterministically; gate the destructive ones with **out-of-band operator confirmation** (same shape as LLM-05 — `AskUserQuestion` when host supports it, terminal prompt fallback otherwise). Agent-supplied `confirm`-style parameters are not security gates here either. - **LLM-11** [P0/M, all deployments] — **Cross-tool config-file modification surface** (per Kilo #9). `setup_wizard._install_hooks` modifies `.claude/settings.json` (Claude Code's host-config) and the analogous Cursor / Codex configs. A compromised bicameral-mcp install can therefore inject arbitrary Claude Code hook commands that fire as the operator at hook-trigger-time (PreToolUse, PostToolUse, SessionEnd). This is **distinct from LLM-06** (skill content): LLM-06 is text the agent reads, LLM-11 is shell commands the host runs. A signed-skills-manifest gate doesn't cover this. Remediation: ship a signed `hooks-manifest.json` separately (cosign-signed at release); `setup_wizard` verifies the manifest before writing to `.claude/settings.json`. The manifest is the second supply-chain leg distinct from skills + wheel. diff --git a/plan-216-ingest-size-and-rate-limit.md b/plan-216-ingest-size-and-rate-limit.md index 40105218..2586a2d7 100644 --- a/plan-216-ingest-size-and-rate-limit.md +++ b/plan-216-ingest-size-and-rate-limit.md @@ -15,7 +15,7 @@ home: handlers/ingest.py **boundaries**: -- limitations: rate-limit state is in-process (per-server-restart); a malicious agent restart-looping has bigger problems than this gate. Token-bucket-burst defaults are tuned for single-user developer-tool workflow shape; team-server activation may want stricter sliding-window enforcement (revisit then). **Per-developer isolation update (post-#231)**: bucket scoping is now per-developer (salted-email-hash key) when `git config user.email` is available; falls back to process-wide single bucket only in test/CI mode. Two developers on the same install get distinct buckets — runaway loop on developer-A doesn't affect developer-B. +- limitations: rate-limit state is in-process (per-server-restart); a malicious agent restart-looping has bigger problems than this gate. Token-bucket-burst defaults are tuned for single-user developer-tool workflow shape; team-server activation may want stricter sliding-window enforcement (revisit then). **Per-developer isolation update (post-#231)**: bucket scoping is now per-developer (salted-email-hash key) when `git config user.email` is available; falls back to process-wide single bucket only in test/CI mode. Two developers on the same install get distinct buckets — runaway loop on developer-A doesn't affect developer-B. **Aggregate-rate worst case (post-#230 Finding 2)**: the gate slows BURST consumption per session but does not bound aggregate. Default config admits ~70 ingests in any 60-second window and 1 MiB/s sustained — ~86 GiB/day worst case under a runaway agent. Not a security crisis (size cap bounds per-payload damage); operator-side disaster for ledger-writer churn. Stricter aggregate enforcement (sliding-window cross-session bound) deferred to the team-server-activation track. - non_goals: do not implement LLM-01 (prompt-injection canary scan — already filed as #212), LLM-04 (PII/secret/PHI/PAN detect-and-refuse — already filed as #213), or any other epic #216 sub-task. Do not extend rate-limit beyond `bicameral.ingest`. Do not add adversarial-human threat-model coverage (out of scope per LLM-08's deployment trigger). - exclusions: not modifying `IngestPayload` Pydantic schema. Not changing `_normalize_payload` semantics. Not adding telemetry counters to the outbound `telemetry.py` relay (size + rate counters land in local `~/.bicameral/preflight_events.jsonl` only).