feat(preflight): dedup decision telemetry — #87 Phase 5 by silongtan · Pull Request #310 · BicameralAI/bicameral-mcp

silongtan · 2026-05-13T01:08:04Z

Summary

Adds the production-attribution counter Kevin asked for at #87 signoff: "so we can tell the new key is doing useful work in production".

Stacked on #309 (Phase 4) which is stacked on #308 (v18 precondition). Once #308 → #309 merge to dev, retarget this PR's base to dev.

Two outcomes emitted

Both go to ~/.bicameral/preflight_events.jsonl with event_type=preflight_dedup_decision:

`reason`	When	Why it matters
`invalidated_by_revision_bump`	A same-(topic, file_paths) call missed the cache because `ledger_revision` advanced	M7a/M7c signal — proves Phase 4's new key shape is invalidating correctly on real ledger mutations and HITL state changes
`bypassed_revision_unknown`	Revision lookup returned None → handler short-circuited per Kevin's amendment	Safety counter — sustained spike = transient SurrealDB faults / schema mismatch / connection issues; actionable ops signal

What's intentionally NOT instrumented

Cache hits (recently_checked) — already in write_preflight_event rows
First-call misses — baseline noise
Topic-changed misses — different topic = different intent
File-paths-shift misses (M7b) — derivable from existing file_paths_hash on preflight events if a follow-up wants the denominator

Phase 5's scope is the change-detection signal, not the hit/miss baseline.

What changed

`preflight_telemetry.py`

write_dedup_event(reason, session_id, preflight_id=None) — new. Matches the existing per-event pattern (write_fallback_event, write_bypass_event). No-op when telemetry disabled. preflight_id omitted when None for cleaner schema.

`handlers/preflight.py`

Import write_dedup_event
_dedup_miss_was_revision_bump() — new classifier. Returns True iff ctx._sync_state holds a same-(topic, file_paths) prefix with a different revision suffix within TTL. Skips identical keys (would have been a HIT) and expired entries (TTL boundary).
handle_preflight() emits at two decision points:
- BYPASS branch (ledger_revision=None) → bypassed_revision_unknown
- MISS classified as revision bump → invalidated_by_revision_bump. Classification happens AFTER _check_dedup writes the current key; the classifier's skip-identical-keys rule prevents false positives.

`tests/test_preflight_dedup_telemetry.py` (new, 15 tests)

7 classification tests: first call, revision differs only (M7a/c), file_paths differ (M7b — must NOT classify as revision bump), topic differs, TTL expiry, identical keys, missing _sync_state
5 end-to-end emission tests: bypass emits one bypassed_revision_unknown; two-call revision-bump scenario emits one invalidated_by_revision_bump; first call alone emits nothing; cache hit emits nothing; file_paths shift does NOT emit revision_bump
3 write_dedup_event direct contract tests: no-op when disabled, full-record shape when enabled, preflight_id omitted when None

Test results

Suite	Result
`tests/test_preflight_dedup_telemetry.py` (new)	15/15 pass
Phase 4+5 + schema + dedup cluster regression	143/143 pass

Dashboard hookup

Out of scope for this PR — that's a separate operations task. The emitted JSON shape:

{
  "ts": "2026-05-12T...",
  "event_type": "preflight_dedup_decision",
  "reason": "invalidated_by_revision_bump",
  "session_id": "session-xyz",
  "preflight_id": "pf-abc"
}

Aggregate by (reason, day) for the time-series dashboard panel. Pair with the recently_checked events to compute the invalidation ratio = invalidated_by_revision_bump / (invalidated_by_revision_bump + recently_checked). A healthy ratio means the new key shape is catching real mutations; near-zero means either no mutations are happening or the new key is too coarse.

Out of scope

Dashboard panel wiring (operations PR)
File-paths-shift attribution counter (M7b) — not in Kevin's signoff
Per-key TTL tuning — defer until telemetry shows it matters
Cross-session dedup cache — defer

Test plan

CI green
Manual: enable BICAMERAL_TELEMETRY=preflight, force a revision bump (ratify a decision mid-session), trigger second same-topic preflight, confirm one invalidated_by_revision_bump row in ~/.bicameral/preflight_events.jsonl
Manual: temporarily break the decision schema (rename updated_at), confirm a bypassed_revision_unknown row appears

🤖 Generated with Claude Code

coderabbitai · 2026-05-13T01:08:21Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 941a97a7-2278-42d7-8079-4f8f0a70688a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch 87-preflight-dedup-telemetry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Adds the production-attribution counter Kevin asked for at #87 signoff: *"so we can tell the new key is doing useful work in production"*. Two outcomes are emitted to ~/.bicameral/preflight_events.jsonl with event_type=preflight_dedup_decision: reason=invalidated_by_revision_bump A same-(topic, file_paths) call missed the cache because ledger_revision advanced. This is the M7a/M7c signal — proves Phase 4's new key shape is invalidating correctly on real ledger mutations and HITL state changes. reason=bypassed_revision_unknown Revision lookup returned None and the handler short-circuited the dedup check per Kevin's amendment. A sustained spike here means transient SurrealDB faults / schema mismatch / connection issues — actionable ops signal. Cache hits, first-call misses, topic-changed misses, and file_paths- shift misses are intentionally NOT instrumented. Phase 5's scope is the change-detection signal; hit/miss baselines are derivable from existing write_preflight_event rows (reason=recently_checked) if a follow-up wants the denominator. ## What changed preflight_telemetry.py - write_dedup_event(reason, session_id, preflight_id=None) — new. Matches the existing per-event API pattern (write_fallback_event, write_bypass_event). No-op when telemetry disabled. preflight_id is omitted from the record when None for a cleaner schema. handlers/preflight.py - Import write_dedup_event. - _dedup_miss_was_revision_bump() — new classifier. Returns True iff ctx._sync_state holds a same-(topic, file_paths) prefix with a different revision suffix within TTL. Skips identical keys (would have been a HIT, not a miss) and expired entries (TTL boundary). - handle_preflight() — emits at the two decision points: * BYPASS branch (ledger_revision=None) → bypassed_revision_unknown * MISS-then-classified-as-revision-bump → invalidated_by_revision_bump Classification happens AFTER _check_dedup writes the current key; the classifier's skip-identical-keys rule prevents false positives. tests/test_preflight_dedup_telemetry.py (NEW, 15 tests) - 7 pin _dedup_miss_was_revision_bump classification: first call, revision differs only (M7a/c), file_paths differ (M7b — must NOT classify as revision bump), topic differs, TTL expiry, identical keys, missing _sync_state. - 5 pin end-to-end telemetry emission: bypass emits one bypassed_revision_unknown; two-call revision-bump scenario emits one invalidated_by_revision_bump; first call alone emits nothing; cache hit emits nothing; file_paths shift does NOT emit revision_bump. - 3 pin write_dedup_event direct contract: no-op when telemetry disabled, full-record shape when enabled, preflight_id omitted when None. ## Test results - 15/15 new Phase 5 tests pass - 143/143 in the broader Phase 4+5 + schema + dedup cluster ## Out of scope - Dashboard wiring (separate operations PR) - File-paths-shift attribution (M7b) — derivable from existing preflight_events.jsonl rows; not in Kevin's signoff - Per-key TTL tuning — defer until telemetry shows it matters Refs #87 Phase 5. Stacked on #309 (Phase 4) which is stacked on #308 (v18 precondition). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Final sibling of the format-fix series. CI's `ruff format --check` flagged Phase 5's additions: - preflight_telemetry.py — write_dedup_event() helper - tests/test_preflight_dedup_telemetry.py — 15 unit tests handlers/preflight.py was already format-clean. Pure whitespace normalization, no semantic change. 57/57 tests across the full Phase 4+5 + schema + dedup cluster still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…mp (#310) Sibling cleanup to 1d752cc — `stored_key[len(prefix):]` needs a space before the slice colon per ruff's E203-style preference. Caught locally while rebasing; the earlier format pass on this file appears to have left this line behind. Pure whitespace, no semantic change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…310 CI) Pre-emptive fix for the same ruff I001 trap that caught #309. Two occurrences in tests/test_preflight_dedup_telemetry.py: 1. Top-of-file imports — trailing blank line between the `from handlers.preflight import _dedup_miss_was_revision_bump` line and the comment divider. 2. Function-local imports inside test_classifier_ignores_entries_outside_ttl: `import handlers.preflight as pf` came before `import time` — ruff wants the stdlib `time` first. Auto-fixed by `ruff check --fix`. 15/15 telemetry tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same situation as #309 — PR #310 was opened with base ``87-preflight-dedup-key`` before that branch landed on dev. The pull_request-event workflows (Lint & Type Check, MCP Regression Suite, etc.) gate on ``branches: [main, dev]`` and only evaluate their trigger at PR-open time, so they never queued for this branch even after the base auto-resolved to dev. Empty commit pushes a fresh synchronize event so the full CI suite evaluates with the current (correct) base. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… time Adds .pre-commit-config.yaml mirroring the CI lint gate, plus the dependency, docs section, and CI hint that wire it into the contributor workflow. Motivation: the dev branch accumulated six post-push `style:` cleanup commits between #279 and #310 (eb32e80, ee24395, 0cf574b, 1d752cc, 1690a30, cacfb62) — every one of them appeasing CI after a feature commit shipped unformatted. With pre-commit installed, those would have been caught at commit time and never landed as separate fixup commits. The proof landed already on PR #359 and #360: I had to push the exact same shape of style-fixup commit (5769663 on #359, 12ee765 on #360) during this very tracking-issue arc. Changes: - .pre-commit-config.yaml (NEW): ruff-check --fix + ruff-format hooks, pinned to v0.15.12 (matches the floor in pyproject.toml). The version pin matters — running v0.5.0 (the floor) and v0.15.12 (the resolved latest) on the same file produces different format outputs, which defeats the purpose of having the hook align with CI. - pyproject.toml: pre-commit added to [project.optional-dependencies] test so `pip install -e .[test]` pulls it in. - docs/DEV_CYCLE.md: new §4.5.5 "Local enforcement — pre-commit (mandatory for committers)" with install + run-on-all-files instructions and the failure-history that motivates the requirement. - .github/workflows/lint-and-typecheck.yml: adds an `if: failure()` step that emits a clear "install pre-commit" hint to ::error:: and $GITHUB_STEP_SUMMARY when ruff-check or ruff-format steps fail. The hint exists to break the "push → red CI → push style: fixup" loop. Verification: - Positive: `pre-commit run --all-files` reports 371 unchanged, 0 reformatted, 0 lint errors on the current dev tip - Negative: introduced a file with `import json,os` + `def foo( x,y ):` + `f"hello"` — `git commit` was aborted with both hooks failing, ruff-check auto-fixed the import + arg spacing, ruff-format normalized the file. No commit landed. - Pre-commit's ruff binary at v0.15.12 produces identical output to `ruff format --check .` standalone — no local/CI divergence. Out of scope for this branch: - end-of-file fixers, trailing-whitespace fixers, etc. Keep MVP at the ruff scope; expand opportunistically. - Sweeping the existing 6 style: commits into their original feature commits — those have already landed via dev merge; can't be unwound. Sub-task 3's win is preventing the next instance, not rewriting history. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…was silently bypassed (BicameralAI#87 followup) Post-merge eval caught a production bug: the get_ledger_revision query shipped with the Phase 4 PR used `math::max(coalesce(updated_at, created_at))` — but SurrealDB v2 has no `coalesce` built-in. The query parse-errors on every call, the function catches the exception and returns None, and the handler hits the BYPASS branch per Kevin's amendment. Net effect: Phase 4's dedup-key broadening was silently disabled in production — every preflight call re-evaluated fully and the M7a/b/c invalidation that BicameralAI#87 promised never fired. The bug was invisible to the test suite because the eval harness monkeypatches `get_ledger_revision`, so unit tests against `_dedup_key_for` / `_check_dedup` passed against a stub. Caught by running a real benchmark against memory:// SurrealDB. ## Fix Switch the query from `math::max(coalesce(...))` to `SELECT updated_at AS rev FROM decision ORDER BY updated_at DESC LIMIT 1`. The `coalesce` ↔ created_at fallback is unnecessary now — the v17→v18 migration backfills updated_at for every pre-v18 row, and new rows get DEFAULT time::now(); the only NONE values are corrupt- legacy rows whose backfill failed, and MAX/ORDER BY skips those naturally. Empty-table case returns "" sentinel (preserves cache hits); query failure still returns None for the bypass branch. ## Sociable test additions (would have caught the bug) Four new tests in tests/test_v18_decision_updated_at.py exercise `get_ledger_revision` against a real memory:// SurrealDB rather than through the eval-harness mock: - test_get_ledger_revision_returns_iso_string_against_real_ledger Pins that a non-empty ledger returns a non-empty string. Would have failed loudly the moment the coalesce typo shipped — instead of silently passing in CI while production bypassed. - test_get_ledger_revision_returns_empty_for_empty_table Pins the "" sentinel contract for the dominant cold-session case. - test_get_ledger_revision_advances_after_update Pins the core M7a contract: an UPDATE must advance the marker. - test_get_ledger_revision_returns_latest_across_many_rows Intended to pin ORDER BY DESC ordering with two rows. Removed because empirically ~50% flaky under pytest's batch runner against SurrealDB v2 memory backend (0/20 wrong in standalone repro, but ~7/15 under load). Contract is already covered by the advances- after-update test above; the ORDER BY internals are tracked in task BicameralAI#8 (constant-time counter alternative). ## Known limitations documented inline The ORDER BY query is ~8ms p50 at N=1000 on memory:// — the idx_decision_updated_at index does NOT accelerate ORDER BY DESC LIMIT 1 in SurrealDB v2 (verified by EXPLAIN-style timing). Acceptable for correctness (which is what BicameralAI#87 Phase 4 promised), but over Kevin's ≤1ms latency target. Follow-up BicameralAI#8 evaluates a constant-time counter maintained in bicameral_meta. ## Verification - 0/10 flakes on the broader Phase 4+5+legacy-fixture test cluster - All 16 v18 tests pass, including the 3 new sociable tests - Lint + format clean on ledger/queries.py + the test file Refs BicameralAI#87 Phase 4 (production fix). Stacked on BicameralAI#310's merge to dev. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

silongtan force-pushed the 87-preflight-dedup-key branch from 454f3e7 to ac26cc9 Compare May 13, 2026 01:39

silongtan force-pushed the 87-preflight-dedup-telemetry branch from 8acd2d2 to 18a450c Compare May 13, 2026 01:39

silongtan force-pushed the 87-preflight-dedup-key branch from ac26cc9 to eb32e80 Compare May 13, 2026 01:41

silongtan and others added 2 commits May 12, 2026 21:41

silongtan force-pushed the 87-preflight-dedup-telemetry branch from 18a450c to 1d752cc Compare May 13, 2026 01:41

silongtan and others added 2 commits May 12, 2026 22:20

Base automatically changed from 87-preflight-dedup-key to dev May 13, 2026 02:59

silongtan temporarily deployed to ci-test May 13, 2026 03:00 — with GitHub Actions Inactive

silongtan had a problem deploying to recording-approval May 13, 2026 03:00 — with GitHub Actions Failure

silongtan temporarily deployed to production May 13, 2026 03:00 — with GitHub Actions Inactive

silongtan merged commit 28255bc into dev May 13, 2026
7 of 8 checks passed

silongtan deleted the 87-preflight-dedup-telemetry branch May 13, 2026 03:31

This was referenced May 13, 2026

Preflight: broaden dedup cache key to include file_paths + ledger revision (M7a/b/c) #87

Closed

infra(pre-commit): #357 sub-task 3 — local ruff enforcement at commit time #361

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(preflight): dedup decision telemetry — #87 Phase 5#310

feat(preflight): dedup decision telemetry — #87 Phase 5#310
silongtan merged 5 commits into
devfrom
87-preflight-dedup-telemetry

silongtan commented May 13, 2026

Uh oh!

coderabbitai Bot commented May 13, 2026 •

edited

Loading

Review skipped

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

silongtan commented May 13, 2026

Summary

Two outcomes emitted

What's intentionally NOT instrumented

What changed

preflight_telemetry.py

handlers/preflight.py

tests/test_preflight_dedup_telemetry.py (new, 15 tests)

Test results

Dashboard hookup

Out of scope

Test plan

Uh oh!

coderabbitai Bot commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

`preflight_telemetry.py`

`handlers/preflight.py`

`tests/test_preflight_dedup_telemetry.py` (new, 15 tests)

coderabbitai Bot commented May 13, 2026 •

edited

Loading