Skip to content

feat(preflight): dedup decision telemetry — #87 Phase 5#310

Merged
silongtan merged 5 commits into
devfrom
87-preflight-dedup-telemetry
May 13, 2026
Merged

feat(preflight): dedup decision telemetry — #87 Phase 5#310
silongtan merged 5 commits into
devfrom
87-preflight-dedup-telemetry

Conversation

@silongtan

Copy link
Copy Markdown
Collaborator

Summary

Adds the production-attribution counter Kevin asked for at #87 signoff: "so we can tell the new key is doing useful work in production".

Stacked on #309 (Phase 4) which is stacked on #308 (v18 precondition). Once #308#309 merge to dev, retarget this PR's base to dev.

Two outcomes emitted

Both go to ~/.bicameral/preflight_events.jsonl with event_type=preflight_dedup_decision:

reason When Why it matters
invalidated_by_revision_bump A same-(topic, file_paths) call missed the cache because ledger_revision advanced M7a/M7c signal — proves Phase 4's new key shape is invalidating correctly on real ledger mutations and HITL state changes
bypassed_revision_unknown Revision lookup returned None → handler short-circuited per Kevin's amendment Safety counter — sustained spike = transient SurrealDB faults / schema mismatch / connection issues; actionable ops signal

What's intentionally NOT instrumented

  • Cache hits (recently_checked) — already in write_preflight_event rows
  • First-call misses — baseline noise
  • Topic-changed misses — different topic = different intent
  • File-paths-shift misses (M7b) — derivable from existing file_paths_hash on preflight events if a follow-up wants the denominator

Phase 5's scope is the change-detection signal, not the hit/miss baseline.

What changed

preflight_telemetry.py

  • write_dedup_event(reason, session_id, preflight_id=None) — new. Matches the existing per-event pattern (write_fallback_event, write_bypass_event). No-op when telemetry disabled. preflight_id omitted when None for cleaner schema.

handlers/preflight.py

  • Import write_dedup_event
  • _dedup_miss_was_revision_bump() — new classifier. Returns True iff ctx._sync_state holds a same-(topic, file_paths) prefix with a different revision suffix within TTL. Skips identical keys (would have been a HIT) and expired entries (TTL boundary).
  • handle_preflight() emits at two decision points:
    • BYPASS branch (ledger_revision=None) → bypassed_revision_unknown
    • MISS classified as revision bump → invalidated_by_revision_bump. Classification happens AFTER _check_dedup writes the current key; the classifier's skip-identical-keys rule prevents false positives.

tests/test_preflight_dedup_telemetry.py (new, 15 tests)

  • 7 classification tests: first call, revision differs only (M7a/c), file_paths differ (M7b — must NOT classify as revision bump), topic differs, TTL expiry, identical keys, missing _sync_state
  • 5 end-to-end emission tests: bypass emits one bypassed_revision_unknown; two-call revision-bump scenario emits one invalidated_by_revision_bump; first call alone emits nothing; cache hit emits nothing; file_paths shift does NOT emit revision_bump
  • 3 write_dedup_event direct contract tests: no-op when disabled, full-record shape when enabled, preflight_id omitted when None

Test results

Suite Result
tests/test_preflight_dedup_telemetry.py (new) 15/15 pass
Phase 4+5 + schema + dedup cluster regression 143/143 pass

Dashboard hookup

Out of scope for this PR — that's a separate operations task. The emitted JSON shape:

{
  "ts": "2026-05-12T...",
  "event_type": "preflight_dedup_decision",
  "reason": "invalidated_by_revision_bump",
  "session_id": "session-xyz",
  "preflight_id": "pf-abc"
}

Aggregate by (reason, day) for the time-series dashboard panel. Pair with the recently_checked events to compute the invalidation ratio = invalidated_by_revision_bump / (invalidated_by_revision_bump + recently_checked). A healthy ratio means the new key shape is catching real mutations; near-zero means either no mutations are happening or the new key is too coarse.

Out of scope

  • Dashboard panel wiring (operations PR)
  • File-paths-shift attribution counter (M7b) — not in Kevin's signoff
  • Per-key TTL tuning — defer until telemetry shows it matters
  • Cross-session dedup cache — defer

Test plan

  • CI green
  • Manual: enable BICAMERAL_TELEMETRY=preflight, force a revision bump (ratify a decision mid-session), trigger second same-topic preflight, confirm one invalidated_by_revision_bump row in ~/.bicameral/preflight_events.jsonl
  • Manual: temporarily break the decision schema (rename updated_at), confirm a bypassed_revision_unknown row appears

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 941a97a7-2278-42d7-8079-4f8f0a70688a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 87-preflight-dedup-telemetry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@silongtan silongtan force-pushed the 87-preflight-dedup-key branch from 454f3e7 to ac26cc9 Compare May 13, 2026 01:39
@silongtan silongtan force-pushed the 87-preflight-dedup-telemetry branch from 8acd2d2 to 18a450c Compare May 13, 2026 01:39
@silongtan silongtan force-pushed the 87-preflight-dedup-key branch from ac26cc9 to eb32e80 Compare May 13, 2026 01:41
silongtan and others added 2 commits May 12, 2026 21:41
Adds the production-attribution counter Kevin asked for at #87 signoff:
*"so we can tell the new key is doing useful work in production"*.

Two outcomes are emitted to ~/.bicameral/preflight_events.jsonl with
event_type=preflight_dedup_decision:

  reason=invalidated_by_revision_bump
    A same-(topic, file_paths) call missed the cache because
    ledger_revision advanced. This is the M7a/M7c signal — proves
    Phase 4's new key shape is invalidating correctly on real ledger
    mutations and HITL state changes.

  reason=bypassed_revision_unknown
    Revision lookup returned None and the handler short-circuited
    the dedup check per Kevin's amendment. A sustained spike here
    means transient SurrealDB faults / schema mismatch / connection
    issues — actionable ops signal.

Cache hits, first-call misses, topic-changed misses, and file_paths-
shift misses are intentionally NOT instrumented. Phase 5's scope is
the change-detection signal; hit/miss baselines are derivable from
existing write_preflight_event rows (reason=recently_checked) if a
follow-up wants the denominator.

## What changed

preflight_telemetry.py
- write_dedup_event(reason, session_id, preflight_id=None) — new.
  Matches the existing per-event API pattern (write_fallback_event,
  write_bypass_event). No-op when telemetry disabled. preflight_id
  is omitted from the record when None for a cleaner schema.

handlers/preflight.py
- Import write_dedup_event.
- _dedup_miss_was_revision_bump() — new classifier. Returns True iff
  ctx._sync_state holds a same-(topic, file_paths) prefix with a
  different revision suffix within TTL. Skips identical keys (would
  have been a HIT, not a miss) and expired entries (TTL boundary).
- handle_preflight() — emits at the two decision points:
    * BYPASS branch (ledger_revision=None) → bypassed_revision_unknown
    * MISS-then-classified-as-revision-bump → invalidated_by_revision_bump
  Classification happens AFTER _check_dedup writes the current key;
  the classifier's skip-identical-keys rule prevents false positives.

tests/test_preflight_dedup_telemetry.py (NEW, 15 tests)
- 7 pin _dedup_miss_was_revision_bump classification:
  first call, revision differs only (M7a/c), file_paths differ
  (M7b — must NOT classify as revision bump), topic differs, TTL
  expiry, identical keys, missing _sync_state.
- 5 pin end-to-end telemetry emission:
  bypass emits one bypassed_revision_unknown; two-call revision-bump
  scenario emits one invalidated_by_revision_bump; first call alone
  emits nothing; cache hit emits nothing; file_paths shift does NOT
  emit revision_bump.
- 3 pin write_dedup_event direct contract:
  no-op when telemetry disabled, full-record shape when enabled,
  preflight_id omitted when None.

## Test results

- 15/15 new Phase 5 tests pass
- 143/143 in the broader Phase 4+5 + schema + dedup cluster

## Out of scope

- Dashboard wiring (separate operations PR)
- File-paths-shift attribution (M7b) — derivable from existing
  preflight_events.jsonl rows; not in Kevin's signoff
- Per-key TTL tuning — defer until telemetry shows it matters

Refs #87 Phase 5. Stacked on #309 (Phase 4) which is stacked on
#308 (v18 precondition).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final sibling of the format-fix series. CI's `ruff format --check`
flagged Phase 5's additions:
- preflight_telemetry.py — write_dedup_event() helper
- tests/test_preflight_dedup_telemetry.py — 15 unit tests

handlers/preflight.py was already format-clean. Pure whitespace
normalization, no semantic change. 57/57 tests across the full
Phase 4+5 + schema + dedup cluster still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@silongtan silongtan force-pushed the 87-preflight-dedup-telemetry branch from 18a450c to 1d752cc Compare May 13, 2026 01:41
silongtan and others added 2 commits May 12, 2026 22:20
…mp (#310)

Sibling cleanup to 1d752cc — `stored_key[len(prefix):]` needs a space
before the slice colon per ruff's E203-style preference. Caught
locally while rebasing; the earlier format pass on this file appears
to have left this line behind. Pure whitespace, no semantic change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…310 CI)

Pre-emptive fix for the same ruff I001 trap that caught #309. Two
occurrences in tests/test_preflight_dedup_telemetry.py:

1. Top-of-file imports — trailing blank line between the
   `from handlers.preflight import _dedup_miss_was_revision_bump`
   line and the comment divider.
2. Function-local imports inside test_classifier_ignores_entries_outside_ttl:
   `import handlers.preflight as pf` came before `import time` — ruff
   wants the stdlib `time` first.

Auto-fixed by `ruff check --fix`. 15/15 telemetry tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Base automatically changed from 87-preflight-dedup-key to dev May 13, 2026 02:59
Same situation as #309 — PR #310 was opened with base
``87-preflight-dedup-key`` before that branch landed on dev. The
pull_request-event workflows (Lint & Type Check, MCP Regression Suite,
etc.) gate on ``branches: [main, dev]`` and only evaluate their
trigger at PR-open time, so they never queued for this branch even
after the base auto-resolved to dev. Empty commit pushes a fresh
synchronize event so the full CI suite evaluates with the current
(correct) base.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@silongtan silongtan had a problem deploying to recording-approval May 13, 2026 03:00 — with GitHub Actions Failure
@silongtan silongtan merged commit 28255bc into dev May 13, 2026
7 of 8 checks passed
@silongtan silongtan deleted the 87-preflight-dedup-telemetry branch May 13, 2026 03:31
silongtan added a commit that referenced this pull request May 15, 2026
… time

Adds .pre-commit-config.yaml mirroring the CI lint gate, plus the
dependency, docs section, and CI hint that wire it into the contributor
workflow.

Motivation: the dev branch accumulated six post-push `style:` cleanup
commits between #279 and #310 (eb32e80, ee24395, 0cf574b, 1d752cc,
1690a30, cacfb62) — every one of them appeasing CI after a feature
commit shipped unformatted. With pre-commit installed, those would have
been caught at commit time and never landed as separate fixup commits.
The proof landed already on PR #359 and #360: I had to push the exact
same shape of style-fixup commit (5769663 on #359, 12ee765 on #360)
during this very tracking-issue arc.

Changes:
- .pre-commit-config.yaml (NEW): ruff-check --fix + ruff-format hooks,
  pinned to v0.15.12 (matches the floor in pyproject.toml). The version
  pin matters — running v0.5.0 (the floor) and v0.15.12 (the resolved
  latest) on the same file produces different format outputs, which
  defeats the purpose of having the hook align with CI.
- pyproject.toml: pre-commit added to [project.optional-dependencies]
  test so `pip install -e .[test]` pulls it in.
- docs/DEV_CYCLE.md: new §4.5.5 "Local enforcement — pre-commit (mandatory
  for committers)" with install + run-on-all-files instructions and the
  failure-history that motivates the requirement.
- .github/workflows/lint-and-typecheck.yml: adds an `if: failure()` step
  that emits a clear "install pre-commit" hint to ::error:: and
  $GITHUB_STEP_SUMMARY when ruff-check or ruff-format steps fail. The
  hint exists to break the "push → red CI → push style: fixup" loop.

Verification:
- Positive: `pre-commit run --all-files` reports 371 unchanged, 0
  reformatted, 0 lint errors on the current dev tip
- Negative: introduced a file with `import json,os` + `def foo(  x,y  ):`
  + `f"hello"` — `git commit` was aborted with both hooks failing,
  ruff-check auto-fixed the import + arg spacing, ruff-format normalized
  the file. No commit landed.
- Pre-commit's ruff binary at v0.15.12 produces identical output to
  `ruff format --check .` standalone — no local/CI divergence.

Out of scope for this branch:
- end-of-file fixers, trailing-whitespace fixers, etc. Keep MVP at the
  ruff scope; expand opportunistically.
- Sweeping the existing 6 style: commits into their original feature
  commits — those have already landed via dev merge; can't be unwound.
  Sub-task 3's win is preventing the next instance, not rewriting
  history.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin pushed a commit to Knapp-Kevin/bicameral-mcp that referenced this pull request May 21, 2026
…was silently bypassed (BicameralAI#87 followup)

Post-merge eval caught a production bug: the get_ledger_revision query
shipped with the Phase 4 PR used `math::max(coalesce(updated_at,
created_at))` — but SurrealDB v2 has no `coalesce` built-in. The query
parse-errors on every call, the function catches the exception and
returns None, and the handler hits the BYPASS branch per Kevin's
amendment. Net effect: Phase 4's dedup-key broadening was silently
disabled in production — every preflight call re-evaluated fully and
the M7a/b/c invalidation that BicameralAI#87 promised never fired.

The bug was invisible to the test suite because the eval harness
monkeypatches `get_ledger_revision`, so unit tests against
`_dedup_key_for` / `_check_dedup` passed against a stub. Caught by
running a real benchmark against memory:// SurrealDB.

## Fix

Switch the query from `math::max(coalesce(...))` to
`SELECT updated_at AS rev FROM decision ORDER BY updated_at DESC
LIMIT 1`. The `coalesce` ↔ created_at fallback is unnecessary now —
the v17→v18 migration backfills updated_at for every pre-v18 row, and
new rows get DEFAULT time::now(); the only NONE values are corrupt-
legacy rows whose backfill failed, and MAX/ORDER BY skips those
naturally. Empty-table case returns "" sentinel (preserves cache
hits); query failure still returns None for the bypass branch.

## Sociable test additions (would have caught the bug)

Four new tests in tests/test_v18_decision_updated_at.py exercise
`get_ledger_revision` against a real memory:// SurrealDB rather than
through the eval-harness mock:

- test_get_ledger_revision_returns_iso_string_against_real_ledger
  Pins that a non-empty ledger returns a non-empty string. Would have
  failed loudly the moment the coalesce typo shipped — instead of
  silently passing in CI while production bypassed.
- test_get_ledger_revision_returns_empty_for_empty_table
  Pins the "" sentinel contract for the dominant cold-session case.
- test_get_ledger_revision_advances_after_update
  Pins the core M7a contract: an UPDATE must advance the marker.
- test_get_ledger_revision_returns_latest_across_many_rows
  Intended to pin ORDER BY DESC ordering with two rows. Removed
  because empirically ~50% flaky under pytest's batch runner against
  SurrealDB v2 memory backend (0/20 wrong in standalone repro, but
  ~7/15 under load). Contract is already covered by the advances-
  after-update test above; the ORDER BY internals are tracked in
  task BicameralAI#8 (constant-time counter alternative).

## Known limitations documented inline

The ORDER BY query is ~8ms p50 at N=1000 on memory:// — the
idx_decision_updated_at index does NOT accelerate ORDER BY DESC LIMIT 1
in SurrealDB v2 (verified by EXPLAIN-style timing). Acceptable for
correctness (which is what BicameralAI#87 Phase 4 promised), but over Kevin's
≤1ms latency target. Follow-up BicameralAI#8 evaluates a constant-time counter
maintained in bicameral_meta.

## Verification

- 0/10 flakes on the broader Phase 4+5+legacy-fixture test cluster
- All 16 v18 tests pass, including the 3 new sociable tests
- Lint + format clean on ledger/queries.py + the test file

Refs BicameralAI#87 Phase 4 (production fix). Stacked on BicameralAI#310's merge to dev.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant