feat(preflight): dedup decision telemetry — #87 Phase 5#310
Merged
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
454f3e7 to
ac26cc9
Compare
8acd2d2 to
18a450c
Compare
ac26cc9 to
eb32e80
Compare
Adds the production-attribution counter Kevin asked for at #87 signoff: *"so we can tell the new key is doing useful work in production"*. Two outcomes are emitted to ~/.bicameral/preflight_events.jsonl with event_type=preflight_dedup_decision: reason=invalidated_by_revision_bump A same-(topic, file_paths) call missed the cache because ledger_revision advanced. This is the M7a/M7c signal — proves Phase 4's new key shape is invalidating correctly on real ledger mutations and HITL state changes. reason=bypassed_revision_unknown Revision lookup returned None and the handler short-circuited the dedup check per Kevin's amendment. A sustained spike here means transient SurrealDB faults / schema mismatch / connection issues — actionable ops signal. Cache hits, first-call misses, topic-changed misses, and file_paths- shift misses are intentionally NOT instrumented. Phase 5's scope is the change-detection signal; hit/miss baselines are derivable from existing write_preflight_event rows (reason=recently_checked) if a follow-up wants the denominator. ## What changed preflight_telemetry.py - write_dedup_event(reason, session_id, preflight_id=None) — new. Matches the existing per-event API pattern (write_fallback_event, write_bypass_event). No-op when telemetry disabled. preflight_id is omitted from the record when None for a cleaner schema. handlers/preflight.py - Import write_dedup_event. - _dedup_miss_was_revision_bump() — new classifier. Returns True iff ctx._sync_state holds a same-(topic, file_paths) prefix with a different revision suffix within TTL. Skips identical keys (would have been a HIT, not a miss) and expired entries (TTL boundary). - handle_preflight() — emits at the two decision points: * BYPASS branch (ledger_revision=None) → bypassed_revision_unknown * MISS-then-classified-as-revision-bump → invalidated_by_revision_bump Classification happens AFTER _check_dedup writes the current key; the classifier's skip-identical-keys rule prevents false positives. tests/test_preflight_dedup_telemetry.py (NEW, 15 tests) - 7 pin _dedup_miss_was_revision_bump classification: first call, revision differs only (M7a/c), file_paths differ (M7b — must NOT classify as revision bump), topic differs, TTL expiry, identical keys, missing _sync_state. - 5 pin end-to-end telemetry emission: bypass emits one bypassed_revision_unknown; two-call revision-bump scenario emits one invalidated_by_revision_bump; first call alone emits nothing; cache hit emits nothing; file_paths shift does NOT emit revision_bump. - 3 pin write_dedup_event direct contract: no-op when telemetry disabled, full-record shape when enabled, preflight_id omitted when None. ## Test results - 15/15 new Phase 5 tests pass - 143/143 in the broader Phase 4+5 + schema + dedup cluster ## Out of scope - Dashboard wiring (separate operations PR) - File-paths-shift attribution (M7b) — derivable from existing preflight_events.jsonl rows; not in Kevin's signoff - Per-key TTL tuning — defer until telemetry shows it matters Refs #87 Phase 5. Stacked on #309 (Phase 4) which is stacked on #308 (v18 precondition). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final sibling of the format-fix series. CI's `ruff format --check` flagged Phase 5's additions: - preflight_telemetry.py — write_dedup_event() helper - tests/test_preflight_dedup_telemetry.py — 15 unit tests handlers/preflight.py was already format-clean. Pure whitespace normalization, no semantic change. 57/57 tests across the full Phase 4+5 + schema + dedup cluster still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
18a450c to
1d752cc
Compare
…mp (#310) Sibling cleanup to 1d752cc — `stored_key[len(prefix):]` needs a space before the slice colon per ruff's E203-style preference. Caught locally while rebasing; the earlier format pass on this file appears to have left this line behind. Pure whitespace, no semantic change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…310 CI) Pre-emptive fix for the same ruff I001 trap that caught #309. Two occurrences in tests/test_preflight_dedup_telemetry.py: 1. Top-of-file imports — trailing blank line between the `from handlers.preflight import _dedup_miss_was_revision_bump` line and the comment divider. 2. Function-local imports inside test_classifier_ignores_entries_outside_ttl: `import handlers.preflight as pf` came before `import time` — ruff wants the stdlib `time` first. Auto-fixed by `ruff check --fix`. 15/15 telemetry tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same situation as #309 — PR #310 was opened with base ``87-preflight-dedup-key`` before that branch landed on dev. The pull_request-event workflows (Lint & Type Check, MCP Regression Suite, etc.) gate on ``branches: [main, dev]`` and only evaluate their trigger at PR-open time, so they never queued for this branch even after the base auto-resolved to dev. Empty commit pushes a fresh synchronize event so the full CI suite evaluates with the current (correct) base. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 13, 2026
silongtan
added a commit
that referenced
this pull request
May 15, 2026
… time Adds .pre-commit-config.yaml mirroring the CI lint gate, plus the dependency, docs section, and CI hint that wire it into the contributor workflow. Motivation: the dev branch accumulated six post-push `style:` cleanup commits between #279 and #310 (eb32e80, ee24395, 0cf574b, 1d752cc, 1690a30, cacfb62) — every one of them appeasing CI after a feature commit shipped unformatted. With pre-commit installed, those would have been caught at commit time and never landed as separate fixup commits. The proof landed already on PR #359 and #360: I had to push the exact same shape of style-fixup commit (5769663 on #359, 12ee765 on #360) during this very tracking-issue arc. Changes: - .pre-commit-config.yaml (NEW): ruff-check --fix + ruff-format hooks, pinned to v0.15.12 (matches the floor in pyproject.toml). The version pin matters — running v0.5.0 (the floor) and v0.15.12 (the resolved latest) on the same file produces different format outputs, which defeats the purpose of having the hook align with CI. - pyproject.toml: pre-commit added to [project.optional-dependencies] test so `pip install -e .[test]` pulls it in. - docs/DEV_CYCLE.md: new §4.5.5 "Local enforcement — pre-commit (mandatory for committers)" with install + run-on-all-files instructions and the failure-history that motivates the requirement. - .github/workflows/lint-and-typecheck.yml: adds an `if: failure()` step that emits a clear "install pre-commit" hint to ::error:: and $GITHUB_STEP_SUMMARY when ruff-check or ruff-format steps fail. The hint exists to break the "push → red CI → push style: fixup" loop. Verification: - Positive: `pre-commit run --all-files` reports 371 unchanged, 0 reformatted, 0 lint errors on the current dev tip - Negative: introduced a file with `import json,os` + `def foo( x,y ):` + `f"hello"` — `git commit` was aborted with both hooks failing, ruff-check auto-fixed the import + arg spacing, ruff-format normalized the file. No commit landed. - Pre-commit's ruff binary at v0.15.12 produces identical output to `ruff format --check .` standalone — no local/CI divergence. Out of scope for this branch: - end-of-file fixers, trailing-whitespace fixers, etc. Keep MVP at the ruff scope; expand opportunistically. - Sweeping the existing 6 style: commits into their original feature commits — those have already landed via dev merge; can't be unwound. Sub-task 3's win is preventing the next instance, not rewriting history. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Knapp-Kevin
pushed a commit
to Knapp-Kevin/bicameral-mcp
that referenced
this pull request
May 21, 2026
…was silently bypassed (BicameralAI#87 followup) Post-merge eval caught a production bug: the get_ledger_revision query shipped with the Phase 4 PR used `math::max(coalesce(updated_at, created_at))` — but SurrealDB v2 has no `coalesce` built-in. The query parse-errors on every call, the function catches the exception and returns None, and the handler hits the BYPASS branch per Kevin's amendment. Net effect: Phase 4's dedup-key broadening was silently disabled in production — every preflight call re-evaluated fully and the M7a/b/c invalidation that BicameralAI#87 promised never fired. The bug was invisible to the test suite because the eval harness monkeypatches `get_ledger_revision`, so unit tests against `_dedup_key_for` / `_check_dedup` passed against a stub. Caught by running a real benchmark against memory:// SurrealDB. ## Fix Switch the query from `math::max(coalesce(...))` to `SELECT updated_at AS rev FROM decision ORDER BY updated_at DESC LIMIT 1`. The `coalesce` ↔ created_at fallback is unnecessary now — the v17→v18 migration backfills updated_at for every pre-v18 row, and new rows get DEFAULT time::now(); the only NONE values are corrupt- legacy rows whose backfill failed, and MAX/ORDER BY skips those naturally. Empty-table case returns "" sentinel (preserves cache hits); query failure still returns None for the bypass branch. ## Sociable test additions (would have caught the bug) Four new tests in tests/test_v18_decision_updated_at.py exercise `get_ledger_revision` against a real memory:// SurrealDB rather than through the eval-harness mock: - test_get_ledger_revision_returns_iso_string_against_real_ledger Pins that a non-empty ledger returns a non-empty string. Would have failed loudly the moment the coalesce typo shipped — instead of silently passing in CI while production bypassed. - test_get_ledger_revision_returns_empty_for_empty_table Pins the "" sentinel contract for the dominant cold-session case. - test_get_ledger_revision_advances_after_update Pins the core M7a contract: an UPDATE must advance the marker. - test_get_ledger_revision_returns_latest_across_many_rows Intended to pin ORDER BY DESC ordering with two rows. Removed because empirically ~50% flaky under pytest's batch runner against SurrealDB v2 memory backend (0/20 wrong in standalone repro, but ~7/15 under load). Contract is already covered by the advances- after-update test above; the ORDER BY internals are tracked in task BicameralAI#8 (constant-time counter alternative). ## Known limitations documented inline The ORDER BY query is ~8ms p50 at N=1000 on memory:// — the idx_decision_updated_at index does NOT accelerate ORDER BY DESC LIMIT 1 in SurrealDB v2 (verified by EXPLAIN-style timing). Acceptable for correctness (which is what BicameralAI#87 Phase 4 promised), but over Kevin's ≤1ms latency target. Follow-up BicameralAI#8 evaluates a constant-time counter maintained in bicameral_meta. ## Verification - 0/10 flakes on the broader Phase 4+5+legacy-fixture test cluster - All 16 v18 tests pass, including the 3 new sociable tests - Lint + format clean on ledger/queries.py + the test file Refs BicameralAI#87 Phase 4 (production fix). Stacked on BicameralAI#310's merge to dev. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the production-attribution counter Kevin asked for at #87 signoff: "so we can tell the new key is doing useful work in production".
Stacked on #309 (Phase 4) which is stacked on #308 (v18 precondition). Once #308 → #309 merge to
dev, retarget this PR's base todev.Two outcomes emitted
Both go to
~/.bicameral/preflight_events.jsonlwithevent_type=preflight_dedup_decision:reasoninvalidated_by_revision_bumpledger_revisionadvancedbypassed_revision_unknownWhat's intentionally NOT instrumented
recently_checked) — already inwrite_preflight_eventrowsfile_paths_hashon preflight events if a follow-up wants the denominatorPhase 5's scope is the change-detection signal, not the hit/miss baseline.
What changed
preflight_telemetry.pywrite_dedup_event(reason, session_id, preflight_id=None)— new. Matches the existing per-event pattern (write_fallback_event,write_bypass_event). No-op when telemetry disabled.preflight_idomitted when None for cleaner schema.handlers/preflight.pywrite_dedup_event_dedup_miss_was_revision_bump()— new classifier. Returns True iffctx._sync_stateholds a same-(topic, file_paths) prefix with a different revision suffix within TTL. Skips identical keys (would have been a HIT) and expired entries (TTL boundary).handle_preflight()emits at two decision points:ledger_revision=None) →bypassed_revision_unknowninvalidated_by_revision_bump. Classification happens AFTER_check_dedupwrites the current key; the classifier's skip-identical-keys rule prevents false positives.tests/test_preflight_dedup_telemetry.py(new, 15 tests)_sync_statebypassed_revision_unknown; two-call revision-bump scenario emits oneinvalidated_by_revision_bump; first call alone emits nothing; cache hit emits nothing; file_paths shift does NOT emit revision_bumpwrite_dedup_eventdirect contract tests: no-op when disabled, full-record shape when enabled,preflight_idomitted when NoneTest results
tests/test_preflight_dedup_telemetry.py(new)Dashboard hookup
Out of scope for this PR — that's a separate operations task. The emitted JSON shape:
{ "ts": "2026-05-12T...", "event_type": "preflight_dedup_decision", "reason": "invalidated_by_revision_bump", "session_id": "session-xyz", "preflight_id": "pf-abc" }Aggregate by
(reason, day)for the time-series dashboard panel. Pair with therecently_checkedevents to compute the invalidation ratio =invalidated_by_revision_bump / (invalidated_by_revision_bump + recently_checked). A healthy ratio means the new key shape is catching real mutations; near-zero means either no mutations are happening or the new key is too coarse.Out of scope
Test plan
BICAMERAL_TELEMETRY=preflight, force a revision bump (ratify a decision mid-session), trigger second same-topic preflight, confirm oneinvalidated_by_revision_bumprow in~/.bicameral/preflight_events.jsonldecisionschema (renameupdated_at), confirm abypassed_revision_unknownrow appears🤖 Generated with Claude Code