vaaraio · May 28, 2026
@@ -18,13 +18,17 @@ dist_verify/
 research/
 .claude/
 
-# Private / internal — never publish
+# Private / internal, never publish
 *.tape
 .regwatch/
 scripts/regwatch*
 .shipped/
 .v0*_watch/
+application_*.pdf
+outbound_*.md
+site.py.live
 
 # Bench output (PAIR runs, dist-shift, vLLM logs). Reproducible by rerun.
 tests/adversarial/v031/
 .parachute/
+claude-code-audit.db
@@ -530,7 +530,7 @@ the closed-weight attacker patterns the v7 fold was missing.
   tampering rejection, canonicalization invariants, and TTL handling.
 
 ### Changed
-- Production classifier: v7 → v8. v7 retained on disk for cross-eval
+- Production classifier: v7 to v8. v7 retained on disk for cross-eval
   reproducibility. Threshold unchanged at 0.9006.
 - `attestation` optional extra: adds `rfc8785>=0.1.4` for JCS
   canonicalization.
@@ -1531,8 +1531,8 @@ unchanged in behaviour; this patch restores PyPI/npm version lockstep
 established in v0.15.0. No Python code changes versus 0.18.0.
 
 ### Changed
-- `clients/ts/package.json`: 0.17.0 → 0.18.1 (lockstep with PyPI).
-- `pyproject.toml`, `src/vaara/__init__.py`: 0.18.0 → 0.18.1.
+- `clients/ts/package.json`: 0.17.0 to 0.18.1 (lockstep with PyPI).
+- `pyproject.toml`, `src/vaara/__init__.py`: 0.18.0 to 0.18.1.
 
 ## [0.18.0] - 2026-05-17
 
@@ -1574,7 +1574,7 @@ TEE report is a sibling artefact bound to a specific envelope by placing
   non-SEV-SNP host error path.
 
 ### Not in this release
-- AMD KDS-based cert-chain validation (VCEK → ASK → ARK). Validating
+- AMD KDS-based cert-chain validation (VCEK to ASK to ARK). Validating
   against AMD's Key Distribution Service requires a network fetch
   against `https://kdsintf.amd.com/` and is tracked for v0.19+.
 - Live `/dev/sev-guest` ioctl emission. The `SNP_GET_REPORT` ioctl path
@@ -1681,8 +1681,8 @@ Node service) can call Vaara without spawning a Python sidecar.
   stays manual. Once enabled and an ``NPM_TOKEN`` secret is set, every
   tag push publishes ``@vaara/client`` to npm with provenance.
 - 6 new TypeScript tests covering URL construction, JSON body
-  serialisation, the 4xx → ``VaaraError`` path with server-supplied
-  code, the network-failure → ``VaaraTransportError`` path, the
+  serialisation, the 4xx to ``VaaraError`` path with server-supplied
+  code, the network-failure to ``VaaraTransportError`` path, the
   detector response shape, and constructor input validation.
 
 ### Notes
@@ -1737,7 +1737,7 @@ slot in alongside Vaara's adaptive scorer with a single object.
 close the most legible competitive gaps without diluting the kernel
 position. Hot policy reload meets the Galileo Agent Control selling
 point on its own ground. The OVERT 1.0 Phase 3 Independent Attestation
-Provider (IAP) reference closes the AAL-3 → AAL-4 promotion path that
+Provider (IAP) reference closes the AAL-3 to AAL-4 promotion path that
 v0.11.0's Provisional Receipt opens, so Vaara owns the full path
 without forcing dependence on an external IAP vendor. Named injection
 and PII detectors expose existing scoring surface under buyer-visible
@@ -1775,7 +1775,7 @@ facing visual artefact that the peer set has converged on.
   vaara-bench-v1's published numbers (heuristic fallback when the ml
   extra is absent; the `backend` field reports which path served the
   call). `vaara.detect.detect_pii` is a zero-dependency regex extractor
-  over six categories — email, phone, US SSN, IPv4, credit_card
+  over six categories: email, phone, US SSN, IPv4, credit_card
   (Luhn-checked), IBAN (mod-97 checksum). `POST /v1/detect/injection`
   and `POST /v1/detect/pii` mirror the CLI. `vaara detect injection`
   and `vaara detect pii` read text from `--text`, `--file`, or
@@ -2090,7 +2090,7 @@ their conformal interval, get claimed by an operator, and produce an
 - **`vaara.audit.review_queue` module.** `ReviewQueue` is a
   SQLite-backed queue in its own DB file, separate from the audit DB
   (which keeps its append-only invariant clean). Statuses:
-  `pending → claimed → resolved` happy path, `pending → expired`
+  `pending to claimed to resolved` happy path, `pending to expired`
   stale path. Resolutions: `allow`, `deny`, `abstain`. `enqueue`
   records each item with the conformal interval, risk signals,
   bucket category, and request parameters/context as JSON. The
@@ -2183,7 +2183,7 @@ No functional code changes. v0.6.0 users are on the same code. V0.6.1 only refre
 - **`scripts/lint_full.sh` pre-push lint sweep** - chains `ruff` (style + correctness), `bandit` (security), `mypy` (types - strict on `vaara.policy`, lenient on legacy modules), and `pytest`. Documented in CONTRIBUTING.md. Catches CodeRabbit-class findings before they hit a PR review round-trip. New dev extras: `bandit>=1.7.5`, `mypy>=1.8`. Bandit configured in `pyproject.toml` to skip B608 across `audit/sqlite_backend.py` (all f-string SQL there interpolates only internally-controlled tenant clauses, not user input). Two `# nosec` annotations document the remaining trusted-bundle and synthetic-trace-RNG sites.
 
 ### Changed
-- **Audit DB schema v2 → v3.** Migration `_MIGRATIONS[2]` adds four nullable transparency columns to `audit_records`. Pre-v0.6 records get NULL for the new columns. Their stored `record_hash` is preserved (NOT re-hashed on load), so chain verification of historical records continues to work.
+- **Audit DB schema v2 to v3.** Migration `_MIGRATIONS[2]` adds four nullable transparency columns to `audit_records`. Pre-v0.6 records get NULL for the new columns. Their stored `record_hash` is preserved (NOT re-hashed on load), so chain verification of historical records continues to work.
 - **COMPLIANCE.md "Current limits"** replaced placeholder bullets with v0.6 measurement results:
   - **Distribution-shift split.** Hand-curated (held-out, 250): attack recall 97.1% / benign FPR 70.0%. LLM-generated (in-sample, 5,705): attack recall 95.2% / benign FPR 87.5%. The 18pp benign-FPR gap is the dominant distribution-shift signal.
   - **Stack composition.** `heuristic_only` recall 35% / 63%. `classifier_only` recall 94% / 86%. `full_stack` recall 97% / 98%. Layers not redundant - heuristic catches a small set of attacks the classifier misses (justifies the ensemble). Most full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades.
@@ -2211,8 +2211,8 @@ No functional code changes. v0.6.0 users are on the same code. V0.6.1 only refre
 - `tests/test_adversarial_classifier_integration.py` covers the bundle-load, score-range, and known-bad-input paths end-to-end. Skipped when `vaara[ml]` extras are not installed.
 
 ### Changed
-- **Default classifier threshold: `0.5` → `0.55`.** Justified by threshold sweep on the rebalanced corpus: 0.55 is the operating point that clears the FPR and jailbreak-recall gates (global benign FPR ≤ 25%, jailbreak recall ≥ 60%) and passes the canonical preflight smoke test, while staying close to v0.5.2's balanced-accuracy band.
-- **Bundle format `version` bumped `1.1` → `1.4`.** Trained on the full 5,955-entry corpus (3,422 attack / 2,533 benign). Feature schema unchanged from v1.1 (236 features), so `_STATIC_FEATURES` schema-drift check passes without modification.
+- **Default classifier threshold: `0.5` to `0.55`.** Justified by threshold sweep on the rebalanced corpus: 0.55 is the operating point that clears the FPR and jailbreak-recall gates (global benign FPR ≤ 25%, jailbreak recall ≥ 60%) and passes the canonical preflight smoke test, while staying close to v0.5.2's balanced-accuracy band.
+- **Bundle format `version` bumped `1.1` to `1.4`.** Trained on the full 5,955-entry corpus (3,422 attack / 2,533 benign). Feature schema unchanged from v1.1 (236 features), so `_STATIC_FEATURES` schema-drift check passes without modification.
 - **`scripts/train_adversarial_classifier.py`** now coerces non-dict `context` and `parameters` entries (string-typed entries existed in the corpus from v0.5.0 onward but the trainer crashed on them) and runs `baseline_predictions` in `best_effort=True` mode. Net effect: trainer runs cleanly on the heterogeneous corpus.
 
 ### Benchmarks (5-fold CV OOF, threshold 0.55)
@@ -2259,7 +2259,7 @@ Per-category allow-leakage on the seed corpus (`tests/adversarial/<category>.jso
 | destructive_actions | 20% | **4%** |
 
 ### Known limits / honest read
-- Aggregate balanced accuracy regressed **1.5pp** from v0.5.2 (80.9% → 79.4%) and attack recall regressed **5.4pp** (85.2% → 79.8%). The trade is justified by the **+78.3pp** jailbreak recall delta and the **−2.3pp** FPR improvement, plus the cleaner edge-case behaviour evidenced by the preflight smoke test. v0.5.2's 80.9% balanced accuracy was partly inflated by counting jailbreak as "in scope" while the classifier scored 0% on it.
+- Aggregate balanced accuracy regressed **1.5pp** from v0.5.2 (80.9% to 79.4%) and attack recall regressed **5.4pp** (85.2% to 79.8%). The trade is justified by the **+78.3pp** jailbreak recall delta and the **−2.3pp** FPR improvement, plus the cleaner edge-case behaviour evidenced by the preflight smoke test. v0.5.2's 80.9% balanced accuracy was partly inflated by counting jailbreak as "in scope" while the classifier scored 0% on it.
 - LLM-generated content shares Qwen-style writing. The distribution-shift gap between generated-test recall and hand-curated-held-out recall has **not** been measured separately in this release. It will be reported in v0.6. Hand-curated regression numbers above are evidence that transfer is happening, but a formal split is owed.
 - Attacker-as-iterative-PAIR ceiling has **not** been measured. `COMPLIANCE.md` does not yet quote an adaptive-ASR figure.
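
As an arithmetic cross-check on the figures above, the sketch below assumes the standard definition of balanced accuracy (the mean of attack recall and benign specificity, i.e. one minus FPR), which the changelog does not spell out. It combines the 79.8% attack recall with the 21.0% global FPR quoted for threshold 0.55. `balanced_accuracy` is an illustrative helper, not a Vaara function.

```python
def balanced_accuracy(recall: float, fpr: float) -> float:
    """Mean of true-positive rate (recall) and true-negative rate (1 - FPR)."""
    return (recall + (1.0 - fpr)) / 2.0


# Figures quoted in this changelog entry: 79.8% attack recall, 21.0% global FPR.
current = balanced_accuracy(recall=0.798, fpr=0.210)
print(f"{current:.1%}")  # prints 79.4%
```

Under that definition the result matches the reported 79.4% balanced accuracy.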
 
@@ -2281,10 +2281,10 @@ At threshold 0.55, the 21.0% global FPR is a **reviewer queue**, not a blast doo
 - `_STATIC_FEATURES` constant plus load-time schema-drift check in `src/vaara/adversarial_classifier.py`. A bundle whose `feature_names` tail diverges from the runtime static feature list now raises `ValueError` at construction time, pinpointing the first differing index. This class of bug is no longer shippable without failing loud.
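
A load-time schema-drift check of this kind could look roughly like the sketch below. It is not the code in `src/vaara/adversarial_classifier.py`: the function name, the error wording, and the three placeholder feature names are assumptions made for illustration. It compares the tail of a bundle's `feature_names` against a runtime list and raises `ValueError` at the first differing index.

```python
from typing import Sequence

# Hypothetical stand-in for the runtime list; the real _STATIC_FEATURES is not shown here.
STATIC_FEATURES = ("param_count", "has_url", "entropy")


def check_feature_schema(
    feature_names: Sequence[str],
    static_features: Sequence[str] = STATIC_FEATURES,
) -> None:
    """Raise ValueError if the tail of feature_names diverges from static_features."""
    n = len(static_features)
    if len(feature_names) < n:
        raise ValueError(
            f"bundle has {len(feature_names)} features, fewer than the {n} static features"
        )
    # Only the trailing n entries are compared; anything before them is ignored.
    tail = feature_names[len(feature_names) - n:]
    for i, (got, want) in enumerate(zip(tail, static_features)):
        if got != want:
            raise ValueError(
                f"feature schema drift at static index {i}: "
                f"bundle has {got!r}, runtime expects {want!r}"
            )
```

Calling the check from the classifier constructor means a mismatched bundle fails before any scoring happens.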
 
 ### Changed
-- Default threshold: `0.3` (v0.5.1) → `0.5`. Balanced accuracy peaks at 0.5 on the rebuilt bundle. The v0.5.1 claim of "52% recall, 3.3% FPR at threshold 0.3" was itself a recordkeeping error: the bundle saved `0.8`, not `0.3`, and those numbers were measured at 0.8.
+- Default threshold: `0.3` (v0.5.1) to `0.5`. Balanced accuracy peaks at 0.5 on the rebuilt bundle. The v0.5.1 claim of "52% recall, 3.3% FPR at threshold 0.3" was itself a recordkeeping error: the bundle saved `0.8`, not `0.3`, and those numbers were measured at 0.8.
 - `scripts/train_adversarial_classifier.py` `load_corpus` now uses `rglob` to recurse into `tests/adversarial/generated/` and `benign_generated/` automatically.
 - The `data_exfil` and `destructive_actions` regressions disclosed in v0.5.1 were artifacts of the broken bundle. The rebuilt classifier in v0.5.2 beats the heuristic in both: `destructive_actions` +40.2, `data_exfil` +24.7.
-- Bundle format `version` bumped 1.0 → 1.1.
+- Bundle format `version` bumped 1.0 to 1.1.
 
 ### Benchmarks (by-seed held-out, threshold 0.5)
 - Attack recall: **85.2%**

@@ -470,7 +470,7 @@ correspondence.
   policy id and violation reason.
 - **TOOL-1.4** (provisional receipt before execution, upgrade to full
   attestation after notary validation) - ✅ structurally at AAL-3,
-  with the AAL-3 → AAL-4 path now implementable in-tree. The Article
+  with the AAL-3 to AAL-4 path now implementable in-tree. The Article
   12 commit-prove receipt pair (shipped v0.10.0) is the Phase 2
   Provisional Receipt; the v0.11.0 OVERT Base Envelope is the
   attested form. v0.13.0 ships a reference Phase 3 IAP

@@ -41,7 +41,7 @@ primitive in Section 9, MEA-2.
   policy id and violation reason.
 - **TOOL-1.4** (provisional receipt before execution, upgrade to full
   attestation after notary validation) - ✅ structurally at AAL-3,
-  with the AAL-3 → AAL-4 path now implementable in-tree. The Article
+  with the AAL-3 to AAL-4 path now implementable in-tree. The Article
   12 commit-prove receipt pair (shipped v0.10.0) is the Phase 2
   Provisional Receipt. The v0.11.0 OVERT Base Envelope is the
   attested form. v0.13.0 ships a reference Phase 3 IAP

@@ -26,11 +26,11 @@ Held-out TEST recall 84.7% (95% Wilson [82.4, 86.7]) at FPR 4.1% [2.9, 5.7]. Pha
 - Classifier v9 with 236 hand-features + 384-dim MiniLM embeddings at calibrated threshold 0.9150 on held-out TEST n=1,827: recall 84.7% [82.4, 86.7] at FPR 4.1% [2.9, 5.7]
 - Multi-attacker PAIR robustness: 0/25 successes per attacker across Qwen2.5-32B, Qwen2.5-72B, Llama-3.3-70B hitting identical seed indices, Wilson upper 13.3%
 - BIPIA-pressure FPR on benign tool calls 1.2% [0.4, 3.6] across four agent backends, n=244 benign tool calls under `context.source=injected_via_bipia_<class>`
-- Chain of custody: corpus manifest SHA → split manifest SHA → training commit → bundle SHA, all locked and printed by every script
+- Chain of custody: corpus manifest SHA, split manifest SHA, training commit, bundle SHA, all locked and printed by every script
 - 140 µs mean / 210 µs p99 inference latency, commodity CPU
 - Distribution-free conformal coverage on the score
 - MWU regret bound O(sqrt(T log N))
-- [vaara-bench-v0.39](bench/vaara-bench-v0.39.md): current methodology, chain of custody, ship-gate record. v9 retrain on BIPIA-augmented corpus with follows upweighted (`--follow-weight 8.0`), calibrated to T=0.9150 at a 5% FPR target on v035 VAL. BIPIA-pressure FPR collapses from 35.2% on v8 to 1.2% on v9. In-distribution recall flat within Wilson intervals. Found-and-fixed in tree: auto-labeller `example.com` placeholder false-positive rule (42 → 14 true follows across four backends). Historical bench docs live under `bench/` for chain-of-custody continuity.
+- [vaara-bench-v0.39](bench/vaara-bench-v0.39.md): current methodology, chain of custody, ship-gate record. v9 retrain on BIPIA-augmented corpus with follows upweighted (`--follow-weight 8.0`), calibrated to T=0.9150 at a 5% FPR target on v035 VAL. BIPIA-pressure FPR collapses from 35.2% on v8 to 1.2% on v9. In-distribution recall flat within Wilson intervals. Found-and-fixed in tree: auto-labeller `example.com` placeholder false-positive rule (42 to 14 true follows across four backends). Historical bench docs live under `bench/` for chain-of-custody continuity.
 - [vaara-bench-v1](bench/vaara-bench-v1.md): 77-trace synthetic-corpus regression baseline with frozen methodology, 100% soft TPR, 0% hard FPR
 
 Each figure is reproducible from the public corpus or the bench pipeline in `bench/`.
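
The "Wilson upper 13.3%" figure for 0/25 PAIR successes can be checked by hand. The sketch below assumes the standard Wilson score interval at a two-sided 95% level, as the "95% Wilson" wording above suggests. `wilson_interval` is an illustrative helper, not part of the bench pipeline.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


low, high = wilson_interval(0, 25)
print(f"{high:.1%}")  # prints 13.3%
```

With zero observed successes the lower bound is 0 and the upper bound reduces to z² / (n + z²), so it is set by the sample size alone.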
@@ -87,7 +87,7 @@ else:
 
 The same data renders as a styled PDF for Notified Bodies (`vaara compliance report --format pdf`, requires `pip install 'vaara[pdf]'`), a static HTML dashboard (`vaara compliance dashboard`), or a Sigstore-signed regulator-handoff envelope (`vaara trail export`, optional ML-DSA-65 / FIPS 204 post-quantum signer via `pip install 'vaara[pq]'`).
 
-Each article verdict carries `verdict_inputs` (threshold-vs-observed snapshot), `verdict_reasons` (rationale lines), and `contributing_events` (the audit records the verdict sits on, with a `drill_down` of the data that fed the risk/decision/outcome). Reviewers can trace `status → threshold delta → concrete event` without re-running the engine.
+Each article verdict carries `verdict_inputs` (threshold-vs-observed snapshot), `verdict_reasons` (rationale lines), and `contributing_events` (the audit records the verdict sits on, with a `drill_down` of the data that fed the risk/decision/outcome). Reviewers can trace `status` to `threshold delta` to `concrete event` without re-running the engine.
 
 ## Framework adapters
 
@@ -155,7 +155,7 @@ if (r.decision === "deny") throw new Error("blocked");
 Four preset operating points for the risk thresholds, shaped like CPU power profiles:
 
 - `eco` (escalate 0.40, deny 0.60). Tight deny threshold cuts agent loops short on borderline risk. Pair with regex-first gating to short-circuit before any model forward pass.
-- `balanced` (0.55, 0.85). Vaara's default behaviour.
+- `balanced` (0.55, 0.85). Vaara's default behavior.
 - `performance` (0.70, 0.92). Looser thresholds let more through. For high-throughput pipelines where the deployer keeps tight action-class overrides on the few classes that matter.
 - `strict` (0.30, 0.55). Escalate-on-doubt. For incident response, audit prep, or production lockdown windows.
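
To illustrate how the four presets differ, here is a minimal sketch of a two-threshold decision rule. It assumes each pair is ordered (escalate, deny) as spelled out for `eco`, and that a score at or above a threshold triggers it. The `Profile` type and `decide` function are hypothetical names, not Vaara's API.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    escalate: float
    deny: float


# Threshold pairs copied from the preset list above.
PROFILES = {
    "eco": Profile(0.40, 0.60),
    "balanced": Profile(0.55, 0.85),
    "performance": Profile(0.70, 0.92),
    "strict": Profile(0.30, 0.55),
}


def decide(risk: float, profile: str = "balanced") -> str:
    """Map a risk score in [0, 1] to allow / escalate / deny (assumed >= semantics)."""
    p = PROFILES[profile]
    if risk >= p.deny:
        return "deny"
    if risk >= p.escalate:
        return "escalate"
    return "allow"


for name in PROFILES:
    print(name, decide(0.58, name))
```

Under these assumptions a risk score of 0.58 is escalated by `eco` and `balanced`, allowed by `performance`, and denied by `strict`.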
 
@@ -210,8 +210,8 @@ OVERT envelopes per governed interaction turn on with `--overt-signing-key`, `--
 
 Worked examples:
 
-- [`examples/github-mcp-proxy-demo/`](examples/github-mcp-proxy-demo/) — Vaara in front of [`github/github-mcp-server`](https://github.com/github/github-mcp-server), 42 tools, hash-chained audit trail recorded end-to-end.
-- [`examples/sap-mcp-proxy-demo/`](examples/sap-mcp-proxy-demo/) — Vaara in front of community SAP MCP servers ([`SAP/mdk-mcp-server`](https://github.com/SAP/mdk-mcp-server), [`mario-andreschak/mcp-abap-abap-adt-api`](https://github.com/mario-andreschak/mcp-abap-abap-adt-api), [`lemaiwo/btp-sap-odata-to-mcp-server`](https://github.com/lemaiwo/btp-sap-odata-to-mcp-server)).
+- [`examples/github-mcp-proxy-demo/`](examples/github-mcp-proxy-demo/): Vaara in front of [`github/github-mcp-server`](https://github.com/github/github-mcp-server), 42 tools, hash-chained audit trail recorded end-to-end.
+- [`examples/sap-mcp-proxy-demo/`](examples/sap-mcp-proxy-demo/): Vaara in front of community SAP MCP servers ([`SAP/mdk-mcp-server`](https://github.com/SAP/mdk-mcp-server), [`mario-andreschak/mcp-abap-abap-adt-api`](https://github.com/mario-andreschak/mcp-abap-abap-adt-api), [`lemaiwo/btp-sap-odata-to-mcp-server`](https://github.com/lemaiwo/btp-sap-odata-to-mcp-server)).
 
 ## OVERT 1.0 attestation
 

@@ -68,8 +68,8 @@ listed for that article, then evaluates in this order:
    produces `moderate`. Any qualifying records below those bars are
    `weak`. No records are `absent`.
 6. **Future-timestamp downgrade.** If any record carries a future
-   timestamp, strength drops one tier (`strong` → `moderate`,
-   `moderate` → `weak`). The freshness signal cannot be trusted when
+   timestamp, strength drops one tier (`strong` to `moderate`,
+   `moderate` to `weak`). The freshness signal cannot be trusted when
    the clock cannot be trusted.
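
The downgrade in step 6 can be sketched as a one-tier mapping applied when any record timestamp lies ahead of the evaluation clock. This is an illustration only: the function name and signature are hypothetical, and leaving `weak` and `absent` unchanged is an assumption, since the text names only the two transitions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# One-tier downgrade; tiers not listed here are returned unchanged (assumption).
_DOWNGRADE = {"strong": "moderate", "moderate": "weak"}


def apply_future_timestamp_downgrade(
    strength: str,
    record_times: Iterable[datetime],
    now: Optional[datetime] = None,
) -> str:
    """Drop strength one tier if any record is timestamped after `now`."""
    now = now or datetime.now(timezone.utc)
    if any(t > now for t in record_times):
        return _DOWNGRADE.get(strength, strength)
    return strength


now = datetime(2026, 5, 28, tzinfo=timezone.utc)
records = [now - timedelta(days=1), now + timedelta(hours=2)]
print(apply_future_timestamp_downgrade("strong", records, now=now))  # prints moderate
```

A single future-dated record is enough to trigger the downgrade.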
 
 ## EU AI Act per-article thresholds