6 changes: 5 additions & 1 deletion .gitignore
@@ -18,13 +18,17 @@ dist_verify/
research/
.claude/

-# Private / internal never publish
+# Private / internal, never publish
*.tape
.regwatch/
scripts/regwatch*
.shipped/
.v0*_watch/
application_*.pdf
outbound_*.md
site.py.live

# Bench output (PAIR runs, dist-shift, vLLM logs). Reproducible by rerun.
tests/adversarial/v031/
.parachute/
claude-code-audit.db
30 changes: 15 additions & 15 deletions CHANGELOG.md
@@ -530,7 +530,7 @@ the closed-weight attacker patterns the v7 fold was missing.
tampering rejection, canonicalization invariants, and TTL handling.

### Changed
-- Production classifier: v7 v8. v7 retained on disk for cross-eval
+- Production classifier: v7 to v8. v7 retained on disk for cross-eval
reproducibility. Threshold unchanged at 0.9006.
- `attestation` optional extra: adds `rfc8785>=0.1.4` for JCS
canonicalization.
@@ -1531,8 +1531,8 @@ unchanged in behaviour; this patch restores PyPI/npm version lockstep
established in v0.15.0. No Python code changes versus 0.18.0.

### Changed
-- `clients/ts/package.json`: 0.17.0 0.18.1 (lockstep with PyPI).
-- `pyproject.toml`, `src/vaara/__init__.py`: 0.18.0 0.18.1.
+- `clients/ts/package.json`: 0.17.0 to 0.18.1 (lockstep with PyPI).
+- `pyproject.toml`, `src/vaara/__init__.py`: 0.18.0 to 0.18.1.

## [0.18.0] - 2026-05-17

@@ -1574,7 +1574,7 @@ TEE report is a sibling artefact bound to a specific envelope by placing
non-SEV-SNP host error path.

### Not in this release
-- AMD KDS-based cert-chain validation (VCEK ASK ARK). Validating
+- AMD KDS-based cert-chain validation (VCEK to ASK to ARK). Validating
against AMD's Key Distribution Service requires a network fetch
against `https://kdsintf.amd.com/` and is tracked for v0.19+.
- Live `/dev/sev-guest` ioctl emission. The `SNP_GET_REPORT` ioctl path
@@ -1681,8 +1681,8 @@ Node service) can call Vaara without spawning a Python sidecar.
stays manual. Once enabled and an ``NPM_TOKEN`` secret is set, every
tag push publishes ``@vaara/client`` to npm with provenance.
- 6 new TypeScript tests covering URL construction, JSON body
-serialisation, the 4xx ``VaaraError`` path with server-supplied
-code, the network-failure ``VaaraTransportError`` path, the
+serialisation, the 4xx to ``VaaraError`` path with server-supplied
+code, the network-failure to ``VaaraTransportError`` path, the
detector response shape, and constructor input validation.

### Notes
@@ -1737,7 +1737,7 @@ slot in alongside Vaara's adaptive scorer with a single object.
close the most legible competitive gaps without diluting the kernel
position. Hot policy reload meets the Galileo Agent Control selling
point on its own ground. The OVERT 1.0 Phase 3 Independent Attestation
-Provider (IAP) reference closes the AAL-3 AAL-4 promotion path that
+Provider (IAP) reference closes the AAL-3 to AAL-4 promotion path that
v0.11.0's Provisional Receipt opens, so Vaara owns the full path
without forcing dependence on an external IAP vendor. Named injection
and PII detectors expose existing scoring surface under buyer-visible
@@ -1775,7 +1775,7 @@ facing visual artefact that the peer set has converged on.
vaara-bench-v1's published numbers (heuristic fallback when the ml
extra is absent; the `backend` field reports which path served the
call). `vaara.detect.detect_pii` is a zero-dependency regex extractor
-over six categories email, phone, US SSN, IPv4, credit_card
+over six categories: email, phone, US SSN, IPv4, credit_card
(Luhn-checked), IBAN (mod-97 checksum). `POST /v1/detect/injection`
and `POST /v1/detect/pii` mirror the CLI. `vaara detect injection`
and `vaara detect pii` read text from `--text`, `--file`, or
@@ -2090,7 +2090,7 @@ their conformal interval, get claimed by an operator, and produce an
- **`vaara.audit.review_queue` module.** `ReviewQueue` is a
SQLite-backed queue in its own DB file, separate from the audit DB
(which keeps its append-only invariant clean). Statuses:
-`pending claimed resolved` happy path, `pending expired`
+`pending to claimed to resolved` happy path, `pending to expired`
stale path. Resolutions: `allow`, `deny`, `abstain`. `enqueue`
records each item with the conformal interval, risk signals,
bucket category, and request parameters/context as JSON. The
@@ -2183,7 +2183,7 @@ No functional code changes. v0.6.0 users are on the same code. V0.6.1 only refre
- **`scripts/lint_full.sh` pre-push lint sweep** - chains `ruff` (style + correctness), `bandit` (security), `mypy` (types - strict on `vaara.policy`, lenient on legacy modules), and `pytest`. Documented in CONTRIBUTING.md. Catches CodeRabbit-class findings before they hit a PR review round-trip. New dev extras: `bandit>=1.7.5`, `mypy>=1.8`. Bandit configured in `pyproject.toml` to skip B608 across `audit/sqlite_backend.py` (all f-string SQL there interpolates only internally-controlled tenant clauses, not user input). Two `# nosec` annotations document the remaining trusted-bundle and synthetic-trace-RNG sites.

### Changed
-- **Audit DB schema v2 v3.** Migration `_MIGRATIONS[2]` adds four nullable transparency columns to `audit_records`. Pre-v0.6 records get NULL for the new columns. Their stored `record_hash` is preserved (NOT re-hashed on load), so chain verification of historical records continues to work.
+- **Audit DB schema v2 to v3.** Migration `_MIGRATIONS[2]` adds four nullable transparency columns to `audit_records`. Pre-v0.6 records get NULL for the new columns. Their stored `record_hash` is preserved (NOT re-hashed on load), so chain verification of historical records continues to work.
- **COMPLIANCE.md "Current limits"** replaced placeholder bullets with v0.6 measurement results:
- **Distribution-shift split.** Hand-curated (held-out, 250): attack recall 97.1% / benign FPR 70.0%. LLM-generated (in-sample, 5,705): attack recall 95.2% / benign FPR 87.5%. The 18pp benign-FPR gap is the dominant distribution-shift signal.
- **Stack composition.** `heuristic_only` recall 35% / 63%. `classifier_only` recall 94% / 86%. `full_stack` recall 97% / 98%. Layers not redundant - heuristic catches a small set of attacks the classifier misses (justifies the ensemble). Most full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades.
@@ -2211,8 +2211,8 @@ No functional code changes. v0.6.0 users are on the same code. V0.6.1 only refre
- `tests/test_adversarial_classifier_integration.py` covers the bundle-load, score-range, and known-bad-input paths end-to-end. Skipped when `vaara[ml]` extras are not installed.

### Changed
-- **Default classifier threshold: `0.5` `0.55`.** Justified by threshold sweep on the rebalanced corpus: 0.55 is the operating point that clears the FPR and jailbreak-recall gates (global benign FPR ≤ 25%, jailbreak recall ≥ 60%) and passes the canonical preflight smoke test, while staying close to v0.5.2's balanced-accuracy band.
-- **Bundle format `version` bumped `1.1` `1.4`.** Trained on the full 5,955-entry corpus (3,422 attack / 2,533 benign). Feature schema unchanged from v1.1 (236 features), so `_STATIC_FEATURES` schema-drift check passes without modification.
+- **Default classifier threshold: `0.5` to `0.55`.** Justified by threshold sweep on the rebalanced corpus: 0.55 is the operating point that clears the FPR and jailbreak-recall gates (global benign FPR ≤ 25%, jailbreak recall ≥ 60%) and passes the canonical preflight smoke test, while staying close to v0.5.2's balanced-accuracy band.
+- **Bundle format `version` bumped `1.1` to `1.4`.** Trained on the full 5,955-entry corpus (3,422 attack / 2,533 benign). Feature schema unchanged from v1.1 (236 features), so `_STATIC_FEATURES` schema-drift check passes without modification.
- **`scripts/train_adversarial_classifier.py`** now coerces non-dict `context` and `parameters` entries (string-typed entries existed in the corpus from v0.5.0 onward but the trainer crashed on them) and runs `baseline_predictions` in `best_effort=True` mode. Net effect: trainer runs cleanly on the heterogeneous corpus.

### Benchmarks (5-fold CV OOF, threshold 0.55)
@@ -2259,7 +2259,7 @@ Per-category allow-leakage on the seed corpus (`tests/adversarial/<category>.jso
| destructive_actions | 20% | **4%** |

### Known limits / honest read
-- Aggregate balanced accuracy regressed **1.5pp** from v0.5.2 (80.9% 79.4%) and attack recall regressed **5.4pp** (85.2% 79.8%). The trade is justified by the **+78.3pp** jailbreak recall delta and the **−2.3pp** FPR improvement, plus the cleaner edge-case behaviour evidenced by the preflight smoke test. v0.5.2's 80.9% balanced accuracy was partly inflated by counting jailbreak as "in scope" while the classifier scored 0% on it.
+- Aggregate balanced accuracy regressed **1.5pp** from v0.5.2 (80.9% to 79.4%) and attack recall regressed **5.4pp** (85.2% to 79.8%). The trade is justified by the **+78.3pp** jailbreak recall delta and the **−2.3pp** FPR improvement, plus the cleaner edge-case behaviour evidenced by the preflight smoke test. v0.5.2's 80.9% balanced accuracy was partly inflated by counting jailbreak as "in scope" while the classifier scored 0% on it.
- LLM-generated content shares Qwen-style writing. The distribution-shift gap between generated-test recall and hand-curated-held-out recall has **not** been measured separately in this release. It will be reported in v0.6. Hand-curated regression numbers above are evidence that transfer is happening, but a formal split is owed.
- Attacker-as-iterative-PAIR ceiling has **not** been measured. `COMPLIANCE.md` does not yet quote an adaptive-ASR figure.

@@ -2281,10 +2281,10 @@ At threshold 0.55, the 21.0% global FPR is a **reviewer queue**, not a blast doo
- `_STATIC_FEATURES` constant plus load-time schema-drift check in `src/vaara/adversarial_classifier.py`. A bundle whose `feature_names` tail diverges from the runtime static feature list now raises `ValueError` at construction time, pinpointing the first differing index. This class of bug is no longer shippable without failing loud.

### Changed
-- Default threshold: `0.3` (v0.5.1) `0.5`. Balanced accuracy peaks at 0.5 on the rebuilt bundle. The v0.5.1 claim of "52% recall, 3.3% FPR at threshold 0.3" was itself a recordkeeping error: the bundle saved `0.8`, not `0.3`, and those numbers were measured at 0.8.
+- Default threshold: `0.3` (v0.5.1) to `0.5`. Balanced accuracy peaks at 0.5 on the rebuilt bundle. The v0.5.1 claim of "52% recall, 3.3% FPR at threshold 0.3" was itself a recordkeeping error: the bundle saved `0.8`, not `0.3`, and those numbers were measured at 0.8.
- `scripts/train_adversarial_classifier.py` `load_corpus` now uses `rglob` to recurse into `tests/adversarial/generated/` and `benign_generated/` automatically.
- The `data_exfil` and `destructive_actions` regressions disclosed in v0.5.1 were artifacts of the broken bundle. The rebuilt classifier in v0.5.2 beats the heuristic in both: `destructive_actions` +40.2, `data_exfil` +24.7.
-- Bundle format `version` bumped 1.0 1.1.
+- Bundle format `version` bumped 1.0 to 1.1.

### Benchmarks (by-seed held-out, threshold 0.5)
- Attack recall: **85.2%**
2 changes: 1 addition & 1 deletion COMPLIANCE.md
@@ -470,7 +470,7 @@ correspondence.
policy id and violation reason.
- **TOOL-1.4** (provisional receipt before execution, upgrade to full
attestation after notary validation) - ✅ structurally at AAL-3,
-with the AAL-3 AAL-4 path now implementable in-tree. The Article
+with the AAL-3 to AAL-4 path now implementable in-tree. The Article
12 commit-prove receipt pair (shipped v0.10.0) is the Phase 2
Provisional Receipt; the v0.11.0 OVERT Base Envelope is the
attested form. v0.13.0 ships a reference Phase 3 IAP
2 changes: 1 addition & 1 deletion OVERT_CONTROLS.md
@@ -41,7 +41,7 @@ primitive in Section 9, MEA-2.
policy id and violation reason.
- **TOOL-1.4** (provisional receipt before execution, upgrade to full
attestation after notary validation) - ✅ structurally at AAL-3,
-with the AAL-3 AAL-4 path now implementable in-tree. The Article
+with the AAL-3 to AAL-4 path now implementable in-tree. The Article
12 commit-prove receipt pair (shipped v0.10.0) is the Phase 2
Provisional Receipt. The v0.11.0 OVERT Base Envelope is the
attested form. v0.13.0 ships a reference Phase 3 IAP
12 changes: 6 additions & 6 deletions README.md
@@ -26,11 +26,11 @@ Held-out TEST recall 84.7% (95% Wilson [82.4, 86.7]) at FPR 4.1% [2.9, 5.7]. Pha
- Classifier v9 with 236 hand-features + 384-dim MiniLM embeddings at calibrated threshold 0.9150 on held-out TEST n=1,827: recall 84.7% [82.4, 86.7] at FPR 4.1% [2.9, 5.7]
- Multi-attacker PAIR robustness: 0/25 successes per attacker across Qwen2.5-32B, Qwen2.5-72B, Llama-3.3-70B hitting identical seed indices, Wilson upper 13.3%
- BIPIA-pressure FPR on benign tool calls 1.2% [0.4, 3.6] across four agent backends, n=244 benign tool calls under `context.source=injected_via_bipia_<class>`
-- Chain of custody: corpus manifest SHAsplit manifest SHAtraining commit bundle SHA, all locked and printed by every script
+- Chain of custody: corpus manifest SHA, split manifest SHA, training commit, bundle SHA, all locked and printed by every script
- 140 µs mean / 210 µs p99 inference latency, commodity CPU
- Distribution-free conformal coverage on the score
- MWU regret bound O(sqrt(T log N))
-- [vaara-bench-v0.39](bench/vaara-bench-v0.39.md): current methodology, chain of custody, ship-gate record. v9 retrain on BIPIA-augmented corpus with follows upweighted (`--follow-weight 8.0`), calibrated to T=0.9150 at a 5% FPR target on v035 VAL. BIPIA-pressure FPR collapses from 35.2% on v8 to 1.2% on v9. In-distribution recall flat within Wilson intervals. Found-and-fixed in tree: auto-labeller `example.com` placeholder false-positive rule (42 14 true follows across four backends). Historical bench docs live under `bench/` for chain-of-custody continuity.
+- [vaara-bench-v0.39](bench/vaara-bench-v0.39.md): current methodology, chain of custody, ship-gate record. v9 retrain on BIPIA-augmented corpus with follows upweighted (`--follow-weight 8.0`), calibrated to T=0.9150 at a 5% FPR target on v035 VAL. BIPIA-pressure FPR collapses from 35.2% on v8 to 1.2% on v9. In-distribution recall flat within Wilson intervals. Found-and-fixed in tree: auto-labeller `example.com` placeholder false-positive rule (42 to 14 true follows across four backends). Historical bench docs live under `bench/` for chain-of-custody continuity.
- [vaara-bench-v1](bench/vaara-bench-v1.md): 77-trace synthetic-corpus regression baseline with frozen methodology, 100% soft TPR, 0% hard FPR

Each figure is reproducible from the public corpus or the bench pipeline in `bench/`.
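
The "Wilson upper 13.3%" figure for 0/25 PAIR successes can be checked by hand. The following is a minimal sketch of the standard Wilson score interval, assuming a two-sided 95% level (z = 1.96); `wilson_interval` is a hypothetical helper written for this illustration, not a function taken from the Vaara bench scripts.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 is an assumption (two-sided 95% level); the README does not
    state the exact z value used by the bench pipeline.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


# 0 successes out of 25 attempts, as in the PAIR robustness bullet.
lower, upper = wilson_interval(0, 25)
print(f"upper bound: {upper:.1%}")
```

With zero successes the upper bound reduces to z² / (n + z²), which is why it stays well above zero even when no attack succeeded.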
@@ -87,7 +87,7 @@ else:

The same data renders as a styled PDF for Notified Bodies (`vaara compliance report --format pdf`, requires `pip install 'vaara[pdf]'`), a static HTML dashboard (`vaara compliance dashboard`), or a Sigstore-signed regulator-handoff envelope (`vaara trail export`, optional ML-DSA-65 / FIPS 204 post-quantum signer via `pip install 'vaara[pq]'`).

-Each article verdict carries `verdict_inputs` (threshold-vs-observed snapshot), `verdict_reasons` (rationale lines), and `contributing_events` (the audit records the verdict sits on, with a `drill_down` of the data that fed the risk/decision/outcome). Reviewers can trace `statusthreshold deltaconcrete event` without re-running the engine.
+Each article verdict carries `verdict_inputs` (threshold-vs-observed snapshot), `verdict_reasons` (rationale lines), and `contributing_events` (the audit records the verdict sits on, with a `drill_down` of the data that fed the risk/decision/outcome). Reviewers can trace `status` to `threshold delta` to `concrete event` without re-running the engine.

## Framework adapters

@@ -155,7 +155,7 @@ if (r.decision === "deny") throw new Error("blocked");
Four preset operating points for the risk thresholds, shaped like CPU power profiles:

- `eco` (escalate 0.40, deny 0.60). Tight deny threshold cuts agent loops short on borderline risk. Pair with regex-first gating to short-circuit before any model forward pass.
-- `balanced` (0.55, 0.85). Vaara's default behaviour.
+- `balanced` (0.55, 0.85). Vaara's default behavior.
- `performance` (0.70, 0.92). Looser thresholds let more through. For high-throughput pipelines where the deployer keeps tight action-class overrides on the few classes that matter.
- `strict` (0.30, 0.55). Escalate-on-doubt. For incident response, audit prep, or production lockdown windows.
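
To make the four operating points concrete, here is a minimal sketch of how a risk score might map to a decision under each profile. It is an illustration only, not Vaara's API: `Profile`, `PROFILES`, and `decide` are hypothetical names, the pairs are read as (escalate, deny) by analogy with the `eco` entry, and the inclusive `>=` comparisons are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    escalate: float  # scores at or above this are escalated (assumed inclusive)
    deny: float      # scores at or above this are denied (assumed inclusive)


# Threshold values copied from the list above; the structure is hypothetical.
PROFILES = {
    "eco": Profile(escalate=0.40, deny=0.60),
    "balanced": Profile(escalate=0.55, deny=0.85),
    "performance": Profile(escalate=0.70, deny=0.92),
    "strict": Profile(escalate=0.30, deny=0.55),
}


def decide(score: float, profile: str = "balanced") -> str:
    """Map a risk score in [0, 1] to 'allow', 'escalate', or 'deny'."""
    p = PROFILES[profile]
    if score >= p.deny:
        return "deny"
    if score >= p.escalate:
        return "escalate"
    return "allow"


# The same score lands in different bands depending on the profile.
for name in PROFILES:
    print(name, decide(0.58, name))
```

Under these assumptions a score of 0.58 is denied by `strict`, escalated by `eco` and `balanced`, and allowed by `performance`.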

@@ -210,8 +210,8 @@ OVERT envelopes per governed interaction turn on with `--overt-signing-key`, `--

Worked examples:

-- [`examples/github-mcp-proxy-demo/`](examples/github-mcp-proxy-demo/) Vaara in front of [`github/github-mcp-server`](https://github.com/github/github-mcp-server), 42 tools, hash-chained audit trail recorded end-to-end.
-- [`examples/sap-mcp-proxy-demo/`](examples/sap-mcp-proxy-demo/) Vaara in front of community SAP MCP servers ([`SAP/mdk-mcp-server`](https://github.com/SAP/mdk-mcp-server), [`mario-andreschak/mcp-abap-abap-adt-api`](https://github.com/mario-andreschak/mcp-abap-abap-adt-api), [`lemaiwo/btp-sap-odata-to-mcp-server`](https://github.com/lemaiwo/btp-sap-odata-to-mcp-server)).
+- [`examples/github-mcp-proxy-demo/`](examples/github-mcp-proxy-demo/): Vaara in front of [`github/github-mcp-server`](https://github.com/github/github-mcp-server), 42 tools, hash-chained audit trail recorded end-to-end.
+- [`examples/sap-mcp-proxy-demo/`](examples/sap-mcp-proxy-demo/): Vaara in front of community SAP MCP servers ([`SAP/mdk-mcp-server`](https://github.com/SAP/mdk-mcp-server), [`mario-andreschak/mcp-abap-abap-adt-api`](https://github.com/mario-andreschak/mcp-abap-abap-adt-api), [`lemaiwo/btp-sap-odata-to-mcp-server`](https://github.com/lemaiwo/btp-sap-odata-to-mcp-server)).

## OVERT 1.0 attestation

4 changes: 2 additions & 2 deletions VERDICTS.md
@@ -68,8 +68,8 @@ listed for that article, then evaluates in this order:
produces `moderate`. Any qualifying records below those bars are
`weak`. No records are `absent`.
6. **Future-timestamp downgrade.** If any record carries a future
-timestamp, strength drops one tier (`strong` `moderate`,
-`moderate` `weak`). The freshness signal cannot be trusted when
+timestamp, strength drops one tier (`strong` to `moderate`,
+`moderate` to `weak`). The freshness signal cannot be trusted when
the clock cannot be trusted.
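
The future-timestamp downgrade in step 6 can be sketched as below. This is a hedged illustration, not the engine's code: `apply_future_timestamp_downgrade` and `_DOWNGRADE` are hypothetical names, timestamps are assumed to be timezone-aware `datetime` values, and leaving `weak` and `absent` unchanged is an assumption, since the text names only the two transitions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Only the two transitions named in step 6; other tiers pass through (assumed).
_DOWNGRADE = {"strong": "moderate", "moderate": "weak"}


def apply_future_timestamp_downgrade(
    strength: str,
    record_timestamps: Iterable[datetime],
    now: Optional[datetime] = None,
) -> str:
    """Drop strength by one tier if any record is stamped later than `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    # A single future-dated record is enough; the drop is one tier at most.
    if any(ts > now for ts in record_timestamps):
        return _DOWNGRADE.get(strength, strength)
    return strength


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
records = [now - timedelta(days=2), now + timedelta(hours=1)]
print(apply_future_timestamp_downgrade("strong", records, now=now))
```

Passing `now` explicitly keeps the check deterministic in tests; how the real evaluator obtains its reference clock is not stated in this hunk.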

## EU AI Act per-article thresholds