diff --git a/CHANGELOG.md b/CHANGELOG.md index 7fb434a..cd0184a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -42,15 +42,15 @@ rebuttable-presumption hook), transposition deadline 9 December 2026. statistics over Vaara's per-action conformal prediction intervals alongside the standard binomial CI. Carries the same non-parametric coverage guarantee with no distributional assumption. - Standard OVERT verifiers ignore the extension; Vaara-aware + Standard OVERT verifiers ignore the extension. Vaara-aware verifiers cross-check it against the per-action receipts. - **vaara-bench-v1.** Versioned adversarial-detection benchmark with frozen corpus (`bench/adversarial_corpus.jsonl`, SHA-256 `7a3219776e1c93a5127ab3b63832d73ba75f32fa044cabdbaa4e5d7088b33ff2`), frozen methodology (`bench/scorer_eval.py`), and frozen headline numbers under v0.11.0 (soft TPR 100%, soft FPR 20%, hard TPR - 28.85%, hard FPR 0%). Spec doc at `bench/vaara-bench-v1.md`; - machine-readable results at `bench/vaara-bench-v1-results.json`. + 28.85%, hard FPR 0%). Spec doc at `bench/vaara-bench-v1.md`. + Machine-readable results at `bench/vaara-bench-v1-results.json`. Apache-2.0 licensed. - 18 new tests in `tests/test_attestation_s3p.py` covering Clopper-Pearson against textbook values, the k=0 and k=n @@ -100,7 +100,7 @@ statement in COMPLIANCE.md. request commitments per Annex B.4 and SHA-256 encoder-identity derivation. IEEE-754 floats are rejected at the canonical-encoding boundary per Protocol Profile 1.0 numeric rules. Vaara operates as - the **Arbiter** in OVERT terms; external Independent Attestation + the **Arbiter** in OVERT terms. External Independent Attestation Providers promote AAL-3 emission to AAL-4 by attaching Phase 3 notary signatures and transparency-log inclusion proofs. Public API: `BaseEnvelope`, `emit_base_envelope`, `verify_base_envelope`, @@ -118,7 +118,7 @@ statement in COMPLIANCE.md. 
3(45), not a qualified trust service under Article 3(16)) and a position statement relative to OVERT 1.0 (Glacis Technologies). Vaara is structurally independent of the agent it governs and maps to - OVERT AAL-3 operator-controlled attestation; reaching AAL-4 requires + OVERT AAL-3 operator-controlled attestation. Reaching AAL-4 requires pairing Vaara with an external Independent Attestation Provider. The design admits an external IAP layer without internal change. @@ -127,7 +127,7 @@ statement in COMPLIANCE.md. **Theme: Vaara as the kernel others build around.** v0.10.0 ships the network-callable surface, the auditor-facing evidence artefact, and the offline-verifiable receipt pair. Each of the three pieces is additive -and backward-compatible; together they reposition Vaara from a Python +and backward-compatible. Together they reposition Vaara from a Python library to a runtime kernel that control planes, audit consumers, and orchestration frameworks reference. The HTTP contract at `docs/openapi.yaml` is versioned `/v1/` independently of the project @@ -139,13 +139,13 @@ version, following the OPA pattern. contract in `docs/openapi.yaml`. Endpoints: `POST /v1/score`, `POST /v1/score/outcome`, `POST /v1/audit/events`, `GET /v1/audit/actions/{action_id}/chain`, `POST /v1/audit/verify`, - `GET /v1/server`, `GET /v1/health`. The spec is authoritative; the + `GET /v1/server`, `GET /v1/health`. The spec is authoritative. The reference server in `src/vaara/server/` is a FastAPI implementation suitable for local development and modest production loads. - **`vaara serve`** CLI subcommand. - **OpenAPI 3.1 contract at `docs/openapi.yaml`.** Stable v1 surface, intended as the integration point for control planes, orchestration - frameworks, and audit consumers. Vaara defines the interface; the + frameworks, and audit consumers. Vaara defines the interface. The vendors call it. - 11 new HTTP server tests (`tests/test_server.py`). 
- **Auditor-facing evidence report rendering.** New module @@ -193,7 +193,7 @@ it governs. `validate_source(source, fmt="auto")` combines load and check so a single call yields `(policy, report)` or `(None, report-with-error)`. Stable JSON shape via `ValidationReport.to_dict()`. -- **`vaara.policy.test_cases` module — Conftest analog for Vaara +- **`vaara.policy.test_cases` module - Conftest analog for Vaara policies.** `evaluate(policy, action_class, risk_score, matched_sequences=())` is the underlying primitive: applies any matched sequence pattern boosts (capped at 1.0), resolves the @@ -216,7 +216,7 @@ it governs. subcommands. Both honour standard CI exit codes: validate returns 1 on parse errors (warnings do not flip), test returns 1 on any failed case (and 2 if the policy itself fails to parse). -- **`examples/policies/test_cases.yaml`** — six worked test cases +- **`examples/policies/test_cases.yaml`** - six worked test cases exercising thresholds, sequence-pattern boost, default and article-matched escalation routes against `examples/policies/full.yaml`. @@ -235,7 +235,7 @@ it governs. ### Note Backwards-compatible. Pure addition. No existing module signatures -change. `Policy` and the load path are unchanged; the new modules +change. `Policy` and the load path are unchanged. The new modules sit beside them under `vaara.policy.*`. ### Provenance note @@ -273,7 +273,7 @@ publishes, the schema_version bumps and new fields land additively. - **`vaara trail export-incident` CLI subcommand.** Reads a trail JSONL plus an operator-supplied incident metadata JSON, writes the report to the output path. Picks the most recent trigger-eligible record by - default; explicit `--trigger-record-id ID` overrides. No external + default. Explicit `--trigger-record-id ID` overrides. No external template dependency, zero new runtime deps. 
- **`tests/test_incident_export.py`** covers schema shape, deadline mapping per Article 3(49) sub-category, trigger-event validation, @@ -290,7 +290,7 @@ concept, so no `AuditRecord` column was added. **Theme: human-in-the-loop review queue (Article 14).** Adds the storage layer and operator surface that turn an `escalate` decision into a substantive Article 14(4)(d) override path. The pipeline -already wrote `ESCALATION_SENT` for every escalated action; with a +already wrote `ESCALATION_SENT` for every escalated action. With a queue wired in, those actions now wait in a queryable place with their conformal interval, get claimed by an operator, and produce an `ESCALATION_RESOLVED` audit record when resolved. @@ -302,20 +302,20 @@ their conformal interval, get claimed by an operator, and produce an `pending → claimed → resolved` happy path, `pending → expired` stale path. Resolutions: `allow`, `deny`, `abstain`. `enqueue` records each item with the conformal interval, risk signals, - bucket category, and request parameters/context as JSON; the + bucket category, and request parameters/context as JSON. The interval is what makes Article 14 oversight substantive rather - than cosmetic (see `COMPLIANCE.md`). `claim` is optimistic — + than cosmetic (see `COMPLIANCE.md`). `claim` is optimistic - concurrent claim races resolve with one winner and `InvalidTransitionError` for the loser. `resolve` accepts an optional `trail` and writes `ESCALATION_RESOLVED` so the Article 14(4)(d) evidence row lands on the hash chain. `expire_stale` - marks pending items past a timeout; claimed items are left alone + marks pending items past a timeout. Claimed items are left alone since they are under active review. - **`InterceptionPipeline(review_queue=...)`.** Optional constructor parameter. When supplied, every `escalate` decision is enqueued alongside the existing `ESCALATION_SENT` audit record. Default `None` preserves prior behaviour bit-for-bit. 
Queue write failure - logs and continues — the action is already gated by the escalate + logs and continues - the action is already gated by the escalate verdict and the audit record stands. - **`vaara review` CLI.** Subcommands `list`, `show`, `claim`, `resolve`, `expire`. `resolve --audit-db PATH` writes the @@ -333,8 +333,8 @@ their conformal interval, get claimed by an operator, and produce an subcommand including `resolve --audit-db` writing the audit row. ### Note -Backwards-compatible. Pure addition. No existing schemas migrate; -the queue lives in its own DB file with its own `review_queue_meta` +Backwards-compatible. Pure addition. No existing schemas migrate. +The queue lives in its own DB file with its own `review_queue_meta` schema-version row. ## [0.7.0] - 2026-05-10 @@ -366,41 +366,41 @@ Backwards-compatible release. All four PRs are additive. Mondrian is opt-in via **Theme: documentation sync to PyPI.** v0.6.0 shipped the functional changes (policy DSL, retention purge, transparency taxonomy, distribution-shift / stack-ablation / PAIR evals, lint sweep) but the README, library docstring, example file headers, and PyPI tagline stayed at the v0.5.0 framing. PyPI's package page kept publishing pre-rebalance numbers and the wrong default threshold. v0.6.1 ships only the documentation cleanup so new PyPI installs see the current state. ### Changed -- `README.md`: replaced the "Numbers" section with the v0.6 distribution-shift table (97.1% recall / 70.0% FPR hand-curated held-out; 95.2% / 87.5% LLM-generated in-sample) plus PAIR ASR 0.0% (0/25). Threshold default 0.5 -> 0.55, corpus description updated to the 5,955-entry rebalanced corpus, threshold-direction note corrected (recall drops as threshold rises). PR #44. -- `src/vaara/adversarial_classifier.py`: rewrote the module docstring. Removed all version-bound numbers; readers now point at README + COMPLIANCE so the docstring does not go stale on every release. PR #44. 
+- `README.md`: replaced the "Numbers" section with the v0.6 distribution-shift table (97.1% recall / 70.0% FPR hand-curated held-out, 95.2% / 87.5% LLM-generated in-sample) plus PAIR ASR 0.0% (0/25). Threshold default 0.5 -> 0.55, corpus description updated to the 5,955-entry rebalanced corpus, threshold-direction note corrected (recall drops as threshold rises). PR #44. +- `src/vaara/adversarial_classifier.py`: rewrote the module docstring. Removed all version-bound numbers. Readers now point at README + COMPLIANCE so the docstring does not go stale on every release. PR #44. - `examples/adversarial_classifier.py`: ship-note threshold 0.8 -> 0.55. PR #44. - `scripts/classifier_vs_heuristic.py`: clarified the script is the v0.5.0 historical reproducer, not the current production training path. PR #44. - `pyproject.toml`: rephrased description for cleaner PyPI tagline rendering. PR #45. ### Note -No functional code changes. v0.6.0 users are on the same code; v0.6.1 only refreshes documentation surfaces visible to new PyPI installs and to anyone reading the package source. +No functional code changes. v0.6.0 users are on the same code. v0.6.1 only refreshes documentation surfaces visible to new PyPI installs and to anyone reading the package source. ## [0.6.0] - 2026-04-27 **Theme: standards alignment + legibility.** v0.5.x was the capability axis (jailbreak coverage closed, classifier rebalanced). v0.6 is the legibility axis: policies become readable, audit records become standards-aligned, adversarial numbers become honest, architecture contribution becomes documented. ### Added -- **`vaara.policy` package — JSON-native policy loader plus optional YAML via `vaara[yaml]` extra.** Frozen dataclasses for action classes, threshold curves, sequence patterns, and escalation routes. Hand-rolled validation with field-path error messages. Reuses existing `vaara.taxonomy.actions` enums verbatim. 
Threshold partial-overrides supported (set just `deny`, inherit default `escalate`). Implements Sketch A from the v0.6 DSL design exploration; embedded Python DSL (Sketch B) and standalone DSL (Sketch C) stay deferred to v0.7+ pending external pull. -- **`vaara trail purge --db PATH --retention-days N (--tenant TID | --all-tenants) [--dry-run]` CLI subcommand** plus `SQLiteAuditBackend.purge_older_than(seconds, *, dry_run=False)` Python API. Article 12(2) retention enforcement. Tenant scoping is required: pick `--tenant TID` for a single tenant or `--all-tenants` explicitly, so a shared multi-tenant audit DB can never be silently purged across all tenants. Hash-chain integrity: surviving records still reference deleted predecessors via `previous_hash`, leaving a documented seam at the retention boundary that subsequent loads expose as a hash mismatch. Intended workflow: export a signed handoff zip BEFORE purging, archive externally, then purge. The signed zip remains self-consistent forever; the live DB chain has the seam. -- **prEN ISO/IEC 12792 four-axis transparency taxonomy on `AuditRecord`.** Four optional fields (`system_operation`, `data_usage`, `decision_making`, `limitations`) with default-classification heuristic per `EventType`. Per-record override via construction kwargs. NOT tamper-evident in v0.6 — fields are metadata annotations excluded from `record_hash` so pre-v0.6 chains stay valid. v0.7+ may add a separate signing mechanism if compliance requires. -- **`scripts/eval_distribution_shift.py`** — runs the full Vaara stack against the adversarial corpus with per-source tagging (hand-curated vs LLM-generated). Reports recall and FPR per source/class. -- **`scripts/eval_stack_ablation.py`** — runs three configurations (heuristic-only, classifier-only, full-stack) against the same corpus. Quantifies the independent contribution of each layer. -- **`scripts/eval_pair_attack.py`** — PAIR (Chao et al. 2023) iterative adaptive attacker. 
Uses an OpenAI-compatible vLLM endpoint for both attacker and judge roles. Zero new runtime deps (uses `urllib.request`). +- **`vaara.policy` package - JSON-native policy loader plus optional YAML via `vaara[yaml]` extra.** Frozen dataclasses for action classes, threshold curves, sequence patterns, and escalation routes. Hand-rolled validation with field-path error messages. Reuses existing `vaara.taxonomy.actions` enums verbatim. Threshold partial-overrides supported (set just `deny`, inherit default `escalate`). Implements Sketch A from the v0.6 DSL design exploration. Embedded Python DSL (Sketch B) and standalone DSL (Sketch C) stay deferred to v0.7+ pending external pull. +- **`vaara trail purge --db PATH --retention-days N (--tenant TID | --all-tenants) [--dry-run]` CLI subcommand** plus `SQLiteAuditBackend.purge_older_than(seconds, *, dry_run=False)` Python API. Article 12(2) retention enforcement. Tenant scoping is required: pick `--tenant TID` for a single tenant or `--all-tenants` explicitly, so a shared multi-tenant audit DB can never be silently purged across all tenants. Hash-chain integrity: surviving records still reference deleted predecessors via `previous_hash`, leaving a documented seam at the retention boundary that subsequent loads expose as a hash mismatch. Intended workflow: export a signed handoff zip BEFORE purging, archive externally, then purge. The signed zip remains self-consistent forever. The live DB chain has the seam. +- **prEN ISO/IEC 12792 four-axis transparency taxonomy on `AuditRecord`.** Four optional fields (`system_operation`, `data_usage`, `decision_making`, `limitations`) with default-classification heuristic per `EventType`. Per-record override via construction kwargs. NOT tamper-evident in v0.6 - fields are metadata annotations excluded from `record_hash` so pre-v0.6 chains stay valid. v0.7+ may add a separate signing mechanism if compliance requires. 
+- **`scripts/eval_distribution_shift.py`** - runs the full Vaara stack against the adversarial corpus with per-source tagging (hand-curated vs LLM-generated). Reports recall and FPR per source/class. +- **`scripts/eval_stack_ablation.py`** - runs three configurations (heuristic-only, classifier-only, full-stack) against the same corpus. Quantifies the independent contribution of each layer. +- **`scripts/eval_pair_attack.py`** - PAIR (Chao et al. 2023) iterative adaptive attacker. Uses an OpenAI-compatible vLLM endpoint for both attacker and judge roles. Zero new runtime deps (uses `urllib.request`). - **`[yaml]` optional extra in `pyproject.toml`** (`pyyaml>=6.0`). Core `dependencies = []` preserved. - **`examples/policies/minimal.json` and `full.yaml`** as reference policies. -- **COMPLIANCE.md gains "EU AI Act Annex IV evidence sections"** (maps Vaara contribution per §1–§9; direct fill on §3, §5, §9; contributes on §2, §4, §6, §7; out of scope for §1, §8) **and "CEN-CENELEC harmonised standards alignment"** (per-standard table for ISO/IEC 42001, prEN 18286, prEN 18228, ISO/IEC 42006, prEN ISO/IEC 24970, prEN 18229-1, prEN ISO/IEC 12792). -- **`scripts/lint_full.sh` pre-push lint sweep** — chains `ruff` (style + correctness), `bandit` (security), `mypy` (types — strict on `vaara.policy`, lenient on legacy modules), and `pytest`. Documented in CONTRIBUTING.md. Catches CodeRabbit-class findings before they hit a PR review round-trip. New dev extras: `bandit>=1.7.5`, `mypy>=1.8`. Bandit configured in `pyproject.toml` to skip B608 across `audit/sqlite_backend.py` (all f-string SQL there interpolates only internally-controlled tenant clauses, not user input). Two `# nosec` annotations document the remaining trusted-bundle and synthetic-trace-RNG sites. +- **COMPLIANCE.md gains "EU AI Act Annex IV evidence sections"** (maps Vaara contribution per §1–§9. Direct fill on §3, §5, §9. Contributes on §2, §4, §6, §7. 
Out of scope for §1, §8) **and "CEN-CENELEC harmonised standards alignment"** (per-standard table for ISO/IEC 42001, prEN 18286, prEN 18228, ISO/IEC 42006, prEN ISO/IEC 24970, prEN 18229-1, prEN ISO/IEC 12792). +- **`scripts/lint_full.sh` pre-push lint sweep** - chains `ruff` (style + correctness), `bandit` (security), `mypy` (types - strict on `vaara.policy`, lenient on legacy modules), and `pytest`. Documented in CONTRIBUTING.md. Catches CodeRabbit-class findings before they hit a PR review round-trip. New dev extras: `bandit>=1.7.5`, `mypy>=1.8`. Bandit configured in `pyproject.toml` to skip B608 across `audit/sqlite_backend.py` (all f-string SQL there interpolates only internally-controlled tenant clauses, not user input). Two `# nosec` annotations document the remaining trusted-bundle and synthetic-trace-RNG sites. ### Changed -- **Audit DB schema v2 → v3.** Migration `_MIGRATIONS[2]` adds four nullable transparency columns to `audit_records`. Pre-v0.6 records get NULL for the new columns; their stored `record_hash` is preserved (NOT re-hashed on load), so chain verification of historical records continues to work. +- **Audit DB schema v2 → v3.** Migration `_MIGRATIONS[2]` adds four nullable transparency columns to `audit_records`. Pre-v0.6 records get NULL for the new columns. Their stored `record_hash` is preserved (NOT re-hashed on load), so chain verification of historical records continues to work. - **COMPLIANCE.md "Current limits"** replaced placeholder bullets with v0.6 measurement results: - **Distribution-shift split.** Hand-curated (held-out, 250): attack recall 97.1% / benign FPR 70.0%. LLM-generated (in-sample, 5,705): attack recall 95.2% / benign FPR 87.5%. The 18pp benign-FPR gap is the dominant distribution-shift signal. - - **Stack composition.** `heuristic_only` recall 35% / 63%. `classifier_only` recall 94% / 86%. `full_stack` recall 97% / 98%. 
Layers not redundant — heuristic catches a small set of attacks the classifier misses (justifies the ensemble). Most full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades. - - **PAIR adaptive-attacker calibration.** Qwen2.5-32B-Instruct as both attacker and judge, 25 hand-curated jailbreak seeds, max 5 iterations: **ASR 0.0% (0/25)**. NOT a claim of imperviousness to all adaptive attackers — stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR. + - **Stack composition.** `heuristic_only` recall 35% / 63%. `classifier_only` recall 94% / 86%. `full_stack` recall 97% / 98%. Layers not redundant - heuristic catches a small set of attacks the classifier misses (justifies the ensemble). Most full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades. + - **PAIR adaptive-attacker calibration.** Qwen2.5-32B-Instruct as both attacker and judge, 25 hand-curated jailbreak seeds, max 5 iterations: **ASR 0.0% (0/25)**. NOT a claim of imperviousness to all adaptive attackers - stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR. ### Deferred to v0.7+ -- **prEN ISO/IEC 24970 field-alias layer** — pending public final of the standard. Will land when 24970 publishes. -- **DORA mapping refinement** — pending deployer-side signal. Conservative defaults shipped in v0.5.3 stay until a financial deployer's input refines them. +- **prEN ISO/IEC 24970 field-alias layer** - pending public final of the standard. Will land when 24970 publishes. +- **DORA mapping refinement** - pending deployer-side signal. Conservative defaults shipped in v0.5.3 stay until a financial deployer's input refines them. ### Reproducible artifacts - `tests/adversarial/distribution_shift_v0_5_3.json` @@ -525,7 +525,7 @@ v0.5.1 remains on PyPI but ships a broken classifier. 
Upgrade to 0.5.2 to get th ### Changed - `AdversarialClassifier` retrained on an expanded benign corpus to reduce false-positive rate in live agent traffic. -- Recommended operating threshold changed from `0.8` to `0.3` — the added benigns shifted the score distribution, and 0.3 is now the optimal balanced-accuracy point. +- Recommended operating threshold changed from `0.8` to `0.3` - the added benigns shifted the score distribution, and 0.3 is now the optimal balanced-accuracy point. ### Added - `tests/adversarial/benign_generated/BT-new-http_post.jsonl` (170 variants) @@ -541,7 +541,7 @@ v0.5.1 remains on PyPI but ships a broken classifier. Upgrade to 0.5.2 to get th ### Known regressions (disclosed) The new benigns shifted the decision surface toward allow. Per-category accuracy regressed in three attack categories: -- `data_exfil`: 0% (was 28.6% heuristic baseline — classifier now worse than heuristic here) +- `data_exfil`: 0% (was 28.6% heuristic baseline - classifier now worse than heuristic here) - `destructive_actions`: 25% (was 87.5% heuristic) - `jailbreak`: 0% (was 100% heuristic) @@ -550,12 +550,12 @@ The heuristic scorer retains strong coverage in these categories. Stack both rat ## [0.5.0] - 2026-04-23 ### Added -- `AdversarialClassifier` — opt-in XGBoost scorer for adversarial tool-call detection. Install with `pip install vaara[ml]`. -- `src/vaara/data/adversarial_classifier_v1.joblib` — 295 KB pre-trained bundle shipped with the wheel. -- `scripts/classifier_vs_heuristic.py` — reproducible comparison harness (by-seed train/test split, no leakage). -- `tests/adversarial/generated/` — 1945 LLM-generated adversarial variants across 8 attack categories (Qwen2.5-3B on MI300X, batched). -- `tests/adversarial/benign_generated/` — 635 benign variants (487 broad + 148 targeted at http_post/send_email/shell_exec legit uses). -- `[project.optional-dependencies.ml]` — `xgboost`, `scikit-learn`, `joblib`, `numpy`. Default install stays zero-dep. 
+- `AdversarialClassifier` - opt-in XGBoost scorer for adversarial tool-call detection. Install with `pip install vaara[ml]`. +- `src/vaara/data/adversarial_classifier_v1.joblib` - 295 KB pre-trained bundle shipped with the wheel. +- `scripts/classifier_vs_heuristic.py` - reproducible comparison harness (by-seed train/test split, no leakage). +- `tests/adversarial/generated/` - 1945 LLM-generated adversarial variants across 8 attack categories (Qwen2.5-3B on MI300X, batched). +- `tests/adversarial/benign_generated/` - 635 benign variants (487 broad + 148 targeted at http_post/send_email/shell_exec legit uses). +- `[project.optional-dependencies.ml]` - `xgboost`, `scikit-learn`, `joblib`, `numpy`. Default install stays zero-dep. ### Benchmarks On the by-seed held-out test (50 adversarial seeds held out entirely, plus 60 benign entries), classifier at threshold 0.8: @@ -564,7 +564,7 @@ On the by-seed held-out test (50 adversarial seeds held out entirely, plus 60 be - Balanced accuracy: 74% (vs 50% heuristic) - Latency: 140 µs mean, 210 µs p99 (commodity CPU, no GPU at inference) -Live dogfood on 174 Qwen-generated prompts with an end-to-end LangChain-style agent loop: recall 70%, FPR 51% — real-world distribution is tougher than the clean held-out test. Ship with `decision="escalate"` in production, not `decision="deny"`. +Live dogfood on 174 Qwen-generated prompts with an end-to-end LangChain-style agent loop: recall 70%, FPR 51% - real-world distribution is tougher than the clean held-out test. Ship with `decision="escalate"` in production, not `decision="deny"`. ## [0.4.4] - 2026-04-22 diff --git a/COMPLIANCE.md b/COMPLIANCE.md index 0843fe3..ef77df3 100644 --- a/COMPLIANCE.md +++ b/COMPLIANCE.md @@ -31,7 +31,7 @@ types that populate evidence for it. The list matches the default | **11(1)** | Technical Documentation | Checked outside the audit trail. Vaara does not replace the Annex IV technical file. 
| | **12(1)** | Record-Keeping (Logging) | Every `ACTION_REQUESTED`, `RISK_SCORED`, and `DECISION_MADE` is written to a hash-chained, tamper-evident trail. See "Audit trail integrity" below. | | **13(1)** | Transparency and Provision of Information | `RISK_SCORED` and `DECISION_MADE` records carry the risk score, the interval, the decision, and the reason string shown to the operator. | -| **14(1)** | Human Oversight -- Design | `ESCALATION_SENT` and `ESCALATION_RESOLVED` events prove the oversight path exists and was exercised. The `vaara.audit.review_queue` storage layer turns `escalate` into a substantive queued-for-review step rather than a fire-and-forget log line; the `vaara review` CLI is the operator surface. | +| **14(1)** | Human Oversight -- Design | `ESCALATION_SENT` and `ESCALATION_RESOLVED` events prove the oversight path exists and was exercised. The `vaara.audit.review_queue` storage layer turns `escalate` into a substantive queued-for-review step rather than a fire-and-forget log line. The `vaara review` CLI is the operator surface. | | **14(4)(d)** | Human Oversight -- Override Capability | `ESCALATION_RESOLVED` and `POLICY_OVERRIDE` events prove a human can decide not to proceed or can override Vaara's decision. The `vaara review resolve --audit-db PATH` CLI writes the `ESCALATION_RESOLVED` row directly from an operator action, so the override is a single recorded transaction rather than an out-of-band promise. | | **15(1)** | Accuracy, Robustness and Cybersecurity | `OUTCOME_RECORDED` events feed the adaptive scorer. Recency is tracked (default weekly calibration window). | | **61(1)** | Post-Market Monitoring | `OUTCOME_RECORDED` events form the post-market signal, tied back to the original action via `action_id`. 
| @@ -56,7 +56,7 @@ independently from the agent code that uses it: (parse errors plus warnings for narrow threshold bands, dangling per-class overrides, unreachable escalation routes, sequence steps not naming a declared action class, missing default escalation - route). Exit code 0 if no errors; warnings print without flipping + route). Exit code 0 if no errors. Warnings print without flipping the exit code. - `vaara policy test POLICY_PATH --cases CASES_PATH` runs a YAML/JSON cases file against the policy (Conftest analog for Vaara). Each @@ -131,7 +131,7 @@ finalize. Status as of April 2026. | **ISO/IEC 42006** Requirements for AI Management System Auditors | WG2 | DIS Stage 40 | Vaara's hash-chained trail is the artefact 42006-qualified auditors examine for surveillance evidence. | | **prEN ISO/IEC 24970** AI System Logging | WG3 | Stage 30.2 (comment resolution) | Vaara aligns with the tamper-resistance, decision-factor logging, and audit-system integration requirements. Field-level alignment pending the published version. | | **prEN 18229-1** Trustworthiness Framework Pt 1 (logging, transparency, human oversight) | WG4 | Public enquiry | Implements AI Act Articles 12-14, which Vaara already maps to in the article table above. Field-level alignment pending the published version. | -| **prEN ISO/IEC 12792** Transparency Taxonomy of AI Systems | WG4 | Stage 40 (final vote) | v0.6 ships per-action audit records tagged against the four-axis model (System Operation, Data Usage, Decision Making, Limitations) via four optional `AuditRecord` fields. Default classification heuristic by event type; per-record override available. NOT tamper-evident in v0.6 — fields are metadata annotations excluded from `record_hash` so pre-v0.6 chains stay valid. 
| +| **prEN ISO/IEC 12792** Transparency Taxonomy of AI Systems | WG4 | Stage 40 (final vote) | v0.6 ships per-action audit records tagged against the four-axis model (System Operation, Data Usage, Decision Making, Limitations) via four optional `AuditRecord` fields. Default classification heuristic by event type. Per-record override available. NOT tamper-evident in v0.6 - fields are metadata annotations excluded from `record_hash` so pre-v0.6 chains stay valid. | **What "alignment" means here.** Most of these standards have not published. The mapping above is pre-compliance positioning: Vaara is @@ -220,7 +220,7 @@ qualified seal or signature without changing the underlying evidence. ## Position relative to open runtime-attestation standards The runtime-attestation space is converging on the principle that -**self-attestation is not sufficient** — the entity attesting to +**self-attestation is not sufficient** - the entity attesting to governance should be structurally independent of the entity being governed. OVERT 1.0 (Glacis Technologies, overt.is) makes this explicit through its four-tier Attestation Assurance Level model, @@ -239,8 +239,8 @@ Vaara's position in this picture: operator.** The operator deploys Vaara in their own environment. In OVERT 1.0 terms this maps to AAL-3 (automated monitoring with operator-controlled infrastructure), not AAL-4. Reaching AAL-4 - requires layering an external IAP — a notary service that the - operator does not control — on top of Vaara's emitted evidence. + requires layering an external IAP - a notary service that the + operator does not control - on top of Vaara's emitted evidence. - **Vaara's design admits an external IAP layer without internal change.** The hash chain, the commit-prove receipt pair, and the HTTP API surface all produce structured, signable artefacts that @@ -251,7 +251,7 @@ Vaara's position in this picture: This positioning is deliberate. 
Vaara does not claim AAL-4 conformance and does not market a self-attestation pattern. Operators who need AAL-4 should pair Vaara with an independent -attestation provider; the Vaara-emitted evidence is the input to +attestation provider. The Vaara-emitted evidence is the input to that provider, not a replacement for it. ## OVERT 1.0 Part 3 (Agentic AI Controls) mapping @@ -263,141 +263,141 @@ the-loop attestation, and behavioural drift governance. The mapping below states, control by control, whether Vaara satisfies the requirement today (✅), partially satisfies it (◐), or leaves it as explicit gap-to-deployer or future-work (◯). Per OVERT Annex F.2 this -mapping does not establish legal compliance with any regulation; it +mapping does not establish legal compliance with any regulation. It records technical correspondence. -### Section 11 — Tool-Call Governance +### Section 11 - Tool-Call Governance -- **TOOL-1.1** (intercept all tool calls before execution) — ✅. - `InterceptionPipeline.intercept()` is the enforcement boundary; no +- **TOOL-1.1** (intercept all tool calls before execution) - ✅. + `InterceptionPipeline.intercept()` is the enforcement boundary. No tool call proceeds without a governance decision. -- **TOOL-1.2** (evaluate against capability policy) — ✅. The policy +- **TOOL-1.2** (evaluate against capability policy) - ✅. The policy DSL declares permitted tools, parameter ranges, destinations, and - approval gates; `policy.evaluate` returns the verdict carried in + approval gates. `policy.evaluate` returns the verdict carried in the per-call receipt. - **TOOL-1.3** (denial receipt with policy reference and violation - type) — ✅. Denials emit a `DENY` event on the hash chain with + type) - ✅. Denials emit a `DENY` event on the hash chain with policy id and violation reason. - **TOOL-1.4** (provisional receipt before execution, upgrade to full - attestation after notary validation) — ◐ at AAL-3. 
The Article 12 + attestation after notary validation) - ◐ at AAL-3. The Article 12 commit-prove receipt pair (shipped v0.10.0) is the Phase 2 Provisional Receipt. Phase 3 (full notary attestation) requires an external IAP per the OVERT-position section above. - **TOOL-2.1** (explicit function allowlist with hash in policy - attestation) — ✅. Policy hash flows into `encoder_binary_identity` + attestation) - ✅. Policy hash flows into `encoder_binary_identity` in the Base Envelope (v0.11.0). -- **TOOL-2.2** (parameter schema validation before execution) — ✅ - for declared parameter shapes; ◐ for arbitrary deep schemas (the +- **TOOL-2.2** (parameter schema validation before execution) - ✅ + for declared parameter shapes. ◐ for arbitrary deep schemas (the policy DSL is intentionally bounded). -- **TOOL-2.3** (rejection receipt with parameter violation detail) — +- **TOOL-2.3** (rejection receipt with parameter violation detail) - ✅. -- **TOOL-3.1** (per-tool rate limits with attested enforcement) — ◐. - The adaptive scorer applies velocity-aware risk signals; explicit +- **TOOL-3.1** (per-tool rate limits with attested enforcement) - ◐. + The adaptive scorer applies velocity-aware risk signals. Explicit per-tool calls-per-epoch counters are not yet emitted as standalone receipts. -- **TOOL-3.2** (per-session / per-user velocity caps) — ◐ via the +- **TOOL-3.2** (per-session / per-user velocity caps) - ◐ via the agent profile in `scorer/adaptive.py`. -- **TOOL-3.3** (circuit breakers on error / violation rate) — ◐ in - policy DSL; circuit-breaker receipt not yet a first-class artefact. -- **TOOL-3.4** (recursion-depth termination per trace_id) — ◯. - Not implemented; agent-loop termination is currently the deployer's +- **TOOL-3.3** (circuit breakers on error / violation rate) - ◐ in + policy DSL. Circuit-breaker receipt not yet a first-class artefact. +- **TOOL-3.4** (recursion-depth termination per trace_id) - ◯. + Not implemented. 
Agent-loop termination is currently the deployer's responsibility. -- **TOOL-4** (human approval gates) — ◐. The SQLite-backed review +- **TOOL-4** (human approval gates) - ◐. The SQLite-backed review queue (`vaara.audit.review_queue`) routes `ESCALATE` verdicts to human reviewers and records `ESCALATION_RESOLVED` events with reviewer identity, timestamp, and decision. TOOL-4.4 approval- velocity caps are not enforced. -- **TOOL-5** (tamper-evident tool-call log with epoch attestation) — +- **TOOL-5** (tamper-evident tool-call log with epoch attestation) - ✅ for TOOL-5.1 and TOOL-5.2 (hash-chained `AuditTrail`, Article 12 commit-prove receipt pair). TOOL-5.3 epoch notary attestation is the external-IAP layer. -### Section 11.5 — MCP Server Trust Governance +### Section 11.5 - MCP Server Trust Governance Vaara ships an MCP server (`vaara.integrations.mcp_server`) that -exposes governance tools to MCP clients; it does not currently act +exposes governance tools to MCP clients. It does not currently act as an MCP *client* governing tools hosted on third-party MCP servers. The MCP-1/2/3 control set therefore applies to Vaara only in the **custom (operator-hosted)** mode (MCP-2): the operator runs the Vaara MCP server in their own environment. -- **MCP-2.1** (server binary identity in co-epoch binding) — ◐ at +- **MCP-2.1** (server binary identity in co-epoch binding) - ◐ at v0.12.0: arbiter binary identity is captured in - `encoder_binary_identity`; a dedicated MCP-server binary identity + `encoder_binary_identity`. A dedicated MCP-server binary identity field is future work. -- **MCP-2.2** (network topology attestation) — ◯. Deployer concern; +- **MCP-2.2** (network topology attestation) - ◯. Deployer concern. Vaara does not measure its own network position. -- **MCP-2.3** (per-call authorization at the MCP server boundary) — +- **MCP-2.3** (per-call authorization at the MCP server boundary) - ✅. Every MCP tool invocation passes through `intercept()`. 
-- **MCP-2.4** (configuration change detection within an epoch) — ◯. +- **MCP-2.4** (configuration change detection within an epoch) - ◯. Future work. - **MCP-1** and **MCP-3** (managed-vendor and external-third-party - MCP servers) — outside Vaara's current surface. An operator using + MCP servers) - outside Vaara's current surface. An operator using Vaara as the *governor in front of* a third-party MCP server would - need adapter work; the architecture admits it but no implementation + need adapter work. The architecture admits it but no implementation ships today. -### Section 12 — Multi-Agent System Controls +### Section 12 - Multi-Agent System Controls -- **MULTI-1** (inter-agent trust boundaries) — ◯. Per-agent policy +- **MULTI-1** (inter-agent trust boundaries) - ◯. Per-agent policy evaluation works today (each `intercept()` call carries an `agent_id`), but agent-vs-agent trust separation is not enforced beyond what the deployment policy declares. -- **MULTI-2** (agent composition / topology attestation) — ◯. - Deployer-side documentation; no Vaara-emitted topology receipt. +- **MULTI-2** (agent composition / topology attestation) - ◯. + Deployer-side documentation. No Vaara-emitted topology receipt. -### Section 13 — Capability-Based Access Control +### Section 13 - Capability-Based Access Control -- **CAP-1** (data provenance tracking) — ◐. The taxonomy and policy - DSL accept provenance tags on actions; transformation propagation +- **CAP-1** (data provenance tracking) - ◐. The taxonomy and policy + DSL accept provenance tags on actions. Transformation propagation (CAP-1.2) is the deployer's responsibility because Vaara intercepts tool calls, not arbitrary data transformations inside the agent process. - **CAP-2** (architectural separation of planning from untrusted - data) — ◯. AAL-2 documentation at most; this is a deployer-side + data) - ◯. AAL-2 documentation at most. 
This is a deployer-side architecture choice that Vaara records but does not enforce. -### Section 14 — Agent Disclosure and Transparency +### Section 14 - Agent Disclosure and Transparency -- **DISC-1.1** (capability documentation) — ◐ via the deployer's +- **DISC-1.1** (capability documentation) - ◐ via the deployer's policy file + `vaara compliance report`. -- **DISC-1.2** (AIBOM in CycloneDX-AI or SPDX 3.0) — ◯. Future - work; the auditor-facing evidence export (v0.10.0) is a candidate +- **DISC-1.2** (AIBOM in CycloneDX-AI or SPDX 3.0) - ◯. Future + work. The auditor-facing evidence export (v0.10.0) is a candidate surface to embed AIBOM references. - **DISC-1.3** (attestation summary with coverage ratio, S3P - signals, override frequency) — ◐ from v0.12.0: Vaara now emits + signals, override frequency) - ◐ from v0.12.0: Vaara now emits S3P attestations (`vaara.attestation.s3p`) carrying coverage - ratio and binomial CI; the deployer aggregates these for + ratio and binomial CI. The deployer aggregates these for disclosure. -### Section 15 — Human-in-the-Loop Attestation +### Section 15 - Human-in-the-Loop Attestation -- **HITL-1** (consent attestation) — ◯. Deployer-side concern; +- **HITL-1** (consent attestation) - ◯. Deployer-side concern. Vaara does not collect end-user consent. -- **HITL-2** (human review attestation) — ◐. Review-queue resolution +- **HITL-2** (human review attestation) - ◐. Review-queue resolution events on the audit chain carry reviewer identity (when supplied by the deployer), timestamp, decision, and reference to the original `ESCALATE` verdict by `action_id`. AAL-4 identity binding is the deployer's responsibility. -- **HITL-3** (human correction and override) — ◐ via +- **HITL-3** (human correction and override) - ◐ via `report_outcome` and the review-queue resolution event. 
- **HITL-4** (policy and configuration approval with separation of - duties) — ◯ at the receipt level; policy-change approval is + duties) - ◯ at the receipt level. Policy-change approval is currently a git-history artefact, not an attested OVERT event. -- **SESS-1..5** (session-scoped attestation) — ◯. +- **SESS-1..5** (session-scoped attestation) - ◯. - **STATE-1, STATE-2** (durable state sealing and prompt artifact - binding) — ◯. -- **IDENT-1** (federated identity / token provenance chain) — ◐. + binding) - ◯. +- **IDENT-1** (federated identity / token provenance chain) - ◐. `vaara.auth` accepts authenticated caller identity into the audit - record; full delegation-chain attestation per IDENT-1.2 is future + record. Full delegation-chain attestation per IDENT-1.2 is future work. -### Section 16 — Behavioural Drift Governance +### Section 16 - Behavioural Drift Governance -- **DRIFT-1** (baseline intent declaration) — ◯. Future work; the +- **DRIFT-1** (baseline intent declaration) - ◯. Future work. The policy DSL is the candidate surface for machine-readable behavioural bounds. -- **DRIFT-2** and downstream drift controls — ◐ in spirit. The +- **DRIFT-2** and downstream drift controls - ◐ in spirit. The adaptive scorer tracks coverage error via FACI (`scorer/adaptive.py`) and emits drift signals through audit events, but these are not yet packaged as DRIFT-* receipts. @@ -407,31 +407,31 @@ the Vaara MCP server in their own environment. S3P sits in Domain 5 (MEASURE), not Part 3, but it is the agentic- relevant measurement primitive that ties everything above together. -- **MEA-1** (deterministic sampling infrastructure) — ◯. Vaara - evaluates every intercepted action; sampling-rate-based +- **MEA-1** (deterministic sampling infrastructure) - ◯. Vaara + evaluates every intercepted action. Sampling-rate-based measurement is opt-in. A deployer who wants S3P sampling provides the PRF tag and threshold. 
-- **MEA-2.1** (epoch nonce commitment) — ✅ via +- **MEA-2.1** (epoch nonce commitment) - ✅ via `vaara.attestation.s3p.make_epoch_nonce_commitment`. -- **MEA-2.4** (exact binomial CI) — ✅. Pure-Python Clopper-Pearson - via the regularized incomplete beta function; no scipy dependency. +- **MEA-2.4** (exact binomial CI) - ✅. Pure-Python Clopper-Pearson + via the regularized incomplete beta function. No scipy dependency. - **MEA-2.6** (closed-schema S3P attestation, Ed25519-signed, - canonical CBOR per Protocol Profile 1.0) — ✅ via + canonical CBOR per Protocol Profile 1.0) - ✅ via `emit_s3p_attestation`. - **Vaara conformal extension (proposed Protocol Profile extension):** the `ConformalExtension` field reports aggregate statistics over Vaara's per-action conformal prediction intervals alongside the standard Clopper-Pearson CI. The conformal aggregates carry the same non-parametric coverage guarantee with - no distributional assumption — exactly the property MEA-2.4 + no distributional assumption - exactly the property MEA-2.4 requires from a method offered as an alternative to (or complement of) Clopper-Pearson. The extension rides in a single - field in the signed metadata; standard OVERT verifiers ignore it. + field in the signed metadata. Standard OVERT verifiers ignore it. ## EU Product Liability Directive 2024/2853 Directive (EU) 2024/2853 of 23 October 2024 on liability for defective -products treats software — including AI systems — as a product within +products treats software - including AI systems - as a product within scope of strict product-liability rules. Member State transposition deadline is **9 December 2026** (Article 22). 
The provisions that matter for runtime evidence: @@ -440,7 +440,7 @@ matter for runtime evidence: national court SHALL presume the defectiveness of a product, or the causal link between defectiveness and damage, where the claimant faces excessive difficulties proving the technical - facts — in particular due to the technical complexity of the + facts - in particular due to the technical complexity of the product (Article 9(4)). The defendant rebuts the presumption by showing the product was not defective. - **Article 7 (Defectiveness assessment).** Defectiveness is @@ -465,7 +465,7 @@ How Vaara fits: - The hash-chain integrity, Ed25519 signatures, and Article 12 receipt pair give the evidence the tamper-evident shape that national courts will expect from contemporaneous records. -- Vaara does not generate liability defences; it produces the +- Vaara does not generate liability defences. It produces the technical evidence those defences are built from. Legal strategy, expert witness work, and the substantive risk-management policy remain with the deployer's counsel. @@ -565,7 +565,7 @@ problem: `vaara trail verify` will report a chain break at the boundary. Intended workflow: export a signed handoff zip BEFORE purging, archive the zip externally for long-tail audit history, then purge - the live DB. The signed zip remains self-consistent forever; the + the live DB. The signed zip remains self-consistent forever. The live DB chain has a documented seam at the retention boundary. ## Current limits @@ -604,7 +604,7 @@ Honest about the edges: Note on FPR vs CHANGELOG headline: the CHANGELOG quotes "global benign FPR 21.0%" which is classifier-alone 5-fold CV OOF. The full-stack - numbers above are dominated by the heuristic — most benign escalations + numbers above are dominated by the heuristic - most benign escalations come from the heuristic `ESCALATE` branch, not from classifier upgrades on heuristic-`ALLOW`ed entries. 
@@ -614,8 +614,8 @@ Honest about the edges: a v0.7 follow-up if the gap demands it. - **Stack composition (v0.6 measurement).** The full-stack numbers above decompose into independent layer contributions. `heuristic_only` recall - is 35% / 63% (hand-curated / LLM-generated); `classifier_only` recall - is 94% / 86%. Layers are not redundant — heuristic catches a small set + is 35% / 63% (hand-curated / LLM-generated). `classifier_only` recall + is 94% / 86%. Layers are not redundant - heuristic catches a small set of attacks the classifier misses, justifying the ensemble. Most of the full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades. Detailed breakdown: `tests/adversarial/stack_ablation_v0_5_3.json`. @@ -629,13 +629,13 @@ Honest about the edges: - Total LLM calls: 125 attacker iterations across 25 seeds, plus judge confirmations on heuristic-ALLOW outcomes - **ASR: 0.0% (0/25)**. Across 125 candidate prompts, Vaara - escalated 124 and allowed 1; the judge ruled the allowed candidate + escalated 124 and allowed 1. The judge ruled the allowed candidate not a successful jailbreak. Reading: Vaara stack catches DAN-roleplay, "hypothetical scenario", and "security drill" -style jailbreak attempts at this attacker capability level. NOT a claim of imperviousness to all adaptive attackers - — a stronger attacker model (70B+), longer iteration budgets, or + - a stronger attacker model (70B+), longer iteration budgets, or different strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR. v0.7 follow-up: re-run with 70B+ attacker + judge if a compliance audience requires the harder calibration. 
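The 0/25 result above can be put in numbers using the same exact-binomial reasoning the document applies elsewhere. The sketch below is not Vaara's implementation: it uses the closed-form special case of the Clopper-Pearson interval for zero observed successes, and the function name `zero_success_upper_bound` is hypothetical. It assumes the 25 seeds can be treated as independent trials, which the source does not state.

```python
def zero_success_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Upper limit of the two-sided Clopper-Pearson interval when k = 0.

    Hypothetical helper for illustration only. Assumes n independent
    trials with zero observed successes (here: zero successful attacks).
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    # With k = 0 the upper limit p solves (1 - p) ** n == alpha / 2,
    # so no incomplete beta function is needed for this special case.
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


upper = zero_success_upper_bound(25)
print(f"0/25 observed, 95% upper bound on ASR: {upper:.1%}")
```

With 25 seeds the bound comes out near 14%, so an observed ASR of 0.0% is compatible with a true rate well above zero. That is the quantitative form of the "not a claim of imperviousness" caveat. A larger seed count would tighten the bound.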
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 2750ad0..58e8d5c 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -36,7 +36,7 @@ pip install -e '.[dev]' # one-time setup scripts/lint_full.sh ``` -The script chains four checks: `ruff` (style + correctness), `bandit` (security), `mypy` (types — strict on `vaara.policy`, lenient elsewhere while legacy modules are migrated), and `pytest`. Total runtime ~10s. CI runs the same gates, so a green local sweep should mean a green PR. +The script chains four checks: `ruff` (style + correctness), `bandit` (security), `mypy` (types - strict on `vaara.policy`, lenient elsewhere while legacy modules are migrated), and `pytest`. Total runtime ~10s. CI runs the same gates, so a green local sweep should mean a green PR. New modules under `src/vaara/` are expected to type-check cleanly. As legacy modules get cleaned up, add them to the strict mypy block in `pyproject.toml` so the typing floor only ratchets upward. diff --git a/README.md b/README.md index 4c23f49..3de6784 100644 --- a/README.md +++ b/README.md @@ -71,11 +71,11 @@ curl -sX POST http://localhost:8000/v1/score \ -d '{"tool_name":"tx.transfer","agent_id":"agent-007","base_risk_score":0.5}' ``` -The contract is in [docs/openapi.yaml](docs/openapi.yaml). Vaara defines the interface; control-plane and orchestration vendors call it. Integration recipes for adopters live under `examples/recipes/`. +The contract is in [docs/openapi.yaml](docs/openapi.yaml). Vaara defines the interface. Control-plane and orchestration vendors call it. Integration recipes for adopters live under `examples/recipes/`. ## OVERT 1.0 attestation -Vaara is the first OSS Python reference implementation of the OVERT 1.0 ([overt.is](https://overt.is/), Glacis Technologies, March 2026) Protocol Profile 1.0 Base Envelope at AAL-3 Phase 2 (Provisional Receipt). Closed-schema 9-field structure, canonical CBOR encoding, Ed25519 signatures, HMAC-SHA256 keyed commitments, IEEE-754 float rejection. 
External Independent Attestation Providers can promote AAL-3 emission to AAL-4 by attaching Phase 3 notary signatures and transparency-log inclusion proofs. +Vaara implements the OVERT 1.0 ([overt.is](https://overt.is/)) Protocol Profile 1.0 Base Envelope. OVERT 1.0 is an open standard for runtime trust in AI systems, authored by Glacis Technologies and published in March 2026. Closed-schema 9-field structure at AAL-3 Phase 2 (Provisional Receipt), canonical CBOR (RFC 8949), Ed25519 signatures, HMAC-SHA256 keyed commitments, IEEE-754 float rejection. External Independent Attestation Providers can promote AAL-3 emission to AAL-4 by attaching Phase 3 notary signatures and transparency-log inclusion proofs. ``` pip install 'vaara[attestation]' diff --git a/SECURITY.md b/SECURITY.md index 2160908..5898d7d 100644 --- a/SECURITY.md +++ b/SECURITY.md @@ -7,7 +7,7 @@ Please report security vulnerabilities privately through GitHub's feature. **Do not open a public issue for anything that could be exploited.** For communication outside GitHub, reach the maintainers at -`security@vaara.io`. Use PGP if you prefer end-to-end-encrypted email; the +`security@vaara.io`. Use PGP if you prefer end-to-end-encrypted email. The current public key is published at . diff --git a/bench/COMPARISON.md b/bench/COMPARISON.md index 0083833..8f1d571 100644 --- a/bench/COMPARISON.md +++ b/bench/COMPARISON.md @@ -1,9 +1,12 @@ # Comparison with adjacent tools This doc compares Vaara against the open-source tools most often -named in the same breath: **NVIDIA NeMo Guardrails**, **Guardrails AI**, -**OpenAI Guardrails** (for Agents SDK), **LangChain callback handlers**, -and the **OWASP LLM Top 10** threat taxonomy. +named in the same breath. 
Two clusters: LLM-text rails and output +validators on one side (**NVIDIA NeMo Guardrails**, **Guardrails AI**, +**OpenAI Guardrails** for Agents SDK, **LangChain callback handlers**, +and the **OWASP LLM Top 10** threat taxonomy), and agent governance +plus attestation tools on the other (**Glacis Python SDK** and +**Microsoft Agent Governance Toolkit**). No benchmark numbers are cited for the other tools here. Each one solves a different problem than Vaara, so a head-to-head TPR/FPR on @@ -18,24 +21,36 @@ prose, read the sections below it. ## Capability matrix -| Concern | Vaara | NeMo Guardrails | Guardrails AI | OpenAI Guardrails | LangChain callbacks | OWASP LLM Top 10 | -| ------------------------------------------------ | :---: | :-------------: | :-----------: | :---------------: | :-----------------: | :--------------: | -| Validates tool-call **arguments** at runtime | ✓ | ✗ | ✗ | ✗ | observes only | not software | -| Probabilistic / conformal risk scoring per call | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Detects temporal **sequence** patterns | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Hash-chained, regulator-exportable audit trail | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| EU AI Act Art. 12 / 14 / 26 evidence mapping | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | -| Validates LLM *output text* (PII, toxicity, etc) | ✗ | ✓ | ✓ | ✓ | ✗ | advisory only | -| Validates LLM *input prompt* (jailbreak etc) | ✗ | ✓ | ✓ | ✓ | ✗ | advisory only | -| Structured-output validation (schema / regex) | partial| ✓ | ✓ | ✓ | ✗ | ✗ | -| Self-hostable Python library (no SaaS required) | ✓ | ✓ | ✓ | ✓ | ✓ | document | -| Apache-2.0 | ✓ | Apache-2.0 | Apache-2.0| MIT | MIT | CC-BY | - -Reading the matrix: Vaara and the output-validation tools are -complementary, not competitive. A real deployment uses output -validation **and** tool-call governance. Vaara does not validate LLM -text output, so use Guardrails AI or NeMo for that. NeMo and Guardrails -AI do not validate tool-call arguments at runtime, so use Vaara for that. 
+| Concern | Vaara | NeMo Guardrails | Guardrails AI | OpenAI Guardrails | LangChain callbacks | OWASP LLM Top 10 | Glacis Python SDK | MS Agent Governance Toolkit | +| ------------------------------------------------ | :---: | :-------------: | :-----------: | :---------------: | :-----------------: | :--------------: | :---------------: | :-------------------------: | +| Validates tool-call **arguments** at runtime | ✓ | ✗ | ✗ | ✗ | observes only | not software | ✗ | ✓ | +| Probabilistic / conformal risk scoring per call | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| Detects temporal **sequence** patterns | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| Hash-chained, regulator-exportable audit trail | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | partial (Merkle) | partial (logging) | +| EU AI Act Art. 12 / 14 / 26 evidence mapping | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| OVERT 1.0 Base Envelope emission (RFC 8949 CBOR) | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | +| RFC 6962 Merkle inclusion proof integration | ext. IAP | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ (hosted) | ✗ | +| Validates LLM *output text* (PII, toxicity, etc) | ✗ | ✓ | ✓ | ✓ | ✗ | advisory only | ✗ | ✗ | +| Validates LLM *input prompt* (jailbreak etc) | ✗ | ✓ | ✓ | ✓ | ✗ | advisory only | ✗ | ✗ | +| Structured-output validation (schema / regex) | partial| ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | partial | +| Zero-trust agent identity primitives | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | +| Capability-based access control | policy schema | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | +| Execution sandboxing | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✓ | +| Multi-language SDKs | Python only | N/A | Python | Python (Agents) | Python / JS | N/A | Python only | ✓ | +| Self-hostable Python library (no SaaS required) | ✓ | ✓ | ✓ | ✓ | ✓ | document | ✓ | ✓ | +| License | Apache-2.0 | Apache-2.0 | Apache-2.0 | MIT | MIT | CC-BY | Apache-2.0 | MIT | + +Reading the matrix: Vaara and the other tools are complementary, not +competitive. Different cells of the matrix. Different parts of the +stack. 
A real production agent deployment uses several of these at +once. Vaara owns the runtime risk-scoring + Article 14 evidence + +OVERT 1.0 attestation slice. NeMo and Guardrails AI cover the LLM +text-rail slice. Microsoft AGT covers the agent identity, capability, +and sandboxing slice. Glacis SDK is a client to Glacis's hosted +attestation service. Vaara does not validate LLM text output, so use +Guardrails AI or NeMo for that. Vaara does not provide zero-trust +agent identity, so use Microsoft AGT for that. The text-rail tools do +not validate tool-call arguments at runtime, so use Vaara for that. ## One paragraph each @@ -79,6 +94,29 @@ vocabulary. Not software, so there is nothing to install. Vaara's signals and sequence patterns are informed by this taxonomy, but the taxonomy itself does not do runtime enforcement. +**Glacis Python SDK.** Apache-2.0 client library for Glacis +Technologies' hosted attestation service, using RFC 8785 canonical +JSON, SHA-256 hashing, Ed25519 signatures, and RFC 6962 Merkle +inclusion proofs delivered in-line by the hosted service. Glacis +Technologies also authored OVERT 1.0, the open standard for +runtime trust in AI systems, published at overt.is in March 2026. +Either tool can be used depending on whether you need a +Glacis-hosted-service client or an OVERT 1.0 Base Envelope emitter +in your runtime. + +**Microsoft Agent Governance Toolkit.** MIT-licensed toolkit for +agent identity, capability-based access control, execution sandboxing, +and reliability engineering. The toolkit frames its surface around +the OWASP Agentic Top 10 and zero-trust principles, with multi-language +SDKs for deployers running heterogeneous agent stacks. Where Vaara +provides runtime risk scoring and Article 14 audit evidence, AGT +provides agent identity primitives and the sandboxing layer that +isolates agent execution from the host environment. The two tools +cover different layers of the same governance stack. 
The +`GenAI-Gurus/awesome-eu-ai-act` curator places Vaara and AGT side +by side in the AI Agent Governance section for exactly this reason: +deployers running production agents typically want both wired in. + ## Where Vaara fits Vaara is the gate between an AI agent's *decision* to take an action @@ -96,15 +134,24 @@ The three things Vaara does that the tools above do not: 3. Produce **regulator-ready** evidence: cryptographic audit chain, signal breakdown per decision, conformity report. -The three things Vaara does not do that the tools above handle well: +The things Vaara does not do that the tools above handle well: -1. LLM output validation (PII, toxicity, schema). -2. LLM input guardrails (jailbreak detection, topical rails). -3. Constrained decoding and structured output generation. +1. LLM output validation, PII redaction, toxicity filtering (NeMo, + Guardrails AI, OpenAI Guardrails). +2. LLM input guardrails, jailbreak detection, topical rails (same). +3. Constrained decoding and structured output generation (same). +4. Zero-trust agent identity primitives and capability-based access + control as first-class types (Microsoft Agent Governance Toolkit). +5. Execution sandboxing as a built-in primitive (Microsoft AGT). +6. Hosted Merkle-inclusion-proof attestation as a managed service + (Glacis Python SDK). If you are building an agent that writes to user-visible text **and** -executes tools, you want both Vaara and one of the output-validation -tools wired in. They run in different places in the stack. +executes tools, you want Vaara plus one of the output-validation +tools wired in. If you are running agents in production, you want +Vaara plus Microsoft AGT for the identity, capability, and sandboxing +layer Vaara does not cover. They run in different places in the +stack and the matrix above shows where each tool lives. 
## Numbers we publish diff --git a/bench/README.md b/bench/README.md index 8a6a2fc..10f986f 100644 --- a/bench/README.md +++ b/bench/README.md @@ -123,7 +123,7 @@ contract as **vaara-bench-v1**. See [`vaara-bench-v1.md`](vaara-bench-v1.md) for the frozen corpus hash, the methodology, the headline numbers under Vaara 0.11.0, the reproduction commands, and the license. Use the spec doc when citing Vaara's adversarial-detection numbers -externally; this README is the running commentary. +externally. This README is the running commentary. `bench/adversarial_corpus.jsonl` is a **synthetic** labelled corpus of 77 traces generated deterministically by `bench/build_corpus.py`.
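Because the benchmark is pinned to a frozen corpus hash, a reader citing the numbers may want to confirm that their local copy of the corpus is the frozen one. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of_file` and `corpus_matches` are hypothetical and not part of Vaara, and the expected digest is assumed to be copied by hand from `vaara-bench-v1.md`.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Return the lowercase hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def corpus_matches(path, expected_hex: str) -> bool:
    """Compare a file's digest with a pinned value (hypothetical helper)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a throwaway file so the snippet runs outside a Vaara checkout.
# In a checkout, pass bench/adversarial_corpus.jsonl and the digest
# published in the spec doc instead.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_bytes(b"abc")
    known = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    demo_ok = corpus_matches(sample, known)
    print("digest matches:", demo_ok)
```

A mismatch means the local corpus differs from the frozen one, so numbers produced from it should not be reported as vaara-bench-v1 results.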