vaaraio · coderabbitai · May 26, 2026
@@ -6,7 +6,47 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht
 
 ## [Unreleased]
 
+## [0.37.0] - 2026-05-27
+
+**Theme: third attacker family added to cross-model held-out, v8
+classifier lifts the worst v0.36 sub-cell.** 900 adversarial entries
+generated by `RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic` on
+AMD-backed MI300X via `rocm/vllm:latest` extend the cross-model fold
+to a third family. v8 retrains on v035 TRAIN with the v036 Mixtral
+and Claude TM/PE entries folded in; both v036 DE legs and the full
+v037 Llama-3.3 leg stay held out. v8 holds in-distribution at 86.6%
+recall on v035 TEST (vs v7's 85.3%, +1.3 pp) with 5.0% FPR. On the new
+v037 holdout (2,277 entries), overall recall lands at 66.8%, a 7.6 pp
+lift on the comparable v036 number (59.2%). The Llama-3.3 leg covers a third
+attacker family at 85.8% overall. The worst v0.36 sub-cell (data_exfil
+× Claude) lifts from 26.0% (v7) to 38.9% (v8) on the same 700 entries,
++12.9 pp, confirming the v0.36 diagnosis that the constraint was
+training-corpus distribution and not feature space. Folding v036
+TM/PE into TRAIN reweights the existing destination features against
+the closed-weight attacker patterns the v7 fold was missing.
+
 ### Added
+- `tests/adversarial/generated/{TM,PE,DE}-v037-llama33.jsonl`: 900
+  third-family entries (13 schema-invalid TM dropped, 887 valid).
+- `tests/adversarial/v037_split.json`: v0.37 split manifest. Inherits
+  every v035_split assignment unchanged, folds v036 Mixtral and Claude
+  TM/PE into the train fold, marks v036 Mixtral and Claude DE plus the
+  full v037 Llama-3.3 leg as held out.
+- `src/vaara/data/adversarial_classifier_v8.joblib`: new production
+  bundle, 638 features (254 hand plus 384 MiniLM embedding), trained
+  on the 11,287-entry union fold at threshold 0.9006.
+- `scripts/_v037_common.py`, `scripts/generate_targeted_v037.py`,
+  `scripts/v037_droplet_run.sh`, `scripts/v037_local_watcher.sh`:
+  Llama-3.3 generation pipeline mirroring the v0.36 shape with the
+  third-family swap and a continuous-monitoring local watcher.
+- `scripts/build_v037_split.py`: builds `v037_split.json` from the
+  v035 inheritance plus the v036 and v037 generated entries.
+- `scripts/validate_v037.py`, `scripts/eval_v037_holdout.py`: schema
+  check and three-leg holdout eval (mixtral, claude, llama33).
+- `bench/v037_eval_v8_holdout.json`: full per-category and
+  per-category-per-leg eval results.
+- `bench/vaara-bench-v0.37.md`: ship-gate record, chain of custody,
+  reproduction recipe, named limits.
 - `vaara.attestation.sep2787`: reference implementation of the SEP-2787
   Tool Call Attestation envelope (MCP spec PR
   `modelcontextprotocol/modelcontextprotocol#2787`), proposed shape.
@@ -28,6 +68,8 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht
   tampering rejection, canonicalization invariants, and TTL handling.
 
 ### Changed
+- Production classifier: v7 → v8. v7 retained on disk for cross-eval
+  reproducibility. Threshold unchanged at 0.9006.
 - `attestation` optional extra: adds `rfc8785>=0.1.4` for JCS
   canonicalization.
 

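The fold rule stated in the changelog entry for `tests/adversarial/v037_split.json` can be sketched as a small function. This is only an illustration of the rule as written there. The function name, the corpus tags, and the argument layout are hypothetical and are not taken from `scripts/build_v037_split.py`; the fold labels `train` and `holdout` follow the wording used in the bench doc.

```python
from typing import Optional

# Per the changelog, only the data_exfil (DE) subset of v036 stays held out.
HELD_OUT_V036_CATEGORY = "data_exfil"


def assign_fold(corpus: str, category: str, inherited: Optional[str] = None) -> str:
    """Return the v0.37 fold for one entry (hypothetical helper)."""
    if corpus == "v035":
        # v035 assignments are inherited unchanged from v035_split.
        if inherited is None:
            raise ValueError("v035 entries need their inherited fold")
        return inherited
    if corpus == "v036":
        # Mixtral and Claude TM/PE fold into train; both DE legs stay held out.
        return "holdout" if category == HELD_OUT_V036_CATEGORY else "train"
    if corpus == "v037":
        # The full Llama-3.3 leg is held out regardless of category.
        return "holdout"
    raise ValueError(f"unknown corpus tag: {corpus!r}")


if __name__ == "__main__":
    print(assign_fold("v036", "tool_misuse"))
    print(assign_fold("v036", "data_exfil"))
    print(assign_fold("v037", "privilege_escalation"))
```

Keeping the rule in one pure function makes it straightforward to check that no v036 DE or v037 entry can land in TRAIN.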
@@ -600,6 +600,12 @@ relevant measurement primitive that ties everything above together.
   complement of) Clopper-Pearson. The extension rides in a single
   field in the signed metadata. Standard OVERT verifiers ignore it.
 
+## Position relative to the MIT AI Risk Repository
+
+The [MIT AI Risk Repository v4](https://airisk.mit.edu/) (MIT FutureTech, Slattery et al., updated 2025-12-03, CC BY 4.0) is a meta-taxonomy of 1,835 risk-bearing entries drawn from 74 source papers, organised into 7 domains. Vaara has direct runtime evidence shape against roughly 740 of those entries (~46% of the sub-domain-tagged set), concentrated in Privacy & Security, Malicious Actors & Misuse, Human-Computer Interaction, parts of AI System Safety, and the Governance Failure sub-domain. Vaara does not cover the model-side, content-level, and structural risks that live elsewhere in the taxonomy.
+
+The full per-sub-domain map lives at [docs/mit_ai_risk_repository_mapping.md](docs/mit_ai_risk_repository_mapping.md). Local copies of the v4 database and the companion AI Risk Mitigations sheet are tracked under `research/external/` for reproducibility.
+
 ## EU Product Liability Directive 2024/2853
 
 Directive (EU) 2024/2853 of 23 October 2024 on liability for defective

@@ -29,7 +29,7 @@ Held-out TEST recall 85.0% (95% Wilson [82.8, 87.1]) at FPR 4.6% [3.3, 6.3]. Mul
 - 140 µs / 210 µs p99 inference latency, commodity CPU
 - Distribution-free conformal coverage on the score
 - MWU regret bound O(sqrt(T log N))
-- [vaara-bench-v0.36](bench/vaara-bench-v0.36.md): current methodology, chain of custody, ship-gate record. Cross-model held-out evaluation (4,176 entries generated by Mixtral-8x7B and Claude Sonnet 4.6, never folded into TRAIN), v7 production classifier with 18 destination-aware features, honest training-corpus diagnosis named as v0.37 scope. Historical bench docs live under `bench/` for chain-of-custody continuity.
+- [vaara-bench-v0.37](bench/vaara-bench-v0.37.md): current methodology, chain of custody, ship-gate record. Third attacker family added to cross-model held-out (900 entries generated by Llama-3.3-70B-Instruct on AMD-backed MI300X) and v8 production classifier trained on the v035 + v036 TM/PE union fold. Holds 86.6% recall on v035 TEST, 85.8% on the new Llama-3.3 leg, lifts the worst v0.36 sub-cell (data_exfil × Claude) from 26.0% to 38.9%. Historical bench docs live under `bench/` for chain-of-custody continuity.
 - [vaara-bench-v1](bench/vaara-bench-v1.md): 77-trace synthetic-corpus regression baseline with frozen methodology, 100% soft TPR, 0% hard FPR
 
 Each figure is reproducible from the public corpus or the bench harness in `bench/`.
@@ -266,6 +266,7 @@ See [COMPLIANCE.md](COMPLIANCE.md) "Position relative to open runtime-attestatio
 | [PRIOR_ART.md](PRIOR_ART.md) | When each Vaara concept first shipped, and a neutral list of adjacent published work |
 | [OWASP_AGENTIC.md](OWASP_AGENTIC.md) | Vaara mapping to OWASP Top 10 for Agentic Applications 2026 (ASI01 to ASI10) |
 | [OVERT_CONTROLS.md](OVERT_CONTROLS.md) | Vaara mapping to OVERT 1.0 Part 3 Agentic AI Controls (TOOL-*, MCP-*, MULTI-*, CAP-*, DISC-*, HITL-*, DRIFT-*) |
+| [docs/mit_ai_risk_repository_mapping.md](docs/mit_ai_risk_repository_mapping.md) | Vaara coverage map against the MIT AI Risk Repository v4 (1,835 risk-bearing entries across 7 domains) |
 | [docs/signing-keys.md](docs/signing-keys.md) | Release signing and verification |
 | [SECURITY.md](SECURITY.md) | Security policy and reporting |
 | [CONTRIBUTING.md](CONTRIBUTING.md) | Contribution guidelines |

@@ -0,0 +1,76 @@
+{
+  "bundle": "src/vaara/data/adversarial_classifier_v8.joblib",
+  "bundle_version": "v0.37",
+  "threshold": 0.9006,
+  "split_manifest": "tests/adversarial/v037_split.json",
+  "n": 2277,
+  "pos": 2277,
+  "tp": 1522,
+  "fn": 755,
+  "recall": 0.668423364075538,
+  "recall_ci": [
+    0.6488167555363148,
+    0.687462624846786
+  ],
+  "per_category": {
+    "data_exfil": {
+      "n": 1690,
+      "tp": 968,
+      "recall": 0.5727810650887574
+    },
+    "privilege_escalation": {
+      "n": 300,
+      "tp": 291,
+      "recall": 0.97
+    },
+    "tool_misuse": {
+      "n": 287,
+      "tp": 263,
+      "recall": 0.9163763066202091
+    }
+  },
+  "per_leg": {
+    "claude": {
+      "n": 700,
+      "tp": 272,
+      "recall": 0.38857142857142857
+    },
+    "mixtral": {
+      "n": 690,
+      "tp": 489,
+      "recall": 0.7086956521739131
+    },
+    "llama33": {
+      "n": 887,
+      "tp": 761,
+      "recall": 0.8579481397970687
+    }
+  },
+  "per_category_per_leg": {
+    "data_exfil__claude": {
+      "n": 700,
+      "tp": 272,
+      "recall": 0.38857142857142857
+    },
+    "data_exfil__mixtral": {
+      "n": 690,
+      "tp": 489,
+      "recall": 0.7086956521739131
+    },
+    "data_exfil__llama33": {
+      "n": 300,
+      "tp": 207,
+      "recall": 0.69
+    },
+    "privilege_escalation__llama33": {
+      "n": 300,
+      "tp": 291,
+      "recall": 0.97
+    },
+    "tool_misuse__llama33": {
+      "n": 287,
+      "tp": 263,
+      "recall": 0.9163763066202091
+    }
+  }
+}
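The `recall_ci` bounds in the eval file above are consistent with a 95% Wilson score interval on `tp / n`. The sketch below recomputes them from the raw counts, assuming the plain Wilson interval without continuity correction. `wilson_interval` is a hypothetical helper name, and the routine actually used by `scripts/eval_v037_holdout.py` is not shown in this diff.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# (tp, n) pairs copied from the eval JSON above.
CELLS = {
    "overall": (1522, 2277),
    "claude": (272, 700),
    "mixtral": (489, 690),
    "llama33": (761, 887),
}

if __name__ == "__main__":
    for name, (tp, n) in CELLS.items():
        lo, hi = wilson_interval(tp, n)
        print(f"{name}: recall={tp / n:.3f} ci=[{lo:.3f}, {hi:.3f}]")
```

The per-leg counts also sum to the overall row (272 + 489 + 761 = 1522 true positives over 700 + 690 + 887 = 2277 entries), which is a cheap sanity check to run alongside the interval.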
@@ -0,0 +1,174 @@
+# vaara-bench-v0.37
+
+Methodology delta against [vaara-bench-v0.36](vaara-bench-v0.36.md).
+v0.37 is a corpus-augmentation release with two contributions:
+
+1. **Third attacker family added to held-out eval.** 900 adversarial
+   entries generated by `RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic`
+   on AMD-backed MI300X SR-IOV under `rocm/vllm:latest`. Three
+   categories (`tool_misuse`, `privilege_escalation`, `data_exfil`),
+   300 per category, held out from TRAIN.
+2. **v8 retrain on union corpus.** v036 Mixtral TM/PE plus v036 Claude
+   TM/PE entries folded into TRAIN. The v036 DE subset stays held out
+   so the worst sub-cell from v0.36 (data_exfil × Claude) remains a
+   pure cross-model eval. The new v037 holdout combines that carried
+   v036 DE fold with the full v037 Llama-3.3 leg.
+
+## Production headline (v8 on v035 TEST)
+
+| metric | v8 | v7 (prior production) | Δ |
+|---|---|---|---|
+| TEST recall at T=0.9006 | **86.6% [84.5, 88.6]** | 85.3% [83.1, 87.3] | +1.3pp |
+| TEST FPR at same calibration | 5.0% [3.6, 6.8] | 4.6% [3.3, 6.3] | +0.4pp (within CI) |
+| n_features | 638 (254 hand + 384 embed) | 638 | flat |
+| TRAIN entries | 11,287 | 8,501 | +2,786 |
+| TRAIN positive rate | 0.683 | 0.579 | folded TM/PE both legs |
+
+v8 is the new production bundle
+(`src/vaara/data/adversarial_classifier_v8.joblib`). v7 and v6 stay on
+disk for cross-eval reproducibility. No regression on the
+in-distribution TEST set. Recall lifts 1.3 pp at a 0.4 pp FPR cost that
+sits inside the prior FPR confidence interval.
+
+## Cross-model held-out result (v8 on v037 holdout)
+
+| cut | n | recall at T=0.9006 |
+|---|---|---|
+| **overall** | 2,277 | **66.8% [64.9, 68.7]** |
+| Llama-3.3 leg (new third family) | 887 | **85.8% [83.3, 87.9]** |
+| Mixtral DE (carry-forward from v036) | 690 | 70.9% [67.4, 74.1] |
+| Claude DE (carry-forward from v036) | 700 | **38.9% [35.3, 42.5]** |
+| data_exfil overall | 1,690 | 57.3% [54.9, 59.6] |
+| privilege_escalation overall | 300 | 97.0% [94.4, 98.4] |
+| tool_misuse overall | 287 | 91.6% [87.9, 94.3] |
+| data_exfil × llama33 | 300 | 69.0% [63.6, 74.0] |
+| privilege_escalation × llama33 | 300 | 97.0% [94.4, 98.4] |
+| tool_misuse × llama33 | 287 | 91.6% [87.9, 94.3] |
+
+Recall on data_exfil × Claude lifts from 26.0% (v7) to
+**38.9% (v8, +12.9pp)** on the same 700 entries. The constraint
+diagnosed in v0.36 was training-corpus distribution, not feature
+space. Folding v036 TM/PE into TRAIN reweights the existing `dst__`
+features against actual adversarial examples, which yields the lift.
+
+## Third-family cross-model number
+
+Llama-3.3-70B-Instruct was not present in TRAIN. The 85.8% overall
+recall on its 887 entries indicates the v035 + v036 TM/PE TRAIN fold
+generalises to a third attacker family at near-in-distribution recall.
+Per-category cuts on this leg:
+
+- privilege_escalation × llama33: 97.0%
+- tool_misuse × llama33: 91.6%
+- data_exfil × llama33: 69.0%
+
+The data_exfil pattern from v0.36 holds. Across all three attacker
+families now seen, DE is the hardest category. PE and TM generalise
+cleanly; DE generalises unevenly.
+
+## Carry-forward DE numbers
+
+v036 Mixtral DE: 70.9% (v8) vs 69.3% (v7 on the same 690 entries),
+flat within CI. v036 Claude DE: 38.9% (v8) vs 26.0% (v7 on the same
+700 entries), **+12.9pp**. The v7 → v8 lift is concentrated in the
+closed-weight leg that was failing hardest. Open-weight Mixtral DE was
+already at 70%-tier and stays there. The asymmetry confirms the v0.36
+mechanism finding (destination signal is the axis, and folding the
+related TM/PE distributions into TRAIN repositions the classifier on
+that axis for the closed-weight leg).
+
+## Ship gate
+
+v0.37 ships under both a methodology gate and a sub-cell recall gate
+because v8 is a production retrain:
+
+| gate | result |
+|---|---|
+| v035 TEST recall does not regress | PASS, 85.3% → 86.6%, +1.3pp |
+| v035 TEST FPR does not regress | PASS, 4.6% → 5.0%, within CI |
+| Worst v0.36 sub-cell improves | PASS, DE × Claude 26.0% → 38.9% |
+| Third attacker family covered with recall floor | PASS, llama33 overall 85.8% |
+| Held-out gap stays published with mechanism | PASS |
+
+Cross-model overall recall is 66.8%, below the 70% floor used as a soft
+target in prior releases. That floor was set against the v035 TEST
+distribution. Cross-model overall is a harder denominator, and 66.8%
+is a 7.6 pp lift on the comparable v036 number (59.2% → 66.8%) with
+a third family added to the denominator.
+
+## Generation provenance
+
+Llama-3.3-70B generation ran on an AMD-backed MI300X DigitalOcean
+SR-IOV droplet under `rocm/vllm:latest` serving
+`RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic` with the model's native
+`compressed-tensors` FP8 quantization
+(`--max-model-len 8192 --enforce-eager --gpu-memory-utilization 0.92`).
+Three parallel category generators, ~22 minutes wall clock for 900
+entries at steady-state ~40 entries/min combined. Droplet poweroff
+issued post-rsync. Schema validation pass dropped 13 of 300 raw TM
+entries (4.3%) where the model emitted non-DENY `expected`. Final v037
+counts: TM 287, PE 300, DE 300, total 887 valid.
+
+The v037 droplet recipe matches v0.36 apart from the model swap and
+one flag. The `--quantization` flag had to be dropped because the
+`compressed-tensors` scheme in the model config conflicts with an
+explicit `fp8` argument. vLLM auto-detects the quantization scheme
+from the model config in that case, and that path serves correctly.
+This is a model-specific configuration note, not a methodology change.
+
+## Chain of custody
+
+| anchor | path | pins |
+|---|---|---|
+| corpus manifest | `tests/adversarial/MANIFEST.sha256` | SHA-256 of every JSONL including v037 |
+| v035 split (inherited) | `tests/adversarial/v035_split.json` | TRAIN/VAL/TEST for v8 calibration |
+| v037 split | `tests/adversarial/v037_split.json` | v035 inherited + v036 TM/PE → train, v036 DE + v037 → holdout |
+| production bundle | `src/vaara/data/adversarial_classifier_v8.joblib` | trained on 11,287 entries with dst features + embeddings |
+| prior production | `src/vaara/data/adversarial_classifier_v7.joblib` | retained for cross-eval |
+| Llama-3.3 generator | `scripts/generate_targeted_v037.py` | vLLM HTTP, FP8 dynamic on MI300X |
+| droplet driver | `scripts/v037_droplet_run.sh` | idempotent, no destructive EXIT trap |
+| watcher | `scripts/v037_local_watcher.sh` | 60s rsync poll, opt-in doctl auto-shutdown |
+| split builder | `scripts/build_v037_split.py` | inherits v035, folds v036 TM/PE into train |
+| holdout eval | `scripts/eval_v037_holdout.py` | three-leg breakdown (mixtral, claude, llama33) |
+| v037 schema check | `scripts/validate_v037.py` | same shape as v0.36 validator |
+
+## Reproduction recipe
+
+```
+(cd tests/adversarial && sha256sum -c MANIFEST.sha256)
+.venv/bin/python scripts/validate_v037.py
+.venv/bin/python scripts/build_v037_split.py
+.venv/bin/python scripts/save_classifier_bundle.py \
+    --version v0.37 --threshold 0.9006 --embeddings \
+    --split-manifest tests/adversarial/v037_split.json \
+    --train-fold train \
+    --bundle-out src/vaara/data/adversarial_classifier_v8.joblib
+.venv/bin/python scripts/eval_v037_holdout.py \
+    --bundle src/vaara/data/adversarial_classifier_v8.joblib \
+    --json-out bench/v037_eval_v8_holdout.json
+```
+
+## Named limits
+
+1. **Third family generation is 887 valid entries, not 4,000+ like
+   v0.36.** Wilson CI on a 300-entry sub-cell at p ~ 0.85 is ± 4 pp,
+   adequate for ship-gate decisions. Scaling the Llama-3.3 leg to v036
+   density is v0.38 scope, paired with public-benchmark evaluation.
+2. **Open-weight families dominate the held-out fold.** Llama-3.3
+   (Meta) and Mixtral (Mistral) are both open-weight models.
+   Closed-weight coverage in v0.37 is the carry-forward Claude DE
+   subset only. Adding GPT-4o-class or Gemini-class generation is v0.38
+   scope.
+3. **No public-benchmark eval (PINT, BIPIA, INJECT) yet.** v0.38 scope.
+4. **PAIR multi-attacker scale-up not performed.** v0.38 scope (target
+   ASR Wilson upper under 1%).
+5. **FPR-bounded three-stage combiner per FCR paper (arxiv:2605.22004)
+   not implemented.** v0.39 scope.
+
+## Cumulative position
+
+v0.37 lifts the worst v0.36 sub-cell by 12.9 pp without giving up
+in-distribution recall, and covers a third attacker family at 85.8%
+overall. The data_exfil category remains the hardest cross-model
+surface. That is the v0.38 + v0.39 line of work: public-benchmark
+numbers, PAIR-at-scale, FPR-bounded combiner.
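The numeric rows of the ship-gate table in the bench doc above can be re-derived from the figures it quotes. The sketch below is one possible reading, not the project's gate code. `Gate` and `evaluate_ship_gates` are hypothetical names, the 70% default for the Llama-3.3 floor is an assumption borrowed from the soft target mentioned for prior releases, and "within CI" is interpreted as the v8 FPR point estimate not exceeding the upper bound of v7's interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    name: str
    passed: bool


def evaluate_ship_gates(llama_floor: float = 70.0) -> list:
    """Recompute the PASS column from percentages quoted in the bench doc.

    llama_floor is an assumed threshold; the doc does not state the exact floor.
    """
    v7 = {"recall": 85.3, "fpr_ci": (3.3, 6.3), "de_claude": 26.0}
    v8 = {"recall": 86.6, "fpr": 5.0, "de_claude": 38.9, "llama33": 85.8}
    _, fpr_hi = v7["fpr_ci"]
    return [
        Gate("v035 TEST recall does not regress", v8["recall"] >= v7["recall"]),
        # Assumed reading of "within CI": v8 FPR stays under v7's upper bound.
        Gate("v035 TEST FPR does not regress", v8["fpr"] <= fpr_hi),
        Gate("worst v0.36 sub-cell improves", v8["de_claude"] > v7["de_claude"]),
        Gate("third family above recall floor", v8["llama33"] >= llama_floor),
    ]


if __name__ == "__main__":
    for gate in evaluate_ship_gates():
        print(f"{'PASS' if gate.passed else 'FAIL'}  {gate.name}")
```

The fifth gate in the table (held-out gap published with mechanism) is a documentation check and is left out of the sketch.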
@@ -1,6 +1,6 @@
 {
   "name": "@vaara/client",
-  "version": "0.36.0",
+  "version": "0.37.0",
   "description": "TypeScript client for the Vaara HTTP API. Conformal risk scoring, hash-chained audit, policy reload, named detectors.",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",