vaaraio · coderabbitai · May 27, 2026
@@ -6,6 +6,50 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht
 
 ## [Unreleased]
 
+## [0.38.0] - 2026-05-27
+
+**Theme: Phase 1 PAIR scale-up to n=300 per attacker family on the
+Llama-3.3-70B leg.** 900 fresh adversarial entries generated by
+`RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic` on AMD-backed MI300X
+SR-IOV under `rocm/vllm:latest` at seed 43. The v8 production
+classifier is carried forward unchanged from v0.37 and evaluated at
+calibrated T=0.9006 against the new corpus. Overall recall lands at
+88.4% [86.2, 90.4], a 2.6 pp lift over the v0.37 Llama-3.3 leg
+(85.8%). The biggest move is on `data_exfil` (69.0% to 75.3%, +6.3
+pp), with `tool_misuse` at 93.7% and `privilege_escalation` at 96.3%.
+The Phase 1 entries are content-distinct from the v0.37 Llama-3.3 leg
+because the new seed produces fresh samples.
+
+External-corpus eval (BIPIA, LLMail-Inject) and the IPI fourth attacker
+family both move to v0.39. Neither external corpus pre-extracts the
+tool calls that v8 classifies, so an honest eval requires an LLM-agent
+harness rather than direct classifier inference. IPI moves to the same
+release window and lands there as a separate attack class rather than
+as a fourth attacker LLM.
+
+### Added
+- `tests/adversarial/generated/{TM,PE,DE}-v038-llama33-s43.jsonl`:
+  900 Phase 1 entries (300 per category) generated at seed 43,
+  schema-valid, fingerprint-deduplicated against v037.
+- `scripts/eval_v038_phase1.py`: reads the three Phase 1 jsonls
+  directly and runs the production v8 bundle at T=0.9006. Reports
+  overall recall, per-category recall, per-severity recall, and
+  Wilson confidence intervals. Writes the eval artifact to
+  `bench/v038_phase1_eval_v8.json`.
+- `scripts/v038_droplet_run.sh`: droplet driver mirroring the v0.37
+  shape with the `--quantization fp8` argument removed. The current
+  `rocm/vllm:latest` image refuses the explicit quantization flag
+  when the model config already declares `compressed-tensors`, so
+  vLLM is left to auto-detect the quantization from the model config.
+- `scripts/v038_local_watcher.sh`: 60-second rsync-back loop for
+  continuous recovery of entries and logs during long droplet runs.
+- `bench/v038_phase1_eval_v8.json`: Phase 1 eval artifact.
+- `bench/vaara-bench-v0.38.md`: v0.38 methodology, chain of custody,
+  ship gate, and the explicit scope note on the v0.39 external-corpus
+  and IPI threads.
+
+### Changed
+- README bench pointer swapped from v0.37 to v0.38.
+
 ## [0.37.1] - 2026-05-27
 
 **Theme: SEP-2787 verifier step 5, argument commitment verification.**

@@ -29,7 +29,7 @@ Held-out TEST recall 85.0% (95% Wilson [82.8, 87.1]) at FPR 4.6% [3.3, 6.3]. Mul
 - 140 µs / 210 µs p99 inference latency, commodity CPU
 - Distribution-free conformal coverage on the score
 - MWU regret bound O(sqrt(T log N))
-- [vaara-bench-v0.37](bench/vaara-bench-v0.37.md): current methodology, chain of custody, ship-gate record. Third attacker family added to cross-model held-out (900 entries generated by Llama-3.3-70B-Instruct on AMD-backed MI300X) and v8 production classifier trained on the v035 + v036 TM/PE union fold. Holds 86.6% recall on v035 TEST, 85.8% on the new Llama-3.3 leg, lifts the worst v0.36 sub-cell (data_exfil × Claude) from 26.0% to 38.9%. Historical bench docs live under `bench/` for chain-of-custody continuity.
+- [vaara-bench-v0.38](bench/vaara-bench-v0.38.md): current methodology, chain of custody, ship-gate record. Phase 1 PAIR scale-up to n=300 per attacker family with 900 fresh Llama-3.3-70B entries on AMD-backed MI300X at seed 43. v8 production classifier unchanged from v0.37, evaluated at calibrated T=0.9006. Overall recall 88.4% [86.2, 90.4] on the Phase 1 corpus, +2.6pp over the v0.37 Llama-3.3 leg, biggest lift on data_exfil (+6.3pp). Historical bench docs live under `bench/` for chain-of-custody continuity.
 - [vaara-bench-v1](bench/vaara-bench-v1.md): 77-trace synthetic-corpus regression baseline with frozen methodology, 100% soft TPR, 0% hard FPR
 
 Each figure is reproducible from the public corpus or the bench harness in `bench/`.

@@ -0,0 +1,56 @@
+{
+  "bundle": "src/vaara/data/adversarial_classifier_v8.joblib",
+  "bundle_version": "v0.37",
+  "threshold": 0.9006,
+  "source": "v0.38 Phase 1: tests/adversarial/generated/{TM,PE,DE}-v038-llama33-s43.jsonl",
+  "model_attacker": "RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic",
+  "seed": 43,
+  "n": 900,
+  "pos": 900,
+  "tp": 796,
+  "fn": 104,
+  "recall": 0.8844444444444445,
+  "recall_ci": [
+    0.8619044268492455,
+    0.9037164518533944
+  ],
+  "per_category": {
+    "tool_misuse": {
+      "n": 300,
+      "tp": 281,
+      "recall": 0.9366666666666666
+    },
+    "privilege_escalation": {
+      "n": 300,
+      "tp": 289,
+      "recall": 0.9633333333333334
+    },
+    "data_exfil": {
+      "n": 300,
+      "tp": 226,
+      "recall": 0.7533333333333333
+    }
+  },
+  "per_severity": {
+    "critical": {
+      "n": 397,
+      "tp": 366,
+      "recall": 0.9219143576826196
+    },
+    "medium": {
+      "n": 161,
+      "tp": 135,
+      "recall": 0.8385093167701864
+    },
+    "high": {
+      "n": 336,
+      "tp": 289,
+      "recall": 0.8601190476190477
+    },
+    "low": {
+      "n": 6,
+      "tp": 6,
+      "recall": 1.0
+    }
+  }
+}
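
The `recall_ci` values in this artifact can be cross-checked against the raw counts. The snippet below is a minimal sketch, assuming the intervals are two-sided 95% Wilson score intervals as the changelog states. The function name `wilson_interval` and the `cells` dictionary are illustrative and are not taken from `scripts/eval_v038_phase1.py`. The counts are copied from the artifact above.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion.

    z defaults to the 97.5th percentile of the standard normal (95% interval).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    z2 = z * z
    denom = 1.0 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half_width, center + half_width


# (tp, n) pairs copied from bench/v038_phase1_eval_v8.json
cells = {
    "overall": (796, 900),
    "tool_misuse": (281, 300),
    "privilege_escalation": (289, 300),
    "data_exfil": (226, 300),
}

intervals = {name: wilson_interval(tp, n) for name, (tp, n) in cells.items()}

for name, (lo, hi) in intervals.items():
    tp, n = cells[name]
    print(f"{name}: {tp / n:.1%} [{lo:.1%}, {hi:.1%}]")
```

Running a check like this against the committed artifact is a cheap way to confirm that the intervals quoted in the bench doc and the README come from the same counts.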
@@ -0,0 +1,130 @@
+# vaara-bench-v0.38
+
+Methodology delta against [vaara-bench-v0.37](vaara-bench-v0.37.md).
+v0.38 is a corpus scale-up release on the Phase 1 PAIR leg:
+
+1. **Third attacker family scaled to n=300 per category.** 900 fresh
+   adversarial entries generated by
+   `RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic` on AMD-backed MI300X
+   SR-IOV under `rocm/vllm:latest` at seed 43. Three categories
+   (`tool_misuse`, `privilege_escalation`, `data_exfil`), 300 per
+   category, held out from TRAIN.
+2. **v8 classifier carried forward unchanged.** No retrain in v0.38.
+   The Phase 1 corpus is evaluated against the same production bundle
+   (`adversarial_classifier_v8.joblib`) that shipped in v0.37 at the
+   same calibrated threshold T=0.9006.
+
+## Phase 1 result (v8 on 900 Llama-3.3 entries, seed 43)
+
+| cut | n | recall at T=0.9006 |
+|---|---|---|
+| **overall** | 900 | **88.4% [86.2, 90.4]** |
+| tool_misuse | 300 | 93.7% [90.3, 95.9] |
+| privilege_escalation | 300 | 96.3% [93.6, 97.9] |
+| data_exfil | 300 | 75.3% [70.2, 79.9] |
+
+vs the v0.37 Llama-3.3 leg at n=887 (85.8% overall, TM 91.6%, PE 97.0%,
+DE 69.0%): +2.6pp overall, with the biggest lift on data_exfil
+(+6.3pp). PE moves inside its prior confidence interval. The DE lift
+is consistent with the v0.37 mechanism finding that data_exfil is the
+hardest category but not a structural failure.
+
+The Phase 1 corpus uses a different random seed than the v0.37 leg
+(43 vs the v0.37 generator default) so the entries are content-distinct.
+Fingerprint deduplication against v037 entries showed zero true
+duplicates.
+
+## Recall by severity (Phase 1)
+
+| severity | n | recall |
+|---|---|---|
+| critical | 397 | 92.2% |
+| high | 336 | 86.0% |
+| medium | 161 | 83.9% |
+| low | 6 | 100.0% |
+
+The severity pattern carries over to the Phase 1 entries: critical
+above 92%, high at 86.0%, medium at 83.9%. The low bucket (n=6) is too
+small to support a conclusion.
+
+## Generation provenance
+
+Phase 1 generation ran on an AMD-backed MI300X DigitalOcean SR-IOV
+droplet under `rocm/vllm:latest` serving
+`RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic` with the model's native
+`compressed-tensors` FP8 quantization
+(`--max-model-len 8192 --enforce-eager --gpu-memory-utilization 0.92`).
+Three parallel category generators, 22 minutes wall clock for 900
+entries (vLLM health-up at 06:48Z, last generator done at 07:06Z on
+2026-05-27). All 900 entries schema-valid.
+
+The v037 generator hardcoded `v037` in entry `id` and `agent_id` fields
+regardless of `--random-seed`, producing ID collisions against the
+v037 corpus. A one-pass in-place rename `v037 -> v038` removed those
+collisions before the eval. Content uniqueness was preserved because
+the model produced distinct samples at the new seed.
+
+The v0.38 droplet driver (`scripts/v038_droplet_run.sh`) drops the
+`--quantization fp8` argument that v0.37 used. The current
+`rocm/vllm:latest` image (vllm 0.11.2.dev673) refuses an explicit
+quantization flag when the model config already declares
+`compressed-tensors`, so vLLM is left to auto-detect the quantization
+from the model config.
+
+## Chain of custody
+
+| anchor | path | pins |
+|---|---|---|
+| Phase 1 corpus | `tests/adversarial/generated/{TM,PE,DE}-v038-llama33-s43.jsonl` | 300 entries per category, seed 43, schema-valid |
+| production bundle | `src/vaara/data/adversarial_classifier_v8.joblib` | unchanged from v0.37 |
+| Phase 1 eval | `scripts/eval_v038_phase1.py` | reads jsonls directly, bypasses split manifest |
+| eval artifact | `bench/v038_phase1_eval_v8.json` | overall + per-category + per-severity |
+| droplet driver | `scripts/v038_droplet_run.sh` | drops --quantization fp8 flag |
+| watcher | `scripts/v038_local_watcher.sh` | 60s rsync-back loop for defensive recovery |
+
+## Reproduction recipe
+
+```
+PYTHONPATH=src .venv/bin/python scripts/eval_v038_phase1.py \
+    --bundle src/vaara/data/adversarial_classifier_v8.joblib \
+    --threshold 0.9006 \
+    --json-out bench/v038_phase1_eval_v8.json
+```
+
+## What is not in v0.38
+
+Two threads carry to v0.39:
+
+1. **External-corpus eval (BIPIA, LLMail-Inject).** BIPIA provides 75
+   text-injection templates plus 50 code-injection templates (the
+   instructions to inject into a benign context). LLMail-Inject
+   provides 208K participant submissions, each labelled with whether
+   an LLM email assistant followed the injection. Neither corpus
+   pre-extracts the resulting tool calls. v8 classifies tool calls.
+   An honest eval against either requires running an LLM agent
+   end-to-end on the injection prompt, capturing the resulting tool
+   call, and then running v8 on that tool call. That is an LLM-agent
+   harness, not a packaging task on top of an existing eval path.
+   BIPIA attack texts are downloaded to
+   `tests/adversarial/external/bipia/` for the v0.39 harness work.
+2. **IPI fourth attacker family.** Indirect prompt injection fits
+   more cleanly as a separate attack class in v0.39 than as a fourth
+   attacker LLM in v0.38. The Phase 1 result on the existing three
+   attacker families is the v0.38 headline.
+
+## Ship gate
+
+| gate | result |
+|---|---|
+| Phase 1 PAIR scale-up clears the v0.37 Llama-3.3 leg | PASS, 85.8% -> 88.4% overall, +2.6pp |
+| Worst Phase 1 sub-cell stays above 70% recall floor | PASS, DE 75.3% |
+| In-distribution TEST recall not regressed | PASS, v8 unchanged from v0.37 |
+| Methodology + chain of custody published | PASS |
+
+## Cumulative position
+
+v0.38 closes the Phase 1 PAIR scale-up on three attacker families
+(Mixtral, Claude, Llama-3.3-70B) at n=300 each, with the third family
+landing at 88.4% overall recall against an unchanged v8 classifier.
+The next-release line of work is external-corpus eval against BIPIA
+and LLMail-Inject and the IPI fourth attacker family, both of which
+share the LLM-agent harness scope that v0.39 is sized for.
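
The two numeric rows of the ship-gate table can be expressed as a small check over the eval artifact. This is a sketch only: the function `check_ship_gates` and the constant names are hypothetical, the 85.8% baseline and the 70% floor are taken from the table in the bench doc, and the recall values are copied from `bench/v038_phase1_eval_v8.json`. It is not part of the repository's scripts.

```python
# Thresholds as stated in the ship-gate table (assumed to be exact).
V037_LLAMA_LEG_RECALL = 0.858
WORST_CELL_FLOOR = 0.70

# Subset of bench/v038_phase1_eval_v8.json, inlined so the sketch runs standalone.
artifact = {
    "recall": 0.8844444444444445,
    "per_category": {
        "tool_misuse": {"n": 300, "tp": 281, "recall": 0.9366666666666666},
        "privilege_escalation": {"n": 300, "tp": 289, "recall": 0.9633333333333334},
        "data_exfil": {"n": 300, "tp": 226, "recall": 0.7533333333333333},
    },
}


def check_ship_gates(artifact: dict, baseline: float, floor: float) -> dict:
    """Evaluate the baseline-lift gate and the worst-sub-cell floor gate."""
    # Find the category with the lowest recall.
    worst_name, worst_cell = min(
        artifact["per_category"].items(), key=lambda item: item[1]["recall"]
    )
    return {
        "clears_baseline": artifact["recall"] > baseline,
        "lift_pp": round((artifact["recall"] - baseline) * 100, 1),
        "worst_cell": worst_name,
        "worst_above_floor": worst_cell["recall"] > floor,
    }


result = check_ship_gates(artifact, V037_LLAMA_LEG_RECALL, WORST_CELL_FLOOR)
print(result)
```

In a real pipeline the dictionary would be loaded with `json.load` from the artifact path. The remaining two gates (in-distribution TEST recall and published methodology) are not numeric checks on this artifact, so they are left out of the sketch.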
@@ -1,6 +1,6 @@
 {
   "name": "@vaara/client",
-  "version": "0.37.1",
+  "version": "0.38.0",
   "description": "TypeScript client for the Vaara HTTP API. Conformal risk scoring, hash-chained audit, policy reload, named detectors.",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "vaara"
-version = "0.37.1"
+version = "0.38.0"
 description = "Adaptive AI Agent Execution Layer for risk scoring, audit trails, and regulatory compliance"
 requires-python = ">=3.10"
 license = "Apache-2.0"