release(v0.37.0): third attacker family + v8 classifier closes worst v0.36 sub-cell by vaaraio · Pull Request #144 · vaaraio/vaara

vaaraio · 2026-05-26T22:14:52Z

Summary

Third attacker family added to the cross-model held-out fold and v8 classifier ships as the new production default. Closes the worst v0.36 sub-cell by 12.9 pp on the same 700 Claude data_exfil entries (26.0% under v7 to 38.9% under v8), covers a third attacker family at 85.8% overall recall, and holds in-distribution at 86.6% on v035 TEST with FPR inside the prior confidence interval.

Headline numbers

eval	v8	v7 (prior production)	Δ
v035 TEST recall at T=0.9006	86.6% [84.5, 88.6]	85.3% [83.1, 87.3]	+1.3 pp
v035 TEST FPR	5.0% [3.6, 6.8]	4.6% [3.3, 6.3]	+0.4 pp (within CI)
v037 holdout overall (n=2,277)	66.8% [64.9, 68.7]	n/a	new
Llama-3.3 leg (third family, n=887)	85.8% [83.3, 87.9]	n/a	new
data_exfil x Claude (carry-forward, n=700)	38.9% [35.3, 42.5]	26.0% [22.9, 29.4]	+12.9 pp
TRAIN entries	11,287	8,501	+2,786

What ships

v8 classifier. src/vaara/data/adversarial_classifier_v8.joblib, 638 features (254 hand plus 384 MiniLM embedding), trained on v035 TRAIN plus v036 Mixtral and Claude TM/PE folded in. v7 retained on disk for cross-eval reproducibility. Threshold unchanged at 0.9006.
v037 cross-model holdout. 900 entries from RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic on AMD-backed MI300X via rocm/vllm:latest. 13 schema-invalid TM dropped, 887 valid (TM 287, PE 300, DE 300). Combined with v036 DE both legs for the new 2,277-entry v037 holdout fold.
v0.37 split manifest. tests/adversarial/v037_split.json inherits v035_split verbatim, folds v036 TM/PE into train, marks v036 DE plus the full v037 leg as holdout.
SEP-2787 reference implementation. Already on main via PR feat(attestation): add SEP-2787 reference implementation (proposed shape) #139, bundled into the v0.37 release notes for one clean validator-stack line.
MIT AI Risk Repository v4 position document. docs/mit_ai_risk_repository_mapping.md plus the COMPLIANCE.md position paragraph.

What changed

New files: bench/vaara-bench-v0.37.md, bench/v037_eval_v8_holdout.json, tests/adversarial/v037_split.json, src/vaara/data/adversarial_classifier_v8.joblib, three v037 generated jsonl files, seven v037 scripts (_v037_common.py, generate_targeted_v037.py, v037_droplet_run.sh, v037_local_watcher.sh, validate_v037.py, build_v037_split.py, eval_v037_holdout.py), docs/mit_ai_risk_repository_mapping.md.

Updated: tests/adversarial/MANIFEST.sha256 (v037 entries appended, 302 to 305), version bumps across pyproject.toml + clients/ts/package.json + src/vaara/__init__.py (0.36.0 to 0.37.0), CHANGELOG, README bench pointer, COMPLIANCE MIT Risk Repository position section.

Test plan

Ruff clean on all new and modified Python files
Schema validation: 887 of 900 v037 entries pass (13 TM dropped on bad expected)
v8 on v035 TEST: 86.6% recall at 5.0% FPR, no regression vs v7 85.3% at 4.6%
v8 on v037 holdout: 66.8% overall, per-category-per-leg cuts published in bench/v037_eval_v8_holdout.json
MANIFEST.sha256 verifies all 305 entries (cd tests/adversarial && sha256sum -c MANIFEST.sha256)
Droplet poweroff issued post-rsync, no idle compute cost

Methodology notes

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic ships with compressed-tensors quantization in its config. The explicit --quantization fp8 argument conflicts and must be dropped. vLLM auto-detects the scheme from the model config and serves correctly.
The data_exfil category remains the hardest cross-model surface across all three attacker families now in scope. PE and TM generalise to Llama-3.3 cleanly. DE generalises unevenly. That is the v0.38 public-benchmark and PAIR-at-scale line of work.
Total Llama-3.3 generation: 22 minutes wall clock at steady-state 40 entries/min combined across three parallel category generators on one MI300X SR-IOV droplet.

Summary by CodeRabbit

New Features
- Released v0.37 with upgraded production classifier delivering improved detection capabilities.
- Expanded evaluation coverage across multiple attack scenarios and model families.
Documentation
- Added MIT AI Risk Repository v4 mapping documentation.
- Updated benchmark methodology and detailed v0.37 evaluation results.

…v0.36 sub-cell Adds a third attacker family to the cross-model held-out fold and ships a v8 classifier that trains on the v035 + v036 TM/PE union and keeps v036 DE both legs plus the full v037 Llama-3.3 leg as the new held-out evaluation. Closes the worst sub-cell from v0.36 by 12.9 pp on the same 700 Claude DE entries (26.0% under v7 to 38.9% under v8), covers a third attacker family at 85.8% overall recall, and holds in-distribution at 86.6% on v035 TEST with FPR inside the prior confidence interval. Generation provenance: 900 entries from RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic on AMD-backed MI300X SR-IOV under rocm/vllm:latest, three parallel category generators, 22 minutes wall clock, droplet poweroff post-rsync. 13 schema-invalid TM entries dropped, 887 valid (TM 287, PE 300, DE 300). v8 production bundle ships at src/vaara/data/adversarial_classifier_v8.joblib. v7 retained on disk for cross-eval reproducibility. Threshold unchanged at 0.9006. Also bundles the MIT AI Risk Repository v4 position document and the SEP-2787 Tool Call Attestation reference implementation that landed on main via PR #139. Both fit the v0.37 ship window and add one clean line each to the validator stack and the public taxonomy positioning. Full methodology, chain of custody, ship gate, and named limits in bench/vaara-bench-v0.37.md.

coderabbitai · 2026-05-26T22:15:06Z

📝 Walkthrough

Walkthrough

v0.37 release updates project versions across versioning files, documents benchmark methodology and cross-model evaluation results, implements adversarial data generation infrastructure with Llama-3.3 targeting, adds remote droplet orchestration for large-scale generation, and updates data integrity manifest.

Changes

v0.37 Release Delivery

Layer / File(s)	Summary
Release versioning and documentation `CHANGELOG.md`, `COMPLIANCE.md`, `README.md`, `clients/ts/package.json`, `pyproject.toml`, `src/vaara/__init__.py`	Version bumps from `0.36.0` to `0.37.0` across all versioning files; release notes in CHANGELOG.md detail v0.37 theme including Llama-3.3 generation, split/validation/eval artifacts, and v8 classifier bundle; COMPLIANCE.md adds MIT AI Risk Repository v4 coverage section; README.md updates bench reference and adds MIT mapping doc table entry.
Benchmark methodology, results, and coverage mapping `bench/vaara-bench-v0.37.md`, `bench/v037_eval_v8_holdout.json`, `docs/mit_ai_risk_repository_mapping.md`	vaara-bench-v0.37.md documents two methodology changes (third held-out attacker family, retraining v8 on union corpus), production headline metrics, cross-model held-out recall breakdowns by category/leg, generation provenance, reproduction recipe, and operational limits; v037_eval_v8_holdout.json records detailed evaluation results including overall and per-group recalls with Wilson confidence intervals; new mapping doc describes Vaara's coverage of MIT AI Risk Repository v4 domains and sub-domains.
V037 adversarial generation shared infrastructure `scripts/_v037_common.py`	Shared configuration (category instructions, schema template, prompts) and utilities: `load_seeds` (sample from JSON file), `build_user_message` (schema substitution with seed embedding), `parse_jsonl` (category filtering), `fingerprint` (SHA-1 dedup over tool_name/parameters/context), `load_prior_fingerprints` (dedupe set from existing JSONL).
V037 adversarial generation and validation `scripts/generate_targeted_v037.py`, `scripts/validate_v037.py`	`generate_targeted_v037.py` calls Llama-3.3 vLLM endpoint with category-specific prompts, fingerprints and deduplicates output JSONL, and appends unique entries with batch error handling. `validate_v037.py` scans generated JSONL files, enforces schema (required fields, context.original_task presence, expected == DENY, severity allowlist, category match from filename), and reports per-file and total validation statistics.
V037 data composition and evaluation `scripts/build_v037_split.py`, `scripts/eval_v037_holdout.py`	`build_v037_split.py` composes v037_split.json by inheriting v035 assignments, folding v036 TM/PE into TRAIN, keeping v036 DE holdout, and adding v037 Llama entries; deduplicates by fingerprint and drops invalid records. `eval_v037_holdout.py` loads classifier bundle, builds feature matrices from holdout split, computes recall with Wilson confidence intervals, and aggregates per-category, per-leg, and per-category×leg breakdowns.
Remote generation orchestration `scripts/v037_droplet_run.sh`, `scripts/v037_local_watcher.sh`	`v037_droplet_run.sh` launches ROCm vLLM container, waits for model endpoint, and starts three parallel generator processes (TM/PE/DE) under nohup with idempotent re-run safety. `v037_local_watcher.sh` polls remote generator progress via rsync, tracks per-category line counts, verifies vLLM model ID, logs state transitions, and upon target completion, optionally auto-shuts down droplet via doctl or exits with manual shutdown instructions.
Data integrity tracking `tests/adversarial/MANIFEST.sha256`	Three new SHA256 manifest entries for generated v037 Llama-3.3 datasets: DE-v037-llama33.jsonl, PE-v037-llama33.jsonl, TM-v037-llama33.jsonl.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

vaaraio/vaara#48: Updates README.md benchmark methodology section; v0.37 further evolves the benchmark scope and reported recall metrics.

Poem

🐇 A v-zero-three-seven hop, so fine,
With Llama's third family in line,
Split, validate, eval with care,
Watch the droplets run with flair!
Data blessed with checksums bright,
Release complete—a fuzzy delight. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the primary change: a v0.37 release with a third attacker family (Llama-3.3) and an improved v8 classifier that resolves the worst-performing sub-cell from v0.36.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch release/v0.37.0

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

README.md (1)
23-33: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Numbers section mixes stale headline metrics with the new “current” v0.37 benchmark.

Lines 23-27 still present older benchmark headlines while Line 32 declares v0.37 as current methodology. Please either update the top-line metrics to v0.37 or explicitly label the older figures as historical to avoid reader confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 23 - 33, The README's top-line metrics (the initial
headline paragraph and the first few bullets that report recall, FPR, p99
latency, etc.) are inconsistent with the declared current methodology
vaara-bench-v0.37; either replace those headline numbers with the v0.37 values
or explicitly mark the existing headline metrics as "historical (pre-v0.37)" and
add updated v0.37 figures next to or below them; update the classifier
recall/FPR, multi-attacker PAIR, and latency bullets (the items referencing
recall 85.0%, FPR 4.6%, 140 µs p99, and the multi-attacker 0/25 result) so they
match the vaara-bench-v0.37 subsection (or are clearly labeled as older), and
ensure the vaara-bench-v0.37 reference remains accurate.

🧹 Nitpick comments (3)

scripts/build_v037_split.py (1)

31-41: 💤 Low value

Consider SHA-256 for consistency with project standards.

The MANIFEST.sha256 file (mentioned in PR description) uses SHA-256, while this fingerprint function uses SHA-1. While SHA-1 is acceptable for non-cryptographic deduplication, using SHA-256 would provide consistency across the codebase.

♻️ Proposed change to use SHA-256

 def fingerprint(entry: dict) -> str:
     canonical = json.dumps(
         {
             "t": entry.get("tool_name", ""),
             "p": entry.get("parameters", {}),
             "c": entry.get("context", {}),
         },
         sort_keys=True,
         default=str,
     )
-    return hashlib.sha1(canonical.encode()).hexdigest()
+    return hashlib.sha256(canonical.encode()).hexdigest()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/build_v037_split.py` around lines 31 - 41, The fingerprint function
currently uses SHA-1; change it to SHA-256 for consistency with the project's
MANIFEST.sha256 by replacing the hashlib.sha1(...) call with hashlib.sha256(...)
when hashing the canonical JSON in fingerprint(entry: dict) and keep using
.hexdigest() to return the hex string; ensure canonicalization via
json.dumps(..., sort_keys=True, default=str) remains unchanged so only the hash
algorithm is substituted.

scripts/eval_v037_holdout.py (2)

86-90: 💤 Low value

Consider explicit handling when no positive samples exist.

If tp + fn == 0 (no positive samples), the code returns recall = 0.0 due to max(tp + fn, 1). While this edge case is unlikely in production, it would be clearer to handle it explicitly or add an assertion that tp + fn > 0.

♻️ Proposed clarification

     pos = int((y == 1).sum())
     tp = int(((pred == 1) & (y == 1)).sum())
     fn = int(((pred == 0) & (y == 1)).sum())
+    if tp + fn == 0:
+        print("[error] no positive samples in holdout")
+        return 2
     recall = tp / max(tp + fn, 1)

Or simply add an assertion:

+    assert tp + fn > 0, "holdout must contain positive samples"
     recall = tp / max(tp + fn, 1)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/eval_v037_holdout.py` around lines 86 - 90, Handle the
no-positive-samples edge explicitly: compute total_pos = tp + fn and if
total_pos == 0, set recall = 0.0 and set rlo, rhi to safe defaults (e.g., 0.0,
0.0 or None) instead of calling wilson_ci with zero trials; otherwise compute
recall = tp / total_pos and call wilson_ci(tp, total_pos). Update the block
using the variables pos, tp, fn, recall and the wilson_ci call to implement this
conditional or replace it with an assertion that total_pos > 0 if you prefer
failing fast.

16-23: 💤 Low value

sys.path manipulation is fragile.

The script manipulates sys.path to import from the scripts directory. This works when run from the repository root but may fail if executed from a different working directory. Consider making this a proper package or using relative imports if the scripts directory becomes a module.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/eval_v037_holdout.py` around lines 16 - 23, Currently the script
mutates sys.path (REPO and sys.path.insert) to import
train_adversarial_classifier which is fragile; replace that with a dynamic,
path-based import using importlib (spec_from_file_location / module_from_spec)
to load the file at REPO/"scripts"/"train_adversarial_classifier.py" and then
bind load_corpus_keyed, build_labels, build_features from that module, or
alternatively convert the scripts folder to a proper package and use a normal
package import (e.g., from scripts.train_adversarial_classifier import
load_corpus_keyed, build_labels, build_features) and remove the sys.path
manipulation and REPO logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bench/vaara-bench-v0.37.md`:
- Line 137: The fenced code block using triple backticks in the markdown
contains no language tag; update that opening fence (```) to include an
appropriate language hint (for example, ```bash) so markdownlint passes and the
snippet is syntax-highlighted and more readable.
- Around line 137-149: The reproduction steps change into tests/adversarial then
run scripts and reference src paths as if at repo root, which breaks copy-paste
execution; update the commands so paths are consistent from the repository root
(e.g. check MANIFEST.sha256 as tests/adversarial/MANIFEST.sha256 and keep
subsequent .venv/bin/python calls and --split-manifest
tests/adversarial/v037_split.json and --bundle-out
src/vaara/data/adversarial_classifier_v8.joblib referenced from the repo root)
so users can run the listed commands without changing directory first.

In `@docs/mit_ai_risk_repository_mapping.md`:
- Around line 86-90: The percentage for the "4. Malicious Actors & Misuse" row
is incorrect (267/268 ≈ 99.6% not 100%); update that table row's percentage cell
from "100%" to a correctly rounded value (e.g., "99.6%" or "99.63%") or add an
overall rounding note in the document explaining percentages are rounded to one
decimal place so counts may not exactly match; make the change where the table
row for "4. Malicious Actors & Misuse" and the surrounding summary/Total row are
defined.

In `@scripts/build_v037_split.py`:
- Around line 90-94: The JSON parsing try/except around json.loads(line) should
catch json.JSONDecodeError instead of a bare Exception: change the except block
to except json.JSONDecodeError: increment schema_dropped and continue; for any
other unexpected errors re-raise or let them propagate (or log and raise) so
they aren't silently swallowed—update the handling around json.loads(line), the
variable e, and the schema_dropped increment accordingly.
- Around line 73-74: Validate the two-character prefix before using it as a key:
check that cat_prefix (derived from f.name[:2]) exists in EXPECTED_CAT and
handle the unexpected case (e.g., log a clear warning and skip the file or raise
a descriptive exception) instead of doing EXPECTED_CAT[cat_prefix] directly;
update the code around cat_prefix and EXPECTED_CAT to perform the membership
test and a safe lookup to avoid KeyError.

In `@scripts/eval_v037_holdout.py`:
- Around line 99-107: The loop aggregating true positives uses e.get("expected")
instead of the precomputed label array y; change the per-bucket true-positive
check in the for (k, e), pr in zip(holdout, pred): block to reference the
corresponding label from y (use the same zip order or index into y) so that TP
counting aligns with build_labels and the recall calculation; update references
around per_cat, per_leg, per_cat_leg where TP is incremented (currently in the
if pr == 1 and e.get("expected") == "DENY" clause) to use y (e.g., y_entry
indicates DENY) instead of e.get("expected").

In `@scripts/generate_targeted_v037.py`:
- Around line 35-52: The call_chat function currently assumes the API returns a
JSON with choices[0].message.content and will raise KeyError/IndexError on
malformed responses; update call_chat to defensively validate the HTTP response
and parsed JSON: check HTTP status / ensure resp.read() is valid JSON, confirm
top-level "choices" is present and a non-empty list, that choices[0] contains a
"message" dict with a "content" key (and that content is a string), and if any
of those checks fail raise a clear exception (or return a descriptive error
string) so callers (e.g., main()) receive an informative error instead of a
cryptic KeyError/IndexError. Ensure these checks reference the call_chat
function and the "choices" → "message" → "content" path.

In `@scripts/v037_droplet_run.sh`:
- Around line 69-82: The readiness loop only checks that /v1/models is
reachable, not that it serves the expected model; update the script around the
PORT loop to, after curl succeeds, issue a second curl to
http://localhost:${PORT}/v1/models and inspect the JSON/text for the expected
model identifier (use a constant/variable like EXPECTED_MODEL or the literal
model name you expect), and if the returned list does not contain that model
then print the same failure message, tail "${LOG_DIR}/vllm_llama33.log" to
stderr and exit non-zero; ensure this check runs before breaking out of the wait
loop so the script fails fast if another service is answering on the port.
- Line 57: The docker run command currently interpolates the secret into the
command line via -e "HF_TOKEN=${HF_TOKEN:-}", which exposes HF_TOKEN; change it
to pass only the environment variable name (e.g., -e HF_TOKEN or --env HF_TOKEN)
so Docker pulls the value from the process environment instead of embedding it,
or alternatively reference an --env-file if you need to supply multiple vars;
update the line containing -e "HF_TOKEN=${HF_TOKEN:-}" accordingly and ensure
the runtime environment actually contains HF_TOKEN before invoking the script.
- Around line 51-67: The docker run invocation is redirecting the CLI output
(container ID) to vllm_llama33.log instead of the container's runtime logs;
after creating the container with the existing docker run -d ... --name
vllm-llama33 call, remove the >"${LOG_DIR}/vllm_llama33.log" redirection and
instead launch docker logs -f vllm-llama33 >"${LOG_DIR}/vllm_llama33.log" 2>&1 &
(or equivalent backgrounded tail) so the file receives the container
stdout/stderr; apply the same change to the similar docker run / docker logs
pair around the other block (the one referenced near lines ~77-79) to ensure the
watcher/health-check tails real container logs.

In `@scripts/validate_v037.py`:
- Line 24: Guard the lookup into EXPECTED_CAT by first extracting the filename
prefix (prefix = f.name[:2]) and verifying it exists in EXPECTED_CAT; if the
prefix is missing, raise a clear ValueError (or log and skip) that includes the
offending filename so the validator fails with a helpful message instead of a
KeyError. Replace the direct lookup at the assignment to expected (where
EXPECTED_CAT[f.name[:2]] is used) with this check and then assign expected =
EXPECTED_CAT[prefix] only after the prefix is validated. Ensure the error
message names the file (f.name) and the invalid prefix to aid debugging.
- Around line 18-64: The main() function currently always returns 0 despite the
module claiming it should "fail-loud on any malformed entry"; change the
function to return a non-zero exit code when validation failures are detected
(e.g., if total_ok != total_in or any of n_miss/n_exp/n_sev/n_cat > 0 across
files) so callers see failure, or alternatively update the module docstring to
remove the "fail-loud" claim; update the return at the end of main() (and any
caller expectations) to reflect the chosen behavior.

---

Outside diff comments:
In `@README.md`:
- Around line 23-33: The README's top-line metrics (the initial headline
paragraph and the first few bullets that report recall, FPR, p99 latency, etc.)
are inconsistent with the declared current methodology vaara-bench-v0.37; either
replace those headline numbers with the v0.37 values or explicitly mark the
existing headline metrics as "historical (pre-v0.37)" and add updated v0.37
figures next to or below them; update the classifier recall/FPR, multi-attacker
PAIR, and latency bullets (the items referencing recall 85.0%, FPR 4.6%, 140 µs
p99, and the multi-attacker 0/25 result) so they match the vaara-bench-v0.37
subsection (or are clearly labeled as older), and ensure the vaara-bench-v0.37
reference remains accurate.

---

Nitpick comments:
In `@scripts/build_v037_split.py`:
- Around line 31-41: The fingerprint function currently uses SHA-1; change it to
SHA-256 for consistency with the project's MANIFEST.sha256 by replacing the
hashlib.sha1(...) call with hashlib.sha256(...) when hashing the canonical JSON
in fingerprint(entry: dict) and keep using .hexdigest() to return the hex
string; ensure canonicalization via json.dumps(..., sort_keys=True, default=str)
remains unchanged so only the hash algorithm is substituted.

In `@scripts/eval_v037_holdout.py`:
- Around line 86-90: Handle the no-positive-samples edge explicitly: compute
total_pos = tp + fn and if total_pos == 0, set recall = 0.0 and set rlo, rhi to
safe defaults (e.g., 0.0, 0.0 or None) instead of calling wilson_ci with zero
trials; otherwise compute recall = tp / total_pos and call wilson_ci(tp,
total_pos). Update the block using the variables pos, tp, fn, recall and the
wilson_ci call to implement this conditional or replace it with an assertion
that total_pos > 0 if you prefer failing fast.
- Around line 16-23: Currently the script mutates sys.path (REPO and
sys.path.insert) to import train_adversarial_classifier which is fragile;
replace that with a dynamic, path-based import using importlib
(spec_from_file_location / module_from_spec) to load the file at
REPO/"scripts"/"train_adversarial_classifier.py" and then bind
load_corpus_keyed, build_labels, build_features from that module, or
alternatively convert the scripts folder to a proper package and use a normal
package import (e.g., from scripts.train_adversarial_classifier import
load_corpus_keyed, build_labels, build_features) and remove the sys.path
manipulation and REPO logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2cdf0f5a-e520-4bf6-be3a-e232db2be1f0

📥 Commits

Reviewing files that changed from the base of the PR and between 660884a and f3d2ec4.

⛔ Files ignored due to path filters (3)

tests/adversarial/generated/DE-v037-llama33.jsonl is excluded by !**/generated/**
tests/adversarial/generated/PE-v037-llama33.jsonl is excluded by !**/generated/**
tests/adversarial/generated/TM-v037-llama33.jsonl is excluded by !**/generated/**

📒 Files selected for processing (19)

CHANGELOG.md
COMPLIANCE.md
README.md
bench/v037_eval_v8_holdout.json
bench/vaara-bench-v0.37.md
clients/ts/package.json
docs/mit_ai_risk_repository_mapping.md
pyproject.toml
scripts/_v037_common.py
scripts/build_v037_split.py
scripts/eval_v037_holdout.py
scripts/generate_targeted_v037.py
scripts/v037_droplet_run.sh
scripts/v037_local_watcher.sh
scripts/validate_v037.py
src/vaara/__init__.py
src/vaara/data/adversarial_classifier_v8.joblib
tests/adversarial/MANIFEST.sha256
tests/adversarial/v037_split.json

coderabbitai · 2026-05-26T22:23:07Z

+
+## Reproduction recipe
+
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced code block.

At Line 137, use a language hint (e.g., bash) to satisfy markdownlint and improve readability.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 137-137: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bench/vaara-bench-v0.37.md` at line 137, The fenced code block using triple backticks in the markdown contains no language tag; update that opening fence (```) to include an appropriate language hint (for example, ```bash) so markdownlint passes and the snippet is syntax-highlighted and more readable.

coderabbitai · 2026-05-26T22:23:07Z

+```
+cd tests/adversarial && sha256sum -c MANIFEST.sha256
+.venv/bin/python scripts/validate_v037.py
+.venv/bin/python scripts/build_v037_split.py
+.venv/bin/python scripts/save_classifier_bundle.py \
+    --version v0.37 --threshold 0.9006 --embeddings \
+    --split-manifest tests/adversarial/v037_split.json \
+    --train-fold train \
+    --bundle-out src/vaara/data/adversarial_classifier_v8.joblib
+.venv/bin/python scripts/eval_v037_holdout.py \
+    --bundle src/vaara/data/adversarial_classifier_v8.joblib \
+    --json-out bench/v037_eval_v8_holdout.json
+```


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix reproduction command paths to be internally consistent.

At Line 138, the recipe changes directory to tests/adversarial, but subsequent commands at Lines 139-148 call scripts/... and src/... as if run from repo root. This likely makes the copy-paste reproduction flow fail.

Proposed doc fix

-``` -cd tests/adversarial && sha256sum -c MANIFEST.sha256 -.venv/bin/python scripts/validate_v037.py -.venv/bin/python scripts/build_v037_split.py +```bash +sha256sum -c tests/adversarial/MANIFEST.sha256 +.venv/bin/python scripts/validate_v037.py +.venv/bin/python scripts/build_v037_split.py .venv/bin/python scripts/save_classifier_bundle.py \ --version v0.37 --threshold 0.9006 --embeddings \ --split-manifest tests/adversarial/v037_split.json \ --train-fold train \ --bundle-out src/vaara/data/adversarial_classifier_v8.joblib .venv/bin/python scripts/eval_v037_holdout.py \ --bundle src/vaara/data/adversarial_classifier_v8.joblib \ --json-out bench/v037_eval_v8_holdout.json

</details> <details> <summary>🧰 Tools</summary> <details> <summary>🪛 markdownlint-cli2 (0.22.1)</summary> [warning] 137-137: Fenced code blocks should have a language specified (MD040, fenced-code-language) </details> </details> <details> <summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @bench/vaara-bench-v0.37.md around lines 137 - 149, The reproduction steps
change into tests/adversarial then run scripts and reference src paths as if at
repo root, which breaks copy-paste execution; update the commands so paths are
consistent from the repository root (e.g. check MANIFEST.sha256 as
tests/adversarial/MANIFEST.sha256 and keep subsequent .venv/bin/python calls and
--split-manifest tests/adversarial/v037_split.json and --bundle-out
src/vaara/data/adversarial_classifier_v8.joblib referenced from the repo root)
so users can run the listed commands without changing directory first.

</details>  

coderabbitai · 2026-05-26T22:23:07Z

+| 4. Malicious Actors & Misuse | 268 | 267 | 100% |
+| 5. Human-Computer Interaction | 106 | 106 | 100% |
+| 6. Socioeconomic and Environmental | 299 | 61 | 20% |
+| 7. AI System Safety, Failures & Limitations | 421 | 114 | 27% |
+| **Total** | **1,597** | **740** | **46%** |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the Domain 4 percentage value.

At Line 86, 267 / 268 is not exactly 100% (it is ~99.6%). Please update the percentage or add an explicit rounding note to avoid inconsistency with the counts.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/mit_ai_risk_repository_mapping.md` around lines 86 - 90, The percentage for the "4. Malicious Actors & Misuse" row is incorrect (267/268 ≈ 99.6% not 100%); update that table row's percentage cell from "100%" to a correctly rounded value (e.g., "99.6%" or "99.63%") or add an overall rounding note in the document explaining percentages are rounded to one decimal place so counts may not exactly match; make the change where the table row for "4. Malicious Actors & Misuse" and the surrounding summary/Total row are defined.

coderabbitai · 2026-05-26T22:23:07Z

+            cat_prefix = f.name[:2]
+            expected_cat = EXPECTED_CAT[cat_prefix]


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add validation for category prefix before dictionary lookup.

The code assumes the first two characters of the filename are a valid category prefix (TM/PE/DE), but there's no validation before the dictionary lookup. If an unexpected file matches the glob pattern, this will raise a KeyError.

🛡️ Proposed fix to add validation

relpath = str(f.relative_to(ADV)) cat_prefix = f.name[:2] + if cat_prefix not in EXPECTED_CAT: + print(f"warning: skipping {f.name} (unexpected prefix '{cat_prefix}')") + continue expected_cat = EXPECTED_CAT[cat_prefix]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

cat_prefix = f.name[:2]

expected_cat = EXPECTED_CAT[cat_prefix]

cat_prefix = f.name[:2]

if cat_prefix not in EXPECTED_CAT:

print(f"warning: skipping {f.name} (unexpected prefix '{cat_prefix}')")

continue

expected_cat = EXPECTED_CAT[cat_prefix]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/build_v037_split.py` around lines 73 - 74, Validate the two-character prefix before using it as a key: check that cat_prefix (derived from f.name[:2]) exists in EXPECTED_CAT and handle the unexpected case (e.g., log a clear warning and skip the file or raise a descriptive exception) instead of doing EXPECTED_CAT[cat_prefix] directly; update the code around cat_prefix and EXPECTED_CAT to perform the membership test and a safe lookup to avoid KeyError.

coderabbitai · 2026-05-26T22:23:07Z

+                try:
+                    e = json.loads(line)
+                except Exception:
+                    schema_dropped += 1
+                    continue


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Catch specific exception type instead of broad Exception.

The bare except Exception: will catch all exceptions, potentially hiding unexpected errors. JSON parsing should specifically catch json.JSONDecodeError.

🐛 Proposed fix to catch specific exception

try: e = json.loads(line) - except Exception: + except json.JSONDecodeError: schema_dropped += 1 continue

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/build_v037_split.py` around lines 90 - 94, The JSON parsing try/except around json.loads(line) should catch json.JSONDecodeError instead of a bare Exception: change the except block to except json.JSONDecodeError: increment schema_dropped and continue; for any other unexpected errors re-raise or let them propagate (or log and raise) so they aren't silently swallowed—update the handling around json.loads(line), the variable e, and the schema_dropped increment accordingly.

coderabbitai · 2026-05-26T22:23:08Z

+  docker run -d --rm \
+    --device=/dev/kfd --device=/dev/dri \
+    --group-add video --group-add render \
+    --security-opt seccomp=unconfined \
+    --cap-add=SYS_PTRACE \
+    --shm-size 32G --ipc=host --network host \
+    -e "HF_TOKEN=${HF_TOKEN:-}" \
+    -v "${HF_CACHE}":/root/.cache/huggingface \
+    --name vllm-llama33 \
+    rocm/vllm:latest \
+    vllm serve "${MODEL}" \
+      --host 0.0.0.0 --port "${PORT}" \
+      --max-model-len 8192 \
+      --enforce-eager \
+      --gpu-memory-utilization 0.92 \
+      --quantization fp8 \
+      >"${LOG_DIR}/vllm_llama33.log" 2>&1


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

When using docker run -d ... > logfile, does the redirection capture container runtime logs, or only Docker CLI stdout such as the container ID?

💡 Result:

docker run -d ... > logfile captures only the Docker CLI’s own standard output (typically the container ID printed because -d/--detach “print[s] the new container ID”)—it does not capture the container’s runtime logs. Reason (grounded in docs): in detached mode, docker run prints the new container ID to the client’s STDOUT [1], so > logfile redirects that STDOUT to your file. Container logs are instead retrieved via docker logs, which reads the container’s stored STDOUT/STDERR streams as logged by Docker’s logging mechanism [2]. If you want container runtime logs in a file, you’d redirect the output of docker logs (or use docker logs -f ... > logfile) rather than redirecting docker run itself.

Citations:

1: https://man.archlinux.org/man/docker-container-run.1.en.txt

2: https://docs.docker.com/engine/logging/

Fix vLLM container log capture (redirect docker logs, not docker run -d)

In scripts/v037_droplet_run.sh, redirecting &> "${LOG_DIR}/vllm_llama33.log" on docker run -d only captures Docker CLI stdout (typically the container ID), not the container’s runtime stdout/stderr. The health-check/watcher tailing vllm_llama33.log therefore won’t receive vLLM logs (same issue in the similar block around 77-79). Use docker logs -f vllm-llama33 redirected into the log file instead.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/v037_droplet_run.sh` around lines 51 - 67, The docker run invocation is redirecting the CLI output (container ID) to vllm_llama33.log instead of the container's runtime logs; after creating the container with the existing docker run -d ... --name vllm-llama33 call, remove the >"${LOG_DIR}/vllm_llama33.log" redirection and instead launch docker logs -f vllm-llama33 >"${LOG_DIR}/vllm_llama33.log" 2>&1 & (or equivalent backgrounded tail) so the file receives the container stdout/stderr; apply the same change to the similar docker run / docker logs pair around the other block (the one referenced near lines ~77-79) to ensure the watcher/health-check tails real container logs.

coderabbitai · 2026-05-26T22:23:08Z

+    --security-opt seccomp=unconfined \
+    --cap-add=SYS_PTRACE \
+    --shm-size 32G --ipc=host --network host \
+    -e "HF_TOKEN=${HF_TOKEN:-}" \


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify we don't inline-expand HF_TOKEN in docker args. rg -n --type=sh 'HF_TOKEN=\$\{HF_TOKEN:-\}|--env HF_TOKEN' scripts/v037_droplet_run.sh

Repository: vaaraio/vaara

Length of output: 94

🏁 Script executed:

#!/bin/bash set -euo pipefail # 1) Show surrounding Docker command around line 57 sed -n '40,80p' scripts/v037_droplet_run.sh # 2) Search repo for any other HF_TOKEN inlined-expansion or env passthrough rg -n --hidden --no-ignore -S --type=sh 'HF_TOKEN=\$\{HF_TOKEN:-\}|-e\s+"HF_TOKEN=|--env\s+HF_TOKEN\b' .

Repository: vaaraio/vaara

Length of output: 1595

Avoid exposing HF_TOKEN in the Docker command line.

docker run in scripts/v037_droplet_run.sh uses -e "HF_TOKEN=${HF_TOKEN:-}", which interpolates the secret into the command’s process arguments. Use env-name passthrough instead.

🔧 Proposed fix

- -e "HF_TOKEN=${HF_TOKEN:-}" \ + --env HF_TOKEN \

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-e "HF_TOKEN=${HF_TOKEN:-}" \

--env HF_TOKEN \

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/v037_droplet_run.sh` at line 57, The docker run command currently interpolates the secret into the command line via -e "HF_TOKEN=${HF_TOKEN:-}", which exposes HF_TOKEN; change it to pass only the environment variable name (e.g., -e HF_TOKEN or --env HF_TOKEN) so Docker pulls the value from the process environment instead of embedding it, or alternatively reference an --env-file if you need to supply multiple vars; update the line containing -e "HF_TOKEN=${HF_TOKEN:-}" accordingly and ensure the runtime environment actually contains HF_TOKEN before invoking the script.

coderabbitai · 2026-05-26T22:23:08Z

+  echo "[v037] waiting for /v1/models (max 30 min)"
+  for i in $(seq 1 180); do
+    if curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
+      echo "[v037] vllm healthy after ${i} x 10s"
+      break
+    fi
+    sleep 10
+  done
+  if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
+    echo "[v037] vllm did not become healthy; tail of log:" >&2
+    tail -40 "${LOG_DIR}/vllm_llama33.log" >&2
+    exit 2
+  fi
+fi


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast if /v1/models is healthy but serving the wrong model.

Readiness checks only endpoint availability. If another service still owns port 8000, generators can run against the wrong model.

🔧 Proposed fix

if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then echo "[v037] vllm did not become healthy; tail of log:" >&2 tail -40 "${LOG_DIR}/vllm_llama33.log" >&2 exit 2 fi + + actual_model="$(curl -sf "http://localhost:${PORT}/v1/models" \ + | grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \ + | head -1 \ + | grep -oE '"[^"]*"$' \ + | tr -d '"')" + if [[ "${actual_model}" != "${MODEL}" ]]; then + echo "[v037] FATAL: endpoint is healthy but serving '${actual_model}' (expected '${MODEL}')" >&2 + exit 3 + fi fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

echo "[v037] waiting for /v1/models (max 30 min)"

for i in $(seq 1 180); do

if curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then

echo "[v037] vllm healthy after ${i} x 10s"

break

fi

sleep 10

done

if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then

echo "[v037] vllm did not become healthy; tail of log:" >&2

tail -40 "${LOG_DIR}/vllm_llama33.log" >&2

exit 2

fi

fi

echo "[v037] waiting for /v1/models (max 30 min)"

for i in $(seq 1 180); do

if curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then

echo "[v037] vllm healthy after ${i} x 10s"

break

fi

sleep 10

done

if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then

echo "[v037] vllm did not become healthy; tail of log:" >&2

tail -40 "${LOG_DIR}/vllm_llama33.log" >&2

exit 2

fi

actual_model="$(curl -sf "http://localhost:${PORT}/v1/models" \

| grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \

| head -1 \

| grep -oE '"[^"]*"$' \

| tr -d '"')"

if [[ "${actual_model}" != "${MODEL}" ]]; then

echo "[v037] FATAL: endpoint is healthy but serving '${actual_model}' (expected '${MODEL}')" >&2

exit 3

fi

fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/v037_droplet_run.sh` around lines 69 - 82, The readiness loop only checks that /v1/models is reachable, not that it serves the expected model; update the script around the PORT loop to, after curl succeeds, issue a second curl to http://localhost:${PORT}/v1/models and inspect the JSON/text for the expected model identifier (use a constant/variable like EXPECTED_MODEL or the literal model name you expect), and if the returned list does not contain that model then print the same failure message, tail "${LOG_DIR}/vllm_llama33.log" to stderr and exit non-zero; ensure this check runs before breaking out of the wait loop so the script fails fast if another service is answering on the port.

coderabbitai · 2026-05-26T22:23:08Z

+def main() -> int:
+    files = sorted(GEN.glob("*-v037-*.jsonl"))
+    print(f"{len(files)} v037 files")
+    total_in = total_ok = 0
+    for f in files:
+        n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0
+        expected = EXPECTED_CAT[f.name[:2]]
+        for line in f.read_text().splitlines():
+            line = line.strip()
+            if not line:
+                continue
+            n_in += 1
+            try:
+                e = json.loads(line)
+            except Exception:
+                n_miss += 1
+                continue
+            if not all(k in e for k in REQUIRED):
+                n_miss += 1
+                continue
+            if "original_task" not in (e.get("context") or {}):
+                n_miss += 1
+                continue
+            if e.get("expected") != "DENY":
+                n_exp += 1
+                continue
+            if e.get("severity") not in VALID_SEV:
+                n_sev += 1
+                continue
+            if e.get("category") != expected:
+                n_cat += 1
+                continue
+            n_ok += 1
+        flags = []
+        if n_miss:
+            flags.append(f"missing={n_miss}")
+        if n_exp:
+            flags.append(f"bad-expected={n_exp}")
+        if n_sev:
+            flags.append(f"bad-sev={n_sev}")
+        if n_cat:
+            flags.append(f"wrong-cat={n_cat}")
+        total_in += n_in
+        total_ok += n_ok
+        print(f"{f.name:40s} in={n_in:4d} ok={n_ok:4d} bad={n_in - n_ok:3d} {' '.join(flags)}")
+    print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}")
+    return 0


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validator exit code doesn't match "fail-loud" claim in docstring.

The docstring states "Fail-loud on any malformed entry," but the function always returns 0 (line 64), even when validation errors are detected. This means downstream steps won't automatically fail if bad entries are found.

Consider returning a non-zero exit code when validation failures are detected to match the fail-loud intent, or update the docstring to clarify that this is a report-only validator.

📋 Proposed fix to fail on validation errors

print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}") - return 0 + return 0 if total_in == total_ok else 1

Or update the docstring:

-"""Validate v037 generated jsonl: schema + required-field + per-cat consistency. - -Mirrors validate_v036.py with the v037 glob. Fail-loud on any malformed entry -so the build_split step gets a clean input. -""" +"""Validate v037 generated jsonl: schema + required-field + per-cat consistency. + +Mirrors validate_v036.py with the v037 glob. Reports validation statistics +for manual review before the build_split step. +"""

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def main() -> int:

files = sorted(GEN.glob("*-v037-*.jsonl"))

print(f"{len(files)} v037 files")

total_in = total_ok = 0

for f in files:

n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0

expected = EXPECTED_CAT[f.name[:2]]

for line in f.read_text().splitlines():

line = line.strip()

if not line:

continue

n_in += 1

try:

e = json.loads(line)

except Exception:

n_miss += 1

continue

if not all(k in e for k in REQUIRED):

n_miss += 1

continue

if "original_task" not in (e.get("context") or {}):

n_miss += 1

continue

if e.get("expected") != "DENY":

n_exp += 1

continue

if e.get("severity") not in VALID_SEV:

n_sev += 1

continue

if e.get("category") != expected:

n_cat += 1

continue

n_ok += 1

flags = []

if n_miss:

flags.append(f"missing={n_miss}")

if n_exp:

flags.append(f"bad-expected={n_exp}")

if n_sev:

flags.append(f"bad-sev={n_sev}")

if n_cat:

flags.append(f"wrong-cat={n_cat}")

total_in += n_in

total_ok += n_ok

print(f"{f.name:40s} in={n_in:4d} ok={n_ok:4d} bad={n_in - n_ok:3d} {' '.join(flags)}")

print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}")

return 0

def main() -> int:

files = sorted(GEN.glob("*-v037-*.jsonl"))

print(f"{len(files)} v037 files")

total_in = total_ok = 0

for f in files:

n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0

expected = EXPECTED_CAT[f.name[:2]]

for line in f.read_text().splitlines():

line = line.strip()

if not line:

continue

n_in += 1

try:

e = json.loads(line)

except Exception:

n_miss += 1

continue

if not all(k in e for k in REQUIRED):

n_miss += 1

continue

if "original_task" not in (e.get("context") or {}):

n_miss += 1

continue

if e.get("expected") != "DENY":

n_exp += 1

continue

if e.get("severity") not in VALID_SEV:

n_sev += 1

continue

if e.get("category") != expected:

n_cat += 1

continue

n_ok += 1

flags = []

if n_miss:

flags.append(f"missing={n_miss}")

if n_exp:

flags.append(f"bad-expected={n_exp}")

if n_sev:

flags.append(f"bad-sev={n_sev}")

if n_cat:

flags.append(f"wrong-cat={n_cat}")

total_in += n_in

total_ok += n_ok

print(f"{f.name:40s} in={n_in:4d} ok={n_ok:4d} bad={n_in - n_ok:3d} {' '.join(flags)}")

print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}")

return 0 if total_in == total_ok else 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/validate_v037.py` around lines 18 - 64, The main() function currently always returns 0 despite the module claiming it should "fail-loud on any malformed entry"; change the function to return a non-zero exit code when validation failures are detected (e.g., if total_ok != total_in or any of n_miss/n_exp/n_sev/n_cat > 0 across files) so callers see failure, or alternatively update the module docstring to remove the "fail-loud" claim; update the return at the end of main() (and any caller expectations) to reflect the chosen behavior.

coderabbitai · 2026-05-26T22:23:08Z

+    total_in = total_ok = 0
+    for f in files:
+        n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0
+        expected = EXPECTED_CAT[f.name[:2]]


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add defensive check for unexpected filename prefix.

If a file in the directory doesn't start with a recognized prefix (TM/PE/DE), line 24 will raise KeyError when looking up f.name[:2] in EXPECTED_CAT. This would crash the validator.

🛡️ Proposed fix to handle unknown prefixes

+ prefix = f.name[:2] + if prefix not in EXPECTED_CAT: + print(f"SKIP {f.name}: unknown prefix {prefix}") + continue - expected = EXPECTED_CAT[f.name[:2]] + expected = EXPECTED_CAT[prefix]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/validate_v037.py` at line 24, Guard the lookup into EXPECTED_CAT by first extracting the filename prefix (prefix = f.name[:2]) and verifying it exists in EXPECTED_CAT; if the prefix is missing, raise a clear ValueError (or log and skip) that includes the offending filename so the validator fails with a helpful message instead of a KeyError. Replace the direct lookup at the assignment to expected (where EXPECTED_CAT[f.name[:2]] is used) with this check and then assign expected = EXPECTED_CAT[prefix] only after the prefix is validated. Ensure the error message names the file (f.name) and the invalid prefix to aid debugging.

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

vaaraio merged commit 8e284e0 into main May 26, 2026
12 checks passed

vaaraio deleted the release/v0.37.0 branch May 26, 2026 22:23

coderabbitai Bot mentioned this pull request May 27, 2026

release(v0.38.0): PAIR scale-up to n=300 per attacker family, 88.4% recall #147

Merged

5 tasks

		cat_prefix = f.name[:2]
		expected_cat = EXPECTED_CAT[cat_prefix]

Conversation

vaaraio commented May 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Headline numbers

What ships

What changed

Test plan

Methodology notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vaaraio commented May 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading