Skip to content

release(v0.37.0): third attacker family + v8 classifier closes worst v0.36 sub-cell#144

Merged
vaaraio merged 1 commit into
mainfrom
release/v0.37.0
May 26, 2026
Merged

release(v0.37.0): third attacker family + v8 classifier closes worst v0.36 sub-cell#144
vaaraio merged 1 commit into
mainfrom
release/v0.37.0

Conversation

@vaaraio
Copy link
Copy Markdown
Owner

@vaaraio vaaraio commented May 26, 2026

Summary

Third attacker family added to the cross-model held-out fold and v8 classifier ships as the new production default. Closes the worst v0.36 sub-cell by 12.9 pp on the same 700 Claude data_exfil entries (26.0% under v7 to 38.9% under v8), covers a third attacker family at 85.8% overall recall, and holds in-distribution at 86.6% on v035 TEST with FPR inside the prior confidence interval.

Headline numbers

eval v8 v7 (prior production) Δ
v035 TEST recall at T=0.9006 86.6% [84.5, 88.6] 85.3% [83.1, 87.3] +1.3 pp
v035 TEST FPR 5.0% [3.6, 6.8] 4.6% [3.3, 6.3] +0.4 pp (within CI)
v037 holdout overall (n=2,277) 66.8% [64.9, 68.7] n/a new
Llama-3.3 leg (third family, n=887) 85.8% [83.3, 87.9] n/a new
data_exfil x Claude (carry-forward, n=700) 38.9% [35.3, 42.5] 26.0% [22.9, 29.4] +12.9 pp
TRAIN entries 11,287 8,501 +2,786

What ships

  1. v8 classifier. src/vaara/data/adversarial_classifier_v8.joblib, 638 features (254 hand plus 384 MiniLM embedding), trained on v035 TRAIN plus v036 Mixtral and Claude TM/PE folded in. v7 retained on disk for cross-eval reproducibility. Threshold unchanged at 0.9006.
  2. v037 cross-model holdout. 900 entries from RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic on AMD-backed MI300X via rocm/vllm:latest. 13 schema-invalid TM dropped, 887 valid (TM 287, PE 300, DE 300). Combined with v036 DE both legs for the new 2,277-entry v037 holdout fold.
  3. v0.37 split manifest. tests/adversarial/v037_split.json inherits v035_split verbatim, folds v036 TM/PE into train, marks v036 DE plus the full v037 leg as holdout.
  4. SEP-2787 reference implementation. Already on main via PR feat(attestation): add SEP-2787 reference implementation (proposed shape) #139, bundled into the v0.37 release notes for one clean validator-stack line.
  5. MIT AI Risk Repository v4 position document. docs/mit_ai_risk_repository_mapping.md plus the COMPLIANCE.md position paragraph.

What changed

New files: bench/vaara-bench-v0.37.md, bench/v037_eval_v8_holdout.json, tests/adversarial/v037_split.json, src/vaara/data/adversarial_classifier_v8.joblib, three v037 generated jsonl files, seven v037 scripts (_v037_common.py, generate_targeted_v037.py, v037_droplet_run.sh, v037_local_watcher.sh, validate_v037.py, build_v037_split.py, eval_v037_holdout.py), docs/mit_ai_risk_repository_mapping.md.

Updated: tests/adversarial/MANIFEST.sha256 (v037 entries appended, 302 to 305), version bumps across pyproject.toml + clients/ts/package.json + src/vaara/__init__.py (0.36.0 to 0.37.0), CHANGELOG, README bench pointer, COMPLIANCE MIT Risk Repository position section.

Test plan

  • Ruff clean on all new and modified Python files
  • Schema validation: 887 of 900 v037 entries pass (13 TM dropped on bad expected)
  • v8 on v035 TEST: 86.6% recall at 5.0% FPR, no regression vs v7 85.3% at 4.6%
  • v8 on v037 holdout: 66.8% overall, per-category-per-leg cuts published in bench/v037_eval_v8_holdout.json
  • MANIFEST.sha256 verifies all 305 entries (cd tests/adversarial && sha256sum -c MANIFEST.sha256)
  • Droplet poweroff issued post-rsync, no idle compute cost

Methodology notes

  • RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic ships with compressed-tensors quantization in its config. The explicit --quantization fp8 argument conflicts and must be dropped. vLLM auto-detects the scheme from the model config and serves correctly.
  • The data_exfil category remains the hardest cross-model surface across all three attacker families now in scope. PE and TM generalise to Llama-3.3 cleanly. DE generalises unevenly. That is the v0.38 public-benchmark and PAIR-at-scale line of work.
  • Total Llama-3.3 generation: 22 minutes wall clock at steady-state 40 entries/min combined across three parallel category generators on one MI300X SR-IOV droplet.

Summary by CodeRabbit

  • New Features

    • Released v0.37 with upgraded production classifier delivering improved detection capabilities.
    • Expanded evaluation coverage across multiple attack scenarios and model families.
  • Documentation

    • Added MIT AI Risk Repository v4 mapping documentation.
    • Updated benchmark methodology and detailed v0.37 evaluation results.

Review Change Stack

…v0.36 sub-cell

Adds a third attacker family to the cross-model held-out fold and
ships a v8 classifier that trains on the v035 + v036 TM/PE union and
keeps v036 DE both legs plus the full v037 Llama-3.3 leg as the new
held-out evaluation. Closes the worst sub-cell from v0.36 by 12.9 pp
on the same 700 Claude DE entries (26.0% under v7 to 38.9% under v8),
covers a third attacker family at 85.8% overall recall, and holds
in-distribution at 86.6% on v035 TEST with FPR inside the prior
confidence interval.

Generation provenance: 900 entries from
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic on AMD-backed MI300X
SR-IOV under rocm/vllm:latest, three parallel category generators, 22
minutes wall clock, droplet poweroff post-rsync. 13 schema-invalid TM
entries dropped, 887 valid (TM 287, PE 300, DE 300).

v8 production bundle ships at
src/vaara/data/adversarial_classifier_v8.joblib. v7 retained on disk
for cross-eval reproducibility. Threshold unchanged at 0.9006.

Also bundles the MIT AI Risk Repository v4 position document and the
SEP-2787 Tool Call Attestation reference implementation that landed
on main via PR #139. Both fit the v0.37 ship window and add one clean
line each to the validator stack and the public taxonomy positioning.

Full methodology, chain of custody, ship gate, and named limits in
bench/vaara-bench-v0.37.md.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

📝 Walkthrough

Walkthrough

v0.37 release updates project versions across versioning files, documents benchmark methodology and cross-model evaluation results, implements adversarial data generation infrastructure with Llama-3.3 targeting, adds remote droplet orchestration for large-scale generation, and updates data integrity manifest.

Changes

v0.37 Release Delivery

Layer / File(s) Summary
Release versioning and documentation
CHANGELOG.md, COMPLIANCE.md, README.md, clients/ts/package.json, pyproject.toml, src/vaara/__init__.py
Version bumps from 0.36.0 to 0.37.0 across all versioning files; release notes in CHANGELOG.md detail v0.37 theme including Llama-3.3 generation, split/validation/eval artifacts, and v8 classifier bundle; COMPLIANCE.md adds MIT AI Risk Repository v4 coverage section; README.md updates bench reference and adds MIT mapping doc table entry.
Benchmark methodology, results, and coverage mapping
bench/vaara-bench-v0.37.md, bench/v037_eval_v8_holdout.json, docs/mit_ai_risk_repository_mapping.md
vaara-bench-v0.37.md documents two methodology changes (third held-out attacker family, retraining v8 on union corpus), production headline metrics, cross-model held-out recall breakdowns by category/leg, generation provenance, reproduction recipe, and operational limits; v037_eval_v8_holdout.json records detailed evaluation results including overall and per-group recalls with Wilson confidence intervals; new mapping doc describes Vaara's coverage of MIT AI Risk Repository v4 domains and sub-domains.
V037 adversarial generation shared infrastructure
scripts/_v037_common.py
Shared configuration (category instructions, schema template, prompts) and utilities: load_seeds (sample from JSON file), build_user_message (schema substitution with seed embedding), parse_jsonl (category filtering), fingerprint (SHA-1 dedup over tool_name/parameters/context), load_prior_fingerprints (dedupe set from existing JSONL).
V037 adversarial generation and validation
scripts/generate_targeted_v037.py, scripts/validate_v037.py
generate_targeted_v037.py calls Llama-3.3 vLLM endpoint with category-specific prompts, fingerprints and deduplicates output JSONL, and appends unique entries with batch error handling. validate_v037.py scans generated JSONL files, enforces schema (required fields, context.original_task presence, expected == DENY, severity allowlist, category match from filename), and reports per-file and total validation statistics.
V037 data composition and evaluation
scripts/build_v037_split.py, scripts/eval_v037_holdout.py
build_v037_split.py composes v037_split.json by inheriting v035 assignments, folding v036 TM/PE into TRAIN, keeping v036 DE holdout, and adding v037 Llama entries; deduplicates by fingerprint and drops invalid records. eval_v037_holdout.py loads classifier bundle, builds feature matrices from holdout split, computes recall with Wilson confidence intervals, and aggregates per-category, per-leg, and per-category×leg breakdowns.
Remote generation orchestration
scripts/v037_droplet_run.sh, scripts/v037_local_watcher.sh
v037_droplet_run.sh launches ROCm vLLM container, waits for model endpoint, and starts three parallel generator processes (TM/PE/DE) under nohup with idempotent re-run safety. v037_local_watcher.sh polls remote generator progress via rsync, tracks per-category line counts, verifies vLLM model ID, logs state transitions, and upon target completion, optionally auto-shuts down droplet via doctl or exits with manual shutdown instructions.
Data integrity tracking
tests/adversarial/MANIFEST.sha256
Three new SHA256 manifest entries for generated v037 Llama-3.3 datasets: DE-v037-llama33.jsonl, PE-v037-llama33.jsonl, TM-v037-llama33.jsonl.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • vaaraio/vaara#48: Updates README.md benchmark methodology section; v0.37 further evolves the benchmark scope and reported recall metrics.

Poem

🐇 A v-zero-three-seven hop, so fine,
With Llama's third family in line,
Split, validate, eval with care,
Watch the droplets run with flair!
Data blessed with checksums bright,
Release complete—a fuzzy delight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the primary change: a v0.37 release with a third attacker family (Llama-3.3) and an improved v8 classifier that resolves the worst-performing sub-cell from v0.36.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch release/v0.37.0

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

23-33: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Numbers section mixes stale headline metrics with the new “current” v0.37 benchmark.

Lines 23-27 still present older benchmark headlines while Line 32 declares v0.37 as current methodology. Please either update the top-line metrics to v0.37 or explicitly label the older figures as historical to avoid reader confusion.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` around lines 23 - 33, The README's top-line metrics (the initial
headline paragraph and the first few bullets that report recall, FPR, p99
latency, etc.) are inconsistent with the declared current methodology
vaara-bench-v0.37; either replace those headline numbers with the v0.37 values
or explicitly mark the existing headline metrics as "historical (pre-v0.37)" and
add updated v0.37 figures next to or below them; update the classifier
recall/FPR, multi-attacker PAIR, and latency bullets (the items referencing
recall 85.0%, FPR 4.6%, 140 µs p99, and the multi-attacker 0/25 result) so they
match the vaara-bench-v0.37 subsection (or are clearly labeled as older), and
ensure the vaara-bench-v0.37 reference remains accurate.
🧹 Nitpick comments (3)
scripts/build_v037_split.py (1)

31-41: 💤 Low value

Consider SHA-256 for consistency with project standards.

The MANIFEST.sha256 file (mentioned in PR description) uses SHA-256, while this fingerprint function uses SHA-1. While SHA-1 is acceptable for non-cryptographic deduplication, using SHA-256 would provide consistency across the codebase.

♻️ Proposed change to use SHA-256
 def fingerprint(entry: dict) -> str:
     canonical = json.dumps(
         {
             "t": entry.get("tool_name", ""),
             "p": entry.get("parameters", {}),
             "c": entry.get("context", {}),
         },
         sort_keys=True,
         default=str,
     )
-    return hashlib.sha1(canonical.encode()).hexdigest()
+    return hashlib.sha256(canonical.encode()).hexdigest()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/build_v037_split.py` around lines 31 - 41, The fingerprint function
currently uses SHA-1; change it to SHA-256 for consistency with the project's
MANIFEST.sha256 by replacing the hashlib.sha1(...) call with hashlib.sha256(...)
when hashing the canonical JSON in fingerprint(entry: dict) and keep using
.hexdigest() to return the hex string; ensure canonicalization via
json.dumps(..., sort_keys=True, default=str) remains unchanged so only the hash
algorithm is substituted.
scripts/eval_v037_holdout.py (2)

86-90: 💤 Low value

Consider explicit handling when no positive samples exist.

If tp + fn == 0 (no positive samples), the code returns recall = 0.0 due to max(tp + fn, 1). While this edge case is unlikely in production, it would be clearer to handle it explicitly or add an assertion that tp + fn > 0.

♻️ Proposed clarification
     pos = int((y == 1).sum())
     tp = int(((pred == 1) & (y == 1)).sum())
     fn = int(((pred == 0) & (y == 1)).sum())
+    if tp + fn == 0:
+        print("[error] no positive samples in holdout")
+        return 2
     recall = tp / max(tp + fn, 1)

Or simply add an assertion:

+    assert tp + fn > 0, "holdout must contain positive samples"
     recall = tp / max(tp + fn, 1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/eval_v037_holdout.py` around lines 86 - 90, Handle the
no-positive-samples edge explicitly: compute total_pos = tp + fn and if
total_pos == 0, set recall = 0.0 and set rlo, rhi to safe defaults (e.g., 0.0,
0.0 or None) instead of calling wilson_ci with zero trials; otherwise compute
recall = tp / total_pos and call wilson_ci(tp, total_pos). Update the block
using the variables pos, tp, fn, recall and the wilson_ci call to implement this
conditional or replace it with an assertion that total_pos > 0 if you prefer
failing fast.

16-23: 💤 Low value

sys.path manipulation is fragile.

The script manipulates sys.path to import from the scripts directory. This works when run from the repository root but may fail if executed from a different working directory. Consider making this a proper package or using relative imports if the scripts directory becomes a module.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/eval_v037_holdout.py` around lines 16 - 23, Currently the script
mutates sys.path (REPO and sys.path.insert) to import
train_adversarial_classifier which is fragile; replace that with a dynamic,
path-based import using importlib (spec_from_file_location / module_from_spec)
to load the file at REPO/"scripts"/"train_adversarial_classifier.py" and then
bind load_corpus_keyed, build_labels, build_features from that module, or
alternatively convert the scripts folder to a proper package and use a normal
package import (e.g., from scripts.train_adversarial_classifier import
load_corpus_keyed, build_labels, build_features) and remove the sys.path
manipulation and REPO logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bench/vaara-bench-v0.37.md`:
- Line 137: The fenced code block using triple backticks in the markdown
contains no language tag; update that opening fence (```) to include an
appropriate language hint (for example, ```bash) so markdownlint passes and the
snippet is syntax-highlighted and more readable.
- Around line 137-149: The reproduction steps change into tests/adversarial then
run scripts and reference src paths as if at repo root, which breaks copy-paste
execution; update the commands so paths are consistent from the repository root
(e.g. check MANIFEST.sha256 as tests/adversarial/MANIFEST.sha256 and keep
subsequent .venv/bin/python calls and --split-manifest
tests/adversarial/v037_split.json and --bundle-out
src/vaara/data/adversarial_classifier_v8.joblib referenced from the repo root)
so users can run the listed commands without changing directory first.

In `@docs/mit_ai_risk_repository_mapping.md`:
- Around line 86-90: The percentage for the "4. Malicious Actors & Misuse" row
is incorrect (267/268 ≈ 99.6% not 100%); update that table row's percentage cell
from "100%" to a correctly rounded value (e.g., "99.6%" or "99.63%") or add an
overall rounding note in the document explaining percentages are rounded to one
decimal place so counts may not exactly match; make the change where the table
row for "4. Malicious Actors & Misuse" and the surrounding summary/Total row are
defined.

In `@scripts/build_v037_split.py`:
- Around line 90-94: The JSON parsing try/except around json.loads(line) should
catch json.JSONDecodeError instead of a bare Exception: change the except block
to except json.JSONDecodeError: increment schema_dropped and continue; for any
other unexpected errors re-raise or let them propagate (or log and raise) so
they aren't silently swallowed—update the handling around json.loads(line), the
variable e, and the schema_dropped increment accordingly.
- Around line 73-74: Validate the two-character prefix before using it as a key:
check that cat_prefix (derived from f.name[:2]) exists in EXPECTED_CAT and
handle the unexpected case (e.g., log a clear warning and skip the file or raise
a descriptive exception) instead of doing EXPECTED_CAT[cat_prefix] directly;
update the code around cat_prefix and EXPECTED_CAT to perform the membership
test and a safe lookup to avoid KeyError.

In `@scripts/eval_v037_holdout.py`:
- Around line 99-107: The loop aggregating true positives uses e.get("expected")
instead of the precomputed label array y; change the per-bucket true-positive
check in the for (k, e), pr in zip(holdout, pred): block to reference the
corresponding label from y (use the same zip order or index into y) so that TP
counting aligns with build_labels and the recall calculation; update references
around per_cat, per_leg, per_cat_leg where TP is incremented (currently in the
if pr == 1 and e.get("expected") == "DENY" clause) to use y (e.g., y_entry
indicates DENY) instead of e.get("expected").

In `@scripts/generate_targeted_v037.py`:
- Around line 35-52: The call_chat function currently assumes the API returns a
JSON with choices[0].message.content and will raise KeyError/IndexError on
malformed responses; update call_chat to defensively validate the HTTP response
and parsed JSON: check HTTP status / ensure resp.read() is valid JSON, confirm
top-level "choices" is present and a non-empty list, that choices[0] contains a
"message" dict with a "content" key (and that content is a string), and if any
of those checks fail raise a clear exception (or return a descriptive error
string) so callers (e.g., main()) receive an informative error instead of a
cryptic KeyError/IndexError. Ensure these checks reference the call_chat
function and the "choices" → "message" → "content" path.

In `@scripts/v037_droplet_run.sh`:
- Around line 69-82: The readiness loop only checks that /v1/models is
reachable, not that it serves the expected model; update the script around the
PORT loop to, after curl succeeds, issue a second curl to
http://localhost:${PORT}/v1/models and inspect the JSON/text for the expected
model identifier (use a constant/variable like EXPECTED_MODEL or the literal
model name you expect), and if the returned list does not contain that model
then print the same failure message, tail "${LOG_DIR}/vllm_llama33.log" to
stderr and exit non-zero; ensure this check runs before breaking out of the wait
loop so the script fails fast if another service is answering on the port.
- Line 57: The docker run command currently interpolates the secret into the
command line via -e "HF_TOKEN=${HF_TOKEN:-}", which exposes HF_TOKEN; change it
to pass only the environment variable name (e.g., -e HF_TOKEN or --env HF_TOKEN)
so Docker pulls the value from the process environment instead of embedding it,
or alternatively reference an --env-file if you need to supply multiple vars;
update the line containing -e "HF_TOKEN=${HF_TOKEN:-}" accordingly and ensure
the runtime environment actually contains HF_TOKEN before invoking the script.
- Around line 51-67: The docker run invocation is redirecting the CLI output
(container ID) to vllm_llama33.log instead of the container's runtime logs;
after creating the container with the existing docker run -d ... --name
vllm-llama33 call, remove the >"${LOG_DIR}/vllm_llama33.log" redirection and
instead launch docker logs -f vllm-llama33 >"${LOG_DIR}/vllm_llama33.log" 2>&1 &
(or equivalent backgrounded tail) so the file receives the container
stdout/stderr; apply the same change to the similar docker run / docker logs
pair around the other block (the one referenced near lines ~77-79) to ensure the
watcher/health-check tails real container logs.

In `@scripts/validate_v037.py`:
- Line 24: Guard the lookup into EXPECTED_CAT by first extracting the filename
prefix (prefix = f.name[:2]) and verifying it exists in EXPECTED_CAT; if the
prefix is missing, raise a clear ValueError (or log and skip) that includes the
offending filename so the validator fails with a helpful message instead of a
KeyError. Replace the direct lookup at the assignment to expected (where
EXPECTED_CAT[f.name[:2]] is used) with this check and then assign expected =
EXPECTED_CAT[prefix] only after the prefix is validated. Ensure the error
message names the file (f.name) and the invalid prefix to aid debugging.
- Around line 18-64: The main() function currently always returns 0 despite the
module claiming it should "fail-loud on any malformed entry"; change the
function to return a non-zero exit code when validation failures are detected
(e.g., if total_ok != total_in or any of n_miss/n_exp/n_sev/n_cat > 0 across
files) so callers see failure, or alternatively update the module docstring to
remove the "fail-loud" claim; update the return at the end of main() (and any
caller expectations) to reflect the chosen behavior.

---

Outside diff comments:
In `@README.md`:
- Around line 23-33: The README's top-line metrics (the initial headline
paragraph and the first few bullets that report recall, FPR, p99 latency, etc.)
are inconsistent with the declared current methodology vaara-bench-v0.37; either
replace those headline numbers with the v0.37 values or explicitly mark the
existing headline metrics as "historical (pre-v0.37)" and add updated v0.37
figures next to or below them; update the classifier recall/FPR, multi-attacker
PAIR, and latency bullets (the items referencing recall 85.0%, FPR 4.6%, 140 µs
p99, and the multi-attacker 0/25 result) so they match the vaara-bench-v0.37
subsection (or are clearly labeled as older), and ensure the vaara-bench-v0.37
reference remains accurate.

---

Nitpick comments:
In `@scripts/build_v037_split.py`:
- Around line 31-41: The fingerprint function currently uses SHA-1; change it to
SHA-256 for consistency with the project's MANIFEST.sha256 by replacing the
hashlib.sha1(...) call with hashlib.sha256(...) when hashing the canonical JSON
in fingerprint(entry: dict) and keep using .hexdigest() to return the hex
string; ensure canonicalization via json.dumps(..., sort_keys=True, default=str)
remains unchanged so only the hash algorithm is substituted.

In `@scripts/eval_v037_holdout.py`:
- Around line 86-90: Handle the no-positive-samples edge explicitly: compute
total_pos = tp + fn and if total_pos == 0, set recall = 0.0 and set rlo, rhi to
safe defaults (e.g., 0.0, 0.0 or None) instead of calling wilson_ci with zero
trials; otherwise compute recall = tp / total_pos and call wilson_ci(tp,
total_pos). Update the block using the variables pos, tp, fn, recall and the
wilson_ci call to implement this conditional or replace it with an assertion
that total_pos > 0 if you prefer failing fast.
- Around line 16-23: Currently the script mutates sys.path (REPO and
sys.path.insert) to import train_adversarial_classifier which is fragile;
replace that with a dynamic, path-based import using importlib
(spec_from_file_location / module_from_spec) to load the file at
REPO/"scripts"/"train_adversarial_classifier.py" and then bind
load_corpus_keyed, build_labels, build_features from that module, or
alternatively convert the scripts folder to a proper package and use a normal
package import (e.g., from scripts.train_adversarial_classifier import
load_corpus_keyed, build_labels, build_features) and remove the sys.path
manipulation and REPO logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2cdf0f5a-e520-4bf6-be3a-e232db2be1f0

📥 Commits

Reviewing files that changed from the base of the PR and between 660884a and f3d2ec4.

⛔ Files ignored due to path filters (3)
  • tests/adversarial/generated/DE-v037-llama33.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/PE-v037-llama33.jsonl is excluded by !**/generated/**
  • tests/adversarial/generated/TM-v037-llama33.jsonl is excluded by !**/generated/**
📒 Files selected for processing (19)
  • CHANGELOG.md
  • COMPLIANCE.md
  • README.md
  • bench/v037_eval_v8_holdout.json
  • bench/vaara-bench-v0.37.md
  • clients/ts/package.json
  • docs/mit_ai_risk_repository_mapping.md
  • pyproject.toml
  • scripts/_v037_common.py
  • scripts/build_v037_split.py
  • scripts/eval_v037_holdout.py
  • scripts/generate_targeted_v037.py
  • scripts/v037_droplet_run.sh
  • scripts/v037_local_watcher.sh
  • scripts/validate_v037.py
  • src/vaara/__init__.py
  • src/vaara/data/adversarial_classifier_v8.joblib
  • tests/adversarial/MANIFEST.sha256
  • tests/adversarial/v037_split.json


## Reproduction recipe

```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced code block.

At Line 137, use a language hint (e.g., bash) to satisfy markdownlint and improve readability.

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 137-137: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@bench/vaara-bench-v0.37.md` at line 137, The fenced code block using triple
backticks in the markdown contains no language tag; update that opening fence
(```) to include an appropriate language hint (for example, ```bash) so
markdownlint passes and the snippet is syntax-highlighted and more readable.

Comment on lines +137 to +149
```
cd tests/adversarial && sha256sum -c MANIFEST.sha256
.venv/bin/python scripts/validate_v037.py
.venv/bin/python scripts/build_v037_split.py
.venv/bin/python scripts/save_classifier_bundle.py \
--version v0.37 --threshold 0.9006 --embeddings \
--split-manifest tests/adversarial/v037_split.json \
--train-fold train \
--bundle-out src/vaara/data/adversarial_classifier_v8.joblib
.venv/bin/python scripts/eval_v037_holdout.py \
--bundle src/vaara/data/adversarial_classifier_v8.joblib \
--json-out bench/v037_eval_v8_holdout.json
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix reproduction command paths to be internally consistent.

At Line 138, the recipe changes directory to tests/adversarial, but subsequent commands at Lines 139-148 call scripts/... and src/... as if run from repo root. This likely makes the copy-paste reproduction flow fail.

Proposed doc fix
-``` 
-cd tests/adversarial && sha256sum -c MANIFEST.sha256
-.venv/bin/python scripts/validate_v037.py
-.venv/bin/python scripts/build_v037_split.py
+```bash
+sha256sum -c tests/adversarial/MANIFEST.sha256
+.venv/bin/python scripts/validate_v037.py
+.venv/bin/python scripts/build_v037_split.py
 .venv/bin/python scripts/save_classifier_bundle.py \
     --version v0.37 --threshold 0.9006 --embeddings \
     --split-manifest tests/adversarial/v037_split.json \
     --train-fold train \
     --bundle-out src/vaara/data/adversarial_classifier_v8.joblib
 .venv/bin/python scripts/eval_v037_holdout.py \
     --bundle src/vaara/data/adversarial_classifier_v8.joblib \
     --json-out bench/v037_eval_v8_holdout.json
</details>

<details>
<summary>🧰 Tools</summary>

<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>

[warning] 137-137: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

</details>

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @bench/vaara-bench-v0.37.md around lines 137 - 149, The reproduction steps
change into tests/adversarial then run scripts and reference src paths as if at
repo root, which breaks copy-paste execution; update the commands so paths are
consistent from the repository root (e.g. check MANIFEST.sha256 as
tests/adversarial/MANIFEST.sha256 and keep subsequent .venv/bin/python calls and
--split-manifest tests/adversarial/v037_split.json and --bundle-out
src/vaara/data/adversarial_classifier_v8.joblib referenced from the repo root)
so users can run the listed commands without changing directory first.


</details>

<!-- fingerprinting:phantom:triton:hawk -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines +86 to +90
| 4. Malicious Actors & Misuse | 268 | 267 | 100% |
| 5. Human-Computer Interaction | 106 | 106 | 100% |
| 6. Socioeconomic and Environmental | 299 | 61 | 20% |
| 7. AI System Safety, Failures & Limitations | 421 | 114 | 27% |
| **Total** | **1,597** | **740** | **46%** |
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Correct the Domain 4 percentage value.

At Line 86, 267 / 268 is not exactly 100% (it is ~99.6%). Please update the percentage or add an explicit rounding note to avoid inconsistency with the counts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/mit_ai_risk_repository_mapping.md` around lines 86 - 90, The percentage
for the "4. Malicious Actors & Misuse" row is incorrect (267/268 ≈ 99.6% not
100%); update that table row's percentage cell from "100%" to a correctly
rounded value (e.g., "99.6%" or "99.63%") or add an overall rounding note in the
document explaining percentages are rounded to one decimal place so counts may
not exactly match; make the change where the table row for "4. Malicious Actors
& Misuse" and the surrounding summary/Total row are defined.

Comment on lines +73 to +74
cat_prefix = f.name[:2]
expected_cat = EXPECTED_CAT[cat_prefix]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Add validation for category prefix before dictionary lookup.

The code assumes the first two characters of the filename are a valid category prefix (TM/PE/DE), but there's no validation before the dictionary lookup. If an unexpected file matches the glob pattern, this will raise a KeyError.

🛡️ Proposed fix to add validation
             relpath = str(f.relative_to(ADV))
             cat_prefix = f.name[:2]
+            if cat_prefix not in EXPECTED_CAT:
+                print(f"warning: skipping {f.name} (unexpected prefix '{cat_prefix}')")
+                continue
             expected_cat = EXPECTED_CAT[cat_prefix]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cat_prefix = f.name[:2]
expected_cat = EXPECTED_CAT[cat_prefix]
cat_prefix = f.name[:2]
if cat_prefix not in EXPECTED_CAT:
print(f"warning: skipping {f.name} (unexpected prefix '{cat_prefix}')")
continue
expected_cat = EXPECTED_CAT[cat_prefix]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/build_v037_split.py` around lines 73 - 74, Validate the two-character
prefix before using it as a key: check that cat_prefix (derived from f.name[:2])
exists in EXPECTED_CAT and handle the unexpected case (e.g., log a clear warning
and skip the file or raise a descriptive exception) instead of doing
EXPECTED_CAT[cat_prefix] directly; update the code around cat_prefix and
EXPECTED_CAT to perform the membership test and a safe lookup to avoid KeyError.

Comment on lines +90 to +94
try:
e = json.loads(line)
except Exception:
schema_dropped += 1
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Catch specific exception type instead of broad Exception.

The bare except Exception: will catch all exceptions, potentially hiding unexpected errors. JSON parsing should specifically catch json.JSONDecodeError.

🐛 Proposed fix to catch specific exception
                 try:
                     e = json.loads(line)
-                except Exception:
+                except json.JSONDecodeError:
                     schema_dropped += 1
                     continue
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/build_v037_split.py` around lines 90 - 94, The JSON parsing
try/except around json.loads(line) should catch json.JSONDecodeError instead of
a bare Exception: change the except block to except json.JSONDecodeError:
increment schema_dropped and continue; for any other unexpected errors re-raise
or let them propagate (or log and raise) so they aren't silently
swallowed—update the handling around json.loads(line), the variable e, and the
schema_dropped increment accordingly.

Comment on lines +51 to +67
docker run -d --rm \
--device=/dev/kfd --device=/dev/dri \
--group-add video --group-add render \
--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size 32G --ipc=host --network host \
-e "HF_TOKEN=${HF_TOKEN:-}" \
-v "${HF_CACHE}":/root/.cache/huggingface \
--name vllm-llama33 \
rocm/vllm:latest \
vllm serve "${MODEL}" \
--host 0.0.0.0 --port "${PORT}" \
--max-model-len 8192 \
--enforce-eager \
--gpu-memory-utilization 0.92 \
--quantization fp8 \
>"${LOG_DIR}/vllm_llama33.log" 2>&1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

When using docker run -d ... > logfile, does the redirection capture container runtime logs, or only Docker CLI stdout such as the container ID?

💡 Result:

docker run -d ... > logfile captures only the Docker CLI’s own standard output (typically the container ID printed because -d/--detach “print[s] the new container ID”)—it does not capture the container’s runtime logs. Reason (grounded in docs): in detached mode, docker run prints the new container ID to the client’s STDOUT [1], so > logfile redirects that STDOUT to your file. Container logs are instead retrieved via docker logs, which reads the container’s stored STDOUT/STDERR streams as logged by Docker’s logging mechanism [2]. If you want container runtime logs in a file, you’d redirect the output of docker logs (or use docker logs -f ... > logfile) rather than redirecting docker run itself.

Citations:


Fix vLLM container log capture (redirect docker logs, not docker run -d)

In scripts/v037_droplet_run.sh, redirecting &> "${LOG_DIR}/vllm_llama33.log" on docker run -d only captures Docker CLI stdout (typically the container ID), not the container’s runtime stdout/stderr. The health-check/watcher tailing vllm_llama33.log therefore won’t receive vLLM logs (same issue in the similar block around 77-79). Use docker logs -f vllm-llama33 redirected into the log file instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/v037_droplet_run.sh` around lines 51 - 67, The docker run invocation
is redirecting the CLI output (container ID) to vllm_llama33.log instead of the
container's runtime logs; after creating the container with the existing docker
run -d ... --name vllm-llama33 call, remove the >"${LOG_DIR}/vllm_llama33.log"
redirection and instead launch docker logs -f vllm-llama33
>"${LOG_DIR}/vllm_llama33.log" 2>&1 & (or equivalent backgrounded tail) so the
file receives the container stdout/stderr; apply the same change to the similar
docker run / docker logs pair around the other block (the one referenced near
lines ~77-79) to ensure the watcher/health-check tails real container logs.

--security-opt seccomp=unconfined \
--cap-add=SYS_PTRACE \
--shm-size 32G --ipc=host --network host \
-e "HF_TOKEN=${HF_TOKEN:-}" \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify we don't inline-expand HF_TOKEN in docker args.
rg -n --type=sh 'HF_TOKEN=\$\{HF_TOKEN:-\}|--env HF_TOKEN' scripts/v037_droplet_run.sh

Repository: vaaraio/vaara

Length of output: 94


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Show surrounding Docker command around line 57
sed -n '40,80p' scripts/v037_droplet_run.sh

# 2) Search repo for any other HF_TOKEN inlined-expansion or env passthrough
rg -n --hidden --no-ignore -S --type=sh 'HF_TOKEN=\$\{HF_TOKEN:-\}|-e\s+"HF_TOKEN=|--env\s+HF_TOKEN\b' .

Repository: vaaraio/vaara

Length of output: 1595


Avoid exposing HF_TOKEN in the Docker command line.

docker run in scripts/v037_droplet_run.sh uses -e "HF_TOKEN=${HF_TOKEN:-}", which interpolates the secret into the command’s process arguments. Use env-name passthrough instead.

🔧 Proposed fix
-    -e "HF_TOKEN=${HF_TOKEN:-}" \
+    --env HF_TOKEN \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-e "HF_TOKEN=${HF_TOKEN:-}" \
--env HF_TOKEN \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/v037_droplet_run.sh` at line 57, The docker run command currently
interpolates the secret into the command line via -e "HF_TOKEN=${HF_TOKEN:-}",
which exposes HF_TOKEN; change it to pass only the environment variable name
(e.g., -e HF_TOKEN or --env HF_TOKEN) so Docker pulls the value from the process
environment instead of embedding it, or alternatively reference an --env-file if
you need to supply multiple vars; update the line containing -e
"HF_TOKEN=${HF_TOKEN:-}" accordingly and ensure the runtime environment actually
contains HF_TOKEN before invoking the script.

Comment on lines +69 to +82
echo "[v037] waiting for /v1/models (max 30 min)"
for i in $(seq 1 180); do
if curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
echo "[v037] vllm healthy after ${i} x 10s"
break
fi
sleep 10
done
if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
echo "[v037] vllm did not become healthy; tail of log:" >&2
tail -40 "${LOG_DIR}/vllm_llama33.log" >&2
exit 2
fi
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast if /v1/models is healthy but serving the wrong model.

Readiness checks only endpoint availability. If another service still owns port 8000, generators can run against the wrong model.

🔧 Proposed fix
   if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
     echo "[v037] vllm did not become healthy; tail of log:" >&2
     tail -40 "${LOG_DIR}/vllm_llama33.log" >&2
     exit 2
   fi
+
+  actual_model="$(curl -sf "http://localhost:${PORT}/v1/models" \
+    | grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
+    | head -1 \
+    | grep -oE '"[^"]*"$' \
+    | tr -d '"')"
+  if [[ "${actual_model}" != "${MODEL}" ]]; then
+    echo "[v037] FATAL: endpoint is healthy but serving '${actual_model}' (expected '${MODEL}')" >&2
+    exit 3
+  fi
 fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "[v037] waiting for /v1/models (max 30 min)"
for i in $(seq 1 180); do
if curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
echo "[v037] vllm healthy after ${i} x 10s"
break
fi
sleep 10
done
if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
echo "[v037] vllm did not become healthy; tail of log:" >&2
tail -40 "${LOG_DIR}/vllm_llama33.log" >&2
exit 2
fi
fi
echo "[v037] waiting for /v1/models (max 30 min)"
for i in $(seq 1 180); do
if curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
echo "[v037] vllm healthy after ${i} x 10s"
break
fi
sleep 10
done
if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
echo "[v037] vllm did not become healthy; tail of log:" >&2
tail -40 "${LOG_DIR}/vllm_llama33.log" >&2
exit 2
fi
actual_model="$(curl -sf "http://localhost:${PORT}/v1/models" \
| grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
| head -1 \
| grep -oE '"[^"]*"$' \
| tr -d '"')"
if [[ "${actual_model}" != "${MODEL}" ]]; then
echo "[v037] FATAL: endpoint is healthy but serving '${actual_model}' (expected '${MODEL}')" >&2
exit 3
fi
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/v037_droplet_run.sh` around lines 69 - 82, The readiness loop only
checks that /v1/models is reachable, not that it serves the expected model;
update the script around the PORT loop to, after curl succeeds, issue a second
curl to http://localhost:${PORT}/v1/models and inspect the JSON/text for the
expected model identifier (use a constant/variable like EXPECTED_MODEL or the
literal model name you expect), and if the returned list does not contain that
model then print the same failure message, tail "${LOG_DIR}/vllm_llama33.log" to
stderr and exit non-zero; ensure this check runs before breaking out of the wait
loop so the script fails fast if another service is answering on the port.

Comment thread scripts/validate_v037.py
Comment on lines +18 to +64
def main() -> int:
files = sorted(GEN.glob("*-v037-*.jsonl"))
print(f"{len(files)} v037 files")
total_in = total_ok = 0
for f in files:
n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0
expected = EXPECTED_CAT[f.name[:2]]
for line in f.read_text().splitlines():
line = line.strip()
if not line:
continue
n_in += 1
try:
e = json.loads(line)
except Exception:
n_miss += 1
continue
if not all(k in e for k in REQUIRED):
n_miss += 1
continue
if "original_task" not in (e.get("context") or {}):
n_miss += 1
continue
if e.get("expected") != "DENY":
n_exp += 1
continue
if e.get("severity") not in VALID_SEV:
n_sev += 1
continue
if e.get("category") != expected:
n_cat += 1
continue
n_ok += 1
flags = []
if n_miss:
flags.append(f"missing={n_miss}")
if n_exp:
flags.append(f"bad-expected={n_exp}")
if n_sev:
flags.append(f"bad-sev={n_sev}")
if n_cat:
flags.append(f"wrong-cat={n_cat}")
total_in += n_in
total_ok += n_ok
print(f"{f.name:40s} in={n_in:4d} ok={n_ok:4d} bad={n_in - n_ok:3d} {' '.join(flags)}")
print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}")
return 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validator exit code doesn't match "fail-loud" claim in docstring.

The docstring states "Fail-loud on any malformed entry," but the function always returns 0 (line 64), even when validation errors are detected. This means downstream steps won't automatically fail if bad entries are found.

Consider returning a non-zero exit code when validation failures are detected to match the fail-loud intent, or update the docstring to clarify that this is a report-only validator.

📋 Proposed fix to fail on validation errors
     print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}")
-    return 0
+    return 0 if total_in == total_ok else 1

Or update the docstring:

-"""Validate v037 generated jsonl: schema + required-field + per-cat consistency.
-
-Mirrors validate_v036.py with the v037 glob. Fail-loud on any malformed entry
-so the build_split step gets a clean input.
-"""
+"""Validate v037 generated jsonl: schema + required-field + per-cat consistency.
+
+Mirrors validate_v036.py with the v037 glob. Reports validation statistics
+for manual review before the build_split step.
+"""
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def main() -> int:
files = sorted(GEN.glob("*-v037-*.jsonl"))
print(f"{len(files)} v037 files")
total_in = total_ok = 0
for f in files:
n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0
expected = EXPECTED_CAT[f.name[:2]]
for line in f.read_text().splitlines():
line = line.strip()
if not line:
continue
n_in += 1
try:
e = json.loads(line)
except Exception:
n_miss += 1
continue
if not all(k in e for k in REQUIRED):
n_miss += 1
continue
if "original_task" not in (e.get("context") or {}):
n_miss += 1
continue
if e.get("expected") != "DENY":
n_exp += 1
continue
if e.get("severity") not in VALID_SEV:
n_sev += 1
continue
if e.get("category") != expected:
n_cat += 1
continue
n_ok += 1
flags = []
if n_miss:
flags.append(f"missing={n_miss}")
if n_exp:
flags.append(f"bad-expected={n_exp}")
if n_sev:
flags.append(f"bad-sev={n_sev}")
if n_cat:
flags.append(f"wrong-cat={n_cat}")
total_in += n_in
total_ok += n_ok
print(f"{f.name:40s} in={n_in:4d} ok={n_ok:4d} bad={n_in - n_ok:3d} {' '.join(flags)}")
print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}")
return 0
def main() -> int:
files = sorted(GEN.glob("*-v037-*.jsonl"))
print(f"{len(files)} v037 files")
total_in = total_ok = 0
for f in files:
n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0
expected = EXPECTED_CAT[f.name[:2]]
for line in f.read_text().splitlines():
line = line.strip()
if not line:
continue
n_in += 1
try:
e = json.loads(line)
except Exception:
n_miss += 1
continue
if not all(k in e for k in REQUIRED):
n_miss += 1
continue
if "original_task" not in (e.get("context") or {}):
n_miss += 1
continue
if e.get("expected") != "DENY":
n_exp += 1
continue
if e.get("severity") not in VALID_SEV:
n_sev += 1
continue
if e.get("category") != expected:
n_cat += 1
continue
n_ok += 1
flags = []
if n_miss:
flags.append(f"missing={n_miss}")
if n_exp:
flags.append(f"bad-expected={n_exp}")
if n_sev:
flags.append(f"bad-sev={n_sev}")
if n_cat:
flags.append(f"wrong-cat={n_cat}")
total_in += n_in
total_ok += n_ok
print(f"{f.name:40s} in={n_in:4d} ok={n_ok:4d} bad={n_in - n_ok:3d} {' '.join(flags)}")
print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}")
return 0 if total_in == total_ok else 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/validate_v037.py` around lines 18 - 64, The main() function currently
always returns 0 despite the module claiming it should "fail-loud on any
malformed entry"; change the function to return a non-zero exit code when
validation failures are detected (e.g., if total_ok != total_in or any of
n_miss/n_exp/n_sev/n_cat > 0 across files) so callers see failure, or
alternatively update the module docstring to remove the "fail-loud" claim;
update the return at the end of main() (and any caller expectations) to reflect
the chosen behavior.

Comment thread scripts/validate_v037.py
total_in = total_ok = 0
for f in files:
n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0
expected = EXPECTED_CAT[f.name[:2]]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add defensive check for unexpected filename prefix.

If a file in the directory doesn't start with a recognized prefix (TM/PE/DE), line 24 will raise KeyError when looking up f.name[:2] in EXPECTED_CAT. This would crash the validator.

🛡️ Proposed fix to handle unknown prefixes
+        prefix = f.name[:2]
+        if prefix not in EXPECTED_CAT:
+            print(f"SKIP {f.name}: unknown prefix {prefix}")
+            continue
-        expected = EXPECTED_CAT[f.name[:2]]
+        expected = EXPECTED_CAT[prefix]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/validate_v037.py` at line 24, Guard the lookup into EXPECTED_CAT by
first extracting the filename prefix (prefix = f.name[:2]) and verifying it
exists in EXPECTED_CAT; if the prefix is missing, raise a clear ValueError (or
log and skip) that includes the offending filename so the validator fails with a
helpful message instead of a KeyError. Replace the direct lookup at the
assignment to expected (where EXPECTED_CAT[f.name[:2]] is used) with this check
and then assign expected = EXPECTED_CAT[prefix] only after the prefix is
validated. Ensure the error message names the file (f.name) and the invalid
prefix to aid debugging.

@vaaraio vaaraio merged commit 8e284e0 into main May 26, 2026
12 checks passed
@vaaraio vaaraio deleted the release/v0.37.0 branch May 26, 2026 22:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant