release(v0.37.0): third attacker family + v8 classifier closes worst v0.36 sub-cell#144
Conversation
…v0.36 sub-cell Adds a third attacker family to the cross-model held-out fold and ships a v8 classifier that trains on the v035 + v036 TM/PE union and keeps v036 DE both legs plus the full v037 Llama-3.3 leg as the new held-out evaluation. Closes the worst sub-cell from v0.36 by 12.9 pp on the same 700 Claude DE entries (26.0% under v7 to 38.9% under v8), covers a third attacker family at 85.8% overall recall, and holds in-distribution at 86.6% on v035 TEST with FPR inside the prior confidence interval. Generation provenance: 900 entries from RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic on AMD-backed MI300X SR-IOV under rocm/vllm:latest, three parallel category generators, 22 minutes wall clock, droplet poweroff post-rsync. 13 schema-invalid TM entries dropped, 887 valid (TM 287, PE 300, DE 300). v8 production bundle ships at src/vaara/data/adversarial_classifier_v8.joblib. v7 retained on disk for cross-eval reproducibility. Threshold unchanged at 0.9006. Also bundles the MIT AI Risk Repository v4 position document and the SEP-2787 Tool Call Attestation reference implementation that landed on main via PR #139. Both fit the v0.37 ship window and add one clean line each to the validator stack and the public taxonomy positioning. Full methodology, chain of custody, ship gate, and named limits in bench/vaara-bench-v0.37.md.
📝 WalkthroughWalkthroughv0.37 release updates project versions across versioning files, documents benchmark methodology and cross-model evaluation results, implements adversarial data generation infrastructure with Llama-3.3 targeting, adds remote droplet orchestration for large-scale generation, and updates data integrity manifest. Changesv0.37 Release Delivery
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
23-33:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winNumbers section mixes stale headline metrics with the new “current” v0.37 benchmark.
Lines 23-27 still present older benchmark headlines while Line 32 declares v0.37 as current methodology. Please either update the top-line metrics to v0.37 or explicitly label the older figures as historical to avoid reader confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 23 - 33, The README's top-line metrics (the initial headline paragraph and the first few bullets that report recall, FPR, p99 latency, etc.) are inconsistent with the declared current methodology vaara-bench-v0.37; either replace those headline numbers with the v0.37 values or explicitly mark the existing headline metrics as "historical (pre-v0.37)" and add updated v0.37 figures next to or below them; update the classifier recall/FPR, multi-attacker PAIR, and latency bullets (the items referencing recall 85.0%, FPR 4.6%, 140 µs p99, and the multi-attacker 0/25 result) so they match the vaara-bench-v0.37 subsection (or are clearly labeled as older), and ensure the vaara-bench-v0.37 reference remains accurate.
🧹 Nitpick comments (3)
scripts/build_v037_split.py (1)
31-41: 💤 Low valueConsider SHA-256 for consistency with project standards.
The MANIFEST.sha256 file (mentioned in PR description) uses SHA-256, while this fingerprint function uses SHA-1. While SHA-1 is acceptable for non-cryptographic deduplication, using SHA-256 would provide consistency across the codebase.
♻️ Proposed change to use SHA-256
def fingerprint(entry: dict) -> str: canonical = json.dumps( { "t": entry.get("tool_name", ""), "p": entry.get("parameters", {}), "c": entry.get("context", {}), }, sort_keys=True, default=str, ) - return hashlib.sha1(canonical.encode()).hexdigest() + return hashlib.sha256(canonical.encode()).hexdigest()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/build_v037_split.py` around lines 31 - 41, The fingerprint function currently uses SHA-1; change it to SHA-256 for consistency with the project's MANIFEST.sha256 by replacing the hashlib.sha1(...) call with hashlib.sha256(...) when hashing the canonical JSON in fingerprint(entry: dict) and keep using .hexdigest() to return the hex string; ensure canonicalization via json.dumps(..., sort_keys=True, default=str) remains unchanged so only the hash algorithm is substituted.scripts/eval_v037_holdout.py (2)
86-90: 💤 Low valueConsider explicit handling when no positive samples exist.
If
tp + fn == 0(no positive samples), the code returnsrecall = 0.0due tomax(tp + fn, 1). While this edge case is unlikely in production, it would be clearer to handle it explicitly or add an assertion thattp + fn > 0.♻️ Proposed clarification
pos = int((y == 1).sum()) tp = int(((pred == 1) & (y == 1)).sum()) fn = int(((pred == 0) & (y == 1)).sum()) + if tp + fn == 0: + print("[error] no positive samples in holdout") + return 2 recall = tp / max(tp + fn, 1)Or simply add an assertion:
+ assert tp + fn > 0, "holdout must contain positive samples" recall = tp / max(tp + fn, 1)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/eval_v037_holdout.py` around lines 86 - 90, Handle the no-positive-samples edge explicitly: compute total_pos = tp + fn and if total_pos == 0, set recall = 0.0 and set rlo, rhi to safe defaults (e.g., 0.0, 0.0 or None) instead of calling wilson_ci with zero trials; otherwise compute recall = tp / total_pos and call wilson_ci(tp, total_pos). Update the block using the variables pos, tp, fn, recall and the wilson_ci call to implement this conditional or replace it with an assertion that total_pos > 0 if you prefer failing fast.
16-23: 💤 Low valuesys.path manipulation is fragile.
The script manipulates
sys.pathto import from the scripts directory. This works when run from the repository root but may fail if executed from a different working directory. Consider making this a proper package or using relative imports if the scripts directory becomes a module.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/eval_v037_holdout.py` around lines 16 - 23, Currently the script mutates sys.path (REPO and sys.path.insert) to import train_adversarial_classifier which is fragile; replace that with a dynamic, path-based import using importlib (spec_from_file_location / module_from_spec) to load the file at REPO/"scripts"/"train_adversarial_classifier.py" and then bind load_corpus_keyed, build_labels, build_features from that module, or alternatively convert the scripts folder to a proper package and use a normal package import (e.g., from scripts.train_adversarial_classifier import load_corpus_keyed, build_labels, build_features) and remove the sys.path manipulation and REPO logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bench/vaara-bench-v0.37.md`:
- Line 137: The fenced code block using triple backticks in the markdown
contains no language tag; update that opening fence (```) to include an
appropriate language hint (for example, ```bash) so markdownlint passes and the
snippet is syntax-highlighted and more readable.
- Around line 137-149: The reproduction steps change into tests/adversarial then
run scripts and reference src paths as if at repo root, which breaks copy-paste
execution; update the commands so paths are consistent from the repository root
(e.g. check MANIFEST.sha256 as tests/adversarial/MANIFEST.sha256 and keep
subsequent .venv/bin/python calls and --split-manifest
tests/adversarial/v037_split.json and --bundle-out
src/vaara/data/adversarial_classifier_v8.joblib referenced from the repo root)
so users can run the listed commands without changing directory first.
In `@docs/mit_ai_risk_repository_mapping.md`:
- Around line 86-90: The percentage for the "4. Malicious Actors & Misuse" row
is incorrect (267/268 ≈ 99.6% not 100%); update that table row's percentage cell
from "100%" to a correctly rounded value (e.g., "99.6%" or "99.63%") or add an
overall rounding note in the document explaining percentages are rounded to one
decimal place so counts may not exactly match; make the change where the table
row for "4. Malicious Actors & Misuse" and the surrounding summary/Total row are
defined.
In `@scripts/build_v037_split.py`:
- Around line 90-94: The JSON parsing try/except around json.loads(line) should
catch json.JSONDecodeError instead of a bare Exception: change the except block
to except json.JSONDecodeError: increment schema_dropped and continue; for any
other unexpected errors re-raise or let them propagate (or log and raise) so
they aren't silently swallowed—update the handling around json.loads(line), the
variable e, and the schema_dropped increment accordingly.
- Around line 73-74: Validate the two-character prefix before using it as a key:
check that cat_prefix (derived from f.name[:2]) exists in EXPECTED_CAT and
handle the unexpected case (e.g., log a clear warning and skip the file or raise
a descriptive exception) instead of doing EXPECTED_CAT[cat_prefix] directly;
update the code around cat_prefix and EXPECTED_CAT to perform the membership
test and a safe lookup to avoid KeyError.
In `@scripts/eval_v037_holdout.py`:
- Around line 99-107: The loop aggregating true positives uses e.get("expected")
instead of the precomputed label array y; change the per-bucket true-positive
check in the for (k, e), pr in zip(holdout, pred): block to reference the
corresponding label from y (use the same zip order or index into y) so that TP
counting aligns with build_labels and the recall calculation; update references
around per_cat, per_leg, per_cat_leg where TP is incremented (currently in the
if pr == 1 and e.get("expected") == "DENY" clause) to use y (e.g., y_entry
indicates DENY) instead of e.get("expected").
In `@scripts/generate_targeted_v037.py`:
- Around line 35-52: The call_chat function currently assumes the API returns a
JSON with choices[0].message.content and will raise KeyError/IndexError on
malformed responses; update call_chat to defensively validate the HTTP response
and parsed JSON: check HTTP status / ensure resp.read() is valid JSON, confirm
top-level "choices" is present and a non-empty list, that choices[0] contains a
"message" dict with a "content" key (and that content is a string), and if any
of those checks fail raise a clear exception (or return a descriptive error
string) so callers (e.g., main()) receive an informative error instead of a
cryptic KeyError/IndexError. Ensure these checks reference the call_chat
function and the "choices" → "message" → "content" path.
In `@scripts/v037_droplet_run.sh`:
- Around line 69-82: The readiness loop only checks that /v1/models is
reachable, not that it serves the expected model; update the script around the
PORT loop to, after curl succeeds, issue a second curl to
http://localhost:${PORT}/v1/models and inspect the JSON/text for the expected
model identifier (use a constant/variable like EXPECTED_MODEL or the literal
model name you expect), and if the returned list does not contain that model
then print the same failure message, tail "${LOG_DIR}/vllm_llama33.log" to
stderr and exit non-zero; ensure this check runs before breaking out of the wait
loop so the script fails fast if another service is answering on the port.
- Line 57: The docker run command currently interpolates the secret into the
command line via -e "HF_TOKEN=${HF_TOKEN:-}", which exposes HF_TOKEN; change it
to pass only the environment variable name (e.g., -e HF_TOKEN or --env HF_TOKEN)
so Docker pulls the value from the process environment instead of embedding it,
or alternatively reference an --env-file if you need to supply multiple vars;
update the line containing -e "HF_TOKEN=${HF_TOKEN:-}" accordingly and ensure
the runtime environment actually contains HF_TOKEN before invoking the script.
- Around line 51-67: The docker run invocation is redirecting the CLI output
(container ID) to vllm_llama33.log instead of the container's runtime logs;
after creating the container with the existing docker run -d ... --name
vllm-llama33 call, remove the >"${LOG_DIR}/vllm_llama33.log" redirection and
instead launch docker logs -f vllm-llama33 >"${LOG_DIR}/vllm_llama33.log" 2>&1 &
(or equivalent backgrounded tail) so the file receives the container
stdout/stderr; apply the same change to the similar docker run / docker logs
pair around the other block (the one referenced near lines ~77-79) to ensure the
watcher/health-check tails real container logs.
In `@scripts/validate_v037.py`:
- Line 24: Guard the lookup into EXPECTED_CAT by first extracting the filename
prefix (prefix = f.name[:2]) and verifying it exists in EXPECTED_CAT; if the
prefix is missing, raise a clear ValueError (or log and skip) that includes the
offending filename so the validator fails with a helpful message instead of a
KeyError. Replace the direct lookup at the assignment to expected (where
EXPECTED_CAT[f.name[:2]] is used) with this check and then assign expected =
EXPECTED_CAT[prefix] only after the prefix is validated. Ensure the error
message names the file (f.name) and the invalid prefix to aid debugging.
- Around line 18-64: The main() function currently always returns 0 despite the
module claiming it should "fail-loud on any malformed entry"; change the
function to return a non-zero exit code when validation failures are detected
(e.g., if total_ok != total_in or any of n_miss/n_exp/n_sev/n_cat > 0 across
files) so callers see failure, or alternatively update the module docstring to
remove the "fail-loud" claim; update the return at the end of main() (and any
caller expectations) to reflect the chosen behavior.
---
Outside diff comments:
In `@README.md`:
- Around line 23-33: The README's top-line metrics (the initial headline
paragraph and the first few bullets that report recall, FPR, p99 latency, etc.)
are inconsistent with the declared current methodology vaara-bench-v0.37; either
replace those headline numbers with the v0.37 values or explicitly mark the
existing headline metrics as "historical (pre-v0.37)" and add updated v0.37
figures next to or below them; update the classifier recall/FPR, multi-attacker
PAIR, and latency bullets (the items referencing recall 85.0%, FPR 4.6%, 140 µs
p99, and the multi-attacker 0/25 result) so they match the vaara-bench-v0.37
subsection (or are clearly labeled as older), and ensure the vaara-bench-v0.37
reference remains accurate.
---
Nitpick comments:
In `@scripts/build_v037_split.py`:
- Around line 31-41: The fingerprint function currently uses SHA-1; change it to
SHA-256 for consistency with the project's MANIFEST.sha256 by replacing the
hashlib.sha1(...) call with hashlib.sha256(...) when hashing the canonical JSON
in fingerprint(entry: dict) and keep using .hexdigest() to return the hex
string; ensure canonicalization via json.dumps(..., sort_keys=True, default=str)
remains unchanged so only the hash algorithm is substituted.
In `@scripts/eval_v037_holdout.py`:
- Around line 86-90: Handle the no-positive-samples edge explicitly: compute
total_pos = tp + fn and if total_pos == 0, set recall = 0.0 and set rlo, rhi to
safe defaults (e.g., 0.0, 0.0 or None) instead of calling wilson_ci with zero
trials; otherwise compute recall = tp / total_pos and call wilson_ci(tp,
total_pos). Update the block using the variables pos, tp, fn, recall and the
wilson_ci call to implement this conditional or replace it with an assertion
that total_pos > 0 if you prefer failing fast.
- Around line 16-23: Currently the script mutates sys.path (REPO and
sys.path.insert) to import train_adversarial_classifier which is fragile;
replace that with a dynamic, path-based import using importlib
(spec_from_file_location / module_from_spec) to load the file at
REPO/"scripts"/"train_adversarial_classifier.py" and then bind
load_corpus_keyed, build_labels, build_features from that module, or
alternatively convert the scripts folder to a proper package and use a normal
package import (e.g., from scripts.train_adversarial_classifier import
load_corpus_keyed, build_labels, build_features) and remove the sys.path
manipulation and REPO logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 2cdf0f5a-e520-4bf6-be3a-e232db2be1f0
⛔ Files ignored due to path filters (3)
tests/adversarial/generated/DE-v037-llama33.jsonlis excluded by!**/generated/**tests/adversarial/generated/PE-v037-llama33.jsonlis excluded by!**/generated/**tests/adversarial/generated/TM-v037-llama33.jsonlis excluded by!**/generated/**
📒 Files selected for processing (19)
CHANGELOG.mdCOMPLIANCE.mdREADME.mdbench/v037_eval_v8_holdout.jsonbench/vaara-bench-v0.37.mdclients/ts/package.jsondocs/mit_ai_risk_repository_mapping.mdpyproject.tomlscripts/_v037_common.pyscripts/build_v037_split.pyscripts/eval_v037_holdout.pyscripts/generate_targeted_v037.pyscripts/v037_droplet_run.shscripts/v037_local_watcher.shscripts/validate_v037.pysrc/vaara/__init__.pysrc/vaara/data/adversarial_classifier_v8.joblibtests/adversarial/MANIFEST.sha256tests/adversarial/v037_split.json
|
|
||
| ## Reproduction recipe | ||
|
|
||
| ``` |
There was a problem hiding this comment.
Add a language tag to the fenced code block.
At Line 137, use a language hint (e.g., bash) to satisfy markdownlint and improve readability.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 137-137: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bench/vaara-bench-v0.37.md` at line 137, The fenced code block using triple
backticks in the markdown contains no language tag; update that opening fence
(```) to include an appropriate language hint (for example, ```bash) so
markdownlint passes and the snippet is syntax-highlighted and more readable.
| ``` | ||
| cd tests/adversarial && sha256sum -c MANIFEST.sha256 | ||
| .venv/bin/python scripts/validate_v037.py | ||
| .venv/bin/python scripts/build_v037_split.py | ||
| .venv/bin/python scripts/save_classifier_bundle.py \ | ||
| --version v0.37 --threshold 0.9006 --embeddings \ | ||
| --split-manifest tests/adversarial/v037_split.json \ | ||
| --train-fold train \ | ||
| --bundle-out src/vaara/data/adversarial_classifier_v8.joblib | ||
| .venv/bin/python scripts/eval_v037_holdout.py \ | ||
| --bundle src/vaara/data/adversarial_classifier_v8.joblib \ | ||
| --json-out bench/v037_eval_v8_holdout.json | ||
| ``` |
There was a problem hiding this comment.
Fix reproduction command paths to be internally consistent.
At Line 138, the recipe changes directory to tests/adversarial, but subsequent commands at Lines 139-148 call scripts/... and src/... as if run from repo root. This likely makes the copy-paste reproduction flow fail.
Proposed doc fix
-```
-cd tests/adversarial && sha256sum -c MANIFEST.sha256
-.venv/bin/python scripts/validate_v037.py
-.venv/bin/python scripts/build_v037_split.py
+```bash
+sha256sum -c tests/adversarial/MANIFEST.sha256
+.venv/bin/python scripts/validate_v037.py
+.venv/bin/python scripts/build_v037_split.py
.venv/bin/python scripts/save_classifier_bundle.py \
--version v0.37 --threshold 0.9006 --embeddings \
--split-manifest tests/adversarial/v037_split.json \
--train-fold train \
--bundle-out src/vaara/data/adversarial_classifier_v8.joblib
.venv/bin/python scripts/eval_v037_holdout.py \
--bundle src/vaara/data/adversarial_classifier_v8.joblib \
--json-out bench/v037_eval_v8_holdout.json</details>
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.1)</summary>
[warning] 137-137: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @bench/vaara-bench-v0.37.md around lines 137 - 149, The reproduction steps
change into tests/adversarial then run scripts and reference src paths as if at
repo root, which breaks copy-paste execution; update the commands so paths are
consistent from the repository root (e.g. check MANIFEST.sha256 as
tests/adversarial/MANIFEST.sha256 and keep subsequent .venv/bin/python calls and
--split-manifest tests/adversarial/v037_split.json and --bundle-out
src/vaara/data/adversarial_classifier_v8.joblib referenced from the repo root)
so users can run the listed commands without changing directory first.
</details>
<!-- fingerprinting:phantom:triton:hawk -->
<!-- This is an auto-generated comment by CodeRabbit -->
| | 4. Malicious Actors & Misuse | 268 | 267 | 100% | | ||
| | 5. Human-Computer Interaction | 106 | 106 | 100% | | ||
| | 6. Socioeconomic and Environmental | 299 | 61 | 20% | | ||
| | 7. AI System Safety, Failures & Limitations | 421 | 114 | 27% | | ||
| | **Total** | **1,597** | **740** | **46%** | |
There was a problem hiding this comment.
Correct the Domain 4 percentage value.
At Line 86, 267 / 268 is not exactly 100% (it is ~99.6%). Please update the percentage or add an explicit rounding note to avoid inconsistency with the counts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/mit_ai_risk_repository_mapping.md` around lines 86 - 90, The percentage
for the "4. Malicious Actors & Misuse" row is incorrect (267/268 ≈ 99.6% not
100%); update that table row's percentage cell from "100%" to a correctly
rounded value (e.g., "99.6%" or "99.63%") or add an overall rounding note in the
document explaining percentages are rounded to one decimal place so counts may
not exactly match; make the change where the table row for "4. Malicious Actors
& Misuse" and the surrounding summary/Total row are defined.
| cat_prefix = f.name[:2] | ||
| expected_cat = EXPECTED_CAT[cat_prefix] |
There was a problem hiding this comment.
Add validation for category prefix before dictionary lookup.
The code assumes the first two characters of the filename are a valid category prefix (TM/PE/DE), but there's no validation before the dictionary lookup. If an unexpected file matches the glob pattern, this will raise a KeyError.
🛡️ Proposed fix to add validation
relpath = str(f.relative_to(ADV))
cat_prefix = f.name[:2]
+ if cat_prefix not in EXPECTED_CAT:
+ print(f"warning: skipping {f.name} (unexpected prefix '{cat_prefix}')")
+ continue
expected_cat = EXPECTED_CAT[cat_prefix]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cat_prefix = f.name[:2] | |
| expected_cat = EXPECTED_CAT[cat_prefix] | |
| cat_prefix = f.name[:2] | |
| if cat_prefix not in EXPECTED_CAT: | |
| print(f"warning: skipping {f.name} (unexpected prefix '{cat_prefix}')") | |
| continue | |
| expected_cat = EXPECTED_CAT[cat_prefix] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/build_v037_split.py` around lines 73 - 74, Validate the two-character
prefix before using it as a key: check that cat_prefix (derived from f.name[:2])
exists in EXPECTED_CAT and handle the unexpected case (e.g., log a clear warning
and skip the file or raise a descriptive exception) instead of doing
EXPECTED_CAT[cat_prefix] directly; update the code around cat_prefix and
EXPECTED_CAT to perform the membership test and a safe lookup to avoid KeyError.
| try: | ||
| e = json.loads(line) | ||
| except Exception: | ||
| schema_dropped += 1 | ||
| continue |
There was a problem hiding this comment.
Catch specific exception type instead of broad Exception.
The bare except Exception: will catch all exceptions, potentially hiding unexpected errors. JSON parsing should specifically catch json.JSONDecodeError.
🐛 Proposed fix to catch specific exception
try:
e = json.loads(line)
- except Exception:
+ except json.JSONDecodeError:
schema_dropped += 1
continue🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/build_v037_split.py` around lines 90 - 94, The JSON parsing
try/except around json.loads(line) should catch json.JSONDecodeError instead of
a bare Exception: change the except block to except json.JSONDecodeError:
increment schema_dropped and continue; for any other unexpected errors re-raise
or let them propagate (or log and raise) so they aren't silently
swallowed—update the handling around json.loads(line), the variable e, and the
schema_dropped increment accordingly.
| docker run -d --rm \ | ||
| --device=/dev/kfd --device=/dev/dri \ | ||
| --group-add video --group-add render \ | ||
| --security-opt seccomp=unconfined \ | ||
| --cap-add=SYS_PTRACE \ | ||
| --shm-size 32G --ipc=host --network host \ | ||
| -e "HF_TOKEN=${HF_TOKEN:-}" \ | ||
| -v "${HF_CACHE}":/root/.cache/huggingface \ | ||
| --name vllm-llama33 \ | ||
| rocm/vllm:latest \ | ||
| vllm serve "${MODEL}" \ | ||
| --host 0.0.0.0 --port "${PORT}" \ | ||
| --max-model-len 8192 \ | ||
| --enforce-eager \ | ||
| --gpu-memory-utilization 0.92 \ | ||
| --quantization fp8 \ | ||
| >"${LOG_DIR}/vllm_llama33.log" 2>&1 |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
When using docker run -d ... > logfile, does the redirection capture container runtime logs, or only Docker CLI stdout such as the container ID?
💡 Result:
docker run -d ... > logfile captures only the Docker CLI’s own standard output (typically the container ID printed because -d/--detach “print[s] the new container ID”)—it does not capture the container’s runtime logs. Reason (grounded in docs): in detached mode, docker run prints the new container ID to the client’s STDOUT [1], so > logfile redirects that STDOUT to your file. Container logs are instead retrieved via docker logs, which reads the container’s stored STDOUT/STDERR streams as logged by Docker’s logging mechanism [2]. If you want container runtime logs in a file, you’d redirect the output of docker logs (or use docker logs -f ... > logfile) rather than redirecting docker run itself.
Citations:
- 1: https://man.archlinux.org/man/docker-container-run.1.en.txt
- 2: https://docs.docker.com/engine/logging/
Fix vLLM container log capture (redirect docker logs, not docker run -d)
In scripts/v037_droplet_run.sh, redirecting &> "${LOG_DIR}/vllm_llama33.log" on docker run -d only captures Docker CLI stdout (typically the container ID), not the container’s runtime stdout/stderr. The health-check/watcher tailing vllm_llama33.log therefore won’t receive vLLM logs (same issue in the similar block around 77-79). Use docker logs -f vllm-llama33 redirected into the log file instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/v037_droplet_run.sh` around lines 51 - 67, The docker run invocation
is redirecting the CLI output (container ID) to vllm_llama33.log instead of the
container's runtime logs; after creating the container with the existing docker
run -d ... --name vllm-llama33 call, remove the >"${LOG_DIR}/vllm_llama33.log"
redirection and instead launch docker logs -f vllm-llama33
>"${LOG_DIR}/vllm_llama33.log" 2>&1 & (or equivalent backgrounded tail) so the
file receives the container stdout/stderr; apply the same change to the similar
docker run / docker logs pair around the other block (the one referenced near
lines ~77-79) to ensure the watcher/health-check tails real container logs.
| --security-opt seccomp=unconfined \ | ||
| --cap-add=SYS_PTRACE \ | ||
| --shm-size 32G --ipc=host --network host \ | ||
| -e "HF_TOKEN=${HF_TOKEN:-}" \ |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify we don't inline-expand HF_TOKEN in docker args.
rg -n --type=sh 'HF_TOKEN=\$\{HF_TOKEN:-\}|--env HF_TOKEN' scripts/v037_droplet_run.shRepository: vaaraio/vaara
Length of output: 94
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Show surrounding Docker command around line 57
sed -n '40,80p' scripts/v037_droplet_run.sh
# 2) Search repo for any other HF_TOKEN inlined-expansion or env passthrough
rg -n --hidden --no-ignore -S --type=sh 'HF_TOKEN=\$\{HF_TOKEN:-\}|-e\s+"HF_TOKEN=|--env\s+HF_TOKEN\b' .Repository: vaaraio/vaara
Length of output: 1595
Avoid exposing HF_TOKEN in the Docker command line.
docker run in scripts/v037_droplet_run.sh uses -e "HF_TOKEN=${HF_TOKEN:-}", which interpolates the secret into the command’s process arguments. Use env-name passthrough instead.
🔧 Proposed fix
- -e "HF_TOKEN=${HF_TOKEN:-}" \
+ --env HF_TOKEN \📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| -e "HF_TOKEN=${HF_TOKEN:-}" \ | |
| --env HF_TOKEN \ |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/v037_droplet_run.sh` at line 57, The docker run command currently
interpolates the secret into the command line via -e "HF_TOKEN=${HF_TOKEN:-}",
which exposes HF_TOKEN; change it to pass only the environment variable name
(e.g., -e HF_TOKEN or --env HF_TOKEN) so Docker pulls the value from the process
environment instead of embedding it, or alternatively reference an --env-file if
you need to supply multiple vars; update the line containing -e
"HF_TOKEN=${HF_TOKEN:-}" accordingly and ensure the runtime environment actually
contains HF_TOKEN before invoking the script.
| echo "[v037] waiting for /v1/models (max 30 min)" | ||
| for i in $(seq 1 180); do | ||
| if curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then | ||
| echo "[v037] vllm healthy after ${i} x 10s" | ||
| break | ||
| fi | ||
| sleep 10 | ||
| done | ||
| if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then | ||
| echo "[v037] vllm did not become healthy; tail of log:" >&2 | ||
| tail -40 "${LOG_DIR}/vllm_llama33.log" >&2 | ||
| exit 2 | ||
| fi | ||
| fi |
There was a problem hiding this comment.
Fail fast if /v1/models is healthy but serving the wrong model.
Readiness checks only endpoint availability. If another service still owns port 8000, generators can run against the wrong model.
🔧 Proposed fix
if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then
echo "[v037] vllm did not become healthy; tail of log:" >&2
tail -40 "${LOG_DIR}/vllm_llama33.log" >&2
exit 2
fi
+
+ actual_model="$(curl -sf "http://localhost:${PORT}/v1/models" \
+ | grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
+ | head -1 \
+ | grep -oE '"[^"]*"$' \
+ | tr -d '"')"
+ if [[ "${actual_model}" != "${MODEL}" ]]; then
+ echo "[v037] FATAL: endpoint is healthy but serving '${actual_model}' (expected '${MODEL}')" >&2
+ exit 3
+ fi
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| echo "[v037] waiting for /v1/models (max 30 min)" | |
| for i in $(seq 1 180); do | |
| if curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then | |
| echo "[v037] vllm healthy after ${i} x 10s" | |
| break | |
| fi | |
| sleep 10 | |
| done | |
| if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then | |
| echo "[v037] vllm did not become healthy; tail of log:" >&2 | |
| tail -40 "${LOG_DIR}/vllm_llama33.log" >&2 | |
| exit 2 | |
| fi | |
| fi | |
| echo "[v037] waiting for /v1/models (max 30 min)" | |
| for i in $(seq 1 180); do | |
| if curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then | |
| echo "[v037] vllm healthy after ${i} x 10s" | |
| break | |
| fi | |
| sleep 10 | |
| done | |
| if ! curl -sf "http://localhost:${PORT}/v1/models" >/dev/null 2>&1; then | |
| echo "[v037] vllm did not become healthy; tail of log:" >&2 | |
| tail -40 "${LOG_DIR}/vllm_llama33.log" >&2 | |
| exit 2 | |
| fi | |
| actual_model="$(curl -sf "http://localhost:${PORT}/v1/models" \ | |
| | grep -oE '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \ | |
| | head -1 \ | |
| | grep -oE '"[^"]*"$' \ | |
| | tr -d '"')" | |
| if [[ "${actual_model}" != "${MODEL}" ]]; then | |
| echo "[v037] FATAL: endpoint is healthy but serving '${actual_model}' (expected '${MODEL}')" >&2 | |
| exit 3 | |
| fi | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/v037_droplet_run.sh` around lines 69 - 82, The readiness loop only
checks that /v1/models is reachable, not that it serves the expected model;
update the script around the PORT loop to, after curl succeeds, issue a second
curl to http://localhost:${PORT}/v1/models and inspect the JSON/text for the
expected model identifier (use a constant/variable like EXPECTED_MODEL or the
literal model name you expect), and if the returned list does not contain that
model then print the same failure message, tail "${LOG_DIR}/vllm_llama33.log" to
stderr and exit non-zero; ensure this check runs before breaking out of the wait
loop so the script fails fast if another service is answering on the port.
| def main() -> int: | ||
| files = sorted(GEN.glob("*-v037-*.jsonl")) | ||
| print(f"{len(files)} v037 files") | ||
| total_in = total_ok = 0 | ||
| for f in files: | ||
| n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0 | ||
| expected = EXPECTED_CAT[f.name[:2]] | ||
| for line in f.read_text().splitlines(): | ||
| line = line.strip() | ||
| if not line: | ||
| continue | ||
| n_in += 1 | ||
| try: | ||
| e = json.loads(line) | ||
| except Exception: | ||
| n_miss += 1 | ||
| continue | ||
| if not all(k in e for k in REQUIRED): | ||
| n_miss += 1 | ||
| continue | ||
| if "original_task" not in (e.get("context") or {}): | ||
| n_miss += 1 | ||
| continue | ||
| if e.get("expected") != "DENY": | ||
| n_exp += 1 | ||
| continue | ||
| if e.get("severity") not in VALID_SEV: | ||
| n_sev += 1 | ||
| continue | ||
| if e.get("category") != expected: | ||
| n_cat += 1 | ||
| continue | ||
| n_ok += 1 | ||
| flags = [] | ||
| if n_miss: | ||
| flags.append(f"missing={n_miss}") | ||
| if n_exp: | ||
| flags.append(f"bad-expected={n_exp}") | ||
| if n_sev: | ||
| flags.append(f"bad-sev={n_sev}") | ||
| if n_cat: | ||
| flags.append(f"wrong-cat={n_cat}") | ||
| total_in += n_in | ||
| total_ok += n_ok | ||
| print(f"{f.name:40s} in={n_in:4d} ok={n_ok:4d} bad={n_in - n_ok:3d} {' '.join(flags)}") | ||
| print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}") | ||
| return 0 |
There was a problem hiding this comment.
Validator exit code doesn't match "fail-loud" claim in docstring.
The docstring states "Fail-loud on any malformed entry," but the function always returns 0 (line 64), even when validation errors are detected. This means downstream steps won't automatically fail if bad entries are found.
Consider returning a non-zero exit code when validation failures are detected to match the fail-loud intent, or update the docstring to clarify that this is a report-only validator.
📋 Proposed fix to fail on validation errors
print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}")
- return 0
+ return 0 if total_in == total_ok else 1Or update the docstring:
-"""Validate v037 generated jsonl: schema + required-field + per-cat consistency.
-
-Mirrors validate_v036.py with the v037 glob. Fail-loud on any malformed entry
-so the build_split step gets a clean input.
-"""
+"""Validate v037 generated jsonl: schema + required-field + per-cat consistency.
+
+Mirrors validate_v036.py with the v037 glob. Reports validation statistics
+for manual review before the build_split step.
+"""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def main() -> int: | |
| files = sorted(GEN.glob("*-v037-*.jsonl")) | |
| print(f"{len(files)} v037 files") | |
| total_in = total_ok = 0 | |
| for f in files: | |
| n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0 | |
| expected = EXPECTED_CAT[f.name[:2]] | |
| for line in f.read_text().splitlines(): | |
| line = line.strip() | |
| if not line: | |
| continue | |
| n_in += 1 | |
| try: | |
| e = json.loads(line) | |
| except Exception: | |
| n_miss += 1 | |
| continue | |
| if not all(k in e for k in REQUIRED): | |
| n_miss += 1 | |
| continue | |
| if "original_task" not in (e.get("context") or {}): | |
| n_miss += 1 | |
| continue | |
| if e.get("expected") != "DENY": | |
| n_exp += 1 | |
| continue | |
| if e.get("severity") not in VALID_SEV: | |
| n_sev += 1 | |
| continue | |
| if e.get("category") != expected: | |
| n_cat += 1 | |
| continue | |
| n_ok += 1 | |
| flags = [] | |
| if n_miss: | |
| flags.append(f"missing={n_miss}") | |
| if n_exp: | |
| flags.append(f"bad-expected={n_exp}") | |
| if n_sev: | |
| flags.append(f"bad-sev={n_sev}") | |
| if n_cat: | |
| flags.append(f"wrong-cat={n_cat}") | |
| total_in += n_in | |
| total_ok += n_ok | |
| print(f"{f.name:40s} in={n_in:4d} ok={n_ok:4d} bad={n_in - n_ok:3d} {' '.join(flags)}") | |
| print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}") | |
| return 0 | |
| def main() -> int: | |
| files = sorted(GEN.glob("*-v037-*.jsonl")) | |
| print(f"{len(files)} v037 files") | |
| total_in = total_ok = 0 | |
| for f in files: | |
| n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0 | |
| expected = EXPECTED_CAT[f.name[:2]] | |
| for line in f.read_text().splitlines(): | |
| line = line.strip() | |
| if not line: | |
| continue | |
| n_in += 1 | |
| try: | |
| e = json.loads(line) | |
| except Exception: | |
| n_miss += 1 | |
| continue | |
| if not all(k in e for k in REQUIRED): | |
| n_miss += 1 | |
| continue | |
| if "original_task" not in (e.get("context") or {}): | |
| n_miss += 1 | |
| continue | |
| if e.get("expected") != "DENY": | |
| n_exp += 1 | |
| continue | |
| if e.get("severity") not in VALID_SEV: | |
| n_sev += 1 | |
| continue | |
| if e.get("category") != expected: | |
| n_cat += 1 | |
| continue | |
| n_ok += 1 | |
| flags = [] | |
| if n_miss: | |
| flags.append(f"missing={n_miss}") | |
| if n_exp: | |
| flags.append(f"bad-expected={n_exp}") | |
| if n_sev: | |
| flags.append(f"bad-sev={n_sev}") | |
| if n_cat: | |
| flags.append(f"wrong-cat={n_cat}") | |
| total_in += n_in | |
| total_ok += n_ok | |
| print(f"{f.name:40s} in={n_in:4d} ok={n_ok:4d} bad={n_in - n_ok:3d} {' '.join(flags)}") | |
| print(f"TOTAL in={total_in} ok={total_ok} bad={total_in - total_ok}") | |
| return 0 if total_in == total_ok else 1 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/validate_v037.py` around lines 18 - 64, The main() function currently
always returns 0 despite the module claiming it should "fail-loud on any
malformed entry"; change the function to return a non-zero exit code when
validation failures are detected (e.g., if total_ok != total_in or any of
n_miss/n_exp/n_sev/n_cat > 0 across files) so callers see failure, or
alternatively update the module docstring to remove the "fail-loud" claim;
update the return at the end of main() (and any caller expectations) to reflect
the chosen behavior.
| total_in = total_ok = 0 | ||
| for f in files: | ||
| n_in = n_ok = n_miss = n_exp = n_sev = n_cat = 0 | ||
| expected = EXPECTED_CAT[f.name[:2]] |
There was a problem hiding this comment.
Add defensive check for unexpected filename prefix.
If a file in the directory doesn't start with a recognized prefix (TM/PE/DE), line 24 will raise KeyError when looking up f.name[:2] in EXPECTED_CAT. This would crash the validator.
🛡️ Proposed fix to handle unknown prefixes
+ prefix = f.name[:2]
+ if prefix not in EXPECTED_CAT:
+ print(f"SKIP {f.name}: unknown prefix {prefix}")
+ continue
- expected = EXPECTED_CAT[f.name[:2]]
+ expected = EXPECTED_CAT[prefix]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/validate_v037.py` at line 24, Guard the lookup into EXPECTED_CAT by
first extracting the filename prefix (prefix = f.name[:2]) and verifying it
exists in EXPECTED_CAT; if the prefix is missing, raise a clear ValueError (or
log and skip) that includes the offending filename so the validator fails with a
helpful message instead of a KeyError. Replace the direct lookup at the
assignment to expected (where EXPECTED_CAT[f.name[:2]] is used) with this check
and then assign expected = EXPECTED_CAT[prefix] only after the prefix is
validated. Ensure the error message names the file (f.name) and the invalid
prefix to aid debugging.
Summary
Third attacker family added to the cross-model held-out fold and v8 classifier ships as the new production default. Closes the worst v0.36 sub-cell by 12.9 pp on the same 700 Claude data_exfil entries (26.0% under v7 to 38.9% under v8), covers a third attacker family at 85.8% overall recall, and holds in-distribution at 86.6% on v035 TEST with FPR inside the prior confidence interval.
Headline numbers
What ships
src/vaara/data/adversarial_classifier_v8.joblib, 638 features (254 hand plus 384 MiniLM embedding), trained on v035 TRAIN plus v036 Mixtral and Claude TM/PE folded in. v7 retained on disk for cross-eval reproducibility. Threshold unchanged at 0.9006.RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicon AMD-backed MI300X viarocm/vllm:latest. 13 schema-invalid TM dropped, 887 valid (TM 287, PE 300, DE 300). Combined with v036 DE both legs for the new 2,277-entry v037 holdout fold.tests/adversarial/v037_split.jsoninherits v035_split verbatim, folds v036 TM/PE into train, marks v036 DE plus the full v037 leg as holdout.docs/mit_ai_risk_repository_mapping.mdplus the COMPLIANCE.md position paragraph.What changed
New files:
bench/vaara-bench-v0.37.md,bench/v037_eval_v8_holdout.json,tests/adversarial/v037_split.json,src/vaara/data/adversarial_classifier_v8.joblib, three v037 generated jsonl files, seven v037 scripts (_v037_common.py,generate_targeted_v037.py,v037_droplet_run.sh,v037_local_watcher.sh,validate_v037.py,build_v037_split.py,eval_v037_holdout.py),docs/mit_ai_risk_repository_mapping.md.Updated:
tests/adversarial/MANIFEST.sha256(v037 entries appended, 302 to 305), version bumps acrosspyproject.toml+clients/ts/package.json+src/vaara/__init__.py(0.36.0 to 0.37.0), CHANGELOG, README bench pointer, COMPLIANCE MIT Risk Repository position section.Test plan
expected)bench/v037_eval_v8_holdout.jsoncd tests/adversarial && sha256sum -c MANIFEST.sha256)Methodology notes
RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamicships withcompressed-tensorsquantization in its config. The explicit--quantization fp8argument conflicts and must be dropped. vLLM auto-detects the scheme from the model config and serves correctly.Summary by CodeRabbit
New Features
Documentation