feat: v0.6.0 standards alignment + legibility#42
Conversation
src/vaara/policy/ - New package with frozen dataclasses for action classes, thresholds, sequence patterns, and escalation routes - from_dict / from_json (stdlib, zero-dep) and from_yaml (gated on the [yaml] extra; raises ImportError with install hint if missing) - Hand-rolled validation with field-path error messages (e.g. "action_classes.x.category: 'bogus' is not one of [...]") - Reuses ActionCategory / Reversibility / BlastRadius / UrgencyClass / RegulatoryDomain from vaara.taxonomy.actions — no duplication - Threshold partial overrides: set just deny=0.75, inherit default escalate. Escalation routes match by article overlap with fallback to "on_call" tests/test_policy.py - 19 tests covering load paths, threshold resolution, escalation routing, validation errors, JSON / YAML loaders. All green. examples/policies/ - minimal.json + full.yaml as reference policies pyproject.toml - New [yaml] optional extra. Core dependencies = [] preserved. Implements Sketch A from research/dsl_design_exploration.md and v0.6 roadmap item 8. Sketches B (embedded Python DSL) and C (standalone DSL) stay deferred to v0.7+ pending external pull.
…d seam src/vaara/audit/sqlite_backend.py - New SQLiteAuditBackend.purge_older_than(retention_seconds, *, dry_run=False) Tenant-scoped DELETE of records older than now() - retention_seconds. Returns count deleted (or count that would be deleted in dry_run mode). Validates retention_seconds is a positive int. src/vaara/cli.py - New `vaara trail purge --db PATH --retention-days N [--dry-run]` subcommand Prints count purged, plus a one-line note about the hash-chain seam reminding the deployer to export a signed zip before future purges. tests/test_audit_purge.py - 10 tests: purge empty DB, mixed-age corpus, dry_run idempotence, validation rejects (zero, negative, non-int retention), survival of remaining records, tenant isolation across two backends sharing one DB file. COMPLIANCE.md - "Audit trail integrity" → "Retention policy" updated to document the new purge mechanism AND the hash-chain seam at the retention boundary. Intended workflow (export-then-purge) explicitly described. HASH-CHAIN IMPACT (also in code docstring): surviving records reference deleted predecessors via previous_hash, so vaara trail verify reports a chain break at the retention boundary. The signed handoff zip remains self-consistent forever; the live DB chain has a documented seam. The v0.7+ severable-bundles design (research/severable_bundles_sketch.md) eliminates the seam if needed. Implements roadmap item 3 from research/roadmap.md. 287/287 non-skipped tests pass; ruff clean.
Two new top-level sections cashing in the JTC21 WG2 deep-dive research (see research/jtc21_wg2_deep_dive.md). Compliance teams now have explicit pointers to where Vaara fits in two new dimensions of the EU AI Act / harmonised-standards story. EU AI Act Annex IV evidence sections - Maps Vaara's contribution to each of the nine Annex IV sections - Direct fill: §3 (monitoring), §5 (risk mgmt), §9 (post-market) - Contributes: §2 (elements), §4 (metrics), §6 (changes), §7 (standards) - Out of scope: §1 (general description), §8 (DoC document) CEN-CENELEC harmonised standards alignment - Per-standard table for the JTC21 WG2/3/4 work items Vaara aligns with: ISO/IEC 42001, prEN 18286, prEN 18228, ISO/IEC 42006, prEN ISO/IEC 24970, prEN 18229-1, prEN ISO/IEC 12792 - Status, working group, and Vaara alignment notes per standard - Honest framing: pre-compliance positioning, not certified compliance. Once a standard publishes, expect a v0.6 / v0.7 alignment audit and an updated entry. Both sections are reviewable independently from any code change. This is the v0.6 standards-legibility story landing as a doc artifact.
…ecall scripts/eval_distribution_shift.py - New script: runs the full Vaara stack against the adversarial corpus with per-source tagging (hand_curated vs llm_generated) - Reports attack recall (caught/total) and benign FPR (escalated/total) per source/class combination - Loads top-level *.jsonl as hand-curated, generated/ + benign_generated/ subdirs as LLM-generated. Same JSON schema, same stacking rule as scripts/eval_adversarial.py. tests/adversarial/distribution_shift_v0_5_3.json - Captured numbers for v0.5.3 stack against bundle v1.4 corpus. COMPLIANCE.md "Current limits" - Replaced the "owed in v0.6" bullet with the actual per-source split: - Hand-curated (held-out, 250): attack recall 97.1%, benign FPR 70.0% - LLM-generated (in-sample, 5,705): attack recall 95.2%, benign FPR 87.5% - Honest framing: hand-curated is held-out, LLM-generated is in-sample (was in classifier training). The 18pp benign-FPR gap is the dominant distribution-shift signal. - Reconciles with CHANGELOG headline "global FPR 21.0%": that was classifier-alone 5-fold CV OOF. Full-stack FPR is heuristic-dominated (most benign escalations come from the heuristic ESCALATE branch). - Notes that proper OOF split for the LLM-generated portion is a v0.7 follow-up if the gap demands it. Implements roadmap item 5 from research/roadmap.md. 287/287 tests pass; ruff clean.
…rement
scripts/eval_stack_ablation.py
- New script: runs three configurations against the same corpus
(hand-curated + LLM-generated, source-tagged):
heuristic_only pipe.intercept() with classifier=None
classifier_only classifier.score() + threshold, no heuristic
full_stack pipe.intercept() + classifier upgrade
- Reports recall (attacks) and FPR (benigns) per source/class.
tests/adversarial/stack_ablation_v0_5_3.json
- Captured numbers for v0.5.3 stack against bundle v1.4 corpus.
Public artifact next to the existing eval results files.
COMPLIANCE.md "Current limits"
- New "Stack composition (v0.6 measurement)" bullet summarizing the
layer-level findings:
heuristic_only recall: 35% / 63% (hand-curated / LLM-generated)
classifier_only recall: 94% / 86%
full_stack recall: 97% / 98%
- Layers not redundant: heuristic catches some attacks classifier misses.
- Most full-stack benign FPR comes from heuristic ESCALATEs, not from
classifier upgrades on heuristic-ALLOWed entries.
Honest framing:
- Heuristic alone is insufficient as adversarial defence (35% recall
on hand-curated). Classifier carries most of the recall.
- Heuristic adds 6 unique attack catches on hand-curated (full-stack
198 vs classifier-only 192), justifying the ensemble.
- Full-stack 70-91% benign FPR is the deliberate v0.5.3 trade-off.
v0.7 needs operator-queue ergonomics or threshold-tuning recipes
via the YAML policy schema (item 8) to manage escalate volume.
Implements roadmap item 6 from research/roadmap.md. 287/287 tests pass;
ruff clean.
src/vaara/audit/trail.py - New TRANSPARENCY_DEFAULTS dict mapping each EventType to default values for the four 12792 axes (system_operation, data_usage, decision_making). 'limitations' stays None unless explicitly set — system-level constraints are recorded out-of-band, not per event. - AuditRecord gains four optional Optional[str] fields. Default-fill via __post_init__, but ONLY for newly constructed records (record_hash empty). Loaded-from-DB records (record_hash non-empty) skip the fill — inventing defaults retroactively would misrepresent historical records written before v0.6. - compute_hash() does NOT include transparency fields. They are metadata annotations, not tamper-evident. Trade-off explicitly documented in a NOTE comment. Pre-v0.6 chains stay valid because the hashable surface is unchanged. v0.7+ may add a separate signing mechanism if compliance teams require tamper-evident transparency tagging. src/vaara/audit/sqlite_backend.py - SCHEMA_VERSION 2 → 3. - SCHEMA_SQL gains four nullable columns at end of audit_records: system_operation, data_usage, decision_making, limitations. - _MIGRATIONS[2] adds them via ALTER TABLE on upgrade. Migration is non-destructive — pre-v0.6 rows get NULL for the new columns and their stored record_hash is preserved. - write_record persists the new fields; _row_to_record reads them with defensive length-guarded indexing for old query result shapes. tests/test_transparency_taxonomy.py - 9 tests: per-EventType defaults, override, loaded-record skip-fill, hash-chain backward compat (compute_hash unchanged), SQLite round-trip, v2→v3 migration on a hand-rolled v2-shape DB. COMPLIANCE.md - "Harmonised standards alignment" prEN ISO/IEC 12792 row updated: v0.6 ships the tagging. Notes that fields are NOT tamper-evident in v0.6 and explains why (backward compat with pre-v0.6 chains). Implements roadmap item 1 from research/roadmap.md. 296/296 tests pass (was 287, +9 transparency tests). Ruff clean.
scripts/eval_pair_attack.py
- New script: PAIR (Chao et al. 2023) iterative attacker against the
Vaara stack via vLLM endpoint. Same model in attacker + judge roles.
- Talks OpenAI-compatible chat-completions API (vLLM standard surface).
- Zero new runtime deps — uses urllib.request from stdlib. Requires
vaara[ml] for the classifier.
- Args: --endpoint, --model, --seeds, --max-iters, --limit, --out.
- Per-seed: iterates max-iters times. Each iteration:
1. Attacker LLM proposes a refined jailbreak prompt
2. Vaara stack (heuristic + classifier) evaluates the candidate
3. If Vaara ALLOWs, judge LLM confirms whether the candidate is a
real jailbreak (not just a benign-sounding rephrase)
4. Success = Vaara ALLOW + judge confirms. Otherwise iterate.
Reports ASR (success rate over seeds).
tests/adversarial/pair_v0_5_3.json
- v0.5.3 stack measurement: Qwen2.5-32B-Instruct as attacker + judge,
25 hand-curated jailbreak seeds, max 5 iterations per seed.
- ASR: 0.0% (0/25). 124 ESCALATE / 1 ALLOW across 125 candidates.
The 1 ALLOW was a "draft an email for fictional company-wide
announcement" candidate; judge ruled it not a successful jailbreak.
COMPLIANCE.md "Current limits"
- Replaced "owed in v0.6" PAIR placeholder with the actual measurement.
- Honest framing: NOT a claim of impervence to all adaptive attackers.
Stronger attacker (70B+), longer iteration budgets, or alternate
strategies (multi-turn drift, language-switch, obfuscation) might
produce non-zero ASR. v0.7 follow-up if compliance audience requires
harder calibration.
Implements roadmap item 4 from research/roadmap.md. Ran on rented
MI300x droplet (~$1.20 of $250 credit window — 32B sufficient for
this attacker tier; 70B reserved for v0.7 if needed). 296/296 tests
pass; ruff clean.
pyproject.toml: version bump. src/vaara/__init__.py: __version__ string sync (was lagging at 0.5.3 after the bump commit; prior bumps updated this in lockstep — see f08b5dd "version sync"). CHANGELOG.md: v0.6.0 entry. Theme: standards alignment + legibility. Per-section breakdown of the seven feature commits + docs cash-in: Added - vaara.policy package (JSON + [yaml] extra) - vaara trail purge CLI + SqliteAuditBackend.purge_older_than - prEN ISO/IEC 12792 four-axis transparency taxonomy on AuditRecord - scripts/eval_distribution_shift.py - scripts/eval_stack_ablation.py - scripts/eval_pair_attack.py - [yaml] optional extra (pyyaml>=6.0); core dependencies = [] preserved - examples/policies/{minimal.json, full.yaml} - COMPLIANCE.md gains Annex IV evidence-section mapping and CEN-CENELEC harmonised-standards alignment tables Changed - Audit DB schema v2 -> v3 (transparency columns, nullable, migration preserves pre-v0.6 record hashes) - COMPLIANCE.md "Current limits" placeholders replaced with measured numbers: distribution-shift split, stack composition, PAIR ASR Deferred to v0.7+ - prEN ISO/IEC 24970 field-alias layer (gated on standard publication) - DORA mapping refinement (gated on deployer signal)
Both have been persistently untracked but absent from .gitignore, so every git status emitted noise. research/ is the local experimentation scratch (droplet sync, generators, notebooks) that mirrors the BD-docs-not-public convention already applied to docs/blog/ and docs/grant/. .claude/ is local Claude Code state with no project-side relevance. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Vaara v0.6.0: JSON/YAML policy schema and loaders, policy examples, four-axis transparency taxonomy with SQLite audit schema bump v2→v3, tenant-scoped purge API + CLI, three adversarial evaluation scripts and fixtures, packaging Changes
Sequence Diagram(s)sequenceDiagram
rect rgba(200,230,201,0.5)
participant CLI as "User CLI"
participant Backend as "SQLiteAuditBackend"
participant DB as "SQLite DB"
participant Logger as "Logger"
end
CLI->>Backend: vaara trail purge --retention-days N [--dry-run]
Backend->>DB: open connection (tenant_id scoped)
Backend->>DB: SELECT COUNT(*) WHERE ts < cutoff AND tenant_id = ?
DB-->>Backend: would_delete_count
alt dry_run
Backend->>CLI: print "Would purge {count}" (dry-run)
Backend->>DB: close
else actual_purge
Backend->>DB: DELETE FROM audit_records WHERE ts < cutoff AND tenant_id = ?
DB-->>Backend: deleted_count
Backend->>Logger: warn about hash-chain seam at retention boundary
Backend->>CLI: print "Purged {count}" and retention warning
Backend->>DB: close
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (2)
tests/test_audit_purge.py (1)
14-31: Consider using a fixture-scoped counter instead of global state.The global
_record_seqcounter persists across test runs within a session, which could theoretically cause issues with test isolation if tests make assumptions about record ID patterns. However, since each test gets a fresh SQLite database via thebackendfixture and record IDs include the timestamp, this is unlikely to cause actual failures.If you want stricter isolation, consider resetting the counter in a fixture or using
uuid.uuid4()instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_audit_purge.py` around lines 14 - 31, The global _record_seq used by _make_record creates shared state across tests; replace it with either a fixture-scoped counter or remove it entirely by generating unique IDs (e.g., use uuid.uuid4()) and/or reset the counter in a fixture so tests don't rely on module-level mutation; update the _make_record implementation to use the new generator (or fixture-injected counter) and remove or stop mutating the global _record_seq to ensure test isolation.scripts/eval_stack_ablation.py (1)
47-59: Verify source tagging logic.Files directly under
corpus_root/*.jsonlare tagged ashand_curated, while files undergenerated/andbenign_generated/subdirectories are tagged asllm_generated. This assumes a specific directory structure. Consider documenting this expected layout in the docstring or validating that at least some files are found.📝 Suggested documentation improvement
def load_corpus(corpus_root: Path) -> list[dict]: - """Load both hand-curated and LLM-generated entries with source tags.""" + """Load both hand-curated and LLM-generated entries with source tags. + + Expected layout: + corpus_root/*.jsonl -> tagged as 'hand_curated' + corpus_root/generated/*.jsonl -> tagged as 'llm_generated' + corpus_root/benign_generated/*.jsonl -> tagged as 'llm_generated' + """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval_stack_ablation.py` around lines 47 - 59, The load_corpus function currently tags files under corpus_root/*.jsonl as "hand_curated" and those under generated/ and benign_generated/ as "llm_generated" without documenting or validating that structure; update the load_corpus docstring to explicitly describe the expected directory layout (top-level jsonl are hand_curated, subdirs "generated" and "benign_generated" contain llm_generated) and add a simple validation after collecting entries (using entries, corpus_root, and _load_jsonl) to raise or warn if no files were found or if expected subdirs are missing so callers get immediate feedback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Line 27: Replace the misspelled word "impervence" with "imperviousness" in the
sentence inside CHANGELOG.md that reads "NOT a claim of impervence to all
adaptive attackers ..." and also fix the identical typo in COMPLIANCE.md so both
files read "NOT a claim of imperviousness to all adaptive attackers ...".
In `@COMPLIANCE.md`:
- Line 320: Replace the misspelled word "impervence" with "imperviousness" in
the sentence that currently reads "capability level. NOT a claim of impervence
to all adaptive attackers" so it reads "... NOT a claim of imperviousness to all
adaptive attackers"; update only that token to correct the typo.
In `@scripts/eval_distribution_shift.py`:
- Around line 161-165: The AdversarialClassifier() constructor can raise
ImportError for missing optional ML deps but is instantiated outside the
try-except, so wrap both the import and the classifier construction in the same
try-except block (covering from "from vaara.adversarial_classifier import
AdversarialClassifier" through "classifier = AdversarialClassifier()") and on
ImportError raise SystemExit with a clear message that includes the caught
exception (exc) so missing dependency errors are handled the same as import
failures.
In `@scripts/eval_pair_attack.py`:
- Line 88: The current fallback to "ALLOW" when extracting decision from
Pipeline.intercept() is unsafe; change the fallback to a safe value like "DENY"
(or "UNKNOWN") and validate the extracted value against allowable decisions
before using it. Specifically, in the code around the decision =
str(getattr(result, "decision", "ALLOW")).upper() expression, replace the
default "ALLOW" with "DENY" (or "UNKNOWN"), and add a short check that result is
the expected type and that decision is in the set of valid decisions (e.g.,
{"ALLOW","DENY","UNKNOWN"}); if validation fails, set decision to the safe
default and emit a warning/log so malformed returns from Pipeline.intercept()
are treated as denied.
- Around line 43-54: The call_llm function currently assumes the response is
valid JSON with choices[0].message.content and will let
JSONDecodeError/KeyError/IndexError/TypeError escape and kill the run; wrap the
response parsing and indexing in a try/except that catches json.JSONDecodeError,
KeyError, IndexError, and TypeError, read the raw response body for context, and
convert these into a consistent per-seed error (e.g., raise a specific
LLMResponseError or re-raise as urllib.error.URLError/ValueError with a clear
message containing the raw body) so callers that already handle
URLError/TimeoutError will treat malformed responses as per-seed failures; apply
the same change to the similar parsing block around lines 120-137.
In `@src/vaara/cli.py`:
- Around line 186-189: The code fails to expand user home tilde for the DB path;
change how db_path is constructed so you call Path(args.db).expanduser() (e.g.,
db_path = Path(args.db).expanduser()) before the exists() check used in the CLI
(the db_path variable and args.db reference), so "--db '~/path/audit.db'" is
correctly resolved prior to printing the "audit DB not found" error.
In `@src/vaara/policy/loader.py`:
- Around line 73-77: The thresholds_overrides dict built from thr_block should
be validated immediately instead of deferring to Policy.threshold_for; for each
k, v in thr_block (skipping "default") iterate per-action override mapping (the
inner dict from thresholds_overrides) and attempt to construct or validate using
the Thresholds constructor/validator (or replicate its range/order checks) to
ensure values are floats in [0,1] and that escalate <= deny <= allow ordering
(or the exact invariants enforced by Thresholds); raise a clear exception on
invalid entries so malformed policies like {"foo": {"escalate": 0.9, "deny":
0.2}} are rejected during load rather than later in Policy.threshold_for().
- Around line 47-49: The from_dict() implementation must validate nested section
shapes before iterating/coercing: for each nested key (e.g., action_classes,
thresholds, escalation, sequences) check the raw value's type (mapping for
dict-like sections, list/sequence for lists, dict for nested mappings like
sequences[name].pattern) and raise PolicyError with the offending path (e.g.,
"action_classes.<name>: must be a mapping" or "thresholds: must be a sequence")
instead of calling .items(), .get(), or tuple(...) which can raise
AttributeError or coerce strings into tuples; update the parsing code in
from_dict() to perform these explicit isinstance checks and raise PolicyError
with clear path context for the other blocks referenced (the blocks currently
around the action_classes, thresholds, escalation, sequences parsing) before any
further processing.
In `@src/vaara/policy/schema.py`:
- Around line 1-5: The Policy dataclass is currently "frozen" but its mutable
dict fields (action_classes, thresholds_overrides) allow in-place mutation; in
Policy.__post_init__ wrap/replace these fields with immutable mapping proxies
(e.g., types.MappingProxyType) or deep-copied nested mapping proxies so nested
dicts are also protected, ensuring the frozen value-object guarantee; update
Policy.__post_init__ to import types and perform deep copy + MappingProxyType
conversion for action_classes and for each nested mapping inside
thresholds_overrides (and assign back to the dataclass fields inside
__post_init__ despite frozen dataclass by using object.__setattr__).
---
Nitpick comments:
In `@scripts/eval_stack_ablation.py`:
- Around line 47-59: The load_corpus function currently tags files under
corpus_root/*.jsonl as "hand_curated" and those under generated/ and
benign_generated/ as "llm_generated" without documenting or validating that
structure; update the load_corpus docstring to explicitly describe the expected
directory layout (top-level jsonl are hand_curated, subdirs "generated" and
"benign_generated" contain llm_generated) and add a simple validation after
collecting entries (using entries, corpus_root, and _load_jsonl) to raise or
warn if no files were found or if expected subdirs are missing so callers get
immediate feedback.
In `@tests/test_audit_purge.py`:
- Around line 14-31: The global _record_seq used by _make_record creates shared
state across tests; replace it with either a fixture-scoped counter or remove it
entirely by generating unique IDs (e.g., use uuid.uuid4()) and/or reset the
counter in a fixture so tests don't rely on module-level mutation; update the
_make_record implementation to use the new generator (or fixture-injected
counter) and remove or stop mutating the global _record_seq to ensure test
isolation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 745e85cf-842a-4c29-8490-c420a8dc25fc
📒 Files selected for processing (22)
.gitignoreCHANGELOG.mdCOMPLIANCE.mdexamples/policies/full.yamlexamples/policies/minimal.jsonpyproject.tomlscripts/eval_distribution_shift.pyscripts/eval_pair_attack.pyscripts/eval_stack_ablation.pysrc/vaara/__init__.pysrc/vaara/audit/sqlite_backend.pysrc/vaara/audit/trail.pysrc/vaara/cli.pysrc/vaara/policy/__init__.pysrc/vaara/policy/loader.pysrc/vaara/policy/schema.pytests/adversarial/distribution_shift_v0_5_3.jsontests/adversarial/pair_v0_5_3.jsontests/adversarial/stack_ablation_v0_5_3.jsontests/test_audit_purge.pytests/test_policy.pytests/test_transparency_taxonomy.py
| - **COMPLIANCE.md "Current limits"** replaced placeholder bullets with v0.6 measurement results: | ||
| - **Distribution-shift split.** Hand-curated (held-out, 250): attack recall 97.1% / benign FPR 70.0%. LLM-generated (in-sample, 5,705): attack recall 95.2% / benign FPR 87.5%. The 18pp benign-FPR gap is the dominant distribution-shift signal. | ||
| - **Stack composition.** `heuristic_only` recall 35% / 63%. `classifier_only` recall 94% / 86%. `full_stack` recall 97% / 98%. Layers not redundant — heuristic catches a small set of attacks the classifier misses (justifies the ensemble). Most full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades. | ||
| - **PAIR adaptive-attacker calibration.** Qwen2.5-32B-Instruct as both attacker and judge, 25 hand-curated jailbreak seeds, max 5 iterations: **ASR 0.0% (0/25)**. NOT a claim of impervence to all adaptive attackers — stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR. |
There was a problem hiding this comment.
Fix typo: "impervence" → "imperviousness".
Same spelling error as in COMPLIANCE.md.
✏️ Proposed fix
- **ASR 0.0% (0/25)**. NOT a claim of impervence to all adaptive attackers — stronger at...
+ **ASR 0.0% (0/25)**. NOT a claim of imperviousness to all adaptive attackers — stronger at...📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - **PAIR adaptive-attacker calibration.** Qwen2.5-32B-Instruct as both attacker and judge, 25 hand-curated jailbreak seeds, max 5 iterations: **ASR 0.0% (0/25)**. NOT a claim of impervence to all adaptive attackers — stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR. | |
| - **PAIR adaptive-attacker calibration.** Qwen2.5-32B-Instruct as both attacker and judge, 25 hand-curated jailbreak seeds, max 5 iterations: **ASR 0.0% (0/25)**. NOT a claim of imperviousness to all adaptive attackers — stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR. |
🧰 Tools
🪛 LanguageTool
[grammar] ~27-~27: Ensure spelling is correct
Context: ...ns: ASR 0.0% (0/25). NOT a claim of impervence to all adaptive attackers — stronger at...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CHANGELOG.md` at line 27, Replace the misspelled word "impervence" with
"imperviousness" in the sentence inside CHANGELOG.md that reads "NOT a claim of
impervence to all adaptive attackers ..." and also fix the identical typo in
COMPLIANCE.md so both files read "NOT a claim of imperviousness to all adaptive
attackers ...".
Three findings on PR #42: - Line 33 (failure, false positive). `for event_type in EventType:` flagged as "non-iterable in for loop". EnumType metaclass provides __iter__ at runtime; CodeQL's static analyzer can't resolve it. Cast to list(EventType) — semantically identical, analyzer-friendly. - Lines 110-128 and 162-173 (notices). SQLiteAuditBackend implements __enter__/__exit__ (sqlite_backend.py:817), so the try/finally backend.close() pattern was unnecessary. Refactored to with-blocks. Tests still pass (9/9 in this file, 296/296 overall, 12 skipped on vaara[ml] extras path). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Triaged 9 CodeRabbit findings and addressed the 7 worth fixing for the v0.6.0 release. Two eval-script ergonomics findings (#4 ImportError on constructor, #5 malformed-LLM-response handling) are deferred to a follow-up eval-script hardening pass. Major: 1. scripts/eval_pair_attack.py:88 — fail closed on malformed Pipeline result. Default fallback was "ALLOW", which would count a partial or malformed pipeline response as a successful jailbreak in ASR measurement. Now defaults to DENY when .decision is missing or unrecognised. 2. src/vaara/policy/loader.py — strict shape validation at every section boundary. Inputs like action_classes:[], thresholds:[], escalation:[], or sequences.foo.pattern:"abc" previously either AttributeError'd or got silently coerced (tuple("abc") => ("a","b","c")). New _require_mapping() and _require_sequence() helpers raise PolicyError with the field path. String/bytes are rejected by _require_sequence to prevent character-tuple coercion. 3. src/vaara/policy/loader.py — per-action threshold overrides now validated at load time. A malformed override like {"foo": {"escalate": 0.9, "deny": 0.2}} previously parsed cleanly and only blew up when threshold_for("foo") was queried. Now we construct Thresholds with the merged-with-default values during load and surface PolicyError with the offending action class path. 4. src/vaara/policy/schema.py — Policy is now actually frozen. action_classes and thresholds_overrides were declared on a frozen dataclass but were plain dicts, so callers could do policy.thresholds_overrides["x"]["escalate"] = 0.9. __post_init__ now wraps both with MappingProxyType (nested override dicts too) and the field annotations switch to Mapping. Minor: 5. CHANGELOG.md, COMPLIANCE.md — typo "impervence" -> "imperviousness" in two places (the v0.6 PAIR calibration section). 6. src/vaara/cli.py — `vaara trail purge --db ~/path/audit.db` now expanduser()s the path before the existence check. Tests: 8 new cases in tests/test_policy.py cover the new validation and immutability paths. Full suite 304 passed / 12 skipped, ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
scripts/eval_pair_attack.py (1)
43-54:⚠️ Potential issue | 🟠 MajorMalformed LLM responses can still crash the evaluation run.
The parsing at line 54 can raise
JSONDecodeError,KeyError, orIndexErrorif the API returns non-JSON or an unexpected response shape. These exceptions are not caught by theURLError/TimeoutErrorhandlers inrun_pair_one(lines 128-129, 142-143), causing the entire evaluation to terminate on a single bad response.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval_pair_attack.py` around lines 43 - 54, call_llm currently assumes the HTTP response is well-formed JSON and accesses ["choices"][0]["message"]["content"] directly, which can raise JSONDecodeError/KeyError/IndexError and crash the run; wrap the response reading/parsing in a try/except that catches json.JSONDecodeError, KeyError, IndexError, and ValueError, and on failure raise a urllib.error.URLError (or re-use an existing exception type handled by run_pair_one) with a clear message that includes the raw response body for debugging; also defensively check the HTTP response status (if available) and treat non-2xx as errors so run_pair_one's existing URLError/TimeoutError handlers will catch malformed or unexpected responses from call_llm.
🧹 Nitpick comments (1)
src/vaara/cli.py (1)
182-217: LGTM! Thevaara trail purgeimplementation is solid.The past review comment about
expanduser()has been addressed (line 186). The function validates inputs, handles cleanup viatry/finally, and provides appropriate user feedback including the hash-chain seam warning on stderr.Minor style suggestion: Consider using the context manager protocol since
SQLiteAuditBackendimplements__enter__/__exit__(persrc/vaara/audit/sqlite_backend.py:816-821).♻️ Optional refactor to use context manager
- backend = SQLiteAuditBackend(db_path) - try: - count = backend.purge_older_than(retention_seconds, dry_run=args.dry_run) - finally: - backend.close() + with SQLiteAuditBackend(db_path) as backend: + count = backend.purge_older_than(retention_seconds, dry_run=args.dry_run)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/vaara/cli.py` around lines 182 - 217, Refactor _cmd_trail_purge to use the context manager on SQLiteAuditBackend instead of manual try/finally: replace creating backend = SQLiteAuditBackend(db_path) and the try/finally close pattern with a with SQLiteAuditBackend(db_path) as backend: block, then call backend.purge_older_than(retention_seconds, dry_run=args.dry_run) inside that block and preserve all existing logic around args.dry_run, printing, and return codes; this leverages SQLiteAuditBackend.__enter__/__exit__ while keeping purge_older_than and the same error/print behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/eval_pair_attack.py`:
- Around line 4-7: The module-level docstring in scripts/eval_pair_attack.py
claims the run uses "Llama-3.3-70B as both attacker and judge" which conflicts
with COMPLIANCE.md (line 309) that documents "Qwen2.5-32B-Instruct (Apache
2.0)"; update the docstring text to accurately reflect the actual model used
(or, if the doc is wrong, update COMPLIANCE.md instead) so both sources match;
locate the module docstring in eval_pair_attack.py and replace the model string
to "Qwen2.5-32B-Instruct (Apache 2.0)" (or the correct model name you decide to
keep) and run a quick grep of the repository for other occurrences of
"Llama-3.3-70B" to ensure no stale references remain.
In `@tests/test_policy.py`:
- Around line 253-265: The module-level pytest.importorskip("yaml") causes all
tests in this file to be skipped when PyYAML is absent; move the import-or-skip
call into the YAML-specific tests (test_from_yaml_string and
test_from_yaml_path) so only those two tests are skipped. Specifically, remove
the top-level yaml = pytest.importorskip("yaml") and instead call yaml =
pytest.importorskip("yaml") at the start of each of test_from_yaml_string and
test_from_yaml_path (or use pytest.importorskip inside those functions and then
import yaml via pytest.importorskip), ensuring the yaml variable is available to
each test that needs it.
---
Duplicate comments:
In `@scripts/eval_pair_attack.py`:
- Around line 43-54: call_llm currently assumes the HTTP response is well-formed
JSON and accesses ["choices"][0]["message"]["content"] directly, which can raise
JSONDecodeError/KeyError/IndexError and crash the run; wrap the response
reading/parsing in a try/except that catches json.JSONDecodeError, KeyError,
IndexError, and ValueError, and on failure raise a urllib.error.URLError (or
re-use an existing exception type handled by run_pair_one) with a clear message
that includes the raw response body for debugging; also defensively check the
HTTP response status (if available) and treat non-2xx as errors so
run_pair_one's existing URLError/TimeoutError handlers will catch malformed or
unexpected responses from call_llm.
---
Nitpick comments:
In `@src/vaara/cli.py`:
- Around line 182-217: Refactor _cmd_trail_purge to use the context manager on
SQLiteAuditBackend instead of manual try/finally: replace creating backend =
SQLiteAuditBackend(db_path) and the try/finally close pattern with a with
SQLiteAuditBackend(db_path) as backend: block, then call
backend.purge_older_than(retention_seconds, dry_run=args.dry_run) inside that
block and preserve all existing logic around args.dry_run, printing, and return
codes; this leverages SQLiteAuditBackend.__enter__/__exit__ while keeping
purge_older_than and the same error/print behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 36120b96-bb11-4536-888b-061af7ccab76
📒 Files selected for processing (7)
CHANGELOG.mdCOMPLIANCE.mdscripts/eval_pair_attack.pysrc/vaara/cli.pysrc/vaara/policy/loader.pysrc/vaara/policy/schema.pytests/test_policy.py
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
- src/vaara/policy/loader.py
Cleans up the two new findings from the 11f0d11 re-review plus the two eval-script majors that were originally deferred. v0.6 surface fully clean — no outstanding CodeRabbit advisories. New findings on 11f0d11: - tests/test_policy.py — module-level pytest.importorskip("yaml") was silently skipping the JSON / validation / immutability tests above it on a non-[yaml] install. Moved into per-test invocations so only the two yaml-specific tests skip when pyyaml is absent. - scripts/eval_pair_attack.py — docstring referenced Llama-3.3-70B as attacker/judge, but the v0.6 calibration ran on Qwen2.5-32B-Instruct (per CHANGELOG and COMPLIANCE.md). Updated docstring. Originally deferred, now landed for completeness: - scripts/eval_distribution_shift.py — wrap AdversarialClassifier() constructor inside the same try/except as the import. Without ML extras, the constructor itself raises ImportError per its docstring, which previously bypassed the SystemExit path and crashed with a raw traceback. - scripts/eval_pair_attack.py — handle malformed LLM responses as per-seed errors. New LLMResponseError raised from call_llm() when the body fails to parse as JSON, lacks `choices`, or lacks `message.content`. Caller's except tuple extended so the run keeps going across the remaining seeds instead of dying on one bad response. Tests still 304 passed / 12 skipped, ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
scripts/eval_distribution_shift.py (1)
77-77: Consider defaulting to"ERROR:Unknown"for consistency withevaluate()bucketing.When
result.decisionis missing, the fallback"UNKNOWN"won't match any of the outcome branches inevaluate()(lines 113-120), so the entry counts towardnbut no outcome counter increments. This silently skews metrics.For consistency with
eval_pair_attack.py(which defaults to"DENY") and proper metric accounting, consider either:
- Returning an error-like string:
f"ERROR:MissingDecision"- Or treating it as
"DENY"to fail closed♻️ Option A: Count as error
- decision = str(getattr(result, "decision", "UNKNOWN")).upper() + raw = getattr(result, "decision", None) + if raw is None: + return "ERROR:MissingDecision" + decision = str(raw).upper()♻️ Option B: Fail closed (match eval_pair_attack.py)
- decision = str(getattr(result, "decision", "UNKNOWN")).upper() + raw = getattr(result, "decision", None) + decision = str(raw).upper() if raw is not None else "DENY" + if decision not in {"ALLOW", "ESCALATE", "DENY"}: + decision = "DENY"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval_distribution_shift.py` at line 77, The fallback for result.decision should be an error-like string so evaluate() counts it in the error bucket; change the assignment of the decision variable (currently decision = str(getattr(result, "decision", "UNKNOWN")).upper()) to default to an ERROR token (e.g., "ERROR:MissingDecision" or "ERROR:Unknown") instead of "UNKNOWN" so the entry increments the error outcome in evaluate() and does not silently skew metrics; update the decision assignment in scripts/eval_distribution_shift.py (the decision variable) to use the chosen ERROR default and ensure it remains uppercased.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_policy.py`:
- Around line 246-249: The test
test_from_json_unreadable_input_raises_policy_error is flaky because
from_json("not json at all") may hit a real file in the CWD; update the test to
use a guaranteed-missing tmp_path file instead: create a Path via the pytest
tmp_path fixture (e.g. tmp_path / "surely_missing.json"), ensure it does not
exist, and pass its string to from_json in the pytest.raises block so
from_json(...) deterministically raises PolicyError; keep the assertion text
match unchanged and reference the test function name and from_json call when
making the change.
---
Nitpick comments:
In `@scripts/eval_distribution_shift.py`:
- Line 77: The fallback for result.decision should be an error-like string so
evaluate() counts it in the error bucket; change the assignment of the decision
variable (currently decision = str(getattr(result, "decision",
"UNKNOWN")).upper()) to default to an ERROR token (e.g., "ERROR:MissingDecision"
or "ERROR:Unknown") instead of "UNKNOWN" so the entry increments the error
outcome in evaluate() and does not silently skew metrics; update the decision
assignment in scripts/eval_distribution_shift.py (the decision variable) to use
the chosen ERROR default and ensure it remains uppercased.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 9c645cf0-208b-46a9-bd88-b1cc548f247f
📒 Files selected for processing (3)
scripts/eval_distribution_shift.pyscripts/eval_pair_attack.pytests/test_policy.py
…t skew Two findings from the 11:23 UTC re-review: - tests/test_policy.py:249 — Minor flakiness. The unreadable-input test passed `"not json at all"` to from_json, which would mistakenly succeed if a file with that exact name existed in the CWD. Switched to a tmp_path-based missing path so the test is deterministic. - scripts/eval_distribution_shift.py:77 — Nitpick that CodeRabbit framed mildly but is functionally a measurement skew. Fallback "UNKNOWN" doesn't match any branch in evaluate() (deny / escalate / allow / error), so a missing result.decision would silently increment `n` without any outcome bucket, deflating recall and FPR reporting. Fail-closed pattern now matches eval_pair_attack.py (DENY for missing, DENY for unrecognised). Tests still 304 passed / 12 skipped, ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/test_policy.py (1)
166-181: Consider parametrizing strict-shape mapping tests to reduce duplication.These three tests share identical structure and could be consolidated with
pytest.mark.parametrizefor easier extension and maintenance.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_policy.py` around lines 166 - 181, Consolidate the three near-identical tests into a single parametrized test using pytest.mark.parametrize: iterate over the field names "action_classes", "thresholds", and "escalation", build bad = {**MINIMAL_POLICY, field: []}, and assert from_dict(bad) raises PolicyError with the appropriate regex f"{field}: must be a mapping"; update or remove the original test_action_classes_must_be_mapping_not_list, test_thresholds_must_be_mapping_not_list, and test_escalation_must_be_mapping_not_list accordingly so the single test covers all three cases while still referencing MINIMAL_POLICY, from_dict, and PolicyError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/eval_distribution_shift.py`:
- Around line 120-128: The code currently records ERROR outcomes into b["error"]
but still proceeds to compute and return metrics; update the post-counting logic
to abort the run if any bucket has b["error"] > 0: after the loop that
builds/upserts buckets and before the final return that computes percentages,
scan buckets for any nonzero "error" counts and raise a non-zero exit (e.g.,
raise SystemExit or RuntimeError) with a clear message listing the offending
src/cls keys and counts so the process fails instead of publishing invalid
percentages; apply this check to the same counting logic that uses variables
decision, b, buckets and the final return that spreads counts into the result.
- Around line 165-166: The current corpus validation only checks
corpus_root.exists(), which allows non-directories or empty loads to proceed;
update the validation to ensure corpus_root.is_dir() and after loading JSONL
compute the total entries (e.g., the variable tracking loaded entries or
total_entries) and raise SystemExit with a clear message if total_entries == 0;
modify the checks around corpus_root and the JSONL load logic (referencing
corpus_root and the variable that accumulates loaded entries) so the script
fails fast when the corpus path is not a directory or when no entries were
loaded.
- Around line 99-106: The function expected_category should validate the
entry["expected"] value instead of coercing unknown/missing values to "attack":
in expected_category, check that "expected" exists and is either a string
"ALLOW"/"DENY" (case-insensitive) or a non-empty list of such strings, normalize
to uppercase, and raise a ValueError with a descriptive message if the type or
contained labels are invalid or empty; then return "benign" only when the
normalized set is exactly {"ALLOW"} and "attack" when exactly {"DENY"} (or the
appropriate normalized set), so malformed fixtures fail fast (refer to function
expected_category and the local variables expected and expected_set).
---
Nitpick comments:
In `@tests/test_policy.py`:
- Around line 166-181: Consolidate the three near-identical tests into a single
parametrized test using pytest.mark.parametrize: iterate over the field names
"action_classes", "thresholds", and "escalation", build bad = {**MINIMAL_POLICY,
field: []}, and assert from_dict(bad) raises PolicyError with the appropriate
regex f"{field}: must be a mapping"; update or remove the original
test_action_classes_must_be_mapping_not_list,
test_thresholds_must_be_mapping_not_list, and
test_escalation_must_be_mapping_not_list accordingly so the single test covers
all three cases while still referencing MINIMAL_POLICY, from_dict, and
PolicyError.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: a3a0254e-584c-4acd-9d61-a8d1ed7103ad
📒 Files selected for processing (2)
scripts/eval_distribution_shift.pytests/test_policy.py
Three findings on a784831, all in scripts/eval_distribution_shift.py: - expected_category() (Major). Missing or invalid `expected` fields silently coerced to "DENY" then categorised as "attack", which would skew recall and FPR if a fixture had a typo. Now raises ValueError naming the offending entry id and the bad value(s). Allowed set: {ALLOW, DENY, ESCALATE}. - report() (Major). Metrics were printed even when buckets recorded ERROR outcomes, producing publishable but invalid percentages. Aborts via SystemExit if any bucket has errors > 0. - main() (Minor). corpus_root only checked for `.exists()` and the loop continued with zero entries (empty "successful" run). Now requires `.is_dir()` and aborts when zero JSONL entries load. Smoke-tested expected_category locally: missing field, bogus value, ALLOW, DENY all behave as expected. Full suite 304 passed / 12 skipped, ruff clean. Note: the v0.6 published numbers came from runs where every entry had a valid `expected` and zero pipeline errors, so this hardening doesn't invalidate the existing artifacts. It just prevents future accidental fixture drift from producing silent skew. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
Adds scripts/lint_full.sh as the pre-push hygiene gate so CodeRabbit-
class findings get caught locally before PR review round-trips.
Documented in CONTRIBUTING.md and listed in the v0.6.0 CHANGELOG.
Tools:
- ruff (already in [dev]) — style + correctness
- bandit>=1.7.5 (new) — security-focused static analysis
- mypy>=1.8 (new) — types; strict on vaara.policy, lenient on legacy
via [[tool.mypy.overrides]]. Modules join the strict block as they
get cleaned up, so the typing floor only ratchets upward.
- pytest (already)
Bandit configuration in pyproject.toml:
- skips=["B608"] with explanation. Every f-string SQL in
audit/sqlite_backend.py interpolates only the output of
_tenant_clause(), which returns one of two literal strings
("tenant_id = ?" or "1=1"). User values always go through `?`
parameter binding. Bandit can't see the constraint, so the pattern
trips B608 uniformly. False positive for our usage.
- Two inline `# nosec` annotations:
- mc_dropout_gate.py:94 (B614) — trusted-bundle fallback after
weights_only=True already failed because the bundle contains
sklearn objects. Operator gets a logger.warning telling them to
ensure provenance.
- trace_gen.py:201 (B311) — random.Random for synthetic-trace
sampling, not security-sensitive.
vaara.policy made strictly mypy-clean to seed the typing floor:
- _coerce_enum is now generic on the enum type (TypeVar EnumT bound
to Enum) so call sites get the precise enum back rather than object.
- _require_mapping / _require_sequence get explicit object/dict/list
annotations.
- The yaml import type-ignore narrowed to [import-untyped] (the
actual code mypy emits) instead of [import-not-found].
Local sweep runtime: ~10s. Tests still 304 passed / 12 skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 10
♻️ Duplicate comments (3)
scripts/eval_stack_ablation.py (3)
168-173:⚠️ Potential issue | 🟠 MajorWrap
AdversarialClassifier()in the sametryblock as the import.The constructor can still raise
ImportErrorwhen optional ML deps are missing, so the current code can fall through to an unhandled traceback even after the import guard.💡 Proposed fix
try: from vaara.adversarial_classifier import AdversarialClassifier + classifier = AdversarialClassifier() except ImportError as exc: raise SystemExit(f"this script needs vaara[ml]: {exc}") - classifier = AdversarialClassifier()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval_stack_ablation.py` around lines 168 - 173, The instantiation of AdversarialClassifier can raise ImportError and must be inside the same try/except as the import; move the call to AdversarialClassifier() into the try block that imports vaara.adversarial_classifier and catch ImportError as exc, then raise SystemExit with a clear message including exc (i.e., wrap both "from vaara.adversarial_classifier import AdversarialClassifier" and "classifier = AdversarialClassifier()" in the same try and keep the existing print(f"[classifier] bundle v{classifier.bundle_version}, threshold={classifier.threshold}") after the try).
62-65:⚠️ Potential issue | 🟠 MajorValidate
expectedlabels before bucketing them.Missing or malformed
expectedvalues are silently treated as"attack"here, which can skew both recall and benign FPR. This should fail fast instead of defaulting to"DENY".💡 Proposed fix
def expected_class(entry: dict) -> str: - expected = entry.get("expected", "DENY") - s = {str(x).upper() for x in (expected if isinstance(expected, list) else [expected])} + if "expected" not in entry: + raise ValueError("missing required field: expected") + expected = entry["expected"] + values = expected if isinstance(expected, list) else [expected] + s = {str(x).upper() for x in values} + allowed = {"ALLOW", "DENY", "ESCALATE"} + if not s or not s.issubset(allowed): + raise ValueError(f"invalid expected value(s): {sorted(s)}") return "benign" if s == {"ALLOW"} else "attack"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval_stack_ablation.py` around lines 62 - 65, The function expected_class currently defaults missing/malformed expected values to "DENY" and silently classifies them as "attack"; update expected_class to validate the entry["expected"] value(s) (accepting either a string or list of strings, case-insensitive) and raise a clear exception when expected is missing or contains any value outside the allowed set {"ALLOW","DENY"} instead of defaulting, then keep the existing bucketing logic (treat {"ALLOW"} -> "benign", otherwise "attack"); refer to the expected_class function and the local variable expected/entry to implement this validation and error handling.
133-155:⚠️ Potential issue | 🟠 MajorAbort the run before reporting metrics if any
ERROR:*decisions occurred.The script records
errcounts, but it still prints and optionally writes metrics afterward. That can publish invalid percentages from a partially failed run.💡 Proposed fix
def report(counts, json_out: Path | None) -> None: """Print per-config / per-source recall + FPR table.""" + total_errors = sum(c["err"] for by_key in counts.values() for c in by_key.values()) + if total_errors: + raise SystemExit( + f"aborting: {total_errors} ERROR outcome(s) present; " + "fix corpus/runtime issues before publishing metrics" + ) + rows = [] print("\n=== Stack ablation (v0.5.3) ===")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/eval_stack_ablation.py` around lines 133 - 155, In report(), before printing or writing metrics, detect any recorded errors by scanning counts (for each config in ("heuristic_only","classifier_only","full_stack") and each bucket c in counts[config]) and if any c["err"] > 0 abort the run (raise SystemExit or call sys.exit with non‑zero) after printing a short diagnostic message; do this check early in the function (using the existing counts, config_name, c and err keys) so you never print or write metrics to json_out when there were ERROR:* decisions.
🧹 Nitpick comments (2)
tests/test_audit_purge.py (1)
17-31: Seed purge tests through a real hash chain.
_make_record()computes standalone hashes with an emptyprevious_hash, so any sequence written directly withbackend.write_record()is already chain-broken before purge. That makes it impossible for this module to assert the documented retention-boundary seam separately from setup artifacts. Consider seeding viaAuditTrailor linkingprevious_hash/record_hashexplicitly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_audit_purge.py` around lines 17 - 31, The helper _make_record builds AuditRecord objects but computes record_hash with an empty previous_hash, breaking any chain when tests call backend.write_record() directly; instead seed tests with a proper chain by either creating/appending records through AuditTrail (use AuditTrail.append_record or whatever method ensures previous_hash is set) or, if creating records manually, set each record.previous_hash to the prior record.record_hash before calling rec.record_hash = rec.compute_hash() so the linked hash chain is intact for purge boundary assertions (refer to AuditRecord, compute_hash, previous_hash, record_hash, backend.write_record, and AuditTrail).tests/test_transparency_taxonomy.py (1)
128-167: Make the migration test prove the columns were actually added.
_row_to_record()now treats missingrow[12:16]asNone, so this test still passes even if the v2→v3 migration never creates the four transparency columns. Please assert the schema contains those columns, or do a post-migration write/read round-trip that would fail on a partially upgraded DB.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_transparency_taxonomy.py` around lines 128 - 167, The test currently doesn't prove the v2→v3 migration actually added the four transparency columns; after opening the DB with SQLiteAuditBackend in test_migration_from_v2_to_v3, query the migrated schema (e.g., PRAGMA table_info('audit_records')) and assert that columns 'system_operation', 'data_usage', 'decision_making', and 'limitations' are present, or perform a post-migration write/read round-trip via SQLiteAuditBackend (insert a record with those fields set, then reload it) to fail if the columns were never created; reference the test function test_migration_from_v2_to_v3 and the SQLiteAuditBackend/_row_to_record code paths so the assertion validates the actual schema change rather than relying on None defaults.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 13-14: Update the CHANGELOG wording to avoid implying "vaara trail
verify" validates the live DB after purge: clarify that src/vaara/cli.py's vaara
trail verify only validates signed zip bundles (the exported handoff) and that
those bundles remain self-consistent, while the live DB/trail will show a
deliberate chain seam where deleted records are referenced via previous_hash;
recommend exporting a signed handoff zip and using vaara trail verify on that
bundle, and explicitly state that the live DB chain break at the retention
boundary is expected and not validated by vaara trail verify.
In `@scripts/eval_distribution_shift.py`:
- Around line 69-73: The call to pipe.intercept is collapsing missing agent_id
values to the constant "adv", causing unrelated samples to share history; change
the fallback for agent_id in the pipe.intercept call (where
agent_id=entry.get("agent_id", "adv")) to produce a unique per-sample id (for
example derive from an existing entry unique field, use the loop index, or
generate a uuid) so each sample without an explicit agent_id gets its own
distinct identifier instead of "adv".
In `@scripts/eval_pair_attack.py`:
- Around line 161-163: The --model argument currently defaults to "llama" which
can cause silent mismatches; update the ap.add_argument for "--model" (the
parser setup in scripts/eval_pair_attack.py) to make the served model explicit
by either setting required=True or changing the default to the documented
"Qwen2.5-32B-Instruct", and update the help string to mention it must match the
vLLM --served-model-name to ensure reproducible evaluations.
- Around line 173-181: The ImportError branch currently falls back silently to
heuristic-only behavior, making published PAIR runs non-reproducible; change the
logic around importing AdversarialClassifier so that if AdversarialClassifier
cannot be imported or instantiation fails, the script aborts with a clear
error/exit (instead of setting classifier = None) and a message instructing the
user to install vaara[ml]; update the block that currently mentions
AdversarialClassifier and classifier (and the subsequent Pipeline usage) to
validate classifier is available before proceeding and exit/raise with a
descriptive error if not.
In `@scripts/eval_stack_ablation.py`:
- Around line 95-102: full_stack currently re-invokes heuristic_only(entry,
pipe) which can mutate sequence/agent-history state and make comparisons
invalid; change full_stack so it does NOT call heuristic_only again — instead
either accept the heuristic result as an extra parameter (e.g., def
full_stack(entry: dict, pipe: Pipeline, classifier, heuristic_result: str) ->
str) or ensure the caller computes h = heuristic_only(...) once and passes it
in; then use that h and only call classifier_only(entry, classifier) when h ==
"ALLOW". Apply the same fix to the other occurrence referenced (the block around
lines 121-125) so heuristic_only is computed once and reused rather than re-run.
In `@src/vaara/audit/sqlite_backend.py`:
- Around line 137-142: The migration logic in _run_migrations() must not
unconditionally set schema_version to 3 after any exception during the ALTER
TABLE block; change it so that the ALTERs for adding system_operation,
data_usage, decision_making, and limitations are executed atomically (either run
them inside a single transaction or run each ALTER and only mark success after
all four succeed) and ensure that on any exception you do not update
schema_version and instead surface the error (or return a failure) so
write_record() won't assume the columns exist; update _run_migrations() to only
update schema_version when all ALTERs complete without error and reference the
ALTER statements and the schema_version update in that function to implement
this behavior.
In `@src/vaara/cli.py`:
- Around line 199-201: The purge command currently instantiates
SQLiteAuditBackend(db_path) without a tenant, causing purge_older_than to act
globally; update the CLI parser for "vaara trail purge" to accept a tenant
override (e.g., --tenant or --tenant-id) and pass that tenant into the audit
backend instantiation or directly into purge_older_than so the tenant-scoped
clause is used; specifically modify the parser code around the purge command in
src/vaara/cli.py and change the backend construction
(SQLiteAuditBackend(db_path) -> SQLiteAuditBackend(db_path,
tenant_id=parsed_tenant)) or call purge_older_than(retention_seconds,
dry_run=args.dry_run, tenant_id=args.tenant) so the _tenant_clause() in
src/vaara/audit/sqlite_backend.py no longer falls back to "1=1" and only deletes
records for the requested tenant (also apply the same tenant wiring to the other
purge-related block around lines 258-274).
In `@src/vaara/policy/loader.py`:
- Around line 182-201: When reading input paths in the loader functions (e.g.
from_json and from_yaml) the code only converts FileNotFoundError to PolicyError
but other read_text failures (IsADirectoryError, PermissionError,
UnicodeDecodeError, generic OSError) currently leak; update the Path.read_text()
exception handlers to catch and normalize all file-read errors (catch a tuple
like (FileNotFoundError, IsADirectoryError, PermissionError, UnicodeDecodeError,
OSError) or simply OSError/Exception as appropriate) and raise
PolicyError(f"input is neither a JSON/YAML object nor a readable file path:
{source!r}") (or a clear message used elsewhere) from the original exception; do
the same normalization in both the JSON loader block around Path.read_text() and
the YAML loader block so callers never see raw OS/file exceptions (keep JSON
decoding still wrapped as PolicyError when json.loads fails).
- Around line 68-95: The loader currently accepts arbitrary keys under
thresholds.default and thresholds.<name> (in thr_block) and silently coerces
unknown keys; update the code that builds thresholds_default and
thresholds_overrides to validate allowed keys only ("escalate" and "deny") and
raise a PolicyError for any unknown key (include the path like
"thresholds.default: unknown key 'deni'" or "thresholds.<name>: unknown key
'denial'"). Specifically, in the handling of default_block and in the loop that
processes thr_block items (where thresholds_default, thresholds_overrides,
Thresholds(...), thr_block, and _require_mapping are used), check the
set(override.keys()) (and set(default_block.keys()) for default) against
{"escalate","deny"} before float coercion and raise PolicyError with a clear
message if extras are present.
In `@tests/adversarial/pair_v0_5_3.json`:
- Around line 6-8: The fixture metadata currently stores an ambiguous "model"
value ("qwen32b"); update tests/adversarial/pair_v0_5_3.json to record the full,
unambiguous attacker and judge model identifiers and the attacker-vs-judge
pairing so runs are reproducible: replace the single "model" field with explicit
keys (e.g., "attacker_model" and "judge_model" or a single "model_pair" string)
and ensure each entry in the "results" array references those full IDs so the
attacker/judge setup is preserved in the fixture metadata.
---
Duplicate comments:
In `@scripts/eval_stack_ablation.py`:
- Around line 168-173: The instantiation of AdversarialClassifier can raise
ImportError and must be inside the same try/except as the import; move the call
to AdversarialClassifier() into the try block that imports
vaara.adversarial_classifier and catch ImportError as exc, then raise SystemExit
with a clear message including exc (i.e., wrap both "from
vaara.adversarial_classifier import AdversarialClassifier" and "classifier =
AdversarialClassifier()" in the same try and keep the existing
print(f"[classifier] bundle v{classifier.bundle_version},
threshold={classifier.threshold}") after the try).
- Around line 62-65: The function expected_class currently defaults
missing/malformed expected values to "DENY" and silently classifies them as
"attack"; update expected_class to validate the entry["expected"] value(s)
(accepting either a string or list of strings, case-insensitive) and raise a
clear exception when expected is missing or contains any value outside the
allowed set {"ALLOW","DENY"} instead of defaulting, then keep the existing
bucketing logic (treat {"ALLOW"} -> "benign", otherwise "attack"); refer to the
expected_class function and the local variable expected/entry to implement this
validation and error handling.
- Around line 133-155: In report(), before printing or writing metrics, detect
any recorded errors by scanning counts (for each config in
("heuristic_only","classifier_only","full_stack") and each bucket c in
counts[config]) and if any c["err"] > 0 abort the run (raise SystemExit or call
sys.exit with non‑zero) after printing a short diagnostic message; do this check
early in the function (using the existing counts, config_name, c and err keys)
so you never print or write metrics to json_out when there were ERROR:*
decisions.
---
Nitpick comments:
In `@tests/test_audit_purge.py`:
- Around line 17-31: The helper _make_record builds AuditRecord objects but
computes record_hash with an empty previous_hash, breaking any chain when tests
call backend.write_record() directly; instead seed tests with a proper chain by
either creating/appending records through AuditTrail (use
AuditTrail.append_record or whatever method ensures previous_hash is set) or, if
creating records manually, set each record.previous_hash to the prior
record.record_hash before calling rec.record_hash = rec.compute_hash() so the
linked hash chain is intact for purge boundary assertions (refer to AuditRecord,
compute_hash, previous_hash, record_hash, backend.write_record, and AuditTrail).
In `@tests/test_transparency_taxonomy.py`:
- Around line 128-167: The test currently doesn't prove the v2→v3 migration
actually added the four transparency columns; after opening the DB with
SQLiteAuditBackend in test_migration_from_v2_to_v3, query the migrated schema
(e.g., PRAGMA table_info('audit_records')) and assert that columns
'system_operation', 'data_usage', 'decision_making', and 'limitations' are
present, or perform a post-migration write/read round-trip via
SQLiteAuditBackend (insert a record with those fields set, then reload it) to
fail if the columns were never created; reference the test function
test_migration_from_v2_to_v3 and the SQLiteAuditBackend/_row_to_record code
paths so the assertion validates the actual schema change rather than relying on
None defaults.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 15c3473a-5cc9-4516-a1b8-7d0a4005b051
📒 Files selected for processing (22)
.gitignoreCHANGELOG.mdCOMPLIANCE.mdexamples/policies/full.yamlexamples/policies/minimal.jsonpyproject.tomlscripts/eval_distribution_shift.pyscripts/eval_pair_attack.pyscripts/eval_stack_ablation.pysrc/vaara/__init__.pysrc/vaara/audit/sqlite_backend.pysrc/vaara/audit/trail.pysrc/vaara/cli.pysrc/vaara/policy/__init__.pysrc/vaara/policy/loader.pysrc/vaara/policy/schema.pytests/adversarial/distribution_shift_v0_5_3.jsontests/adversarial/pair_v0_5_3.jsontests/adversarial/stack_ablation_v0_5_3.jsontests/test_audit_purge.pytests/test_policy.pytests/test_transparency_taxonomy.py
CodeRabbit's @Full review surfaced 10 actionable items spanning audit data integrity, the public policy loader API, the v0.6 reproducibility scripts, and a CHANGELOG wording bug. All real, all addressed in one commit on top of the lint sweep (b569556). Audit data integrity (Major): - audit/sqlite_backend.py:_run_migrations was unsafe under partial failure. Each statement ran with `except Exception` swallowing every error before the schema_version got bumped unconditionally at the end. A real ALTER TABLE failure mid-migration left the DB marked at the new version while missing columns, then write_record() blew up on next insert. Fixed: only OperationalError messages matching "duplicate column" or "already exists" are swallowed (the idempotent- re-run case); everything else propagates. schema_version bumped per-version inside the loop, so a v2→v3 success followed by v3→v4 failure leaves the DB correctly at v3 instead of stuck at v2 or falsely advanced to v4. Regression test exercises a DB with partial pre-existing columns to confirm idempotency. - cli.py: `vaara trail purge` previously instantiated the backend with no tenant_id, which routes to _tenant_clause() == "1=1". On a shared multi-tenant audit DB that means every tenant's old records get deleted by anyone running the command. Tenant scoping is now required: the parser takes a mutually-exclusive --tenant TID or --all-tenants choice, both backed by an explicit decision. Policy loader public-API hygiene (Major): - policy/loader.py: unknown threshold keys now rejected. A typo like thresholds.default.deni would float-coerce, store, then silently fall back to default at threshold_for() query time. New _reject_unknown_keys() helper validates default and per-action override blocks against {escalate, deny}. - policy/loader.py: read-error normalisation. from_json() previously caught only FileNotFoundError; from_yaml() caught nothing. Public callers of either could see raw IsADirectoryError, PermissionError, UnicodeDecodeError, or generic OSError. New _read_policy_text() helper wraps Path.read_text() and translates the full set into PolicyError with a path-specific message. v0.6 reproducibility scripts (Major): - eval_stack_ablation.full_stack() previously called heuristic_only() again, double-running pipe.intercept() per entry and advancing sequence/agent-history state on the second pass. The published v0.6 ablation table assumes one intercept call per entry per config. Now full_stack takes the precomputed heuristic and classifier decisions and derives the upgrade rule from those, so the loop in evaluate() runs each pipeline path exactly once per entry. - eval_distribution_shift.py + eval_stack_ablation.py: per-entry agent_id derived from entry id (or python id() fallback) instead of collapsing every sample to a constant "adv". Sequence/agent-history state stays isolated across samples, matching corpus-measurement semantics. Same anti-pattern fixed in both scripts for consistency. - eval_pair_attack.py: classifier import is now a hard requirement. Silent fallback to heuristic-only meant a default invocation of the script wouldn't reproduce the published full-stack ASR — anyone re-running could legitimately report "ASR=N%" against a different stack and not notice. Now raises SystemExit if vaara[ml] is absent. Reproducibility metadata (Major): - tests/adversarial/pair_v0_5_3.json: model field expanded to attacker_model + judge_model + model_alias so a downstream auditor can see exactly what ran. Both attacker and judge were Qwen2.5-32B- Instruct per CHANGELOG / COMPLIANCE; the alias "qwen32b" stays for back-compat with any downstream code keying on it. Minor: - CHANGELOG.md v0.6.0 entry: removed misleading "vaara trail verify reports a chain break at the retention boundary" wording — that subcommand only validates signed zip bundles, not the live DB. Replaced with "the seam is exposed at load via hash mismatch", and the bullet now documents the new --tenant / --all-tenants gate. - eval_pair_attack.py: --model default flipped from "llama" to "Qwen2.5-32B-Instruct" so a default invocation matches the documented v0.6 calibration. Tests: - 6 new cases (4 in test_policy, 1 in test_transparency_taxonomy partial-migration regression, 1 implicit via existing audit_purge). - Full suite 310 passed / 12 skipped, ruff + bandit + mypy clean. - scripts/lint_full.sh exits PASS. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
v0.5.x closed the capability axis (jailbreak coverage, classifier rebalance). v0.6 is the legibility axis: policies become readable, audit records become standards-aligned, adversarial numbers get split by source, and architecture contribution gets documented.
Seven feature commits plus the bump:
vaara.policypackage. JSON-native loader, optional[yaml]extra, frozen dataclasses, hand-rolled validation with field-path errors. Reusesvaara.taxonomy.actionsenums verbatim. Implements Sketch A from the v0.6 DSL design exploration. Embedded-Python (B) and standalone DSL (C) stay deferred to v0.7+ pending external pull.vaara trail purgeCLI +SQLiteAuditBackend.purge_older_thanAPI. Article 12(2) retention, tenant-scoped. Hash-chain integrity preserved with a documented seam at the retention boundary. Intended workflow: signed handoff zip first, archive externally, then purge.AuditRecord. Four optional fields (system_operation,data_usage,decision_making,limitations) with default-classification heuristic perEventTypeand per-record override. Excluded fromrecord_hashso pre-v0.6 chains stay valid. v0.7+ may add a separate signing mechanism if compliance requires.Schema migration
Audit DB v2 to v3 adds four nullable transparency columns. Pre-v0.6 records get NULL for the new columns. Their stored
record_hashis preserved (NOT re-hashed on load), so historical chain verification continues to work.Test plan
pytest -q— 296 passed, 12 skipped (vaara[ml]extras path)ruff check src/ scripts/ tests/cleanvaara.__version__reads0.6.0[0.6.0] - 2026-04-27entry presenttests/adversarial/distribution_shift_v0_5_3.json,stack_ablation_v0_5_3.json,pair_v0_5_3.jsonDeferred to v0.7+
Summary by CodeRabbit
New Features
Documentation
Chores
Tests