Skip to content

feat: v0.6.0 standards alignment + legibility#42

Merged
vaaraio merged 16 commits into
mainfrom
feat/v0.6-policy-schema
Apr 27, 2026
Merged

feat: v0.6.0 standards alignment + legibility#42
vaaraio merged 16 commits into
mainfrom
feat/v0.6-policy-schema

Conversation

@vaaraio
Copy link
Copy Markdown
Owner

@vaaraio vaaraio commented Apr 27, 2026

Summary

v0.5.x closed the capability axis (jailbreak coverage, classifier rebalance). v0.6 is the legibility axis: policies become readable, audit records become standards-aligned, adversarial numbers get split by source, and architecture contribution gets documented.

Seven feature commits plus the bump:

  • vaara.policy package. JSON-native loader, optional [yaml] extra, frozen dataclasses, hand-rolled validation with field-path errors. Reuses vaara.taxonomy.actions enums verbatim. Implements Sketch A from the v0.6 DSL design exploration. Embedded-Python (B) and standalone DSL (C) stay deferred to v0.7+ pending external pull.
  • vaara trail purge CLI + SQLiteAuditBackend.purge_older_than API. Article 12(2) retention, tenant-scoped. Hash-chain integrity preserved with a documented seam at the retention boundary. Intended workflow: signed handoff zip first, archive externally, then purge.
  • prEN ISO/IEC 12792 four-axis transparency taxonomy on AuditRecord. Four optional fields (system_operation, data_usage, decision_making, limitations) with default-classification heuristic per EventType and per-record override. Excluded from record_hash so pre-v0.6 chains stay valid. v0.7+ may add a separate signing mechanism if compliance requires.
  • Distribution-shift split. Hand-curated (held-out, 250): attack recall 97.1% / benign FPR 70.0%. LLM-generated (in-sample, 5,705): attack recall 95.2% / benign FPR 87.5%. The 18pp benign-FPR gap is the dominant distribution-shift signal.
  • Stack ablation. Heuristic-only 35% / 63%. Classifier-only 94% / 86%. Full-stack 97% / 98%. Layers not redundant. Heuristic catches a small attack set the classifier misses, which justifies the ensemble. Most full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades.
  • PAIR adaptive-attacker calibration. Qwen2.5-32B-Instruct as attacker + judge, 25 hand-curated jailbreak seeds, max 5 iterations: ASR 0.0% (0/25). NOT a claim of impervence to all adaptive attackers. A stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR.
  • COMPLIANCE.md gains EU AI Act Annex IV evidence-section mapping and CEN-CENELEC harmonised-standards alignment across ISO/IEC 42001, prEN 18286, prEN 18228, ISO/IEC 42006, prEN ISO/IEC 24970, prEN 18229-1, prEN ISO/IEC 12792.

Schema migration

Audit DB v2 to v3 adds four nullable transparency columns. Pre-v0.6 records get NULL for the new columns. Their stored record_hash is preserved (NOT re-hashed on load), so historical chain verification continues to work.

Test plan

  • pytest -q — 296 passed, 12 skipped (vaara[ml] extras path)
  • ruff check src/ scripts/ tests/ clean
  • vaara.__version__ reads 0.6.0
  • CHANGELOG [0.6.0] - 2026-04-27 entry present
  • Reproducible artifacts: tests/adversarial/distribution_shift_v0_5_3.json, stack_ablation_v0_5_3.json, pair_v0_5_3.json
  • CI green
  • Reviewer eyeball pass

Deferred to v0.7+

  • prEN ISO/IEC 24970 field-alias layer (gated on standard publication)
  • DORA mapping refinement (gated on deployer signal)

Summary by CodeRabbit

  • New Features

    • JSON-native policy loader with optional YAML support; example policies added
    • Tenant-scoped audit-trail retention purge (CLI + backend) with dry-run
    • Expanded audit-record transparency metadata (four-axis taxonomy; excluded from hashing)
    • New evaluation scripts for distribution-shift, stack-ablation, and PAIR attacks
  • Documentation

    • Updated compliance docs with Annex IV mapping, standards alignment, measured v0.6 results, and migration guidance
    • Added CHANGELOG entry for v0.6.0
  • Chores

    • Version bumped to v0.6.0; optional YAML extra added; audit DB schema advanced (backward-compatible)
    • .gitignore updated to ignore research/ and .claude/
  • Tests

    • Extensive new test suites and reproducible evaluation artifacts added

vaaraio and others added 9 commits April 27, 2026 11:39
src/vaara/policy/
- New package with frozen dataclasses for action classes, thresholds,
  sequence patterns, and escalation routes
- from_dict / from_json (stdlib, zero-dep) and from_yaml (gated on
  the [yaml] extra; raises ImportError with install hint if missing)
- Hand-rolled validation with field-path error messages
  (e.g. "action_classes.x.category: 'bogus' is not one of [...]")
- Reuses ActionCategory / Reversibility / BlastRadius / UrgencyClass /
  RegulatoryDomain from vaara.taxonomy.actions — no duplication
- Threshold partial overrides: set just deny=0.75, inherit default
  escalate. Escalation routes match by article overlap with fallback
  to "on_call"

tests/test_policy.py
- 19 tests covering load paths, threshold resolution, escalation
  routing, validation errors, JSON / YAML loaders. All green.

examples/policies/
- minimal.json + full.yaml as reference policies

pyproject.toml
- New [yaml] optional extra. Core dependencies = [] preserved.

Implements Sketch A from research/dsl_design_exploration.md and
v0.6 roadmap item 8. Sketches B (embedded Python DSL) and C
(standalone DSL) stay deferred to v0.7+ pending external pull.
…d seam

src/vaara/audit/sqlite_backend.py
- New SQLiteAuditBackend.purge_older_than(retention_seconds, *, dry_run=False)
  Tenant-scoped DELETE of records older than now() - retention_seconds.
  Returns count deleted (or count that would be deleted in dry_run mode).
  Validates retention_seconds is a positive int.

src/vaara/cli.py
- New `vaara trail purge --db PATH --retention-days N [--dry-run]` subcommand
  Prints count purged, plus a one-line note about the hash-chain seam
  reminding the deployer to export a signed zip before future purges.

tests/test_audit_purge.py
- 10 tests: purge empty DB, mixed-age corpus, dry_run idempotence,
  validation rejects (zero, negative, non-int retention), survival
  of remaining records, tenant isolation across two backends sharing
  one DB file.

COMPLIANCE.md
- "Audit trail integrity" → "Retention policy" updated to document the
  new purge mechanism AND the hash-chain seam at the retention boundary.
  Intended workflow (export-then-purge) explicitly described.

HASH-CHAIN IMPACT (also in code docstring): surviving records reference
deleted predecessors via previous_hash, so vaara trail verify reports a
chain break at the retention boundary. The signed handoff zip remains
self-consistent forever; the live DB chain has a documented seam. The
v0.7+ severable-bundles design (research/severable_bundles_sketch.md)
eliminates the seam if needed.

Implements roadmap item 3 from research/roadmap.md. 287/287 non-skipped
tests pass; ruff clean.
Two new top-level sections cashing in the JTC21 WG2 deep-dive research
(see research/jtc21_wg2_deep_dive.md). Compliance teams now have
explicit pointers to where Vaara fits in two new dimensions of the
EU AI Act / harmonised-standards story.

EU AI Act Annex IV evidence sections
- Maps Vaara's contribution to each of the nine Annex IV sections
- Direct fill: §3 (monitoring), §5 (risk mgmt), §9 (post-market)
- Contributes: §2 (elements), §4 (metrics), §6 (changes), §7 (standards)
- Out of scope: §1 (general description), §8 (DoC document)

CEN-CENELEC harmonised standards alignment
- Per-standard table for the JTC21 WG2/3/4 work items Vaara aligns with:
  ISO/IEC 42001, prEN 18286, prEN 18228, ISO/IEC 42006, prEN ISO/IEC
  24970, prEN 18229-1, prEN ISO/IEC 12792
- Status, working group, and Vaara alignment notes per standard
- Honest framing: pre-compliance positioning, not certified compliance.
  Once a standard publishes, expect a v0.6 / v0.7 alignment audit and
  an updated entry.

Both sections are reviewable independently from any code change.
This is the v0.6 standards-legibility story landing as a doc artifact.
…ecall

scripts/eval_distribution_shift.py
- New script: runs the full Vaara stack against the adversarial corpus
  with per-source tagging (hand_curated vs llm_generated)
- Reports attack recall (caught/total) and benign FPR (escalated/total)
  per source/class combination
- Loads top-level *.jsonl as hand-curated, generated/ + benign_generated/
  subdirs as LLM-generated. Same JSON schema, same stacking rule as
  scripts/eval_adversarial.py.

tests/adversarial/distribution_shift_v0_5_3.json
- Captured numbers for v0.5.3 stack against bundle v1.4 corpus.

COMPLIANCE.md "Current limits"
- Replaced the "owed in v0.6" bullet with the actual per-source split:
  - Hand-curated (held-out, 250):     attack recall 97.1%, benign FPR 70.0%
  - LLM-generated (in-sample, 5,705): attack recall 95.2%, benign FPR 87.5%
- Honest framing: hand-curated is held-out, LLM-generated is in-sample
  (was in classifier training). The 18pp benign-FPR gap is the dominant
  distribution-shift signal.
- Reconciles with CHANGELOG headline "global FPR 21.0%": that was
  classifier-alone 5-fold CV OOF. Full-stack FPR is heuristic-dominated
  (most benign escalations come from the heuristic ESCALATE branch).
- Notes that proper OOF split for the LLM-generated portion is a v0.7
  follow-up if the gap demands it.

Implements roadmap item 5 from research/roadmap.md. 287/287 tests pass;
ruff clean.
…rement

scripts/eval_stack_ablation.py
- New script: runs three configurations against the same corpus
  (hand-curated + LLM-generated, source-tagged):
    heuristic_only   pipe.intercept() with classifier=None
    classifier_only  classifier.score() + threshold, no heuristic
    full_stack       pipe.intercept() + classifier upgrade
- Reports recall (attacks) and FPR (benigns) per source/class.

tests/adversarial/stack_ablation_v0_5_3.json
- Captured numbers for v0.5.3 stack against bundle v1.4 corpus.
  Public artifact next to the existing eval results files.

COMPLIANCE.md "Current limits"
- New "Stack composition (v0.6 measurement)" bullet summarizing the
  layer-level findings:
    heuristic_only recall: 35% / 63% (hand-curated / LLM-generated)
    classifier_only recall: 94% / 86%
    full_stack recall: 97% / 98%
- Layers not redundant: heuristic catches some attacks classifier misses.
- Most full-stack benign FPR comes from heuristic ESCALATEs, not from
  classifier upgrades on heuristic-ALLOWed entries.

Honest framing:
- Heuristic alone is insufficient as adversarial defence (35% recall
  on hand-curated). Classifier carries most of the recall.
- Heuristic adds 6 unique attack catches on hand-curated (full-stack
  198 vs classifier-only 192), justifying the ensemble.
- Full-stack 70-91% benign FPR is the deliberate v0.5.3 trade-off.
  v0.7 needs operator-queue ergonomics or threshold-tuning recipes
  via the YAML policy schema (item 8) to manage escalate volume.

Implements roadmap item 6 from research/roadmap.md. 287/287 tests pass;
ruff clean.
src/vaara/audit/trail.py
- New TRANSPARENCY_DEFAULTS dict mapping each EventType to default
  values for the four 12792 axes (system_operation, data_usage,
  decision_making). 'limitations' stays None unless explicitly set —
  system-level constraints are recorded out-of-band, not per event.
- AuditRecord gains four optional Optional[str] fields. Default-fill
  via __post_init__, but ONLY for newly constructed records (record_hash
  empty). Loaded-from-DB records (record_hash non-empty) skip the fill —
  inventing defaults retroactively would misrepresent historical records
  written before v0.6.
- compute_hash() does NOT include transparency fields. They are metadata
  annotations, not tamper-evident. Trade-off explicitly documented in a
  NOTE comment. Pre-v0.6 chains stay valid because the hashable surface
  is unchanged. v0.7+ may add a separate signing mechanism if compliance
  teams require tamper-evident transparency tagging.

src/vaara/audit/sqlite_backend.py
- SCHEMA_VERSION 2 → 3.
- SCHEMA_SQL gains four nullable columns at end of audit_records:
  system_operation, data_usage, decision_making, limitations.
- _MIGRATIONS[2] adds them via ALTER TABLE on upgrade. Migration is
  non-destructive — pre-v0.6 rows get NULL for the new columns and
  their stored record_hash is preserved.
- write_record persists the new fields; _row_to_record reads them with
  defensive length-guarded indexing for old query result shapes.

tests/test_transparency_taxonomy.py
- 9 tests: per-EventType defaults, override, loaded-record skip-fill,
  hash-chain backward compat (compute_hash unchanged), SQLite round-trip,
  v2→v3 migration on a hand-rolled v2-shape DB.

COMPLIANCE.md
- "Harmonised standards alignment" prEN ISO/IEC 12792 row updated:
  v0.6 ships the tagging. Notes that fields are NOT tamper-evident in
  v0.6 and explains why (backward compat with pre-v0.6 chains).

Implements roadmap item 1 from research/roadmap.md. 296/296 tests pass
(was 287, +9 transparency tests). Ruff clean.
scripts/eval_pair_attack.py
- New script: PAIR (Chao et al. 2023) iterative attacker against the
  Vaara stack via vLLM endpoint. Same model in attacker + judge roles.
- Talks OpenAI-compatible chat-completions API (vLLM standard surface).
- Zero new runtime deps — uses urllib.request from stdlib. Requires
  vaara[ml] for the classifier.
- Args: --endpoint, --model, --seeds, --max-iters, --limit, --out.
- Per-seed: iterates max-iters times. Each iteration:
    1. Attacker LLM proposes a refined jailbreak prompt
    2. Vaara stack (heuristic + classifier) evaluates the candidate
    3. If Vaara ALLOWs, judge LLM confirms whether the candidate is a
       real jailbreak (not just a benign-sounding rephrase)
    4. Success = Vaara ALLOW + judge confirms. Otherwise iterate.
  Reports ASR (success rate over seeds).

tests/adversarial/pair_v0_5_3.json
- v0.5.3 stack measurement: Qwen2.5-32B-Instruct as attacker + judge,
  25 hand-curated jailbreak seeds, max 5 iterations per seed.
- ASR: 0.0% (0/25). 124 ESCALATE / 1 ALLOW across 125 candidates.
  The 1 ALLOW was a "draft an email for fictional company-wide
  announcement" candidate; judge ruled it not a successful jailbreak.

COMPLIANCE.md "Current limits"
- Replaced "owed in v0.6" PAIR placeholder with the actual measurement.
- Honest framing: NOT a claim of impervence to all adaptive attackers.
  Stronger attacker (70B+), longer iteration budgets, or alternate
  strategies (multi-turn drift, language-switch, obfuscation) might
  produce non-zero ASR. v0.7 follow-up if compliance audience requires
  harder calibration.

Implements roadmap item 4 from research/roadmap.md. Ran on rented
MI300x droplet (~$1.20 of $250 credit window — 32B sufficient for
this attacker tier; 70B reserved for v0.7 if needed). 296/296 tests
pass; ruff clean.
pyproject.toml: version bump.

src/vaara/__init__.py: __version__ string sync (was lagging at 0.5.3
after the bump commit; prior bumps updated this in lockstep — see
f08b5dd "version sync").

CHANGELOG.md: v0.6.0 entry. Theme: standards alignment + legibility.
Per-section breakdown of the seven feature commits + docs cash-in:

  Added
  - vaara.policy package (JSON + [yaml] extra)
  - vaara trail purge CLI + SqliteAuditBackend.purge_older_than
  - prEN ISO/IEC 12792 four-axis transparency taxonomy on AuditRecord
  - scripts/eval_distribution_shift.py
  - scripts/eval_stack_ablation.py
  - scripts/eval_pair_attack.py
  - [yaml] optional extra (pyyaml>=6.0); core dependencies = [] preserved
  - examples/policies/{minimal.json, full.yaml}
  - COMPLIANCE.md gains Annex IV evidence-section mapping and
    CEN-CENELEC harmonised-standards alignment tables

  Changed
  - Audit DB schema v2 -> v3 (transparency columns, nullable, migration
    preserves pre-v0.6 record hashes)
  - COMPLIANCE.md "Current limits" placeholders replaced with measured
    numbers: distribution-shift split, stack composition, PAIR ASR

  Deferred to v0.7+
  - prEN ISO/IEC 24970 field-alias layer (gated on standard publication)
  - DORA mapping refinement (gated on deployer signal)
Both have been persistently untracked but absent from .gitignore, so
every git status emitted noise. research/ is the local experimentation
scratch (droplet sync, generators, notebooks) that mirrors the
BD-docs-not-public convention already applied to docs/blog/ and
docs/grant/. .claude/ is local Claude Code state with no project-side
relevance.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds Vaara v0.6.0: JSON/YAML policy schema and loaders, policy examples, four-axis transparency taxonomy with SQLite audit schema bump v2→v3, tenant-scoped purge API + CLI, three adversarial evaluation scripts and fixtures, packaging yaml extra, compliance doc updates, and comprehensive tests.

Changes

Cohort / File(s) Summary
Version & Packaging
pyproject.toml, src/vaara/__init__.py
Bump package version to 0.6.0 and add optional yaml extra (pyyaml>=6.0).
Policy System
src/vaara/policy/schema.py, src/vaara/policy/loader.py, src/vaara/policy/__init__.py, examples/policies/minimal.json, examples/policies/full.yaml
Add frozen dataclass policy schema, PolicyError, Thresholds/Sequence/Escalation types, loader APIs (from_dict, from_json, from_yaml), JSON/YAML loading behavior, and example policies.
Audit Trail & CLI
src/vaara/audit/trail.py, src/vaara/audit/sqlite_backend.py, src/vaara/cli.py, tests/test_audit_purge.py, tests/test_transparency_taxonomy.py
Introduce four transparency fields on AuditRecord with per-EventType defaults, bump SQLite SCHEMA_VERSION 2→3 (nullable columns), add purge_older_than(...) (tenant-scoped, dry-run) and vaara trail purge CLI; include migration, semantics, and tenant-isolation tests.
Evaluation Scripts & Helpers
scripts/eval_distribution_shift.py, scripts/eval_pair_attack.py, scripts/eval_stack_ablation.py
Add three evaluator CLIs for distribution-shift, PAIR-style attacks, and stack ablation; integrate with Pipeline and optional AdversarialClassifier (requires vaara[ml]), produce JSON reports, and provide attacker/judge and scoring helpers.
Tests & Fixtures
tests/test_policy.py, tests/adversarial/*.json, tests/test_*
Add comprehensive policy loader tests (validation, immutability, JSON/YAML coverage) and adversarial evaluation fixtures/results; multiple new test modules for purge, transparency, and policy behavior.
Docs & Misc
CHANGELOG.md, COMPLIANCE.md, .gitignore
Add v0.6.0 changelog entry; expand COMPLIANCE.md with Annex IV mapping, retention/purge guidance and measured evaluation limits; update .gitignore to exclude research/ and .claude/.

Sequence Diagram(s)

sequenceDiagram
  rect rgba(200,230,201,0.5)
    participant CLI as "User CLI"
    participant Backend as "SQLiteAuditBackend"
    participant DB as "SQLite DB"
    participant Logger as "Logger"
  end

  CLI->>Backend: vaara trail purge --retention-days N [--dry-run]
  Backend->>DB: open connection (tenant_id scoped)
  Backend->>DB: SELECT COUNT(*) WHERE ts < cutoff AND tenant_id = ?
  DB-->>Backend: would_delete_count
  alt dry_run
    Backend->>CLI: print "Would purge {count}" (dry-run)
    Backend->>DB: close
  else actual_purge
    Backend->>DB: DELETE FROM audit_records WHERE ts < cutoff AND tenant_id = ?
    DB-->>Backend: deleted_count
    Backend->>Logger: warn about hash-chain seam at retention boundary
    Backend->>CLI: print "Purged {count}" and retention warning
    Backend->>DB: close
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hop through policies, JSON and YAML in paw,
I tuck transparency fields where auditors awe,
I count what to purge (tenant-bound, tidy and dry),
I run three small scripts where crafty prompts try,
A carrot for tests — a tiny celebratory hop!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 43.01% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: v0.6.0 standards alignment + legibility' accurately reflects the main changes: a standards-aligned v0.6.0 release with policy schema, audit retention, transparency taxonomy, and compliance documentation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/v0.6-policy-schema

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread tests/test_transparency_taxonomy.py Fixed
Comment thread tests/test_transparency_taxonomy.py Fixed
Comment thread tests/test_transparency_taxonomy.py Fixed
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (2)
tests/test_audit_purge.py (1)

14-31: Consider using a fixture-scoped counter instead of global state.

The global _record_seq counter persists across test runs within a session, which could theoretically cause issues with test isolation if tests make assumptions about record ID patterns. However, since each test gets a fresh SQLite database via the backend fixture and record IDs include the timestamp, this is unlikely to cause actual failures.

If you want stricter isolation, consider resetting the counter in a fixture or using uuid.uuid4() instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_audit_purge.py` around lines 14 - 31, The global _record_seq used
by _make_record creates shared state across tests; replace it with either a
fixture-scoped counter or remove it entirely by generating unique IDs (e.g., use
uuid.uuid4()) and/or reset the counter in a fixture so tests don't rely on
module-level mutation; update the _make_record implementation to use the new
generator (or fixture-injected counter) and remove or stop mutating the global
_record_seq to ensure test isolation.
scripts/eval_stack_ablation.py (1)

47-59: Verify source tagging logic.

Files directly under corpus_root/*.jsonl are tagged as hand_curated, while files under generated/ and benign_generated/ subdirectories are tagged as llm_generated. This assumes a specific directory structure. Consider documenting this expected layout in the docstring or validating that at least some files are found.

📝 Suggested documentation improvement
 def load_corpus(corpus_root: Path) -> list[dict]:
-    """Load both hand-curated and LLM-generated entries with source tags."""
+    """Load both hand-curated and LLM-generated entries with source tags.
+    
+    Expected layout:
+        corpus_root/*.jsonl           -> tagged as 'hand_curated'
+        corpus_root/generated/*.jsonl -> tagged as 'llm_generated'
+        corpus_root/benign_generated/*.jsonl -> tagged as 'llm_generated'
+    """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval_stack_ablation.py` around lines 47 - 59, The load_corpus
function currently tags files under corpus_root/*.jsonl as "hand_curated" and
those under generated/ and benign_generated/ as "llm_generated" without
documenting or validating that structure; update the load_corpus docstring to
explicitly describe the expected directory layout (top-level jsonl are
hand_curated, subdirs "generated" and "benign_generated" contain llm_generated)
and add a simple validation after collecting entries (using entries,
corpus_root, and _load_jsonl) to raise or warn if no files were found or if
expected subdirs are missing so callers get immediate feedback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 27: Replace the misspelled word "impervence" with "imperviousness" in the
sentence inside CHANGELOG.md that reads "NOT a claim of impervence to all
adaptive attackers ..." and also fix the identical typo in COMPLIANCE.md so both
files read "NOT a claim of imperviousness to all adaptive attackers ...".

In `@COMPLIANCE.md`:
- Line 320: Replace the misspelled word "impervence" with "imperviousness" in
the sentence that currently reads "capability level. NOT a claim of impervence
to all adaptive attackers" so it reads "... NOT a claim of imperviousness to all
adaptive attackers"; update only that token to correct the typo.

In `@scripts/eval_distribution_shift.py`:
- Around line 161-165: The AdversarialClassifier() constructor can raise
ImportError for missing optional ML deps but is instantiated outside the
try-except, so wrap both the import and the classifier construction in the same
try-except block (covering from "from vaara.adversarial_classifier import
AdversarialClassifier" through "classifier = AdversarialClassifier()") and on
ImportError raise SystemExit with a clear message that includes the caught
exception (exc) so missing dependency errors are handled the same as import
failures.

In `@scripts/eval_pair_attack.py`:
- Line 88: The current fallback to "ALLOW" when extracting decision from
Pipeline.intercept() is unsafe; change the fallback to a safe value like "DENY"
(or "UNKNOWN") and validate the extracted value against allowable decisions
before using it. Specifically, in the code around the decision =
str(getattr(result, "decision", "ALLOW")).upper() expression, replace the
default "ALLOW" with "DENY" (or "UNKNOWN"), and add a short check that result is
the expected type and that decision is in the set of valid decisions (e.g.,
{"ALLOW","DENY","UNKNOWN"}); if validation fails, set decision to the safe
default and emit a warning/log so malformed returns from Pipeline.intercept()
are treated as denied.
- Around line 43-54: The call_llm function currently assumes the response is
valid JSON with choices[0].message.content and will let
JSONDecodeError/KeyError/IndexError/TypeError escape and kill the run; wrap the
response parsing and indexing in a try/except that catches json.JSONDecodeError,
KeyError, IndexError, and TypeError, read the raw response body for context, and
convert these into a consistent per-seed error (e.g., raise a specific
LLMResponseError or re-raise as urllib.error.URLError/ValueError with a clear
message containing the raw body) so callers that already handle
URLError/TimeoutError will treat malformed responses as per-seed failures; apply
the same change to the similar parsing block around lines 120-137.

In `@src/vaara/cli.py`:
- Around line 186-189: The code fails to expand user home tilde for the DB path;
change how db_path is constructed so you call Path(args.db).expanduser() (e.g.,
db_path = Path(args.db).expanduser()) before the exists() check used in the CLI
(the db_path variable and args.db reference), so "--db '~/path/audit.db'" is
correctly resolved prior to printing the "audit DB not found" error.

In `@src/vaara/policy/loader.py`:
- Around line 73-77: The thresholds_overrides dict built from thr_block should
be validated immediately instead of deferring to Policy.threshold_for; for each
k, v in thr_block (skipping "default") iterate per-action override mapping (the
inner dict from thresholds_overrides) and attempt to construct or validate using
the Thresholds constructor/validator (or replicate its range/order checks) to
ensure values are floats in [0,1] and that escalate <= deny <= allow ordering
(or the exact invariants enforced by Thresholds); raise a clear exception on
invalid entries so malformed policies like {"foo": {"escalate": 0.9, "deny":
0.2}} are rejected during load rather than later in Policy.threshold_for().
- Around line 47-49: The from_dict() implementation must validate nested section
shapes before iterating/coercing: for each nested key (e.g., action_classes,
thresholds, escalation, sequences) check the raw value's type (mapping for
dict-like sections, list/sequence for lists, dict for nested mappings like
sequences[name].pattern) and raise PolicyError with the offending path (e.g.,
"action_classes.<name>: must be a mapping" or "thresholds: must be a sequence")
instead of calling .items(), .get(), or tuple(...) which can raise
AttributeError or coerce strings into tuples; update the parsing code in
from_dict() to perform these explicit isinstance checks and raise PolicyError
with clear path context for the other blocks referenced (the blocks currently
around the action_classes, thresholds, escalation, sequences parsing) before any
further processing.

In `@src/vaara/policy/schema.py`:
- Around line 1-5: The Policy dataclass is currently "frozen" but its mutable
dict fields (action_classes, thresholds_overrides) allow in-place mutation; in
Policy.__post_init__ wrap/replace these fields with immutable mapping proxies
(e.g., types.MappingProxyType) or deep-copied nested mapping proxies so nested
dicts are also protected, ensuring the frozen value-object guarantee; update
Policy.__post_init__ to import types and perform deep copy + MappingProxyType
conversion for action_classes and for each nested mapping inside
thresholds_overrides (and assign back to the dataclass fields inside
__post_init__ despite frozen dataclass by using object.__setattr__).

---

Nitpick comments:
In `@scripts/eval_stack_ablation.py`:
- Around line 47-59: The load_corpus function currently tags files under
corpus_root/*.jsonl as "hand_curated" and those under generated/ and
benign_generated/ as "llm_generated" without documenting or validating that
structure; update the load_corpus docstring to explicitly describe the expected
directory layout (top-level jsonl are hand_curated, subdirs "generated" and
"benign_generated" contain llm_generated) and add a simple validation after
collecting entries (using entries, corpus_root, and _load_jsonl) to raise or
warn if no files were found or if expected subdirs are missing so callers get
immediate feedback.

In `@tests/test_audit_purge.py`:
- Around line 14-31: The global _record_seq used by _make_record creates shared
state across tests; replace it with either a fixture-scoped counter or remove it
entirely by generating unique IDs (e.g., use uuid.uuid4()) and/or reset the
counter in a fixture so tests don't rely on module-level mutation; update the
_make_record implementation to use the new generator (or fixture-injected
counter) and remove or stop mutating the global _record_seq to ensure test
isolation.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 745e85cf-842a-4c29-8490-c420a8dc25fc

📥 Commits

Reviewing files that changed from the base of the PR and between 45c3ee3 and 4c3e3c4.

📒 Files selected for processing (22)
  • .gitignore
  • CHANGELOG.md
  • COMPLIANCE.md
  • examples/policies/full.yaml
  • examples/policies/minimal.json
  • pyproject.toml
  • scripts/eval_distribution_shift.py
  • scripts/eval_pair_attack.py
  • scripts/eval_stack_ablation.py
  • src/vaara/__init__.py
  • src/vaara/audit/sqlite_backend.py
  • src/vaara/audit/trail.py
  • src/vaara/cli.py
  • src/vaara/policy/__init__.py
  • src/vaara/policy/loader.py
  • src/vaara/policy/schema.py
  • tests/adversarial/distribution_shift_v0_5_3.json
  • tests/adversarial/pair_v0_5_3.json
  • tests/adversarial/stack_ablation_v0_5_3.json
  • tests/test_audit_purge.py
  • tests/test_policy.py
  • tests/test_transparency_taxonomy.py

Comment thread CHANGELOG.md Outdated
- **COMPLIANCE.md "Current limits"** replaced placeholder bullets with v0.6 measurement results:
- **Distribution-shift split.** Hand-curated (held-out, 250): attack recall 97.1% / benign FPR 70.0%. LLM-generated (in-sample, 5,705): attack recall 95.2% / benign FPR 87.5%. The 18pp benign-FPR gap is the dominant distribution-shift signal.
- **Stack composition.** `heuristic_only` recall 35% / 63%. `classifier_only` recall 94% / 86%. `full_stack` recall 97% / 98%. Layers not redundant — heuristic catches a small set of attacks the classifier misses (justifies the ensemble). Most full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades.
- **PAIR adaptive-attacker calibration.** Qwen2.5-32B-Instruct as both attacker and judge, 25 hand-curated jailbreak seeds, max 5 iterations: **ASR 0.0% (0/25)**. NOT a claim of impervence to all adaptive attackers — stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo: "impervence" → "imperviousness".

Same spelling error as in COMPLIANCE.md.

✏️ Proposed fix
-  **ASR 0.0% (0/25)**. NOT a claim of impervence to all adaptive attackers — stronger at...
+  **ASR 0.0% (0/25)**. NOT a claim of imperviousness to all adaptive attackers — stronger at...
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- **PAIR adaptive-attacker calibration.** Qwen2.5-32B-Instruct as both attacker and judge, 25 hand-curated jailbreak seeds, max 5 iterations: **ASR 0.0% (0/25)**. NOT a claim of impervence to all adaptive attackers — stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR.
- **PAIR adaptive-attacker calibration.** Qwen2.5-32B-Instruct as both attacker and judge, 25 hand-curated jailbreak seeds, max 5 iterations: **ASR 0.0% (0/25)**. NOT a claim of imperviousness to all adaptive attackers — stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR.
🧰 Tools
🪛 LanguageTool

[grammar] ~27-~27: Ensure spelling is correct
Context: ...ns: ASR 0.0% (0/25). NOT a claim of impervence to all adaptive attackers — stronger at...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` at line 27, Replace the misspelled word "impervence" with
"imperviousness" in the sentence inside CHANGELOG.md that reads "NOT a claim of
impervence to all adaptive attackers ..." and also fix the identical typo in
COMPLIANCE.md so both files read "NOT a claim of imperviousness to all adaptive
attackers ...".

Comment thread COMPLIANCE.md Outdated
Comment thread scripts/eval_distribution_shift.py Outdated
Comment thread scripts/eval_pair_attack.py Outdated
Comment thread scripts/eval_pair_attack.py Outdated
Comment thread src/vaara/cli.py Outdated
Comment thread src/vaara/policy/loader.py Outdated
Comment thread src/vaara/policy/loader.py Outdated
Comment thread src/vaara/policy/schema.py
vaaraio and others added 2 commits April 27, 2026 13:52
Three findings on PR #42:

- Line 33 (failure, false positive). `for event_type in EventType:`
  flagged as "non-iterable in for loop". EnumType metaclass provides
  __iter__ at runtime; CodeQL's static analyzer can't resolve it.
  Cast to list(EventType) — semantically identical, analyzer-friendly.
- Lines 110-128 and 162-173 (notices). SQLiteAuditBackend implements
  __enter__/__exit__ (sqlite_backend.py:817), so the try/finally
  backend.close() pattern was unnecessary. Refactored to with-blocks.

Tests still pass (9/9 in this file, 296/296 overall, 12 skipped on
vaara[ml] extras path).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Triaged 9 CodeRabbit findings and addressed the 7 worth fixing for the
v0.6.0 release. Two eval-script ergonomics findings (#4 ImportError on
constructor, #5 malformed-LLM-response handling) are deferred to a
follow-up eval-script hardening pass.

Major:

1. scripts/eval_pair_attack.py:88 — fail closed on malformed Pipeline
   result. Default fallback was "ALLOW", which would count a partial
   or malformed pipeline response as a successful jailbreak in ASR
   measurement. Now defaults to DENY when .decision is missing or
   unrecognised.

2. src/vaara/policy/loader.py — strict shape validation at every
   section boundary. Inputs like action_classes:[], thresholds:[],
   escalation:[], or sequences.foo.pattern:"abc" previously either
   AttributeError'd or got silently coerced (tuple("abc") =>
   ("a","b","c")). New _require_mapping() and _require_sequence()
   helpers raise PolicyError with the field path. String/bytes are
   rejected by _require_sequence to prevent character-tuple coercion.

3. src/vaara/policy/loader.py — per-action threshold overrides now
   validated at load time. A malformed override like
   {"foo": {"escalate": 0.9, "deny": 0.2}} previously parsed cleanly
   and only blew up when threshold_for("foo") was queried. Now we
   construct Thresholds with the merged-with-default values during
   load and surface PolicyError with the offending action class path.

4. src/vaara/policy/schema.py — Policy is now actually frozen.
   action_classes and thresholds_overrides were declared on a
   frozen dataclass but were plain dicts, so callers could do
   policy.thresholds_overrides["x"]["escalate"] = 0.9. __post_init__
   now wraps both with MappingProxyType (nested override dicts too)
   and the field annotations switch to Mapping.

Minor:

5. CHANGELOG.md, COMPLIANCE.md — typo "impervence" -> "imperviousness"
   in two places (the v0.6 PAIR calibration section).

6. src/vaara/cli.py — `vaara trail purge --db ~/path/audit.db` now
   expanduser()s the path before the existence check.

Tests: 8 new cases in tests/test_policy.py cover the new validation
and immutability paths. Full suite 304 passed / 12 skipped, ruff
clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/eval_pair_attack.py (1)

43-54: ⚠️ Potential issue | 🟠 Major

Malformed LLM responses can still crash the evaluation run.

The parsing at line 54 can raise JSONDecodeError, KeyError, or IndexError if the API returns non-JSON or an unexpected response shape. These exceptions are not caught by the URLError/TimeoutError handlers in run_pair_one (lines 128-129, 142-143), causing the entire evaluation to terminate on a single bad response.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval_pair_attack.py` around lines 43 - 54, call_llm currently assumes
the HTTP response is well-formed JSON and accesses
["choices"][0]["message"]["content"] directly, which can raise
JSONDecodeError/KeyError/IndexError and crash the run; wrap the response
reading/parsing in a try/except that catches json.JSONDecodeError, KeyError,
IndexError, and ValueError, and on failure raise a urllib.error.URLError (or
re-use an existing exception type handled by run_pair_one) with a clear message
that includes the raw response body for debugging; also defensively check the
HTTP response status (if available) and treat non-2xx as errors so
run_pair_one's existing URLError/TimeoutError handlers will catch malformed or
unexpected responses from call_llm.
🧹 Nitpick comments (1)
src/vaara/cli.py (1)

182-217: LGTM! The vaara trail purge implementation is solid.

The past review comment about expanduser() has been addressed (line 186). The function validates inputs, handles cleanup via try/finally, and provides appropriate user feedback including the hash-chain seam warning on stderr.

Minor style suggestion: Consider using the context manager protocol since SQLiteAuditBackend implements __enter__/__exit__ (per src/vaara/audit/sqlite_backend.py:816-821).

♻️ Optional refactor to use context manager
-    backend = SQLiteAuditBackend(db_path)
-    try:
-        count = backend.purge_older_than(retention_seconds, dry_run=args.dry_run)
-    finally:
-        backend.close()
+    with SQLiteAuditBackend(db_path) as backend:
+        count = backend.purge_older_than(retention_seconds, dry_run=args.dry_run)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/vaara/cli.py` around lines 182 - 217, Refactor _cmd_trail_purge to use
the context manager on SQLiteAuditBackend instead of manual try/finally: replace
creating backend = SQLiteAuditBackend(db_path) and the try/finally close pattern
with a with SQLiteAuditBackend(db_path) as backend: block, then call
backend.purge_older_than(retention_seconds, dry_run=args.dry_run) inside that
block and preserve all existing logic around args.dry_run, printing, and return
codes; this leverages SQLiteAuditBackend.__enter__/__exit__ while keeping
purge_older_than and the same error/print behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/eval_pair_attack.py`:
- Around line 4-7: The module-level docstring in scripts/eval_pair_attack.py
claims the run uses "Llama-3.3-70B as both attacker and judge" which conflicts
with COMPLIANCE.md (line 309) that documents "Qwen2.5-32B-Instruct (Apache
2.0)"; update the docstring text to accurately reflect the actual model used
(or, if the doc is wrong, update COMPLIANCE.md instead) so both sources match;
locate the module docstring in eval_pair_attack.py and replace the model string
to "Qwen2.5-32B-Instruct (Apache 2.0)" (or the correct model name you decide to
keep) and run a quick grep of the repository for other occurrences of
"Llama-3.3-70B" to ensure no stale references remain.

In `@tests/test_policy.py`:
- Around line 253-265: The module-level pytest.importorskip("yaml") causes all
tests in this file to be skipped when PyYAML is absent; move the import-or-skip
call into the YAML-specific tests (test_from_yaml_string and
test_from_yaml_path) so only those two tests are skipped. Specifically, remove
the top-level yaml = pytest.importorskip("yaml") and instead call yaml =
pytest.importorskip("yaml") at the start of each of test_from_yaml_string and
test_from_yaml_path (or use pytest.importorskip inside those functions and then
import yaml via pytest.importorskip), ensuring the yaml variable is available to
each test that needs it.

---

Duplicate comments:
In `@scripts/eval_pair_attack.py`:
- Around line 43-54: call_llm currently assumes the HTTP response is well-formed
JSON and accesses ["choices"][0]["message"]["content"] directly, which can raise
JSONDecodeError/KeyError/IndexError and crash the run; wrap the response
reading/parsing in a try/except that catches json.JSONDecodeError, KeyError,
IndexError, and ValueError, and on failure raise a urllib.error.URLError (or
re-use an existing exception type handled by run_pair_one) with a clear message
that includes the raw response body for debugging; also defensively check the
HTTP response status (if available) and treat non-2xx as errors so
run_pair_one's existing URLError/TimeoutError handlers will catch malformed or
unexpected responses from call_llm.

---

Nitpick comments:
In `@src/vaara/cli.py`:
- Around line 182-217: Refactor _cmd_trail_purge to use the context manager on
SQLiteAuditBackend instead of manual try/finally: replace creating backend =
SQLiteAuditBackend(db_path) and the try/finally close pattern with a with
SQLiteAuditBackend(db_path) as backend: block, then call
backend.purge_older_than(retention_seconds, dry_run=args.dry_run) inside that
block and preserve all existing logic around args.dry_run, printing, and return
codes; this leverages SQLiteAuditBackend.__enter__/__exit__ while keeping
purge_older_than and the same error/print behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 36120b96-bb11-4536-888b-061af7ccab76

📥 Commits

Reviewing files that changed from the base of the PR and between 902c5d5 and 11f0d11.

📒 Files selected for processing (7)
  • CHANGELOG.md
  • COMPLIANCE.md
  • scripts/eval_pair_attack.py
  • src/vaara/cli.py
  • src/vaara/policy/loader.py
  • src/vaara/policy/schema.py
  • tests/test_policy.py
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/vaara/policy/loader.py

Comment thread scripts/eval_pair_attack.py Outdated
Comment thread tests/test_policy.py Outdated
Cleans up the two new findings from the 11f0d11 re-review plus the two
eval-script majors that were originally deferred. v0.6 surface fully
clean — no outstanding CodeRabbit advisories.

New findings on 11f0d11:

- tests/test_policy.py — module-level pytest.importorskip("yaml") was
  silently skipping the JSON / validation / immutability tests above
  it on a non-[yaml] install. Moved into per-test invocations so only
  the two yaml-specific tests skip when pyyaml is absent.
- scripts/eval_pair_attack.py — docstring referenced Llama-3.3-70B as
  attacker/judge, but the v0.6 calibration ran on Qwen2.5-32B-Instruct
  (per CHANGELOG and COMPLIANCE.md). Updated docstring.

Originally deferred, now landed for completeness:

- scripts/eval_distribution_shift.py — wrap AdversarialClassifier()
  constructor inside the same try/except as the import. Without ML
  extras, the constructor itself raises ImportError per its docstring,
  which previously bypassed the SystemExit path and crashed with a
  raw traceback.
- scripts/eval_pair_attack.py — handle malformed LLM responses as
  per-seed errors. New LLMResponseError raised from call_llm() when
  the body fails to parse as JSON, lacks `choices`, or lacks
  `message.content`. Caller's except tuple extended so the run keeps
  going across the remaining seeds instead of dying on one bad
  response.

Tests still 304 passed / 12 skipped, ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/eval_distribution_shift.py (1)

77-77: Consider defaulting to "ERROR:Unknown" for consistency with evaluate() bucketing.

When result.decision is missing, the fallback "UNKNOWN" won't match any of the outcome branches in evaluate() (lines 113-120), so the entry counts toward n but no outcome counter increments. This silently skews metrics.

For consistency with eval_pair_attack.py (which defaults to "DENY") and proper metric accounting, consider either:

  • Returning an error-like string: f"ERROR:MissingDecision"
  • Or treating it as "DENY" to fail closed
♻️ Option A: Count as error
-    decision = str(getattr(result, "decision", "UNKNOWN")).upper()
+    raw = getattr(result, "decision", None)
+    if raw is None:
+        return "ERROR:MissingDecision"
+    decision = str(raw).upper()
♻️ Option B: Fail closed (match eval_pair_attack.py)
-    decision = str(getattr(result, "decision", "UNKNOWN")).upper()
+    raw = getattr(result, "decision", None)
+    decision = str(raw).upper() if raw is not None else "DENY"
+    if decision not in {"ALLOW", "ESCALATE", "DENY"}:
+        decision = "DENY"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval_distribution_shift.py` at line 77, The fallback for
result.decision should be an error-like string so evaluate() counts it in the
error bucket; change the assignment of the decision variable (currently decision
= str(getattr(result, "decision", "UNKNOWN")).upper()) to default to an ERROR
token (e.g., "ERROR:MissingDecision" or "ERROR:Unknown") instead of "UNKNOWN" so
the entry increments the error outcome in evaluate() and does not silently skew
metrics; update the decision assignment in scripts/eval_distribution_shift.py
(the decision variable) to use the chosen ERROR default and ensure it remains
uppercased.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_policy.py`:
- Around line 246-249: The test
test_from_json_unreadable_input_raises_policy_error is flaky because
from_json("not json at all") may hit a real file in the CWD; update the test to
use a guaranteed-missing tmp_path file instead: create a Path via the pytest
tmp_path fixture (e.g. tmp_path / "surely_missing.json"), ensure it does not
exist, and pass its string to from_json in the pytest.raises block so
from_json(...) deterministically raises PolicyError; keep the assertion text
match unchanged and reference the test function name and from_json call when
making the change.

---

Nitpick comments:
In `@scripts/eval_distribution_shift.py`:
- Line 77: The fallback for result.decision should be an error-like string so
evaluate() counts it in the error bucket; change the assignment of the decision
variable (currently decision = str(getattr(result, "decision",
"UNKNOWN")).upper()) to default to an ERROR token (e.g., "ERROR:MissingDecision"
or "ERROR:Unknown") instead of "UNKNOWN" so the entry increments the error
outcome in evaluate() and does not silently skew metrics; update the decision
assignment in scripts/eval_distribution_shift.py (the decision variable) to use
the chosen ERROR default and ensure it remains uppercased.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 9c645cf0-208b-46a9-bd88-b1cc548f247f

📥 Commits

Reviewing files that changed from the base of the PR and between 11f0d11 and 422f4d5.

📒 Files selected for processing (3)
  • scripts/eval_distribution_shift.py
  • scripts/eval_pair_attack.py
  • tests/test_policy.py

Comment thread tests/test_policy.py Outdated
…t skew

Two findings from the 11:23 UTC re-review:

- tests/test_policy.py:249 — Minor flakiness. The unreadable-input
  test passed `"not json at all"` to from_json, which would mistakenly
  succeed if a file with that exact name existed in the CWD. Switched
  to a tmp_path-based missing path so the test is deterministic.

- scripts/eval_distribution_shift.py:77 — Nitpick that CodeRabbit
  framed mildly but is functionally a measurement skew. Fallback
  "UNKNOWN" doesn't match any branch in evaluate() (deny / escalate
  / allow / error), so a missing result.decision would silently
  increment `n` without any outcome bucket, deflating recall and FPR
  reporting. Fail-closed pattern now matches eval_pair_attack.py
  (DENY for missing, DENY for unrecognised).

Tests still 304 passed / 12 skipped, ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/test_policy.py (1)

166-181: Consider parametrizing strict-shape mapping tests to reduce duplication.

These three tests share identical structure and could be consolidated with pytest.mark.parametrize for easier extension and maintenance.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_policy.py` around lines 166 - 181, Consolidate the three
near-identical tests into a single parametrized test using
pytest.mark.parametrize: iterate over the field names "action_classes",
"thresholds", and "escalation", build bad = {**MINIMAL_POLICY, field: []}, and
assert from_dict(bad) raises PolicyError with the appropriate regex f"{field}:
must be a mapping"; update or remove the original
test_action_classes_must_be_mapping_not_list,
test_thresholds_must_be_mapping_not_list, and
test_escalation_must_be_mapping_not_list accordingly so the single test covers
all three cases while still referencing MINIMAL_POLICY, from_dict, and
PolicyError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/eval_distribution_shift.py`:
- Around line 120-128: The code currently records ERROR outcomes into b["error"]
but still proceeds to compute and return metrics; update the post-counting logic
to abort the run if any bucket has b["error"] > 0: after the loop that
builds/upserts buckets and before the final return that computes percentages,
scan buckets for any nonzero "error" counts and raise a non-zero exit (e.g.,
raise SystemExit or RuntimeError) with a clear message listing the offending
src/cls keys and counts so the process fails instead of publishing invalid
percentages; apply this check to the same counting logic that uses variables
decision, b, buckets and the final return that spreads counts into the result.
- Around line 165-166: The current corpus validation only checks
corpus_root.exists(), which allows non-directories or empty loads to proceed;
update the validation to ensure corpus_root.is_dir() and after loading JSONL
compute the total entries (e.g., the variable tracking loaded entries or
total_entries) and raise SystemExit with a clear message if total_entries == 0;
modify the checks around corpus_root and the JSONL load logic (referencing
corpus_root and the variable that accumulates loaded entries) so the script
fails fast when the corpus path is not a directory or when no entries were
loaded.
- Around line 99-106: The function expected_category should validate the
entry["expected"] value instead of coercing unknown/missing values to "attack":
in expected_category, check that "expected" exists and is either a string
"ALLOW"/"DENY" (case-insensitive) or a non-empty list of such strings, normalize
to uppercase, and raise a ValueError with a descriptive message if the type or
contained labels are invalid or empty; then return "benign" only when the
normalized set is exactly {"ALLOW"} and "attack" when exactly {"DENY"} (or the
appropriate normalized set), so malformed fixtures fail fast (refer to function
expected_category and the local variables expected and expected_set).

---

Nitpick comments:
In `@tests/test_policy.py`:
- Around line 166-181: Consolidate the three near-identical tests into a single
parametrized test using pytest.mark.parametrize: iterate over the field names
"action_classes", "thresholds", and "escalation", build bad = {**MINIMAL_POLICY,
field: []}, and assert from_dict(bad) raises PolicyError with the appropriate
regex f"{field}: must be a mapping"; update or remove the original
test_action_classes_must_be_mapping_not_list,
test_thresholds_must_be_mapping_not_list, and
test_escalation_must_be_mapping_not_list accordingly so the single test covers
all three cases while still referencing MINIMAL_POLICY, from_dict, and
PolicyError.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a3a0254e-584c-4acd-9d61-a8d1ed7103ad

📥 Commits

Reviewing files that changed from the base of the PR and between 422f4d5 and a784831.

📒 Files selected for processing (2)
  • scripts/eval_distribution_shift.py
  • tests/test_policy.py

Comment thread scripts/eval_distribution_shift.py
Comment thread scripts/eval_distribution_shift.py
Comment thread scripts/eval_distribution_shift.py Outdated
Three findings on a784831, all in scripts/eval_distribution_shift.py:

- expected_category() (Major). Missing or invalid `expected` fields
  silently coerced to "DENY" then categorised as "attack", which
  would skew recall and FPR if a fixture had a typo. Now raises
  ValueError naming the offending entry id and the bad value(s).
  Allowed set: {ALLOW, DENY, ESCALATE}.

- report() (Major). Metrics were printed even when buckets recorded
  ERROR outcomes, producing publishable but invalid percentages.
  Aborts via SystemExit if any bucket has errors > 0.

- main() (Minor). corpus_root only checked for `.exists()` and the
  loop continued with zero entries (empty "successful" run). Now
  requires `.is_dir()` and aborts when zero JSONL entries load.

Smoke-tested expected_category locally: missing field, bogus value,
ALLOW, DENY all behave as expected. Full suite 304 passed / 12
skipped, ruff clean.

Note: the v0.6 published numbers came from runs where every entry had
a valid `expected` and zero pipeline errors, so this hardening doesn't
invalidate the existing artifacts. It just prevents future accidental
fixture drift from producing silent skew.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@vaaraio
Copy link
Copy Markdown
Owner Author

vaaraio commented Apr 27, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

✅ Actions performed

Full review triggered.

Adds scripts/lint_full.sh as the pre-push hygiene gate so CodeRabbit-
class findings get caught locally before PR review round-trips.
Documented in CONTRIBUTING.md and listed in the v0.6.0 CHANGELOG.

Tools:
- ruff (already in [dev]) — style + correctness
- bandit>=1.7.5 (new) — security-focused static analysis
- mypy>=1.8 (new) — types; strict on vaara.policy, lenient on legacy
  via [[tool.mypy.overrides]]. Modules join the strict block as they
  get cleaned up, so the typing floor only ratchets upward.
- pytest (already)

Bandit configuration in pyproject.toml:
- skips=["B608"] with explanation. Every f-string SQL in
  audit/sqlite_backend.py interpolates only the output of
  _tenant_clause(), which returns one of two literal strings
  ("tenant_id = ?" or "1=1"). User values always go through `?`
  parameter binding. Bandit can't see the constraint, so the pattern
  trips B608 uniformly. False positive for our usage.
- Two inline `# nosec` annotations:
  - mc_dropout_gate.py:94 (B614) — trusted-bundle fallback after
    weights_only=True already failed because the bundle contains
    sklearn objects. Operator gets a logger.warning telling them to
    ensure provenance.
  - trace_gen.py:201 (B311) — random.Random for synthetic-trace
    sampling, not security-sensitive.

vaara.policy made strictly mypy-clean to seed the typing floor:
- _coerce_enum is now generic on the enum type (TypeVar EnumT bound
  to Enum) so call sites get the precise enum back rather than object.
- _require_mapping / _require_sequence get explicit object/dict/list
  annotations.
- The yaml import type-ignore narrowed to [import-untyped] (the
  actual code mypy emits) instead of [import-not-found].

Local sweep runtime: ~10s. Tests still 304 passed / 12 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

♻️ Duplicate comments (3)
scripts/eval_stack_ablation.py (3)

168-173: ⚠️ Potential issue | 🟠 Major

Wrap AdversarialClassifier() in the same try block as the import.

The constructor can still raise ImportError when optional ML deps are missing, so the current code can fall through to an unhandled traceback even after the import guard.

💡 Proposed fix
     try:
         from vaara.adversarial_classifier import AdversarialClassifier
+        classifier = AdversarialClassifier()
     except ImportError as exc:
         raise SystemExit(f"this script needs vaara[ml]: {exc}")
-    classifier = AdversarialClassifier()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval_stack_ablation.py` around lines 168 - 173, The instantiation of
AdversarialClassifier can raise ImportError and must be inside the same
try/except as the import; move the call to AdversarialClassifier() into the try
block that imports vaara.adversarial_classifier and catch ImportError as exc,
then raise SystemExit with a clear message including exc (i.e., wrap both "from
vaara.adversarial_classifier import AdversarialClassifier" and "classifier =
AdversarialClassifier()" in the same try and keep the existing
print(f"[classifier] bundle v{classifier.bundle_version},
threshold={classifier.threshold}") after the try).

62-65: ⚠️ Potential issue | 🟠 Major

Validate expected labels before bucketing them.

Missing or malformed expected values are silently treated as "attack" here, which can skew both recall and benign FPR. This should fail fast instead of defaulting to "DENY".

💡 Proposed fix
 def expected_class(entry: dict) -> str:
-    expected = entry.get("expected", "DENY")
-    s = {str(x).upper() for x in (expected if isinstance(expected, list) else [expected])}
+    if "expected" not in entry:
+        raise ValueError("missing required field: expected")
+    expected = entry["expected"]
+    values = expected if isinstance(expected, list) else [expected]
+    s = {str(x).upper() for x in values}
+    allowed = {"ALLOW", "DENY", "ESCALATE"}
+    if not s or not s.issubset(allowed):
+        raise ValueError(f"invalid expected value(s): {sorted(s)}")
     return "benign" if s == {"ALLOW"} else "attack"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval_stack_ablation.py` around lines 62 - 65, The function
expected_class currently defaults missing/malformed expected values to "DENY"
and silently classifies them as "attack"; update expected_class to validate the
entry["expected"] value(s) (accepting either a string or list of strings,
case-insensitive) and raise a clear exception when expected is missing or
contains any value outside the allowed set {"ALLOW","DENY"} instead of
defaulting, then keep the existing bucketing logic (treat {"ALLOW"} -> "benign",
otherwise "attack"); refer to the expected_class function and the local variable
expected/entry to implement this validation and error handling.

133-155: ⚠️ Potential issue | 🟠 Major

Abort the run before reporting metrics if any ERROR:* decisions occurred.

The script records err counts, but it still prints and optionally writes metrics afterward. That can publish invalid percentages from a partially failed run.

💡 Proposed fix
 def report(counts, json_out: Path | None) -> None:
     """Print per-config / per-source recall + FPR table."""
+    total_errors = sum(c["err"] for by_key in counts.values() for c in by_key.values())
+    if total_errors:
+        raise SystemExit(
+            f"aborting: {total_errors} ERROR outcome(s) present; "
+            "fix corpus/runtime issues before publishing metrics"
+        )
+
     rows = []
     print("\n=== Stack ablation (v0.5.3) ===")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/eval_stack_ablation.py` around lines 133 - 155, In report(), before
printing or writing metrics, detect any recorded errors by scanning counts (for
each config in ("heuristic_only","classifier_only","full_stack") and each bucket
c in counts[config]) and if any c["err"] > 0 abort the run (raise SystemExit or
call sys.exit with non‑zero) after printing a short diagnostic message; do this
check early in the function (using the existing counts, config_name, c and err
keys) so you never print or write metrics to json_out when there were ERROR:*
decisions.
🧹 Nitpick comments (2)
tests/test_audit_purge.py (1)

17-31: Seed purge tests through a real hash chain.

_make_record() computes standalone hashes with an empty previous_hash, so any sequence written directly with backend.write_record() is already chain-broken before purge. That makes it impossible for this module to assert the documented retention-boundary seam separately from setup artifacts. Consider seeding via AuditTrail or linking previous_hash/record_hash explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_audit_purge.py` around lines 17 - 31, The helper _make_record
builds AuditRecord objects but computes record_hash with an empty previous_hash,
breaking any chain when tests call backend.write_record() directly; instead seed
tests with a proper chain by either creating/appending records through
AuditTrail (use AuditTrail.append_record or whatever method ensures
previous_hash is set) or, if creating records manually, set each
record.previous_hash to the prior record.record_hash before calling
rec.record_hash = rec.compute_hash() so the linked hash chain is intact for
purge boundary assertions (refer to AuditRecord, compute_hash, previous_hash,
record_hash, backend.write_record, and AuditTrail).
tests/test_transparency_taxonomy.py (1)

128-167: Make the migration test prove the columns were actually added.

_row_to_record() now treats missing row[12:16] as None, so this test still passes even if the v2→v3 migration never creates the four transparency columns. Please assert the schema contains those columns, or do a post-migration write/read round-trip that would fail on a partially upgraded DB.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_transparency_taxonomy.py` around lines 128 - 167, The test
currently doesn't prove the v2→v3 migration actually added the four transparency
columns; after opening the DB with SQLiteAuditBackend in
test_migration_from_v2_to_v3, query the migrated schema (e.g., PRAGMA
table_info('audit_records')) and assert that columns 'system_operation',
'data_usage', 'decision_making', and 'limitations' are present, or perform a
post-migration write/read round-trip via SQLiteAuditBackend (insert a record
with those fields set, then reload it) to fail if the columns were never
created; reference the test function test_migration_from_v2_to_v3 and the
SQLiteAuditBackend/_row_to_record code paths so the assertion validates the
actual schema change rather than relying on None defaults.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 13-14: Update the CHANGELOG wording to avoid implying "vaara trail
verify" validates the live DB after purge: clarify that src/vaara/cli.py's vaara
trail verify only validates signed zip bundles (the exported handoff) and that
those bundles remain self-consistent, while the live DB/trail will show a
deliberate chain seam where deleted records are referenced via previous_hash;
recommend exporting a signed handoff zip and using vaara trail verify on that
bundle, and explicitly state that the live DB chain break at the retention
boundary is expected and not validated by vaara trail verify.

In `@scripts/eval_distribution_shift.py`:
- Around line 69-73: The call to pipe.intercept is collapsing missing agent_id
values to the constant "adv", causing unrelated samples to share history; change
the fallback for agent_id in the pipe.intercept call (where
agent_id=entry.get("agent_id", "adv")) to produce a unique per-sample id (for
example derive from an existing entry unique field, use the loop index, or
generate a uuid) so each sample without an explicit agent_id gets its own
distinct identifier instead of "adv".

In `@scripts/eval_pair_attack.py`:
- Around line 161-163: The --model argument currently defaults to "llama" which
can cause silent mismatches; update the ap.add_argument for "--model" (the
parser setup in scripts/eval_pair_attack.py) to make the served model explicit
by either setting required=True or changing the default to the documented
"Qwen2.5-32B-Instruct", and update the help string to mention it must match the
vLLM --served-model-name to ensure reproducible evaluations.
- Around line 173-181: The ImportError branch currently falls back silently to
heuristic-only behavior, making published PAIR runs non-reproducible; change the
logic around importing AdversarialClassifier so that if AdversarialClassifier
cannot be imported or instantiation fails, the script aborts with a clear
error/exit (instead of setting classifier = None) and a message instructing the
user to install vaara[ml]; update the block that currently mentions
AdversarialClassifier and classifier (and the subsequent Pipeline usage) to
validate classifier is available before proceeding and exit/raise with a
descriptive error if not.

In `@scripts/eval_stack_ablation.py`:
- Around line 95-102: full_stack currently re-invokes heuristic_only(entry,
pipe) which can mutate sequence/agent-history state and make comparisons
invalid; change full_stack so it does NOT call heuristic_only again — instead
either accept the heuristic result as an extra parameter (e.g., def
full_stack(entry: dict, pipe: Pipeline, classifier, heuristic_result: str) ->
str) or ensure the caller computes h = heuristic_only(...) once and passes it
in; then use that h and only call classifier_only(entry, classifier) when h ==
"ALLOW". Apply the same fix to the other occurrence referenced (the block around
lines 121-125) so heuristic_only is computed once and reused rather than re-run.

In `@src/vaara/audit/sqlite_backend.py`:
- Around line 137-142: The migration logic in _run_migrations() must not
unconditionally set schema_version to 3 after any exception during the ALTER
TABLE block; change it so that the ALTERs for adding system_operation,
data_usage, decision_making, and limitations are executed atomically (either run
them inside a single transaction or run each ALTER and only mark success after
all four succeed) and ensure that on any exception you do not update
schema_version and instead surface the error (or return a failure) so
write_record() won't assume the columns exist; update _run_migrations() to only
update schema_version when all ALTERs complete without error and reference the
ALTER statements and the schema_version update in that function to implement
this behavior.

In `@src/vaara/cli.py`:
- Around line 199-201: The purge command currently instantiates
SQLiteAuditBackend(db_path) without a tenant, causing purge_older_than to act
globally; update the CLI parser for "vaara trail purge" to accept a tenant
override (e.g., --tenant or --tenant-id) and pass that tenant into the audit
backend instantiation or directly into purge_older_than so the tenant-scoped
clause is used; specifically modify the parser code around the purge command in
src/vaara/cli.py and change the backend construction
(SQLiteAuditBackend(db_path) -> SQLiteAuditBackend(db_path,
tenant_id=parsed_tenant)) or call purge_older_than(retention_seconds,
dry_run=args.dry_run, tenant_id=args.tenant) so the _tenant_clause() in
src/vaara/audit/sqlite_backend.py no longer falls back to "1=1" and only deletes
records for the requested tenant (also apply the same tenant wiring to the other
purge-related block around lines 258-274).

In `@src/vaara/policy/loader.py`:
- Around line 182-201: When reading input paths in the loader functions (e.g.
from_json and from_yaml) the code only converts FileNotFoundError to PolicyError
but other read_text failures (IsADirectoryError, PermissionError,
UnicodeDecodeError, generic OSError) currently leak; update the Path.read_text()
exception handlers to catch and normalize all file-read errors (catch a tuple
like (FileNotFoundError, IsADirectoryError, PermissionError, UnicodeDecodeError,
OSError) or simply OSError/Exception as appropriate) and raise
PolicyError(f"input is neither a JSON/YAML object nor a readable file path:
{source!r}") (or a clear message used elsewhere) from the original exception; do
the same normalization in both the JSON loader block around Path.read_text() and
the YAML loader block so callers never see raw OS/file exceptions (keep JSON
decoding still wrapped as PolicyError when json.loads fails).
- Around line 68-95: The loader currently accepts arbitrary keys under
thresholds.default and thresholds.<name> (in thr_block) and silently coerces
unknown keys; update the code that builds thresholds_default and
thresholds_overrides to validate allowed keys only ("escalate" and "deny") and
raise a PolicyError for any unknown key (include the path like
"thresholds.default: unknown key 'deni'" or "thresholds.<name>: unknown key
'denial'"). Specifically, in the handling of default_block and in the loop that
processes thr_block items (where thresholds_default, thresholds_overrides,
Thresholds(...), thr_block, and _require_mapping are used), check the
set(override.keys()) (and set(default_block.keys()) for default) against
{"escalate","deny"} before float coercion and raise PolicyError with a clear
message if extras are present.

In `@tests/adversarial/pair_v0_5_3.json`:
- Around line 6-8: The fixture metadata currently stores an ambiguous "model"
value ("qwen32b"); update tests/adversarial/pair_v0_5_3.json to record the full,
unambiguous attacker and judge model identifiers and the attacker-vs-judge
pairing so runs are reproducible: replace the single "model" field with explicit
keys (e.g., "attacker_model" and "judge_model" or a single "model_pair" string)
and ensure each entry in the "results" array references those full IDs so the
attacker/judge setup is preserved in the fixture metadata.

---

Duplicate comments:
In `@scripts/eval_stack_ablation.py`:
- Around line 168-173: The instantiation of AdversarialClassifier can raise
ImportError and must be inside the same try/except as the import; move the call
to AdversarialClassifier() into the try block that imports
vaara.adversarial_classifier and catch ImportError as exc, then raise SystemExit
with a clear message including exc (i.e., wrap both "from
vaara.adversarial_classifier import AdversarialClassifier" and "classifier =
AdversarialClassifier()" in the same try and keep the existing
print(f"[classifier] bundle v{classifier.bundle_version},
threshold={classifier.threshold}") after the try).
- Around line 62-65: The function expected_class currently defaults
missing/malformed expected values to "DENY" and silently classifies them as
"attack"; update expected_class to validate the entry["expected"] value(s)
(accepting either a string or list of strings, case-insensitive) and raise a
clear exception when expected is missing or contains any value outside the
allowed set {"ALLOW","DENY"} instead of defaulting, then keep the existing
bucketing logic (treat {"ALLOW"} -> "benign", otherwise "attack"); refer to the
expected_class function and the local variable expected/entry to implement this
validation and error handling.
- Around line 133-155: In report(), before printing or writing metrics, detect
any recorded errors by scanning counts (for each config in
("heuristic_only","classifier_only","full_stack") and each bucket c in
counts[config]) and if any c["err"] > 0 abort the run (raise SystemExit or call
sys.exit with non‑zero) after printing a short diagnostic message; do this check
early in the function (using the existing counts, config_name, c and err keys)
so you never print or write metrics to json_out when there were ERROR:*
decisions.

---

Nitpick comments:
In `@tests/test_audit_purge.py`:
- Around line 17-31: The helper _make_record builds AuditRecord objects but
computes record_hash with an empty previous_hash, breaking any chain when tests
call backend.write_record() directly; instead seed tests with a proper chain by
either creating/appending records through AuditTrail (use
AuditTrail.append_record or whatever method ensures previous_hash is set) or, if
creating records manually, set each record.previous_hash to the prior
record.record_hash before calling rec.record_hash = rec.compute_hash() so the
linked hash chain is intact for purge boundary assertions (refer to AuditRecord,
compute_hash, previous_hash, record_hash, backend.write_record, and AuditTrail).

In `@tests/test_transparency_taxonomy.py`:
- Around line 128-167: The test currently doesn't prove the v2→v3 migration
actually added the four transparency columns; after opening the DB with
SQLiteAuditBackend in test_migration_from_v2_to_v3, query the migrated schema
(e.g., PRAGMA table_info('audit_records')) and assert that columns
'system_operation', 'data_usage', 'decision_making', and 'limitations' are
present, or perform a post-migration write/read round-trip via
SQLiteAuditBackend (insert a record with those fields set, then reload it) to
fail if the columns were never created; reference the test function
test_migration_from_v2_to_v3 and the SQLiteAuditBackend/_row_to_record code
paths so the assertion validates the actual schema change rather than relying on
None defaults.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 15c3473a-5cc9-4516-a1b8-7d0a4005b051

📥 Commits

Reviewing files that changed from the base of the PR and between 45c3ee3 and e4d8d69.

📒 Files selected for processing (22)
  • .gitignore
  • CHANGELOG.md
  • COMPLIANCE.md
  • examples/policies/full.yaml
  • examples/policies/minimal.json
  • pyproject.toml
  • scripts/eval_distribution_shift.py
  • scripts/eval_pair_attack.py
  • scripts/eval_stack_ablation.py
  • src/vaara/__init__.py
  • src/vaara/audit/sqlite_backend.py
  • src/vaara/audit/trail.py
  • src/vaara/cli.py
  • src/vaara/policy/__init__.py
  • src/vaara/policy/loader.py
  • src/vaara/policy/schema.py
  • tests/adversarial/distribution_shift_v0_5_3.json
  • tests/adversarial/pair_v0_5_3.json
  • tests/adversarial/stack_ablation_v0_5_3.json
  • tests/test_audit_purge.py
  • tests/test_policy.py
  • tests/test_transparency_taxonomy.py

Comment thread CHANGELOG.md Outdated
Comment thread scripts/eval_distribution_shift.py
Comment thread scripts/eval_pair_attack.py
Comment thread scripts/eval_pair_attack.py Outdated
Comment thread scripts/eval_stack_ablation.py Outdated
Comment thread src/vaara/audit/sqlite_backend.py
Comment thread src/vaara/cli.py Outdated
Comment thread src/vaara/policy/loader.py
Comment thread src/vaara/policy/loader.py
Comment thread tests/adversarial/pair_v0_5_3.json
CodeRabbit's @Full review surfaced 10 actionable items spanning audit
data integrity, the public policy loader API, the v0.6 reproducibility
scripts, and a CHANGELOG wording bug. All real, all addressed in one
commit on top of the lint sweep (b569556).

Audit data integrity (Major):

- audit/sqlite_backend.py:_run_migrations was unsafe under partial
  failure. Each statement ran with `except Exception` swallowing every
  error before the schema_version got bumped unconditionally at the
  end. A real ALTER TABLE failure mid-migration left the DB marked at
  the new version while missing columns, then write_record() blew up
  on next insert. Fixed: only OperationalError messages matching
  "duplicate column" or "already exists" are swallowed (the idempotent-
  re-run case); everything else propagates. schema_version bumped
  per-version inside the loop, so a v2→v3 success followed by v3→v4
  failure leaves the DB correctly at v3 instead of stuck at v2 or
  falsely advanced to v4. Regression test exercises a DB with
  partial pre-existing columns to confirm idempotency.

- cli.py: `vaara trail purge` previously instantiated the backend with
  no tenant_id, which routes to _tenant_clause() == "1=1". On a
  shared multi-tenant audit DB that means every tenant's old records
  get deleted by anyone running the command. Tenant scoping is now
  required: the parser takes a mutually-exclusive --tenant TID or
  --all-tenants choice, both backed by an explicit decision.

Policy loader public-API hygiene (Major):

- policy/loader.py: unknown threshold keys now rejected. A typo like
  thresholds.default.deni would float-coerce, store, then silently
  fall back to default at threshold_for() query time. New
  _reject_unknown_keys() helper validates default and per-action
  override blocks against {escalate, deny}.

- policy/loader.py: read-error normalisation. from_json() previously
  caught only FileNotFoundError; from_yaml() caught nothing. Public
  callers of either could see raw IsADirectoryError, PermissionError,
  UnicodeDecodeError, or generic OSError. New _read_policy_text()
  helper wraps Path.read_text() and translates the full set into
  PolicyError with a path-specific message.

v0.6 reproducibility scripts (Major):

- eval_stack_ablation.full_stack() previously called heuristic_only()
  again, double-running pipe.intercept() per entry and advancing
  sequence/agent-history state on the second pass. The published v0.6
  ablation table assumes one intercept call per entry per config.
  Now full_stack takes the precomputed heuristic and classifier
  decisions and derives the upgrade rule from those, so the loop in
  evaluate() runs each pipeline path exactly once per entry.

- eval_distribution_shift.py + eval_stack_ablation.py: per-entry
  agent_id derived from entry id (or python id() fallback) instead of
  collapsing every sample to a constant "adv". Sequence/agent-history
  state stays isolated across samples, matching corpus-measurement
  semantics. Same anti-pattern fixed in both scripts for consistency.

- eval_pair_attack.py: classifier import is now a hard requirement.
  Silent fallback to heuristic-only meant a default invocation of the
  script wouldn't reproduce the published full-stack ASR — anyone
  re-running could legitimately report "ASR=N%" against a different
  stack and not notice. Now raises SystemExit if vaara[ml] is absent.

Reproducibility metadata (Major):

- tests/adversarial/pair_v0_5_3.json: model field expanded to
  attacker_model + judge_model + model_alias so a downstream auditor
  can see exactly what ran. Both attacker and judge were Qwen2.5-32B-
  Instruct per CHANGELOG / COMPLIANCE; the alias "qwen32b" stays for
  back-compat with any downstream code keying on it.

Minor:

- CHANGELOG.md v0.6.0 entry: removed misleading "vaara trail verify
  reports a chain break at the retention boundary" wording — that
  subcommand only validates signed zip bundles, not the live DB.
  Replaced with "the seam is exposed at load via hash mismatch", and
  the bullet now documents the new --tenant / --all-tenants gate.

- eval_pair_attack.py: --model default flipped from "llama" to
  "Qwen2.5-32B-Instruct" so a default invocation matches the
  documented v0.6 calibration.

Tests:
- 6 new cases (4 in test_policy, 1 in test_transparency_taxonomy
  partial-migration regression, 1 implicit via existing audit_purge).
- Full suite 310 passed / 12 skipped, ruff + bandit + mypy clean.
- scripts/lint_full.sh exits PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@vaaraio vaaraio merged commit 68085c4 into main Apr 27, 2026
10 checks passed
@vaaraio vaaraio deleted the feat/v0.6-policy-schema branch April 27, 2026 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants