16 commits
64b180d
feat: v0.6 policy schema (Sketch A) — JSON-native loader + [yaml] extra
vaaraio Apr 27, 2026
0687457
feat: v0.6 retention purge — Article 12(2) enforcement with documente…
vaaraio Apr 27, 2026
75a16a2
docs: COMPLIANCE.md — Annex IV evidence sections + CEN-CENELEC alignment
vaaraio Apr 27, 2026
5211933
feat: v0.6 distribution-shift split — hand-curated vs LLM-generated r…
vaaraio Apr 27, 2026
fdc7b33
feat: v0.6 stack ablation — heuristic / classifier / full-stack measu…
vaaraio Apr 27, 2026
dfc0e20
feat: v0.6 transparency taxonomy — prEN ISO/IEC 12792 four-axis tagging
vaaraio Apr 27, 2026
8bfb83f
feat: v0.6 PAIR adaptive-attacker calibration — ASR 0.0% (0/25)
vaaraio Apr 27, 2026
b0e9545
chore: bump version 0.5.3 -> 0.6.0 + CHANGELOG entry
vaaraio Apr 27, 2026
4c3e3c4
chore: gitignore research/ and .claude/ scratch dirs
vaaraio Apr 27, 2026
902c5d5
test: silence CodeQL findings in test_transparency_taxonomy
vaaraio Apr 27, 2026
11f0d11
fix: address CodeRabbit major findings on PR #42
vaaraio Apr 27, 2026
422f4d5
fix: address remaining CodeRabbit findings on PR #42
vaaraio Apr 27, 2026
a784831
fix: address CodeRabbit findings on 422f4d5 — flaky test + measuremen…
vaaraio Apr 27, 2026
e4d8d69
fix: harden eval_distribution_shift script per CodeRabbit round-4
vaaraio Apr 27, 2026
b569556
chore: pre-push lint sweep — ruff + bandit + mypy + pytest
vaaraio Apr 27, 2026
95bb0f4
fix: address all 10 CodeRabbit full-review findings
vaaraio Apr 27, 2026
2 changes: 2 additions & 0 deletions .gitignore
@@ -15,3 +15,5 @@ venv/
docs/blog/
docs/grant/
dist_verify/
research/
.claude/
32 changes: 32 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,38 @@ All notable changes to this project are documented here.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.6.0] - 2026-04-27

**Theme: standards alignment + legibility.** v0.5.x was the capability axis (jailbreak coverage closed, classifier rebalanced). v0.6 is the legibility axis: policies become readable, audit records become standards-aligned, adversarial numbers become honest, architecture contribution becomes documented.

### Added
- **`vaara.policy` package — JSON-native policy loader plus optional YAML via `vaara[yaml]` extra.** Frozen dataclasses for action classes, threshold curves, sequence patterns, and escalation routes. Hand-rolled validation with field-path error messages. Reuses existing `vaara.taxonomy.actions` enums verbatim. Threshold partial-overrides supported (set just `deny`, inherit default `escalate`). Implements Sketch A from the v0.6 DSL design exploration; embedded Python DSL (Sketch B) and standalone DSL (Sketch C) stay deferred to v0.7+ pending external pull.
- **`vaara trail purge --db PATH --retention-days N (--tenant TID | --all-tenants) [--dry-run]` CLI subcommand** plus `SQLiteAuditBackend.purge_older_than(seconds, *, dry_run=False)` Python API. Article 12(2) retention enforcement. Tenant scoping is required: pick `--tenant TID` for a single tenant or `--all-tenants` explicitly, so a shared multi-tenant audit DB can never be silently purged across all tenants. Hash-chain integrity: surviving records still reference deleted predecessors via `previous_hash`, leaving a documented seam at the retention boundary that subsequent loads expose as a hash mismatch. Intended workflow: export a signed handoff zip BEFORE purging, archive externally, then purge. The signed zip remains self-consistent forever; the live DB chain has the seam.
- **prEN ISO/IEC 12792 four-axis transparency taxonomy on `AuditRecord`.** Four optional fields (`system_operation`, `data_usage`, `decision_making`, `limitations`) with default-classification heuristic per `EventType`. Per-record override via construction kwargs. NOT tamper-evident in v0.6 — fields are metadata annotations excluded from `record_hash` so pre-v0.6 chains stay valid. v0.7+ may add a separate signing mechanism if compliance requires.
- **`scripts/eval_distribution_shift.py`** — runs the full Vaara stack against the adversarial corpus with per-source tagging (hand-curated vs LLM-generated). Reports recall and FPR per source/class.
- **`scripts/eval_stack_ablation.py`** — runs three configurations (heuristic-only, classifier-only, full-stack) against the same corpus. Quantifies the independent contribution of each layer.
- **`scripts/eval_pair_attack.py`** — PAIR (Chao et al. 2023) iterative adaptive attacker. Uses an OpenAI-compatible vLLM endpoint for both attacker and judge roles. Zero new runtime deps (uses `urllib.request`).
- **`[yaml]` optional extra in `pyproject.toml`** (`pyyaml>=6.0`). Core `dependencies = []` preserved.
- **`examples/policies/minimal.json` and `full.yaml`** as reference policies.
- **COMPLIANCE.md gains "EU AI Act Annex IV evidence sections"** (maps Vaara contribution per §1–§9; direct fill on §3, §5, §9; contributes on §2, §4, §6, §7; out of scope for §1, §8) **and "CEN-CENELEC harmonised standards alignment"** (per-standard table for ISO/IEC 42001, prEN 18286, prEN 18228, ISO/IEC 42006, prEN ISO/IEC 24970, prEN 18229-1, prEN ISO/IEC 12792).
- **`scripts/lint_full.sh` pre-push lint sweep** — chains `ruff` (style + correctness), `bandit` (security), `mypy` (types — strict on `vaara.policy`, lenient on legacy modules), and `pytest`. Documented in CONTRIBUTING.md. Catches CodeRabbit-class findings before they hit a PR review round-trip. New dev extras: `bandit>=1.7.5`, `mypy>=1.8`. Bandit configured in `pyproject.toml` to skip B608 across `audit/sqlite_backend.py` (all f-string SQL there interpolates only internally-controlled tenant clauses, not user input). Two `# nosec` annotations document the remaining trusted-bundle and synthetic-trace-RNG sites.
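
The threshold partial-override behaviour listed for `vaara.policy` above (set just `deny`, inherit the default `escalate`) can be pictured with a small sketch. Everything in it is hypothetical: the `Thresholds` dataclass, the `load_thresholds` helper, the default values, and the JSON key layout are illustrative assumptions, not the package's actual schema or API.

```python
import json
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical threshold pair; the default values are placeholders."""
    escalate: float = 0.5
    deny: float = 0.8


def load_thresholds(raw: str, path: str = "thresholds") -> Thresholds:
    """Parse a JSON object and overlay it on the defaults.

    Absent keys inherit the default. Unknown keys and out-of-range
    values raise ValueError with a field-path message.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: expected an object")
    known = {f.name for f in fields(Thresholds)}
    for key, value in data.items():
        if key not in known:
            raise ValueError(f"{path}.{key}: unknown field")
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{path}.{key}: expected a number")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{path}.{key}: must be within [0, 1]")
    # replace() builds a new frozen instance with only the given fields changed.
    merged = replace(Thresholds(), **{k: float(v) for k, v in data.items()})
    if merged.escalate > merged.deny:
        raise ValueError(f"{path}: escalate must not exceed deny")
    return merged


# Only `deny` is supplied; `escalate` falls back to the placeholder default.
partial = load_thresholds('{"deny": 0.9}')
print(partial)
```

Overlaying onto a frozen default instance keeps the loaded policy immutable, and checking keys before merging lets a typo such as `denny` fail with its field path rather than being silently ignored.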

### Changed
- **Audit DB schema v2 → v3.** Migration `_MIGRATIONS[2]` adds four nullable transparency columns to `audit_records`. Pre-v0.6 records get NULL for the new columns; their stored `record_hash` is preserved (NOT re-hashed on load), so chain verification of historical records continues to work.
- **COMPLIANCE.md "Current limits"** replaced placeholder bullets with v0.6 measurement results:
  - **Distribution-shift split.** Hand-curated (held-out, 250): attack recall 97.1% / benign FPR 70.0%. LLM-generated (in-sample, 5,705): attack recall 95.2% / benign FPR 87.5%. The 17.5pp benign-FPR gap is the dominant distribution-shift signal.
  - **Stack composition.** `heuristic_only` recall 35% / 63% (hand-curated / LLM-generated). `classifier_only` recall 94% / 86%. `full_stack` recall 97% / 98%. The layers are not redundant: the heuristic catches a small set of attacks the classifier misses, which justifies the ensemble. Most full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades.
  - **PAIR adaptive-attacker calibration.** Qwen2.5-32B-Instruct as both attacker and judge, 25 hand-curated jailbreak seeds, max 5 iterations: **ASR 0.0% (0/25)**. This is NOT a claim of imperviousness to all adaptive attackers: a stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR.

### Deferred to v0.7+
- **prEN ISO/IEC 24970 field-alias layer** — pending public final of the standard. Will land when 24970 publishes.
- **DORA mapping refinement** — pending deployer-side signal. Conservative defaults shipped in v0.5.3 stay until a financial deployer's input refines them.

### Reproducible artifacts
- `tests/adversarial/distribution_shift_v0_5_3.json`
- `tests/adversarial/stack_ablation_v0_5_3.json`
- `tests/adversarial/pair_v0_5_3.json`

## [0.5.3] - 2026-04-26

### Fixed
139 changes: 128 additions & 11 deletions COMPLIANCE.md
@@ -52,6 +52,30 @@ Vaara produces is the feedstock a deployer uses to satisfy 26(1)
26(5) ("monitor operation"), and 26(6) ("keep logs"). Deployer conduct
outside the Vaara pipeline is not in scope.

## EU AI Act Annex IV evidence sections

Annex IV defines nine technical documentation sections required under
Article 11. Vaara fills three of those sections directly, contributes
to four, and stays out of two.

| Annex IV section | What it asks for | Vaara contribution |
|---|---|---|
| §1 General description | Purpose, intended use, versions, provider info | Out of scope. Provider supplies. |
| §2 Elements and development process | Architecture, datasets, training choices | Contributes a description of the runtime governance layer. Vaara docs and configuration are an Annex IV §2 input. |
| §3 Monitoring, functioning and control | How the system is monitored at runtime | **Direct fill.** Hash-chained audit trail, per-action risk score and reason, decision records. |
| §4 Performance metrics appropriateness | Metric choice and justification | Contributes runtime metrics: allow / deny / escalation rate, score distribution, calibration window. |
| §5 Risk management system per Article 9 | Risk identification, assessment, mitigation | **Direct fill.** `RISK_SCORED`, `ACTION_BLOCKED`, `DECISION_MADE` events with article tags. |
| §6 Relevant changes during lifecycle | Versioned change history of the system | Contributes the timestamped audit trail showing runtime config and threshold changes. Provider tracks model and code changes separately. |
| §7 List of harmonised standards applied | Named CEN-CENELEC standards | Vaara aligns with several JTC21 drafts (see next section). Once those finalize, deployers list them here. |
| §8 Copy of EU declaration of conformity | The DoC document itself | Out of scope. Provider drafts and signs. |
| §9 Post-market performance evaluation system | Mechanism for monitoring AI performance after deployment | **Direct fill.** `OUTCOME_RECORDED` events tied back to `action_id`, feeding the adaptive scorer. |

Direct-fill sections (§3, §5, §9) are populated automatically by the
`vaara trail export` handoff zip plus the `run_compliance_assessment`
report. Contributing sections (§2, §4, §6, §7) need a deployer to
combine Vaara output with their own provider-side documentation.
Out-of-scope sections (§1, §8) are the deployer's or provider's domain.

## DORA Article Mapping

Relevant for financial entities only. The default `ComplianceEngine`
@@ -63,6 +87,37 @@ also ships with a DORA bundle:
| **12(1)** | ICT Incident Detection | `ACTION_REQUESTED` and `ACTION_BLOCKED` records, with risk score and reason. |
| **13(1)** | ICT Incident Response and Learning | `OUTCOME_RECORDED` events close the loop and feed the adaptive scorer. |

## CEN-CENELEC harmonised standards alignment

The harmonised standards under EU AI Act Article 40 are being drafted
by CEN-CENELEC JTC21. Most are still in draft or public-consultation
phase. The table below maps Vaara's current state to the relevant
JTC21 work items so deployers can track alignment as standards
finalize. Status as of April 2026.

| Standard | WG | Status | Vaara alignment |
|---|---|---|---|
| **ISO/IEC 42001** AI Management System | WG2 | Final ballot for European adoption | Vaara is a tool that fits inside an Article 17 / 42001 AIMS. Vaara does not implement the AIMS itself. |
| **prEN 18286** European AI QMS for Regulatory Purposes | WG2 | Public consultation closed 22 Jan 2026 | Vaara feeds Article 72 ongoing-surveillance obligations and supports Annex VI / Annex VII evidence requirements. The QMS is the deployer's. |
| **prEN 18228** European AI Risk Management Standard | WG2 | Drafting | Vaara contributes the ongoing-monitoring signal called for in the AI Act risk-category integration sections. |
| **ISO/IEC 42006** Requirements for AI Management System Auditors | WG2 | DIS Stage 40 | Vaara's hash-chained trail is the artefact 42006-qualified auditors examine for surveillance evidence. |
| **prEN ISO/IEC 24970** AI System Logging | WG3 | Stage 30.2 (comment resolution) | Vaara aligns with the tamper-resistance, decision-factor logging, and audit-system integration requirements. Field-level alignment pending the published version. |
| **prEN 18229-1** Trustworthiness Framework Pt 1 (logging, transparency, human oversight) | WG4 | Public enquiry | Implements AI Act Articles 12-14, which Vaara already maps to in the article table above. Field-level alignment pending the published version. |
| **prEN ISO/IEC 12792** Transparency Taxonomy of AI Systems | WG4 | Stage 40 (final vote) | v0.6 ships per-action audit records tagged against the four-axis model (System Operation, Data Usage, Decision Making, Limitations) via four optional `AuditRecord` fields. Default classification heuristic by event type; per-record override available. NOT tamper-evident in v0.6 — fields are metadata annotations excluded from `record_hash` so pre-v0.6 chains stay valid. |

**What "alignment" means here.** Most of these standards have not
published. The mapping above is pre-compliance positioning: Vaara is
built so that when the finals drop, the gap to certified alignment is
small. It is not a claim of certified compliance. Once a standard
publishes, expect a v0.6 or v0.7 alignment audit and an updated entry
in this table.

**What the deployer does with this table.** When listing harmonised
standards applied (Annex IV §7), the deployer cites the published
ones. Where Vaara's runtime behaviour aligns with a draft, that is
useful context for an auditor or notified body but not a substitute
for the published version.
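
The 12792 row above says the four transparency fields are excluded from `record_hash` so that pre-v0.6 chains stay valid. A minimal sketch of what that exclusion implies follows. The field names come from the table; the `Record` class, the choice of core fields, and hashing SHA-256 over canonical JSON are assumptions for illustration, not Vaara's actual hashing scheme.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in for an audit record, not Vaara's AuditRecord."""
    action_id: str
    event_type: str
    previous_hash: str
    # Transparency annotations named in the table; all optional.
    system_operation: Optional[str] = None
    data_usage: Optional[str] = None
    decision_making: Optional[str] = None
    limitations: Optional[str] = None

    def record_hash(self) -> str:
        # Assumption: the hash covers only the core fields, serialized
        # as canonical (sorted-key) JSON.
        core = {
            "action_id": self.action_id,
            "event_type": self.event_type,
            "previous_hash": self.previous_hash,
        }
        payload = json.dumps(core, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


old = Record("a-1", "RISK_SCORED", previous_hash="0" * 64)
tagged = Record("a-1", "RISK_SCORED", previous_hash="0" * 64,
                decision_making="automated-score")
altered = Record("a-2", "RISK_SCORED", previous_hash="0" * 64)

# Adding an annotation leaves the hash unchanged; changing a core field does not.
print(old.record_hash() == tagged.record_hash())
print(old.record_hash() == altered.record_hash())
```

The same property that keeps historical hashes stable is what makes the annotations not tamper-evident: under this scheme an edited `decision_making` value would verify exactly like the original.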

## What Vaara produces

Three artefact classes, all tied to a single `action_id`:
@@ -182,8 +237,19 @@ problem:
tampered with before reaching Vaara. Run Vaara inside a trust
boundary you control.
- **Retention policy.** Article 12(2) allows log retention periods set
-  in accordance with the intended purpose and applicable law. Vaara
-  does not purge on your behalf. Wire a retention job to your policy.
in accordance with the intended purpose and applicable law. The
deployer picks the period. Vaara enforces it via
`vaara trail purge --db PATH --retention-days N (--tenant TID | --all-tenants)`
(or `SQLiteAuditBackend.purge_older_than(seconds)` from Python).
Tenant scoping is required, so a shared multi-tenant DB is never
purged across all tenants implicitly. A `--dry-run` flag reports the
count without modifying the DB.

**Hash-chain seam at the retention boundary.** Surviving records
still reference deleted predecessors via `previous_hash`, so
`vaara trail verify` will report a chain break at the boundary.
Intended workflow: export a signed handoff zip BEFORE purging,
archive the zip externally for long-tail audit history, then purge
the live DB. The signed zip remains self-consistent forever; the
live DB chain has a documented seam at the retention boundary.
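
The seam described above can be reproduced with a toy hash chain. This is a sketch under stated assumptions: the record layout, the `GENESIS` sentinel, and the `make_chain` / `first_break` helpers are hypothetical and do not reflect Vaara's storage format or the `vaara trail verify` implementation.

```python
import hashlib
from typing import Dict, List, Optional

GENESIS = "0" * 64  # assumed sentinel for the first record's previous_hash


def make_chain(payloads: List[str]) -> List[Dict[str, str]]:
    """Build a toy chain where each record commits to its predecessor."""
    chain: List[Dict[str, str]] = []
    prev = GENESIS
    for payload in payloads:
        digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        chain.append({"payload": payload, "previous_hash": prev, "record_hash": digest})
        prev = digest
    return chain


def first_break(chain: List[Dict[str, str]]) -> Optional[int]:
    """Return the index of the first record that fails verification, else None."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        expected = hashlib.sha256((prev + rec["payload"]).encode("utf-8")).hexdigest()
        if rec["previous_hash"] != prev or rec["record_hash"] != expected:
            return i
        prev = rec["record_hash"]
    return None


full = make_chain(["r0", "r1", "r2", "r3"])
exported = [dict(rec) for rec in full]  # stands in for the signed handoff zip
live = full[2:]                         # purge the two oldest records

print(first_break(exported))  # archived copy: verifies end to end
print(first_break(live))      # live DB: oldest survivor points at a deleted record
```

The break lands on the oldest surviving record, which is why exporting before purging matters: the exported copy still contains the predecessors that the survivors reference.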

## Current limits

@@ -199,15 +265,66 @@ Honest about the edges:
- The Article 11 technical documentation requirement is checked as a
presence flag only. Drafting the Annex IV file is outside Vaara's
scope and will stay that way.
-- The `AdversarialClassifier` (v0.5.3, opt-in via `vaara[ml]`) was
-  retrained on a corpus that includes 1,500 LLM-generated jailbreak
-  variants. The distribution-shift gap between LLM-generated and
-  hand-curated held-out recall has not been measured separately in
-  v0.5.3. Hand-curated regression numbers in the CHANGELOG indicate
-  transfer is happening, but a formal split is owed in v0.6.
-- v0.5.3 does not yet quote an adaptive-attacker (PAIR-style)
-  attack-success-rate. Iterative attacker capability is a known limit
-  and a calibration figure is planned for v0.6.
- **Distribution-shift split (v0.6 measurement of v0.5.3 stack).** The
`AdversarialClassifier` (opt-in via `vaara[ml]`) was retrained on a
corpus that mixes hand-curated and LLM-generated entries. v0.6
measures the per-source full-stack performance:

| Source | Attack recall | Benign FPR |
|---------------------------------------|--------------:|-----------:|
| Hand-curated (held-out, 250 entries) | 97.1% | 70.0% |
| LLM-generated (in-sample, 5,705) | 95.2% | 87.5% |

Reading: full-stack = heuristic `ESCALATE`/`DENY` preserved + classifier
upgrades on heuristic `ALLOW`. Hand-curated entries are held-out (not
in classifier training). LLM-generated entries WERE in training, so
their numbers are in-sample fit, not generalization.

The 1.9pp recall gap (97.1% > 95.2%) is small but goes against the
expected direction. The 17.5pp benign-FPR gap (70.0% < 87.5%) is the
dominant distribution-shift signal: the stack is much more confused
about LLM-generated benigns than hand-curated ones.

Note on FPR vs CHANGELOG headline: the CHANGELOG quotes "global benign
FPR 21.0%", which is the classifier-alone figure from 5-fold
cross-validation, out-of-fold (OOF). The full-stack numbers above are
dominated by the heuristic: most benign escalations come from the
heuristic `ESCALATE` branch, not from classifier upgrades on
heuristic-`ALLOW`ed entries.

Detailed per-source/per-class breakdown: `tests/adversarial/distribution_shift_v0_5_3.json`.
Reproducible via `scripts/eval_distribution_shift.py`. A proper OOF
split for the LLM-generated portion (re-running held-out per fold) is
a v0.7 follow-up if the gap demands it.
- **Stack composition (v0.6 measurement).** The full-stack numbers above
decompose into independent layer contributions. `heuristic_only` recall
is 35% / 63% (hand-curated / LLM-generated); `classifier_only` recall
is 94% / 86%. Layers are not redundant — heuristic catches a small set
of attacks the classifier misses, justifying the ensemble. Most of the
full-stack benign FPR comes from heuristic ESCALATEs, not classifier
upgrades. Detailed breakdown: `tests/adversarial/stack_ablation_v0_5_3.json`.
Reproducible via `scripts/eval_stack_ablation.py`.
- **Adaptive-attacker calibration (v0.6 measurement of v0.5.3 stack).**
PAIR (Chao et al. 2023) iterative attacker against the full Vaara
stack:
- Attacker + judge model: Qwen2.5-32B-Instruct (Apache 2.0)
- Seed corpus: 25 hand-curated jailbreak entries (`tests/adversarial/jailbreak.jsonl`)
- Max iterations per seed: 5
- Total LLM calls: 125 attacker iterations across 25 seeds, plus
judge confirmations on heuristic-ALLOW outcomes
- **ASR: 0.0% (0/25)**. Across 125 candidate prompts, Vaara
escalated 124 and allowed 1; the judge ruled the allowed candidate
not a successful jailbreak.

Reading: the Vaara stack catches DAN-roleplay, "hypothetical scenario",
and "security drill"-style jailbreak attempts at this attacker
capability level. This is NOT a claim of imperviousness to all adaptive
attackers: a stronger attacker model (70B+), longer iteration budgets, or
different strategies (multi-turn drift, language-switch, obfuscation)
might produce non-zero ASR. v0.7 follow-up: re-run with a 70B+ attacker
and judge if a compliance audience requires the harder calibration.

Detailed per-seed breakdown: `tests/adversarial/pair_v0_5_3.json`.
Reproducible via `scripts/eval_pair_attack.py`.
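
The full-stack rule quoted in the distribution-shift reading (heuristic `ESCALATE`/`DENY` preserved, classifier upgrades only on heuristic `ALLOW`) and the per-source recall / FPR split can be sketched as follows. This is not the code in `scripts/eval_distribution_shift.py`: the function names, the row layout, and the assumption that a classifier upgrade results in `ESCALATE` are illustrative, and the rows are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

ALLOW, ESCALATE, DENY = "ALLOW", "ESCALATE", "DENY"


def full_stack(heuristic: str, classifier_flags: bool) -> str:
    """Keep heuristic ESCALATE/DENY; let the classifier upgrade only an ALLOW."""
    if heuristic in (ESCALATE, DENY):
        return heuristic
    # Assumption: a classifier hit upgrades ALLOW to ESCALATE.
    return ESCALATE if classifier_flags else ALLOW


def per_source_metrics(
    rows: Iterable[Tuple[str, bool, str, bool]],
) -> Dict[str, Dict[str, float]]:
    """rows: (source, is_attack, heuristic_decision, classifier_flags)."""
    counts: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"tp": 0, "attacks": 0, "fp": 0, "benign": 0}
    )
    for source, is_attack, heuristic, flagged in rows:
        caught = full_stack(heuristic, flagged) != ALLOW
        c = counts[source]
        if is_attack:
            c["attacks"] += 1
            c["tp"] += int(caught)
        else:
            c["benign"] += 1
            c["fp"] += int(caught)
    return {
        source: {
            "recall": c["tp"] / c["attacks"] if c["attacks"] else float("nan"),
            "fpr": c["fp"] / c["benign"] if c["benign"] else float("nan"),
        }
        for source, c in counts.items()
    }


# Made-up rows for illustration only; not drawn from the Vaara corpus.
rows = [
    ("hand_curated", True, ALLOW, True),
    ("hand_curated", True, ESCALATE, False),
    ("hand_curated", False, ESCALATE, False),
    ("hand_curated", False, ALLOW, False),
    ("llm_generated", True, ALLOW, False),
    ("llm_generated", False, ALLOW, True),
]
metrics = per_source_metrics(rows)
print(metrics)
```

Because the heuristic decision is kept whenever it is not `ALLOW`, a benign entry escalated by the heuristic counts as a false positive whatever the classifier says, which matches the observation that most full-stack benign FPR comes from heuristic `ESCALATE`s.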

## Questions

13 changes: 13 additions & 0 deletions CONTRIBUTING.md
@@ -17,6 +17,19 @@ Thanks for considering a contribution.
- **Public interface changes.** If you change the public API, update `docs/formal_specification.md` and `CHANGELOG.md` in the same PR.
- **Security-sensitive changes.** Follow `SECURITY.md` for private disclosure of vulnerabilities.

## Pre-push lint sweep

Before pushing, run the full lint sweep from the repo root:

```bash
pip install -e '.[dev]' # one-time setup
scripts/lint_full.sh
```

The script chains four checks: `ruff` (style + correctness), `bandit` (security), `mypy` (types — strict on `vaara.policy`, lenient elsewhere while legacy modules are migrated), and `pytest`. Total runtime ~10s. CI runs the same gates, so a green local sweep should mean a green PR.

New modules under `src/vaara/` are expected to type-check cleanly. As legacy modules get cleaned up, add them to the strict mypy block in `pyproject.toml` so the typing floor only ratchets upward.

## Licensing

By contributing you agree that your contributions will be licensed under the Apache License 2.0, the same license that covers the project (see `LICENSE`).