Closed
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -19,7 +19,7 @@
"source": "git-subdir",
"url": "https://github.com/vaaraio/vaara.git",
"path": "plugins/claude-code-vaara-governance",
"ref": "v0.40.3"
"ref": "v0.40.4"
},
"homepage": "https://vaara.io"
}
6 changes: 5 additions & 1 deletion .gitignore
@@ -18,13 +18,17 @@ dist_verify/
research/
.claude/

# Private / internal never publish
# Private / internal, never publish
*.tape
.regwatch/
scripts/regwatch*
.shipped/
.v0*_watch/
application_*.pdf
outbound_*.md
Comment on lines +27 to +28

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dot-prefixed private artifacts are not ignored by these patterns.

At Line 27 and Line 28, the new rules ignore application_* / outbound_* but not .application_* / .outbound_*, which are the prefixes used by internal tooling docs. That creates a leakage risk for private files.

Suggested fix
 application_*.pdf
 outbound_*.md
+.application_*.pdf
+.outbound_*.md
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.gitignore around lines 27 - 28, The .gitignore entries "application_*.pdf"
and "outbound_*.md" miss dot-prefixed filenames (e.g., ".application_x.pdf" and
".outbound_x.md"); update the ignore rules to also match dot-prefixed artifacts
by adding corresponding patterns (e.g., ".application_*.pdf" and
".outbound_*.md") or use a single broader pattern that covers both (e.g.,
"*application_*.pdf" and "*outbound_*.md", which also match any other prefix)
so files generated by internal tooling are prevented from being committed.
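
The pattern gap described in this comment can be checked with a short sketch. It uses Python's `fnmatch.fnmatchcase` as a rough stand-in for gitignore's basename glob matching. That is an assumption: real gitignore semantics add directory anchoring and negation rules that are not modelled here. The file names and the `is_ignored` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def is_ignored(name: str, patterns: list) -> bool:
    """Return True if any glob pattern matches the bare file name."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Patterns as added in this PR.
CURRENT = ["application_*.pdf", "outbound_*.md"]
# Patterns with the reviewer's suggested dot-prefixed additions.
EXPLICIT = CURRENT + [".application_*.pdf", ".outbound_*.md"]
# A single broader glob per file type (also matches any other prefix).
BROAD = ["*application_*.pdf", "*outbound_*.md"]

# Hypothetical file names, for illustration only.
NAMES = [
    "application_2026.pdf",
    ".application_2026.pdf",
    "outbound_note.md",
    ".outbound_note.md",
]

coverage = {
    label: [is_ignored(name, patterns) for name in NAMES]
    for label, patterns in (
        ("current", CURRENT),
        ("explicit", EXPLICIT),
        ("broad", BROAD),
    )
}

for label, flags in coverage.items():
    print(label, dict(zip(NAMES, flags)))
```

Under these assumptions the current patterns leave the two dot-prefixed names unmatched, while either the explicit additions or the broader globs cover all four. Running `git check-ignore -v <path>` in the repository is the authoritative check.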

site.py.live

# Bench output (PAIR runs, dist-shift, vLLM logs). Reproducible by rerun.
tests/adversarial/v031/
.parachute/
claude-code-audit.db
63 changes: 48 additions & 15 deletions CHANGELOG.md
@@ -6,6 +6,39 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht

## [Unreleased]

## [0.40.4] - 2026-05-28

**Theme: policy mode presets + plugin shakedown fix delivery.**

### Added
- `vaara mode` CLI subcommand with three actions:
- `vaara mode list` prints the four built-in preset operating points
(`eco`, `balanced`, `performance`, `strict`) with their thresholds
and one-line descriptions.
- `vaara mode show NAME` prints thresholds, description, and watt
profile for a single preset.
- `vaara mode emit NAME [--format json|yaml] [--output PATH]` emits
a minimal valid Vaara policy document for the chosen preset,
ready for the deployer to add action classes, sequences, and
escalation routes. Output round-trips through
`vaara.policy.from_dict`, `from_json`, and `from_yaml`.
- New `vaara.policy.modes` module exposing `Mode`, `available_modes`,
`get_mode`, `to_policy_dict`, `emit_json`, and `emit_yaml`. Presets
are shaped like CPU power profiles: `eco` (0.40 / 0.60) cuts agent
loops short on borderline risk, `balanced` (0.55 / 0.85) is the
default, `performance` (0.70 / 0.92) is for high-throughput
pipelines with tight action-class overrides, `strict` (0.30 / 0.55)
escalates on doubt for incident response and audit prep.
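
As a rough illustration of how the four presets differ, the sketch below treats each pair of numbers as an escalate threshold followed by a deny threshold. That reading is an assumption, as are the `PresetSketch` type and the `decide` helper; none of this is the `vaara.policy.modes` API, and only the preset names and numeric values come from the entry above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetSketch:
    """Hypothetical stand-in for a preset operating point."""

    name: str
    escalate_at: float  # assumed meaning of the first number
    deny_at: float  # assumed meaning of the second number


# Names and values as listed in the changelog entry.
PRESETS = {
    preset.name: preset
    for preset in (
        PresetSketch("eco", 0.40, 0.60),
        PresetSketch("balanced", 0.55, 0.85),
        PresetSketch("performance", 0.70, 0.92),
        PresetSketch("strict", 0.30, 0.55),
    )
}


def decide(risk: float, preset_name: str = "balanced") -> str:
    """Map a risk score in [0, 1] to a decision under the assumed semantics."""
    preset = PRESETS[preset_name]
    if risk >= preset.deny_at:
        return "deny"
    if risk >= preset.escalate_at:
        return "escalate"
    return "allow"


if __name__ == "__main__":
    for name in PRESETS:
        print(name, decide(0.5, name))
```

With this reading, a risk score of 0.5 is allowed under `balanced` and `performance` but escalated under `eco` and `strict`, which matches the changelog's description of `strict` as escalating on doubt.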

### Changed
- `.claude-plugin/marketplace.json` `ref` bumped from `v0.40.3` to
`v0.40.4`. Delivers the session_start audit-DB-creation fix from
PR #161 to marketplace users.
- `plugins/claude-code-vaara-governance/.claude-plugin/plugin.json`
bumped from `0.1.0` to `0.1.1`. Picks up the session_start fix.
- `server.json` and `server-vaara-server.json` version fields bumped
to 0.40.4.

## [0.40.3] - 2026-05-28

**Theme: registry completion + supply-chain cleanup.**
@@ -497,7 +530,7 @@ the closed-weight attacker patterns the v7 fold was missing.
tampering rejection, canonicalization invariants, and TTL handling.

### Changed
- Production classifier: v7 v8. v7 retained on disk for cross-eval
- Production classifier: v7 to v8. v7 retained on disk for cross-eval
reproducibility. Threshold unchanged at 0.9006.
- `attestation` optional extra: adds `rfc8785>=0.1.4` for JCS
canonicalization.
@@ -1498,8 +1531,8 @@ unchanged in behaviour; this patch restores PyPI/npm version lockstep
established in v0.15.0. No Python code changes versus 0.18.0.

### Changed
- `clients/ts/package.json`: 0.17.0 0.18.1 (lockstep with PyPI).
- `pyproject.toml`, `src/vaara/__init__.py`: 0.18.0 0.18.1.
- `clients/ts/package.json`: 0.17.0 to 0.18.1 (lockstep with PyPI).
- `pyproject.toml`, `src/vaara/__init__.py`: 0.18.0 to 0.18.1.

## [0.18.0] - 2026-05-17

@@ -1541,7 +1574,7 @@ TEE report is a sibling artefact bound to a specific envelope by placing
non-SEV-SNP host error path.

### Not in this release
- AMD KDS-based cert-chain validation (VCEK ASK ARK). Validating
- AMD KDS-based cert-chain validation (VCEK to ASK to ARK). Validating
against AMD's Key Distribution Service requires a network fetch
against `https://kdsintf.amd.com/` and is tracked for v0.19+.
- Live `/dev/sev-guest` ioctl emission. The `SNP_GET_REPORT` ioctl path
@@ -1648,8 +1681,8 @@ Node service) can call Vaara without spawning a Python sidecar.
stays manual. Once enabled and an ``NPM_TOKEN`` secret is set, every
tag push publishes ``@vaara/client`` to npm with provenance.
- 6 new TypeScript tests covering URL construction, JSON body
serialisation, the 4xx ``VaaraError`` path with server-supplied
code, the network-failure ``VaaraTransportError`` path, the
serialisation, the 4xx to ``VaaraError`` path with server-supplied
code, the network-failure to ``VaaraTransportError`` path, the
detector response shape, and constructor input validation.

### Notes
@@ -1704,7 +1737,7 @@ slot in alongside Vaara's adaptive scorer with a single object.
close the most legible competitive gaps without diluting the kernel
position. Hot policy reload meets the Galileo Agent Control selling
point on its own ground. The OVERT 1.0 Phase 3 Independent Attestation
Provider (IAP) reference closes the AAL-3 AAL-4 promotion path that
Provider (IAP) reference closes the AAL-3 to AAL-4 promotion path that
v0.11.0's Provisional Receipt opens, so Vaara owns the full path
without forcing dependence on an external IAP vendor. Named injection
and PII detectors expose existing scoring surface under buyer-visible
@@ -1742,7 +1775,7 @@ facing visual artefact that the peer set has converged on.
vaara-bench-v1's published numbers (heuristic fallback when the ml
extra is absent; the `backend` field reports which path served the
call). `vaara.detect.detect_pii` is a zero-dependency regex extractor
over six categories email, phone, US SSN, IPv4, credit_card
over six categories: email, phone, US SSN, IPv4, credit_card
(Luhn-checked), IBAN (mod-97 checksum). `POST /v1/detect/injection`
and `POST /v1/detect/pii` mirror the CLI. `vaara detect injection`
and `vaara detect pii` read text from `--text`, `--file`, or
@@ -2057,7 +2090,7 @@ their conformal interval, get claimed by an operator, and produce an
- **`vaara.audit.review_queue` module.** `ReviewQueue` is a
SQLite-backed queue in its own DB file, separate from the audit DB
(which keeps its append-only invariant clean). Statuses:
`pending claimed resolved` happy path, `pending expired`
`pending to claimed to resolved` happy path, `pending to expired`
stale path. Resolutions: `allow`, `deny`, `abstain`. `enqueue`
records each item with the conformal interval, risk signals,
bucket category, and request parameters/context as JSON. The
@@ -2150,7 +2183,7 @@ No functional code changes. v0.6.0 users are on the same code. V0.6.1 only refre
- **`scripts/lint_full.sh` pre-push lint sweep** - chains `ruff` (style + correctness), `bandit` (security), `mypy` (types - strict on `vaara.policy`, lenient on legacy modules), and `pytest`. Documented in CONTRIBUTING.md. Catches CodeRabbit-class findings before they hit a PR review round-trip. New dev extras: `bandit>=1.7.5`, `mypy>=1.8`. Bandit configured in `pyproject.toml` to skip B608 across `audit/sqlite_backend.py` (all f-string SQL there interpolates only internally-controlled tenant clauses, not user input). Two `# nosec` annotations document the remaining trusted-bundle and synthetic-trace-RNG sites.

### Changed
- **Audit DB schema v2 v3.** Migration `_MIGRATIONS[2]` adds four nullable transparency columns to `audit_records`. Pre-v0.6 records get NULL for the new columns. Their stored `record_hash` is preserved (NOT re-hashed on load), so chain verification of historical records continues to work.
- **Audit DB schema v2 to v3.** Migration `_MIGRATIONS[2]` adds four nullable transparency columns to `audit_records`. Pre-v0.6 records get NULL for the new columns. Their stored `record_hash` is preserved (NOT re-hashed on load), so chain verification of historical records continues to work.
- **COMPLIANCE.md "Current limits"** replaced placeholder bullets with v0.6 measurement results:
- **Distribution-shift split.** Hand-curated (held-out, 250): attack recall 97.1% / benign FPR 70.0%. LLM-generated (in-sample, 5,705): attack recall 95.2% / benign FPR 87.5%. The 18pp benign-FPR gap is the dominant distribution-shift signal.
- **Stack composition.** `heuristic_only` recall 35% / 63%. `classifier_only` recall 94% / 86%. `full_stack` recall 97% / 98%. Layers not redundant - heuristic catches a small set of attacks the classifier misses (justifies the ensemble). Most full-stack benign FPR comes from heuristic ESCALATEs, not classifier upgrades.
@@ -2178,8 +2211,8 @@ No functional code changes. v0.6.0 users are on the same code. V0.6.1 only refre
- `tests/test_adversarial_classifier_integration.py` covers the bundle-load, score-range, and known-bad-input paths end-to-end. Skipped when `vaara[ml]` extras are not installed.

### Changed
- **Default classifier threshold: `0.5` `0.55`.** Justified by threshold sweep on the rebalanced corpus: 0.55 is the operating point that clears the FPR and jailbreak-recall gates (global benign FPR ≤ 25%, jailbreak recall ≥ 60%) and passes the canonical preflight smoke test, while staying close to v0.5.2's balanced-accuracy band.
- **Bundle format `version` bumped `1.1` `1.4`.** Trained on the full 5,955-entry corpus (3,422 attack / 2,533 benign). Feature schema unchanged from v1.1 (236 features), so `_STATIC_FEATURES` schema-drift check passes without modification.
- **Default classifier threshold: `0.5` to `0.55`.** Justified by threshold sweep on the rebalanced corpus: 0.55 is the operating point that clears the FPR and jailbreak-recall gates (global benign FPR ≤ 25%, jailbreak recall ≥ 60%) and passes the canonical preflight smoke test, while staying close to v0.5.2's balanced-accuracy band.
- **Bundle format `version` bumped `1.1` to `1.4`.** Trained on the full 5,955-entry corpus (3,422 attack / 2,533 benign). Feature schema unchanged from v1.1 (236 features), so `_STATIC_FEATURES` schema-drift check passes without modification.
- **`scripts/train_adversarial_classifier.py`** now coerces non-dict `context` and `parameters` entries (string-typed entries existed in the corpus from v0.5.0 onward but the trainer crashed on them) and runs `baseline_predictions` in `best_effort=True` mode. Net effect: trainer runs cleanly on the heterogeneous corpus.

### Benchmarks (5-fold CV OOF, threshold 0.55)
@@ -2226,7 +2259,7 @@ Per-category allow-leakage on the seed corpus (`tests/adversarial/<category>.jso
| destructive_actions | 20% | **4%** |

### Known limits / honest read
- Aggregate balanced accuracy regressed **1.5pp** from v0.5.2 (80.9% 79.4%) and attack recall regressed **5.4pp** (85.2% 79.8%). The trade is justified by the **+78.3pp** jailbreak recall delta and the **−2.3pp** FPR improvement, plus the cleaner edge-case behaviour evidenced by the preflight smoke test. v0.5.2's 80.9% balanced accuracy was partly inflated by counting jailbreak as "in scope" while the classifier scored 0% on it.
- Aggregate balanced accuracy regressed **1.5pp** from v0.5.2 (80.9% to 79.4%) and attack recall regressed **5.4pp** (85.2% to 79.8%). The trade is justified by the **+78.3pp** jailbreak recall delta and the **−2.3pp** FPR improvement, plus the cleaner edge-case behaviour evidenced by the preflight smoke test. v0.5.2's 80.9% balanced accuracy was partly inflated by counting jailbreak as "in scope" while the classifier scored 0% on it.
- LLM-generated content shares Qwen-style writing. The distribution-shift gap between generated-test recall and hand-curated-held-out recall has **not** been measured separately in this release. It will be reported in v0.6. Hand-curated regression numbers above are evidence that transfer is happening, but a formal split is owed.
- Attacker-as-iterative-PAIR ceiling has **not** been measured. `COMPLIANCE.md` does not yet quote an adaptive-ASR figure.

@@ -2248,10 +2281,10 @@ At threshold 0.55, the 21.0% global FPR is a **reviewer queue**, not a blast doo
- `_STATIC_FEATURES` constant plus load-time schema-drift check in `src/vaara/adversarial_classifier.py`. A bundle whose `feature_names` tail diverges from the runtime static feature list now raises `ValueError` at construction time, pinpointing the first differing index. This class of bug is no longer shippable without failing loud.
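
A load-time check of this kind might look like the sketch below. It is a minimal illustration, not the code in `src/vaara/adversarial_classifier.py`: the `check_feature_tail` and `first_mismatch` helpers and the sample feature names are hypothetical, and it assumes the static features sit at the end of the bundle's `feature_names` list.

```python
from typing import Optional, Sequence


def first_mismatch(expected: Sequence[str], actual: Sequence[str]) -> Optional[int]:
    """Return the first index where the sequences differ, or None if equal."""
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index
    if len(expected) != len(actual):
        # One sequence is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None


def check_feature_tail(
    bundle_feature_names: Sequence[str], static_features: Sequence[str]
) -> None:
    """Raise ValueError if the bundle's tail diverges from the static list."""
    if not static_features:
        return
    tail = list(bundle_feature_names[-len(static_features):])
    index = first_mismatch(list(static_features), tail)
    if index is not None:
        want = static_features[index] if index < len(static_features) else None
        got = tail[index] if index < len(tail) else None
        raise ValueError(
            f"feature schema drift at static index {index}: "
            f"expected {want!r}, bundle has {got!r}"
        )


if __name__ == "__main__":
    static = ["f_len", "f_entropy", "f_has_url"]  # hypothetical names
    check_feature_tail(["dyn_0", "dyn_1", *static], static)  # passes silently
    try:
        check_feature_tail(["dyn_0", "f_len", "f_depth", "f_has_url"], static)
    except ValueError as exc:
        print(exc)
```

Failing at construction time, rather than at the first scoring call, means a mismatched bundle cannot silently produce scores from misaligned features.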

### Changed
- Default threshold: `0.3` (v0.5.1) `0.5`. Balanced accuracy peaks at 0.5 on the rebuilt bundle. The v0.5.1 claim of "52% recall, 3.3% FPR at threshold 0.3" was itself a recordkeeping error: the bundle saved `0.8`, not `0.3`, and those numbers were measured at 0.8.
- Default threshold: `0.3` (v0.5.1) to `0.5`. Balanced accuracy peaks at 0.5 on the rebuilt bundle. The v0.5.1 claim of "52% recall, 3.3% FPR at threshold 0.3" was itself a recordkeeping error: the bundle saved `0.8`, not `0.3`, and those numbers were measured at 0.8.
- `scripts/train_adversarial_classifier.py` `load_corpus` now uses `rglob` to recurse into `tests/adversarial/generated/` and `benign_generated/` automatically.
- The `data_exfil` and `destructive_actions` regressions disclosed in v0.5.1 were artifacts of the broken bundle. The rebuilt classifier in v0.5.2 beats the heuristic in both: `destructive_actions` +40.2, `data_exfil` +24.7.
- Bundle format `version` bumped 1.0 1.1.
- Bundle format `version` bumped 1.0 to 1.1.

### Benchmarks (by-seed held-out, threshold 0.5)
- Attack recall: **85.2%**
2 changes: 1 addition & 1 deletion COMPLIANCE.md
@@ -470,7 +470,7 @@ correspondence.
policy id and violation reason.
- **TOOL-1.4** (provisional receipt before execution, upgrade to full
attestation after notary validation) - ✅ structurally at AAL-3,
with the AAL-3 AAL-4 path now implementable in-tree. The Article
with the AAL-3 to AAL-4 path now implementable in-tree. The Article
12 commit-prove receipt pair (shipped v0.10.0) is the Phase 2
Provisional Receipt; the v0.11.0 OVERT Base Envelope is the
attested form. v0.13.0 ships a reference Phase 3 IAP
2 changes: 1 addition & 1 deletion OVERT_CONTROLS.md
@@ -41,7 +41,7 @@ primitive in Section 9, MEA-2.
policy id and violation reason.
- **TOOL-1.4** (provisional receipt before execution, upgrade to full
attestation after notary validation) - ✅ structurally at AAL-3,
with the AAL-3 AAL-4 path now implementable in-tree. The Article
with the AAL-3 to AAL-4 path now implementable in-tree. The Article
12 commit-prove receipt pair (shipped v0.10.0) is the Phase 2
Provisional Receipt. The v0.11.0 OVERT Base Envelope is the
attested form. v0.13.0 ships a reference Phase 3 IAP