65 changes: 65 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,71 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht

## [Unreleased]

## [0.9.0] - 2026-05-14

**Theme: policy artifact validate + test framework.** v0.9.0 ships the
two CLI surfaces that turn the YAML / JSON policy from "a config file
the pipeline loads" into a policy-as-code artifact that compliance
teams can validate and test in CI, independently from the agent code
it governs.

### Added
- **`vaara.policy.validate` module.** `validate(Policy)` returns a
`ValidationReport` with structured `PolicyIssue` records (level,
code, path, message). Semantic warnings emitted: empty
`action_classes`, threshold bands narrower than 0.05 (default and
per-class merged), threshold overrides targeting an action class
not declared, sequence-pattern steps that do not name a declared
action class, escalation routes whose `if_articles` overlap no
emitted regulatory tag, missing default escalation route.
`validate_source(source, fmt="auto")` combines load and check so a
single call yields `(policy, report)` or `(None,
report-with-error)`. Stable JSON shape via `ValidationReport.to_dict()`.
- **`vaara.policy.test_cases` module — Conftest analog for Vaara
policies.** `evaluate(policy, action_class, risk_score,
matched_sequences=())` is the underlying primitive: applies any
matched sequence pattern boosts (capped at 1.0), resolves the
merged threshold for the action class, returns
`EvaluationResult(verdict, boosted_risk, route)`. `PolicyTestCase`
captures the inputs plus an expected verdict and (for
`escalate`) an expected operator route. `run_test_cases(policy,
cases)` runs the list, captures evaluation errors as failed cases
rather than raising, and returns `PolicyTestResult` rows. The
evaluator validates inputs at the boundary (risk score in `[0,1]`,
action class declared, matched sequences known).
- **`vaara.policy.test_cases_io` module.** `load_test_cases(path)`
reads a YAML or JSON cases document and returns a list of
`PolicyTestCase`. Document shape mirrors typical OPA / Conftest
test files: a top-level `cases:` list with `action_class`,
`risk_score`, optional `matched_sequences`, and an `expect:` block
carrying `verdict` and optional `route`.
- **`vaara policy validate POLICY_PATH [--json]`** and **`vaara
policy test POLICY_PATH --cases CASES_PATH [--json]`** CLI
subcommands. Both follow standard CI exit-code conventions: `validate`
returns 1 on parse errors (warnings do not change the exit code);
`test` returns 1 on any failed case, and 2 if the policy fails to
parse or the cases file fails to load.
- **`examples/policies/test_cases.yaml`** — six worked test cases
exercising thresholds, sequence-pattern boost, default and
article-matched escalation routes against
`examples/policies/full.yaml`.
- **48 new tests** (`tests/test_policy_validate.py`,
`tests/test_policy_test_cases.py`,
`tests/test_policy_test_cases_io.py`, `tests/test_policy_cli.py`)
covering report shape, every warning code, evaluator edges
(threshold-equal-escalate, boost cap at 1.0, unknown action class,
unknown sequence, out-of-range risk), case-construction validation
(bad verdict, route without escalate), YAML and JSON case files,
the worked example end-to-end, and CLI smoke for each subcommand
including `--json`. Full suite 472 / 472 pass.
- **COMPLIANCE.md** gains a *Policy artifact review* subsection under
Article 14 documenting both CLI surfaces as the path to reviewing
the policy artifact independently from the agent code.
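
To make the cases-document shape concrete, here is a minimal sketch of turning an already-loaded document into typed records. `CaseSketch` and `parse_cases_sketch` are hypothetical stand-ins written for illustration, not the real `PolicyTestCase` or `load_test_cases`; only the field names (`cases`, `action_class`, `risk_score`, `matched_sequences`, `expect.verdict`, `expect.route`) and the two construction checks (bad verdict, route without escalate) come from the entries above. The set of verdicts and the `name` field are assumed from the worked example.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumption: these are the three verdicts that appear in the worked example.
VERDICTS = {"allow", "escalate", "deny"}


@dataclass(frozen=True)
class CaseSketch:
    """Hypothetical stand-in for PolicyTestCase (not the real class)."""
    name: str
    action_class: str
    risk_score: float
    matched_sequences: tuple[str, ...]
    expect_verdict: str
    expect_route: Optional[str] = None


def parse_cases_sketch(doc: dict[str, Any]) -> list[CaseSketch]:
    """Turn an already-loaded cases document (a dict) into typed records."""
    cases = []
    for i, raw in enumerate(doc.get("cases", [])):
        expect = raw.get("expect", {})
        verdict = expect.get("verdict")
        route = expect.get("route")
        if verdict not in VERDICTS:
            raise ValueError(f"cases[{i}]: bad verdict {verdict!r}")
        # A route only makes sense when the expected verdict is 'escalate'.
        if route is not None and verdict != "escalate":
            raise ValueError(f"cases[{i}]: route given without escalate")
        cases.append(CaseSketch(
            name=raw.get("name", f"case-{i}"),
            action_class=raw["action_class"],
            risk_score=float(raw["risk_score"]),
            matched_sequences=tuple(raw.get("matched_sequences", ())),
            expect_verdict=verdict,
            expect_route=route,
        ))
    return cases


doc = {"cases": [{
    "name": "tx.sign escalates",
    "action_class": "tx.sign",
    "risk_score": 0.50,
    "expect": {"verdict": "escalate", "route": "ai_oversight_team"},
}]}
parsed = parse_cases_sketch(doc)
print(parsed[0].expect_route)
```

Rejecting malformed cases at construction time keeps authoring mistakes in the cases file separate from genuine policy regressions.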

### Note
Backwards-compatible. Pure addition. No existing module signatures
change. `Policy` and the load path are unchanged; the new modules
sit beside them under `vaara.policy.*`.
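
The evaluation order described for `evaluate` (boost, cap at 1.0, compare against the merged thresholds, pick a route) can be sketched in a few lines. This is a self-contained illustration under stated assumptions, not the `vaara.policy.test_cases` implementation: `evaluate_sketch`, `ResultSketch`, and the flattened parameters are hypothetical, and the numbers are the `tx.sign` values quoted in the worked example's comments.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResultSketch:
    """Hypothetical stand-in for EvaluationResult(verdict, boosted_risk, route)."""
    verdict: str
    boosted_risk: float
    route: Optional[str]


def evaluate_sketch(risk_score, escalate_at, deny_at, boosts=(),
                    tags=frozenset(), routes=(), default_route=None):
    """Boost, cap at 1.0, then compare against the merged thresholds."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be in [0, 1]")
    boosted = min(1.0, risk_score + sum(boosts))
    if boosted >= deny_at:
        return ResultSketch("deny", boosted, None)
    if boosted >= escalate_at:  # a score equal to the threshold escalates
        # First route whose articles overlap the action's tags wins.
        for articles, operator in routes:
            if tags & set(articles):
                return ResultSketch("escalate", boosted, operator)
        return ResultSketch("escalate", boosted, default_route)
    return ResultSketch("allow", boosted, None)


routes = [(["aiact:14"], "ai_oversight_team")]
# tx.sign thresholds from the worked example: escalate 0.40, deny 0.65.
r1 = evaluate_sketch(0.50, 0.40, 0.65, tags={"aiact:14"},
                     routes=routes, default_route="on_call")
# Same action class with the 0.30 config_then_signal boost applied.
r2 = evaluate_sketch(0.40, 0.40, 0.65, boosts=[0.30], tags={"aiact:14"},
                     routes=routes, default_route="on_call")
print(r1.verdict, r1.route, r2.verdict)
```

Checking the deny threshold before the escalate threshold means a boosted score that crosses both bands is denied outright, which matches the sequence-boost case in the worked example.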

## [0.8.0] - 2026-05-14

**Theme: Article 73 serious-incident export (interim).** Adds the export
27 changes: 27 additions & 0 deletions COMPLIANCE.md
@@ -45,6 +45,33 @@ whether the model is confident. A conformal interval of [0.58, 0.62]
versus [0.2, 0.95] tells them whether to trust the number. Vaara
surfaces both on every escalation.

### Policy artifact review

The Vaara policy is a declarative YAML / JSON document loaded via
`vaara.policy.from_yaml()` or `from_json()`. As of v0.9, two CLI
surfaces let a compliance team review the policy artifact
independently from the agent code that uses it:

- `vaara policy validate POLICY_PATH` runs structured semantic checks
(parse errors plus warnings for narrow threshold bands, dangling
per-class overrides, unreachable escalation routes, sequence steps
not naming a declared action class, missing default escalation
route). Exit code 0 if no errors; warnings print without flipping
the exit code.
- `vaara policy test POLICY_PATH --cases CASES_PATH` runs a YAML/JSON
cases file against the policy (Conftest analog for Vaara). Each
case names an action class, a risk score, any sequence patterns to
treat as matched, an expected verdict and, for `escalate`, an
optional expected route. Exit code 0 if every case passes, 1 if any
case fails, and 2 if the policy or the cases file cannot be loaded.
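
As an illustration of the kind of semantic check `validate` performs, the sketch below flags two of the warnings listed above: narrow threshold bands and overrides that target an undeclared action class. It is a simplified stand-in, not the `vaara.policy.validate` code: `IssueSketch`, `check_thresholds_sketch`, the warning codes, and the path strings are all invented for this example, and each override is assumed to be a complete `(escalate, deny)` pair rather than a partial override merged with the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssueSketch:
    """Hypothetical stand-in for PolicyIssue (level, code, path, message)."""
    level: str
    code: str
    path: str
    message: str


def check_thresholds_sketch(action_classes, default, overrides, min_band=0.05):
    """Warn on narrow escalate/deny bands and on overrides for undeclared classes.

    `default` and each override are (escalate, deny) pairs; the warning codes
    are invented for this sketch.
    """
    issues = []
    bands = {"thresholds.default": default}
    for name, pair in overrides.items():
        path = f"thresholds.overrides.{name}"
        bands[path] = pair
        if name not in action_classes:
            issues.append(IssueSketch(
                "warning", "dangling_override", path,
                f"override targets undeclared action class {name!r}"))
    for path, (escalate, deny) in bands.items():
        if deny - escalate < min_band:
            issues.append(IssueSketch(
                "warning", "narrow_band", path,
                f"band {deny - escalate:.2f} is narrower than {min_band}"))
    return issues


issues = check_thresholds_sketch(
    action_classes={"fs.write_file", "tx.sign"},
    default=(0.55, 0.85),
    overrides={"tx.sign": (0.40, 0.42), "db.drop": (0.30, 0.60)},
)
for issue in issues:
    print(f"[{issue.level}] {issue.code} at {issue.path}: {issue.message}")
```

Returning structured records instead of raising lets a caller print every finding in one pass and decide separately which levels should fail a build.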

Both commands carry a `--json` flag so CI pipelines can consume the
output directly. The policy document and its cases file live in the
deployer's source-control tree, version-controlled and diffable,
alongside any other policy-as-code artifacts (Rego, Cedar, Casbin)
used in the same governance stack. Worked example at
`examples/policies/test_cases.yaml` exercises
`examples/policies/full.yaml`.
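
A CI step that consumes the `--json` output of `vaara policy test` might look roughly like the sketch below. It relies only on the top-level keys that the CLI renderer in this change emits (`total`, `passed`, `failed`, `results`); the shape of each entry in `results` is not shown here, so the sketch does not touch it. `summarize_test_report` and the sample payload are hypothetical.

```python
import json


def summarize_test_report(text: str) -> tuple[bool, str]:
    """Parse the JSON printed by `vaara policy test --json` (top-level keys only)."""
    report = json.loads(text)
    ok = report["failed"] == 0
    line = f"{report['passed']}/{report['total']} policy cases passed"
    return ok, line


# Hand-written sample with the top-level keys only; per-result entries omitted.
sample = json.dumps({"total": 6, "passed": 5, "failed": 1, "results": []})
ok, line = summarize_test_report(sample)
print(ok, line)
```

In a real pipeline the text would come from the command's standard output, and the process exit code remains the primary pass/fail signal; the JSON is mainly useful for reporting.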

### Article 26 (deployer obligations)

Article 26 obligations sit on the deployer, not on Vaara. The evidence
1 change: 1 addition & 0 deletions README.md
@@ -62,6 +62,7 @@ else:
- [Article 14 runtime: why oversight of agentic AI has to be evidenced as action, not model](https://futurium.ec.europa.eu/ga/apply-ai-alliance/community-content/article-14-runtime-why-oversight-agentic-ai-has-be-evidenced-action-not-model): why this exists. Posted on the EU Apply AI Alliance Futurium.
- `src/vaara/integrations/`: LangChain, OpenAI Agents SDK, CrewAI, MCP server.
- `src/vaara/audit/`: hash-chain trail, SQLite backend, append-only WAL.
- `src/vaara/policy/`: declarative YAML / JSON policy schema with `vaara policy validate` (semantic checks) and `vaara policy test` (Conftest-style cases-file runner) for reviewing the policy artifact in CI independently from agent code.
- `src/vaara/sandbox/`: synthetic-trace cold-start calibration.

> Vaara helps deployers assemble evidence for their own conformity work. It does not certify compliance or constitute legal advice. Deployers own their obligations under the EU AI Act and other applicable law.
58 changes: 58 additions & 0 deletions examples/policies/test_cases.yaml
@@ -0,0 +1,58 @@
# Worked example: test cases for `examples/policies/full.yaml`.
# Run with:
#   vaara policy test examples/policies/full.yaml --cases examples/policies/test_cases.yaml
# Compliance teams can author and version-control these files independently
# from agent code (EU AI Act Article 14 — independently-reviewable policy
# artifact). Exit code is 0 iff every case passes.

cases:
  # Baseline allow — risk below the escalate threshold.
  - name: fs.write_file low risk allows
    action_class: fs.write_file
    risk_score: 0.30
    expect:
      verdict: allow

  # Default escalate threshold (0.55) reached, no per-class override on escalate.
  # fs.write_file does NOT declare aiact:14 so escalation falls through to default.
  - name: fs.write_file mid risk escalates to default route
    action_class: fs.write_file
    risk_score: 0.60
    expect:
      verdict: escalate
      route: on_call

  # Per-class override tightens deny to 0.75 (default is 0.85).
  - name: fs.write_file deny via tighter override
    action_class: fs.write_file
    risk_score: 0.80
    expect:
      verdict: deny

  # tx.sign override is stricter: escalate=0.40, deny=0.65.
  # Its regulatory tags include aiact:14 so the first escalation route wins.
  - name: tx.sign escalates to ai_oversight_team on aiact 14 match
    action_class: tx.sign
    risk_score: 0.50
    expect:
      verdict: escalate
      route: ai_oversight_team

  # Sequence boost pushes a sub-escalate score over deny:
  # 0.40 (risk) + 0.30 (config_then_signal boost) = 0.70 >= 0.65 deny override.
  - name: tx.sign deny after config_then_signal sequence boost
    action_class: tx.sign
    risk_score: 0.40
    matched_sequences: [config_then_signal]
    expect:
      verdict: deny

  # email.send uses default thresholds (escalate 0.55, deny 0.85).
  # Its only regulatory tag is aiact:13 — no escalation route matches that,
  # so escalations fall through to the default 'on_call' route.
  - name: email.send escalates to on_call when no specific route matches
    action_class: email.send
    risk_score: 0.70
    expect:
      verdict: escalate
      route: on_call
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "vaara"
-version = "0.8.0"
+version = "0.9.0"
description = "Adaptive AI Agent Execution Layer for risk scoring, audit trails, and regulatory compliance"
requires-python = ">=3.10"
license = "Apache-2.0"
2 changes: 1 addition & 1 deletion src/vaara/__init__.py
@@ -6,7 +6,7 @@
oversight.
"""

-__version__ = "0.8.0"
+__version__ = "0.9.0"

from vaara.pipeline import InterceptionPipeline, InterceptionResult

117 changes: 117 additions & 0 deletions src/vaara/cli.py
@@ -45,6 +45,19 @@
        Mark pending items older than ``timeout-seconds`` as expired.
        Claimed items are left alone.

    vaara policy validate POLICY_PATH [--json]
        Load a YAML/JSON policy and run semantic checks. Exit 0 if no
        errors. Warnings (narrow threshold bands, dangling overrides,
        unreachable escalation routes, missing default route, sequence
        steps not naming a declared action class) print without
        flipping the exit code.

    vaara policy test POLICY_PATH --cases CASES_PATH [--json]
        Run a YAML/JSON cases file against a policy (Conftest analog).
        Each case names an action_class, a risk_score, optional
        matched_sequences, and an expected verdict / route. Exit 0 if
        every case passes.

    vaara version
        Print the installed Vaara version.

@@ -465,6 +478,78 @@ def _cmd_review_expire(args: argparse.Namespace) -> int:
    return 0


def _render_validation_report(report, *, source_label: str, as_json: bool) -> str:
    if as_json:
        return json.dumps(report.to_dict(), indent=2)
    if not report.issues:
        return f"{source_label}: ok (no issues)"
    lines = [
        f" [{i.level.value}] {i.code}"
        f"{(' at ' + i.path) if i.path else ''}: {i.message}"
        for i in report.issues
    ]
    header = (
        f"{source_label}: {len(report.errors)} error(s), "
        f"{len(report.warnings)} warning(s)"
    )
    return header + "\n" + "\n".join(lines)


def _cmd_policy_validate(args: argparse.Namespace) -> int:
    from vaara.policy.validate import validate_source

    policy_path = Path(args.policy).expanduser()
    _policy, report = validate_source(policy_path)
    print(_render_validation_report(
        report, source_label=str(policy_path), as_json=args.json,
    ))
    return 0 if report.ok else 1


def _render_test_results(results, *, as_json: bool) -> str:
    if as_json:
        return json.dumps({
            "total": len(results),
            "passed": sum(1 for r in results if r.passed),
            "failed": sum(1 for r in results if not r.passed),
            "results": [r.to_dict() for r in results],
        }, indent=2)
    lines = []
    for r in results:
        mark = "PASS" if r.passed else "FAIL"
        suffix = "" if r.passed else f" — {r.diagnostic}"
        lines.append(f" [{mark}] {r.case.name}{suffix}")
    failed = sum(1 for r in results if not r.passed)
    header = f"{len(results)} case(s), {len(results) - failed} passed, {failed} failed"
    return header + "\n" + "\n".join(lines)


def _cmd_policy_test(args: argparse.Namespace) -> int:
    from vaara.policy.test_cases import run_test_cases
    from vaara.policy.test_cases_io import load_test_cases
    from vaara.policy.validate import validate_source

    policy_path = Path(args.policy).expanduser()
    cases_path = Path(args.cases).expanduser()

    policy, report = validate_source(policy_path)
    if policy is None:
        print(_render_validation_report(
            report, source_label=str(policy_path), as_json=args.json,
        ), file=sys.stderr)
        return 2

    try:
        cases = load_test_cases(cases_path)
    except Exception as e:
        print(f"failed to load cases from {cases_path}: {e}", file=sys.stderr)
        return 2

    results = run_test_cases(policy, cases)
    print(_render_test_results(results, as_json=args.json))
    return 0 if all(r.passed for r in results) else 1


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="vaara", description="Vaara AI Agent Execution Layer")
    sub = p.add_subparsers(dest="cmd", required=True)
@@ -636,6 +721,38 @@ def build_parser() -> argparse.ArgumentParser:
    )
    re_.set_defaults(func=_cmd_review_expire)

    pp_policy = sub.add_parser(
        "policy",
        help="Policy artifact commands (validate, test)",
    )
    psub = pp_policy.add_subparsers(dest="policy_cmd", required=True)

    pvalid = psub.add_parser(
        "validate",
        help="Load a policy and report parse errors plus semantic warnings",
    )
    pvalid.add_argument("policy", help="Path to a YAML or JSON policy file")
    pvalid.add_argument(
        "--json", action="store_true",
        help="Emit the report as JSON (stable shape for CI)",
    )
    pvalid.set_defaults(func=_cmd_policy_validate)

    ptest = psub.add_parser(
        "test",
        help="Run a YAML/JSON cases file against a policy (Conftest analog)",
    )
    ptest.add_argument("policy", help="Path to a YAML or JSON policy file")
    ptest.add_argument(
        "--cases", required=True,
        help="Path to a YAML or JSON file containing a 'cases:' list",
    )
    ptest.add_argument(
        "--json", action="store_true",
        help="Emit results as JSON (stable shape for CI)",
    )
    ptest.set_defaults(func=_cmd_policy_test)

    return p


37 changes: 37 additions & 0 deletions src/vaara/policy/__init__.py
@@ -12,6 +12,16 @@
The companion JSON Schema document at `docs/policy_schema.json` is the citable
spec for compliance-evidence purposes. Hand-rolled validation in `loader.py`
mirrors the schema, with clean error paths for human readers.

Beyond load and parse, two surfaces support reviewing the policy artifact
independently from agent code:

- ``validate`` / ``validate_source`` (in ``vaara.policy.validate``) returns a
structured report with parse errors and semantic warnings — usable in CI.
- ``evaluate`` + ``run_test_cases`` (in ``vaara.policy.test_cases``) let a
team write synthetic action contexts and assert expected verdicts against a
policy, Conftest-style. YAML/JSON case files load via
``load_test_cases`` (in ``vaara.policy.test_cases_io``).
"""

from vaara.policy.schema import (
@@ -24,16 +34,43 @@
    Thresholds,
)
from vaara.policy.loader import from_dict, from_json, from_yaml
from vaara.policy.validate import (
    IssueLevel,
    PolicyIssue,
    ValidationReport,
    validate,
    validate_source,
)
from vaara.policy.test_cases import (
    EvaluationResult,
    PolicyTestCase,
    PolicyTestResult,
    evaluate,
    run_test_cases,
)
from vaara.policy.test_cases_io import load_test_cases, parse_cases

__all__ = [
    "SCHEMA_VERSION",
    "ActionClassDef",
    "EscalationRoute",
    "EvaluationResult",
    "IssueLevel",
    "Policy",
    "PolicyError",
    "PolicyIssue",
    "PolicyTestCase",
    "PolicyTestResult",
    "SequencePattern",
    "Thresholds",
    "ValidationReport",
    "evaluate",
    "from_dict",
    "from_json",
    "from_yaml",
    "load_test_cases",
    "parse_cases",
    "run_test_cases",
    "validate",
    "validate_source",
]