vaaraio · vaaraio · May 15, 2026 · May 14, 2026 · May 14, 2026
@@ -6,6 +6,71 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht
 
 ## [Unreleased]
 
+## [0.9.0] - 2026-05-14
+
+**Theme: policy artifact validate + test framework.** v0.9.0 ships the
+two CLI surfaces that turn the YAML / JSON policy from "a config file
+the pipeline loads" into a policy-as-code artifact that compliance
+teams can validate and test in CI, independently from the agent code
+it governs.
+
+### Added
+- **`vaara.policy.validate` module.** `validate(Policy)` returns a
+  `ValidationReport` with structured `PolicyIssue` records (level,
+  code, path, message). Semantic warnings emitted: empty
+  `action_classes`, threshold bands narrower than 0.05 (default and
+  per-class merged), threshold overrides targeting an action class
+  not declared, sequence-pattern steps that do not name a declared
+  action class, escalation routes whose `if_articles` overlap no
+  emitted regulatory tag, missing default escalation route.
+  `validate_source(source, fmt="auto")` combines load and check so a
+  single call yields `(policy, report)` or `(None,
+  report-with-error)`. Stable JSON shape via `ValidationReport.to_dict()`.
+- **`vaara.policy.test_cases` module — Conftest analog for Vaara
+  policies.** `evaluate(policy, action_class, risk_score,
+  matched_sequences=())` is the underlying primitive: applies any
+  matched sequence pattern boosts (capped at 1.0), resolves the
+  merged threshold for the action class, returns
+  `EvaluationResult(verdict, boosted_risk, route)`. `PolicyTestCase`
+  captures the inputs plus an expected verdict and (for
+  `escalate`) an expected operator route. `run_test_cases(policy,
+  cases)` runs the list, captures evaluation errors as failed cases
+  rather than raising, and returns `PolicyTestResult` rows. The
+  evaluator validates inputs at the boundary (risk score in `[0,1]`,
+  action class declared, matched sequences known).
+- **`vaara.policy.test_cases_io` module.** `load_test_cases(path)`
+  reads a YAML or JSON cases document and returns a list of
+  `PolicyTestCase`. Document shape mirrors typical OPA / Conftest
+  test files: a top-level `cases:` list with `action_class`,
+  `risk_score`, optional `matched_sequences`, and an `expect:` block
+  carrying `verdict` and optional `route`.
+- **`vaara policy validate POLICY_PATH [--json]`** and **`vaara
+  policy test POLICY_PATH --cases CASES_PATH [--json]`** CLI
+  subcommands. Both honour standard CI exit codes: validate returns
+  1 on errors (warnings do not flip it), test returns 1 on any failed
+  case and 2 if the policy or the cases file fails to load.
+- **`examples/policies/test_cases.yaml`** — six worked test cases
+  exercising thresholds, sequence-pattern boost, default and
+  article-matched escalation routes against
+  `examples/policies/full.yaml`.
+- **48 new tests** (`tests/test_policy_validate.py`,
+  `tests/test_policy_test_cases.py`,
+  `tests/test_policy_test_cases_io.py`, `tests/test_policy_cli.py`)
+  covering report shape, every warning code, evaluator edges
+  (threshold-equal-escalate, boost cap at 1.0, unknown action class,
+  unknown sequence, out-of-range risk), case-construction validation
+  (bad verdict, route without escalate), YAML and JSON case files,
+  the worked example end-to-end, and CLI smoke for each subcommand
+  including `--json`. Full suite 472 / 472 pass.
+- **COMPLIANCE.md** gains a *Policy artifact review* subsection under
+  Article 14 documenting both CLI surfaces as the path to reviewing
+  the policy artifact independently from the agent code.
+
+### Note
+Backwards-compatible. Pure addition. No existing module signatures
+change. `Policy` and the load path are unchanged; the new modules
+sit beside them under `vaara.policy.*`.
+
 ## [0.8.0] - 2026-05-14
 
 **Theme: Article 73 serious-incident export (interim).** Adds the export
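
The `evaluate` primitive described in the 0.9.0 entry can be pictured with a minimal sketch. This is not the code in `vaara.policy.test_cases`: `Bands` and `sketch_evaluate` are hypothetical names, boosts are assumed to be additive, and a score equal to a threshold is assumed to take the stricter verdict. The threshold and boost values are the ones quoted in the comments of `examples/policies/test_cases.yaml`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bands:
    """Hypothetical threshold pair; the real schema type is `Thresholds`."""
    escalate: float
    deny: float


def sketch_evaluate(risk_score, bands, boosts=()):
    """Return (verdict, boosted_risk) for one synthetic action context."""
    # Boundary check mirroring the changelog: risk score must lie in [0, 1].
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must lie in [0, 1]")
    # Assumption: matched sequence boosts add up; the sum is capped at 1.0.
    boosted = min(1.0, risk_score + sum(boosts))
    # Assumption: a score equal to a threshold takes the stricter verdict.
    if boosted >= bands.deny:
        return "deny", boosted
    if boosted >= bands.escalate:
        return "escalate", boosted
    return "allow", boosted


# Values quoted in the worked example's comments.
DEFAULT = Bands(escalate=0.55, deny=0.85)
TX_SIGN = Bands(escalate=0.40, deny=0.65)

for label, args in [
    ("default bands, risk 0.30", (0.30, DEFAULT)),
    ("tx.sign, risk 0.40 + 0.30 boost", (0.40, TX_SIGN, (0.30,))),
]:
    print(label, "->", sketch_evaluate(*args)[0])
```

Returning the boosted risk next to the verdict matches the `EvaluationResult(verdict, boosted_risk, route)` shape in the entry above; route selection is left out of this sketch.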

@@ -45,6 +45,33 @@ whether the model is confident. A conformal interval of [0.58, 0.62]
 versus [0.2, 0.95] tells them whether to trust the number. Vaara
 surfaces both on every escalation.
 
+### Policy artifact review
+
+The Vaara policy is a declarative YAML / JSON document loaded via
+`vaara.policy.from_yaml()` or `from_json()`. As of v0.9, two CLI
+surfaces let a compliance team review the policy artifact
+independently from the agent code that uses it:
+
+- `vaara policy validate POLICY_PATH` runs structured semantic checks
+  (parse errors plus warnings for narrow threshold bands, dangling
+  per-class overrides, unreachable escalation routes, sequence steps
+  not naming a declared action class, missing default escalation
+  route). Exit code 0 if no errors; warnings print without flipping
+  the exit code.
+- `vaara policy test POLICY_PATH --cases CASES_PATH` runs a YAML/JSON
+  cases file against the policy (Conftest analog for Vaara). Each
+  case names an action class, a risk score, any sequence patterns to
+  treat as matched, and an expected verdict plus, for `escalate`, an
+  optional route. Exit code 0 if every case passes.
+
+Both commands carry a `--json` flag so CI pipelines can consume the
+output directly. The policy document and its cases file live in the
+deployer's source-control tree, version-controlled and diffable,
+alongside any other policy-as-code artifacts (Rego, Cedar, Casbin)
+used in the same governance stack. The worked example at
+`examples/policies/test_cases.yaml` exercises
+`examples/policies/full.yaml`.
+
 ### Article 26 (deployer obligations)
 
 Article 26 obligations sit on the deployer, not on Vaara. The evidence
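
As a rough illustration of how a CI step might consume the `--json` output of `vaara policy test`, the sketch below reads only the three top-level counters (`total`, `passed`, `failed`) that `_render_test_results` in the CLI emits. The helper name `summarize_test_run` and the sample payload are hypothetical; the per-case entries under `results` are treated as opaque because their fields are not shown in this change.

```python
import json


def summarize_test_run(payload: str) -> tuple[int, str]:
    """Map a `vaara policy test --json` payload to (exit_code, summary)."""
    doc = json.loads(payload)
    # Only the counters are read; entries in "results" stay opaque here.
    total, passed, failed = doc["total"], doc["passed"], doc["failed"]
    if passed + failed != total:
        raise ValueError("counters do not add up")
    summary = f"{passed}/{total} policy cases passed"
    # Mirror the CLI convention: 0 when every case passes, 1 otherwise.
    return (0 if failed == 0 else 1), summary


# Hand-written sample, not captured from a real run; results are omitted.
SAMPLE = '{"total": 6, "passed": 5, "failed": 1, "results": []}'
code, text = summarize_test_run(SAMPLE)
print(code, text)
```

In practice the process exit code already carries the pass/fail signal, so a helper like this is mainly useful for attaching a readable summary to a build log or review record.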

@@ -62,6 +62,7 @@ else:
 - [Article 14 runtime: why oversight of agentic AI has to be evidenced as action, not model](https://futurium.ec.europa.eu/ga/apply-ai-alliance/community-content/article-14-runtime-why-oversight-agentic-ai-has-be-evidenced-action-not-model): why this exists. Posted on the EU Apply AI Alliance Futurium.
 - `src/vaara/integrations/`: LangChain, OpenAI Agents SDK, CrewAI, MCP server.
 - `src/vaara/audit/`: hash-chain trail, SQLite backend, append-only WAL.
+- `src/vaara/policy/`: declarative YAML / JSON policy schema with `vaara policy validate` (semantic checks) and `vaara policy test` (Conftest-style cases-file runner) for reviewing the policy artifact in CI independently from agent code.
 - `src/vaara/sandbox/`: synthetic-trace cold-start calibration.
 
 > Vaara helps deployers assemble evidence for their own conformity work. It does not certify compliance or constitute legal advice. Deployers own their obligations under the EU AI Act and other applicable law.

@@ -0,0 +1,58 @@
+# Worked example: test cases for `examples/policies/full.yaml`.
+# Run with:
+#     vaara policy test examples/policies/full.yaml --cases examples/policies/test_cases.yaml
+# Compliance teams can author and version-control these files independently
+# from agent code (EU AI Act Article 14 — independently-reviewable policy
+# artifact). Exit code is 0 iff every case passes.
+
+cases:
+  # Baseline allow — risk below the escalate threshold.
+  - name: fs.write_file low risk allows
+    action_class: fs.write_file
+    risk_score: 0.30
+    expect:
+      verdict: allow
+
+  # Default escalate threshold (0.55) reached, no per-class override on escalate.
+  # fs.write_file does NOT declare aiact:14 so escalation falls through to default.
+  - name: fs.write_file mid risk escalates to default route
+    action_class: fs.write_file
+    risk_score: 0.60
+    expect:
+      verdict: escalate
+      route: on_call
+
+  # Per-class override tightens deny to 0.75 (default is 0.85).
+  - name: fs.write_file deny via tighter override
+    action_class: fs.write_file
+    risk_score: 0.80
+    expect:
+      verdict: deny
+
+  # tx.sign override is stricter: escalate=0.40, deny=0.65.
+  # Its regulatory tags include aiact:14 so the first escalation route wins.
+  - name: tx.sign escalates to ai_oversight_team on aiact 14 match
+    action_class: tx.sign
+    risk_score: 0.50
+    expect:
+      verdict: escalate
+      route: ai_oversight_team
+
+  # Sequence boost pushes a score that only reaches the tx.sign escalate
+  # threshold (0.40) over deny:
+  #   0.40 (risk) + 0.30 (config_then_signal boost) = 0.70 >= 0.65 deny override.
+  - name: tx.sign deny after config_then_signal sequence boost
+    action_class: tx.sign
+    risk_score: 0.40
+    matched_sequences: [config_then_signal]
+    expect:
+      verdict: deny
+
+  # email.send uses default thresholds (escalate 0.55, deny 0.85).
+  # Its only regulatory tag is aiact:13 — no escalation route matches that,
+  # so escalations fall through to the default 'on_call' route.
+  - name: email.send escalates to on_call when no specific route matches
+    action_class: email.send
+    risk_score: 0.70
+    expect:
+      verdict: escalate
+      route: on_call
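
The route expectations in these cases follow a first-match rule that can be sketched as below. This is an illustration only: `sketch_route` is a hypothetical helper, and the route list and tag values are reconstructed from the comments in the cases file, not read from `examples/policies/full.yaml`.

```python
def sketch_route(tags, routes, default_route):
    """Pick the first route whose if_articles overlap the action's tags."""
    tag_set = set(tags)
    for name, if_articles in routes:
        # A route matches when it shares at least one article tag.
        if tag_set & set(if_articles):
            return name
    # No specific route matched, so fall through to the default route.
    return default_route


# Assumed routes and tags, inferred from the comments in the cases file.
ROUTES = [("ai_oversight_team", ["aiact:14"])]

print(sketch_route(["aiact:14"], ROUTES, "on_call"))  # tx.sign-style tags
print(sketch_route(["aiact:13"], ROUTES, "on_call"))  # email.send-style tags
print(sketch_route([], ROUTES, "on_call"))            # no regulatory tags
```

Under these assumptions the first call resolves to `ai_oversight_team` and the other two fall through to `on_call`, which is the behaviour the `tx.sign`, `email.send`, and `fs.write_file` escalation cases expect.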
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "vaara"
-version = "0.8.0"
+version = "0.9.0"
 description = "Adaptive AI Agent Execution Layer for risk scoring, audit trails, and regulatory compliance"
 requires-python = ">=3.10"
 license = "Apache-2.0"

@@ -6,7 +6,7 @@
 oversight.
 """
 
-__version__ = "0.8.0"
+__version__ = "0.9.0"
 
 from vaara.pipeline import InterceptionPipeline, InterceptionResult
 

@@ -45,6 +45,19 @@
         Mark pending items older than ``timeout-seconds`` as expired.
         Claimed items are left alone.
 
+    vaara policy validate POLICY_PATH [--json]
+        Load a YAML/JSON policy and run semantic checks. Exit 0 if no
+        errors. Warnings (narrow threshold bands, dangling overrides,
+        unreachable escalation routes, missing default route, sequence
+        steps not naming a declared action class) print without
+        flipping the exit code.
+
+    vaara policy test POLICY_PATH --cases CASES_PATH [--json]
+        Run a YAML/JSON cases file against a policy (Conftest analog).
+        Each case names an action_class, a risk_score, optional
+        matched_sequences, and an expected verdict / route. Exit 0 if
+        every case passes.
+
     vaara version
         Print the installed Vaara version.
 
@@ -465,6 +478,78 @@ def _cmd_review_expire(args: argparse.Namespace) -> int:
     return 0
 
 
+def _render_validation_report(report, *, source_label: str, as_json: bool) -> str:
+    if as_json:
+        return json.dumps(report.to_dict(), indent=2)
+    if not report.issues:
+        return f"{source_label}: ok (no issues)"
+    lines = [
+        f"  [{i.level.value}] {i.code}"
+        f"{(' at ' + i.path) if i.path else ''}: {i.message}"
+        for i in report.issues
+    ]
+    header = (
+        f"{source_label}: {len(report.errors)} error(s), "
+        f"{len(report.warnings)} warning(s)"
+    )
+    return header + "\n" + "\n".join(lines)
+
+
+def _cmd_policy_validate(args: argparse.Namespace) -> int:
+    from vaara.policy.validate import validate_source
+
+    policy_path = Path(args.policy).expanduser()
+    _policy, report = validate_source(policy_path)
+    print(_render_validation_report(
+        report, source_label=str(policy_path), as_json=args.json,
+    ))
+    return 0 if report.ok else 1
+
+
+def _render_test_results(results, *, as_json: bool) -> str:
+    if as_json:
+        return json.dumps({
+            "total": len(results),
+            "passed": sum(1 for r in results if r.passed),
+            "failed": sum(1 for r in results if not r.passed),
+            "results": [r.to_dict() for r in results],
+        }, indent=2)
+    lines = []
+    for r in results:
+        mark = "PASS" if r.passed else "FAIL"
+        suffix = "" if r.passed else f" — {r.diagnostic}"
+        lines.append(f"  [{mark}] {r.case.name}{suffix}")
+    failed = sum(1 for r in results if not r.passed)
+    header = f"{len(results)} case(s), {len(results) - failed} passed, {failed} failed"
+    # Join header and rows in one pass so an empty result list leaves no trailing newline.
+    return "\n".join([header, *lines])
+
+
+def _cmd_policy_test(args: argparse.Namespace) -> int:
+    from vaara.policy.test_cases import run_test_cases
+    from vaara.policy.test_cases_io import load_test_cases
+    from vaara.policy.validate import validate_source
+
+    policy_path = Path(args.policy).expanduser()
+    cases_path = Path(args.cases).expanduser()
+
+    policy, report = validate_source(policy_path)
+    if policy is None:
+        print(_render_validation_report(
+            report, source_label=str(policy_path), as_json=args.json,
+        ), file=sys.stderr)
+        return 2
+
+    try:
+        cases = load_test_cases(cases_path)
+    except Exception as e:
+        print(f"failed to load cases from {cases_path}: {e}", file=sys.stderr)
+        return 2
+
+    results = run_test_cases(policy, cases)
+    print(_render_test_results(results, as_json=args.json))
+    return 0 if all(r.passed for r in results) else 1
+
+
 def build_parser() -> argparse.ArgumentParser:
     p = argparse.ArgumentParser(prog="vaara", description="Vaara AI Agent Execution Layer")
     sub = p.add_subparsers(dest="cmd", required=True)
@@ -636,6 +721,38 @@ def build_parser() -> argparse.ArgumentParser:
     )
     re_.set_defaults(func=_cmd_review_expire)
 
+    pp_policy = sub.add_parser(
+        "policy",
+        help="Policy artifact commands (validate, test)",
+    )
+    psub = pp_policy.add_subparsers(dest="policy_cmd", required=True)
+
+    pvalid = psub.add_parser(
+        "validate",
+        help="Load a policy and report parse errors plus semantic warnings",
+    )
+    pvalid.add_argument("policy", help="Path to a YAML or JSON policy file")
+    pvalid.add_argument(
+        "--json", action="store_true",
+        help="Emit the report as JSON (stable shape for CI)",
+    )
+    pvalid.set_defaults(func=_cmd_policy_validate)
+
+    ptest = psub.add_parser(
+        "test",
+        help="Run a YAML/JSON cases file against a policy (Conftest analog)",
+    )
+    ptest.add_argument("policy", help="Path to a YAML or JSON policy file")
+    ptest.add_argument(
+        "--cases", required=True,
+        help="Path to a YAML or JSON file containing a 'cases:' list",
+    )
+    ptest.add_argument(
+        "--json", action="store_true",
+        help="Emit results as JSON (stable shape for CI)",
+    )
+    ptest.set_defaults(func=_cmd_policy_test)
+
     return p
 
 

@@ -12,6 +12,16 @@
 The companion JSON Schema document at `docs/policy_schema.json` is the citable
 spec for compliance-evidence purposes. Hand-rolled validation in `loader.py`
 mirrors the schema, with clean error paths for human readers.
+
+Beyond load and parse, two surfaces support reviewing the policy artifact
+independently from agent code:
+
+- ``validate`` / ``validate_source`` (in ``vaara.policy.validate``) return a
+  structured report with parse errors and semantic warnings, usable in CI.
+- ``evaluate`` + ``run_test_cases`` (in ``vaara.policy.test_cases``) let a
+  team write synthetic action contexts and assert expected verdicts against a
+  policy, Conftest-style. YAML/JSON case files load via
+  ``load_test_cases`` (in ``vaara.policy.test_cases_io``).
 """
 
 from vaara.policy.schema import (
@@ -24,16 +34,43 @@
     Thresholds,
 )
 from vaara.policy.loader import from_dict, from_json, from_yaml
+from vaara.policy.validate import (
+    IssueLevel,
+    PolicyIssue,
+    ValidationReport,
+    validate,
+    validate_source,
+)
+from vaara.policy.test_cases import (
+    EvaluationResult,
+    PolicyTestCase,
+    PolicyTestResult,
+    evaluate,
+    run_test_cases,
+)
+from vaara.policy.test_cases_io import load_test_cases, parse_cases
 
 __all__ = [
     "SCHEMA_VERSION",
     "ActionClassDef",
     "EscalationRoute",
+    "EvaluationResult",
+    "IssueLevel",
     "Policy",
     "PolicyError",
+    "PolicyIssue",
+    "PolicyTestCase",
+    "PolicyTestResult",
     "SequencePattern",
     "Thresholds",
+    "ValidationReport",
+    "evaluate",
     "from_dict",
     "from_json",
     "from_yaml",
+    "load_test_cases",
+    "parse_cases",
+    "run_test_cases",
+    "validate",
+    "validate_source",
 ]