
feat: policy validate + test framework (v0.9.0) #71

Merged
vaaraio merged 2 commits into main from feat/policy-validate-test
May 15, 2026

@vaaraio (Owner) commented May 15, 2026

Summary

  • Adds vaara.policy.validate (semantic checks → ValidationReport) and vaara.policy.test_cases (Conftest-style evaluator + cases-file runner).
  • New CLI surfaces: vaara policy validate POLICY [--json] and vaara policy test POLICY --cases CASES [--json].
  • 48 new tests across four files. Full suite 472 / 472 pass. Ruff + mypy clean.
  • Pure addition. No schema change, no DSL, backwards-compatible.

Closes the policy-as-code expressiveness gap from a CI / compliance-review angle: the YAML / JSON policy artifact is now reviewable and testable independently from the agent code it governs.
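The cases-file runner described above reduces to a loop: evaluate each case, compare the expected fields, and record evaluation errors as failed cases rather than raising. A minimal Python sketch, not Vaara's implementation: PolicyTestCase, PolicyTestResult, and run_test_cases are names from this PR, but their fields here, the evaluate callable passed in (returning a plain dict), and catching only ValueError are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass(frozen=True)
class PolicyTestCase:
    name: str
    action_class: str
    risk_score: float
    matched_sequences: tuple[str, ...] = ()
    expect: dict = field(default_factory=dict)  # e.g. {"verdict": "allow"}


@dataclass(frozen=True)
class PolicyTestResult:
    case: PolicyTestCase
    passed: bool
    diagnostics: tuple[str, ...] = ()


# Hypothetical evaluator signature: (action_class, risk_score, matched) -> dict.
Evaluator = Callable[[str, float, tuple], dict]


def run_test_cases(evaluate: Evaluator,
                   cases: Iterable[PolicyTestCase]) -> list[PolicyTestResult]:
    results: list[PolicyTestResult] = []
    for case in cases:
        try:
            actual = evaluate(case.action_class, case.risk_score,
                              case.matched_sequences)
        except ValueError as exc:
            # An evaluation error becomes a failed case instead of a crash.
            results.append(
                PolicyTestResult(case, False, (f"evaluation error: {exc}",)))
            continue
        # Only the fields named in `expect` are compared.
        diffs = tuple(
            f"{key}: expected {want!r}, got {actual.get(key)!r}"
            for key, want in case.expect.items()
            if actual.get(key) != want
        )
        results.append(PolicyTestResult(case, not diffs, diffs))
    return results
```

Taking the evaluator as a parameter keeps the runner testable with a stub function in place of a real policy. Note that in this sketch a case with an empty expect mapping passes as long as evaluation succeeds.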

Test plan

  • pytest -q → 472 / 472
  • ruff check . clean
  • mypy src/vaara clean
  • vaara policy validate examples/policies/full.yaml
  • vaara policy test examples/policies/full.yaml --cases examples/policies/test_cases.yaml
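In CI, the two commands above can gate a job on their exit codes. A minimal sketch, assuming the vaara executable is on PATH and the job runs from the repository root; run_checks is a hypothetical helper name. Per the commit message, validate returns 1 on parse errors and test returns 1 on any failed case (2 if the policy itself fails to parse), so the helper simply propagates the first non-zero code.

```python
import subprocess
import sys
from typing import Sequence

# The two commands from the test plan; paths are relative to the repo root.
CHECKS = (
    ["vaara", "policy", "validate", "examples/policies/full.yaml", "--json"],
    ["vaara", "policy", "test", "examples/policies/full.yaml",
     "--cases", "examples/policies/test_cases.yaml", "--json"],
)


def run_checks(checks: Sequence[Sequence[str]] = CHECKS) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in checks:
        proc = subprocess.run(list(cmd), capture_output=True, text=True)
        if proc.returncode != 0:
            # Surface the command's own report (JSON with --json) in the CI log.
            print(f"{' '.join(cmd)} exited with {proc.returncode}", file=sys.stderr)
            print(proc.stdout, proc.stderr, sep="\n", file=sys.stderr)
            return proc.returncode
    return 0
```

A CI step would then call sys.exit(run_checks()).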

Summary by CodeRabbit

  • New Features

    • Policy validation with semantic checking and structured error reporting
    • Policy testing framework supporting YAML/JSON test case files
    • New CLI commands: vaara policy validate and vaara policy test
    • JSON output support for CI/CD pipeline integration
  • Documentation

    • Added policy artifact review guidance in compliance documentation
    • Updated README with policy module directory information
  • Tests

    • Comprehensive test coverage for validation and testing features
  • Chores

    • Version bumped to v0.9.0


vaaraio and others added 2 commits May 14, 2026 23:33
Adds two CLI surfaces that turn the YAML / JSON Vaara policy into a
policy-as-code artifact a compliance team can validate and test in CI,
independently from the agent code it governs.

- vaara.policy.validate module: validate(Policy) returns a
  ValidationReport with structured PolicyIssue records. Warnings for
  empty action_classes, narrow threshold bands, dangling threshold
  overrides, sequence steps not naming a declared action class,
  unreachable escalation routes, missing default route.
  validate_source combines load and check.
- vaara.policy.test_cases module (Conftest analog):
  evaluate(policy, action_class, risk_score, matched_sequences)
  applies sequence boosts capped at 1.0, resolves the merged
  threshold, and returns EvaluationResult(verdict, boosted_risk,
  route). PolicyTestCase plus run_test_cases run case lists and
  capture evaluation errors as failed cases rather than raising.
- vaara.policy.test_cases_io: load_test_cases reads YAML / JSON cases
  documents shaped like typical OPA / Conftest test files.
- vaara policy validate POLICY_PATH [--json] and vaara policy test
  POLICY_PATH --cases CASES_PATH [--json] CLI subcommands. Exit
  codes: validate returns 1 on parse errors (warnings alone do not make it return 1),
  test returns 1 on any failed case (2 if the policy itself fails
  to parse).
- examples/policies/test_cases.yaml exercising
  examples/policies/full.yaml end-to-end across thresholds, sequence
  boost, default and article-matched escalation routes.
- 48 new tests across four files. Full suite 472 / 472 pass.
- COMPLIANCE.md gains a Policy artifact review subsection under
  Article 14.
- README.md surfaces the new CLI in "Where things live".
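The evaluate step described in this commit can be illustrated with a self-contained sketch. It is not Vaara's implementation: only the input checks, the cap at 1.0, the threshold comparison, and the (verdict, boosted_risk, route) result shape come from the commit message. The Policy and Thresholds shapes, the additive boost rule, the "escalate"/"deny" verdict names, the override-replaces-default merge, and the verdict-to-route lookup (Vaara derives routes from regulatory articles) are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical band: boosted risk below `escalate` is allowed, below
    # `deny` is escalated, and anything at or above `deny` is denied.
    escalate: float
    deny: float


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy shape for illustration only; not Vaara's schema.
    default: Thresholds
    action_classes: frozenset[str]
    overrides: dict[str, Thresholds] = field(default_factory=dict)
    sequence_boosts: dict[str, float] = field(default_factory=dict)
    routes: dict[str, str] = field(default_factory=dict)  # verdict -> route


@dataclass(frozen=True)
class EvaluationResult:
    verdict: str
    boosted_risk: float
    route: Optional[str]


def evaluate(policy: Policy, action_class: str, risk_score: float,
             matched_sequences: tuple[str, ...] = ()) -> EvaluationResult:
    # Input checks: risk range, known action class, known sequence names.
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError(f"risk_score must be in [0, 1], got {risk_score}")
    if action_class not in policy.action_classes:
        raise ValueError(f"unknown action class {action_class!r}")
    unknown = [s for s in matched_sequences if s not in policy.sequence_boosts]
    if unknown:
        raise ValueError(f"unknown sequences {unknown}")

    # Boosts are summed here (an assumption), then capped at 1.0.
    boost = sum(policy.sequence_boosts[s] for s in matched_sequences)
    boosted = min(1.0, risk_score + boost)

    # "Merged" threshold: a per-class override replaces the default (assumption).
    band = policy.overrides.get(action_class, policy.default)
    if boosted < band.escalate:
        verdict = "allow"
    elif boosted < band.deny:
        verdict = "escalate"
    else:
        verdict = "deny"
    return EvaluationResult(verdict, boosted, policy.routes.get(verdict))
```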

Backwards-compatible. Pure addition. Closes the policy-as-code
expressiveness gap from a CI / compliance-review angle without
extending the schema or adding a new DSL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Bumps version from 0.8.0 to 0.9.0 and moves the [Unreleased]
CHANGELOG block to [0.9.0] - 2026-05-14. No code changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@coderabbitai (Bot) commented May 15, 2026

Walkthrough

This PR adds a comprehensive policy validation and test framework to Vaara. It introduces structured validation reports for semantic policy checks, a Conftest-style test evaluator that runs test cases against policies, YAML/JSON loaders for both workflows, CLI commands for validate/test with JSON output support, and public API exports making these new capabilities accessible as first-class policy review surfaces.

Changes

Policy validation and test framework

  • Validation data structures and semantic checks (src/vaara/policy/validate.py, tests/test_policy_validate.py): the IssueLevel enum and the immutable PolicyIssue and ValidationReport data structures provide structured validation results; validate() performs semantic checks across action classes, thresholds, sequence patterns, and escalation routes; validate_source() wraps loading and validation with error recovery, returning (policy, report) on success or (None, report) on parse/import failures.
  • Test case evaluation framework (src/vaara/policy/test_cases.py, tests/test_policy_test_cases.py): evaluate() validates its inputs (risk score range, action class existence, sequence names), applies per-sequence risk boosts, computes the verdict by threshold comparison, and derives escalation routes from unions of regulatory articles; the PolicyTestCase, EvaluationResult, and PolicyTestResult dataclasses model test inputs and outcomes; run_test_cases() executes all cases, catches evaluation errors into diagnostics, and compares expected vs. actual verdicts and routes.
  • Test case loading from YAML/JSON (src/vaara/policy/test_cases_io.py, tests/test_policy_test_cases_io.py): parse_cases() validates the document structure and extracts the required (action_class, risk_score) and optional (matched_sequences, expect) fields; load_test_cases() accepts strings or filesystem paths, detects YAML by file extension and JSON by a leading {, and delegates parsing to _parse_text(), with explicit error wrapping when the [yaml] extra is missing.
  • CLI validate and test subcommands (src/vaara/cli.py, tests/test_policy_cli.py): vaara policy validate loads a policy and reports parse and semantic issues, with exit codes reflecting success or failure; vaara policy test runs cases against a policy and reports pass/fail per case with an overall exit status; both support --json for machine-readable CI output, and reporting helpers render results as JSON objects or human-readable text summaries.
  • Public API exports and package integration (src/vaara/policy/__init__.py): the package __all__ now exports the validation symbols (IssueLevel, PolicyIssue, ValidationReport, validate, validate_source) and the test-case symbols (EvaluationResult, PolicyTestCase, PolicyTestResult, evaluate, run_test_cases, load_test_cases, parse_cases); the module docstring documents validation and evaluation as independent policy review surfaces.
  • Example fixtures, documentation, and version bumps (examples/policies/test_cases.yaml, README.md, CHANGELOG.md, COMPLIANCE.md, pyproject.toml, src/vaara/__init__.py): the test_cases.yaml fixture defines realistic test scenarios with expected verdicts and routes for the example policy; README points to the src/vaara/policy/ documentation; CHANGELOG and COMPLIANCE document the new validation/test workflows and CLI commands; the version is bumped from 0.8.0 to 0.9.0 in both the manifest and the package.
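The semantic checks performed by validate() can be sketched in a few lines. IssueLevel, PolicyIssue, ValidationReport, and validate are names from this PR; everything else is an assumption for illustration: the policy is a plain dict with hypothetical keys, the ok property and the MIN_BAND cut-off for a "narrow" band are invented, and the unreachable-route check is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class IssueLevel(Enum):
    ERROR = "error"
    WARNING = "warning"


@dataclass(frozen=True)
class PolicyIssue:
    level: IssueLevel
    location: str  # dotted path into the policy document (hypothetical)
    message: str


@dataclass(frozen=True)
class ValidationReport:
    issues: tuple[PolicyIssue, ...] = ()

    @property
    def ok(self) -> bool:
        # Warnings alone do not fail validation, mirroring the CLI exit codes.
        return not any(i.level is IssueLevel.ERROR for i in self.issues)

    @property
    def warnings(self) -> list[PolicyIssue]:
        return [i for i in self.issues if i.level is IssueLevel.WARNING]


MIN_BAND = 0.05  # hypothetical cut-off for a "narrow" threshold band


def validate(policy: dict) -> ValidationReport:
    issues: list[PolicyIssue] = []

    def warn(loc: str, msg: str) -> None:
        issues.append(PolicyIssue(IssueLevel.WARNING, loc, msg))

    def check_band(loc: str, band: dict) -> None:
        if band.get("deny", 1.0) - band.get("escalate", 0.0) < MIN_BAND:
            warn(loc, "narrow threshold band")

    classes = set(policy.get("action_classes") or ())
    if not classes:
        warn("action_classes", "no action classes declared")

    thresholds = policy.get("thresholds") or {}
    check_band("thresholds.default", thresholds.get("default") or {})
    for name, band in (thresholds.get("overrides") or {}).items():
        loc = f"thresholds.overrides.{name}"
        if name not in classes:  # dangling override
            warn(loc, "override for undeclared action class")
        check_band(loc, band)

    for seq in policy.get("sequences") or ():
        for step in seq.get("steps") or ():
            if step not in classes:
                warn(f"sequences.{seq.get('name')}",
                     f"step {step!r} is not a declared action class")

    if "default" not in (policy.get("routes") or {}):
        warn("routes", "missing default escalation route")

    return ValidationReport(tuple(issues))
```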

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • vaaraio/vaara#42: Establishes the base vaara.policy public API and __all__ exports; this PR extends that same API surface to include validation and test-case symbols.

Poem

🐰 A rabbit's hop through policies fine,
Validating verdicts in YAML design,
Test cases that pass with a verdictful way,
CLI commands to keep chaos at bay! 🧪

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'feat: policy validate + test framework (v0.9.0)' clearly and concisely summarizes the main changes: addition of policy validation and testing capabilities with the version bump.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_policy_test_cases_io.py (1)

61-94: ⚡ Quick win

Add regressions for one-line inline payload and non-numeric risk_score.

These two cases would lock in the parser/loader contracts and prevent future regressions:

  1. load_test_cases('{"cases":[...]}') as inline one-line JSON.
  2. parse_cases(... risk_score: "not-a-number") raising PolicyError (not raw ValueError).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_policy_test_cases_io.py` around lines 61 - 94, Add two regression
tests: one that verifies load_test_cases can parse a one-line inline JSON
payload (e.g., create a tmp_path file containing
'{"cases":[{"name":"c1","action_class":"tx.sign","risk_score":0.3,"expect":{"verdict":"allow"}}]}'
and assert load_test_cases(...)[0].name == "c1"), and another that ensures a
non-numeric risk_score triggers a PolicyError (create a cases file or object
with risk_score: "not-a-number" and assert that load_test_cases(...) or
parse_cases(...) raises PolicyError rather than ValueError). Reference
load_test_cases and PolicyError (and parse_cases if used) when adding tests so
the parser/loader contract is enforced.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f323a67d-7b16-4ac1-b357-a380592102fa

📥 Commits

Reviewing files that changed from the base of the PR and between c6194b9 and 34d9041.

📒 Files selected for processing (15)
  • CHANGELOG.md
  • COMPLIANCE.md
  • README.md
  • examples/policies/test_cases.yaml
  • pyproject.toml
  • src/vaara/__init__.py
  • src/vaara/cli.py
  • src/vaara/policy/__init__.py
  • src/vaara/policy/test_cases.py
  • src/vaara/policy/test_cases_io.py
  • src/vaara/policy/validate.py
  • tests/test_policy_cli.py
  • tests/test_policy_test_cases.py
  • tests/test_policy_test_cases_io.py
  • tests/test_policy_validate.py

Comment on lines +39 to +47
        try:
            action_class = entry["action_class"]
            risk_score = float(entry["risk_score"])
        except KeyError as e:
            raise PolicyError(
                f"cases[{i}] ({name}): missing required field {e.args[0]!r}"
            ) from None
        matched = tuple(entry.get("matched_sequences") or ())
        expect = entry.get("expect") or {}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate and normalize case field types before constructing PolicyTestCase.

Line 41 can raise ValueError/TypeError that bypasses the PolicyError wrapping, and Line 46 can turn a string into per-character sequence names ("abc" → ("a", "b", "c")). Both lead to inconsistent parser behavior.

Proposed fix
-        try:
-            action_class = entry["action_class"]
-            risk_score = float(entry["risk_score"])
-        except KeyError as e:
+        try:
+            action_class = entry["action_class"]
+            risk_score_raw = entry["risk_score"]
+        except KeyError as e:
             raise PolicyError(
                 f"cases[{i}] ({name}): missing required field {e.args[0]!r}"
             ) from None
-        matched = tuple(entry.get("matched_sequences") or ())
+        if not isinstance(action_class, str) or not action_class:
+            raise PolicyError(f"cases[{i}] ({name}): 'action_class' must be a non-empty string")
+        try:
+            risk_score = float(risk_score_raw)
+        except (TypeError, ValueError):
+            raise PolicyError(f"cases[{i}] ({name}): 'risk_score' must be numeric") from None
+
+        raw_matched = entry.get("matched_sequences") or ()
+        if isinstance(raw_matched, str) or not isinstance(raw_matched, (list, tuple)):
+            raise PolicyError(
+                f"cases[{i}] ({name}): 'matched_sequences' must be a list of strings"
+            )
+        if not all(isinstance(s, str) for s in raw_matched):
+            raise PolicyError(
+                f"cases[{i}] ({name}): 'matched_sequences' must contain only strings"
+            )
+        matched = tuple(raw_matched)
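The per-character pitfall is easy to reproduce in plain Python, since tuple() iterates a string character by character, so tuple("abc") yields ("a", "b", "c"). A standalone sketch of the stricter normalization from the diff, using a local PolicyError stand-in (so it runs without Vaara installed) and a hypothetical helper name, normalize_case_fields:

```python
class PolicyError(Exception):
    """Local stand-in for vaara's PolicyError so this sketch runs standalone."""


def normalize_case_fields(i: int, name: str, entry: dict):
    """Return (action_class, risk_score, matched) or raise PolicyError."""
    prefix = f"cases[{i}] ({name})"
    try:
        action_class = entry["action_class"]
        raw_risk = entry["risk_score"]
    except KeyError as e:
        raise PolicyError(f"{prefix}: missing required field {e.args[0]!r}") from None
    if not isinstance(action_class, str) or not action_class:
        raise PolicyError(f"{prefix}: 'action_class' must be a non-empty string")
    try:
        risk_score = float(raw_risk)
    except (TypeError, ValueError):
        # float("not-a-number") or float(None) no longer escapes unwrapped.
        raise PolicyError(f"{prefix}: 'risk_score' must be numeric") from None
    raw_matched = entry.get("matched_sequences") or ()
    # A bare string fails the list/tuple check, so it cannot be split per character.
    if not isinstance(raw_matched, (list, tuple)) or not all(
        isinstance(s, str) for s in raw_matched
    ):
        raise PolicyError(f"{prefix}: 'matched_sequences' must be a list of strings")
    return action_class, risk_score, tuple(raw_matched)
```

The separate isinstance(raw_matched, str) test in the proposed diff is already covered by the list/tuple check, since str is neither.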

Comment on lines +64 to +70
def load_test_cases(source: Union[str, Path]) -> list[PolicyTestCase]:
    if isinstance(source, str) and "\n" in source:
        return parse_cases(_parse_text(source))
    path = Path(source) if not isinstance(source, Path) else source
    text = path.read_text(encoding="utf-8")
    prefer_yaml = path.suffix.lower() in {".yaml", ".yml"}
    return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Single-line inline payloads are incorrectly treated as file paths.

Line 65 only treats strings with \n as inline content. A one-line JSON/YAML payload (e.g., '{"cases":[...]}') goes down the path branch and can fail with FileNotFoundError, which breaks the documented “raw string input” behavior.

Proposed fix
 def load_test_cases(source: Union[str, Path]) -> list[PolicyTestCase]:
-    if isinstance(source, str) and "\n" in source:
-        return parse_cases(_parse_text(source))
-    path = Path(source) if not isinstance(source, Path) else source
-    text = path.read_text(encoding="utf-8")
-    prefer_yaml = path.suffix.lower() in {".yaml", ".yml"}
-    return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
+    if isinstance(source, Path):
+        path = source
+        text = path.read_text(encoding="utf-8")
+        prefer_yaml = path.suffix.lower() in {".yaml", ".yml"}
+        return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
+
+    if "\n" in source:
+        return parse_cases(_parse_text(source))
+
+    candidate = Path(source)
+    if candidate.is_file():
+        text = candidate.read_text(encoding="utf-8")
+        prefer_yaml = candidate.suffix.lower() in {".yaml", ".yml"}
+        return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
+
+    return parse_cases(_parse_text(source))
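The AI-agent prompt for this comment suggests a different heuristic from the diff: classify a string by its leading characters instead of falling back on whether a file exists. With the diff's is_file() fallback, a mistyped path is likely handed to the parser as inline text and surfaces as a document-structure error rather than FileNotFoundError; sniffing the prefix keeps FileNotFoundError for anything that does not look like a document. A standalone sketch of the prompt's approach, JSON-only (stdlib json stands in for Vaara's _parse_text, which also handles YAML) and with hypothetical helper names:

```python
import json
from pathlib import Path
from typing import Union

# JSON-only sketch; a YAML-aware version would also sniff markers such as "---".
INLINE_PREFIXES = ("{", "[")


def looks_inline(source: str) -> bool:
    """Multi-line text or a JSON-looking prefix is treated as inline content."""
    return "\n" in source or source.lstrip().startswith(INLINE_PREFIXES)


def load_cases_document(source: Union[str, Path]) -> dict:
    """Return the parsed cases document from inline text or a file path."""
    if isinstance(source, str) and looks_inline(source):
        text = source
    else:
        # Genuine paths still raise FileNotFoundError when missing.
        text = Path(source).read_text(encoding="utf-8")
    doc = json.loads(text)
    if not isinstance(doc, dict) or not isinstance(doc.get("cases"), list):
        raise ValueError("cases document must be an object with a 'cases' list")
    return doc
```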
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/vaara/policy/test_cases_io.py` around lines 64 - 70, The function
load_test_cases currently treats only strings containing "\n" as inline content,
which misclassifies single-line JSON/YAML payloads as file paths; update the
initial conditional in load_test_cases to detect inline raw string input by
trimming leading whitespace and checking for JSON/YAML indicators (e.g.,
startswith('{') or startswith('[') or startswith('---') or a YAML key pattern)
in addition to the existing "\n" check, then call _parse_text(parse_input) and
parse_cases as before; modify the branch that constructs Path only for true
filesystem paths and preserve prefer_yaml detection when a path suffix is
present so one-line payloads like '{"cases":[...]}' are parsed as inline content
rather than causing FileNotFoundError.

@vaaraio vaaraio merged commit ea201fe into main May 15, 2026
10 checks passed
@vaaraio vaaraio deleted the feat/policy-validate-test branch May 15, 2026 07:45