feat: policy validate + test framework (v0.9.0) by vaaraio · Pull Request #71 · vaaraio/vaara

vaaraio · 2026-05-15T07:40:58Z

Summary

Adds vaara.policy.validate (semantic checks → ValidationReport) and vaara.policy.test_cases (Conftest-style evaluator + cases-file runner).
New CLI surfaces: vaara policy validate POLICY [--json] and vaara policy test POLICY --cases CASES [--json].
48 new tests across four files. Full suite 472 / 472 pass. Ruff + mypy clean.
Pure addition. No schema change, no DSL, backwards-compatible.

Closes the policy-as-code expressiveness gap from a CI / compliance-review angle: the YAML / JSON policy artifact is now reviewable and testable independently from the agent code it governs.

Test plan

pytest -q → 472 / 472
ruff check . clean
mypy src/vaara clean
vaara policy validate examples/policies/full.yaml
vaara policy test examples/policies/full.yaml --cases examples/policies/test_cases.yaml

Summary by CodeRabbit

New Features
- Policy validation with semantic checking and structured error reporting
- Policy testing framework supporting YAML/JSON test case files
- New CLI commands: vaara policy validate and vaara policy test
- JSON output support for CI/CD pipeline integration
Documentation
- Added policy artifact review guidance in compliance documentation
- Updated README with policy module directory information
Tests
- Comprehensive test coverage for validation and testing features
Chores
- Version bumped to v0.9.0

Adds two CLI surfaces that turn the YAML / JSON Vaara policy into a policy-as-code artifact a compliance team can validate and test in CI, independently from the agent code it governs.

- vaara.policy.validate module: validate(Policy) returns a ValidationReport with structured PolicyIssue records. Warnings for empty action_classes, narrow threshold bands, dangling threshold overrides, sequence steps not naming a declared action class, unreachable escalation routes, missing default route. validate_source combines load and check.
- vaara.policy.test_cases module (Conftest analog): evaluate(policy, action_class, risk_score, matched_sequences) applies sequence boosts capped at 1.0, resolves the merged threshold, and returns EvaluationResult(verdict, boosted_risk, route). PolicyTestCase plus run_test_cases run case lists and capture evaluation errors as failed cases rather than raising.
- vaara.policy.test_cases_io: load_test_cases reads YAML / JSON cases documents shaped like typical OPA / Conftest test files.
- vaara policy validate POLICY_PATH [--json] and vaara policy test POLICY_PATH --cases CASES_PATH [--json] CLI subcommands. Exit codes: validate returns 1 on parse errors (warnings do not flip), test returns 1 on any failed case (2 if the policy itself fails to parse).
- examples/policies/test_cases.yaml exercising examples/policies/full.yaml end-to-end across thresholds, sequence boost, default and article-matched escalation routes.
- 48 new tests across four files. Full suite 472 / 472 pass.
- COMPLIANCE.md gains a Policy artifact review subsection under Article 14.
- README.md surfaces the new CLI in "Where things live".

Backwards-compatible. Pure addition. Closes the policy-as-code expressiveness gap from a CI / compliance-review angle without extending the schema or adding a new DSL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
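As a reading aid for the evaluate semantics listed above (sequence boosts added to the base score and capped at 1.0, then compared against a per-class threshold that falls back to a policy-wide default), here is a minimal standalone sketch. It is not Vaara's implementation: SketchPolicy, Thresholds, the two-cutoff band, the verdict names "escalate" and "deny", ValueError as the error type, and the example names "drain_pattern" and "human_review" are all hypothetical. The real evaluate also derives routes from regulatory articles rather than a single default route.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical band: below escalate_at -> allow, at/above deny_at -> deny,
    # anything in between -> escalate.
    escalate_at: float
    deny_at: float


@dataclass(frozen=True)
class SketchPolicy:
    # Hypothetical stand-in for vaara's Policy; every field name is an assumption.
    action_classes: FrozenSet[str]
    default_thresholds: Thresholds
    threshold_overrides: Dict[str, Thresholds] = field(default_factory=dict)
    sequence_boosts: Dict[str, float] = field(default_factory=dict)
    default_route: Optional[str] = None


@dataclass(frozen=True)
class EvaluationResult:
    verdict: str
    boosted_risk: float
    route: Optional[str]


def evaluate(
    policy: SketchPolicy,
    action_class: str,
    risk_score: float,
    matched_sequences: Sequence[str] = (),
) -> EvaluationResult:
    # Input checks (the sketch raises ValueError; Vaara's own error type may differ).
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError(f"risk_score must be within [0, 1], got {risk_score}")
    if action_class not in policy.action_classes:
        raise ValueError(f"unknown action_class {action_class!r}")

    # Add each matched sequence's boost, then cap the total at 1.0.
    boosted = risk_score
    for name in matched_sequences:
        if name not in policy.sequence_boosts:
            raise ValueError(f"unknown sequence {name!r}")
        boosted += policy.sequence_boosts[name]
    boosted = min(boosted, 1.0)

    # "Merged" threshold: a per-class override wins over the policy-wide default.
    t = policy.threshold_overrides.get(action_class, policy.default_thresholds)
    if boosted >= t.deny_at:
        return EvaluationResult("deny", boosted, None)
    if boosted >= t.escalate_at:
        return EvaluationResult("escalate", boosted, policy.default_route)
    return EvaluationResult("allow", boosted, None)


# Example policy with made-up values.
policy = SketchPolicy(
    action_classes=frozenset({"tx.sign", "file.read"}),
    default_thresholds=Thresholds(escalate_at=0.5, deny_at=0.8),
    threshold_overrides={"tx.sign": Thresholds(escalate_at=0.4, deny_at=0.7)},
    sequence_boosts={"drain_pattern": 0.3},
    default_route="human_review",
)
```

With these made-up numbers, a risk score of 0.45 is allowed for file.read but escalated for tx.sign, which is exactly the kind of override interaction a cases file can pin down.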

Bumps version from 0.8.0 to 0.9.0 and moves the [Unreleased] CHANGELOG block to [0.9.0] - 2026-05-14. No code changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

coderabbitai · 2026-05-15T07:41:11Z

Walkthrough

This PR adds a comprehensive policy validation and test framework to Vaara. It introduces structured validation reports for semantic policy checks, a Conftest-style test evaluator that runs test cases against policies, YAML/JSON loaders for both workflows, CLI commands for validate/test with JSON output support, and public API exports making these new capabilities accessible as first-class policy review surfaces.

Changes

Policy validation and test framework

| Layer | File(s) | Summary |
| --- | --- | --- |
| Validation data structures and semantic checks | `src/vaara/policy/validate.py`, `tests/test_policy_validate.py` | `IssueLevel` enum, `PolicyIssue`, and `ValidationReport` immutable data structures provide structured validation results; `validate()` function performs semantic checks across action classes, thresholds, sequence patterns, and escalation routes; `validate_source()` wraps loading and validation with error recovery, returning `(policy, report)` or `(None, report)` on parse/import failures. |
| Test case evaluation framework | `src/vaara/policy/test_cases.py`, `tests/test_policy_test_cases.py` | `evaluate()` function validates inputs (risk score ranges, action class existence, sequence names), applies per-sequence risk boosts, computes verdict by threshold comparison, and derives escalation routes from regulatory article unions; `PolicyTestCase`, `EvaluationResult`, and `PolicyTestResult` dataclasses model test inputs and outcomes; `run_test_cases()` executes all cases, catches evaluation errors into diagnostics, and compares expected vs. actual verdicts/routes. |
| Test case loading from YAML/JSON | `src/vaara/policy/test_cases_io.py`, `tests/test_policy_test_cases_io.py` | `parse_cases()` validates document structure and extracts required (`action_class`, `risk_score`) and optional (`matched_sequences`, `expect`) fields; `load_test_cases()` accepts strings or filesystem paths, auto-detects YAML by file extension and JSON by leading `{`, and delegates parsing to `_parse_text()` with explicit error wrapping for missing `[yaml]` extra. |
| CLI validate and test subcommands | `src/vaara/cli.py`, `tests/test_policy_cli.py` | `vaara policy validate` loads a policy and reports parse/semantic issues with exit codes reflecting success/failure; `vaara policy test` runs cases against a policy and reports pass/fail per case with overall exit status; both support `--json` for machine-readable CI output; reporting helpers render results as JSON objects or human-readable text summaries. |
| Public API exports and package integration | `src/vaara/policy/__init__.py` | Package `__all__` expanded to export validation (`IssueLevel`, `PolicyIssue`, `ValidationReport`, `validate`, `validate_source`) and test-case (`EvaluationResult`, `PolicyTestCase`, `PolicyTestResult`, `evaluate`, `run_test_cases`, `load_test_cases`, `parse_cases`) symbols; module docstring documents new validation and evaluation as independent policy review surfaces. |
| Example fixtures, documentation, and version bumps | `examples/policies/test_cases.yaml`, `README.md`, `CHANGELOG.md`, `COMPLIANCE.md`, `pyproject.toml`, `src/vaara/__init__.py` | `test_cases.yaml` fixture defines realistic test scenarios with verdicts and routes for the example policy; README points to `src/vaara/policy/` documentation; CHANGELOG and COMPLIANCE document the new validation/test workflows and CLI commands; version bumped from 0.8.0 to 0.9.0 in both manifest and package. |
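The structured report and the CLI exit-code rules can be modelled in a few lines. This is a hypothetical sketch, not Vaara's API: the IssueLevel members, the code field on PolicyIssue, and the helper names validate_exit_code and policy_test_exit_code are assumptions. It also assumes parse failures surface as ERROR-level issues, so that warnings alone never change the validate exit code.

```python
import enum
from dataclasses import dataclass
from typing import Sequence, Tuple


class IssueLevel(enum.Enum):
    ERROR = "error"
    WARNING = "warning"


@dataclass(frozen=True)
class PolicyIssue:
    level: IssueLevel
    code: str  # hypothetical short identifier, e.g. "missing-default-route"
    message: str


@dataclass(frozen=True)
class ValidationReport:
    issues: Tuple[PolicyIssue, ...] = ()

    @property
    def errors(self) -> Tuple[PolicyIssue, ...]:
        return tuple(i for i in self.issues if i.level is IssueLevel.ERROR)

    @property
    def warnings(self) -> Tuple[PolicyIssue, ...]:
        return tuple(i for i in self.issues if i.level is IssueLevel.WARNING)

    @property
    def ok(self) -> bool:
        # Warnings alone never fail the report.
        return not self.errors


def validate_exit_code(report: ValidationReport) -> int:
    # `vaara policy validate`: 1 on errors, 0 otherwise (warnings do not flip it).
    return 0 if report.ok else 1


def policy_test_exit_code(policy_parsed: bool, case_passed: Sequence[bool]) -> int:
    # `vaara policy test`: 2 if the policy fails to parse, 1 on any failed case.
    if not policy_parsed:
        return 2
    return 0 if all(case_passed) else 1
```

Keeping the report immutable (frozen dataclasses, a tuple of issues) means the same object can be rendered as text or JSON without either renderer being able to alter what the other sees.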

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

vaaraio/vaara#42: Establishes the base vaara.policy public API and __all__ exports; this PR extends that same API surface to include validation and test-case symbols.

Poem

🐰 A rabbit's hop through policies fine,
Validating verdicts in YAML design,
Test cases that pass with a verdictful way,
CLI commands to keep chaos at bay! 🧪

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: policy validate + test framework (v0.9.0)' clearly and concisely summarizes the main changes: addition of policy validation and testing capabilities with the version bump. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

tests/test_policy_test_cases_io.py (1)
61-94: ⚡ Quick win

Add regression tests for a one-line inline payload and a non-numeric risk_score.

These two cases would lock in the parser/loader contracts and prevent future regressions:

load_test_cases('{"cases":[...]}') as inline one-line JSON.

parse_cases(... risk_score: "not-a-number") raising PolicyError (not raw ValueError).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_policy_test_cases_io.py` around lines 61 - 94, Add two regression
tests: one that verifies load_test_cases can parse a one-line inline JSON
payload (e.g., create a tmp_path file containing
'{"cases":[{"name":"c1","action_class":"tx.sign","risk_score":0.3,"expect":{"verdict":"allow"}}]}'
and assert load_test_cases(...)[0].name == "c1"), and another that ensures a
non-numeric risk_score triggers a PolicyError (create a cases file or object
with risk_score: "not-a-number" and assert that load_test_cases(...) or
parse_cases(...) raises PolicyError rather than ValueError). Reference
load_test_cases and PolicyError (and parse_cases if used) when adding tests so
the parser/loader contract is enforced.

🤖 Prompt for all review comments with AI agents


Inline comments:
In `@src/vaara/policy/test_cases_io.py`:
- Around line 64-70: The function load_test_cases currently treats only strings
containing "\n" as inline content, which misclassifies single-line JSON/YAML
payloads as file paths; update the initial conditional in load_test_cases to
detect inline raw string input by trimming leading whitespace and checking for
JSON/YAML indicators (e.g., startswith('{') or startswith('[') or
startswith('---') or a YAML key pattern) in addition to the existing "\n" check,
then call _parse_text(parse_input) and parse_cases as before; modify the branch
that constructs Path only for true filesystem paths and preserve prefer_yaml
detection when a path suffix is present so one-line payloads like
'{"cases":[...]}' are parsed as inline content rather than causing
FileNotFoundError.



ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f323a67d-7b16-4ac1-b357-a380592102fa

📥 Commits

Reviewing files that changed from the base of the PR and between c6194b9 and 34d9041.

📒 Files selected for processing (15)

CHANGELOG.md
COMPLIANCE.md
README.md
examples/policies/test_cases.yaml
pyproject.toml
src/vaara/__init__.py
src/vaara/cli.py
src/vaara/policy/__init__.py
src/vaara/policy/test_cases.py
src/vaara/policy/test_cases_io.py
src/vaara/policy/validate.py
tests/test_policy_cli.py
tests/test_policy_test_cases.py
tests/test_policy_test_cases_io.py
tests/test_policy_validate.py

coderabbitai · 2026-05-15T07:44:03Z

```diff
+        try:
+            action_class = entry["action_class"]
+            risk_score = float(entry["risk_score"])
+        except KeyError as e:
+            raise PolicyError(
+                f"cases[{i}] ({name}): missing required field {e.args[0]!r}"
+            ) from None
+        matched = tuple(entry.get("matched_sequences") or ())
+        expect = entry.get("expect") or {}
```


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate and normalize case field types before constructing PolicyTestCase.

Line 41 (`float(entry["risk_score"])`) can raise ValueError or TypeError, which bypasses the PolicyError wrapping, and line 46 (`tuple(entry.get("matched_sequences") or ())`) turns a bare string into per-character sequence names ("abc" → ("a", "b", "c")). Both lead to inconsistent parser behavior.

Proposed fix

```diff
-        try:
-            action_class = entry["action_class"]
-            risk_score = float(entry["risk_score"])
-        except KeyError as e:
+        try:
+            action_class = entry["action_class"]
+            risk_score_raw = entry["risk_score"]
+        except KeyError as e:
             raise PolicyError(
                 f"cases[{i}] ({name}): missing required field {e.args[0]!r}"
             ) from None
-        matched = tuple(entry.get("matched_sequences") or ())
+        if not isinstance(action_class, str) or not action_class:
+            raise PolicyError(f"cases[{i}] ({name}): 'action_class' must be a non-empty string")
+        try:
+            risk_score = float(risk_score_raw)
+        except (TypeError, ValueError):
+            raise PolicyError(f"cases[{i}] ({name}): 'risk_score' must be numeric") from None
+
+        raw_matched = entry.get("matched_sequences") or ()
+        if isinstance(raw_matched, str) or not isinstance(raw_matched, (list, tuple)):
+            raise PolicyError(
+                f"cases[{i}] ({name}): 'matched_sequences' must be a list of strings"
+            )
+        if not all(isinstance(s, str) for s in raw_matched):
+            raise PolicyError(
+                f"cases[{i}] ({name}): 'matched_sequences' must contain only strings"
+            )
+        matched = tuple(raw_matched)
```
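Both failure modes are easy to reproduce outside the codebase. The standalone sketch below mirrors the proposed checks so they can be exercised in isolation; PolicyError here is a local stand-in for Vaara's exception, and normalize_case_fields is a hypothetical helper name, not a function in this PR.

```python
from typing import Any, Mapping, Tuple


class PolicyError(Exception):
    """Local stand-in for vaara's PolicyError so the sketch runs on its own."""


def normalize_case_fields(
    entry: Mapping[str, Any], i: int, name: str
) -> Tuple[str, float, Tuple[str, ...]]:
    prefix = f"cases[{i}] ({name})"
    try:
        action_class = entry["action_class"]
        risk_score_raw = entry["risk_score"]
    except KeyError as e:
        raise PolicyError(f"{prefix}: missing required field {e.args[0]!r}") from None

    if not isinstance(action_class, str) or not action_class:
        raise PolicyError(f"{prefix}: 'action_class' must be a non-empty string")

    # float("not-a-number") raises ValueError and float(None) raises TypeError;
    # wrap both so callers only ever see PolicyError.
    try:
        risk_score = float(risk_score_raw)
    except (TypeError, ValueError):
        raise PolicyError(f"{prefix}: 'risk_score' must be numeric") from None

    # Only lists/tuples are accepted, which rejects bare strings: a str is
    # iterable, so tuple("abc") would silently give ("a", "b", "c").
    raw_matched = entry.get("matched_sequences") or ()
    if not isinstance(raw_matched, (list, tuple)):
        raise PolicyError(f"{prefix}: 'matched_sequences' must be a list of strings")
    if not all(isinstance(s, str) for s in raw_matched):
        raise PolicyError(f"{prefix}: 'matched_sequences' must contain only strings")
    return action_class, risk_score, tuple(raw_matched)
```

Numeric strings such as "0.3" are still accepted, since float() parses them; only values float() cannot convert are rejected.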

coderabbitai · 2026-05-15T07:44:03Z

```diff
+def load_test_cases(source: Union[str, Path]) -> list[PolicyTestCase]:
+    if isinstance(source, str) and "\n" in source:
+        return parse_cases(_parse_text(source))
+    path = Path(source) if not isinstance(source, Path) else source
+    text = path.read_text(encoding="utf-8")
+    prefer_yaml = path.suffix.lower() in {".yaml", ".yml"}
+    return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
```


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Single-line inline payloads are incorrectly treated as file paths.

Line 65 (`"\n" in source`) only treats multi-line strings as inline content. A one-line JSON/YAML payload (e.g., '{"cases":[...]}') therefore goes down the path branch and can fail with FileNotFoundError, which breaks the documented “raw string input” behavior.

Proposed fix

```diff
 def load_test_cases(source: Union[str, Path]) -> list[PolicyTestCase]:
-    if isinstance(source, str) and "\n" in source:
-        return parse_cases(_parse_text(source))
-    path = Path(source) if not isinstance(source, Path) else source
-    text = path.read_text(encoding="utf-8")
-    prefer_yaml = path.suffix.lower() in {".yaml", ".yml"}
-    return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
+    if isinstance(source, Path):
+        path = source
+        text = path.read_text(encoding="utf-8")
+        prefer_yaml = path.suffix.lower() in {".yaml", ".yml"}
+        return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
+
+    if "\n" in source:
+        return parse_cases(_parse_text(source))
+
+    candidate = Path(source)
+    if candidate.is_file():
+        text = candidate.read_text(encoding="utf-8")
+        prefer_yaml = candidate.suffix.lower() in {".yaml", ".yml"}
+        return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
+
+    return parse_cases(_parse_text(source))
```
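A possible variant combines the inline-detection idea from the review prompt (treat strings that start with `{`, `[` or `---` as inline) with the is_file() fallback from the proposed fix. In this sketch, read_cases_source is a hypothetical helper that only decides where the text comes from and whether YAML should be preferred; parsing is left to the caller, for example _parse_text followed by parse_cases.

```python
from pathlib import Path
from typing import Tuple, Union

# Leading markers that suggest an inline JSON object/array or a YAML document.
_INLINE_PREFIXES = ("{", "[", "---")
_YAML_SUFFIXES = {".yaml", ".yml"}


def read_cases_source(source: Union[str, Path]) -> Tuple[str, bool]:
    """Return (text, prefer_yaml) for a path or an inline cases payload."""
    if isinstance(source, Path):
        return source.read_text(encoding="utf-8"), source.suffix.lower() in _YAML_SUFFIXES

    # Multi-line strings, and one-liners that start like JSON/YAML, are inline content.
    if "\n" in source or source.lstrip().startswith(_INLINE_PREFIXES):
        return source, False

    candidate = Path(source)
    try:
        is_file = candidate.is_file()
    except OSError:  # e.g. a string too long to be a valid file name
        is_file = False
    if is_file:
        return candidate.read_text(encoding="utf-8"), candidate.suffix.lower() in _YAML_SUFFIXES

    # Not an existing file and not recognizably inline: let the parser report the problem.
    return source, False
```

One trade-off, shared with the proposed fix: a mistyped path falls through to the parser, so the user sees a parse or document-shape error instead of FileNotFoundError.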


vaaraio and others added 2 commits May 14, 2026 23:33

chore: release v0.9.0 (policy validate + test framework)

34d9041


coderabbitai Bot reviewed May 15, 2026


vaaraio merged commit ea201fe into main May 15, 2026
10 checks passed

vaaraio deleted the feat/policy-validate-test branch May 15, 2026 07:45
