feat: policy validate + test framework (v0.9.0) #71
Adds two CLI surfaces that turn the YAML / JSON Vaara policy into a policy-as-code artifact a compliance team can validate and test in CI, independently from the agent code it governs.

- vaara.policy.validate module: validate(Policy) returns a ValidationReport with structured PolicyIssue records. Warnings for empty action_classes, narrow threshold bands, dangling threshold overrides, sequence steps not naming a declared action class, unreachable escalation routes, missing default route. validate_source combines load and check.
- vaara.policy.test_cases module (Conftest analog): evaluate(policy, action_class, risk_score, matched_sequences) applies sequence boosts capped at 1.0, resolves the merged threshold, and returns EvaluationResult(verdict, boosted_risk, route). PolicyTestCase plus run_test_cases run case lists and capture evaluation errors as failed cases rather than raising.
- vaara.policy.test_cases_io: load_test_cases reads YAML / JSON cases documents shaped like typical OPA / Conftest test files.
- vaara policy validate POLICY_PATH [--json] and vaara policy test POLICY_PATH --cases CASES_PATH [--json] CLI subcommands. Exit codes: validate returns 1 on parse errors (warnings do not flip), test returns 1 on any failed case (2 if the policy itself fails to parse).
- examples/policies/test_cases.yaml exercising examples/policies/full.yaml end-to-end across thresholds, sequence boost, default and article-matched escalation routes.
- 48 new tests across four files. Full suite 472 / 472 pass.
- COMPLIANCE.md gains a Policy artifact review subsection under Article 14.
- README.md surfaces the new CLI in "Where things live".

Backwards-compatible. Pure addition. Closes the policy-as-code expressiveness gap from a CI / compliance-review angle without extending the schema or adding a new DSL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
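The "capture evaluation errors as failed cases rather than raising" behavior and the `vaara policy test` exit codes described in the commit message can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: `CaseOutcome`, `run_cases`, and `exit_code_for_test` are stand-in names, not the vaara API, and the exit code 0 on full success is an assumption (the commit message only states the 1 and 2 cases).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CaseOutcome:
    """Hypothetical result record; not the actual vaara type."""
    name: str
    passed: bool
    error: Optional[str] = None


def run_cases(cases: dict[str, Callable[[], bool]]) -> list[CaseOutcome]:
    """Run each case; an exception becomes a failed outcome instead of propagating."""
    outcomes = []
    for name, check in cases.items():
        try:
            outcomes.append(CaseOutcome(name, passed=bool(check())))
        except Exception as exc:  # captured and recorded, not re-raised
            outcomes.append(CaseOutcome(name, passed=False, error=str(exc)))
    return outcomes


def exit_code_for_test(policy_parsed: bool, outcomes: list[CaseOutcome]) -> int:
    """Map a test run to an exit code: 2 = policy parse failure, 1 = any failed case."""
    if not policy_parsed:
        return 2
    if any(not o.passed for o in outcomes):
        return 1
    return 0  # assumption: success exits with 0
```

Recording the error on the outcome keeps one broken case from hiding the results of the remaining cases, which is what a CI report needs.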
Bumps version from 0.8.0 to 0.9.0 and moves the [Unreleased] CHANGELOG block to [0.9.0] - 2026-05-14. No code changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
📝 Walkthrough
This PR adds a comprehensive policy validation and test framework to Vaara. It introduces structured validation reports for semantic policy checks, a Conftest-style test evaluator that runs test cases against policies, YAML/JSON loaders for both workflows, CLI commands for validate/test with JSON output support, and public API exports making these new capabilities accessible as first-class policy review surfaces.
Changes
Policy validation and test framework
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_policy_test_cases_io.py (1)
61-94: ⚡ Quick win
Add regressions for one-line inline payload and non-numeric `risk_score`.
These two cases would lock in the parser/loader contracts and prevent future regressions:
- `load_test_cases('{"cases":[...]}')` as inline one-line JSON.
- `parse_cases(... risk_score: "not-a-number")` raising `PolicyError` (not raw `ValueError`).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_policy_test_cases_io.py` around lines 61 - 94, Add two regression tests: one that verifies load_test_cases can parse a one-line inline JSON payload (e.g., create a tmp_path file containing '{"cases":[{"name":"c1","action_class":"tx.sign","risk_score":0.3,"expect":{"verdict":"allow"}}]}' and assert load_test_cases(...)[0].name == "c1"), and another that ensures a non-numeric risk_score triggers a PolicyError (create a cases file or object with risk_score: "not-a-number" and assert that load_test_cases(...) or parse_cases(...) raises PolicyError rather than ValueError). Reference load_test_cases and PolicyError (and parse_cases if used) when adding tests so the parser/loader contract is enforced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/vaara/policy/test_cases_io.py`:
- Around line 64-70: The function load_test_cases currently treats only strings
containing "\n" as inline content, which misclassifies single-line JSON/YAML
payloads as file paths; update the initial conditional in load_test_cases to
detect inline raw string input by trimming leading whitespace and checking for
JSON/YAML indicators (e.g., startswith('{') or startswith('[') or
startswith('---') or a YAML key pattern) in addition to the existing "\n" check,
then call _parse_text(parse_input) and parse_cases as before; modify the branch
that constructs Path only for true filesystem paths and preserve prefer_yaml
detection when a path suffix is present so one-line payloads like
'{"cases":[...]}' are parsed as inline content rather than causing
FileNotFoundError.
---
Nitpick comments:
In `@tests/test_policy_test_cases_io.py`:
- Around line 61-94: Add two regression tests: one that verifies load_test_cases
can parse a one-line inline JSON payload (e.g., create a tmp_path file
containing
'{"cases":[{"name":"c1","action_class":"tx.sign","risk_score":0.3,"expect":{"verdict":"allow"}}]}'
and assert load_test_cases(...)[0].name == "c1"), and another that ensures a
non-numeric risk_score triggers a PolicyError (create a cases file or object
with risk_score: "not-a-number" and assert that load_test_cases(...) or
parse_cases(...) raises PolicyError rather than ValueError). Reference
load_test_cases and PolicyError (and parse_cases if used) when adding tests so
the parser/loader contract is enforced.
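The inline-detection heuristic suggested in the inline comment above (trim leading whitespace, then look for JSON/YAML indicators) could look roughly like the sketch below. `looks_inline` is a hypothetical helper name, and the regular expression is only one possible reading of "a YAML key pattern"; neither comes from the vaara code base.

```python
import re

# Assumed reading of "a YAML key pattern": identifier, colon, then whitespace or end.
_YAML_KEY = re.compile(r"[A-Za-z_][\w-]*:(\s|$)")


def looks_inline(source: str) -> bool:
    """Hypothetical helper: guess whether a string is raw JSON/YAML rather than a path."""
    if "\n" in source:
        return True  # the existing multi-line check
    stripped = source.lstrip()
    if stripped.startswith(("{", "[", "---")):
        return True  # JSON object/array or YAML document marker
    return _YAML_KEY.match(stripped) is not None
```

A heuristic like this can misclassify unusual paths (for example a relative path that starts with `[`), so checking whether the string names an existing file is a reasonable alternative or complement.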
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f323a67d-7b16-4ac1-b357-a380592102fa
📒 Files selected for processing (15)
- CHANGELOG.md
- COMPLIANCE.md
- README.md
- examples/policies/test_cases.yaml
- pyproject.toml
- src/vaara/__init__.py
- src/vaara/cli.py
- src/vaara/policy/__init__.py
- src/vaara/policy/test_cases.py
- src/vaara/policy/test_cases_io.py
- src/vaara/policy/validate.py
- tests/test_policy_cli.py
- tests/test_policy_test_cases.py
- tests/test_policy_test_cases_io.py
- tests/test_policy_validate.py

    try:
        action_class = entry["action_class"]
        risk_score = float(entry["risk_score"])
    except KeyError as e:
        raise PolicyError(
            f"cases[{i}] ({name}): missing required field {e.args[0]!r}"
        ) from None
    matched = tuple(entry.get("matched_sequences") or ())
    expect = entry.get("expect") or {}

Validate and normalize case field types before constructing PolicyTestCase.
Line 41 can raise ValueError/TypeError that bypasses PolicyError wrapping, and Line 46 can turn a string into per-character sequence names ("abc" → ("a","b","c")). Both lead to inconsistent parser behavior.
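Both failure modes named in this comment are plain Python behavior and can be reproduced without vaara. The sketch below is self-contained: `PolicyError` is a local stand-in for the real exception class, and `normalize_case_fields` is a hypothetical helper written only to illustrate the kind of normalization being asked for, not the project's actual parser.

```python
class PolicyError(Exception):
    """Stand-in for vaara's PolicyError so the sketch runs on its own."""


def normalize_case_fields(entry: dict, index: int) -> tuple[str, float, tuple[str, ...]]:
    """Hypothetical helper: validate one raw case mapping and return typed fields."""
    label = f"cases[{index}]"
    for field in ("action_class", "risk_score"):
        if field not in entry:
            raise PolicyError(f"{label}: missing required field {field!r}")

    action_class = entry["action_class"]
    if not isinstance(action_class, str) or not action_class:
        raise PolicyError(f"{label}: 'action_class' must be a non-empty string")

    try:
        risk_score = float(entry["risk_score"])
    except (TypeError, ValueError):
        # Re-wrap so callers only ever see PolicyError from the parser.
        raise PolicyError(f"{label}: 'risk_score' must be numeric") from None

    raw = entry.get("matched_sequences") or ()
    if isinstance(raw, str) or not isinstance(raw, (list, tuple)):
        raise PolicyError(f"{label}: 'matched_sequences' must be a list of strings")
    if not all(isinstance(s, str) for s in raw):
        raise PolicyError(f"{label}: 'matched_sequences' must contain only strings")
    return action_class, risk_score, tuple(raw)


# The two failure modes from the comment, reproduced with builtins only:
assert tuple("abc") == ("a", "b", "c")  # a bare string splits into characters
try:
    float("not-a-number")
except ValueError:
    pass  # this is the exception that escapes a KeyError-only handler
```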
Proposed fix

    - try:
    -     action_class = entry["action_class"]
    -     risk_score = float(entry["risk_score"])
    - except KeyError as e:
    + try:
    +     action_class = entry["action_class"]
    +     risk_score_raw = entry["risk_score"]
    + except KeyError as e:
          raise PolicyError(
              f"cases[{i}] ({name}): missing required field {e.args[0]!r}"
          ) from None
    - matched = tuple(entry.get("matched_sequences") or ())
    + if not isinstance(action_class, str) or not action_class:
    +     raise PolicyError(f"cases[{i}] ({name}): 'action_class' must be a non-empty string")
    + try:
    +     risk_score = float(risk_score_raw)
    + except (TypeError, ValueError):
    +     raise PolicyError(f"cases[{i}] ({name}): 'risk_score' must be numeric") from None
    +
    + raw_matched = entry.get("matched_sequences") or ()
    + if isinstance(raw_matched, str) or not isinstance(raw_matched, (list, tuple)):
    +     raise PolicyError(
    +         f"cases[{i}] ({name}): 'matched_sequences' must be a list of strings"
    +     )
    + if not all(isinstance(s, str) for s in raw_matched):
    +     raise PolicyError(
    +         f"cases[{i}] ({name}): 'matched_sequences' must contain only strings"
    +     )
    + matched = tuple(raw_matched)

---

    def load_test_cases(source: Union[str, Path]) -> list[PolicyTestCase]:
        if isinstance(source, str) and "\n" in source:
            return parse_cases(_parse_text(source))
        path = Path(source) if not isinstance(source, Path) else source
        text = path.read_text(encoding="utf-8")
        prefer_yaml = path.suffix.lower() in {".yaml", ".yml"}
        return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))

Single-line inline payloads are incorrectly treated as file paths.
Line 65 only treats strings with \n as inline content. A one-line JSON/YAML payload (e.g., '{"cases":[...]}') goes down the path branch and can fail with FileNotFoundError, which breaks the documented “raw string input” behavior.
Proposed fix

      def load_test_cases(source: Union[str, Path]) -> list[PolicyTestCase]:
    -     if isinstance(source, str) and "\n" in source:
    -         return parse_cases(_parse_text(source))
    -     path = Path(source) if not isinstance(source, Path) else source
    -     text = path.read_text(encoding="utf-8")
    -     prefer_yaml = path.suffix.lower() in {".yaml", ".yml"}
    -     return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
    +     if isinstance(source, Path):
    +         path = source
    +         text = path.read_text(encoding="utf-8")
    +         prefer_yaml = path.suffix.lower() in {".yaml", ".yml"}
    +         return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
    +
    +     if "\n" in source:
    +         return parse_cases(_parse_text(source))
    +
    +     candidate = Path(source)
    +     if candidate.is_file():
    +         text = candidate.read_text(encoding="utf-8")
    +         prefer_yaml = candidate.suffix.lower() in {".yaml", ".yml"}
    +         return parse_cases(_parse_text(text, prefer_yaml=prefer_yaml))
    +
    +     return parse_cases(_parse_text(source))

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vaara/policy/test_cases_io.py` around lines 64 - 70, The function
load_test_cases currently treats only strings containing "\n" as inline content,
which misclassifies single-line JSON/YAML payloads as file paths; update the
initial conditional in load_test_cases to detect inline raw string input by
trimming leading whitespace and checking for JSON/YAML indicators (e.g.,
startswith('{') or startswith('[') or startswith('---') or a YAML key pattern)
in addition to the existing "\n" check, then call _parse_text(parse_input) and
parse_cases as before; modify the branch that constructs Path only for true
filesystem paths and preserve prefer_yaml detection when a path suffix is
present so one-line payloads like '{"cases":[...]}' are parsed as inline content
rather than causing FileNotFoundError.
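The source-resolution order discussed in this thread (explicit `Path` first, multi-line text second, existing file third, inline text last) can be exercised in isolation. The sketch below is self-contained and deliberately stops at returning the document text: `read_cases_source` is a hypothetical name, and `json.loads` stands in for the project's `_parse_text` / `parse_cases` pair, which is not reproduced here.

```python
import json
from pathlib import Path
from typing import Union


def read_cases_source(source: Union[str, Path]) -> str:
    """Hypothetical helper: return document text from a Path, a path string, or inline text."""
    if isinstance(source, Path):
        # An explicit Path is always a file; a missing file should fail loudly.
        return source.read_text(encoding="utf-8")
    if "\n" in source:
        return source  # multi-line strings are inline content
    try:
        is_file = Path(source).is_file()
    except (OSError, ValueError):
        # Strings that cannot be valid paths (too long, embedded NUL) are inline text.
        is_file = False
    if is_file:
        return Path(source).read_text(encoding="utf-8")
    return source  # one-line payload such as '{"cases": []}'


# A one-line JSON payload is now parsed as content instead of being opened as a path.
assert json.loads(read_cases_source('{"cases": []}')) == {"cases": []}
```

One trade-off of this ordering is that a mistyped path string is handed to the parser as inline text, so the caller sees a parse error rather than `FileNotFoundError`; passing a `Path` object keeps the stricter behavior.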
Summary
- `vaara.policy.validate` (semantic checks → `ValidationReport`) and `vaara.policy.test_cases` (Conftest-style evaluator + cases-file runner).
- `vaara policy validate POLICY [--json]` and `vaara policy test POLICY --cases CASES [--json]`.

Closes the policy-as-code expressiveness gap from a CI / compliance-review angle: the YAML / JSON policy artifact is now reviewable and testable independently from the agent code it governs.
Test plan
- `pytest -q` → 472 / 472
- `ruff check .` clean
- `mypy src/vaara` clean
- `vaara policy validate examples/policies/full.yaml`
- `vaara policy test examples/policies/full.yaml --cases examples/policies/test_cases.yaml`

Summary by CodeRabbit
New Features
- `vaara policy validate` and `vaara policy test`

Documentation
Tests
Chores