diff --git a/CHANGELOG.md b/CHANGELOG.md
index e45d3b0c..8eef6651 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,57 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht
 
 ## [Unreleased]
 
+## [0.10.0] - 2026-05-16
+
+**Theme: Vaara as the kernel others build around.** v0.10.0 ships the
+network-callable surface, the auditor-facing evidence artefact, and the
+offline-verifiable receipt pair. Each of the three pieces is additive
+and backward-compatible; together they reposition Vaara from a Python
+library to a runtime kernel that control planes, audit consumers, and
+orchestration frameworks build on. The HTTP contract at
+`docs/openapi.yaml` is versioned `/v1/` independently of the project
+version, following the OPA pattern.
+
+### Added
+- **HTTP API reference server (`vaara[server]` extra).** Exposes the
+  conformal scorer and hash-chained audit trail over HTTP per the
+  contract in `docs/openapi.yaml`. Endpoints: `POST /v1/score`,
+  `POST /v1/score/outcome`, `POST /v1/audit/events`,
+  `GET /v1/audit/actions/{action_id}/chain`, `POST /v1/audit/verify`,
+  `GET /v1/server`, `GET /v1/health`. The spec is authoritative; the
+  reference server in `src/vaara/server/` is a FastAPI implementation
+  suitable for local development and modest production loads.
+- **`vaara serve`** CLI subcommand.
+- **OpenAPI 3.1 contract at `docs/openapi.yaml`.** Stable v1 surface,
+  intended as the integration point for control planes, orchestration
+  frameworks, and audit consumers. Vaara defines the interface; the
+  vendors call it.
+- 11 new HTTP server tests (`tests/test_server.py`).
+- **Auditor-facing evidence report rendering.** New module
+  `vaara.compliance.render` with `render_markdown`, `render_json`, and
+  `render_narrative` for the `ConformityReport` produced by
+  `ComplianceEngine.assess`. Markdown output has per-domain article
+  tables, per-article detail sections, evidence status badges,
+  audit-chain integrity flagging, and a deployer-owns-the-decision
+  disclaimer suitable for shipping to a regulator or attaching to an
+  internal conformity submission.
+- **`vaara compliance report --db PATH --format md|json|narrative
+  [--out FILE]`** CLI subcommand. Loads an audit SQLite DB, runs
+  `ComplianceEngine.assess`, and renders to the chosen format.
+- 5 new compliance-render tests (`tests/test_compliance_render.py`).
+- **Article 12 commit-prove receipt pair.** New module
+  `vaara.audit.receipts` derives an offline-verifiable receipt from the
+  existing audit chain: a `commit_hash` covering the gate-time decision
+  (action_id, decision, risk_score, thresholds, decided_at) and an
+  `outcome_hash` covering the post-execution outcome and embedding the
+  commit_hash. Plain SHA-256 over canonical JSON; no external
+  cryptography library is required. Verification needs only `hashlib`,
+  enabling per-action handoff to auditors without sharing the full chain
+  or key material.
+- **`vaara trail receipt --db PATH --action-id ID [--out FILE]`** CLI
+  subcommand. Extracts and verifies the receipt pair, prints JSON.
+- 11 new receipt tests (`tests/test_receipts.py`).
+
 ## [0.9.0] - 2026-05-15
 
 **Theme: policy artifact validate + test framework.** v0.9.0 ships the
diff --git a/README.md b/README.md
index ad97d6e9..f24dc1df 100644
--- a/README.md
+++ b/README.md
@@ -55,6 +55,23 @@ else:
 
 `report_outcome` closes the loop. MWU reweights signals based on which ones predicted the outcome.
 
+## HTTP API
+
+The same scorer and audit trail are available over HTTP for non-Python agents and for control planes that prefer a network boundary. Install with the `server` extra:
+
+```
+pip install 'vaara[server]'
+vaara serve --host 0.0.0.0 --port 8000
+```
+
+```
+curl -sX POST http://localhost:8000/v1/score \
+  -H 'content-type: application/json' \
+  -d '{"tool_name":"tx.transfer","agent_id":"agent-007","base_risk_score":0.5}'
+```
+
+The contract is in [docs/openapi.yaml](docs/openapi.yaml). Vaara defines the interface; control-plane and orchestration vendors call it. Integration recipes for adopters live under `examples/recipes/`.
+
 ## Where things live
 
 - [docs/formal_specification.md](docs/formal_specification.md): math. MWU regret bound O(sqrt(T log N)), conformal coverage guarantees, security properties.
diff --git a/docs/openapi.yaml b/docs/openapi.yaml
new file mode 100644
index 00000000..4b16aa63
--- /dev/null
+++ b/docs/openapi.yaml
@@ -0,0 +1,429 @@
+openapi: 3.1.0
+info:
+  title: Vaara HTTP API
+  version: "1.0.0"
+  summary: Conformal-scoring risk evaluation and hash-chained audit emission for AI agent actions.
+  description: |
+    The Vaara HTTP API exposes the Vaara runtime kernel as a network surface.
+    Any agent runtime, control plane, or evaluation framework can call the
+    `/v1/score` endpoint to obtain a calibrated risk assessment for an action,
+    and `/v1/audit/*` endpoints to append events to a hash-chained audit trail
+    aligned with EU AI Act Articles 12 (record-keeping), 14 (human oversight),
+    and 17 (quality management).
+
+    The interface is implementation-agnostic. A reference server ships with
+    `pip install 'vaara[server]'`, but the spec is authoritative; alternative
+    servers in any language can implement it without changing the contract.
+
+    Versioning: paths are prefixed with `/v1/`. Breaking changes increment
+    the path version. Additive changes do not.
+  license:
+    name: Apache-2.0
+    url: https://www.apache.org/licenses/LICENSE-2.0
+  contact:
+    name: Vaara
+    url: https://vaara.io
+servers:
+  - url: http://localhost:8000
+    description: Local reference server (default)
+tags:
+  - { name: score, description: Risk evaluation }
+  - { name: audit, description: Hash-chained audit trail }
+  - { name: server, description: Server identity and liveness }
+
+paths:
+  /v1/score:
+    post:
+      tags: [score]
+      summary: Score an action.
+      description: |
+        Submit an action context and receive a calibrated risk assessment with
+        an `allow / escalate / deny` decision. The scorer is stateful: it
+        maintains conformal calibration, MWU expert weights, and per-agent
+        sequence-pattern history across requests.
+      operationId: score
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema: { $ref: "#/components/schemas/ScoreRequest" }
+      responses:
+        "200":
+          description: Risk assessment.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/ScoreResponse" }
+        "400":
+          description: Malformed request.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/Error" }
+        "503":
+          description: Scorer unavailable.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/Error" }
+
+  /v1/score/outcome:
+    post:
+      tags: [score]
+      summary: Report an outcome for online learning.
+      description: |
+        Report the observed outcome of a previously-scored action. Outcomes
+        feed the MWU expert system and conformal calibration. An action's
+        outcome can be reported once; subsequent reports for the same
+        `action_id` are no-ops.
+      operationId: reportOutcome
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema: { $ref: "#/components/schemas/OutcomeRequest" }
+      responses:
+        "204": { description: Outcome recorded. }
+        "400":
+          description: Malformed request.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/Error" }
+        "404":
+          description: Unknown action_id.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/Error" }
+
+  /v1/audit/events:
+    post:
+      tags: [audit]
+      summary: Append an event to the audit chain.
+      description: |
+        Append a typed event to the hash-chained audit trail. The server
+        returns the assigned `event_id` and `chain_position`. The event is
+        cryptographically linked to the previous event; tampering invalidates
+        the chain past the modified point.
+      operationId: appendAuditEvent
+      requestBody:
+        required: true
+        content:
+          application/json:
+            schema: { $ref: "#/components/schemas/AuditEventRequest" }
+      responses:
+        "201":
+          description: Event appended.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/AuditEventResponse" }
+        "400":
+          description: Malformed event.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/Error" }
+
+  /v1/audit/actions/{action_id}/chain:
+    get:
+      tags: [audit]
+      summary: Read the audit chain for an action.
+      description: |
+        Return every event recorded for a given `action_id`, in order. Each
+        event carries its hash linkage so an external auditor can verify
+        integrity offline.
+      operationId: readActionChain
+      parameters:
+        - name: action_id
+          in: path
+          required: true
+          schema: { type: string }
+      responses:
+        "200":
+          description: Chain segment for this action.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/AuditChain" }
+        "404":
+          description: Unknown action_id.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/Error" }
+
+  /v1/audit/verify:
+    post:
+      tags: [audit]
+      summary: Verify the audit chain.
+      description: |
+        Verify hash-chain integrity over a range of events. If `from_event_id`
+        and `to_event_id` are omitted, the full chain is verified. Returns the
+        first broken link if any, or a clean verification report.
+      operationId: verifyAuditChain
+      requestBody:
+        required: false
+        content:
+          application/json:
+            schema: { $ref: "#/components/schemas/VerifyRequest" }
+      responses:
+        "200":
+          description: Verification result.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/VerifyResponse" }
+
+  /v1/server:
+    get:
+      tags: [server]
+      summary: Server identity and capabilities.
+      operationId: serverIdentity
+      responses:
+        "200":
+          description: Server info.
+          content:
+            application/json:
+              schema: { $ref: "#/components/schemas/ServerInfo" }
+
+  /v1/health:
+    get:
+      tags: [server]
+      summary: Liveness probe.
+      operationId: health
+      responses:
+        "200":
+          description: Server is alive.
+          content:
+            application/json:
+              schema:
+                type: object
+                required: [status]
+                properties:
+                  status: { type: string, enum: [ok] }
+
+components:
+  schemas:
+    ScoreRequest:
+      type: object
+      required: [tool_name, agent_id]
+      properties:
+        tool_name:
+          type: string
+          maxLength: 512
+          example: "tx.transfer"
+        agent_id:
+          type: string
+          maxLength: 256
+          example: "agent-007"
+        action_type:
+          type: string
+          description: Optional category hint (transaction, data, infra, etc.).
+        parameters:
+          type: object
+          description: Action parameters. Used for signal extraction.
+          additionalProperties: true
+        agent_confidence:
+          type: number
+          format: float
+          minimum: 0
+          maximum: 1
+        base_risk_score:
+          type: number
+          format: float
+          minimum: 0
+          maximum: 1
+        reversibility:
+          type: string
+          enum: [reversible, partially_reversible, irreversible]
+        blast_radius:
+          type: string
+          enum: [local, account, organization, global]
+        session_id:
+          type: string
+          maxLength: 256
+        parent_action_id:
+          type: string
+          maxLength: 128
+        context:
+          type: object
+          description: Arbitrary additional context. Not used by the core scorer.
+          additionalProperties: true
+
+    ScoreResponse:
+      type: object
+      required: [action_id, decision, risk, thresholds, evaluation_ms]
+      properties:
+        action_id:
+          type: string
+          description: Server-assigned identifier. Use with /v1/score/outcome and the audit endpoints.
+        decision:
+          type: string
+          enum: [allow, escalate, deny]
+        risk:
+          type: object
+          required: [point, lower, upper, alpha]
+          properties:
+            point: { type: number, format: float, minimum: 0, maximum: 1 }
+            lower: { type: number, format: float, minimum: 0, maximum: 1 }
+            upper:
+              type: number
+              format: float
+              minimum: 0
+              maximum: 1
+              description: Upper bound. The decision uses this conservatively.
+            alpha:
+              type: number
+              format: float
+              minimum: 0
+              maximum: 1
+              description: Effective miscoverage rate for this assessment.
+            bucket:
+              type: [string, "null"]
+              description: Mondrian bucket category, or null for marginal.
+        signals:
+          type: object
+          description: Per-expert signal contributions, name to value in [0,1].
+          additionalProperties: { type: number, format: float }
+        mwu_weights:
+          type: object
+          description: Current MWU expert weights snapshot.
+          additionalProperties: { type: number, format: float }
+        thresholds:
+          type: object
+          required: [allow, deny]
+          properties:
+            allow: { type: number, format: float }
+            deny: { type: number, format: float }
+        sequence_risk: { type: number, format: float }
+        calibration_size: { type: integer, minimum: 0 }
+        evaluation_ms: { type: number, format: float }
+        explanation: { type: string }
+
+    OutcomeRequest:
+      type: object
+      required: [action_id, outcome_severity]
+      properties:
+        action_id: { type: string }
+        outcome_severity:
+          type: number
+          format: float
+          minimum: 0
+          maximum: 1
+          description: 0 = benign, 1 = catastrophic. Drives the MWU weight update.
+        notes: { type: string }
+
+    AuditEventRequest:
+      type: object
+      required: [event_type, action_id]
+      properties:
+        event_type:
+          type: string
+          enum:
+            - action_requested
+            - risk_scored
+            - decision_made
+            - action_executed
+            - action_blocked
+            - escalation_sent
+            - escalation_resolved
+            - outcome_recorded
+            - policy_override
+        action_id: { type: string }
+        agent_id: { type: string }
+        tool_name: { type: string }
+        payload:
+          type: object
+          description: Event-type-specific payload.
+          additionalProperties: true
+
+    AuditEventResponse:
+      type: object
+      required: [event_id, chain_position, event_hash, previous_hash, timestamp]
+      properties:
+        event_id: { type: string }
+        chain_position: { type: integer, minimum: 0 }
+        event_hash:
+          type: string
+          description: SHA-256 of this event including previous_hash.
+          pattern: "^[0-9a-f]{64}$"
+        previous_hash:
+          type: string
+          pattern: "^[0-9a-f]{64}$"
+        timestamp:
+          type: string
+          format: date-time
+
+    AuditChain:
+      type: object
+      required: [action_id, events]
+      properties:
+        action_id: { type: string }
+        events:
+          type: array
+          items:
+            type: object
+            required: [event_id, event_type, chain_position, event_hash, previous_hash, timestamp]
+            properties:
+              event_id: { type: string }
+              event_type: { type: string }
+              chain_position: { type: integer }
+              event_hash: { type: string }
+              previous_hash: { type: string }
+              timestamp: { type: string, format: date-time }
+              payload:
+                type: object
+                additionalProperties: true
+
+    VerifyRequest:
+      type: object
+      properties:
+        from_event_id:
+          type: string
+          description: Inclusive start. Omit for chain start.
+        to_event_id:
+          type: string
+          description: Inclusive end. Omit for chain end.
+
+    VerifyResponse:
+      type: object
+      required: [valid, events_checked]
+      properties:
+        valid: { type: boolean }
+        events_checked: { type: integer, minimum: 0 }
+        first_break:
+          type: [object, "null"]
+          description: Present iff valid is false. Identifies the first broken link.
+          properties:
+            event_id: { type: string }
+            chain_position: { type: integer }
+            expected_previous_hash: { type: string }
+            actual_previous_hash: { type: string }
+
+    ServerInfo:
+      type: object
+      required: [name, version, vaara_version, capabilities]
+      properties:
+        name: { type: string, example: "vaara-reference-server" }
+        version: { type: string, example: "1.0.0" }
+        vaara_version: { type: string, example: "0.10.0" }
+        capabilities:
+          type: object
+          properties:
+            score: { type: boolean }
+            audit: { type: boolean }
+            outcome_feedback: { type: boolean }
+        scorer:
+          type: object
+          properties:
+            type: { type: string }
+            calibration_size: { type: integer }
+            threshold_allow: { type: number, format: float }
+            threshold_deny: { type: number, format: float }
+            alpha: { type: number, format: float }
+
+    Error:
+      type: object
+      required: [error]
+      properties:
+        error:
+          type: object
+          required: [code, message]
+          properties:
+            code: { type: string }
+            message: { type: string }
+            details:
+              type: object
+              additionalProperties: true
diff --git a/pyproject.toml b/pyproject.toml
index d7f95f9f..a67ee3cb 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "vaara"
-version = "0.9.0"
+version = "0.10.0"
 description = "Adaptive AI Agent Execution Layer for risk scoring, audit trails, and regulatory compliance"
 requires-python = ">=3.10"
 license = "Apache-2.0"
@@ -42,6 +42,7 @@ dev = [
 export = ["cryptography>=41.0"]
 ml = ["xgboost>=2.0", "scikit-learn>=1.3", "joblib>=1.3", "numpy>=1.24"]
 yaml = ["pyyaml>=6.0"]
+server = ["fastapi>=0.110", "uvicorn>=0.27"]
 
 [project.scripts]
 vaara = "vaara.cli:main"
diff --git a/src/vaara/__init__.py b/src/vaara/__init__.py
index 89d7d4c6..59d33f9e 100644
--- a/src/vaara/__init__.py
+++ b/src/vaara/__init__.py
@@ -6,7 +6,7 @@ oversight.
 """
 
 
-__version__ = "0.9.0"
+__version__ = "0.10.0"
 
 from vaara.pipeline import InterceptionPipeline, InterceptionResult
 
diff --git a/src/vaara/audit/receipts.py b/src/vaara/audit/receipts.py
new file mode 100644
index 00000000..0029f619
--- /dev/null
+++ b/src/vaara/audit/receipts.py
@@ -0,0 +1,245 @@
+"""Article 12 commit-prove receipt pair.
+
+A receipt binds a gate-time commitment to its post-execution outcome
+via two SHA-256 hashes:
+
+- ``commit_hash`` covers (action_id, decision, risk_score, thresholds,
+  decided_at). Proves the runtime committed to a specific decision before
+  the action ran.
+- ``outcome_hash`` covers (action_id, commit_hash, outcome_severity,
+  outcome_payload, recorded_at). Proves a specific outcome belongs to
+  that specific commitment by embedding the commit_hash.
+
+The two hashes form an offline-verifiable chain of accountability for one
+action. The full audit chain still protects integrity in aggregate; this
+is a structured pairing for per-action handoff. Verification needs only
+``hashlib.sha256`` — no key infrastructure, no external crypto libs.
+"""
+
+from __future__ import annotations
+
+import hashlib
+import json
+import math
+from dataclasses import asdict, dataclass, field
+from typing import Any, Optional
+
+from vaara.audit.trail import AuditRecord, AuditTrail, EventType
+
+
+@dataclass(frozen=True)
+class CommitPayload:
+    """Canonical commit-side payload. Pre-action decision."""
+
+    action_id: str
+    decision: str
+    risk_score: float
+    threshold_allow: float
+    threshold_deny: float
+    decided_at: float
+
+    def canonical_json(self) -> str:
+        return _canonical_json(asdict(self))
+
+    def hash(self) -> str:
+        return _sha256_hex(self.canonical_json())
+
+
+@dataclass(frozen=True)
+class OutcomePayload:
+    """Canonical outcome-side payload. Post-execution observation."""
+
+    action_id: str
+    commit_hash: str
+    outcome_severity: float
+    outcome_payload: dict = field(default_factory=dict)
+    recorded_at: float = 0.0
+
+    def canonical_json(self) -> str:
+        return _canonical_json(asdict(self))
+
+    def hash(self) -> str:
+        return _sha256_hex(self.canonical_json())
+
+
+@dataclass(frozen=True)
+class Receipt:
+    """Receipt pair binding a commitment to its outcome."""
+
+    commit: CommitPayload
+    outcome: Optional[OutcomePayload] = None
+
+    @property
+    def commit_hash(self) -> str:
+        return self.commit.hash()
+
+    @property
+    def outcome_hash(self) -> Optional[str]:
+        return self.outcome.hash() if self.outcome is not None else None
+
+    def to_dict(self) -> dict[str, Any]:
+        d: dict[str, Any] = {
+            "version": "1.0",
+            "commit": {
+                "payload": asdict(self.commit),
+                "hash": self.commit_hash,
+            },
+        }
+        if self.outcome is not None:
+            d["outcome"] = {
+                "payload": asdict(self.outcome),
+                "hash": self.outcome_hash,
+            }
+        return d
+
+
+def verify_receipt(receipt: Receipt) -> bool:
+    """Recompute hashes and verify the commit-outcome binding."""
+    if receipt.commit.hash() != receipt.commit_hash:
+        return False
+    if receipt.outcome is None:
+        return True
+    if receipt.outcome.commit_hash != receipt.commit_hash:
+        return False
+    if receipt.outcome.hash() != receipt.outcome_hash:
+        return False
+    return True
+
+
+def verify_receipt_dict(d: dict[str, Any]) -> bool:
+    """Verify a serialized receipt (as produced by ``Receipt.to_dict``)."""
+    try:
+        commit_d = d["commit"]
+        commit = CommitPayload(**commit_d["payload"])
+        if commit.hash() != commit_d["hash"]:
+            return False
+        outcome_d = d.get("outcome")
+        if outcome_d is None:
+            return True
+        outcome = OutcomePayload(**outcome_d["payload"])
+        if outcome.commit_hash != commit_d["hash"]:
+            return False
+        if outcome.hash() != outcome_d["hash"]:
+            return False
+        return True
+    except (KeyError, TypeError, ValueError):
+        return False
+
+
+def extract_receipt(trail: AuditTrail, action_id: str) -> Optional[Receipt]:
+    """Derive a receipt pair from an existing trail.
+
+    Reads the per-action records and reconstructs the commit and (if
+    present) outcome payloads. Returns None if no decision exists for the
+    action.
+    """
+    records = trail._by_action.get(action_id, [])
+    if not records:
+        return None
+    commit = _commit_from_records(records, action_id)
+    if commit is None:
+        return None
+    outcome = _outcome_from_records(records, action_id, commit.hash())
+    return Receipt(commit=commit, outcome=outcome)
+
+
+def _commit_from_records(
+    records: list[AuditRecord], action_id: str,
+) -> Optional[CommitPayload]:
+    decision_record: Optional[AuditRecord] = None
+    risk_record: Optional[AuditRecord] = None
+    for r in records:
+        if r.event_type in (EventType.DECISION_MADE, EventType.ACTION_BLOCKED):
+            decision_record = r
+        if r.event_type == EventType.RISK_SCORED:
+            risk_record = r
+    if decision_record is None:
+        return None
+    data = decision_record.data or {}
+    decision = str(data.get("decision") or "") or _event_to_decision(
+        decision_record.event_type,
+    )
+    risk_score = _coerce_float(data.get("risk_score"))
+    threshold_allow, threshold_deny = _thresholds_from_risk_record(risk_record)
+    if risk_score is None and risk_record is not None:
+        risk_score = _coerce_float((risk_record.data or {}).get("point_estimate"))
+    if risk_score is None:
+        risk_score = 0.0
+    return CommitPayload(
+        action_id=action_id,
+        decision=decision,
+        risk_score=float(risk_score),
+        threshold_allow=threshold_allow,
+        threshold_deny=threshold_deny,
+        decided_at=float(decision_record.timestamp),
+    )
+
+
+def _outcome_from_records(
+    records: list[AuditRecord], action_id: str, commit_hash: str,
+) -> Optional[OutcomePayload]:
+    for r in records:
+        if r.event_type != EventType.OUTCOME_RECORDED:
+            continue
+        data = r.data or {}
+        severity = _coerce_float(data.get("outcome_severity"))
+        if severity is None:
+            severity = _coerce_float(data.get("severity"))
+        if severity is None:
+            severity = 0.0
+        return OutcomePayload(
+            action_id=action_id,
+            commit_hash=commit_hash,
+            outcome_severity=float(severity),
+            outcome_payload={
+                k: v for k, v in data.items()
+                if k not in ("outcome_severity", "severity")
+            },
+            recorded_at=float(r.timestamp),
+        )
+    return None
+
+
+def _thresholds_from_risk_record(
+    risk_record: Optional[AuditRecord],
+) -> tuple[float, float]:
+    if risk_record is None:
+        return 0.4, 0.7
+    data = risk_record.data or {}
+    ta = _coerce_float(data.get("threshold_allow"))
+    td = _coerce_float(data.get("threshold_deny"))
+    return (0.4 if ta is None else ta), (0.7 if td is None else td)
+
+
+def _event_to_decision(event_type: EventType) -> str:
+    return "deny" if event_type == EventType.ACTION_BLOCKED else "allow"
+
+
+def _coerce_float(value: Any) -> Optional[float]:
+    if value is None:
+        return None
+    try:
+        f = float(value)
+    except (TypeError, ValueError):
+        return None
+    if not math.isfinite(f):
+        return None
+    return f
+
+
+def _canonical_json(d: dict[str, Any]) -> str:
+    return json.dumps(d, sort_keys=True, separators=(",", ":"), allow_nan=False)
+
+
+def _sha256_hex(s: str) -> str:
+    return hashlib.sha256(s.encode("utf-8")).hexdigest()
+
+
+__all__ = [
+    "CommitPayload",
+    "OutcomePayload",
+    "Receipt",
+    "extract_receipt",
+    "verify_receipt",
+    "verify_receipt_dict",
+]
diff --git a/src/vaara/cli.py b/src/vaara/cli.py
index cd1e185e..87265636 100644
--- a/src/vaara/cli.py
+++ b/src/vaara/cli.py
@@ -550,6 +550,109 @@ def _cmd_policy_test(args: argparse.Namespace) -> int:
     return 0 if all(r.passed for r in results) else 1
 
 
+def _cmd_trail_receipt(args: argparse.Namespace) -> int:
+    from vaara.audit.receipts import extract_receipt, verify_receipt
+    from vaara.audit.sqlite_backend import SQLiteAuditBackend
+
+    db_path = Path(args.db).expanduser()
+    if not db_path.exists():
+        print(f"audit DB not found: {db_path}", file=sys.stderr)
+        return 2
+
+    backend = SQLiteAuditBackend(str(db_path))
+    try:
+        trail = backend.load_trail()
+    except Exception as exc:
+        print(f"failed to load audit trail: {exc}", file=sys.stderr)
+        return 2
+
+    receipt = extract_receipt(trail, args.action_id)
+    if receipt is None:
+        print(
+            f"no decision record found for action_id {args.action_id!r}",
+            file=sys.stderr,
+        )
+        return 1
+
+    if not verify_receipt(receipt):
+        print(
+            "receipt verification failed — derived hashes do not match payloads",
+            file=sys.stderr,
+        )
+        return 1
+
+    text = json.dumps(receipt.to_dict(), indent=2, sort_keys=False)
+    if args.out:
+        Path(args.out).expanduser().write_text(text, encoding="utf-8")
+    else:
+        print(text)
+    return 0
+
+
+def _cmd_compliance_report(args: argparse.Namespace) -> int:
+    from vaara.audit.sqlite_backend import SQLiteAuditBackend
+    from vaara.compliance.engine import create_default_engine
+    from vaara.compliance.render import (
+        render_json,
+        render_markdown,
+        render_narrative,
+    )
+
+    db_path = Path(args.db).expanduser()
+    if not db_path.exists():
+        print(f"audit DB not found: {db_path}", file=sys.stderr)
+        return 2
+
+    backend = SQLiteAuditBackend(str(db_path))
+    try:
+        trail = backend.load_trail()
+    except Exception as exc:
+        print(f"failed to load audit trail: {exc}", file=sys.stderr)
+        return 2
+
+    engine = create_default_engine()
+    report = engine.assess(
+        trail,
+        system_name=args.system_name,
+        system_version=args.system_version,
+    )
+
+    if args.format == "md":
+        text = render_markdown(report)
+    elif args.format == "json":
+        text = render_json(report)
+    elif args.format == "narrative":
+        text = render_narrative(report)
+    else:
+        print(f"unknown format: {args.format}", file=sys.stderr)
+        return 2
+
+    if args.out:
+        out_path = Path(args.out).expanduser()
+        out_path.write_text(text, encoding="utf-8")
+    else:
+        print(text)
+    return 0
+
+
+def _cmd_serve(args: argparse.Namespace) -> int:
+    try:
+        import uvicorn
+    except ImportError:
+        print(
+            "vaara serve requires the server extra. "
+            "Install with: pip install 'vaara[server]'",
+            file=sys.stderr,
+        )
+        return 2
+
+    from vaara.server import create_app
+
+    app = create_app()
+    uvicorn.run(app, host=args.host, port=args.port, log_level=args.log_level)
+    return 0
+
+
 def build_parser() -> argparse.ArgumentParser:
     p = argparse.ArgumentParser(prog="vaara", description="Vaara AI Agent Execution Layer")
     sub = p.add_subparsers(dest="cmd", required=True)
@@ -625,6 +728,15 @@ def build_parser() -> argparse.ArgumentParser:
     )
     pei.set_defaults(func=_cmd_trail_export_incident)
 
+    prec = tsub.add_parser(
+        "receipt",
+        help="Extract an Article 12 commit-prove receipt pair for an action",
+    )
+    prec.add_argument("--db", required=True, help="Path to the audit SQLite DB")
+    prec.add_argument("--action-id", required=True, help="action_id to extract")
+    prec.add_argument("--out", default=None, help="Write to file (default: stdout)")
+    prec.set_defaults(func=_cmd_trail_receipt)
+
     pp = tsub.add_parser(
         "purge",
         help="Delete audit records older than the retention period (Article 12(2))",
@@ -753,6 +865,50 @@ def build_parser() -> argparse.ArgumentParser:
     )
     ptest.set_defaults(func=_cmd_policy_test)
 
+    pcr = sub.add_parser(
+        "compliance",
+        help="Compliance reporting commands",
+    )
+    csub = pcr.add_subparsers(dest="compliance_cmd", required=True)
+    pcrep = csub.add_parser(
+        "report",
+        help="Assemble and render an article-level evidence report",
+    )
+    pcrep.add_argument(
+        "--db", required=True,
+        help="Path to the audit SQLite DB to read evidence from",
+    )
+    pcrep.add_argument(
+        "--format", choices=["md", "json", "narrative"], default="md",
+        help="Output format (default: md)",
+    )
+    pcrep.add_argument(
+        "--out", default=None,
+        help="Write to file (default: stdout)",
+    )
+    pcrep.add_argument(
+        "--system-name", default="Vaara-governed AI system",
+        help="System name to include in the report header",
+    )
+    pcrep.add_argument(
+        "--system-version", default="unspecified",
+        help="System version to include in the report header",
+    )
+    pcrep.set_defaults(func=_cmd_compliance_report)
+
+    pserve = sub.add_parser(
+        "serve",
+        help="Run the Vaara HTTP API reference server (requires vaara[server])",
+    )
+    pserve.add_argument("--host", default="127.0.0.1", help="Bind host")
+    pserve.add_argument("--port", type=int, default=8000, help="Bind port")
+    pserve.add_argument(
+        "--log-level",
+        default="info",
+        choices=["critical", "error", "warning", "info", "debug", "trace"],
+    )
+    pserve.set_defaults(func=_cmd_serve)
+
     return p
 
 
diff --git a/src/vaara/compliance/render.py b/src/vaara/compliance/render.py
new file mode 100644
index 00000000..d1ea7973
--- /dev/null
+++ b/src/vaara/compliance/render.py
@@ -0,0 +1,180 @@
+"""Auditor-facing renderers for ConformityReport.
+
+Each rendering produces a deployer-shippable artefact:
+
+- ``render_markdown`` — Markdown with per-domain sections, article tables,
+  evidence status badges, and gap/recommendation lists. The canonical
+  human-shipped format; reviewable in a PR, diffable in CI, attachable to
+  a regulator submission as `.md`.
+- ``render_narrative`` — Plain-text narrative (existing
+  `ConformityReport.narrative` property, re-exposed here for symmetry).
+- ``render_json`` — JSON text serialized from the strict-JSON dict
+  returned by `ConformityReport.to_dict`.
+
+PDF export is intentionally NOT in v1: a clean Markdown render can be piped
+through `pandoc` or `weasyprint` by the deployer's own pipeline. Vaara
+defines the article-evidence content; format conversion is downstream.
+"""
+
+from __future__ import annotations
+
+import json
+import time
+
+from vaara.compliance.engine import (
+    ArticleEvidence,
+    ConformityReport,
+    EvidenceStatus,
+)
+
+
+_STATUS_BADGE = {
+    EvidenceStatus.EVIDENCE_SUFFICIENT: "[OK] sufficient",
+    EvidenceStatus.EVIDENCE_PARTIAL: "[!!] partial",
+    EvidenceStatus.EVIDENCE_INSUFFICIENT: "[XX] insufficient",
+    EvidenceStatus.NOT_APPLICABLE: "[--] not applicable",
+}
+
+
+def render_narrative(report: ConformityReport) -> str:
+    """Plain-text narrative — wraps the existing report.narrative property."""
+    return report.narrative
+
+
+def render_json(report: ConformityReport, *, indent: int = 2) -> str:
+    """Strict-JSON serialization."""
+    return json.dumps(report.to_dict(), indent=indent, sort_keys=False)
+
+
+def render_markdown(report: ConformityReport) -> str:
+    """Render the report as Markdown.
+
+    Layout:
+        # Article-level evidence report
+        ## System (name, version, generation time, overall status)
+        ## Audit trail integrity
+        ## Summary
+        ## Critical gaps (if any)
+        ## One section per domain: article table
+        ### One subsection per article: status, evidence, gaps
+    """
+    lines: list[str] = []
+    ts = time.strftime("%Y-%m-%d %H:%M UTC", time.gmtime(report.generated_at))
+
+    lines.append("# Article-level evidence report")
+    lines.append("")
+    lines.append(
+        "> This is an evidence artefact assembled from the Vaara runtime "
+        "audit trail. It is **not** a conformity determination. The "
+        "deployer (and where applicable a Notified Body) owns the "
+        "conformity verdict under the EU AI Act and other applicable law."
+    )
+    lines.append("")
+
+    lines.append("## System")
+    lines.append("")
+    lines.append(f"- **Name:** {report.system_name}")
+    lines.append(f"- **Version:** {report.system_version}")
+    lines.append(f"- **Generated:** {ts}")
+    lines.append(f"- **Overall evidence status:** `{report.overall_status.value}`")
+    lines.append("")
+
+    lines.append("## Audit trail integrity")
+    lines.append("")
+    chain_state = "intact" if report.trail_chain_intact else "**BROKEN**"
+    lines.append(f"- **Trail size:** {report.trail_size} records")
+    lines.append(f"- **Hash chain:** {chain_state}")
+    if not report.trail_chain_intact:
+        lines.append("")
+        lines.append(
+            "> The hash chain is broken. Every article below is reported as "
+            "insufficient until the chain is reconstructed or re-verified."
+        )
+    lines.append("")
+
+    lines.append("## Summary")
+    lines.append("")
+    lines.append(report.summary or "_no summary provided_")
+    lines.append("")
+
+    if report.critical_gaps:
+        lines.append("## Critical gaps")
+        lines.append("")
+        for gap in report.critical_gaps:
+            lines.append(f"- {gap}")
+        lines.append("")
+
+    # Group by domain
+    by_domain: dict[str, list[ArticleEvidence]] = {}
+    for a in report.articles:
+        by_domain.setdefault(a.requirement.domain.value, []).append(a)
+
+    for domain in sorted(by_domain):
+        articles = by_domain[domain]
+        lines.append(f"## {domain.upper()} — article evidence")
+        lines.append("")
+        lines.append("| Article | Title | Status | Strength | Records |")
+        lines.append("|---|---|---|---|---|")
+        for art in articles:
+            status_str = _STATUS_BADGE.get(art.status, art.status.value)
+            lines.append(
+                f"| `{art.requirement.article}` | "
+                f"{art.requirement.title} | "
+                f"{status_str} | "
+                f"{art.strength.value} | "
+                f"{art.evidence_count} |"
+            )
+        lines.append("")
+
+        for art in articles:
+            lines.append(
+                f"### {domain.upper()} {art.requirement.article} — "
+                f"{art.requirement.title}"
+            )
+            lines.append("")
+            lines.append(
+                f"- **Status:** {_STATUS_BADGE.get(art.status, art.status.value)}"
+            )
+            lines.append(f"- **Strength:** `{art.strength.value}`")
+            lines.append(f"- **Evidence records:** {art.evidence_count}")
+            if art.evidence_count > 0:
+                freshest = art.freshest_evidence_age_hours
+                oldest = art.oldest_evidence_age_hours
+                if freshest is not None and freshest != float("inf"):
+                    lines.append(
+                        f"- **Freshest evidence age:** {freshest:.1f} hours"
+                    )
+                if oldest is not None and oldest != float("inf"):
+                    lines.append(
+                        f"- **Oldest evidence age:** {oldest:.1f} hours"
+                    )
+            if art.requirement.description:
+                lines.append("")
+                lines.append(f"> {art.requirement.description}")
+            if art.gaps:
+                lines.append("")
+                lines.append("**Gaps:**")
+                for gap in art.gaps:
+                    lines.append(f"- {gap}")
+            if art.recommendations:
+                lines.append("")
+                lines.append("**Recommendations:**")
+                for rec in art.recommendations:
+                    lines.append(f"- {rec}")
+            if art.sample_record_ids:
+                lines.append("")
+                lines.append("**Sample audit record IDs:**")
+                for rid in art.sample_record_ids[:5]:
+                    lines.append(f"- `{rid}`")
+            lines.append("")
+
+    lines.append("---")
+    lines.append("")
+    lines.append(
+        "_Generated by Vaara. Article-level evidence is collected from the "
+        "runtime audit trail; deployer owns the conformity decision._"
+    )
+    return "\n".join(lines)
+
+
+__all__ = ["render_markdown", "render_narrative", "render_json"]
diff --git a/src/vaara/server/__init__.py b/src/vaara/server/__init__.py
new file mode 100644
index 00000000..902b49c3
--- /dev/null
+++ b/src/vaara/server/__init__.py
@@ -0,0 +1,23 @@
+"""Vaara HTTP API reference server.
+
+Exposes the Vaara scorer and audit trail as a network service following the
+contract in `docs/openapi.yaml`. The spec is authoritative; this module is a
+reference implementation suitable for local development, integration testing,
+and modest production loads. Production deployments with sustained traffic
+should provide their own implementation against the same spec.
+
+Install: ``pip install 'vaara[server]'``.
+
+Run::
+
+    vaara serve --host 0.0.0.0 --port 8000
+
+Or programmatically::
+
+    from vaara.server import create_app
+    app = create_app()
+"""
+
+from vaara.server.app import create_app
+
+__all__ = ["create_app"]
diff --git a/src/vaara/server/app.py b/src/vaara/server/app.py
new file mode 100644
index 00000000..5b7ffdbc
--- /dev/null
+++ b/src/vaara/server/app.py
@@ -0,0 +1,46 @@
+"""FastAPI application factory for the Vaara HTTP API reference server.
+
+The server holds a single in-process `AdaptiveScorer` and `AuditTrail` for
+the lifetime of the process.
Both are stateful: the scorer maintains +conformal calibration and MWU weights across requests, and the audit trail +is a single hash chain. + +State persistence is out of scope for v1. State is in-memory unless the +embedder wires the audit trail to a persistent backend. +""" + +from __future__ import annotations + +from typing import Optional + +from fastapi import FastAPI + +from vaara.audit.trail import AuditTrail +from vaara.scorer.adaptive import AdaptiveScorer +from vaara.server.routes import register +from vaara.server.state import ServerState + + +def create_app( + scorer: Optional[AdaptiveScorer] = None, + audit: Optional[AuditTrail] = None, +) -> FastAPI: + """Build the FastAPI application. + + Args: + scorer: Pre-configured scorer, or None for default `AdaptiveScorer()`. + audit: Pre-configured audit trail, or None for default in-memory. + """ + state = ServerState(scorer=scorer, audit=audit) + app = FastAPI( + title="Vaara HTTP API", + version="1.0.0", + description=( + "Conformal-scoring risk evaluation and hash-chained audit " + "emission. Authoritative spec: docs/openapi.yaml in the vaara " + "repository." 
+ ), + ) + app.state.vaara = state + register(app, state) + return app diff --git a/src/vaara/server/routes.py b/src/vaara/server/routes.py new file mode 100644 index 00000000..22ba0225 --- /dev/null +++ b/src/vaara/server/routes.py @@ -0,0 +1,201 @@ +"""Route handlers for the Vaara HTTP API reference server.""" + +from __future__ import annotations + +import time +import uuid +from datetime import datetime, timezone +from typing import Optional + +from fastapi import FastAPI, HTTPException, status +from fastapi.responses import JSONResponse + +from vaara import __version__ as _vaara_version +from vaara.audit.trail import AuditRecord, EventType +from vaara.server import schemas as S +from vaara.server.state import ServerState + + +_SERVER_NAME = "vaara-reference-server" +_SERVER_VERSION = "1.0.0" + + +def _error(code: str, message: str, http_status: int, **details) -> HTTPException: + body = {"error": {"code": code, "message": message}} + if details: + body["error"]["details"] = details + return HTTPException(status_code=http_status, detail=body) + + +def _iso(ts: float) -> str: + return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat() + + +def register(app: FastAPI, state: ServerState) -> None: + + @app.exception_handler(HTTPException) + async def _http_exc_handler(_request, exc: HTTPException): + if isinstance(exc.detail, dict) and "error" in exc.detail: + return JSONResponse(status_code=exc.status_code, content=exc.detail) + return JSONResponse( + status_code=exc.status_code, + content={"error": {"code": "http_error", "message": str(exc.detail)}}, + ) + + @app.get("/v1/health") + async def health(): + return {"status": "ok"} + + @app.get("/v1/server", response_model=S.ServerInfo) + async def server_info(): + return S.ServerInfo( + name=_SERVER_NAME, + version=_SERVER_VERSION, + vaara_version=_vaara_version, + capabilities=S.Capabilities( + score=True, audit=True, outcome_feedback=True, + ), + scorer=S.ScorerInfo( + type=type(state.scorer).__name__, + 
calibration_size=state.scorer._conformal.calibration_size, + threshold_allow=state.scorer._threshold_allow, + threshold_deny=state.scorer._threshold_deny, + alpha=state.scorer._conformal._alpha, + ), + ) + + @app.post("/v1/score", response_model=S.ScoreResponse) + async def score(req: S.ScoreRequest): + ctx = req.model_dump(exclude_none=True) + try: + decision_dict = state.scorer.evaluate(ctx) + except Exception as exc: + raise _error( + "scorer_error", str(exc), status.HTTP_503_SERVICE_UNAVAILABLE, + ) + + raw = decision_dict.get("raw_result", {}) or {} + lower, upper = (raw.get("conformal_interval") or [0.0, 1.0]) + action_id = str(uuid.uuid4()) + signals = {k: float(v) for k, v in (raw.get("signals") or {}).items()} + state.remember_action( + action_id=action_id, + agent_id=req.agent_id, + tool_name=req.tool_name, + predicted_risk=float(0.5 if raw.get("point_estimate") is None else raw["point_estimate"]), + signals=signals, + ) + + return S.ScoreResponse( + action_id=action_id, + decision=decision_dict.get("action", "escalate"), + risk=S.RiskBlock( + point=0.5 if raw.get("point_estimate") is None else raw["point_estimate"], + lower=lower, + upper=upper, + alpha=raw.get("effective_alpha", 0.10), + bucket=raw.get("bucket_category"), + ), + signals=signals, + mwu_weights={k: float(v) for k, v in state.scorer._mwu.weights.items()}, + thresholds=S.Thresholds( + allow=state.scorer._threshold_allow, + deny=state.scorer._threshold_deny, + ), + sequence_risk=float(raw.get("sequence_risk", 0.0) or 0.0), + calibration_size=int(raw.get("calibration_size", 0) or 0), + evaluation_ms=float(decision_dict.get("evaluation_ms", 0.0) or 0.0), + explanation=decision_dict.get("reason", ""), + ) + + @app.post("/v1/score/outcome", status_code=204) + async def score_outcome(req: S.OutcomeRequest): + info = state.lookup_action(req.action_id) + if info is None: + raise _error( + "unknown_action", f"action_id {req.action_id!r} not found", + status.HTTP_404_NOT_FOUND, + ) + state.scorer.record_outcome( + agent_id=info.agent_id, + tool_name=info.tool_name,
+ predicted_risk=info.predicted_risk, + actual_outcome=req.outcome_severity, + signals=info.signals, + ) + return None + + @app.post( + "/v1/audit/events", + response_model=S.AuditEventResponse, + status_code=201, + ) + async def append_audit_event(req: S.AuditEventRequest): + try: + event_type = EventType(req.event_type) + except ValueError: + raise _error( + "bad_event_type", f"unknown event_type {req.event_type!r}", + status.HTTP_400_BAD_REQUEST, + ) + + record = AuditRecord( + record_id=str(uuid.uuid4()), + action_id=req.action_id, + event_type=event_type, + timestamp=time.time(), + agent_id=req.agent_id or "", + tool_name=req.tool_name or "", + data=req.payload or {}, + regulatory_articles=[], + ) + state.audit._append(record) + return S.AuditEventResponse( + event_id=record.record_id, + chain_position=state.audit.size - 1, + event_hash=record.record_hash, + previous_hash=record.previous_hash, + timestamp=_iso(record.timestamp), + ) + + @app.get( + "/v1/audit/actions/{action_id}/chain", + response_model=S.AuditChain, + ) + async def read_action_chain(action_id: str): + records = state.audit._by_action.get(action_id, []) + if not records: + raise _error( + "unknown_action", f"no audit records for {action_id!r}", + status.HTTP_404_NOT_FOUND, + ) + return S.AuditChain( + action_id=action_id, + events=[ + S.AuditChainEvent( + event_id=r.record_id, + event_type=r.event_type.value, + chain_position=state.audit._records.index(r), + event_hash=r.record_hash, + previous_hash=r.previous_hash, + timestamp=_iso(r.timestamp), + payload=r.data or {}, + ) + for r in records + ], + ) + + @app.post("/v1/audit/verify", response_model=S.VerifyResponse) + async def verify_audit_chain(_req: Optional[S.VerifyRequest] = None): + # v1: full-chain verify only. Ranged verify is in the spec but + # not yet implemented server-side. 
+ problem = state.audit.verify_chain() + if problem is None: + return S.VerifyResponse( + valid=True, events_checked=state.audit.size, + ) + return S.VerifyResponse( + valid=False, + events_checked=state.audit.size, + first_break=None, + ) diff --git a/src/vaara/server/schemas.py b/src/vaara/server/schemas.py new file mode 100644 index 00000000..9f1fe079 --- /dev/null +++ b/src/vaara/server/schemas.py @@ -0,0 +1,163 @@ +"""Pydantic models matching docs/openapi.yaml v1 contract. + +These models are the wire format. The internal `AdaptiveScorer.evaluate(ctx)` +takes and returns dicts; this module is the bridge between the spec and the +internal types. +""" + +from __future__ import annotations + +from typing import Any, Literal, Optional + +from pydantic import BaseModel, Field, ConfigDict + + +_Reversibility = Literal["reversible", "partially_reversible", "irreversible"] +_BlastRadius = Literal["local", "account", "organization", "global"] +_Decision = Literal["allow", "escalate", "deny"] +_EventType = Literal[ + "action_requested", + "risk_scored", + "decision_made", + "action_executed", + "action_blocked", + "escalation_sent", + "escalation_resolved", + "outcome_recorded", + "policy_override", +] + + +class ScoreRequest(BaseModel): + model_config = ConfigDict(extra="forbid") + + tool_name: str = Field(max_length=512) + agent_id: str = Field(max_length=256) + action_type: Optional[str] = None + parameters: dict[str, Any] = Field(default_factory=dict) + agent_confidence: Optional[float] = Field(default=None, ge=0, le=1) + base_risk_score: Optional[float] = Field(default=None, ge=0, le=1) + reversibility: Optional[_Reversibility] = None + blast_radius: Optional[_BlastRadius] = None + session_id: Optional[str] = Field(default=None, max_length=256) + parent_action_id: Optional[str] = Field(default=None, max_length=128) + context: dict[str, Any] = Field(default_factory=dict) + + +class RiskBlock(BaseModel): + point: float = Field(ge=0, le=1) + lower: float = Field(ge=0, 
le=1) + upper: float = Field(ge=0, le=1) + alpha: float = Field(ge=0, le=1) + bucket: Optional[str] = None + + +class Thresholds(BaseModel): + allow: float + deny: float + + +class ScoreResponse(BaseModel): + action_id: str + decision: _Decision + risk: RiskBlock + signals: dict[str, float] = Field(default_factory=dict) + mwu_weights: dict[str, float] = Field(default_factory=dict) + thresholds: Thresholds + sequence_risk: float = 0.0 + calibration_size: int = 0 + evaluation_ms: float = 0.0 + explanation: str = "" + + +class OutcomeRequest(BaseModel): + model_config = ConfigDict(extra="forbid") + + action_id: str + outcome_severity: float = Field(ge=0, le=1) + notes: Optional[str] = None + + +class AuditEventRequest(BaseModel): + model_config = ConfigDict(extra="forbid") + + event_type: _EventType + action_id: str + agent_id: Optional[str] = None + tool_name: Optional[str] = None + payload: dict[str, Any] = Field(default_factory=dict) + + +class AuditEventResponse(BaseModel): + event_id: str + chain_position: int + event_hash: str + previous_hash: str + timestamp: str + + +class AuditChainEvent(BaseModel): + event_id: str + event_type: str + chain_position: int + event_hash: str + previous_hash: str + timestamp: str + payload: dict[str, Any] = Field(default_factory=dict) + + +class AuditChain(BaseModel): + action_id: str + events: list[AuditChainEvent] + + +class VerifyRequest(BaseModel): + model_config = ConfigDict(extra="forbid") + + from_event_id: Optional[str] = None + to_event_id: Optional[str] = None + + +class FirstBreak(BaseModel): + event_id: str + chain_position: int + expected_previous_hash: str + actual_previous_hash: str + + +class VerifyResponse(BaseModel): + valid: bool + events_checked: int + first_break: Optional[FirstBreak] = None + + +class ScorerInfo(BaseModel): + type: str + calibration_size: int + threshold_allow: float + threshold_deny: float + alpha: float + + +class Capabilities(BaseModel): + score: bool + audit: bool + outcome_feedback: 
bool + + +class ServerInfo(BaseModel): + name: str + version: str + vaara_version: str + capabilities: Capabilities + scorer: Optional[ScorerInfo] = None + + +class ErrorBody(BaseModel): + code: str + message: str + details: dict[str, Any] = Field(default_factory=dict) + + +class ErrorResponse(BaseModel): + error: ErrorBody diff --git a/src/vaara/server/state.py b/src/vaara/server/state.py new file mode 100644 index 00000000..58d15b60 --- /dev/null +++ b/src/vaara/server/state.py @@ -0,0 +1,54 @@ +"""Server state container — scorer + audit trail singletons.""" + +from __future__ import annotations + +import threading +from dataclasses import dataclass, field +from typing import Optional + +from vaara.audit.trail import AuditTrail +from vaara.scorer.adaptive import AdaptiveScorer + + +@dataclass +class _ActionInfo: + agent_id: str + tool_name: str + predicted_risk: float + signals: dict[str, float] = field(default_factory=dict) + + +class ServerState: + """Holds scorer + audit-trail singletons. Lifetime = process lifetime.""" + + def __init__( + self, + scorer: Optional[AdaptiveScorer] = None, + audit: Optional[AuditTrail] = None, + ) -> None: + self.scorer = scorer or AdaptiveScorer() + self.audit = audit or AuditTrail() + self._lock = threading.Lock() + # action_id → info captured at score time so outcome reports can + # feed the MWU update without the client having to resend context. 
+ self._actions: dict[str, _ActionInfo] = {} + + def remember_action( + self, + action_id: str, + agent_id: str, + tool_name: str, + predicted_risk: float, + signals: dict[str, float], + ) -> None: + with self._lock: + self._actions[action_id] = _ActionInfo( + agent_id=agent_id, + tool_name=tool_name, + predicted_risk=predicted_risk, + signals=signals, + ) + + def lookup_action(self, action_id: str) -> Optional[_ActionInfo]: + with self._lock: + return self._actions.get(action_id) diff --git a/tests/test_compliance_render.py b/tests/test_compliance_render.py new file mode 100644 index 00000000..fae48529 --- /dev/null +++ b/tests/test_compliance_render.py @@ -0,0 +1,113 @@ +"""Tests for Markdown / narrative / JSON renderers of ConformityReport.""" + +from __future__ import annotations + +import json + +from vaara.audit.trail import AuditTrail +from vaara.compliance.engine import create_default_engine +from vaara.compliance.render import ( + render_json, + render_markdown, + render_narrative, +) +from vaara.taxonomy.actions import ( + ActionCategory, + ActionRequest, + ActionType, + BlastRadius, + RegulatoryDomain, + Reversibility, + UrgencyClass, +) + + +def _populated_trail() -> AuditTrail: + """Build a small trail with one full action lifecycle.""" + trail = AuditTrail() + action_type = ActionType( + name="data.read", + category=ActionCategory.DATA, + reversibility=Reversibility.FULLY, + blast_radius=BlastRadius.LOCAL, + urgency=UrgencyClass.DEFERRABLE, + regulatory_domains=frozenset({RegulatoryDomain.EU_AI_ACT}), + ) + req = ActionRequest( + agent_id="a-1", + tool_name="data.read", + action_type=action_type, + parameters={}, + confidence=0.9, + ) + action_id = trail.record_action_requested(req) + trail.record_risk_scored( + action_id=action_id, + agent_id="a-1", + tool_name="data.read", + assessment={"point_estimate": 0.2, "decision": "allow"}, + regulatory_domains=frozenset({RegulatoryDomain.EU_AI_ACT}), + ) + trail.record_decision( + action_id=action_id, + 
agent_id="a-1", + tool_name="data.read", + decision="allow", + reason="risk below threshold", + risk_score=0.2, + regulatory_domains=frozenset({RegulatoryDomain.EU_AI_ACT}), + ) + return trail + + +def test_render_markdown_produces_structured_output(): + trail = _populated_trail() + engine = create_default_engine() + report = engine.assess(trail, system_name="TestSys", system_version="1.0") + + md = render_markdown(report) + assert "# Article-level evidence report" in md + assert "## System" in md + assert "TestSys" in md + assert "## Audit trail integrity" in md + assert "intact" in md + # Per-domain section header. + assert "EU_AI_ACT" in md.upper() or "EU AI ACT" in md.upper() + # Tables are present. + assert "| Article | Title |" in md + # Per-article sections. + assert "Article 9(1)" in md + # Trailing disclaimer. + assert "deployer owns the conformity decision" in md + + +def test_render_json_is_strict_json(): + trail = _populated_trail() + report = create_default_engine().assess(trail) + text = render_json(report) + parsed = json.loads(text) + assert parsed["overall_status"] + assert isinstance(parsed["articles"], list) + + +def test_render_narrative_matches_property(): + trail = _populated_trail() + report = create_default_engine().assess(trail) + assert render_narrative(report) == report.narrative + + +def test_render_markdown_flags_broken_chain(): + trail = _populated_trail() + # Tamper a record to break the chain. 
+ trail._records[1].record_hash = "0" * 64 + report = create_default_engine().assess(trail) + md = render_markdown(report) + assert "**BROKEN**" in md + assert "chain is broken" in md.lower() + + +def test_render_markdown_includes_summary_section(): + trail = _populated_trail() + report = create_default_engine().assess(trail) + md = render_markdown(report) + assert "## Summary" in md diff --git a/tests/test_receipts.py b/tests/test_receipts.py new file mode 100644 index 00000000..6f0ddcda --- /dev/null +++ b/tests/test_receipts.py @@ -0,0 +1,192 @@ +"""Article 12 commit-prove receipt-pair tests.""" + +from __future__ import annotations + +import json + +from vaara.audit.receipts import ( + CommitPayload, + OutcomePayload, + Receipt, + extract_receipt, + verify_receipt, + verify_receipt_dict, +) +from vaara.audit.trail import AuditTrail +from vaara.taxonomy.actions import ( + ActionCategory, + ActionRequest, + ActionType, + BlastRadius, + RegulatoryDomain, + Reversibility, + UrgencyClass, +) + + +def _trail_with_action(decision: str = "allow", with_outcome: bool = True): + trail = AuditTrail() + action_type = ActionType( + name="data.read", + category=ActionCategory.DATA, + reversibility=Reversibility.FULLY, + blast_radius=BlastRadius.LOCAL, + urgency=UrgencyClass.DEFERRABLE, + regulatory_domains=frozenset({RegulatoryDomain.EU_AI_ACT}), + ) + req = ActionRequest( + agent_id="a-1", + tool_name="data.read", + action_type=action_type, + confidence=0.9, + ) + action_id = trail.record_action_requested(req) + trail.record_risk_scored( + action_id=action_id, + agent_id="a-1", + tool_name="data.read", + assessment={ + "point_estimate": 0.3, + "threshold_allow": 0.4, + "threshold_deny": 0.7, + }, + regulatory_domains=frozenset({RegulatoryDomain.EU_AI_ACT}), + ) + trail.record_decision( + action_id=action_id, + agent_id="a-1", + tool_name="data.read", + decision=decision, + reason="below threshold", + risk_score=0.3, + 
regulatory_domains=frozenset({RegulatoryDomain.EU_AI_ACT}), + ) + if with_outcome: + trail.record_outcome( + action_id=action_id, + agent_id="a-1", + tool_name="data.read", + outcome_severity=0.0, + description="benign read completed", + ) + return trail, action_id + + +def test_commit_hash_is_deterministic_64_hex(): + payload = CommitPayload( + action_id="a", decision="allow", + risk_score=0.3, threshold_allow=0.4, threshold_deny=0.7, + decided_at=1700000000.0, + ) + h1 = payload.hash() + h2 = payload.hash() + assert h1 == h2 + assert len(h1) == 64 + assert all(c in "0123456789abcdef" for c in h1) + + +def test_commit_hash_changes_when_any_field_changes(): + base = CommitPayload( + action_id="a", decision="allow", risk_score=0.3, + threshold_allow=0.4, threshold_deny=0.7, decided_at=1.0, + ) + flipped_decision = CommitPayload( + action_id="a", decision="deny", risk_score=0.3, + threshold_allow=0.4, threshold_deny=0.7, decided_at=1.0, + ) + flipped_risk = CommitPayload( + action_id="a", decision="allow", risk_score=0.31, + threshold_allow=0.4, threshold_deny=0.7, decided_at=1.0, + ) + flipped_time = CommitPayload( + action_id="a", decision="allow", risk_score=0.3, + threshold_allow=0.4, threshold_deny=0.7, decided_at=2.0, + ) + h = base.hash() + assert h != flipped_decision.hash() + assert h != flipped_risk.hash() + assert h != flipped_time.hash() + + +def test_outcome_payload_embeds_commit_hash(): + commit = CommitPayload( + action_id="a", decision="allow", risk_score=0.3, + threshold_allow=0.4, threshold_deny=0.7, decided_at=1.0, + ) + outcome = OutcomePayload( + action_id="a", commit_hash=commit.hash(), + outcome_severity=0.0, recorded_at=2.0, + ) + receipt = Receipt(commit=commit, outcome=outcome) + assert verify_receipt(receipt) is True + + +def test_verify_receipt_detects_tampered_outcome_commit_hash(): + commit = CommitPayload( + action_id="a", decision="allow", risk_score=0.3, + threshold_allow=0.4, threshold_deny=0.7, decided_at=1.0, + ) + outcome = 
OutcomePayload( + action_id="a", commit_hash="0" * 64, + outcome_severity=0.0, recorded_at=2.0, + ) + receipt = Receipt(commit=commit, outcome=outcome) + assert verify_receipt(receipt) is False + + +def test_extract_receipt_from_trail(): + trail, action_id = _trail_with_action(decision="allow") + receipt = extract_receipt(trail, action_id) + assert receipt is not None + assert receipt.commit.action_id == action_id + assert receipt.commit.decision == "allow" + assert receipt.commit.risk_score == 0.3 + assert receipt.commit.threshold_allow == 0.4 + assert receipt.commit.threshold_deny == 0.7 + assert receipt.outcome is not None + assert receipt.outcome.commit_hash == receipt.commit_hash + assert verify_receipt(receipt) is True + + +def test_extract_receipt_no_outcome_yet(): + trail, action_id = _trail_with_action(with_outcome=False) + receipt = extract_receipt(trail, action_id) + assert receipt is not None + assert receipt.outcome is None + assert verify_receipt(receipt) is True + + +def test_extract_receipt_denied_decision(): + trail, action_id = _trail_with_action(decision="deny", with_outcome=False) + receipt = extract_receipt(trail, action_id) + assert receipt is not None + assert receipt.commit.decision == "deny" + + +def test_extract_receipt_returns_none_for_unknown_action(): + trail, _ = _trail_with_action() + assert extract_receipt(trail, "no-such-action") is None + + +def test_receipt_to_dict_round_trips_through_verify_receipt_dict(): + trail, action_id = _trail_with_action() + receipt = extract_receipt(trail, action_id) + assert receipt is not None + d = receipt.to_dict() + # Round-trip through JSON to catch any non-serializable surprises. 
+ d2 = json.loads(json.dumps(d)) + assert verify_receipt_dict(d2) is True + + +def test_verify_receipt_dict_rejects_tampered_serialized_form(): + trail, action_id = _trail_with_action() + receipt = extract_receipt(trail, action_id) + d = receipt.to_dict() + d["commit"]["payload"]["decision"] = "deny" # tamper but keep hash + assert verify_receipt_dict(d) is False + + +def test_verify_receipt_dict_handles_garbage(): + assert verify_receipt_dict({}) is False + assert verify_receipt_dict({"commit": {}}) is False + assert verify_receipt_dict({"commit": {"payload": {}, "hash": "x"}}) is False diff --git a/tests/test_server.py b/tests/test_server.py new file mode 100644 index 00000000..5bd638a5 --- /dev/null +++ b/tests/test_server.py @@ -0,0 +1,167 @@ +"""HTTP API reference-server tests. + +Exercises the v1 contract: score, outcome, audit append, chain read, verify, +health, server identity. +""" + +from __future__ import annotations + +import pytest + +try: + from fastapi.testclient import TestClient + + from vaara.server import create_app +except ImportError: + pytest.skip( + "server extra not installed (pip install 'vaara[server]')", + allow_module_level=True, + ) + + +@pytest.fixture +def client(): + app = create_app() + return TestClient(app) + + +def test_health(client): + r = client.get("/v1/health") + assert r.status_code == 200 + assert r.json() == {"status": "ok"} + + +def test_server_info(client): + r = client.get("/v1/server") + assert r.status_code == 200 + body = r.json() + assert body["name"] == "vaara-reference-server" + assert body["version"] == "1.0.0" + assert body["capabilities"] == { + "score": True, "audit": True, "outcome_feedback": True, + } + assert body["scorer"]["type"] == "AdaptiveScorer" + assert 0 < body["scorer"]["alpha"] < 1 + assert body["scorer"]["threshold_allow"] < body["scorer"]["threshold_deny"] + + +def test_score_returns_assessment(client): + r = client.post("/v1/score", json={ + "tool_name": "tx.transfer", + "agent_id": 
"agent-007", + "base_risk_score": 0.5, + }) + assert r.status_code == 200, r.text + body = r.json() + assert body["decision"] in ("allow", "escalate", "deny") + assert 0 <= body["risk"]["point"] <= 1 + assert 0 <= body["risk"]["lower"] <= body["risk"]["upper"] <= 1 + assert body["thresholds"]["allow"] < body["thresholds"]["deny"] + assert body["action_id"] + assert isinstance(body["signals"], dict) + assert isinstance(body["mwu_weights"], dict) + + +def test_score_validates_input(client): + # Missing required fields. + r = client.post("/v1/score", json={"tool_name": "x"}) + assert r.status_code == 422 + + # Out-of-range confidence. + r = client.post("/v1/score", json={ + "tool_name": "x", "agent_id": "a", "agent_confidence": 1.5, + }) + assert r.status_code == 422 + + +def test_score_outcome_roundtrip(client): + r1 = client.post("/v1/score", json={ + "tool_name": "data.read", "agent_id": "a-1", + }) + assert r1.status_code == 200 + action_id = r1.json()["action_id"] + + r2 = client.post("/v1/score/outcome", json={ + "action_id": action_id, "outcome_severity": 0.0, + }) + assert r2.status_code == 204 + + +def test_outcome_unknown_action_404(client): + r = client.post("/v1/score/outcome", json={ + "action_id": "does-not-exist", "outcome_severity": 0.0, + }) + assert r.status_code == 404 + assert r.json()["error"]["code"] == "unknown_action" + + +def test_audit_append_and_read_chain(client): + r1 = client.post("/v1/audit/events", json={ + "event_type": "action_requested", + "action_id": "act-1", + "agent_id": "a-1", + "tool_name": "data.read", + "payload": {"foo": "bar"}, + }) + assert r1.status_code == 201, r1.text + e1 = r1.json() + assert e1["chain_position"] == 0 + assert e1["previous_hash"] == "" + assert len(e1["event_hash"]) == 64 + + r2 = client.post("/v1/audit/events", json={ + "event_type": "decision_made", + "action_id": "act-1", + "payload": {"decision": "allow"}, + }) + e2 = r2.json() + assert e2["chain_position"] == 1 + assert e2["previous_hash"] == 
e1["event_hash"] + + rc = client.get("/v1/audit/actions/act-1/chain") + assert rc.status_code == 200 + chain = rc.json() + assert chain["action_id"] == "act-1" + assert len(chain["events"]) == 2 + assert chain["events"][0]["event_type"] == "action_requested" + assert chain["events"][1]["event_type"] == "decision_made" + + +def test_audit_chain_unknown_action_404(client): + r = client.get("/v1/audit/actions/no-such-action/chain") + assert r.status_code == 404 + + +def test_audit_event_bad_type_400(client): + r = client.post("/v1/audit/events", json={ + "event_type": "not_a_real_event", + "action_id": "x", + }) + # event_type is a Literal in the request model, so FastAPI returns 422 + # before the handler's 400 path runs; either status signals bad input. + assert r.status_code in (400, 422) + + +def test_audit_verify_empty_chain(client): + r = client.post("/v1/audit/verify", json={}) + assert r.status_code == 200 + body = r.json() + assert body["valid"] is True + assert body["events_checked"] == 0 + + +def test_audit_verify_after_events(client): + client.post("/v1/audit/events", json={ + "event_type": "action_requested", + "action_id": "act-v", + "agent_id": "a", + "tool_name": "t", + }) + client.post("/v1/audit/events", json={ + "event_type": "decision_made", + "action_id": "act-v", + }) + r = client.post("/v1/audit/verify", json={}) + body = r.json() + assert body["valid"] is True + assert body["events_checked"] == 2
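The receipt module itself (`vaara.audit.receipts`) is not part of this excerpt, but `tests/test_receipts.py` pins down the serialized shape: a `commit` object carrying `payload` and `hash`, and an outcome payload that embeds `commit_hash`. The CHANGELOG states that verification needs only `hashlib`; the following is a minimal auditor-side sketch of that claim, not Vaara's implementation. It assumes the `outcome` object mirrors `commit` (`payload` plus `hash`) and that "canonical JSON" means sorted keys with compact separators. If the real module canonicalizes differently (for example in float formatting), these hashes will not match its output. `canonical_hash` and `check_receipt` are hypothetical names.

```python
"""Hypothetical stdlib-only check of a serialized commit/outcome receipt pair."""
import hashlib
import json
from typing import Any


def canonical_hash(payload: Any) -> str:
    # Assumed canonicalization: sorted keys, no whitespace, UTF-8, SHA-256 hex.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def check_receipt(receipt: Any) -> bool:
    """Return True if the commit (and outcome, when present) hashes line up."""
    try:
        commit = receipt["commit"]
        # The gate-time commit must hash to its recorded value.
        if canonical_hash(commit["payload"]) != commit["hash"]:
            return False
        outcome = receipt.get("outcome")
        if outcome is None:
            return True  # decision committed, no outcome recorded yet
        # The outcome must embed the commit hash and hash correctly itself.
        if outcome["payload"].get("commit_hash") != commit["hash"]:
            return False
        return canonical_hash(outcome["payload"]) == outcome["hash"]
    except (KeyError, TypeError, AttributeError):
        return False  # malformed input is treated as unverifiable
```

In use, an auditor hashes the commit payload, confirms the outcome payload embeds that hash, and then hashes the outcome payload. Editing any committed field (say `decision`) without re-hashing makes `check_receipt` return `False`, which is the same property `test_verify_receipt_dict_rejects_tampered_serialized_form` exercises against the real module.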
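The reference `/v1/audit/verify` handler runs a full-chain check but always returns `first_break=None`, even when `valid` is false, although `VerifyResponse` defines a `FirstBreak` with `event_id`, `chain_position`, `expected_previous_hash`, and `actual_previous_hash`. One way to fill those fields is to walk the records in chain order and compare each stored `previous_hash` with the hash of the record before it, starting from the empty string that `test_audit_append_and_read_chain` expects at position 0. This is a hypothetical sketch: `Rec` stands in for `AuditRecord`, and it checks linkage only. It does not recompute each record's own hash from its contents (that recipe lives in `AuditTrail` and is not shown), so a record whose `record_hash` was overwritten, as in `test_render_markdown_flags_broken_chain`, is reported at the following position.

```python
"""Hypothetical linkage-only walk that locates the first chain break."""
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rec:
    """Stand-in for an audit record, keeping only the linkage fields."""
    event_id: str
    event_hash: str
    previous_hash: str


@dataclass(frozen=True)
class FirstBreak:
    """Mirrors the four fields of the FirstBreak wire model."""
    event_id: str
    chain_position: int
    expected_previous_hash: str
    actual_previous_hash: str


def find_first_break(records: Sequence[Rec], genesis: str = "") -> Optional[FirstBreak]:
    """Return the first record whose previous_hash does not match its predecessor."""
    expected = genesis  # position 0 links to an empty previous_hash
    for position, rec in enumerate(records):
        if rec.previous_hash != expected:
            return FirstBreak(
                event_id=rec.event_id,
                chain_position=position,
                expected_previous_hash=expected,
                actual_previous_hash=rec.previous_hash,
            )
        expected = rec.event_hash  # the next record must point at this one
    return None
```

A server-side version would run this over the trail's records in append order and return the result as `first_break` whenever it is not `None`.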