3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
@@ -30,6 +30,9 @@ jobs:
- name: Install Vaara (editable, no deps)
run: pip install -e . --no-deps

- name: Install server extra (fastapi, uvicorn, httpx) for HTTP transport tests
run: pip install 'fastapi>=0.110' 'uvicorn>=0.27' 'httpx>=0.27'

- name: Lint (ruff)
run: ruff check .

101 changes: 96 additions & 5 deletions CHANGELOG.md
@@ -6,6 +6,97 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht

## [Unreleased]

## [0.40.0] - 2026-05-28

**Theme: deployment shape. One Vaara process now serves a fleet of
upstream MCP servers, with multi-tenant policy, audit, and attestation
on the same substrate.**

The v0.39 sidecar shape ran one Vaara process per upstream. v0.40
turns that into a single process that speaks Streamable HTTP, holds
N upstream MCP-server connections, picks the upstream per request
from a header, scopes every score, audit record, and OVERT envelope
to a tenant, and reloads per-tenant policy in place.

### Added
- `vaara-mcp-proxy --transport http --http-host H --http-port P`:
Streamable HTTP transport at `POST /mcp`, backed by FastAPI /
uvicorn (the `vaara[server]` extra already shipped in v0.39 for
`vaara serve`). The endpoint reads `X-Vaara-Tenant` and
`X-Vaara-Upstream` per request, pushes them into ContextVars, and
dispatches into the existing `_handle_request` path so the policy,
perimeter, OVERT, and progress-notification handling all run
unchanged. Notifications (no JSON-RPC `id`) return 202 Accepted.
Bodies above 1 MiB return 413.
- `vaara-mcp-proxy --upstream NAME=CMD` (repeatable) for fan-out.
One Vaara process holds N `UpstreamMCPClient` instances in a name
-> client map. Bare `--upstream CMD` keeps the v0.39 single-
upstream contract; it lands in the "default" slot. Commands that
themselves contain `=` (e.g. `python -m foo --bar=baz`) stay
intact because the name-side regex only matches short alphanumeric
slugs. When more than one upstream is configured, a request with
no `X-Vaara-Upstream` header returns 400 with the list of valid
slots; silently falling back to whichever slot sorts first would be
a failure mode that surfaces only in production. Single-upstream
deployments keep the silent-default contract.
- `tenant_id` is first-class through the request, decision, audit,
and attestation layers:
- `ScoreRequest`, `AuditEventRequest`, and `PolicyReloadRequest`
accept a `tenant_id` body field, with `X-Vaara-Tenant` as the
HTTP-header alternative. Body wins over header.
- `AuditRecord` gains a `tenant_id` field, excluded from
`compute_hash()` so pre-v0.40 chains still re-verify on load.
- `AuditTrail` keeps an `action_id -> tenant_id` map seeded by
`record_action_requested`, so every follow-up record
(`risk_scored`, `decision`, `execution`, `escalation`,
`outcome`, `policy_override`) inherits the same scope without
every caller threading `tenant_id` through every signature.
The map is soft-capped (50k entries, 12.5% eviction on
pressure) so long-running deployments cannot leak memory.
- `SQLiteAuditBackend.write_record` prefers the per-record
`tenant_id` when set, with the instance-scoped `tenant_id`
(legacy CLI tooling path) as fallback. A single backend
instance can now serve a multi-tenant runtime.
- OVERT envelopes carry `tenant_id` as a `non_content_metadata`
claim when present.
- `vaara.policy.registry.PolicyRegistry`: one `PolicyController` per
tenant, with the empty string slot reserved as the default
fallback for unmatched lookups.
- `vaara serve --policy-dir DIR`: loads one YAML/JSON policy per
file. Filename stem = `tenant_id`; `default.yaml` lands in the
fallback slot. Mutually exclusive with `--policy`.
- `POST /v1/policy/reload` accepts a `tenant_id` body field (or
`X-Vaara-Tenant` header) and routes to the right registry slot;
creates the slot on first reload.
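
A minimal sketch of the `--upstream NAME=CMD` parsing and slot selection described above. This is not the shipped implementation: the slug regex, the function names `parse_upstream_spec` and `select_slot`, and the 400 response for an unknown slot name are assumptions made for illustration. Only the "default" slot, the intact `=` inside commands, and the 400 on a missing header in multi-upstream mode come from the notes above.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed slug shape: the changelog only says "short alphanumeric slugs".
_NAMED_SPEC = re.compile(r"^([A-Za-z0-9][A-Za-z0-9_-]{0,31})=(.+)$", re.DOTALL)


def parse_upstream_spec(spec: str) -> Tuple[str, str]:
    """Split 'NAME=CMD' into (name, command); a bare command gets the 'default' slot."""
    match = _NAMED_SPEC.match(spec)
    if match:
        return match.group(1), match.group(2)
    # No slug before the first '=' (or no '=' at all): treat the whole
    # string as the command, so 'python -m foo --bar=baz' stays intact.
    return "default", spec


def select_slot(slots: Dict[str, str], header: Optional[str]) -> Tuple[int, str]:
    """Return (http_status, slot_name_or_error) for one request."""
    valid = ", ".join(sorted(slots))
    if header:
        if header in slots:
            return 200, header
        # Assumption: an unknown slot name is rejected like a missing one.
        return 400, f"unknown upstream {header!r}; valid slots: {valid}"
    if len(slots) == 1:
        # Single-upstream deployments keep the silent default.
        return 200, next(iter(slots))
    return 400, f"X-Vaara-Upstream header required; valid slots: {valid}"
```

With two named upstreams configured, `select_slot(fleet, None)` returns a 400 that lists `github, sap`, while a single bare `--upstream CMD` resolves to the "default" slot without any header.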

### Changed
- `Pipeline.intercept` takes a `tenant_id` keyword that flows onto
the `ActionRequest` and into the audit trail. Default `""` keeps
the v0.39 single-tenant contract.
- `AdaptiveScorer.evaluate` dispatches allow / deny thresholds per
tenant at call time. A new `policy_lookup` constructor arg (and
`set_policy_lookup` setter for late binding from `ServerState`)
takes a `Callable[[str], Optional[Policy]]`; on every evaluate, the
scorer asks the registry for the calling tenant's policy and uses
its thresholds. An unknown or unmapped tenant falls back to the
scorer-bound defaults that the default-slot listener keeps fresh on
reload. The backend decision dict surfaces the applied
`threshold_allow` and `threshold_deny` so operators can confirm
which tenant's policy ran. MWU expert state, the conformal
calibrator, agent profiles, and sequence patterns stay shared
across tenants; only threshold application is per-tenant in v0.40.
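
The per-tenant threshold dispatch can be pictured roughly as follows. This is a sketch under stated assumptions, not the `AdaptiveScorer` code: the `Policy` dataclass, the `ThresholdDispatcher` class, and the allow / escalate / deny comparison rule are hypothetical. Only the `Callable[[str], Optional[Policy]]` lookup shape, the fallback to scorer-bound defaults, and the surfaced `threshold_allow` / `threshold_deny` keys come from the entry above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Policy:
    # Hypothetical stand-in carrying only the two thresholds.
    threshold_allow: float
    threshold_deny: float


PolicyLookup = Callable[[str], Optional[Policy]]


class ThresholdDispatcher:
    """Illustrative scorer shell: shared state, per-tenant thresholds."""

    def __init__(self, default: Policy, policy_lookup: Optional[PolicyLookup] = None):
        self._default = default
        self._lookup = policy_lookup

    def set_policy_lookup(self, policy_lookup: PolicyLookup) -> None:
        # Late binding, e.g. once the registry exists.
        self._lookup = policy_lookup

    def evaluate(self, risk_score: float, tenant_id: str = "") -> Dict[str, object]:
        policy = self._lookup(tenant_id) if self._lookup else None
        if policy is None:
            # Unknown or unmapped tenant: fall back to the bound defaults.
            policy = self._default
        # Assumed rule: at or below allow passes, at or above deny blocks,
        # anything in between escalates.
        if risk_score <= policy.threshold_allow:
            decision = "allow"
        elif risk_score >= policy.threshold_deny:
            decision = "deny"
        else:
            decision = "escalate"
        return {
            "decision": decision,
            "threshold_allow": policy.threshold_allow,
            "threshold_deny": policy.threshold_deny,
        }
```

Passing a plain `dict.get` as the lookup is enough to try it: the same risk score of 0.6 is denied for a tenant whose deny threshold is 0.5 and escalated for an unmapped tenant that falls back to a 0.7 default.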

### Scope notes
- The HTTP transport on the proxy is POST-only. GET-SSE for
server-initiated notifications (sampling, server-pushed progress)
is v0.41. The audit + OVERT emission path for upstream-originated
notifications still works unchanged on stdio.
- Reloading the classifier bundle or the conformal calibrator still
requires a restart in v0.40. Per-tenant policy reload IS hot; that
is the configuration plane that needed to be live across tenants.
Classifier reload waits on resolving the shared-singleton lifecycle
and per-tenant scoping questions (v0.41 candidate).
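
The `AuditRecord` entry under Added says `tenant_id` is excluded from `compute_hash()` so older chains still re-verify. A rough illustration of why that works, assuming a SHA-256 over a canonical JSON payload (the real field set and serialisation are not shown in this changelog, so the `Record` class here is hypothetical):

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Record:
    # Hypothetical, trimmed-down record; the real AuditRecord has more fields.
    event_type: str
    data: Dict[str, Any] = field(default_factory=dict)
    previous_hash: str = ""
    tenant_id: str = ""  # deliberately NOT part of the hashed payload

    def compute_hash(self) -> str:
        payload = json.dumps(
            {
                "event_type": self.event_type,
                "data": self.data,
                "previous_hash": self.previous_hash,
            },
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A record written before the field existed and the same record loaded with a `tenant_id` hash identically, so the chain verifies. The flip side, in this sketch at least, is that the hash alone does not detect a changed `tenant_id`.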

## [0.39.2] - 2026-05-27

**Theme: SEP-2787 envelope v2 shape, full wire round-trip, versioned
@@ -1713,7 +1804,8 @@ and backward-compatible. Together they reposition Vaara from a Python
library to a runtime kernel that control planes, audit consumers, and
orchestration frameworks reference. The HTTP contract at
`docs/openapi.yaml` is versioned `/v1/` independently of the project
version, following the OPA pattern.
version, so the wire surface can stabilise without locking the
library cadence.

### Added
- **HTTP API reference server (`vaara[server]` extra).** Exposes the
@@ -1789,10 +1881,9 @@ it governs.
action class declared, matched sequences known).
- **`vaara.policy.test_cases_io` module.** `load_test_cases(path)`
reads a YAML or JSON cases document and returns a list of
`PolicyTestCase`. Document shape mirrors typical OPA / Conftest
test files: a top-level `cases:` list with `action_class`,
`risk_score`, optional `matched_sequences`, and an `expect:` block
carrying `verdict` and optional `route`.
`PolicyTestCase`. Document shape: a top-level `cases:` list with
`action_class`, `risk_score`, optional `matched_sequences`, and an
`expect:` block carrying `verdict` and optional `route`.
- **`vaara policy validate POLICY_PATH [--json]`** and **`vaara
policy test POLICY_PATH --cases CASES_PATH [--json]`** CLI
subcommands. Both honour standard CI exit codes: validate returns
23 changes: 21 additions & 2 deletions README.md
@@ -168,13 +168,32 @@ if (r.decision === "deny") throw new Error("blocked");
`vaara.integrations.mcp_proxy.VaaraMCPProxy` sits between an MCP client (Claude Code, Cursor, any MCP-capable host) and an upstream MCP server. Every `tools/call` from the client routes through Vaara's interception pipeline before reaching the upstream. Allowed calls forward transparently and report the upstream outcome back to the scorer. Blocked calls return an MCP `isError: true` response with the block reason. The initialization handshake and `notifications/*` forward unchanged. `tools/list`, `resources/list`, `resources/read`, `prompts/list`, and `prompts/get` route through the operator perimeter before reaching the client or upstream.

```bash
python -m vaara.integrations.mcp_proxy \
vaara-mcp-proxy \
--upstream npx --upstream-arg -y --upstream-arg @sap/mdk-mcp-server \
--db ./mcp_audit.db
```

Point your MCP client at the proxy instead of the upstream. The audit chain captures every tool call without changing client or upstream behavior. Distinct from `mcp_server`, which exposes Vaara itself as an MCP server for agents that consult Vaara as a tool.

<details>
<summary>Fleet shape (v0.40): one proxy, many upstreams, multi-tenant policy</summary>

`vaara-mcp-proxy` also runs over Streamable HTTP with fan-out, so one process can serve a fleet of upstream MCP servers:

```bash
vaara-mcp-proxy \
--transport http \
--http-host 127.0.0.1 \
--http-port 8765 \
--upstream 'github=npx -y @github/mcp-server' \
--upstream 'sap=npx -y @sap/mdk-mcp-server'
```

Each `POST /mcp` reads two headers. `X-Vaara-Upstream` picks the upstream slot. `X-Vaara-Tenant` scopes the policy, audit chain, and OVERT envelope for that call. Single-upstream deployments keep the v0.39 silent-default contract. Multi-upstream deployments require `X-Vaara-Upstream` per call and return 400 with the available slot list when the header is missing.

The reference HTTP API server (`vaara serve --policy-dir DIR`) loads one YAML or JSON policy per file in the directory: the filename stem becomes the `tenant_id`, and `default.yaml` lands in the fallback slot. It hot-reloads per tenant via `POST /v1/policy/reload` with a `tenant_id` body field or `X-Vaara-Tenant` header. On every call, the scorer applies the calling tenant's allow and deny thresholds at `evaluate()` time.
</details>
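
For the fleet shape, a client request might be assembled as below. This is a sketch only: the helper name `build_mcp_request`, the tool name `list_issues`, and its arguments are hypothetical, and the `Accept` value follows the MCP Streamable HTTP convention rather than anything stated in this README. The two `X-Vaara-*` headers and the `POST /mcp` path are the parts taken from the text above.

```python
import json
import urllib.request
from typing import Any, Dict


def build_mcp_request(
    base_url: str,
    upstream: str,
    tenant: str,
    tool: str,
    arguments: Dict[str, Any],
    request_id: int = 1,
) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC tools/call aimed at the proxy."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        f"{base_url}/mcp",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
            "X-Vaara-Upstream": upstream,  # picks the upstream slot
            "X-Vaara-Tenant": tenant,      # scopes policy, audit, OVERT
        },
        method="POST",
    )


# Hypothetical tool name and arguments, for illustration only.
req = build_mcp_request(
    "http://127.0.0.1:8765", "github", "acme", "list_issues", {"repo": "example/repo"}
)
```

Sending it with `urllib.request.urlopen(req)` requires a proxy listening on that address. In a multi-upstream deployment, omitting `X-Vaara-Upstream` should come back as a 400 listing the configured slots.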

<details>
<summary>Operator perimeter: tool, resource, prompt filtering</summary>

@@ -194,7 +213,7 @@ vaara keygen --dev --out signing.pem
head -c 32 /dev/urandom > op.key

# 3. Run the proxy with OVERT emission turned on.
python -m vaara.integrations.mcp_proxy \
vaara-mcp-proxy \
--upstream npx --upstream-arg -y --upstream-arg @sap/mdk-mcp-server \
--overt-signing-key signing.pem \
--overt-operator-key op.key \
2 changes: 1 addition & 1 deletion clients/ts/package.json
@@ -1,6 +1,6 @@
{
"name": "@vaara/client",
"version": "0.39.2",
"version": "0.40.0",
"description": "TypeScript client for the Vaara HTTP API. Conformal risk scoring, hash-chained audit, policy reload, named detectors.",
"main": "dist/index.js",
"types": "dist/index.d.ts",
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "vaara"
version = "0.39.2"
version = "0.40.0"
description = "Adaptive AI Agent Execution Layer for risk scoring, audit trails, and regulatory compliance"
requires-python = ">=3.10"
license = "Apache-2.0"
@@ -58,6 +58,7 @@ rebuff = ["rebuff>=0.1"]
[project.scripts]
vaara = "vaara.cli:main"
vaara-audit = "vaara.audit_cli:main"
vaara-mcp-proxy = "vaara.integrations.mcp_proxy:main"

[tool.setuptools.packages.find]
where = ["src"]
2 changes: 1 addition & 1 deletion src/vaara/__init__.py
@@ -6,7 +6,7 @@
oversight.
"""

__version__ = "0.39.2"
__version__ = "0.40.0"

from vaara.pipeline import InterceptionPipeline, InterceptionResult

8 changes: 7 additions & 1 deletion src/vaara/audit/sqlite_backend.py
@@ -357,7 +357,11 @@ def write_record(self, record: AuditRecord) -> None:
_strict_json_dumps(record.regulatory_articles),
record.previous_hash,
record.record_hash,
self._tenant_id,
# Per-record tenant_id wins so a single backend instance
# can serve a multi-tenant runtime (v0.40+). Empty record
# tenant_id falls back to instance scope for the legacy
# single-tenant init path.
record.tenant_id or self._tenant_id,
record.system_operation,
record.data_usage,
record.decision_making,
@@ -674,6 +678,7 @@ def _row_to_record(self, row: tuple) -> AuditRecord:
agent_id = self._redaction_cache[agent_id]
# Defensive indexing: rows from older queries may not include
# the v3 columns. Use a guard so loading old DBs still works.
tenant_id = row[11] if len(row) > 11 else ""
sys_op = row[12] if len(row) > 12 else None
data_use = row[13] if len(row) > 13 else None
dec_mk = row[14] if len(row) > 14 else None
@@ -689,6 +694,7 @@ def _row_to_record(self, row: tuple) -> AuditRecord:
regulatory_articles=json.loads(row[7]),
previous_hash=row[8],
record_hash=row[9],
tenant_id=tenant_id or "",
system_operation=sys_op,
data_usage=data_use,
decision_making=dec_mk,
50 changes: 50 additions & 0 deletions src/vaara/audit/trail.py
@@ -258,6 +258,9 @@ class AuditRecord:
data_usage: Optional[str] = None
decision_making: Optional[str] = None
limitations: Optional[str] = None
# v0.40: multi-tenant scoping. Empty string = single-tenant deployment.
# Excluded from compute_hash() to preserve pre-v0.40 chain re-verification.
tenant_id: str = ""

def __post_init__(self) -> None:
# Loaded-from-DB records carry a non-empty record_hash. Skip
@@ -492,6 +495,12 @@ def __init__(
self._by_action: dict[str, list[AuditRecord]] = defaultdict(list)
self._last_hash = ""
self._on_record = on_record
# v0.40 multi-tenant: action_id -> tenant_id, seeded by
# record_action_requested. Subsequent record_* calls (decision,
# execution, escalation) look up the action_id so every record in
# the lifecycle carries the same tenant scope without forcing
# every caller to thread tenant_id through every method signature.
self._tenant_for_action: dict[str, str] = {}
# Counts on_record callback failures so callers can detect
# persistence divergence at runtime (e.g., DB gone, disk full).
# Without this, a silent logger.error is the only signal and the
@@ -554,9 +563,33 @@ def verify_chain(self) -> Optional[str]:

# ── Recording events ──────────────────────────────────────────

# Defense-in-depth cap for direct-trail callers that bypass the pipeline's
# length cap on tenant_id. The HTTP boundary already caps at 256 via the
# Pydantic schema, but the AuditTrail public API is reachable from
# embedders that construct ActionRequest directly. A 50MB tenant_id would
# otherwise balloon every record on the hash chain and the in-memory
# action -> tenant map.
_MAX_TENANT_ID_LEN = 256
# Soft cap on the action -> tenant map. Long-running multi-tenant
# deployments would otherwise leak memory at one entry per action,
# because OUTCOME_RECORDED arrives well after ACTION_REQUESTED and the
# map cannot be cleared at decision time. When the cap is reached the
# oldest 1/8 of the map is evicted; subsequent lookups for evicted
# actions fall back to "" tenant, which is the legacy single-tenant
# contract — correct fail-soft behaviour.
_MAX_ACTION_TENANT_MAP = 50_000

def record_action_requested(self, request: ActionRequest) -> str:
"""Record that an agent requested an action. Returns the action_id."""
action_id = str(uuid.uuid4())
tenant_id = getattr(request, "tenant_id", "") or ""
if tenant_id:
tenant_id = self._cap_record_str(tenant_id, self._MAX_TENANT_ID_LEN)
if len(self._tenant_for_action) >= self._MAX_ACTION_TENANT_MAP:
evict = max(1, self._MAX_ACTION_TENANT_MAP // 8)
for stale in list(self._tenant_for_action)[:evict]:
self._tenant_for_action.pop(stale, None)
self._tenant_for_action[action_id] = tenant_id

articles = self._get_regulatory_articles(
EventType.ACTION_REQUESTED,
@@ -600,10 +633,20 @@ def record_action_requested(self, request: ActionRequest) -> str:
"sequence_position": request.sequence_position,
},
regulatory_articles=articles,
tenant_id=tenant_id,
))

return action_id

def _tenant_for(self, action_id: str) -> str:
"""Resolve the tenant scope for an existing action lifecycle.

Returns the tenant_id captured at record_action_requested time so
every follow-up record (risk_scored, decision, execution,
escalation, outcome) carries the same scope automatically.
"""
return self._tenant_for_action.get(action_id, "")

def record_risk_scored(
self,
action_id: str,
@@ -631,6 +674,7 @@ def record_risk_scored(
tool_name=self._cap_record_str(tool_name, self._MAX_TOOL_NAME_LEN),
data=safe_assessment,
regulatory_articles=articles,
tenant_id=self._tenant_for(action_id),
))

def record_decision(
@@ -666,6 +710,7 @@ def record_decision(
"risk_score": risk_score,
},
regulatory_articles=articles,
tenant_id=self._tenant_for(action_id),
))

def record_execution(
@@ -702,6 +747,7 @@ def record_execution(
data={"result_summary": self._cap_record_dict_bytes(
safe_result, self._MAX_EXECUTION_RESULT_JSON_BYTES
)},
tenant_id=self._tenant_for(action_id),
))

def record_escalation(
@@ -731,6 +777,7 @@ def record_escalation(
"risk_score": risk_score,
},
regulatory_articles=articles,
tenant_id=self._tenant_for(action_id),
))

def record_escalation_resolved(
@@ -760,6 +807,7 @@ def record_escalation_resolved(
"justification": self._cap_record_str(justification, self._MAX_JUSTIFICATION_LEN),
},
regulatory_articles=articles,
tenant_id=self._tenant_for(action_id),
))

def record_outcome(
@@ -794,6 +842,7 @@ def record_outcome(
),
},
regulatory_articles=articles,
tenant_id=self._tenant_for(action_id),
))

# Length caps for caller-controlled free-text fields on this direct
@@ -923,6 +972,7 @@ def record_policy_override(
"new_decision": new_decision,
},
regulatory_articles=articles,
tenant_id=self._tenant_for(action_id),
))

# ── Querying ──────────────────────────────────────────────────
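
The soft-capped `action_id -> tenant_id` map in `trail.py` can be exercised in isolation. The class below is a standalone sketch that mirrors the eviction logic from the diff (oldest 1/8 dropped once the cap is reached, evicted lookups falling back to `""`). The `ActionTenantMap` name and the small cap used in the usage note are illustrative assumptions, not part of the codebase.

```python
from typing import Dict


class ActionTenantMap:
    """Insertion-ordered map with a soft cap and oldest-first eviction."""

    def __init__(self, max_entries: int = 50_000) -> None:
        self._max = max_entries
        self._map: Dict[str, str] = {}

    def remember(self, action_id: str, tenant_id: str) -> None:
        if len(self._map) >= self._max:
            # Evict the oldest 1/8 of entries (at least one). Python dicts
            # preserve insertion order, so the first keys are the oldest.
            evict = max(1, self._max // 8)
            for stale in list(self._map)[:evict]:
                self._map.pop(stale, None)
        self._map[action_id] = tenant_id

    def tenant_for(self, action_id: str) -> str:
        # Evicted or never-seen actions resolve to the single-tenant scope.
        return self._map.get(action_id, "")

    def __len__(self) -> int:
        return len(self._map)
```

With `max_entries=8`, inserting a ninth action evicts exactly the first one: its lookup falls back to `""` while the other eight keep their tenant.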