vaaraio · May 27, 2026
@@ -30,6 +30,9 @@ jobs:
       - name: Install Vaara (editable, no deps)
         run: pip install -e . --no-deps
 
+      - name: Install server extra (fastapi, uvicorn, httpx) for HTTP transport tests
+        run: pip install 'fastapi>=0.110' 'uvicorn>=0.27' 'httpx>=0.27'
+
       - name: Lint (ruff)
         run: ruff check .
 

@@ -6,6 +6,97 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht
 
 ## [Unreleased]
 
+## [0.40.0] - 2026-05-28
+
+**Theme: deployment shape. One Vaara process now serves a fleet of
+upstream MCP servers, with multi-tenant policy, audit, and attestation
+on the same substrate.**
+
+The v0.39 sidecar shape ran one Vaara process per upstream. v0.40
+turns that into a single process that speaks Streamable HTTP, holds
+N upstream MCP-server connections, picks the upstream per request
+from a header, scopes every score, audit record, and OVERT envelope
+to a tenant, and reloads per-tenant policy in place.
+
+### Added
+- `vaara-mcp-proxy --transport http --http-host H --http-port P`:
+  Streamable HTTP transport at `POST /mcp`, backed by FastAPI /
+  uvicorn (the `vaara[server]` extra already shipped in v0.39 for
+  `vaara serve`). The endpoint reads `X-Vaara-Tenant` and
+  `X-Vaara-Upstream` per request, pushes them into ContextVars, and
+  dispatches into the existing `_handle_request` path so the policy,
+  perimeter, OVERT, and progress-notification handling all light up
+  unchanged. Notifications (no JSON-RPC `id`) return 202 Accepted.
+  Bodies above 1 MiB return 413.
+- `vaara-mcp-proxy --upstream NAME=CMD` (repeatable) for fan-out.
+  One Vaara process holds N `UpstreamMCPClient` instances in a
+  name -> client map. Bare `--upstream CMD` keeps the v0.39
+  single-upstream contract; it lands in the "default" slot. Commands that
+  themselves contain `=` (e.g. `python -m foo --bar=baz`) stay
+  intact because the name-side regex only matches short alphanumeric
+  slugs. When more than one upstream is configured, a request with
+  no `X-Vaara-Upstream` header returns 400 with the list of valid
+  slots; silent fallback to whichever slot won the sort would be a
+  failure mode that surfaces only in production. Single-upstream
+  deployments keep the silent-default contract.
+- `tenant_id` is first-class through the request, decision, audit,
+  and attestation layers:
+  - `ScoreRequest`, `AuditEventRequest`, and `PolicyReloadRequest`
+    accept a `tenant_id` body field, with `X-Vaara-Tenant` as the
+    HTTP-header alternative. Body wins over header.
+  - `AuditRecord` gains a `tenant_id` field, excluded from
+    `compute_hash()` so pre-v0.40 chains still re-verify on load.
+  - `AuditTrail` keeps an `action_id -> tenant_id` map seeded by
+    `record_action_requested`, so every follow-up record
+    (`risk_scored`, `decision`, `execution`, `escalation`,
+    `outcome`, `policy_override`) inherits the same scope without
+    every caller threading `tenant_id` through every signature.
+    The map is soft-capped (50k entries, 12.5% eviction on
+    pressure) so long-running deployments cannot leak memory.
+  - `SQLiteAuditBackend.write_record` prefers the per-record
+    `tenant_id` when set, with the instance-scoped `tenant_id`
+    (legacy CLI tooling path) as fallback. A single backend
+    instance can now serve a multi-tenant runtime.
+  - OVERT envelopes carry `tenant_id` as a `non_content_metadata`
+    claim when present.
+- `vaara.policy.registry.PolicyRegistry`: one `PolicyController` per
+  tenant, with the empty string slot reserved as the default
+  fallback for unmatched lookups.
+- `vaara serve --policy-dir DIR`: loads one YAML/JSON policy per
+  file. Filename stem = `tenant_id`; `default.yaml` lands in the
+  fallback slot. Mutually exclusive with `--policy`.
+- `POST /v1/policy/reload` accepts a `tenant_id` body field (or
+  `X-Vaara-Tenant` header) and routes to the right registry slot;
+  creates the slot on first reload.
+
+### Changed
+- `Pipeline.intercept` takes a `tenant_id` keyword that flows onto
+  the `ActionRequest` and into the audit trail. Default `""` keeps
+  the v0.39 single-tenant contract.
+- `AdaptiveScorer.evaluate` dispatches allow / deny thresholds per
+  tenant at call time. A new `policy_lookup` constructor arg (and
+  `set_policy_lookup` setter for late binding from `ServerState`)
+  takes a `Callable[[str], Optional[Policy]]`; on every evaluate, the
+  scorer asks the registry for the calling tenant's policy and uses
+  its thresholds. An unknown or unmapped tenant falls back to the
+  scorer-bound defaults that the default-slot listener keeps fresh on
+  reload. The backend decision dict surfaces the applied
+  `threshold_allow` and `threshold_deny` so operators can confirm
+  which tenant's policy ran. MWU expert state, the conformal
+  calibrator, agent profiles, and sequence patterns stay shared
+  across tenants; only threshold application is per-tenant in v0.40.
+
+### Scope notes
+- The HTTP transport on the proxy is POST-only. GET-SSE for
+  server-initiated notifications (sampling, server-pushed progress)
+  is v0.41. The audit + OVERT emission path for upstream-originated
+  notifications still works unchanged on stdio.
+- Classifier bundle and conformal-calibrator hot-reload remain a
+  restart operation in v0.40. Per-tenant policy reload IS hot; that
+  is the configuration plane that needed to be live across tenants.
+  Classifier reload waits on two open questions: the shared-singleton
+  lifecycle and per-tenant scoping (v0.41 candidate).
+
 ## [0.39.2] - 2026-05-27
 
 **Theme: SEP-2787 envelope v2 shape, full wire round-trip, versioned
@@ -1713,7 +1804,8 @@ and backward-compatible. Together they reposition Vaara from a Python
 library to a runtime kernel that control planes, audit consumers, and
 orchestration frameworks reference. The HTTP contract at
 `docs/openapi.yaml` is versioned `/v1/` independently of the project
-version, following the OPA pattern.
+version, so the wire surface can stabilise without locking the
+library cadence.
 
 ### Added
 - **HTTP API reference server (`vaara[server]` extra).** Exposes the
@@ -1789,10 +1881,9 @@ it governs.
   action class declared, matched sequences known).
 - **`vaara.policy.test_cases_io` module.** `load_test_cases(path)`
   reads a YAML or JSON cases document and returns a list of
-  `PolicyTestCase`. Document shape mirrors typical OPA / Conftest
-  test files: a top-level `cases:` list with `action_class`,
-  `risk_score`, optional `matched_sequences`, and an `expect:` block
-  carrying `verdict` and optional `route`.
+  `PolicyTestCase`. Document shape: a top-level `cases:` list with
+  `action_class`, `risk_score`, optional `matched_sequences`, and an
+  `expect:` block carrying `verdict` and optional `route`.
 - **`vaara policy validate POLICY_PATH [--json]`** and **`vaara
   policy test POLICY_PATH --cases CASES_PATH [--json]`** CLI
   subcommands. Both honour standard CI exit codes: validate returns

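Reviewer sketch for the `--upstream NAME=CMD` rule in the 0.40.0 changelog entry above. This is not the code in `vaara.integrations.mcp_proxy`: the regex, the 32-character slug limit, and the helper names `parse_upstream_spec` and `resolve_upstream` are assumptions made for illustration. Only the behaviour comes from the changelog: a named slot, a bare command landing in "default", commands containing `=` staying intact, and a missing header being rejected when several slots exist. The handling of an unknown slot name is also an assumption.

```python
from __future__ import annotations

import re
import shlex

# Assumed name pattern: a short alphanumeric slug, "=", then the command.
# The real regex in the proxy may differ.
_NAMED_SPEC = re.compile(r"^([A-Za-z0-9][A-Za-z0-9_-]{0,31})=(.+)$", re.DOTALL)


def parse_upstream_spec(spec: str) -> tuple[str, list[str]]:
    """Split one --upstream value into (slot name, argv)."""
    match = _NAMED_SPEC.match(spec)
    if match:
        return match.group(1), shlex.split(match.group(2))
    # No NAME= prefix: the whole value is the command for the default slot.
    return "default", shlex.split(spec)


def resolve_upstream(slots: dict[str, list[str]], header: str | None) -> str:
    """Pick a slot from the X-Vaara-Upstream header value.

    Raises LookupError where the HTTP layer would answer 400.
    """
    if header:
        if header not in slots:
            # Assumption: the changelog does not say how unknown names fail.
            raise LookupError(f"unknown upstream {header!r}; valid: {sorted(slots)}")
        return header
    if len(slots) == 1:
        # Single-upstream deployments keep the silent default.
        return next(iter(slots))
    raise LookupError(f"X-Vaara-Upstream required; valid: {sorted(slots)}")


slots = dict(
    parse_upstream_spec(spec)
    for spec in ("github=npx -y @github/mcp-server", "sap=npx -y @sap/mdk-mcp-server")
)
```

Here `python -m foo --bar=baz` parses as a bare command because the text before its first `=` contains spaces, so the slug pattern cannot match it.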
@@ -168,13 +168,32 @@ if (r.decision === "deny") throw new Error("blocked");
 `vaara.integrations.mcp_proxy.VaaraMCPProxy` sits between an MCP client (Claude Code, Cursor, any MCP-capable host) and an upstream MCP server. Every `tools/call` from the client routes through Vaara's interception pipeline before reaching the upstream. Allowed calls forward transparently and report the upstream outcome back to the scorer. Blocked calls return an MCP `isError: true` response with the block reason. The initialization handshake and `notifications/*` forward unchanged. `tools/list`, `resources/list`, `resources/read`, `prompts/list`, and `prompts/get` route through the operator perimeter before reaching the client or upstream.
 
 ```bash
-python -m vaara.integrations.mcp_proxy \
+vaara-mcp-proxy \
   --upstream npx --upstream-arg -y --upstream-arg @sap/mdk-mcp-server \
   --db ./mcp_audit.db
 ```
 
 Point your MCP client at the proxy instead of the upstream. The audit chain captures every tool call without changing client or upstream behavior. Distinct from `mcp_server`, which exposes Vaara itself as an MCP server for agents that consult Vaara as a tool.
 
+<details>
+<summary>Fleet shape (v0.40): one proxy, many upstreams, multi-tenant policy</summary>
+
+`vaara-mcp-proxy` also runs over Streamable HTTP with fan-out, so one process can serve a fleet of upstream MCP servers:
+
+```bash
+vaara-mcp-proxy \
+  --transport http \
+  --http-host 127.0.0.1 \
+  --http-port 8765 \
+  --upstream 'github=npx -y @github/mcp-server' \
+  --upstream 'sap=npx -y @sap/mdk-mcp-server'
+```
+
+Each `POST /mcp` reads two headers. `X-Vaara-Upstream` picks the upstream slot. `X-Vaara-Tenant` scopes the policy, audit chain, and OVERT envelope for that call. Single-upstream deployments keep the v0.39 silent-default contract. Multi-upstream deployments require `X-Vaara-Upstream` per call and return 400 with the available slot list when the header is missing.
+
+The reference HTTP API server (`vaara serve --policy-dir DIR`) loads one YAML or JSON policy per file in the directory (filename stem becomes the `tenant_id`, `default.yaml` lands in the fallback slot) and hot-reloads per tenant via `POST /v1/policy/reload` with a `tenant_id` body field or `X-Vaara-Tenant` header. The scorer dispatches allow and deny thresholds per call against the calling tenant's policy at `evaluate()` time.
+</details>
+
 <details>
 <summary>Operator perimeter: tool, resource, prompt filtering</summary>
 
@@ -194,7 +213,7 @@ vaara keygen --dev --out signing.pem
 head -c 32 /dev/urandom > op.key
 
 # 3. Run the proxy with OVERT emission turned on.
-python -m vaara.integrations.mcp_proxy \
+vaara-mcp-proxy \
   --upstream npx --upstream-arg -y --upstream-arg @sap/mdk-mcp-server \
   --overt-signing-key signing.pem \
   --overt-operator-key op.key \

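Reviewer sketch of what a client call against the fleet-shape endpoint in the README hunk above might look like. It only builds the request and does not send it. The tool name `list_issues`, the tenant `acme`, and the helper `build_mcp_request` are hypothetical; the host, port, path, and header names come from the README text. The `Accept` value follows the MCP Streamable HTTP transport convention; whether the proxy enforces it is not stated in this diff. A real session would also perform the MCP `initialize` handshake before any `tools/call`.

```python
from __future__ import annotations

import json
import urllib.request


def build_mcp_request(
    base_url: str,
    upstream: str,
    tenant: str,
    tool: str,
    arguments: dict,
    request_id: int = 1,
) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC tools/call for POST /mcp."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,  # omitting "id" would make this a notification
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
        "X-Vaara-Upstream": upstream,  # selects the upstream slot
        "X-Vaara-Tenant": tenant,      # scopes policy, audit, and OVERT
    }
    return urllib.request.Request(
        f"{base_url}/mcp",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_mcp_request(
    "http://127.0.0.1:8765", "github", "acme", "list_issues", {"repo": "example"}
)
# To send it against a running proxy: urllib.request.urlopen(req)
```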
@@ -1,6 +1,6 @@
 {
   "name": "@vaara/client",
-  "version": "0.39.2",
+  "version": "0.40.0",
   "description": "TypeScript client for the Vaara HTTP API. Conformal risk scoring, hash-chained audit, policy reload, named detectors.",
   "main": "dist/index.js",
   "types": "dist/index.d.ts",

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "vaara"
-version = "0.39.2"
+version = "0.40.0"
 description = "Adaptive AI Agent Execution Layer for risk scoring, audit trails, and regulatory compliance"
 requires-python = ">=3.10"
 license = "Apache-2.0"
@@ -58,6 +58,7 @@ rebuff = ["rebuff>=0.1"]
 [project.scripts]
 vaara = "vaara.cli:main"
 vaara-audit = "vaara.audit_cli:main"
+vaara-mcp-proxy = "vaara.integrations.mcp_proxy:main"
 
 [tool.setuptools.packages.find]
 where = ["src"]

@@ -6,7 +6,7 @@
 oversight.
 """
 
-__version__ = "0.39.2"
+__version__ = "0.40.0"
 
 from vaara.pipeline import InterceptionPipeline, InterceptionResult
 

@@ -357,7 +357,11 @@ def write_record(self, record: AuditRecord) -> None:
                     _strict_json_dumps(record.regulatory_articles),
                     record.previous_hash,
                     record.record_hash,
-                    self._tenant_id,
+                    # Per-record tenant_id wins so a single backend instance
+                    # can serve a multi-tenant runtime (v0.40+). Empty record
+                    # tenant_id falls back to instance scope for the legacy
+                    # single-tenant init path.
+                    record.tenant_id or self._tenant_id,
                     record.system_operation,
                     record.data_usage,
                     record.decision_making,
@@ -674,6 +678,7 @@ def _row_to_record(self, row: tuple) -> AuditRecord:
             agent_id = self._redaction_cache[agent_id]
         # Defensive indexing: rows from older queries may not include
         # the v3 columns. Use a guard so loading old DBs still works.
+        tenant_id = row[11] if len(row) > 11 else ""
         sys_op = row[12] if len(row) > 12 else None
         data_use = row[13] if len(row) > 13 else None
         dec_mk = row[14] if len(row) > 14 else None
@@ -689,6 +694,7 @@ def _row_to_record(self, row: tuple) -> AuditRecord:
             regulatory_articles=json.loads(row[7]),
             previous_hash=row[8],
             record_hash=row[9],
+            tenant_id=tenant_id or "",
             system_operation=sys_op,
             data_usage=data_use,
             decision_making=dec_mk,

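Reviewer sketch of the two tenant rules this change relies on: `tenant_id` stays out of the record hash so older chains re-verify, and the per-record value wins over the backend's instance scope on write. `Record` is a toy stand-in, not `AuditRecord`; its field set, the SHA-256 over canonical JSON scheme, and the helper `effective_tenant` are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Record:
    """Toy stand-in for an audit record; the real one has more fields."""

    event_type: str
    agent_id: str
    previous_hash: str
    tenant_id: str = ""  # added later, deliberately left out of the hash

    def compute_hash(self) -> str:
        # Hash only the fields that existed before tenant_id was introduced,
        # so a record written by an older version hashes to the same value.
        payload = {
            "event_type": self.event_type,
            "agent_id": self.agent_id,
            "previous_hash": self.previous_hash,
        }
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def effective_tenant(record_tenant: str, instance_tenant: str) -> str:
    """Per-record tenant wins; an empty value falls back to instance scope."""
    return record_tenant or instance_tenant


legacy = Record("decision", "agent-1", "abc123")
scoped = Record("decision", "agent-1", "abc123", tenant_id="acme")
same_hash = legacy.compute_hash() == scoped.compute_hash()
```

One consequence of this design: a field that is not part of the hash input is not covered by chain verification, so a changed `tenant_id` would not by itself break the chain.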
@@ -258,6 +258,9 @@ class AuditRecord:
     data_usage: Optional[str] = None
     decision_making: Optional[str] = None
     limitations: Optional[str] = None
+    # v0.40: multi-tenant scoping. Empty string = single-tenant deployment.
+    # Excluded from compute_hash() to preserve pre-v0.40 chain re-verification.
+    tenant_id: str = ""
 
     def __post_init__(self) -> None:
         # Loaded-from-DB records carry a non-empty record_hash. Skip
@@ -492,6 +495,12 @@ def __init__(
         self._by_action: dict[str, list[AuditRecord]] = defaultdict(list)
         self._last_hash = ""
         self._on_record = on_record
+        # v0.40 multi-tenant: action_id -> tenant_id, seeded by
+        # record_action_requested. Subsequent record_* calls (decision,
+        # execution, escalation) look up the action_id so every record in
+        # the lifecycle carries the same tenant scope without forcing
+        # every caller to thread tenant_id through every method signature.
+        self._tenant_for_action: dict[str, str] = {}
         # Counts on_record callback failures so callers can detect
         # persistence divergence at runtime (e.g., DB gone, disk full).
         # Without this, a silent logger.error is the only signal and the
@@ -554,9 +563,33 @@ def verify_chain(self) -> Optional[str]:
 
     # ── Recording events ──────────────────────────────────────────
 
+    # Defense-in-depth cap for direct-trail callers that bypass the pipeline's
+    # length cap on tenant_id. The HTTP boundary already caps at 256 via the
+    # Pydantic schema, but the AuditTrail public API is reachable from
+    # embedders that construct ActionRequest directly. A 50MB tenant_id would
+    # otherwise balloon every record on the hash chain and the in-memory
+    # action -> tenant map.
+    _MAX_TENANT_ID_LEN = 256
+    # Soft cap on the action -> tenant map. Long-running multi-tenant
+    # deployments would otherwise leak memory at one entry per action,
+    # because OUTCOME_RECORDED arrives well after ACTION_REQUESTED and the
+    # map cannot be cleared at decision time. When the cap is reached the
+    # oldest 1/8 of the map is evicted; subsequent lookups for evicted
+    # actions fall back to "" tenant, which is the legacy single-tenant
+    # contract — correct fail-soft behaviour.
+    _MAX_ACTION_TENANT_MAP = 50_000
+
     def record_action_requested(self, request: ActionRequest) -> str:
         """Record that an agent requested an action.  Returns the action_id."""
         action_id = str(uuid.uuid4())
+        tenant_id = getattr(request, "tenant_id", "") or ""
+        if tenant_id:
+            tenant_id = self._cap_record_str(tenant_id, self._MAX_TENANT_ID_LEN)
+            if len(self._tenant_for_action) >= self._MAX_ACTION_TENANT_MAP:
+                evict = max(1, self._MAX_ACTION_TENANT_MAP // 8)
+                for stale in list(self._tenant_for_action)[:evict]:
+                    self._tenant_for_action.pop(stale, None)
+            self._tenant_for_action[action_id] = tenant_id
 
         articles = self._get_regulatory_articles(
             EventType.ACTION_REQUESTED,
@@ -600,10 +633,20 @@ def record_action_requested(self, request: ActionRequest) -> str:
                 "sequence_position": request.sequence_position,
             },
             regulatory_articles=articles,
+            tenant_id=tenant_id,
         ))
 
         return action_id
 
+    def _tenant_for(self, action_id: str) -> str:
+        """Resolve the tenant scope for an existing action lifecycle.
+
+        Returns the tenant_id captured at record_action_requested time so
+        every follow-up record (risk_scored, decision, execution,
+        escalation, outcome) carries the same scope automatically.
+        """
+        return self._tenant_for_action.get(action_id, "")
+
     def record_risk_scored(
         self,
         action_id: str,
@@ -631,6 +674,7 @@ def record_risk_scored(
             tool_name=self._cap_record_str(tool_name, self._MAX_TOOL_NAME_LEN),
             data=safe_assessment,
             regulatory_articles=articles,
+            tenant_id=self._tenant_for(action_id),
         ))
 
     def record_decision(
@@ -666,6 +710,7 @@ def record_decision(
                 "risk_score": risk_score,
             },
             regulatory_articles=articles,
+            tenant_id=self._tenant_for(action_id),
         ))
 
     def record_execution(
@@ -702,6 +747,7 @@ def record_execution(
             data={"result_summary": self._cap_record_dict_bytes(
                 safe_result, self._MAX_EXECUTION_RESULT_JSON_BYTES
             )},
+            tenant_id=self._tenant_for(action_id),
         ))
 
     def record_escalation(
@@ -731,6 +777,7 @@ def record_escalation(
                 "risk_score": risk_score,
             },
             regulatory_articles=articles,
+            tenant_id=self._tenant_for(action_id),
         ))
 
     def record_escalation_resolved(
@@ -760,6 +807,7 @@ def record_escalation_resolved(
                 "justification": self._cap_record_str(justification, self._MAX_JUSTIFICATION_LEN),
             },
             regulatory_articles=articles,
+            tenant_id=self._tenant_for(action_id),
         ))
 
     def record_outcome(
@@ -794,6 +842,7 @@ def record_outcome(
                 ),
             },
             regulatory_articles=articles,
+            tenant_id=self._tenant_for(action_id),
         ))
 
     # Length caps for caller-controlled free-text fields on this direct
@@ -923,6 +972,7 @@ def record_policy_override(
                 "new_decision": new_decision,
             },
             regulatory_articles=articles,
+            tenant_id=self._tenant_for(action_id),
         ))
 
     # ── Querying ──────────────────────────────────────────────────
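Reviewer sketch isolating the soft-cap behaviour of the action -> tenant map added in this file, using a small cap so the eviction is easy to observe. The class name `TenantMap` and its method names are hypothetical; the eviction rule (drop the oldest one eighth when the cap is reached, fall back to `""` for evicted actions) mirrors the diff above. It relies on Python dicts preserving insertion order.

```python
class TenantMap:
    """Bounded action_id -> tenant_id map with oldest-first eviction."""

    def __init__(self, max_entries: int = 50_000) -> None:
        self._max = max_entries
        self._by_action: dict[str, str] = {}

    def remember(self, action_id: str, tenant_id: str) -> None:
        if not tenant_id:
            return  # single-tenant calls never occupy a slot
        if len(self._by_action) >= self._max:
            evict = max(1, self._max // 8)
            # Dicts iterate in insertion order, so these are the oldest keys.
            for stale in list(self._by_action)[:evict]:
                self._by_action.pop(stale, None)
        self._by_action[action_id] = tenant_id

    def tenant_for(self, action_id: str) -> str:
        # Evicted or unknown actions resolve to the single-tenant default.
        return self._by_action.get(action_id, "")

    def __len__(self) -> int:
        return len(self._by_action)


tenants = TenantMap(max_entries=8)
for i in range(9):
    tenants.remember(f"a{i}", f"t{i}")
```

With a cap of 8, the ninth insert evicts exactly one entry (`a0`), so a late `record_outcome` for that action would be written with an empty tenant.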