vaaraio · May 30, 2026
@@ -43,3 +43,34 @@ claude-code-audit.db
 .pr_body_*.md
 .issue_body_*.md
 .comment_body_*.md
+
+# Private scratch drafts (replies, research, proposals, BD) — never publish
+.recruiter_*
+.reply_*
+.research_*
+.proposal_*
+.tier1_*
+.brand_book.md
+
+# One-off ops/deploy scratch scripts and payloads — never publish
+.apply_*.sh
+.fix_*.sh
+.deploy_*.sh
+.restart_*.sh
+.style_*.sh
+.pr_create_*.sh
+.pr_comment_*.md
+.tag_payload.json
+.gen_evidence_pair.py
+.tmp_*.py
+
+# Stray shell/editor env dotfiles (not part of the repo)
+.bashrc
+.bash_profile
+.zshrc
+.zprofile
+.profile
+.gitconfig
+.ripgreprc
+.idea/
+.vscode/
@@ -6,6 +6,57 @@ and this project follows [Semantic Versioning](https://semver.org/spec/v2.0.0.ht
 
 ## [Unreleased]
 
+## [0.45.1] - 2026-05-30
+
+**Theme: audit-finding fixes on the remote HTTP connector, the HTTP transport, and the public numbers.**
+
+### Security
+- SSRF egress floor on the `--upstream-url` connector. The remote HTTP connector
+  handed a user-supplied upstream URL straight to `urllib` and followed
+  redirects with the static `Authorization` header attached, so a hostile or
+  compromised upstream (or an attacker controlling a redirect target) could aim
+  the proxy at the cloud instance-metadata service or an internal host and have
+  it fetch the target with the operator's bearer token. The new `_egress_guard`
+  resolves the host and refuses loopback, link-local, RFC1918, IPv6 ULA, and the
+  cloud-metadata address (including its dotless and IPv4-mapped encodings) before
+  any socket opens; a guarded opener caps redirects, re-applies the floor to each
+  hop, and drops the auth header on a cross-origin redirect. Default is SAFE; a
+  trusted internal upstream is opted in via `--allow-private-upstream-hosts`,
+  the `allow_private_hosts` constructor arg, or the
+  `VAARA_MCP_ALLOW_PRIVATE_UPSTREAM` env flag. The metadata address stays refused
+  even with the opt-in.
+- DNS-rebind closure on that egress floor. Resolving the host and then handing
+  the name back to `urllib` left a gap: `urllib` re-resolved at socket-connect,
+  so a name that answered with a public address at the check and a blocked one a
+  moment later (a time-split rebind) reached the blocked target with the auth
+  header attached. The connector now validates and pins the address at connect
+  time and dials the IP literal, so the address that passed the floor is the
+  exact address the socket reaches; HTTPS still verifies the certificate against
+  the original hostname. The pin is re-applied on every redirect hop. An absent
+  `--allow-private-upstream-hosts` flag now leaves the
+  `VAARA_MCP_ALLOW_PRIVATE_UPSTREAM` env opt-in live instead of silently
+  shadowing it with a `False`.
+
+### Fixed
+- HTTP transport no longer serialises concurrent requests. The POST `/mcp`
+  endpoint ran the blocking `_handle_request` inline on the event loop, so one
+  slow upstream stalled every other POST, SSE drain, and `/health` (effective
+  concurrency of 1). It now runs on a worker thread via `asyncio.to_thread`, with
+  the per-request ContextVars preserved across the hop through
+  `contextvars.copy_context()`.
+- SSE reconnect race that dropped notifications for the live session. On
+  reconnect under the same `Mcp-Session-Id`, the old stream's teardown
+  unregistered the NEW session. `unregister_session` is now identity-checked and
+  only removes the entry when it is still the tearing-down stream's own state.
+- README mislabelled the rule-scorer latency as classifier latency. The
+  140 µs / 210 µs figure is the hot-path rule scorer; the MiniLM classifier is
+  opt-in (`vaara[ml]`) and not in that path. Also surfaces the cross-model
+  held-out recall (66.8%) and its weakest sub-cell (38.9%) the bench docs
+  already disclose.
+- `llms.txt` advertised a two-generations-stale classifier (5,955-entry corpus,
+  97.1% at threshold 0.55). Regenerated from the current v9 numbers and switched
+  the lede to the tamper-evident runtime evidence framing.
+
 ## [0.45.0] - 2026-05-30
 
 **Theme: reach remote MCP upstreams over HTTP, and make the proxy's Streamable HTTP handling conform to the spec.**
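
The egress floor described in the 0.45.1 Security entry above can be illustrated with a minimal sketch. This is not the project's `_egress_guard`; the names `METADATA_ADDR`, `is_blocked`, and `check_host` are hypothetical, and the sketch assumes the cloud-metadata address is `169.254.169.254`. It classifies the addresses returned by the resolver and does not pin the connection, so on its own it would not close the DNS-rebind gap that the second entry addresses.

```python
import ipaddress
import socket

# Assumption: the IPv4 link-local address commonly used by cloud metadata services.
METADATA_ADDR = ipaddress.ip_address("169.254.169.254")


def is_blocked(addr: str, allow_private: bool = False) -> bool:
    """Return True if the resolved address must be refused."""
    ip = ipaddress.ip_address(addr)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it is judged as IPv4.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # The metadata address is refused even when private hosts are opted in.
    if ip == METADATA_ADDR:
        return True
    if allow_private:
        return False
    # is_private covers RFC1918 and the IPv6 ULA range (fc00::/7).
    return ip.is_loopback or ip.is_link_local or ip.is_private or ip.is_unspecified


def check_host(host: str, port: int, allow_private: bool = False) -> list[str]:
    """Resolve host and raise if any resolved address falls under the floor."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addrs = [info[4][0] for info in infos]
    blocked = [a for a in addrs if is_blocked(a, allow_private)]
    if blocked:
        raise PermissionError(f"refusing upstream {host!r}: blocked address {blocked[0]}")
    return addrs
```

Checking the resolver's output rather than the URL string means alternate textual encodings of an address are judged by what they resolve to. A real guard would also need to dial the validated IP literal, as the rebind entry describes.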

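The first Fixed entry above (moving the blocking handler off the event loop) can be sketched as follows. The names `request_id`, `handle_request_blocking`, and `handle_post` are hypothetical stand-ins, not the project's code. `asyncio.to_thread` already propagates the current context on its own; the explicit `contextvars.copy_context()` snapshot here only makes that hop visible.

```python
import asyncio
import contextvars
import time

# Hypothetical per-request variable standing in for the proxy's ContextVars.
request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")


def handle_request_blocking(payload: str) -> str:
    """Stand-in for a blocking handler that waits on a slow upstream."""
    time.sleep(0.05)
    return f"{request_id.get()}:{payload}"


async def handle_post(rid: str, payload: str) -> str:
    request_id.set(rid)
    # Snapshot the context so the worker thread sees this request's values.
    ctx = contextvars.copy_context()
    # Run the blocking call on a worker thread; the event loop stays free.
    return await asyncio.to_thread(ctx.run, handle_request_blocking, payload)


async def main() -> list[str]:
    # Both requests sleep concurrently on separate threads instead of in series.
    return await asyncio.gather(handle_post("a", "one"), handle_post("b", "two"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Each coroutine passed to `asyncio.gather` runs as its own task with its own context copy, so the two `request_id` values do not leak into each other.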
@@ -20,14 +20,15 @@ Vaara intercepts agent tool calls, scores each one with a conformal risk interva
 
 ## Numbers
 
-Held-out TEST recall 84.7% (95% Wilson [82.4, 86.7]) at FPR 4.1% [2.9, 5.7]. Phase 1 PAIR scale-up to n=300 per attacker family lands at 88.1% [85.8, 90.1]. Under BIPIA-pressure context, false-positive rate on benign tool calls 1.2% [0.4, 3.6] across four agent backends (Claude Haiku 4.5, Llama-3.1-8B, Mistral-7B, Qwen-2.5-7B). Multi-attacker PAIR ASR 0/25 across three different attacker models with identical seeds. 140 µs mean / 210 µs p99 inference latency on commodity CPU (excluding one-time embedding model load). Every number reproducible end-to-end via `make bench`.
+Held-out TEST recall 84.7% (95% Wilson [82.4, 86.7]) at FPR 4.1% [2.9, 5.7]. Phase 1 PAIR scale-up to n=300 per attacker family lands at 88.1% [85.8, 90.1]. Cross-model held-out recall, where no attacker model in the eval set was in TRAIN, is 66.8% [64.9, 68.7] over n=2,277; the weakest sub-cell is data_exfil against a closed-weight model at 38.9% [35.3, 42.5] (see [vaara-bench-v0.37](bench/vaara-bench-v0.37.md)). Under BIPIA-pressure context, false-positive rate on benign tool calls 1.2% [0.4, 3.6] across four agent backends (Claude Haiku 4.5, Llama-3.1-8B, Mistral-7B, Qwen-2.5-7B). Multi-attacker PAIR ASR 0/25 across three different attacker models with identical seeds. The rule scorer that runs in the hot path adds 140 µs mean / 210 µs p99 per call on commodity CPU; the MiniLM classifier is opt-in (`vaara[ml]`) and is not in that measured path. Every number reproducible end-to-end via `make bench`.
 
 - 12,155-entry adversarial corpus (250 hand-curated + 11,905 LLM-generated), 70/15/15 split stratified by (category, source)
 - Classifier v9 with 236 hand-features + 384-dim MiniLM embeddings at calibrated threshold 0.9150 on held-out TEST n=1,827: recall 84.7% [82.4, 86.7] at FPR 4.1% [2.9, 5.7]
 - Multi-attacker PAIR robustness: 0/25 successes per attacker across Qwen2.5-32B, Qwen2.5-72B, Llama-3.3-70B hitting identical seed indices, Wilson upper 13.3%
 - BIPIA-pressure FPR on benign tool calls 1.2% [0.4, 3.6] across four agent backends, n=244 benign tool calls under `context.source=injected_via_bipia_<class>`
+- Cross-model held-out recall 66.8% [64.9, 68.7] over n=2,277 with no eval-set attacker model in TRAIN; data_exfil generalises unevenly, with a closed-weight sub-cell at 38.9% [35.3, 42.5]. This is the honest worst case; the in-distribution TEST number above is the easier denominator
 - Chain of custody: corpus manifest SHA, split manifest SHA, training commit, bundle SHA, all locked and printed by every script
-- 140 µs mean / 210 µs p99 inference latency, commodity CPU
+- 140 µs mean / 210 µs p99 for the hot-path rule scorer on commodity CPU; the MiniLM classifier is opt-in (`vaara[ml]`) and not in that path
 - Distribution-free conformal coverage on the score
 - MWU regret bound O(sqrt(T log N))
 - [vaara-bench-v0.39](bench/vaara-bench-v0.39.md): current methodology, chain of custody, ship-gate record. v9 retrain on BIPIA-augmented corpus with follows upweighted (`--follow-weight 8.0`), calibrated to T=0.9150 at a 5% FPR target on v035 VAL. BIPIA-pressure FPR collapses from 35.2% on v8 to 1.2% on v9. In-distribution recall flat within Wilson intervals. Found-and-fixed in tree: auto-labeller `example.com` placeholder false-positive rule (42 to 14 true follows across four backends). Historical bench docs live under `bench/` for chain-of-custody continuity.
@@ -162,6 +163,8 @@ vaara-mcp-proxy \
 
 Point your MCP client at the proxy instead of the upstream. The audit chain captures every tool call without changing client or upstream behavior. Distinct from `mcp_server`, which exposes Vaara itself as an MCP server for agents that consult Vaara as a tool.
 
+Upstreams can be local or remote. `--upstream` launches a local stdio MCP server; `--upstream-url NAME=URL` connects to a remote MCP server over the Streamable HTTP transport, and a bare `--upstream-url URL` lands in the `default` slot. Each slot is one transport or the other, never both.
+
 <details>
 <summary>Fleet shape (v0.40): one proxy, many upstreams, multi-tenant policy</summary>
 

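The bracketed ranges in the Numbers section above are Wilson score intervals. A minimal sketch of that calculation, using the standard formula and a hypothetical helper name `wilson_interval`, reproduces the "Wilson upper 13.3%" figure quoted for 0 successes in 25 PAIR attempts. It is an illustration of the statistic, not the project's bench code.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959963984540054) -> tuple[float, float]:
    """Two-sided Wilson score interval for a binomial proportion (default 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(0, 25)
print(f"upper bound: {high:.3f}")  # upper bound: 0.133
```

Unlike the normal-approximation interval, the Wilson interval stays informative when the observed rate is 0, which is why a 0/25 result still carries a non-zero upper bound.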
@@ -1,6 +1,6 @@
 {
   "name": "@vaara/client",
-  "version": "0.45.0",
+  "version": "0.45.1",
   "mcpName": "io.github.vaaraio/vaara",
   "description": "TypeScript client for the Vaara HTTP API. Conformal risk scoring, hash-chained audit, policy reload, named detectors.",
   "main": "dist/index.js",

@@ -1,10 +1,10 @@
 # Vaara
 
-> Runtime evidence layer for EU AI Act compliance. Open source, no SaaS, no telemetry.
+> Tamper-evident runtime evidence layer for AI agents. Covers EU AI Act compliance and any case where you need to prove what an agent actually did. Open source, no SaaS, no telemetry.
 
-Vaara intercepts agent tool calls, scores each one with a conformal risk interval, and writes a hash-chained audit record. Online learning across five expert signals via Multiplicative Weight Update. Distribution-free conformal coverage on the score.
+Vaara intercepts agent tool calls, scores each one with a conformal risk interval, and writes a hash-chained audit record. Online learning across five expert signals via Multiplicative Weight Update. Distribution-free conformal coverage on the score. An external auditor can verify these properties without trusting your stack.
 
-Position: runtime governance and enforcement layer. Implements OVERT 1.0 (Glacis Technologies, March 2026) as the Arbiter role at AAL-3 Phase 2.
+Position: tamper-evident runtime evidence and enforcement layer. Signed attestation plus execution receipts pair each MCP tool call to the policy that allowed it.
 
 ## Repo and packages
 - [GitHub source](https://github.com/vaaraio/vaara): code, releases, issue tracker
@@ -26,10 +26,12 @@ Position: runtime governance and enforcement layer. Implements OVERT 1.0 (Glacis
 - OVERT 1.0 emitter, verifier CLI, S3P (MEA-2) emitter with Clopper-Pearson intervals, experimental AMD SEV-SNP TEE attestation hook
 
 ## Numbers
-- 5,955-entry adversarial corpus (3,422 attack across 8 categories, 2,533 benign)
-- 97.1% attack recall on held-out distribution-shift split, threshold 0.55
-- PAIR adaptive-attacker calibration: ASR 0/25 against Qwen2.5-32B
-- 140 µs / 210 µs p99 inference latency, commodity CPU
+- 12,155-entry adversarial corpus (250 hand-curated + 11,905 LLM-generated), 70/15/15 split stratified by (category, source)
+- Classifier v9 (236 hand-features + 384-dim MiniLM embeddings) at calibrated threshold 0.9150: held-out TEST recall 84.7% [82.4, 86.7] at FPR 4.1% [2.9, 5.7], n=1,827
+- Cross-model held-out recall 66.8% [64.9, 68.7] over n=2,277 with no eval-set attacker model in TRAIN; weakest sub-cell (data_exfil, closed-weight) 38.9% [35.3, 42.5]
+- BIPIA-pressure FPR on benign tool calls 1.2% [0.4, 3.6] across four agent backends
+- Multi-attacker PAIR ASR 0/25 per attacker across Qwen2.5-32B, Qwen2.5-72B, Llama-3.3-70B at identical seeds
+- 140 µs mean / 210 µs p99 for the hot-path rule scorer, commodity CPU; the MiniLM classifier is opt-in (`vaara[ml]`) and not in that path
 
 ## Optional
 - [Article 14 runtime](https://futurium.ec.europa.eu/ga/apply-ai-alliance/community-content/article-14-runtime-why-oversight-agentic-ai-has-be-evidenced-action-not-model): position post on EU Apply AI Alliance Futurium

@@ -4,8 +4,8 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "vaara"
-version = "0.45.0"
-description = "Tamper-evident runtime evidence layer for AI agents: risk scoring, audit trails, and regulatory compliance"
+version = "0.45.1"
+description = "Tamper-evident runtime evidence layer for AI agents: conformal risk scoring, hash-chained audit trails, and signed attestation plus execution receipts per MCP tool call"
 requires-python = ">=3.10"
 license = "Apache-2.0"
 readme = "README.md"

@@ -8,13 +8,13 @@
     "url": "https://github.com/vaaraio/vaara",
     "source": "github"
   },
-  "version": "0.45.0",
+  "version": "0.45.1",
   "packages": [
     {
       "registryType": "pypi",
       "registryBaseUrl": "https://pypi.org",
       "identifier": "vaara",
-      "version": "0.45.0",
+      "version": "0.45.1",
       "runtimeHint": "uvx",
       "transport": {
         "type": "stdio"

@@ -6,7 +6,7 @@
 oversight.
 """
 
-__version__ = "0.45.0"
+__version__ = "0.45.1"
 
 from vaara.pipeline import InterceptionPipeline, InterceptionResult
 

@@ -710,19 +710,25 @@ def _cmd_trail_receipt(args: argparse.Namespace) -> int:
 
 
 def _cmd_compliance_dashboard(args: argparse.Namespace) -> int:
-    from vaara.audit.sqlite_backend import SQLiteAuditTrail
+    from vaara.audit.sqlite_backend import SQLiteAuditBackend
     from vaara.compliance.dashboard import render_html
-    from vaara.compliance.engine import ComplianceEngine
+    from vaara.compliance.engine import create_default_engine
 
     db_path = Path(args.db).expanduser()
     if not db_path.is_file():
         print(f"vaara compliance dashboard: not a file: {db_path}", file=sys.stderr)
         return 2
 
-    trail = SQLiteAuditTrail(str(db_path))
-    engine = ComplianceEngine()
+    with SQLiteAuditBackend(str(db_path)) as backend:
+        try:
+            trail = backend.load_trail()
+        except Exception as exc:
+            print(f"failed to load audit trail: {exc}", file=sys.stderr)
+            return 2
+
+    engine = create_default_engine()
     report = engine.assess(
-        trail=trail,
+        trail,
         system_name=args.system_name,
         system_version=args.system_version,
     )