v0.45.1: SSRF DNS-rebind closure + HTTP transport/SSE audit fixes#173
Conversation
The dashboard subcommand imported SQLiteAuditTrail, a class the sqlite_backend module never exported, so `vaara compliance dashboard` raised ImportError on every invocation. Mirror the working report subcommand: open SQLiteAuditBackend, load_trail(), and pass the resulting AuditTrail into the engine. Add CLI-level smoke tests that exercise the full DB-open-to-HTML-write path, which the existing render_html-only tests never covered. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…itignore hardening) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The remote HTTP connector handed a user-supplied upstream URL straight to urllib.request.urlopen and followed redirects with the static Authorization header attached. A hostile or compromised upstream, or an attacker who controls a redirect target, could point the proxy at the cloud instance-metadata service or an internal RFC1918 host and have it fetch the target with the operator's bearer token. Add _egress_guard: a host-resolution floor that refuses loopback, link-local (IPv4 169.254/16 and IPv6 fe80::/10), RFC1918, IPv6 ULA (fc00::/7), the cloud-metadata address (incl. its dotless decimal/hex and IPv4-mapped encodings) before any socket opens. The metadata address is refused unconditionally. HttpUpstreamClient checks the URL at construction and routes every request through a guarded OpenerDirector whose redirect handler caps hops at 3, re-applies the floor to each target, and drops Authorization / Cookie on a cross-origin redirect. Default posture is SAFE. A trusted internal upstream is opted in explicitly via --allow-private-upstream-hosts, the allow_private_hosts constructor arg, or the VAARA_MCP_ALLOW_PRIVATE_UPSTREAM env flag; the opt-in never reopens the metadata address or the cross-origin auth drop. Tests: blocked metadata-IP upstream, blocked private-IP upstream, redirect to metadata refused mid-flight, Authorization not carried cross-origin, dotless and IPv4-mapped metadata encodings, opt-in path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The HTTP POST /mcp endpoint called proxy._handle_request(payload) inline. That is a blocking sync call that waits on the upstream up to its request timeout, so it parked the event loop for the whole call: real concurrency was 1, and one slow upstream stalled every other POST /mcp, GET /mcp SSE drain, and /health. Run it on a worker thread with asyncio.to_thread. The per-request ContextVars (_REQUEST_UPSTREAM / _REQUEST_TENANT / _REQUEST_SESSION / _REQUEST_INTENT) are set on the endpoint task's context, which a bare to_thread target would not inherit, so capture contextvars.copy_context() after the sets and run the handler with ctx.run on the worker thread. Upstream selection and tenant tagging keep working across the hop. Test: two concurrent POSTs against a deliberately slow upstream overlap in wall-clock instead of serialising, and each still routes to its own slot, proving the context survived the thread hop. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
unregister_session popped the session map entry by id alone. register_session replaces the entry with a fresh _SessionState on reconnect and closes the old one; the old stream's finally then ran unregister_session and popped the NEW state, silently dropping notifications for the just-reconnected live session. unregister_session now takes an optional expected state and pops only when the map entry is still that object. The GET /mcp SSE handler keeps its own my_state from register_session and passes it as the guard, so a stale teardown is a no-op and the reconnected session keeps receiving notifications. Test: register, reconnect under the same id, run the old stream's identity-checked unregister, and confirm the live session stays registered and still receives a targeted notification. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s-model holdout The README reported '140 us / 210 us inference latency (excluding one-time embedding model load)', which implied the MiniLM classifier sits in the measured path. The only latency artifact (bench/latency_results.json) times InterceptionPipeline.intercept() with the default rule scorer; the ML classifier is opt-in via vaara[ml] and is not loaded there. Relabel the figure as the hot-path rule scorer and note the classifier is out of that path. The headline also showed only the in-distribution TEST recall. Add the cross-model held-out number that bench/vaara-bench-v0.37 already publishes: 66.8% over n=2,277 with no eval-set attacker model in TRAIN, and the weakest sub-cell (data_exfil against a closed-weight model) at 38.9%. The easier in-distribution denominator stays, now next to the harder one. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… evidence The Numbers block still advertised the v7-era classifier: a 5,955-entry corpus, 97.1% recall at threshold 0.55, and PAIR ASR only against Qwen2.5-32B. None of that matches the shipped v9 bundle. Regenerate from the current README and bench docs: 12,155-entry corpus, v9 at threshold 0.9150, held-out TEST recall 84.7% at FPR 4.1%, the 66.8% cross-model held-out number with its 38.9% weakest sub-cell, BIPIA-pressure FPR 1.2%, and multi-attacker PAIR ASR 0/25 across three attacker models. Latency is relabelled as the hot-path rule scorer, with the MiniLM classifier marked opt-in and out of that path. Switch the lede and position line to the tamper-evident runtime evidence framing the README and pyproject already use. OVERT references stay under Integrations and Optional as conformance surface. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…bind safe) The SSRF egress floor resolved the upstream host and validated its addresses, then handed the hostname back to urllib, which re-resolved at socket-connect. A time-split DNS rebind (a name answering with a public address at the check and a blocked one a moment later) reached the blocked target with the operator's Authorization header attached. The guarded opener now installs pinned HTTP/HTTPS handlers that re-resolve, re-validate, and pin the address at connect time, then dial the IP literal so no second DNS lookup can occur. HTTPS verifies the certificate against the original hostname via SNI. The pin runs on the POST path, the standing GET SSE path, and every redirect hop. Also: --allow-private-upstream-hosts defaulted to False, which shadowed the VAARA_MCP_ALLOW_PRIVATE_UPSTREAM env opt-in on the CLI path. It now defaults to None, so an absent flag leaves the env opt-in live. Adds time-split rebind regression tests asserting the connector refuses the rebound address before opening a socket and dials the validated IP literal rather than the hostname. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nd SSE audit fixes Stamps the audit-finding fix bundle as 0.45.1: the DNS-rebind closure on the --upstream-url egress floor (pin the validated IP at connect time), the HTTP transport concurrency fix, the identity-checked SSE unregister, and the public-numbers corrections. Full suite 1067 passed, ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughVersion 0.45.1 introduces SSRF egress protections for remote HTTP MCP connectors with DNS rebind defense, fixes concurrent HTTP request serialization via worker threads, resolves SSE reconnect notification loss through identity-guarded session teardown, refactors compliance dashboard to use a backend factory pattern, and publishes updated release documentation with tamper-evidence framing and refreshed evaluation metrics. ChangesHTTP Remote Upstream Security & Concurrency Fixes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| for record in trail._records: | ||
| backend.write_record(record) | ||
| finally: | ||
| backend.close() |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/vaara/integrations/mcp_proxy.py (1)
607-624:⚠️ Potential issue | 🟠 Major | ⚡ Quick winOffload JSON-RPC notifications too.
The new
to_threadhandoff only covers requests with anid. Notifications still callproxy._handle_client_notification(payload)inline, so a slow upstreamnotify()can still block the event loop and stall other HTTP traffic on this worker.Proposed fix
if isinstance(payload, dict) and "id" not in payload: try: - proxy._handle_client_notification(payload) + ctx = contextvars.copy_context() + await asyncio.to_thread( + ctx.run, proxy._handle_client_notification, payload, + ) except ProxyError: logger.exception("Failed to forward HTTP notification") return Response(status_code=202)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/vaara/integrations/mcp_proxy.py` around lines 607 - 624, The JSON-RPC notification branch currently calls proxy._handle_client_notification(payload) inline and can block the event loop; change it to run inside the same context-backed worker thread pattern used for _handle_request: capture ctx = contextvars.copy_context() and call await asyncio.to_thread(ctx.run, proxy._handle_client_notification, payload) instead of the direct call, preserving the existing ProxyError exception handling and still returning Response(status_code=202) after scheduling/completing the offloaded call.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/vaara/cli.py`:
- Around line 722-727: The SQLiteAuditBackend instance created by
SQLiteAuditBackend(str(db_path)) is never closed, leaking the DB connection;
update the code to ensure backend is closed after load_trail() whether it
succeeds or raises (e.g., use a with-statement if SQLiteAuditBackend supports
context management or call backend.close() in a finally block after calling
backend.load_trail()), referencing SQLiteAuditBackend and load_trail so the
connection is always released on both success and in the except path.
In `@src/vaara/integrations/_egress_guard.py`:
- Around line 68-86: The current logic lets allow_private bypass the entire
_ip_is_blocked() set; change it so addresses matched by _ip_is_never_allowed(ip)
are rejected unconditionally, and only the private-ish predicate
(_ip_is_privateish or the private/link-local/loopback pieces) is gated by
allow_private. Concretely: wherever allow_private short-circuits calling
_ip_is_blocked, instead first call _ip_is_never_allowed(ip) and return True if
it matches; if not matched then, if allow_private is True, only apply
_ip_is_privateish(ip) to decide allow/deny, otherwise call the full
_ip_is_blocked(ip) (or keep the original checks in _ip_is_blocked). Ensure
_ip_is_blocked itself still contains the full set of checks (including
_is_metadata, is_reserved, is_multicast, is_unspecified) so non-private ranges
remain blocked even when allow_private is set.
---
Outside diff comments:
In `@src/vaara/integrations/mcp_proxy.py`:
- Around line 607-624: The JSON-RPC notification branch currently calls
proxy._handle_client_notification(payload) inline and can block the event loop;
change it to run inside the same context-backed worker thread pattern used for
_handle_request: capture ctx = contextvars.copy_context() and call await
asyncio.to_thread(ctx.run, proxy._handle_client_notification, payload) instead
of the direct call, preserving the existing ProxyError exception handling and
still returning Response(status_code=202) after scheduling/completing the
offloaded call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 01b1284e-2033-4913-b2a5-3b09f1d5cade
📒 Files selected for processing (19)
.gitignoreCHANGELOG.mdREADME.mdclients/ts/package.jsonllms.txtpyproject.tomlserver.jsonsrc/vaara/__init__.pysrc/vaara/cli.pysrc/vaara/integrations/_egress_guard.pysrc/vaara/integrations/_mcp_notify.pysrc/vaara/integrations/_mcp_upstream_http.pysrc/vaara/integrations/mcp_proxy.pytests/test_compliance_dashboard.pytests/test_http_concurrency.pytests/test_mcp_egress_guard.pytests/test_mcp_notify.pytests/test_mcp_upstream_http.pytests/test_mcp_upstream_rebind.py
| backend = SQLiteAuditBackend(str(db_path)) | ||
| try: | ||
| trail = backend.load_trail() | ||
| except Exception as exc: | ||
| print(f"failed to load audit trail: {exc}", file=sys.stderr) | ||
| return 2 |
There was a problem hiding this comment.
Close the SQLite backend after loading the trail.
Line 722 opens SQLiteAuditBackend, but neither the success path nor the load_trail() error path closes it. When main() is invoked in-process, that can leak the connection and leave the SQLite file locked for later operations.
♻️ Proposed fix
- backend = SQLiteAuditBackend(str(db_path))
- try:
- trail = backend.load_trail()
- except Exception as exc:
- print(f"failed to load audit trail: {exc}", file=sys.stderr)
- return 2
+ with SQLiteAuditBackend(str(db_path)) as backend:
+ try:
+ trail = backend.load_trail()
+ except Exception as exc:
+ print(f"failed to load audit trail: {exc}", file=sys.stderr)
+ return 2📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| backend = SQLiteAuditBackend(str(db_path)) | |
| try: | |
| trail = backend.load_trail() | |
| except Exception as exc: | |
| print(f"failed to load audit trail: {exc}", file=sys.stderr) | |
| return 2 | |
| with SQLiteAuditBackend(str(db_path)) as backend: | |
| try: | |
| trail = backend.load_trail() | |
| except Exception as exc: | |
| print(f"failed to load audit trail: {exc}", file=sys.stderr) | |
| return 2 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vaara/cli.py` around lines 722 - 727, The SQLiteAuditBackend instance
created by SQLiteAuditBackend(str(db_path)) is never closed, leaking the DB
connection; update the code to ensure backend is closed after load_trail()
whether it succeeds or raises (e.g., use a with-statement if SQLiteAuditBackend
supports context management or call backend.close() in a finally block after
calling backend.load_trail()), referencing SQLiteAuditBackend and load_trail so
the connection is always released on both success and in the except path.
| def _ip_is_blocked(ip: ipaddress._BaseAddress) -> bool: | ||
| """True iff this resolved address must never be reached by default. | ||
|
|
||
| Covers the metadata addresses plus loopback, link-local (IPv4 169.254/16 | ||
| and IPv6 fe80::/10), private (RFC1918 and ULA fc00::/7), unspecified, | ||
| reserved, and multicast. | ||
| """ | ||
| mapped = getattr(ip, "ipv4_mapped", None) | ||
| if mapped is not None: # ::ffff:a.b.c.d judged on the embedded v4 address | ||
| ip = mapped | ||
| return ( | ||
| _is_metadata(ip) | ||
| or ip.is_loopback | ||
| or ip.is_link_local | ||
| or ip.is_private | ||
| or ip.is_unspecified | ||
| or ip.is_reserved | ||
| or ip.is_multicast | ||
| ) |
There was a problem hiding this comment.
Keep non-private address classes blocked even when allow_private is enabled.
allow_private currently bypasses the full _ip_is_blocked() set, so opting into loopback/RFC1918 also re-opens 0.0.0.0, multicast, and reserved ranges. That is broader than the flag name and the PR contract, and it weakens the egress floor for callers that only intended to trust internal hosts.
Suggested direction
-def _ip_is_blocked(ip: ipaddress._BaseAddress) -> bool:
+def _ip_is_never_allowed(ip: ipaddress._BaseAddress) -> bool:
mapped = getattr(ip, "ipv4_mapped", None)
if mapped is not None:
ip = mapped
return (
_is_metadata(ip)
- or ip.is_loopback
- or ip.is_link_local
- or ip.is_private
or ip.is_unspecified
or ip.is_reserved
or ip.is_multicast
)
+
+
+def _ip_is_privateish(ip: ipaddress._BaseAddress) -> bool:
+ mapped = getattr(ip, "ipv4_mapped", None)
+ if mapped is not None:
+ ip = mapped
+ return ip.is_loopback or ip.is_link_local or ip.is_privateThen reject _ip_is_never_allowed(ip) unconditionally and only gate _ip_is_privateish(ip) behind allow_private.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/vaara/integrations/_egress_guard.py` around lines 68 - 86, The current
logic lets allow_private bypass the entire _ip_is_blocked() set; change it so
addresses matched by _ip_is_never_allowed(ip) are rejected unconditionally, and
only the private-ish predicate (_ip_is_privateish or the
private/link-local/loopback pieces) is gated by allow_private. Concretely:
wherever allow_private short-circuits calling _ip_is_blocked, instead first call
_ip_is_never_allowed(ip) and return True if it matches; if not matched then, if
allow_private is True, only apply _ip_is_privateish(ip) to decide allow/deny,
otherwise call the full _ip_is_blocked(ip) (or keep the original checks in
_ip_is_blocked). Ensure _ip_is_blocked itself still contains the full set of
checks (including _is_metadata, is_reserved, is_multicast, is_unspecified) so
non-private ranges remain blocked even when allow_private is set.
…ckend close (#174) * fix(proxy): tighten egress opt-in, offload notifications, close audit backend Three findings from the #173 review, fixed before cutting the v0.45.1 release: - _egress_guard: allow_private bypassed the whole block set, so opting into private upstream hosts also re-opened 0.0.0.0, multicast, and reserved ranges. The never-routable classes (metadata, unspecified, reserved, multicast) are now always refused; only loopback/link-local/private are gated by the opt-in. Adds a regression test asserting the opt-in still refuses those classes. - mcp_proxy HTTP transport: the JSON-RPC notification branch still called _handle_client_notification inline, so a slow upstream notify() could park the event loop the request-path fix just freed. Offloaded to a worker thread on the same copied context. - cli: the three SQLiteAuditBackend call sites (compliance dashboard, assess, trail receipt) never closed the backend, leaking the connection and locking the DB file under in-process invocation. Context-managed. 1072 passed, ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(changelog): record the three #174 hardening fixes under [0.45.1] Egress opt-in narrowed to the private classes, HTTP notification path offloaded, and the SQLiteAuditBackend leak in the three CLI trail readers, all under the existing 0.45.1 security theme. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(registry): bump second server manifest to 0.45.1 server-vaara-server.json was left at 0.45.0 in the release bump; both registry slots must carry 0.45.1 for mcp-publisher to land the right version. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v0.45.1: audit-finding fixes on the remote HTTP connector, the HTTP transport, and the public numbers
Security
--upstream-urlconnector. A hostile or compromised upstream (or an attacker-controlled redirect target) could aim the proxy at the cloud instance-metadata service or an internal host and have it fetch the target with the operator's bearer token._egress_guardnow resolves the host and refuses loopback, link-local, RFC1918, IPv6 ULA, and the cloud-metadata address (including dotless and IPv4-mapped encodings) before any socket opens; a guarded opener caps redirects, re-applies the floor to each hop, and drops the auth header on a cross-origin redirect. Default is SAFE; a trusted internal upstream is opted in via--allow-private-upstream-hosts, theallow_private_hostsconstructor arg, orVAARA_MCP_ALLOW_PRIVATE_UPSTREAM. The metadata address stays refused even with the opt-in.urllibre-resolved at socket-connect, so a time-split rebind (public address at the check, blocked address a moment later) reached the blocked target with the auth header attached. The connector now validates and pins the address at connect time and dials the IP literal, so the address that passed the floor is the exact address the socket reaches; HTTPS still verifies the certificate against the original hostname. Re-applied on every redirect hop. An absent--allow-private-upstream-hostsflag now leaves the env opt-in live instead of shadowing it with aFalse.Fixed
/mcpran the blocking_handle_requestinline on the event loop (real concurrency 1). It now runs on a worker thread viaasyncio.to_thread, with per-request ContextVars preserved across the hop.Mcp-Session-Id, the old stream's teardown unregistered the new session.unregister_sessionis now identity-checked.Verification
Full detail in
CHANGELOG.mdunder[0.45.1].Summary by CodeRabbit
New Features
Bug Fixes
Documentation