Skip to content

Studio: stop seeded admin to cross-origin callers#5739

Merged
danielhanchen merged 11 commits into
mainfrom
fix/bootstrap-cross-origin-leak
May 26, 2026
Merged

Studio: stop seeded admin to cross-origin callers#5739
danielhanchen merged 11 commits into
mainfrom
fix/bootstrap-cross-origin-leak

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Summary

  • Default web mode runs CORSMiddleware with allow_origins=["*"] + allow_credentials=True, which reflects an attacker-controlled Origin back on every request and sets Access-Control-Allow-Credentials: true. The / SPA fallback serves index.html with an inline window.__UNSLOTH_BOOTSTRAP__ script carrying the seeded admin password while a password change is pending. The combination lets any cross-origin page fetch('/') with credentials and read the bootstrap admin password out of the HTML body.
  • The API smoke CORS: GET / leaks bootstrap pw to cross-origin caller audit (tests/studio/studio_api_smoke.py:224) already tracks this regression but does not gate CI. This PR closes the leak.

Fix

  • New _is_same_origin_request helper in studio/backend/main.py: treats the absence of Origin as same-origin (top-level same-document navigation on most engines doesn't send Origin), and treats a present Origin that doesn't exactly match request.url.scheme://request.url.netloc as cross-origin.
  • _build_index_response now takes the Request and only calls _inject_bootstrap when same-origin. Cross-origin callers get the same HTML minus the bootstrap script tag, so they can't recover the seeded admin password through the SPA fallback.
  • Adds Vary: Origin so an intermediary cache cannot serve a same-origin response (with bootstrap) to a later cross-origin caller (and vice versa).
  • The CORSMiddleware policy itself is untouched -- this PR only stops the secret from being written into the response body in the first place. Tightening the CORS allowlist is orthogonal and higher-risk.

Test plan

  • studio/backend/tests/test_index_bootstrap_origin.py covers missing / matching / evil / scheme-mismatch / port-mismatch origins -> 5 passed
  • Manual: with a fresh install in a pending-password-change state, curl http://127.0.0.1:8888/ returns HTML containing window.__UNSLOTH_BOOTSTRAP__; curl -H 'Origin: https://evil.example' http://127.0.0.1:8888/ returns HTML WITHOUT it
  • Manual: same-origin browser nav to / still picks up bootstrap and auto-fills the change-password form

Notes

  • tests/studio/studio_api_smoke.py:224 will flip from AUDIT to OK once this lands; the test infrastructure already exercises the cross-origin GET path.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a potential security vulnerability where sensitive bootstrap information, such as the admin password, could be leaked to cross-origin callers. It introduces a new helper function, _is_same_origin_request, to verify if a request originates from the same scheme, host, and port. The _build_index_response function was updated to only inject bootstrap data for same-origin requests and now includes a Vary: Origin header to ensure proper caching behavior. Additionally, comprehensive unit tests were added to validate the same-origin detection logic across various scenarios. I have no feedback to provide.

@shimmyshimmer shimmyshimmer changed the title Studio: stop leaking seeded admin password to cross-origin callers Studio: stop seeded admin to cross-origin callers May 23, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c33cc54a83

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread studio/backend/main.py Outdated
origin = request.headers.get("origin")
if not origin:
return True
return origin == f"{request.url.scheme}://{request.url.netloc}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Normalize origin tuple before same-origin comparison

The new check compares Origin to f"{request.url.scheme}://{request.url.netloc}" as a raw string, which can reject valid same-origin requests when equivalent origins serialize differently (for example https://example.com vs https://example.com:443, or when a proxy terminates TLS and the app sees http). In those cases the index bootstrap script is stripped even for legitimate callers, so first-run/admin password bootstrap can fail and users get stuck in the forced password-change flow. Compare parsed origin components with scheme/host normalization and default-port handling (and trusted forwarded scheme/host where applicable) instead of exact string equality.

Useful? React with 👍 / 👎.

danielhanchen and others added 3 commits May 23, 2026 14:00
The "/" SPA fallback serves index.html with an inline
``window.__UNSLOTH_BOOTSTRAP__`` script containing the seeded admin
password while a password change is pending. Default web mode runs
``CORSMiddleware`` with ``allow_origins=["*"]`` + ``allow_credentials=
True``, which reflects an attacker-controlled ``Origin`` back on every
request and sets Access-Control-Allow-Credentials true. The combination
let any cross-origin page ``fetch('/')`` with credentials and read the
bootstrap admin password out of the HTML body. The API smoke
``CORS: GET / leaks bootstrap pw to cross-origin caller`` audit already
tracked this (tests/studio/studio_api_smoke.py:224) but did not gate CI.

Gate ``_inject_bootstrap`` on a same-origin check: legitimate top-level
navigations omit ``Origin`` on most engines, so the absence of the
header is treated as same-origin; when the header IS present and does
not match ``request.url.scheme://request.url.netloc`` exactly, we now
skip injecting the bootstrap tag. ``Vary: Origin`` is added so an
intermediary cache cannot serve a same-origin response (with bootstrap)
to a later cross-origin caller (and vice versa).

Coverage: ``test_index_bootstrap_origin.py`` exercises the helper with
missing / matching / evil / scheme-mismatch / port-mismatch origins.
@danielhanchen danielhanchen force-pushed the fix/bootstrap-cross-origin-leak branch from c33cc54 to 3a65d39 Compare May 23, 2026 14:00

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a65d3991c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread studio/backend/main.py Outdated
origin = request.headers.get("origin")
if not origin:
return True
return origin == f"{request.url.scheme}://{request.url.netloc}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Canonicalize origin before enforcing same-origin gate

Compare-origin by raw string equality here misclassifies legitimate same-origin traffic when the browser-origin serialization differs from request.url (for example default-port elision like https://host vs https://host:443, or reverse-proxy TLS termination where the app sees http internally). In those deployments _inject_bootstrap is skipped for real first-party users, so the seeded bootstrap credentials are not delivered and the initial password-change/bootstrap flow can break for valid clients.

Useful? React with 👍 / 👎.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94903e7b2b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread studio/backend/main.py Outdated
# strip it since Origin never contains credentials.
if "@" in netloc:
netloc = netloc.rsplit("@", 1)[1]
host, _, port_str = netloc.partition(":")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse netloc with IPv6-safe host/port splitter

_canonical_origin splits netloc with partition(":"), which breaks valid IPv6 hosts like [::1]:8888 by treating the first colon inside the address as the host/port separator; this makes port_str non-numeric, returns None, and forces _is_same_origin_request to classify legitimate same-origin IPv6 requests as cross-origin. In IPv6 localhost or dual-stack deployments, bootstrap injection will be incorrectly skipped and the password-change bootstrap flow can fail for real first-party users.

Useful? React with 👍 / 👎.

A plain string-compare between the Origin header and request.url.netloc
misclassifies legitimate same-origin requests as cross-origin in three
scenarios:

- Browser strips the default port from Origin (https://example.com)
  but Starlette's netloc keeps it (example.com:443). Per RFC 6454 the
  default port is dropped on the wire, so the strings will not match
  even though the requests share an origin.
- Host case differs (Origin: http://Example.com vs netloc:
  example.com). Per RFC 3986 host comparison is case-insensitive.
- Scheme case differs (HTTP:// vs http://). Per RFC 3986 the scheme
  is also case-insensitive.

These are usability degradations rather than security gaps (legitimate
user denied the bootstrap injection, no attacker gain), but worth
shipping so non-default Studio deployments keep the change-password
auto-fill.

Adds _canonical_origin(scheme, netloc) -> (scheme, host, port) and
compares the canonical tuples. Default-port lookup covers
http/https/ws/wss; userinfo (user:pass@) is stripped per RFC 3986
since Origin never carries credentials. Origin: "null" (sandboxed
iframes, file:// pages) and unparseable values collapse to cross-
origin so the bootstrap pw is never leaked through those paths either.

Tests: 14 cases (was 5). Covers the original same/missing/evil/
scheme/port matrix plus default-port stripping in both directions,
host + scheme case folding, Origin: null, garbage values, and
userinfo-in-netloc.
@danielhanchen danielhanchen force-pushed the fix/bootstrap-cross-origin-leak branch from 94903e7 to 8ae52c1 Compare May 24, 2026 13:44

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7cca7b47e9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread studio/backend/main.py Outdated
# strip it since Origin never contains credentials.
if "@" in netloc:
netloc = netloc.rsplit("@", 1)[1]
host, _, port_str = netloc.partition(":")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Parse netloc using IPv6-safe host/port extraction

_canonical_origin splits netloc with partition(":"), which breaks bracketed IPv6 authorities (for example [::1]:8888): the first colon inside the address is treated as the host/port separator, port_str becomes non-numeric, and the function returns None. That causes _is_same_origin_request to mark legitimate IPv6 same-origin requests as cross-origin, so bootstrap injection is skipped and first-run password bootstrap can fail on IPv6 localhost/dual-stack setups.

Useful? React with 👍 / 👎.

danielhanchen and others added 3 commits May 24, 2026 13:56
The canonical-origin helper used ``netloc.partition(":")`` which
mis-parses bracketed IPv6 hosts (``[::1]:8902`` -> host=``[``,
port-str=``:1]:8902``). The int() then raises and the canonicaliser
returns None, so every IPv6 same-origin request is misclassified as
cross-origin and Studio refuses to inject the bootstrap pw on a
legitimate top-level nav when launched with ``unsloth studio -H ::1``.

Bracket-aware split per RFC 3986 §3.2.2, plus extra regression tests
for IPv6, opaque (data:/blob:/file:), comma-joined multi-Origin and
localhost-vs-127 cases.
urlparse raises ValueError on malformed bracketed Origin values
(unclosed [, invalid IPv6 hex, text after ]) and on a few NFKC
edge cases since Py 3.8. Without a guard, a request carrying
Origin: http://[malformed surfaced as HTTP 500 from the SPA
handler rather than being treated as cross-origin per the
docstring's safer-default rule. Wrap both urlparse calls in
try/except ValueError and return False on parse failure.

Also distinguish a missing Origin header (top-level same-document
GET, treat as same-origin) from an explicit empty string (not a
valid serialised origin per RFC 6454 §6.1, treat as cross-origin).

Four new regression tests pinned down by the PR audit: malformed
IPv6 bracket, invalid IPv6 hex, bracket with trailing garbage,
and the empty Origin header.
@danielhanchen

Copy link
Copy Markdown
Member Author

Round 3 audit: one robustness bug found and fixed.

urlparse raises ValueError on malformed bracket Origins (Origin: http://[malformed, http://[::g]:80, http://[2001:db8::1]extra:80) since Py 3.8 CVE-2024-11168 hardening. Without a guard, the exception propagated out of _is_same_origin_request and turned the SPA bootstrap into HTTP 500 for any caller sending those values, contrary to the docstring's "safer default" rule. Wrapped both urlparse calls in try/except and fall to cross-origin on parse failure.

Also separated missing Origin (top-level same-doc GET, same-origin) from an explicit empty string (no scheme + host per RFC 6454 §6.1, cross-origin).

Four regression tests added (tests/test_index_bootstrap_origin_extra.py): malformed bracket, invalid IPv6 hex, bracket with trailing garbage, empty Origin header. All 31 origin tests pass.

Same-origin semantics verified against RFC 6454 §4 + §6.1: lowercased scheme + host, default-port stripping, IPv6 brackets, null rejected, localhost/127.0.0.1 not collapsed, multi-Origin comma-joined falls to cross-origin, opaque origins (data:, blob:, file:) cross-origin. Vary: Origin header confirmed. chatPagesAllowedOrigins not used in this code path; the upstream CORSMiddleware allow-list is unchanged and orthogonal to the bootstrap gate.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@danielhanchen

Copy link
Copy Markdown
Member Author

Round 4 cross-cutting fix: merged origin/main into this branch (no conflicts) to bring in PR #5735 (orphan tool_call XML strip widening + 263-line test_tool_xml_strip.py). All 8 PRs in this audit cohort had been forked off a pre-#5735 main, so a squash-merge of any of them would have silently reverted the widened _TOOL_XML_RE regex and deleted the dedicated test file. Verified: diff against origin/main now shows zero unintended changes to routes/inference.py and test_tool_xml_strip.py outside the actual PR scope.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@danielhanchen danielhanchen merged commit 034ff51 into main May 26, 2026
32 checks passed
@danielhanchen danielhanchen deleted the fix/bootstrap-cross-origin-leak branch May 26, 2026 06:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant