Studio: stop seeded admin to cross-origin callers#5739
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses a potential security vulnerability where sensitive bootstrap information, such as the admin password, could be leaked to cross-origin callers. It introduces a new helper function, _is_same_origin_request, to verify if a request originates from the same scheme, host, and port. The _build_index_response function was updated to only inject bootstrap data for same-origin requests and now includes a Vary: Origin header to ensure proper caching behavior. Additionally, comprehensive unit tests were added to validate the same-origin detection logic across various scenarios. I have no feedback to provide.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c33cc54a83
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| origin = request.headers.get("origin") | ||
| if not origin: | ||
| return True | ||
| return origin == f"{request.url.scheme}://{request.url.netloc}" |
There was a problem hiding this comment.
Normalize origin tuple before same-origin comparison
The new check compares Origin to f"{request.url.scheme}://{request.url.netloc}" as a raw string, which can reject valid same-origin requests when equivalent origins serialize differently (for example https://example.com vs https://example.com:443, or when a proxy terminates TLS and the app sees http). In those cases the index bootstrap script is stripped even for legitimate callers, so first-run/admin password bootstrap can fail and users get stuck in the forced password-change flow. Compare parsed origin components with scheme/host normalization and default-port handling (and trusted forwarded scheme/host where applicable) instead of exact string equality.
Useful? React with 👍 / 👎.
The "/" SPA fallback serves index.html with an inline
``window.__UNSLOTH_BOOTSTRAP__`` script containing the seeded admin
password while a password change is pending. Default web mode runs
``CORSMiddleware`` with ``allow_origins=["*"]`` + ``allow_credentials=
True``, which reflects an attacker-controlled ``Origin`` back on every
request and sets Access-Control-Allow-Credentials true. The combination
let any cross-origin page ``fetch('/')`` with credentials and read the
bootstrap admin password out of the HTML body. The API smoke
``CORS: GET / leaks bootstrap pw to cross-origin caller`` audit already
tracked this (tests/studio/studio_api_smoke.py:224) but did not gate CI.
Gate ``_inject_bootstrap`` on a same-origin check: legitimate top-level
navigations omit ``Origin`` on most engines, so the absence of the
header is treated as same-origin; when the header IS present and does
not match ``request.url.scheme://request.url.netloc`` exactly, we now
skip injecting the bootstrap tag. ``Vary: Origin`` is added so an
intermediary cache cannot serve a same-origin response (with bootstrap)
to a later cross-origin caller (and vice versa).
Coverage: ``test_index_bootstrap_origin.py`` exercises the helper with
missing / matching / evil / scheme-mismatch / port-mismatch origins.
for more information, see https://pre-commit.ci
c33cc54 to
3a65d39
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3a65d3991c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| origin = request.headers.get("origin") | ||
| if not origin: | ||
| return True | ||
| return origin == f"{request.url.scheme}://{request.url.netloc}" |
There was a problem hiding this comment.
Canonicalize origin before enforcing same-origin gate
Compare-origin by raw string equality here misclassifies legitimate same-origin traffic when the browser-origin serialization differs from request.url (for example default-port elision like https://host vs https://host:443, or reverse-proxy TLS termination where the app sees http internally). In those deployments _inject_bootstrap is skipped for real first-party users, so the seeded bootstrap credentials are not delivered and the initial password-change/bootstrap flow can break for valid clients.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94903e7b2b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # strip it since Origin never contains credentials. | ||
| if "@" in netloc: | ||
| netloc = netloc.rsplit("@", 1)[1] | ||
| host, _, port_str = netloc.partition(":") |
There was a problem hiding this comment.
Parse netloc with IPv6-safe host/port splitter
_canonical_origin splits netloc with partition(":"), which breaks valid IPv6 hosts like [::1]:8888 by treating the first colon inside the address as the host/port separator; this makes port_str non-numeric, returns None, and forces _is_same_origin_request to classify legitimate same-origin IPv6 requests as cross-origin. In IPv6 localhost or dual-stack deployments, bootstrap injection will be incorrectly skipped and the password-change bootstrap flow can fail for real first-party users.
Useful? React with 👍 / 👎.
A plain string-compare between the Origin header and request.url.netloc misclassifies legitimate same-origin requests as cross-origin in three scenarios: - Browser strips the default port from Origin (https://example.com) but Starlette's netloc keeps it (example.com:443). Per RFC 6454 the default port is dropped on the wire, so the strings will not match even though the requests share an origin. - Host case differs (Origin: http://Example.com vs netloc: example.com). Per RFC 3986 host comparison is case-insensitive. - Scheme case differs (HTTP:// vs http://). Per RFC 3986 the scheme is also case-insensitive. These are usability degradations rather than security gaps (legitimate user denied the bootstrap injection, no attacker gain), but worth shipping so non-default Studio deployments keep the change-password auto-fill. Adds _canonical_origin(scheme, netloc) -> (scheme, host, port) and compares the canonical tuples. Default-port lookup covers http/https/ws/wss; userinfo (user:pass@) is stripped per RFC 3986 since Origin never carries credentials. Origin: "null" (sandboxed iframes, file:// pages) and unparseable values collapse to cross- origin so the bootstrap pw is never leaked through those paths either. Tests: 14 cases (was 5). Covers the original same/missing/evil/ scheme/port matrix plus default-port stripping in both directions, host + scheme case folding, Origin: null, garbage values, and userinfo-in-netloc.
94903e7 to
8ae52c1
Compare
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7cca7b47e9
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| # strip it since Origin never contains credentials. | ||
| if "@" in netloc: | ||
| netloc = netloc.rsplit("@", 1)[1] | ||
| host, _, port_str = netloc.partition(":") |
There was a problem hiding this comment.
Parse netloc using IPv6-safe host/port extraction
_canonical_origin splits netloc with partition(":"), which breaks bracketed IPv6 authorities (for example [::1]:8888): the first colon inside the address is treated as the host/port separator, port_str becomes non-numeric, and the function returns None. That causes _is_same_origin_request to mark legitimate IPv6 same-origin requests as cross-origin, so bootstrap injection is skipped and first-run password bootstrap can fail on IPv6 localhost/dual-stack setups.
Useful? React with 👍 / 👎.
The canonical-origin helper used ``netloc.partition(":")`` which
mis-parses bracketed IPv6 hosts (``[::1]:8902`` -> host=``[``,
port-str=``:1]:8902``). The int() then raises and the canonicaliser
returns None, so every IPv6 same-origin request is misclassified as
cross-origin and Studio refuses to inject the bootstrap pw on a
legitimate top-level nav when launched with ``unsloth studio -H ::1``.
Bracket-aware split per RFC 3986 §3.2.2, plus extra regression tests
for IPv6, opaque (data:/blob:/file:), comma-joined multi-Origin and
localhost-vs-127 cases.
for more information, see https://pre-commit.ci
urlparse raises ValueError on malformed bracketed Origin values (unclosed [, invalid IPv6 hex, text after ]) and on a few NFKC edge cases since Py 3.8. Without a guard, a request carrying Origin: http://[malformed surfaced as HTTP 500 from the SPA handler rather than being treated as cross-origin per the docstring's safer-default rule. Wrap both urlparse calls in try/except ValueError and return False on parse failure. Also distinguish a missing Origin header (top-level same-document GET, treat as same-origin) from an explicit empty string (not a valid serialised origin per RFC 6454 §6.1, treat as cross-origin). Four new regression tests pinned down by the PR audit: malformed IPv6 bracket, invalid IPv6 hex, bracket with trailing garbage, and the empty Origin header.
|
Round 3 audit: one robustness bug found and fixed.
Also separated missing Four regression tests added ( Same-origin semantics verified against RFC 6454 §4 + §6.1: lowercased scheme + host, default-port stripping, IPv6 brackets, |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
|
Round 4 cross-cutting fix: merged origin/main into this branch (no conflicts) to bring in PR #5735 (orphan tool_call XML strip widening + 263-line test_tool_xml_strip.py). All 8 PRs in this audit cohort had been forked off a pre-#5735 main, so a squash-merge of any of them would have silently reverted the widened _TOOL_XML_RE regex and deleted the dedicated test file. Verified: diff against origin/main now shows zero unintended changes to routes/inference.py and test_tool_xml_strip.py outside the actual PR scope. |
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Summary
CORSMiddlewarewithallow_origins=["*"]+allow_credentials=True, which reflects an attacker-controlledOriginback on every request and setsAccess-Control-Allow-Credentials: true. The/SPA fallback serves index.html with an inlinewindow.__UNSLOTH_BOOTSTRAP__script carrying the seeded admin password while a password change is pending. The combination lets any cross-origin pagefetch('/')with credentials and read the bootstrap admin password out of the HTML body.CORS: GET / leaks bootstrap pw to cross-origin calleraudit (tests/studio/studio_api_smoke.py:224) already tracks this regression but does not gate CI. This PR closes the leak.Fix
_is_same_origin_requesthelper instudio/backend/main.py: treats the absence ofOriginas same-origin (top-level same-document navigation on most engines doesn't sendOrigin), and treats a presentOriginthat doesn't exactly matchrequest.url.scheme://request.url.netlocas cross-origin._build_index_responsenow takes theRequestand only calls_inject_bootstrapwhen same-origin. Cross-origin callers get the same HTML minus the bootstrap script tag, so they can't recover the seeded admin password through the SPA fallback.Vary: Originso an intermediary cache cannot serve a same-origin response (with bootstrap) to a later cross-origin caller (and vice versa).Test plan
studio/backend/tests/test_index_bootstrap_origin.pycovers missing / matching / evil / scheme-mismatch / port-mismatch origins -> 5 passedcurl http://127.0.0.1:8888/returns HTML containingwindow.__UNSLOTH_BOOTSTRAP__;curl -H 'Origin: https://evil.example' http://127.0.0.1:8888/returns HTML WITHOUT it/still picks up bootstrap and auto-fills the change-password formNotes
tests/studio/studio_api_smoke.py:224will flip fromAUDITtoOKonce this lands; the test infrastructure already exercises the cross-origin GET path.