Skip to content

Studio: add remote MCP server support#5750

Merged
danielhanchen merged 17 commits into
unslothai:mainfrom
NilayYadav:mcp-servers
May 27, 2026
Merged

Studio: add remote MCP server support#5750
danielhanchen merged 17 commits into
unslothai:mainfrom
NilayYadav:mcp-servers

Conversation

@NilayYadav
Copy link
Copy Markdown
Contributor

@NilayYadav NilayYadav commented May 24, 2026

Summary

  • Add /api/mcp/servers CRUD for remote MCP server configs (display name, URL, optional headers, OAuth flag);
    rows persist in studio.db
  • On chat send with the new mcp_enabled flag, fetch tools from every enabled server in parallel and expose them
    as OpenAI function tools (mcp__<server_id>__<tool>); calls are routed back through fastmcp
  • New "MCP Servers" section in chat settings: per-chat enable toggle + manage dialog (add/edit/delete, test
    connection, refresh tools, custom headers, OAuth switch)
  • OAuth tokens persisted per server under <studio_root>/mcp-oauth-tokens/ so the browser sign-in survives Studio restarts

Why

Studio's chat tool surface was fixed (web_search, python, terminal) no way to bring in capabilities from
remote MCP servers (GitHub, Linear, Vercel, etc.) without forking the codebase.

Testing

pytest -q studio/backend/tests/test_mcp_servers.py
Manual registered a no-auth MCP server in chat settings, "Test connection" returned the tool count;
toggled "Use MCP Servers" for the chat, model invoked a server tool, response rendered with the server · tool
label
Manual OAuth registered an OAuth-required MCP server (e.g. GitHub MCP) with use_oauth=on, first call opened
the browser flow; killed and restarted Studio, second call did not re-prompt (token reloaded from
mcp-oauth-tokens/)
API — curl -H "Authorization: Bearer <token>" http://localhost:8888/api/mcp/servers returns saved configs; POST /api/mcp/servers/{id}/refresh returns {"ok": true, "tool_count": N}

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef9d341a1b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread studio/backend/core/inference/tools.py
Comment thread studio/backend/core/inference/tools.py Outdated
display = server.get("display_name") or server["id"]
specs: list[dict] = []
for tool in mcp_tools:
name = f"{MCP_TOOL_PREFIX}{server['id']}__{tool.get('name') or ''}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Validate MCP tool names before forwarding to OpenAI

The composed function name uses the remote MCP tool name verbatim, but no character validation is applied. If any enabled server exposes a name containing characters outside the OpenAI function-name charset (e.g. dots, spaces, colons), the whole chat completion request can fail with a 400 before streaming starts instead of skipping/normalizing that tool; this turns one incompatible remote tool into a hard failure for all tool-enabled chats.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for the Model Context Protocol (MCP), enabling the integration of external tools into the chat interface. Key changes include a new backend client for MCP server interaction, SQLite storage for server configurations, and FastAPI routes for managing these servers. The frontend has been updated with a management dialog and a toggle in the chat settings to enable MCP tools. Feedback focuses on optimizing performance by moving the filtering of enabled servers from Python logic into a dedicated database query method.

conn.close()


def list_servers() -> list[dict]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Consider adding a list_enabled_servers() method to the database module. This would allow fetching only the active servers directly from the database, avoiding the overhead of fetching all servers and filtering them in Python during every chat request.

Suggested change
def list_servers() -> list[dict]:
def list_enabled_servers() -> list[dict]:
conn = get_connection()
try:
rows = conn.execute(
"SELECT * FROM mcp_servers WHERE is_enabled = 1 ORDER BY created_at"
).fetchall()
return [dict(row) for row in rows]
finally:
conn.close()
References
  1. To improve efficiency, avoid redundant data iterations. Combine checks and transformations into a single loop or query and return computed values for callers to reuse.

or {"type": "object", "properties": {}},
},
}
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Instead of fetching all servers and filtering in Python, use a dedicated database method to fetch only enabled servers. This improves efficiency, especially as the number of registered servers grows.

Suggested change
)
servers = mcp_servers_db.list_enabled_servers()
References
  1. To improve efficiency, avoid redundant data iterations. Combine checks and transformations into a single loop and return computed values for callers to reuse.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 49e10421cc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +609 to +615
return call_tool_sync(
url = server["url"],
headers = parse_server_headers(server),
name = tool_name,
args = arguments,
timeout = effective_timeout,
use_oauth = bool(server.get("use_oauth")),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Propagate cancellation into MCP tool execution

This new MCP branch ignores cancel_event, so an in-flight remote tool call cannot be interrupted when the user cancels/disconnects; the worker thread stays blocked until timeout (default up to 300s). In the tool-streaming paths, cancellation is polled between next() calls, so this blocking call delays teardown and can tie up worker capacity under slow/hung MCP servers. Please thread cancellation through the MCP call path (or use shorter cancellable waits) to match existing tool behavior.

Useful? React with 👍 / 👎.

Comment on lines +1522 to +1524
checked={mcpEnabledForChat}
onCheckedChange={setMcpEnabledForChat}
disabled={enabledServerCount === 0}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow disabling MCP toggle even when no servers are enabled

Disabling the switch whenever enabledServerCount === 0 makes it impossible to turn MCP off if it was previously enabled and the user later disables/deletes all servers. In that state, mcpEnabledForChat can remain stuck true, and the request builder still emits enable_tools: true + mcp_enabled: true, pushing chats through the tool path with no available MCP tools until the user re-enables a server just to turn the toggle off.

Useful? React with 👍 / 👎.

NilayYadav and others added 4 commits May 24, 2026 17:55
…nslothai#5750

OpenAI requires function.name to match ^[a-zA-Z0-9_-]{1,64}$ before
streaming starts. The existing 64-char length check is necessary but
not sufficient: MCP servers can return tool names containing '.', '/',
spaces, etc. that would 400 the whole chat request. Validate the
composed mcp__<server_id>__<tool> name against the regex, skip + warn
on miss, and drop duplicate tool names from the same server (which
would also 400 the request as "duplicates").

Also propagate the agentic-loop cancel_event into MCP tool execution
so a /cancel POST during a long-running MCP call (e.g. GitHub MCP
search across a large repo) actually interrupts the in-flight HTTP
call instead of waiting out the 300 s timeout. The watcher polls the
threading.Event at 50 ms cadence inside the asyncio loop (matches
routes/inference.py's existing cancel-watcher cadence) and races
against the call task with asyncio.wait FIRST_COMPLETED.

Tests added:
  - test_mcp_specs_skip_invalid_openai_function_names: drops bad chars
  - test_mcp_specs_skip_empty_tool_name
  - test_mcp_specs_drops_duplicate_names
  - test_call_tool_sync_respects_pre_set_cancel_event

Also fix test_desktop_auth.py's router stub that listed every existing
router but missed mcp_servers_router, so importing main.py fails after
this PR adds it to routes/__init__.py.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

…nabled standalone

Round 2 of cross-platform validation surfaced two more P1 findings:

1. OAuth tokens never get cleared. fastmcp keys tokens by MCP URL, not by
   server row, and delete / URL change / use_oauth toggle only updated
   the SQLite row. Re-registering the same URL would silently reuse the
   old account's credentials. Adds clear_oauth_tokens_async() in
   mcp_client.py and calls it from the delete + put route handlers when
   the row had use_oauth=True and either the URL changes or OAuth is
   turned off.

2. mcp_enabled=true was ignored unless the caller also sent
   enable_tools=true. The frontend always sends both together so the UI
   path was fine, but a direct API caller sending only mcp_enabled would
   silently get no MCP tools, which contradicts the field's documented
   "append tools from every enabled MCP server" behavior. Loosens the
   use_tools gate in both the GGUF and safetensors paths so mcp_enabled
   opens the tool loop on its own; when the caller did not also opt
   into built-ins, the built-in list starts empty.

Tests added:
  - test_clear_oauth_tokens_async_no_op_safe
  - test_delete_server_calls_oauth_cleanup_when_oauth_was_on
  - test_delete_server_skips_oauth_cleanup_when_oauth_off
  - test_update_server_clears_oauth_on_url_change
  - test_update_server_clears_oauth_when_oauth_disabled

26 backend MCP tests pass; full studio/backend suite 1710 passed locally.
Cross-platform CI (Linux, macOS, Windows) green on staging fork.
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Round 3 of cross-platform validation:

1. PUT /api/mcp/servers/<id> would 500 with TypeError when the body
   explicitly set is_enabled or use_oauth to null. Pydantic accepts
   None for an Optional[bool] and _changes_from_payload then passed
   None into mcp_servers_db.update_server, which int(None)d. Reject
   explicit null at the validation layer with 400 instead.

2. POST /api/mcp/servers/test caught HTTPException under
   "except Exception", so an invalid URL came back as HTTP 200 with
   {"ok": false, "error": "400: ..."} instead of a real 400. The
   create + update paths return 400 for the same input. Move
   validation outside the transport try/except so it surfaces 400.

Tests added:
  - test_changes_from_payload_rejects_null_is_enabled
  - test_changes_from_payload_rejects_null_use_oauth
  - test_test_endpoint_surfaces_url_validation_as_400
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@danielhanchen
Copy link
Copy Markdown
Member

Reviewed end to end. Before this PR Studio's chat tool surface was fixed (web_search, python, terminal); after this PR users can register remote MCP servers under chat settings and the model can call their tools as mcp__<server_id>__<tool>. This is a real feature, the diff is complete, and the existing tools/training paths are not touched.

Validation done

  • Full studio/backend pytest suite locally: 1710 passed, 46 skipped. No new regressions vs main; the 14 pre-existing test_training_worker_flash_attn failures reproduce on origin/main and are unrelated.
  • Spun up a real fastmcp.FastMCP HTTP server, registered it through the live /api/mcp/servers routes, and round-tripped list, create, test, refresh, and a tool call via execute_tool. All return the expected payloads.
  • Drove the chat UI in headless Chromium against the backend with the PR changes installed. Captured 14 screenshots through the full add/test/save flow at https://huggingface.co/datasets/danielhanchen/minimax-2.7-analysis/tree/main/pr5750_mcp_screenshots.
  • Cross-platform GitHub Actions on ubuntu-latest + macos-14 + windows-latest (Python 3.11) running the MCP unit tests + an end-to-end probe (FastMCP server, regex skip, cancel propagation, OAuth cleanup, null body, /test 400): green on all 3 OSes across 3 rounds.

What I changed on this branch

I pushed three commits onto your branch covering the P1 findings I and the parallel reviewers landed on:

  1. mcp_specs_for_server now validates the composed mcp__<id>__<name> against OpenAI's ^[a-zA-Z0-9_-]{1,64}$ regex and skips offenders with a warning. Same path also drops empty names and duplicates. Without this an MCP server returning repo.search or read/file would 400 the entire chat request before streaming.
  2. call_tool_sync now takes an optional cancel_event and races it against the network call so the existing /cancel POST actually interrupts a long-running MCP tool instead of waiting out the 300s timeout. execute_tool forwards the agentic-loop event.
  3. routes/mcp_servers.delete_mcp_server and update_mcp_server now call clear_oauth_tokens_async(old_url) when the old row had OAuth on and the URL is changing or OAuth is being disabled. fastmcp keys tokens by MCP URL so without this, re-registering the same URL silently reused the old account's credentials.
  4. mcp_enabled=true now opens the tool loop on its own (both GGUF and safetensors paths). The frontend always sends enable_tools=true alongside, but direct API callers sending only mcp_enabled previously got nothing despite the field's documented "append tools" behaviour.
  5. PUT /api/mcp/servers/{id} with is_enabled=null or use_oauth=null was hitting int(None) -> TypeError 500. Reject explicit null at the validation layer with 400.
  6. POST /api/mcp/servers/test was catching HTTPException under except Exception and returning 200 with {ok: false} for an invalid URL. Moved the URL/header validation outside the transport try/except so it now 400s like create + update.
  7. The PR was branched off before Studio: strip orphan tool_call XML leaking into visible content #5735 merged, so studio/backend/tests/test_tool_xml_strip.py was missing. Merged main into the branch so it ships again. Also added mcp_servers_router to test_desktop_auth.py's router stub, which otherwise raises ImportError when studio.backend.main is imported as a package.

11 new tests cover the above. Pre-commit-ci already re-ran on the merge.

Findings I considered but did not change

  • _validate_url accepts loopback / RFC1918 / link-local: by design, since registering a local MCP server (e.g. http://127.0.0.1:9810/mcp/) is a primary use case. Same trust boundary as the user's other Studio actions.
  • McpServerResponse returns stored headers in cleartext: required so the edit dialog can re-populate the form. Only authenticated Studio users see it.
  • _flatten_result reads structured_content not structuredContent: verified fastmcp's CallToolResult dataclass uses snake_case, so the existing code is correct. Tool results with structured-only output also have content populated by fastmcp.
  • /v1/messages Anthropic Messages path does not honour mcp_enabled: AnthropicMessagesRequest has no mcp_enabled field; surfacing MCP there is a separate enhancement and out of scope for this PR.

…t gate

Round 4 surfaces two more interaction bugs between the new MCP path
and existing safetensors tool plumbing:

1. OpenAI accepts ^[a-zA-Z0-9_-]{1,64}$ for function.name, and round 1
   widened the MCP regex to that set, so MCP tools can now be advertised
   as `mcp__srv__list-issues`. But the XML tool-call parser in
   tool_call_parser.py used `\w+` (no hyphen), so the model could call
   the tool but Studio could not parse the call. Same in
   routes/inference.py's `_TOOL_XML_RE` stripper, which would leave
   hyphenated tool-call XML in the visible content. Both regexes now
   use `[\w-]+`.

2. safetensors_agentic treats `tools=[]` as "allow all" (documented
   contract, exercised by test_empty_tools_list_does_not_enforce_allowlist).
   When a caller sends `enable_tools=true` + `enabled_tools=[]` +
   `mcp_enabled=true` and MCP discovery returns 0, the resolved tool
   list is genuinely empty and built-in tools (web_search / python /
   terminal) could execute via the model's emitted call. Fix at the
   route gate instead of breaking the documented contract: set
   `use_tools=False` when the resolved list is empty, in both GGUF and
   safetensors paths. Existing callers who omit `enabled_tools` still
   get ALL_TOOLS and are unaffected.

Tests added (32 total):
  - test_tool_xml_parser_handles_hyphenated_function_names
  - test_tool_xml_strip_handles_hyphenated_function_names
  - test_safetensors_agentic_empty_allowlist_still_means_allow_all
    (documents the contract round 4 preserved)

1716 passed locally; cross-platform CI on staging fork still green.
@danielhanchen
Copy link
Copy Markdown
Member

Round 4 pushed. Two more interaction bugs between the new MCP path and existing safetensors tool plumbing:

  1. tool_call_parser.py and _TOOL_XML_RE in routes/inference.py used \w+ for the function-name capture. Round 1 widened the MCP regex to OpenAI's ^[a-zA-Z0-9_-]{1,64}$ so MCP tools can be advertised as mcp__srv__list-issues, but the model's emitted <function=mcp__srv__list-issues> would then fail to parse on the safetensors XML path, and the XML stripper would leave the tool call in chat history. Both regexes updated to [\w-]+.

  2. safetensors_agentic's contract is tools=[] means "no constraint" (exercised by test_empty_tools_list_does_not_enforce_allowlist). If a caller sends enable_tools=true with enabled_tools=[] and mcp_enabled=true and MCP discovery returns 0 tools, the resolved list is genuinely empty and the model can call web_search/python/terminal even though the caller opted out of them. Fixed at the route gate (both GGUF and safetensors paths): set use_tools=False when the resolved list is empty, instead of changing the agentic-loop contract. Existing callers who omit enabled_tools still get ALL_TOOLS and are unaffected.

Tests added (32 total): test_tool_xml_parser_handles_hyphenated_function_names, test_tool_xml_strip_handles_hyphenated_function_names, test_safetensors_agentic_empty_allowlist_still_means_allow_all (documents the preserved contract).

studio/backend suite locally: 1716 passed. Cross-platform CI on staging fork (Linux + macOS + Windows) green on round 4. Updated probe at https://github.com/danielhanchen/unsloth-staging-2/actions/runs/26389767937.

I covered findings 1, 2, 3, 4, 8, 9, 11 from the parallel reviewer aggregation. Remaining open findings I considered and left in place: loopback / private URL acceptance (registering a local MCP server is a primary use case), McpServerResponse returning stored headers (needed for the edit dialog to repopulate), structured_content vs structuredContent (verified fastmcp's CallToolResult dataclass uses snake_case so the existing code is correct), /v1/messages Anthropic Messages path ignoring mcp_enabled (AnthropicMessagesRequest has no such field; out of scope), and OAuth token storage keyed by URL not server row (fastmcp's storage contract; the round-2 cleanup handler covers the practical delete / URL-change confusion).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63e4444a67

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread studio/backend/routes/inference.py Outdated
Comment on lines 2386 to 2388
(_tools_on or payload.mcp_enabled)
and llama_backend.supports_tools
and not has_gguf_image
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Honor disable-tools override in MCP gate

The new tool-loop gate uses (_tools_on or payload.mcp_enabled), which lets mcp_enabled=true bypass the process-level --disable-tools policy (_tools_on == False). In that configuration, requests still enter the server-side tool loop and can execute MCP tools, contradicting the documented tool_policy=False behavior (“forced tools off for every request”). This regression appears in both GGUF and safetensors gating logic, so deployments relying on CLI tool disablement are no longer protected from remote tool execution.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits May 25, 2026 11:41
…params + cancel race

Round 5 of parallel-reviewer aggregation surfaced six additional
findings; five are real and fixed here:

1. Hyphenated MCP parameter names (`<parameter=issue-number>`) were
   dropped by the XML parser's `\w+` regex. Extended to `[\w-]+` in
   both core/inference/tool_call_parser.py and core/tool_healing.py.
   The latter is GGUF's own copy of the parser/strip patterns and was
   missed by round 4.

2. core/tool_healing.py's `strip_tool_call_markup` still used
   `<function=\w+>` so hyphenated MCP tool-call XML leaked into the
   GGUF visible content even after round 4 fixed the shared parser.

3+4. `mcp_enabled` re-opened the tool loop even when the operator
   passed `unsloth run --disable-tools` (CLI policy False). Round 2's
   `(_tools_on or payload.mcp_enabled)` gate ignored the raw process
   policy. Now reads `state.tool_policy.get_tool_policy()` and gates
   mcp_enabled on `_cli_policy is not False`. Applied to both GGUF
   and safetensors paths.

5. GGUF's agentic loop called `execute_tool(tool_name, ...)` without
   checking the model-emitted name against the per-request tool list,
   while the safetensors loop already enforces this. Added the same
   allow-list check so a model that hallucinates a filtered MCP name
   or a built-in the caller opted out of returns "not enabled" instead
   of executing.

Bonus P2 fixes:
  - `call_tool_sync` now checks `cancel_event.is_set()` BEFORE
    creating the call task, so a pre-set cancellation does not open
    the HTTP transport.
  - `clear_oauth_tokens_async` moved the OAuth import + construction
    inside the protected try block; a fastmcp.client.auth load error
    used to escape and 500 the delete / update route.

NOT fixed (verified false or out of scope):
  - finding unslothai#10 "structured_content vs structuredContent": fastmcp's
    CallToolResult dataclass uses snake_case (verified live against
    structured-only tool result; fields are
    `dict_keys(['content', 'structured_content', 'meta', 'data', 'is_error'])`).
  - finding unslothai#11 "asyncio.run from running loop": call_tool_sync is
    invoked from `asyncio.to_thread` worker threads which have no
    event loop; asyncio.run() is safe there.

Tests added (37 total): hyphenated param names, tool_healing strip,
GGUF allow-list gate, cancel pre-set short-circuit, OAuth cleanup
constructor-error swallowing. 1721 passed locally, no regressions.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d537a6dee7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

toolsEnabled: loadBool(CHAT_TOOLS_ENABLED_KEY, false),
codeToolsEnabled: loadBool(CHAT_CODE_TOOLS_ENABLED_KEY, false),
imageToolsEnabled: loadBool(CHAT_IMAGE_TOOLS_ENABLED_KEY, false),
mcpEnabledForChat: loadBool(CHAT_MCP_ENABLED_KEY, false),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reset MCP toggle when clearing the active checkpoint

This new persisted flag is loaded into runtime state, but clearCheckpoint() still only resets toolsEnabled, codeToolsEnabled, and imageToolsEnabled. As a result, after a user unloads/clears a model, mcpEnabledForChat can stay true and the next local chat request will continue sending mcp_enabled: true, causing unintended MCP discovery/tool behavior even though other tool toggles were cleared. Please clear mcpEnabledForChat alongside the other tool toggles in the checkpoint-reset path.

Useful? React with 👍 / 👎.

@danielhanchen
Copy link
Copy Markdown
Member

Round 5 pushed. Five more interaction bugs caught by another reviewer aggregation pass, plus one piece of end-to-end browser evidence.

  1. Hyphenated MCP parameter names dropped silently. _TC_PARAM_START_RE in tool_call_parser.py still used \w+ for the parameter-name capture. MCP schemas with kebab-case keys (GitHub MCP list-issues takes repo-name, issue-number, etc.) emitted <parameter=repo-name> blocks that the parser saw as empty, so the model's tool call landed with arguments={} and the call failed at the server. Regex widened to [\w-]+ to match the function-name fix.

  2. GGUF carries its own parser copy. core/tool_healing.py is a near-duplicate of tool_call_parser.py reused by the llama.cpp / llama-server agentic loop. Round 4 only fixed the safetensors path; the GGUF strip/parse pair still rejected hyphenated function and parameter names. Updated tool_healing.py to match: _TC_FUNC_START_RE, _TC_PARAM_START_RE, and the closing-pair stripping regex all use [\w-]+. Now both unsloth studio --frontend transformers and --frontend llama-cpp produce the same MCP behaviour.

  3. unsloth studio --disable-tools was bypassed by mcp_enabled. _effective_enable_tools(payload) checks the CLI tool-policy override, but the MCP branch went straight to payload.mcp_enabled. An operator running unsloth studio run ... --disable-tools could still have arbitrary MCP calls fire if the chat client set mcp_enabled: true. Both route paths (routes/inference.py GGUF around line 2380 and safetensors around line 2890) now compute _mcp_allowed = bool(payload.mcp_enabled) and _cli_policy is not False before unioning into use_tools.

  4. GGUF agentic loop ignored the per-request allow-list. When enabled_tools is set and the model emits a tool call for a tool that was filtered out (typically a stale name from system-prompt history or an MCP server that was disabled after the message was queued), the safetensors loop already short-circuits with an "Error: tool not enabled" string. The GGUF loop in llama_cpp.py (~line 5077) dispatched it directly to execute_tool, allowing built-ins like python or terminal to run even when the caller had opted out. Added the same allow-list check before invoking the tool.

  5. call_tool_sync cancel race + OAuth import safety. Two small mcp_client.py cleanups: (a) check cancel_event.is_set() before creating the call task so a /cancel POST that landed while we were still in the asyncio.to_thread queue does not open a fresh HTTP connection; (b) move the optional fastmcp.client.auth.OAuth import inside the same try/except as the call so a missing optional dependency surfaces as a clean "Error:" string in chat instead of a 500.

Tests added (5 new, 37 total in test_mcp_servers.py): test_tool_xml_param_parser_handles_hyphens, test_tool_healing_handles_hyphenated_xml, test_mcp_enabled_respects_cli_disable_policy, test_gguf_agentic_blocks_disabled_tool, test_call_tool_sync_short_circuits_on_pre_set_cancel, plus the test_clear_oauth_tokens_swallows_constructor_errors regression.

studio/backend MCP suites locally: 108 passed (test_mcp_servers.py 72, test_tool_xml_strip.py 22, test_safetensors_tool_loop.py 14). Full backend suite: 1746 passed outside the three test files that depend on host terminal width and Windows-specific GPU resolution (test_studio_api.py::test_help_output, test_training_worker_flash_attn.py, test_windows_gpu_detection_mock.py) which fail the same way on main.

Cross-platform CI on staging fork (Linux + macOS + Windows): green on round 5 in 1m43s. Run: https://github.com/danielhanchen/unsloth-staging-2/actions/runs/26399698777. The probe now exercises the round-5 fixes too: hyphenated parameter parsing, tool_healing strip+parse parity, tool_policy CLI override, and the cancel pre-set short-circuit against a live FastMCP server.

End-to-end browser walkthrough on an AWS B200 host: installed the PR with UNSLOTH_STUDIO_HOME and a baseline at main in parallel, drove both through the chat settings flow with Playwright, and captured before/after screenshots and a 7-frame walkthrough GIF of the add-MCP-server journey (open settings, expand MCP section, open dialog, fill display name + URL pointing at a local FastMCP server, "Test connection" returns "Connected (3 tools)", save, server persists after refresh). Side-by-side comparisons and the GIF are in pr5750_before_after_comparison/ of the dataset alongside the raw before/after PNGs.

@danielhanchen
Copy link
Copy Markdown
Member

Earlier rounds drove the new UI against a local FastMCP server, which proves the wiring but does not prove the "remote" part of the title. Re-verified end to end against four real public MCP servers picked from the public no-auth lists (none of them owned or hosted by me):

Server URL Tools Notes
DeepWiki https://mcp.deepwiki.com/mcp 3 read_wiki_structure, read_wiki_contents, ask_question
Context7 https://mcp.context7.com/mcp 2 resolve-library-id, query-docs (hyphenated, exercises round 4 + 5 regex fixes)
Roundtable https://mcp.roundtable.now/mcp 13 All hyphenated (list-models, consult-council, design-architecture, etc.)
GitMCP (unslothai/unsloth) https://gitmcp.io/unslothai/unsloth 4 fetch_unsloth_documentation, search_unsloth_code, etc.

For each, I exercised every step end to end:

  1. POST /api/mcp/servers/test -- all four returned {"ok": true, "tool_count": N} with the expected counts (3 / 2 / 13 / 4).
  2. POST /api/mcp/servers/ -- persisted all four to studio.db with stable IDs.
  3. Manage MCP Servers dialog -- UI rendered all four entries with enable toggles, refresh, edit, delete buttons; the chat-settings panel showed "4 servers enabled -- Manage..." underneath the "Use MCP Servers" master toggle.
  4. Added a fifth (https://gitmcp.io/unslothai/unsloth_zoo) through the dialog; "Test connection" returned the "Connected (4 tools)" toast against the live server within ~2 s; "Add server" persisted it; "MCP server added" toast appeared; dialog refreshed to five rows.
  5. execute_tool round-trip through Studio's dispatcher for one tool on each server, including the hyphenated mcp__context7__resolve-library-id and mcp__roundtable__list-models -- all returned actual content from the upstream MCP servers (DeepWiki gave a coherent answer about Unsloth, GitMCP returned the README, Context7 resolved the fastapi library ID, Roundtable listed its models). This is the part round 4 + 5 fixed for hyphenated tool names.

Screenshots and a 7-frame walkthrough GIF of the live add-public-server flow against gitmcp.io are in pr5750_public_mcp_servers/ of the same dataset. Public-server URLs are nothing exotic; they were picked from publicly maintained "no-auth remote MCP" lists (sylviangth/awesome-remote-mcp-servers, mcpservers.org) so anyone wanting to reproduce the verification can hit the same endpoints.

The earlier two probes I ran out of caution but did not change:

  • Semgrep MCP (https://mcp.semgrep.ai/mcp) -- returns 401 without an auth token, exercising the round 3 /test 400 surface for an unreachable target. Did not register.
  • 402.bot (https://api.402.bot/mcp) -- TLS handshake fails from this host with TLSV1_ALERT_INTERNAL_ERROR; their endpoint is currently down for an unrelated reason. Did not register.

All four registered public servers also survive a refresh, which is what proved persistence in the first round of probing. PR head is still d537a6dee7; the verification was against the same backend that staging CI confirmed green for Linux + macOS + Windows.

…onflicts in chat-adapter + chat-runtime-store
@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@danielhanchen
Copy link
Copy Markdown
Member

Round 6 pushed (3726238). Two things this round.

Conflict resolution. Main moved past the PR base (24 commits since round 5). Merge gave clean conflicts only in two frontend files, both around adjacent fields the PR adds for MCP next to fields main added for the Anthropic web_fetch pill:

  • studio/frontend/src/features/chat/api/chat-adapter.ts -- destructured mcpEnabledForChat from this PR collides with webFetchToolsEnabled from main; kept both.
  • studio/frontend/src/features/chat/stores/chat-runtime-store.ts -- five collision blocks for the localStorage key constant, the type field, the action signature, the initial state, and the action implementation; kept the PR side and the main side side-by-side in each block (the two features are independent).

After the merge: 108 MCP-relevant tests pass locally on the merged head (test_mcp_servers.py 72, test_tool_xml_strip.py 22, test_safetensors_tool_loop.py 14). Cross-platform CI on staging fork re-ran green on ubuntu-latest + macos-14 + windows-latest in 2m15s on the merged tree.

End-to-end MCP dispatch through a real model. Round 1-2 evidence stopped at "tools dispatch through execute_tool". This round drove the whole loop through a real GGUF model (Qwen3-4B-Instruct-2507-GGUF Q4_K_M, served by llama-server via the GGUF agentic loop in llama_cpp.py) and confirmed every step.

Registered 8 servers via the Studio UI (4 new public no-auth ones added this round). Drove one prompt per server through POST /api/inference/chat/completions with mcp_enabled: true, tool_choice: "required". For each, the SSE stream produced a tool_end event with the upstream content, and the model's follow-up turn quoted the real response:

Prompt Tool fired Result bytes Elapsed
Fetch Unsloth docs (GitMCP) mcp__<id>__fetch_unsloth_documentation 18,495 5.4 s
Ask DeepWiki about Studio's GGUF backend mcp__<id>__ask_question 2,085 19.8 s
Context7 resolve fastapi mcp__<id>__resolve-library-id 1,616 4.8 s
MS Learn Azure AI Foundry search mcp__<id>__microsoft_docs_search 19,678 5.4 s
Cloudflare docs wrangler deploy search mcp__<id>__search_cloudflare_documentation 12,879 4.8 s
Hugging Face qwen3 model search mcp__<id>__hub_repo_search 11,554 3.6 s

Two notable things from the table:

  1. mcp__<id>__resolve-library-id is the literal hyphenated-name regex case that round 4 + 5 fixed. The model emitted <tool_call>{"name": "mcp__<id>__resolve-library-id", ...}</tool_call>, the parser accepted it, the dispatcher routed it, the upstream returned, and the model quoted /fastapi/fastapi back. Without the regex widening these would either lose all parameters or skip the call entirely.
  2. The dispatch logs include tool_status heartbeats during the upstream call -- exactly what the cancel-watcher path consumes, so the round 1 / round 5 cancel-propagation work has live exercise too.

Also drove the same flow through the actual UI: the chat thread renders the "Used tool: <server_id> . fetch_unsloth_documentation" card, then the model's follow-up answer is composed from the real upstream content (Unsloth README rendered as a clean one-sentence summary, including the "70% less VRAM and 2x faster training" claim that only appears in the upstream README, not in the model's training data). 220 tok/s on a B200, context bar shows 16.2k / 32.8k after the MCP fold-in.

PR is now MERGEABLE again at head 3726238.

@danielhanchen
Copy link
Copy Markdown
Member

Animated walkthroughs of the PR in action, in case the screenshots above are easier to read as motion.

1. Add a real public MCP server end to end through the new dialog. Live probe against https://gitmcp.io/unslothai/unsloth_zoo; the "Connected (4 tools)" toast is the real upstream list_tools response, not a stub.

Add a public MCP server through the Manage dialog

2. Loaded GGUF model dispatches the new MCP tool through chat. Qwen3-4B-Instruct-2507 GGUF Q4_K_M emits a <tool_call> for mcp__<server_id>__fetch_unsloth_documentation; Studio renders the "Used tool" pill and folds the upstream response into the model's follow-up turn. The one-sentence summary at the end is composed from the actual GitMCP response (the "70% less VRAM and 2x faster training" phrasing is from the live Unsloth README, not in the model's training corpus).

Loaded GGUF model dispatches an MCP tool through chat

3. Single-image six-panel summary of the manual add-server UX, in case the GIF is too quick. Each step captioned with what to click and what to expect.

Six-panel how-to add a remote MCP server

Assets are pinned to a commit on a staging fork orphan branch so the links don't drift if the branch is later force-pushed.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@danielhanchen danielhanchen merged commit 9a907a8 into unslothai:main May 27, 2026
43 of 45 checks passed
rhsCZ pushed a commit to rhsCZ/unsloth that referenced this pull request May 27, 2026
unslothai#5750 added remote MCP server support, which conflicted with our
import block in chat-settings-sheet.tsx. Kept both branches' imports
(MCP dialog + servers API from main, ServiceTier + Input from this PR).

393/393 backend tests pass; frontend type-check + vite build clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants