UPSTREAM PR #18334: webui: add MCP (Model Context Protocol) support#679

Open
loci-dev wants to merge 18 commits into main from upstream-PR18334-branch_ochafik-web-ui-mcp

Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18334

Summary

This PR adds MCP (Model Context Protocol) support to llama-server's web UI.
Only servers using the stdio transport are supported for now; llama-server spawns and manages these processes on behalf of the frontend, which connects to them through one WebSocket per conversation per server.


Features

  • WebSocket server on HTTP port + 1 for real-time MCP communication
  • MCP bridge that spawns and manages MCP server subprocesses (stdio or docker)
  • Frontend UI for MCP server management with tool exploration
  • Tool calling integration in chat completions with streaming support
  • Auto-reconnection with exponential backoff for resilience
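The reconnection behavior above can be sketched as exponential backoff with jitter. The constants here (base delay, cap) are hypothetical, not taken from the PR's frontend code:

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter. `base` and `cap` are
    illustrative values, not the actual constants used by the web UI."""
    ceiling = min(cap, base * (2 ** attempt))  # grows geometrically, then saturates
    return random.uniform(0, ceiling)          # jitter avoids thundering-herd reconnects

# Deterministic ceilings for the first few attempts: 0.5, 1.0, 2.0, 4.0, ...
ceilings = [min(30.0, 0.5 * 2 ** a) for a in range(8)]
print(ceilings)
```

Jitter matters here because every open conversation holds its own WebSocket, so a server restart would otherwise trigger a synchronized burst of reconnects.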

New CLI Option

# Use default config location (~/.llama.cpp/mcp.json)
./llama-server -m model.gguf

# Or specify config path
./llama-server -m model.gguf --mcp-config /path/to/mcp.json

Configuration Example

{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@brave/brave-search-mcp-server", "--transport", "stdio"],
      "env": {
        "BRAVE_API_KEY": "... get your key at https://api.search.brave.com/app/keys ..."
      }
    },
    "python": {
      "command": "uvx",
      "args": ["mcp-run-python", "--deps", "numpy,pandas,pydantic,requests,httpx,sympy,aiohttp", "stdio"],
      "env": {}
    }
  }
}

Architecture

  • server-ws.cpp/h - WebSocket server implementation
  • server-mcp-bridge.cpp/h - Routes WebSocket connections to MCP subprocesses
  • server-mproc.cpp/h - Cross-platform subprocess management
  • server-mcp.h - MCP protocol type definitions
  • Frontend: MCP service, stores, and UI components

API Endpoints

  • GET /mcp/servers - List available MCP servers
  • GET /mcp/ws-port - Get WebSocket port number
  • WS /mcp?server=<name> - WebSocket connection (on HTTP port + 1)
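A client can derive the WebSocket endpoint from the "HTTP port + 1" convention above. This is a minimal sketch; in practice the port should be read from `GET /mcp/ws-port` rather than computed:

```python
from urllib.parse import urlsplit

def mcp_ws_url(http_base: str, server_name: str) -> str:
    """Build the MCP WebSocket URL from llama-server's HTTP base URL,
    assuming the PR's convention that the WebSocket listens on HTTP port + 1."""
    parts = urlsplit(http_base)
    host = parts.hostname
    port = parts.port or 80
    return f"ws://{host}:{port + 1}/mcp?server={server_name}"

print(mcp_ws_url("http://127.0.0.1:8080", "python"))
# ws://127.0.0.1:8081/mcp?server=python
```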

Test plan

  • Unit tests added (tools/server/tests/unit/test_mcp.py)
  • Manual testing with @modelcontextprotocol/server-filesystem
  • Test tool calling in chat UI
  • Test connect/disconnect in MCP picker
  • Verify WebSocket reconnection after server restart

🤖 Generated with Claude Code

ochafik and others added 15 commits December 24, 2025 00:38
Add JSON-RPC 2.0 type definitions and MCP server configuration
structures for the Model Context Protocol implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add mcp_process class for spawning and managing MCP server subprocesses
with bidirectional stdio communication. Handles process lifecycle,
environment variables for unbuffered output, and cross-platform support.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
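The bidirectional stdio communication this commit describes can be sketched in Python with a stand-in child process that merely echoes one line back (a real MCP server would answer with protocol responses). The `PYTHONUNBUFFERED` variable mirrors the commit's note about forcing unbuffered output:

```python
import json
import os
import subprocess
import sys

# Stand-in for an MCP server: echo exactly one line from stdin to stdout.
child = subprocess.Popen(
    [sys.executable, "-u", "-c", "import sys; sys.stdout.write(sys.stdin.readline())"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
    # Merge, don't replace, the environment; force unbuffered output in the child.
    env={**os.environ, "PYTHONUNBUFFERED": "1"},
)
request = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "ping"}) + "\n"
reply, _ = child.communicate(request)  # write request, close stdin, read reply
print(json.loads(reply)["method"])  # ping
```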
Add custom WebSocket server using raw sockets (no external library).
Implements RFC 6455 handshake, frame parsing, masking, and message
handling. Runs on HTTP port + 1 to avoid conflicts with httplib.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
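The RFC 6455 handshake mentioned above hinges on computing the `Sec-WebSocket-Accept` header: SHA-1 of the client's key concatenated with a fixed GUID, base64-encoded. A sketch of that step (not the PR's C++ code):

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed by RFC 6455

def ws_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for an RFC 6455 handshake."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Test vector from RFC 6455 section 1.3:
print(ws_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```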
Add server_mcp_bridge class that routes WebSocket messages to MCP
server subprocesses. Manages per-connection state, configuration
loading with hot-reload, and JSON-RPC 2.0 message forwarding.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integrate MCP bridge and WebSocket server into main server:
- Add --mcp-config CLI argument for configuration path
- Add /mcp/servers and /mcp/ws-port HTTP endpoints
- Register WebSocket event handlers for MCP
- Update server-http to properly join thread on stop

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add TypeScript types for MCP protocol (JSON-RPC 2.0) and WebSocket
service for communicating with MCP servers:
- MCP types: tool definitions, JSON-RPC request/response/notification
- McpService: WebSocket client with auto-reconnect and request timeout
- API types: tool call interfaces for chat completions
- Vite config: proxy WebSocket connections to MCP port
- ESLint: allow underscore-prefixed unused args (common convention)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add reactive Svelte 5 stores for managing MCP state:
- mcpStore: Global MCP connection state, tool discovery, tool calling
- conversationMcpStore: Per-conversation MCP server enable/disable

Uses SvelteMap/SvelteSet for proper Svelte 5 reactivity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add components for displaying MCP tool calls and results:
- ToolCallBlock: Collapsible display of tool call with arguments/results
- ToolResultDisplay: Format and render tool execution results
- tool-results.ts: Utility functions for parsing tool result messages

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add UI components for managing MCP server connections:
- ChatFormActionMcp: Server selector dropdown in chat input
- McpPanel: Full panel for viewing connected servers and tools

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Integrate MCP tool calling into the chat flow:
- chat.ts: Add tool parameter injection and MCP tool execution
- chat.svelte.ts: Track tool calls, results, and processing state
- ChatMessageAssistant: Display tool calls with status and duration
- ChatMessages: Build tool result map, filter tool result messages
- ChatScreen: Wire up tool result event handlers
- Add duration guard for negative timestamp differences

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Python tests for MCP functionality:
- test_mcp_servers_endpoint: Test /mcp/servers HTTP endpoint
- test_mcp_ws_port_endpoint: Test /mcp/ws-port HTTP endpoint
- test_mcp_initialize_handshake: Test MCP JSON-RPC initialization
- test_mcp_tools_list: Test tools/list method
- test_mcp_tool_call: Test tools/call method

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
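The initialization handshake exercised by `test_mcp_initialize_handshake` is a JSON-RPC 2.0 request like the sketch below. The field values (`protocolVersion`, `clientInfo`) are illustrative, not copied from the tests; consult the MCP specification for the exact shape a given server expects:

```python
import json

init = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",  # illustrative MCP revision date
        "capabilities": {},
        "clientInfo": {"name": "llama-server-webui", "version": "0.0.1"},
    },
}
wire = json.dumps(init)  # one JSON object per message over the WebSocket
assert json.loads(wire)["method"] == "initialize"
```

After the server's `initialize` response, the client can issue `tools/list` and `tools/call` requests, matching the remaining tests above.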
Add documentation and example configuration for MCP:
- README: Document MCP configuration, usage, and WebSocket port
- mcp_config.example.json: Example config with filesystem and brave-search
- Rebuild webui bundle with MCP support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Force popover to open above (side="top") for consistent positioning
- Search input at bottom (flips based on popover position)
- Small solid dots for connection status (green/gray)
- Hover row to reveal connect/disconnect action icons
- Remove Connect All/Disconnect All footer buttons
- Fix double X button in search input (hide native WebKit clear)
- Add tooltips for status and actions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Don't show "Streaming..." status while arguments are being streamed.
Only show "Calling tool..." when actually waiting for MCP server response.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reorder assistant message layout so tool call blocks appear
before the model badge and statistics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ochafik and others added 3 commits December 24, 2025 01:15
- Remove unused parameter names from MCP HTTP lambda handlers
- Remove conditional websocket import (it's a required dependency)

Fixes unused-parameter warning and pyright type-check errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds optional "cwd" field to mcp.json server configurations to set the
working directory for stdio MCP servers.

- Add cwd field to mcp_server_config struct
- Unix: call chdir() before execvp() in child process
- Windows: pass lpCurrentDirectory to CreateProcessA()
- Update mcp_config.example.json with usage example

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
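A hedged sketch of what a server entry using the new `cwd` field might look like; the `filesystem` server and path are hypothetical, only the `cwd` key itself comes from this commit:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "."],
      "cwd": "/path/to/project",
      "env": {}
    }
  }
}
```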

loci-review bot commented Dec 24, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #679 - MCP Support Integration

Overview

This PR introduces Model Context Protocol support to llama-server through new WebSocket infrastructure, subprocess management, and CLI argument extensions. The changes add approximately 1,200 ns to application startup through argument parsing modifications in common/arg.cpp, specifically affecting the common_params_parser_init function. The implementation adds a new --mcp-config argument and associated validation logic while maintaining backward compatibility.

Key Findings

Argument Parsing Impact:
The most significant changes occur in lambda functions within common_params_parser_init. Lambda 104 shows a 1,164 microsecond increase in response time across llama-tts and llama-cvector-generator binaries, though its self-execution time decreased by 7 ns. Lambda 106 exhibits a 215 ns increase in self-execution time, while Lambda 125 adds 59 ns. These modifications stem from new MCP configuration validation logic, string operations for path handling, and environment variable processing. The call depth analysis reveals 20,000+ level stacks, indicating template instantiation overhead in the argument parsing framework.

Inference Performance:
No functions in the inference pipeline (llama_decode, llama_encode, llama_tokenize) show measurable changes. The modifications are isolated to initialization code paths, leaving the token generation performance unaffected. Expected tokens per second remains unchanged as the core inference functions maintain their baseline performance characteristics.

Power Consumption:
Binary-level analysis shows minimal impact: llama-tts decreased by 730 nJ (0.28%), while llama-cvector-generator increased by 112 nJ (0.044%). All other binaries show zero measurable change. The power consumption variations align with the startup-only nature of the modifications, confirming that runtime energy efficiency remains stable.

Code Changes:
The implementation adds MCP-specific argument handling through lambda closures that validate configuration file paths, parse environment variables, and integrate with the existing common_params structure. The new --mcp-config argument follows established patterns for file-based configuration, using the same validation approach as other file arguments like --model and --lora. The WebSocket server components (server-ws.cpp/h, server-mcp-bridge.cpp/h) operate independently of the inference pipeline, maintaining separation between tool calling infrastructure and model execution.


loci-review bot commented Dec 24, 2025

Explore the complete analysis inside the Version Insights

Performance Analysis Summary: PR #679 MCP Support

Overview

This PR adds Model Context Protocol (MCP) support to llama-server, introducing WebSocket infrastructure, subprocess management, and frontend UI components. The changes span 42 files with 5,857 additions and 127 deletions.

Key Findings

Performance-Critical Areas Impact

Inference Pipeline Functions:
No changes detected in core inference functions (llama_decode, llama_encode, llama_tokenize, llama_model_load_from_file, llama_kv_cache operations). The inference pipeline remains unmodified, resulting in zero impact on tokens per second for model inference workloads.

Argument Parsing Degradation:
The analysis identified severe response time increases in common_params_parser_init lambda operators within build.bin.llama-tts and build.bin.llama-cvector-generator:

  • Lambda at arg.cpp:2760:2769 increased from 1,811 ns to 1,165,496 ns (absolute change: +1,164,000 ns)
  • Lambda at arg.cpp:3193:3195 increased from 24 ns to 13,464 ns (absolute change: +13,440 ns)
  • Lambda at arg.cpp:2788:2790 increased from 22 ns to 237 ns (absolute change: +215 ns)

These functions handle CLI argument parsing during server initialization. The degradation affects startup time only, not runtime inference performance. The PR adds one new argument (--mcp-config) which contributes minimally to the existing systemic complexity in the argument parser infrastructure.

Server Infrastructure Changes:
New components introduced:

  • server-ws.cpp (835 lines): WebSocket server with custom SHA-1 and frame parsing, adds 1-2 ms per connection handshake
  • server-mproc.cpp (606 lines): Cross-platform subprocess management with 1-5 ms fork/exec overhead per MCP server
  • server-mcp-bridge.cpp (278 lines): Connection routing with sub-millisecond message forwarding

The server-http.cpp stop() method now includes thread.join() for proper cleanup, adding synchronization wait during shutdown only.

Power Consumption Analysis

Power consumption changes are minimal across all binaries:

  • build.bin.llama-tts: 260,709 nJ → 259,979 nJ (-0.28%, -730 nJ decrease)
  • build.bin.llama-cvector-generator: 255,801 nJ → 255,913 nJ (+0.04%, +112 nJ increase)
  • build.bin.llama-run: 223,113 nJ → 223,112 nJ (-0.00%)
  • Core libraries (libggml-base.so, libggml-cpu.so, libggml.so, libmtmd.so): No change

The power consumption variations are within measurement noise and do not indicate meaningful efficiency changes.

Architecture Impact

The PR introduces parallel execution paths for MCP tool calling that operate independently of the inference engine. WebSocket connections and MCP subprocesses run in separate threads, ensuring tool invocations do not block model inference. Memory overhead is approximately 50 KB base plus 25 KB per active MCP server connection.

@loci-dev loci-dev force-pushed the main branch 8 times, most recently from 15838f1 to 006b713 on December 24, 2025 at 23:08
@loci-dev loci-dev force-pushed the main branch 30 times, most recently from 07aff19 to 1f52e52 on January 2, 2026 at 20:09