Don't fail when mounted proxy servers are unreachable#2637

Closed
FaranIdo wants to merge 5 commits intoPrefectHQ:mainfrom
FaranIdo:fix/proxy-dead-server-multi-catch

@FaranIdo

When a FastMCP server has mounted proxy servers (via FastMCP.as_proxy()), a connection failure to any one remote server prevented tools, resources, and prompts from being listed or called, including those on the main server and on remotes that were still reachable. This happened because the proxy managers only caught METHOD_NOT_FOUND errors, not connection failures, which surface as RuntimeError from the Client.

The fix catches both McpError and RuntimeError in the proxy managers and server iteration loops, allowing the server to gracefully skip unavailable remotes while continuing to serve local and reachable components.

# Local tools now work even when a mounted proxy is unreachable
from fastmcp import Client, FastMCP
from fastmcp.client.transports import SSETransport

main_app = FastMCP("MainApp")

@main_app.tool
def local_tool() -> str:
    return "works"

# Proxy for a remote server that cannot be reached
dead_proxy = FastMCP.as_proxy(
    Client(transport=SSETransport("http://unreachable:9999")),
    name="dead_remote"
)
main_app.mount(dead_proxy)

# Previously: listing/calling tools would fail
# Now: local_tool is accessible, dead remote is skipped with a logged warning
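
The skip-on-failure behaviour described above can be illustrated without FastMCP at all. The following is a minimal sketch, not the code from this PR: `McpError`, `StaticProvider`, `UnreachableProvider`, and `list_all_tools` are hypothetical stand-ins defined only for this example.

```python
import logging

logger = logging.getLogger("proxy_sketch")


class McpError(Exception):
    """Stand-in for the MCP protocol error type (hypothetical, for illustration)."""


class StaticProvider:
    """Hypothetical provider that serves a fixed list of tool names."""

    def __init__(self, tools: list[str]) -> None:
        self._tools = tools

    def list_tools(self) -> list[str]:
        return list(self._tools)


class UnreachableProvider:
    """Hypothetical provider whose remote backend cannot be reached."""

    def list_tools(self) -> list[str]:
        raise RuntimeError("connection failed")


def list_all_tools(providers) -> list[str]:
    """Collect tools from every provider, skipping the ones that fail."""
    tools: list[str] = []
    for provider in providers:
        try:
            tools.extend(provider.list_tools())
        except (McpError, RuntimeError) as exc:
            # An unavailable remote is logged and skipped, not fatal.
            logger.warning("Skipping unavailable provider: %s", exc)
    return tools


print(list_all_tools([UnreachableProvider(), StaticProvider(["local_tool"])]))
```

The unreachable provider is listed first on purpose: the loop still reaches the working provider and returns its tools.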

@marvin-context-protocol marvin-context-protocol Bot added bug Something isn't working. Reports of errors, unexpected behavior, or broken functionality. server Related to FastMCP server implementation or server-side functionality. labels Dec 17, 2025
@marvin-context-protocol
Contributor

marvin-context-protocol Bot commented Dec 17, 2025

Test Failure Analysis

Summary: The Windows tests are failing because the new test file test_proxy_shared_prefix.py is missing the required Windows skip marker for tests that connect to unreachable servers.

Root Cause: The new test file contains 6 tests that intentionally connect to an unreachable server at http://127.0.0.1:9999/sse/ to verify graceful fallback behavior. However, on Windows, asyncio has known issues with networking timeouts when connecting to unreachable servers, causing these tests to hang or fail inconsistently.

The existing codebase already has this workaround in tests/server/test_mount.py:252-254:

@pytest.mark.skipif(
    sys.platform == "win32", reason="Windows asyncio networking timeouts."
)
async def test_mount_with_unreachable_proxy_servers(self, caplog):

Suggested Solution: Add the same Windows skip marker to all tests in the new tests/server/test_proxy_shared_prefix.py file. Add this import at the top:

import sys

Then add this decorator to each test method in the TestDeadProxySharedPrefix class:

@pytest.mark.skipif(
    sys.platform == "win32", reason="Windows asyncio networking timeouts."
)

This should be applied to all 6 test methods:

  • test_dead_proxy_first_same_prefix_tools (line 15)
  • test_dead_proxy_first_no_prefix_tools (line 47)
  • test_dead_proxy_first_same_prefix_resources (line 79)
  • test_dead_proxy_first_same_prefix_prompts (line 111)
  • test_dead_proxy_first_no_prefix_resources (line 143)
  • test_dead_proxy_first_no_prefix_prompts (line 175)
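
As an alternative to repeating the decorator on all six methods, pytest also supports a module-level `pytestmark` variable that applies a marker to every test in the file. This is a sketch of that option under the assumption that every test in the new file needs the skip; it is not what the existing tests/server/test_mount.py does.

```python
import sys

import pytest

# Module-level marker: pytest applies it to every test collected from this file,
# so the individual test methods do not need their own skipif decorator.
pytestmark = pytest.mark.skipif(
    sys.platform == "win32", reason="Windows asyncio networking timeouts."
)

# The existing TestDeadProxySharedPrefix class would follow here unchanged.
```

The per-method decorator stays the better choice if tests that should run on Windows are later added to the same file.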
Background: Why Windows Fails

Windows has a well-documented issue with asyncio when handling connection timeouts to unreachable servers. The problem is related to how Windows' proactor event loop handles socket operations and timeouts differently than Unix systems. This is why the FastMCP test suite already:

  1. Runs Windows tests without parallel execution (no -n auto flag in CI)
  2. Has explicit skip markers for unreachable server tests

See the CI workflow configuration:

  • Ubuntu: pytest -m "not integration and not client_process" --numprocesses auto
  • Windows: pytest -m "not integration and not client_process" (no parallelization)

Related Files
  • tests/server/test_proxy_shared_prefix.py - New test file that needs Windows skip markers (all 6 tests)
  • tests/server/test_mount.py:252-254 - Existing test with the correct Windows skip pattern
  • .github/workflows/run-tests.yml:55-59 - CI configuration showing Windows-specific test execution without parallelization

Updated to reflect latest workflow run failure analysis

@coderabbitai
Contributor

coderabbitai Bot commented Dec 17, 2025

Walkthrough

The change adds RuntimeError handlers across provider resolution and execution paths in the server module. Updated methods include get_tool, _get_resource_or_template_or_none (resource and template checks), get_resource, get_resource_template, get_prompt, _call_tool, _read_resource (resource and template reads), and _get_prompt. Each new except RuntimeError logs a warning when a provider is unavailable and continues searching other mounted providers or components. Existing NotFoundError handling and fallback behavior are preserved.

Possibly related PRs

  • jlowin/fastmcp PR 2622 — Modifies provider-resolution and provider-driven server logic in the same server file, touching the same get_tool, resource/prompt/template, _call_tool, and _read_resource flows.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main change: adding error handling to prevent failures when proxy servers are unreachable.
Description check ✅ Passed The description includes clear problem explanation, solution approach, code example, and addresses the technical context from PR #2635, though the contributors checklist is incomplete.
Docstring Coverage ✅ Passed Docstring coverage is 93.33% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5aa93a and 2b684e0.

⛔ Files ignored due to path filters (1)
  • tests/server/test_proxy_shared_prefix.py is excluded by none and included by none
📒 Files selected for processing (1)
  • src/fastmcp/server/server.py (10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/fastmcp/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/fastmcp/**/*.py: Write Python code with ≥3.10 type annotations required throughout
Never use bare except - be specific with exception types

Files:

  • src/fastmcp/server/server.py
🧬 Code graph analysis (1)
src/fastmcp/server/server.py (4)
src/fastmcp/server/context.py (1)
  • warning (457-471)
src/fastmcp/resources/resource.py (1)
  • key (280-282)
src/fastmcp/utilities/components.py (1)
  • key (69-77)
src/fastmcp/resources/template.py (1)
  • key (225-227)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run tests: Python 3.10 on ubuntu-latest
  • GitHub Check: Run tests: Python 3.13 on ubuntu-latest
  • GitHub Check: Run tests: Python 3.10 on windows-latest
  • GitHub Check: Run tests with lowest-direct dependencies
🔇 Additional comments (9)
src/fastmcp/server/server.py (9)

716-718: LGTM! RuntimeError handling for unreachable providers is correct.

The exception handler properly logs provider unavailability and continues to the next provider, allowing local tools and other providers to be tried.


760-765: LGTM! Logging has been added as requested.

The RuntimeError handlers now include warning logs for provider unavailability during resource and template resolution, addressing the past review comment and maintaining consistency with other handlers in the file.

Also applies to: 775-780


821-826: LGTM! Resource retrieval properly handles unavailable providers.

The exception handler follows the established pattern and allows the server to continue checking other providers when connection failures occur.


867-872: LGTM! Template retrieval properly handles unavailable providers.

The exception handling is consistent with other get_* methods and allows graceful fallback to other providers.


913-916: LGTM! Prompt retrieval properly handles unavailable providers.

The exception handler follows the established pattern and completes the RuntimeError handling across all component retrieval methods.


1565-1568: LGTM! Tool execution properly handles unavailable providers.

This RuntimeError handler in the execution path (not just retrieval) addresses the core issue described in the PR: preventing tool execution failures when a mounted proxy is unreachable.


1681-1684: LGTM! Resource read execution properly handles unavailable providers.

The exception handler in the resource read path ensures that resource operations can succeed by falling back to other providers when a proxy is unreachable.


1702-1707: LGTM! Resource template read execution properly handles unavailable providers.

The exception handler completes the RuntimeError handling for all resource read paths (both concrete resources and templates), ensuring consistent resilience.


1815-1818: LGTM! Prompt rendering execution properly handles unavailable providers.

The exception handler in the prompt render path completes the RuntimeError handling across all execution operations (tools, resources, templates, and prompts), as described in the PR objectives.


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/fastmcp/server/proxy.py (3)

111-122: Use from e instead of from None to preserve error context for debugging.

Using from None suppresses the exception chain, which makes it harder to debug why the proxy call failed. The original exception context (connection timeout, refused connection, etc.) would be valuable for troubleshooting.

             except (McpError, RuntimeError) as e:
                 logger.warning(f"Failed to call tool {key!r} from proxy: {e}")
-                raise NotFoundError(f"Tool {key!r} not found") from None
+                raise NotFoundError(f"Tool {key!r} not found") from e

189-215: Consider using from e for exception chaining consistency.

Similar to the tool call path, the exception chain is suppressed with from None. For consistency and debuggability, consider preserving the chain.

             except (McpError, RuntimeError) as e:
                 logger.warning(f"Failed to read resource {uri!r} from proxy: {e}")
-                raise NotFoundError(f"Resource {uri!r} not found") from None
+                raise NotFoundError(f"Resource {uri!r} not found") from e

261-273: Consider using from e for exception chaining.

Same recommendation as other paths - preserving the exception chain aids debugging.

             except (McpError, RuntimeError) as e:
                 logger.warning(f"Failed to get prompt {name!r} from proxy: {e}")
-                raise NotFoundError(f"Prompt {name!r} not found") from None
+                raise NotFoundError(f"Prompt {name!r} not found") from e
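
The difference between the two forms can be seen in a small standalone sketch. `NotFoundError` here is a local stand-in class and `lookup` / `cause_of` are hypothetical helpers written for this illustration, not code from proxy.py.

```python
from __future__ import annotations


class NotFoundError(Exception):
    """Stand-in for the library's not-found error (defined here for illustration)."""


def lookup(chain: bool) -> None:
    """Simulate a failed proxy call that is re-raised as NotFoundError."""
    try:
        raise RuntimeError("connection refused")
    except RuntimeError as e:
        if chain:
            # `from e` records the original error as __cause__.
            raise NotFoundError("Tool 'x' not found") from e
        # `from None` clears __cause__ and hides the context in tracebacks.
        raise NotFoundError("Tool 'x' not found") from None


def cause_of(chain: bool) -> BaseException | None:
    """Return the __cause__ attached to the NotFoundError raised by lookup()."""
    try:
        lookup(chain)
    except NotFoundError as err:
        return err.__cause__
    return None


print(repr(cause_of(chain=True)))   # RuntimeError('connection refused')
print(repr(cause_of(chain=False)))  # None
```

With `from e`, a traceback of the NotFoundError also shows the underlying connection error, which is the debugging benefit the nitpicks refer to.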
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e5a8fe and 89ab410.

⛔ Files ignored due to path filters (2)
  • tests/server/test_mount.py is excluded by none and included by none
  • tests/test_proxy_dead_server.py is excluded by none and included by none
📒 Files selected for processing (2)
  • src/fastmcp/server/proxy.py (7 hunks)
  • src/fastmcp/server/server.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/fastmcp/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/fastmcp/**/*.py: Write Python code with ≥3.10 type annotations required throughout
Never use bare except - be specific with exception types

Files:

  • src/fastmcp/server/server.py
  • src/fastmcp/server/proxy.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: jlowin/fastmcp PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T03:06:14.522Z
Learning: Applies to src/fastmcp/**/*.py : Never use bare except - be specific with exception types
📚 Learning: 2025-12-17T03:06:14.522Z
Learnt from: CR
Repo: jlowin/fastmcp PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T03:06:14.522Z
Learning: Applies to src/fastmcp/**/*.py : Never use bare except - be specific with exception types

Applied to files:

  • src/fastmcp/server/server.py
📚 Learning: 2025-12-17T03:06:14.522Z
Learnt from: CR
Repo: jlowin/fastmcp PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-17T03:06:14.522Z
Learning: Applies to src/fastmcp/**/__init__.py : Be intentional about re-exports; core types that define a module's purpose should be exported; specialized features can live in submodules; only re-export to fastmcp.* for fundamental types

Applied to files:

  • src/fastmcp/server/proxy.py
🧬 Code graph analysis (2)
src/fastmcp/server/server.py (1)
src/fastmcp/exceptions.py (1)
  • NotFoundError (34-35)
src/fastmcp/server/proxy.py (3)
src/fastmcp/resources/resource.py (2)
  • key (276-283)
  • ResourceContent (35-130)
src/fastmcp/exceptions.py (2)
  • NotFoundError (34-35)
  • ResourceError (14-15)
src/fastmcp/resources/resource_manager.py (1)
  • read_resource (289-330)
🪛 Ruff (0.14.8)
src/fastmcp/server/server.py

1873-1873: Consider moving this statement to an else block

(TRY300)

src/fastmcp/server/proxy.py

122-122: Avoid specifying long messages outside the exception class

(TRY003)


194-196: Avoid specifying long messages outside the exception class

(TRY003)


210-212: Avoid specifying long messages outside the exception class

(TRY003)


215-215: Avoid specifying long messages outside the exception class

(TRY003)


273-273: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run tests: Python 3.13 on ubuntu-latest
  • GitHub Check: Run tests with lowest-direct dependencies
  • GitHub Check: Run tests: Python 3.10 on ubuntu-latest
  • GitHub Check: Run tests: Python 3.10 on windows-latest
🔇 Additional comments (7)
src/fastmcp/server/server.py (4)

970-971: Appropriate exception broadening for proxy resilience.

The addition of McpError and RuntimeError to the catch clause allows the tool lookup to gracefully return None when a mounted proxy is unreachable. This aligns with the method's purpose of returning None if not found.


1743-1744: Correct exception handling for mounted server resilience.

Catching (NotFoundError, McpError, RuntimeError) enables the tool call to continue searching other mounted servers when one fails. This is essential for the PR objective of not failing when mounted proxies are unreachable.


1866-1875: Resource resolution correctly handles proxy failures.

The restructured code now catches (NotFoundError, McpError, RuntimeError) after attempting to read from mounted servers. The static analysis hint about moving to an else block (TRY300) can be safely ignored here since the flow is clear and the return result on line 1873 correctly exits on success.


2043-2044: Prompt resolution correctly handles proxy failures.

Consistent with the tool and resource resolution changes, this allows prompt lookup to continue searching other mounted servers when one fails due to connection issues.

src/fastmcp/server/proxy.py (3)

89-91: LGTM! Graceful degradation for tool listing.

Catching (McpError, RuntimeError) and logging a warning allows the proxy to continue serving local/mounted tools when the remote backend is unreachable.


147-148: Consistent warning logging for resource and template listing failures.

The pattern of catching (McpError, RuntimeError) and logging warnings is applied consistently across resource and template listing operations.

Also applies to: 167-168


240-241: Consistent warning logging for prompt listing failures.

Follows the same pattern as other listing operations.

Catch RuntimeError (connection failures) in provider iteration loops
and continue to next provider instead of failing. This allows working
servers to handle requests even when a dead proxy with shared/no prefix
is mounted first.

Fixed locations:
- get_tool, get_resource, get_resource_template, get_prompt
- _call_tool, _read_resource (concrete and templates), _get_prompt_content
- _get_resource_or_template_or_none

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@FaranIdo FaranIdo force-pushed the fix/proxy-dead-server-multi-catch branch from d637048 to d9c3a50 Compare December 18, 2025 09:52
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d637048 and d9c3a50.

⛔ Files ignored due to path filters (1)
  • tests/server/test_proxy_shared_prefix.py is excluded by none and included by none
📒 Files selected for processing (1)
  • src/fastmcp/server/server.py (10 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/fastmcp/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/fastmcp/**/*.py: Write Python code with ≥3.10 type annotations required throughout
Never use bare except - be specific with exception types

Files:

  • src/fastmcp/server/server.py
🧠 Learnings (1)
📚 Learning: 2025-11-26T21:51:44.174Z
Learnt from: CR
Repo: jlowin/fastmcp PR: 0
File: .cursor/rules/core-mcp-objects.mdc:0-0
Timestamp: 2025-11-26T21:51:44.174Z
Learning: Review and update related Manager classes (ToolManager, ResourceManager, PromptManager) when modifying MCP object definitions

Applied to files:

  • src/fastmcp/server/server.py
🧬 Code graph analysis (1)
src/fastmcp/server/server.py (4)
src/fastmcp/server/context.py (1)
  • warning (457-471)
src/fastmcp/resources/resource.py (1)
  • key (276-283)
src/fastmcp/resources/template.py (1)
  • key (221-228)
src/fastmcp/utilities/components.py (1)
  • key (66-73)
🪛 GitHub Actions: Run static analysis
src/fastmcp/server/server.py

[error] 1-1: Process completed with exit code 1 during ruff-format hook.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run tests: Python 3.10 on ubuntu-latest
  • GitHub Check: Run tests with lowest-direct dependencies
  • GitHub Check: Run tests: Python 3.10 on windows-latest
  • GitHub Check: Run tests: Python 3.13 on ubuntu-latest
🔇 Additional comments (2)
src/fastmcp/server/server.py (2)

1003-1006: LGTM! Consistent error handling across provider operations.

These RuntimeError handlers follow a consistent pattern:

  • Catch the exception with proper type annotation
  • Log a descriptive warning with context
  • Continue to the next provider gracefully

This allows the server to remain operational when mounted proxies are unreachable, which addresses the PR objective.

Also applies to: 1047-1052, 1093-1096, 1616-1619, 1732-1735, 1753-1758, 1866-1869


904-906: RuntimeError handlers should be consistent with logging across all providers.

The RuntimeError exception is the correct choice here—it's what the MCP client library intentionally raises for server connection/availability failures. However, most handlers log warnings (e.g., lines 904–906, 1003–1006) while some don't (lines 948–950, 960–962). Add logging to the handlers in _get_resource_or_template_or_none for consistency:

except RuntimeError:
    # Connection failures (e.g., dead proxy) - continue to next provider
    logger.warning(f"Provider unavailable when getting resource/template {uri!r}")
    continue

Comment thread src/fastmcp/server/server.py Outdated
@FaranIdo
Author

This fix is still needed after the Provider refactoring in #2635.

That PR handles exceptions for listing operations (list_tools, list_resources, list_prompts): if a provider is unavailable, it gets skipped. But execution operations like call_tool, read_resource, and get_prompt don't have the same handling.

The issue shows up when a dead proxy is mounted before a working server with the same prefix (or no prefix):

main_app.mount(dead_proxy, "shared")      # first, has no tools
main_app.mount(working_server, "shared")  # second, has "my_tool"

await client.list_tools()                  # works - returns ["shared_my_tool"]
await client.call_tool("shared_my_tool")   # fails!

Even though my_tool only exists in working_server, the call fails because both providers share the same prefix. dead_proxy gets checked first, MountedProvider.get_tool() tries to connect to see if the tool exists there, and the resulting RuntimeError bubbles up before working_server is ever checked.

The fix catches RuntimeError in provider iteration loops and continues to the next provider, same as listing operations already do.
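
A minimal sketch of that execution-path loop, assuming providers are tried in mount order. `DeadProvider`, `WorkingProvider`, the local `NotFoundError`, and this `call_tool` function are hypothetical stand-ins for illustration and do not reproduce the actual server.py code.

```python
import asyncio
import logging

logger = logging.getLogger("mount_sketch")


class NotFoundError(Exception):
    """Stand-in not-found error (defined here for illustration)."""


class DeadProvider:
    """Hypothetical provider mounted first whose backend is unreachable."""

    async def call_tool(self, name: str) -> str:
        raise RuntimeError("Client failed to connect")


class WorkingProvider:
    """Hypothetical provider mounted second that owns 'shared_my_tool'."""

    async def call_tool(self, name: str) -> str:
        if name != "shared_my_tool":
            raise NotFoundError(name)
        return "ok"


async def call_tool(providers, name: str) -> str:
    """Try each provider in mount order; skip missing tools and dead providers."""
    for provider in providers:
        try:
            return await provider.call_tool(name)
        except NotFoundError:
            continue
        except RuntimeError as exc:
            # Connection failure: log it and move on to the next provider.
            logger.warning("Provider unavailable when calling %r: %s", name, exc)
            continue
    raise NotFoundError(f"Unknown tool: {name!r}")


result = asyncio.run(call_tool([DeadProvider(), WorkingProvider()], "shared_my_tool"))
print(result)
```

Without the `except RuntimeError` branch, the first provider's connection error would propagate and the second provider would never be consulted, which is the failure described above.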

@FaranIdo
Author

@jlowin, any chance you can please review this?

@jlowin
Member

jlowin commented Dec 21, 2025

@FaranIdo we are in the middle of a comprehensive backend refactor that affects how both mounts and proxies work. please be patient.

@jlowin
Member

jlowin commented Dec 26, 2025

I believe this is already handled automatically during provider iteration now; the results are gathered with exceptions suppressed so they can be logged without interrupting execution.
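
One common way to gather results with exceptions suppressed is `asyncio.gather(..., return_exceptions=True)`. The sketch below shows that general technique only; whether the refactored FastMCP backend uses exactly this call is an assumption, and `healthy`, `unreachable`, and `gather_tools` are hypothetical names.

```python
import asyncio
import logging

logger = logging.getLogger("gather_sketch")


async def healthy() -> list[str]:
    """Hypothetical provider call that succeeds."""
    return ["my_tool"]


async def unreachable() -> list[str]:
    """Hypothetical provider call that fails to connect."""
    raise RuntimeError("connection failed")


async def gather_tools(calls) -> list[str]:
    """Await all provider calls; log failures instead of raising them."""
    # return_exceptions=True delivers exceptions as results instead of raising.
    results = await asyncio.gather(*calls, return_exceptions=True)
    tools: list[str] = []
    for result in results:
        if isinstance(result, BaseException):
            logger.warning("Provider failed: %s", result)
            continue
        tools.extend(result)
    return tools


tools = asyncio.run(gather_tools([unreachable(), healthy()]))
print(tools)
```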

@jlowin jlowin closed this Dec 26, 2025
Labels

bug Something isn't working. Reports of errors, unexpected behavior, or broken functionality. server Related to FastMCP server implementation or server-side functionality.

2 participants