Add ResponseLimitingMiddleware for tool response size control #3072

Merged
jlowin merged 1 commit into PrefectHQ:main from dgenio:feature/response-limiting-middleware-2004 on Feb 6, 2026

Conversation

@dgenio
Contributor

@dgenio dgenio commented Feb 4, 2026

Large tool responses can overwhelm LLM context windows or cause memory issues. This PR adds middleware to enforce configurable size limits on tool outputs, with intelligent handling of structured vs unstructured responses.

The middleware truncates text responses that exceed the limit while preserving UTF-8 character boundaries. For structured responses (tools with output_schema returning complex objects), it raises a ToolError since truncation would corrupt the schema. Both behaviors are configurable.

from fastmcp.server.middleware.response_limiting import ResponseLimitingMiddleware

# Limit all tool responses to 500KB
mcp.add_middleware(ResponseLimitingMiddleware(max_size=500_000))

# Limit only specific tools, raise errors instead of truncating
mcp.add_middleware(ResponseLimitingMiddleware(
    max_size=100_000,
    tools=["search", "fetch_data"],
    raise_on_unstructured=True,
))

Key features:

  • Configurable size limit (default 1MB)
  • Tool-specific filtering via tools parameter
  • UTF-8 safe truncation with customizable suffix
  • Size metadata added to result's meta field for monitoring
  • Configurable raise_on_structured and raise_on_unstructured behavior

Closes #2004

@marvin-context-protocol Bot added labels Feb 4, 2026: enhancement (Improvement to existing functionality. For issues and smaller PR improvements.) and server (Related to FastMCP server implementation or server-side functionality.)

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7d3ba5fbb4


@coderabbitai
Contributor

coderabbitai Bot commented Feb 4, 2026

Walkthrough

Adds a new ResponseLimitingMiddleware and docs. The middleware enforces a configurable byte max_size per tool or globally (optional tools list). On tool calls it measures the JSON-serialized result size; if over the limit it logs a warning and returns a truncated ToolResult with a single TextContent. Truncation concatenates TextContent blocks or falls back to serialized text, appends a configurable truncation_suffix, preserves UTF-8 boundaries, and handles the case where only the suffix fits. The class is publicly exported.
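For illustration, the UTF-8-safe truncation rule described above amounts to something like the following sketch (the function name and structure are illustrative, not the middleware's actual internals):

def truncate_utf8(text: str, max_bytes: int, suffix: str) -> str:
    """Truncate text so its UTF-8 encoding fits max_bytes, appending suffix."""
    budget = max_bytes - len(suffix.encode("utf-8"))
    if budget <= 0:
        # The "only the suffix fits" case mentioned above.
        return suffix
    encoded = text.encode("utf-8")
    if len(encoded) <= budget:
        return text
    # Slicing bytes may split a multi-byte character; errors="ignore" drops the
    # partial trailing sequence, keeping the result valid UTF-8.
    return encoded[:budget].decode("utf-8", errors="ignore") + suffix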

🚥 Pre-merge checks: 3 passed, 2 failed

❌ Failed checks (1 warning, 1 inconclusive)
  • Description check (⚠️ Warning): The PR description lacks the required Contributors and Review checklists from the template, making it incomplete. Resolution: add the Contributors Checklist and Review Checklist sections from the template, including all checkbox items and marking completed items as checked.
  • Linked Issues check (❓ Inconclusive): The PR description mentions features like raise_on_structured, raise_on_unstructured, and size metadata that differ from the final implementation noted in comments. Resolution: confirm whether raise_on_structured, raise_on_unstructured, and meta field features are actually implemented, or remove them from the description.

✅ Passed checks (3 passed)
  • Title check: The title clearly and concisely describes the main change: adding ResponseLimitingMiddleware for tool response size control.
  • Out of Scope Changes check: All changes (middleware implementation, documentation, and exports) are directly related to implementing the response limiting middleware feature specified in issue #2004.
  • Docstring Coverage: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/fastmcp/server/middleware/response_limiting.py (2)

87-114: Silent exception handling may obscure serialization issues.

Both _measure_size and _estimate_content_size catch broad exceptions silently. While this provides resilience, it can hide unexpected serialization problems. Consider logging at DEBUG level for troubleshooting.

🔧 Suggested improvement
     def _measure_size(self, result: ToolResult) -> int:
         """Measure the serialized size of a ToolResult in bytes."""
         try:
             serialized = pydantic_core.to_json(result, fallback=str)
             return len(serialized)
-        except Exception:
+        except Exception as e:
             # Fallback: estimate from content
+            self._logger.debug(f"Serialization failed, using estimate: {e}")
             return self._estimate_content_size(result)

     # ... and in _estimate_content_size:
             try:
                 structured_bytes = pydantic_core.to_json(
                     result.structured_content, fallback=str
                 )
                 total += len(structured_bytes)
-            except Exception:
-                pass
+            except Exception as e:
+                self._logger.debug(f"Could not serialize structured_content: {e}")
         return total

254-258: Consider using specific type annotation for better type safety.

The base class uses MiddlewareContext[mt.CallToolRequestParams] for on_call_tool. Using Any here loses type information that could catch errors at static analysis time.

🔧 Suggested improvement
+import mcp.types as mt
+
 ...
 
     async def on_call_tool(
         self,
-        context: MiddlewareContext[Any],
+        context: MiddlewareContext[mt.CallToolRequestParams],
         call_next: CallNext[Any, ToolResult],
     ) -> ToolResult:

@marvin-context-protocol
Contributor

Test Failure Analysis

Summary: The static analysis workflow failed due to code formatting and linting issues in the test file.

Root Cause: The test file has:

  1. An unused import: CallToolResult from fastmcp.client.client (not used anywhere in the tests)
  2. Function signature formatting that doesn't match the project's style (ruff format requirements)

Suggested Solution: Apply the formatting fixes that ruff/prek identified. Run locally:

uv run prek run --all-files

This will automatically fix:

  • Remove the unused CallToolResult import on line 11
  • Reformat two function signatures to match code style

After running this command, the changes should be committed and pushed to pass CI.

Detailed Analysis

The prek pre-commit hooks ran with --all-files and found issues:

Ruff check: Found 1 error (auto-fixed)

  • Unused import: from fastmcp.client.client import CallToolResult

Ruff format: Reformatted 1 file

  • Function signature formatting for test_custom_truncation_suffix (lines 153-155)
  • Function signature formatting for test_size_meta_added_to_result (lines 176-178)

The exact diff needed:

-from fastmcp.client.client import CallToolResult

-    async def test_custom_truncation_suffix(
-        self, mcp_server: FastMCP, large_text: str
-    ):
+    async def test_custom_truncation_suffix(self, mcp_server: FastMCP, large_text: str):

-    async def test_size_meta_added_to_result(self, mcp_server: FastMCP, small_text: str):
+    async def test_size_meta_added_to_result(
+        self, mcp_server: FastMCP, small_text: str
+    ):
Related Files
  • tests/server/middleware/test_response_limiting.py - Contains the formatting issues that need to be fixed
  • .pre-commit-config.yaml - Defines the ruff and formatting hooks that enforce code style

@marvin-context-protocol
Contributor

marvin-context-protocol Bot commented Feb 4, 2026

Test Failure Analysis

Summary: Test test_multiple_content_blocks fails because the truncation logic doesn't handle multiple TextContent blocks correctly, resulting in an incorrect "Binary content cannot be truncated" error.

Root Cause: The _truncate_text_content method in response_limiting.py has a logic flaw on lines 166-194. When it encounters multiple TextContent blocks:

  1. It truncates the first TextContent block and sets truncated=True
  2. Lines 190-192 skip all remaining content after truncation:
    elif truncated:
        # Skip remaining content after truncation
        continue
  3. This means subsequent TextContent blocks (like "Second block" in the test) are completely excluded
  4. However, the JSON serialization overhead means the result (151 bytes) still exceeds the limit (100 bytes)
  5. The check on lines 322-333 then raises a ToolError with the message "Binary content cannot be truncated" - but there's no binary content, just multiple text blocks that weren't fully processed

Suggested Solution: Fix the _truncate_text_content method to properly handle multiple TextContent blocks:

  1. Option A (Recommended): Remove or modify lines 190-192 to keep iterating through remaining blocks. After truncating the first block, continue processing other TextContent blocks (they should be excluded from new_content to stay under the limit, but the logic should be clearer).

  2. Option B: Update the error message on line 330-332 to be more accurate. The error should distinguish between "still over limit due to binary content" vs "still over limit due to JSON overhead/multiple blocks".

  3. Update the test expectations: The test on line 339-356 expects truncation to succeed, but with multiple text blocks and such a small limit (100 bytes), it may be more realistic to either:

    • Increase the limit to accommodate the JSON overhead
    • Or expect the test to raise an error

Files to modify:

  • src/fastmcp/server/middleware/response_limiting.py: Fix the truncation logic around lines 166-194
  • tests/server/middleware/test_response_limiting.py: Either adjust test expectations or increase the max_size parameter

Update (2026-02-04 16:02 UTC): This failure persists after commit 918c3b5. The test continues to fail across all Python versions (3.10, 3.13) and platforms (ubuntu, windows). No code changes have been made to address the root cause since my initial analysis.

Detailed Analysis

Test Setup (line 339-356):

mcp.add_middleware(ResponseLimitingMiddleware(max_size=100))

@mcp.tool()
def multi_block() -> ToolResult:
    return ToolResult(
        content=[
            TextContent(type="text", text="First block " + "x" * 500),
            TextContent(type="text", text="Second block"),
        ]
    )

Actual Result:

  • Original size: 691 bytes
  • After truncation: 151 bytes (still exceeds 100 byte limit)
  • Error: "Response size (151 bytes) still exceeds limit (100 bytes) after truncation. Binary content cannot be truncated."

Why it fails:
The second TextContent block ("Second block") is being skipped entirely due to the truncated=True check, but the JSON serialization of the ToolResult structure itself adds overhead that keeps the result above 100 bytes even with just the truncated first block.

Related Files
  • src/fastmcp/server/middleware/response_limiting.py: Lines 153-211 (_truncate_text_content method)
  • src/fastmcp/server/middleware/response_limiting.py: Lines 322-333 (size check after truncation)
  • tests/server/middleware/test_response_limiting.py: Lines 339-356 (failing test)

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

@marvin-context-protocol
Contributor

Test Failure Analysis

Summary: The failing test test_uv_transport is unrelated to this PR's changes and appears to be a flaky infrastructure test.

Root Cause: The test_uv_transport test is timing out while waiting for a UV subprocess to complete. The test spawns a temporary FastMCP server using UvStdioTransport, which:

  1. Creates a virtual environment in a temp directory
  2. Installs dependencies (including fastmcp) via uv
  3. Runs the server and makes a test tool call

The timeout occurs during subprocess cleanup after the test completes successfully (the assertion at line 54 passes). The stack trace shows the process is hanging in os.waitpid() during transport cleanup.

Why This Isn't Related to Your Changes:

  • Your PR only adds response limiting middleware in src/fastmcp/server/middleware/response_limiting.py
  • The failing test (tests/client/transports/test_uv_transport.py) tests client transport functionality, not middleware
  • The test has passed in 3 out of 4 workflow runs on this PR branch
  • This is a known flaky test pattern - client process tests are run serially with -x flag specifically because they can be unstable

Suggested Solution: Re-run the failed workflow. This is a transient infrastructure issue, not a code problem. The test is flaky due to process cleanup timing in the CI environment.

Detailed Analysis

Log Evidence

The test successfully completes its work:

  • Server starts and connects: DEBUG Stdio transport connected
  • Dependencies install successfully
  • The test assertion would have passed (no assertion failure shown)

The timeout occurs during teardown:

Stack of waitpid-0 (140628828088000)
File "/asyncio/unix_events.py", line 1392, in _do_waitpid
    pid, status = os.waitpid(expected_pid, 0)
Failed: Timeout (>10.0s) from pytest-timeout.

Evidence of Flakiness

  1. This PR's workflow runs: 3 successes, 1 failure (this one)
  2. Test design: Marked as @pytest.mark.client_process and runs with -x flag (stop on first failure), indicating known fragility
  3. Main branch status: Recent runs mostly pass, with occasional unrelated failures
Related Files
  • tests/client/transports/test_uv_transport.py:54 - Test that timed out (unrelated to PR changes)
  • src/fastmcp/server/middleware/response_limiting.py - New middleware (what this PR actually changes)
  • tests/server/middleware/test_response_limiting.py - Tests for the new middleware (all passing)

Recommendation: ✅ Safe to ignore this failure and merge once re-run succeeds. The PR code is solid.

@marvin-context-protocol
Contributor

Test Failure Analysis

Summary: The Windows Python 3.10 test is timing out on test_run_mcp_config, but this failure is not related to the ResponseLimitingMiddleware changes in this PR.

Root Cause: The failing test (tests/cli/test_run.py::TestMCPConfig::test_run_mcp_config) creates a temporary FastMCP server and attempts to connect to it via stdio transport. The test times out after 5 seconds on Windows while waiting for the stdio connection to establish. This is a pre-existing Windows-specific flakiness in the test infrastructure, not a bug introduced by this PR.

Evidence:

  1. The ResponseLimitingMiddleware is not auto-imported (not in src/fastmcp/server/middleware/__init__.py)
  2. No existing code imports or uses the new middleware
  3. The middleware only affects tool call responses after they execute, not server startup or stdio connections
  4. The timeout occurs during server connection setup, before any tools are called

Suggested Solution:

The PR changes are sound. The test failure is a known Windows flakiness issue. You have two options:

  1. Merge as-is (recommended) - The failure is unrelated to this PR's functionality
  2. Re-run the tests - Sometimes Windows stdio tests pass on retry due to timing variations

The test_run_mcp_config test should be investigated separately as it appears to have Windows-specific issues with subprocess stdio communication.

Detailed Analysis

The timeout occurs in this test flow:

  1. Test creates a temp Python file with a FastMCP server (line 101-115)
  2. Creates an MCP config pointing to that file (line 119-124)
  3. Calls create_mcp_config_server which tries to spawn the server process
  4. Attempts stdio connection via Client (line 128-132)
  5. Timeout occurs at 5 seconds - the stdio connection never completes on Windows

From the logs:

timeout: 5.0s
timeout method: thread
timeout func_only: False
collected 4073 items / 17 deselected / 4056 selected

tests\cli\test_run.py ........+++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++
~~~~~~~~~~~~~~~~~~~~~ Stack of AnyIO worker thread (4940) ~~~~~~~~~~~~~~~~~~~~~
...
DEBUG    Stdio transport connected             stdio.py:191

The test gets stuck after "Stdio transport connected" but before the actual tool listing completes.

Related Files

Changed by this PR (all unrelated to the failure):

  • src/fastmcp/server/middleware/response_limiting.py - New middleware (not imported anywhere)
  • tests/server/middleware/test_response_limiting.py - Tests for the middleware (passing)
  • docs/servers/middleware.mdx - Documentation

Failing test:

  • tests/cli/test_run.py:99-133 - test_run_mcp_config - Tests MCP config file server creation (Windows stdio flakiness)

Member

@jlowin jlowin left a comment


Thanks @dgenio! This is a great idea: response size limiting is a natural fit for the middleware system, and the structured vs. unstructured distinction is the right design axis. A few things to address from automated review before merge:

Truncation math mismatch

_truncate_text_content computes excess = current_size - self.max_size where current_size is the full serialized ToolResult (JSON envelope, content array, keys, etc.), but then subtracts excess + suffix_bytes from the raw text byte length. These aren't in the same coordinate space — the JSON overhead doesn't cancel out cleanly, so for small max_size values the truncated result could still exceed the limit. The post-truncation size check catches this and raises an error, but it means truncation silently degrades to an error for small limits. Consider accounting for serialization overhead in the truncation target, or iteratively measure-and-trim. (or document the gap, which IMO is acceptable)
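To make the coordinate-space mismatch concrete, here is a stand-in using json instead of pydantic_core (the real ToolResult envelope differs, but the shape of the problem is the same):

import json

text = "x" * 100
raw_bytes = len(text.encode("utf-8"))  # 100 bytes of raw text
envelope = json.dumps({"content": [{"type": "text", "text": text}]})
print(raw_bytes, len(envelope))  # 100 vs 143: the wrapper adds 43 bytes
# An excess computed from the 143-byte serialized size, subtracted from the
# 100-byte raw text, mixes the two spaces and over- or under-trims.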

Type safety in on_call_tool

The base class uses MiddlewareContext[mt.CallToolRequestParams] and CallNext[mt.CallToolRequestParams, ToolResult] — this PR uses Any for both, which loses static type checking. Should match the base class signature.

hasattr/getattr in _estimate_content_size

elif hasattr(block, "data") and self.include_binary:
    data = getattr(block, "data", None)

Project convention is isinstance checks with type narrowing. Use isinstance(block, ImageContent) (and EmbeddedResource if needed).

Wrapped-result detection is fragile

_has_structured_content inspects the dict shape ({"result": <value>}) to detect FastMCP's x-fastmcp-wrap-result pattern. Any user tool that naturally returns {"result": "some_string"} as structured output would be misidentified as unstructured and truncated.

The tool's output_schema is actually reachable through public API — context.fastmcp_context.fastmcp.get_tool(name) returns the Tool object, which has output_schema with the x-fastmcp-wrap-result key when wrapping is in play. Something like:

ctx = context.fastmcp_context
if ctx:
    tool = await ctx.fastmcp.get_tool(tool_name)
    if tool and tool.output_schema:
        is_wrapped = tool.output_schema.get("x-fastmcp-wrap-result", False)

This would make the detection reliable instead of inferring from dict shape.

Test assertion too loose

assert len(result.content[0].text.encode("utf-8")) <= max_size + 500

500-byte tolerance on a 1000-byte limit means the response could be 50% over. The middleware should guarantee the serialized result fits within max_size, and the test should verify that.
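e.g., measuring the same way the middleware does (result and max_size as in the test above, pydantic_core serialization assumed):

import pydantic_core

# Assert on the full serialized envelope, not just the text payload.
serialized = pydantic_core.to_json(result, fallback=str)
assert len(serialized) <= max_size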

Remove custom_logger and log_level parameters

The logging/timing middleware have these because logging is their purpose. For ResponseLimitingMiddleware, logging is incidental — just use the module-level logger at appropriate levels (WARNING for truncation events). Users can control it through standard Python logging config on the fastmcp.server.middleware.response_limiting logger. We don't want to establish a convention where every middleware that happens to emit a log line needs logger configurability.
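For example, standard logging configuration is enough to tune it:

import logging

# Silence (or surface) truncation warnings via the module-level logger.
logging.getLogger("fastmcp.server.middleware.response_limiting").setLevel(logging.ERROR)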

Minor

  • The binary search in _truncate_utf8 works, but text.encode('utf-8')[:max_bytes].decode('utf-8', errors='ignore') is simpler and just as correct.

Design is right and docs are well-written. Happy to re-review once these are addressed.

@marvin-context-protocol
Contributor

Test Failure Analysis

Summary: Test test_oidc_discovery_integration in tests/server/auth/providers/test_aws.py is timing out on Windows during SQLite database initialization in the DiskStore.

Root Cause: The test times out while initializing an AWSCognitoProvider, which internally creates a DiskStore for OAuth state storage. On Windows, SQLite operations can hang during concurrent database access due to file locking issues. The timeout occurs at diskcache/core.py:165 during con.execute(select).fetchall(), indicating the SQLite connection is blocked.

This is unrelated to the PR changes (which add ResponseLimitingMiddleware). This appears to be a flaky test that occasionally fails on Windows due to SQLite/diskcache timing issues.

Suggested Solution:

The test needs to be marked with a longer timeout or skipped on Windows. Add a pytest marker to tests/server/auth/providers/test_aws.py:88:

import sys
import pytest

class TestAWSCognitoProvider:
    # ... other tests ...
    
    @pytest.mark.timeout(15)  # Increase timeout for Windows SQLite operations
    @pytest.mark.skipif(sys.platform == "win32", reason="Flaky on Windows due to SQLite locking")
    def test_oidc_discovery_integration(self):
        """Test that OIDC discovery endpoints are used correctly."""
        # ... rest of test ...

Alternatively, if the test needs to run on Windows, consider using an in-memory key-value store instead of DiskStore for this specific test to avoid SQLite file locking issues.

Detailed Analysis

The timeout occurs during provider initialization:

File "D:\a\fastmcp\fastmcp\tests\server\auth\providers\test_aws.py", line 91, in test_oidc_discovery_integration
    provider = AWSCognitoProvider(...)
  File "D:\a\fastmcp\fastmcp\src\fastmcp\server\auth\providers\aws.py", line 151, in __init__
    super().__init__(...)
  File "D:\a\fastmcp\fastmcp\src\fastmcp\server\auth\oauth_proxy\proxy.py", line 413, in __init__
    key_value=DiskStore(directory=settings.home / "oauth-proxy"),
  File "D:\a\fastmcp\fastmcp\.venv\lib\site-packages\diskcache\core.py", line 591, in __init__
    self._sql  # pylint: disable=pointless-statement
  File "D:\a\fastmcp\fastmcp\.venv\lib\site-packages\diskcache\core.py", line 636, in _con
    settings = con.execute(select).fetchall()

The stack trace shows the process is stuck in SQLite's connection setup, waiting for a database lock. While the isolate_settings_home fixture in conftest.py:58-68 was added to prevent SQLite locking issues, it appears this particular test is still susceptible to timeouts on Windows.

This is a known issue with diskcache and SQLite on Windows - see python-diskcache Issue #85 for details about timeout issues during initialization from multiple threads/processes.

Related Files
  • tests/server/auth/providers/test_aws.py:88 - Failing test
  • src/fastmcp/server/auth/providers/aws.py:151 - AWSCognitoProvider initialization
  • src/fastmcp/server/auth/oauth_proxy/proxy.py:413 - DiskStore creation
  • tests/conftest.py:58-68 - isolate_settings_home fixture that attempts to prevent SQLite locking

@dgenio
Contributor Author

dgenio commented Feb 5, 2026

Thanks for the thorough review @jlowin! I've addressed all feedback in commit d6793bb:

  1. Truncation math - Switched to iterative measure-and-trim to account for JSON serialization overhead. The loop measures the actual serialized result after each truncation attempt (sketched after this list).

  2. Type safety - Fixed on_call_tool signature to use MiddlewareContext[mt.CallToolRequestParams] and CallNext[mt.CallToolRequestParams, ToolResult] instead of Any.

  3. isinstance checks - Replaced hasattr/getattr with proper isinstance checks for ImageContent, EmbeddedResource, TextResourceContents, and BlobResourceContents.

  4. Wrapped-result detection - Now uses context.fastmcp_context.fastmcp.get_tool(name) to reliably check output_schema.get("x-fastmcp-wrap-result") instead of inferring from dict shape.

  5. Test assertion - Tightened to verify actual serialized size fits within max_size (no more 500-byte tolerance).

  6. Removed logger params - Removed custom_logger and log_level; now uses module-level logger at WARNING level. Users can configure via standard Python logging on fastmcp.server.middleware.response_limiting.

  7. Simplified _truncate_utf8 - Replaced binary search with encoded[:max_bytes].decode('utf-8', errors='ignore').
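
For reference, the measure-and-trim loop from items 1 and 7 amounts to something like this sketch, with json standing in for the real ToolResult serialization (the helper names and envelope shape are illustrative):

import json

def measure_and_trim(text: str, max_size: int, suffix: str = "... [truncated]") -> str:
    """Trim raw text until the serialized envelope fits within max_size."""

    def envelope_size(t: str) -> int:
        # Stand-in for serializing the full ToolResult; the wrapper adds overhead.
        return len(json.dumps({"content": [{"type": "text", "text": t}]}).encode("utf-8"))

    candidate = text
    while candidate and envelope_size(candidate + suffix) > max_size:
        overshoot = envelope_size(candidate + suffix) - max_size
        encoded = candidate.encode("utf-8")
        # Dropping N raw bytes removes at least N serialized bytes, so this converges.
        candidate = encoded[: max(0, len(encoded) - overshoot)].decode("utf-8", errors="ignore")
    # If even the suffix alone exceeds max_size, the suffix is returned as-is.
    return candidate + suffix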

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1


Member

@jlowin jlowin left a comment


@dgenio sorry, in my previous review I focused on the correctness of the code but not on what it actually does: the design is way too complex for what this needs to do. This should be ~10 lines, not ~400.

The middleware currently maintains separate code paths for structured vs unstructured content, binary inclusion/exclusion, wrapped-value detection via schema inspection, iterative truncation loops with magic numbers, structured content syncing — all to preserve response structure during truncation. That's the wrong goal. If a response is too big, the right thing is to flatten it to text and truncate. There's definitionally no need to preserve the structure of something that's being truncated anyway.

The simpler approach:

async def on_call_tool(self, context, call_next):
    result = await call_next(context)
    
    if self.tools is not None and context.message.name not in self.tools:
        return result

    serialized = pydantic_core.to_json(result, fallback=str)
    if len(serialized) <= self.max_size:
        return result

    # Over limit: extract text, truncate, return single TextContent
    texts = [b.text for b in result.content if isinstance(b, TextContent)]
    text = "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
    return self._truncate_to_result(text)

This eliminates raise_on_structured, raise_on_unstructured, include_binary, _has_structured_content, _is_wrapped_simple_value, _estimate_content_size, the iterative truncation loop, the wrapped-value schema inspection, and the structured content syncing logic. The config surface drops to three parameters: max_size, truncation_suffix, tools.

The whole file should be the class with init, on_call_tool, and one helper to do the truncate-and-wrap math. Tests and docs simplify accordingly.
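
For illustration, the reduced surface would be configured like this (values are placeholders):

mcp.add_middleware(ResponseLimitingMiddleware(
    max_size=100_000,                     # byte limit on the serialized result
    truncation_suffix="\n\n[truncated]",  # appended when output is cut
    tools=["search"],                     # omit/None applies the limit to every tool
))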

@dgenio
Contributor Author

dgenio commented Feb 6, 2026

Completely rewrote per your feedback. The diff is now -664 / +119 lines from the previous version.

Core change: the entire on_call_tool is now ~15 lines — serialize, check size, extract text, truncate.

async def on_call_tool(self, context, call_next):
    result = await call_next(context)
    if self.tools is not None and context.message.name not in self.tools:
        return result
    serialized = pydantic_core.to_json(result, fallback=str)
    if len(serialized) <= self.max_size:
        return result
    texts = [b.text for b in result.content if isinstance(b, TextContent)]
    text = "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
    return self._truncate_to_result(text)

Removed: raise_on_structured, raise_on_unstructured, include_binary, structure preservation logic, wrapped-result detection, iterative truncation, schema inspection. Config is now just max_size, truncation_suffix, tools.

🤖 Generated with Claude

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
src/fastmcp/server/middleware/response_limiting.py (2)

68-89: Hardcoded overhead = 50 is fragile and the final result is never verified against max_size.

The JSON wrapper overhead depends on the ToolResult model's serialization (field names, default values like isError, potential meta, etc.) and can change across library versions. A hardcoded constant makes the guarantee that the truncated result fits within max_size unreliable.

A more robust approach: build the candidate ToolResult, serialize it, and if it still exceeds max_size, trim further in a loop (as the PR comments describe was intended).

Sketch of a verify-after-truncation approach
     def _truncate_to_result(self, text: str) -> ToolResult:
         """Truncate text to fit within max_size and wrap in ToolResult."""
         suffix_bytes = len(self.truncation_suffix.encode("utf-8"))
-        # Account for JSON wrapper overhead: {"content":[{"type":"text","text":"..."}]}
-        overhead = 50
-        target_size = self.max_size - suffix_bytes - overhead
-
-        if target_size <= 0:
-            # Edge case: max_size too small for even the suffix
-            truncated = self.truncation_suffix
-        else:
-            # Truncate to target size, preserving UTF-8 boundaries
-            encoded = text.encode("utf-8")
-            if len(encoded) <= target_size:
-                truncated = text + self.truncation_suffix
-            else:
-                truncated = (
-                    encoded[:target_size].decode("utf-8", errors="ignore")
-                    + self.truncation_suffix
-                )
-
-        return ToolResult(content=[TextContent(type="text", text=truncated)])
+        # Build an initial candidate, then measure and trim if needed
+        encoded = text.encode("utf-8")
+        # Start with a conservative estimate; refine via actual measurement
+        candidate_text = (
+            encoded[: max(0, self.max_size)].decode("utf-8", errors="ignore")
+            + self.truncation_suffix
+        )
+        for _ in range(5):  # bounded iterations
+            candidate = ToolResult(
+                content=[TextContent(type="text", text=candidate_text)]
+            )
+            serialized_size = len(pydantic_core.to_json(candidate, fallback=str))
+            if serialized_size <= self.max_size:
+                return candidate
+            # Trim further
+            overshoot = serialized_size - self.max_size
+            trim_target = len(candidate_text.encode("utf-8")) - overshoot - suffix_bytes
+            if trim_target <= 0:
+                candidate_text = self.truncation_suffix
+                break
+            candidate_text = (
+                candidate_text.encode("utf-8")[:trim_target].decode(
+                    "utf-8", errors="ignore"
+                )
+                + self.truncation_suffix
+            )
+        return ToolResult(content=[TextContent(type="text", text=candidate_text)])

116-123: Non-text content (images, embedded resources) is silently discarded without logging.

When the response exceeds max_size and contains mixed content types, only TextContent blocks are kept — everything else is dropped with no log entry or indication in the returned result. This could be surprising if a tool returns images alongside text.

Consider logging the count/types of dropped content blocks so operators can diagnose unexpected data loss.

Suggested enhancement
         texts = [b.text for b in result.content if isinstance(b, TextContent)]
+        dropped = [b for b in result.content if not isinstance(b, TextContent)]
+        if dropped:
+            logger.warning(
+                "Tool %r: dropping %d non-text content block(s) during truncation",
+                context.message.name,
+                len(dropped),
+            )
         text = (
             "\n\n".join(texts)
             if texts
             else serialized.decode("utf-8", errors="replace")
         )

@dgenio force-pushed the feature/response-limiting-middleware-2004 branch from 52a4a78 to 63993aa on February 6, 2026 12:13
@dgenio force-pushed the feature/response-limiting-middleware-2004 branch from 63993aa to aacc150 on February 6, 2026 13:08
@dgenio requested a review from jlowin on February 6, 2026 14:17
Member

@jlowin jlowin left a comment


Thank you!
