Add ResponseLimitingMiddleware for tool response size control #3072
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d3ba5fbb4
Walkthrough

Adds a new ResponseLimitingMiddleware and docs. The middleware enforces a configurable byte limit on tool responses.

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/fastmcp/server/middleware/response_limiting.py (2)
87-114: Silent exception handling may obscure serialization issues.

Both `_measure_size` and `_estimate_content_size` catch broad exceptions silently. While this provides resilience, it can hide unexpected serialization problems. Consider logging at DEBUG level for troubleshooting.

🔧 Suggested improvement

```diff
 def _measure_size(self, result: ToolResult) -> int:
     """Measure the serialized size of a ToolResult in bytes."""
     try:
         serialized = pydantic_core.to_json(result, fallback=str)
         return len(serialized)
-    except Exception:
+    except Exception as e:
         # Fallback: estimate from content
+        self._logger.debug(f"Serialization failed, using estimate: {e}")
         return self._estimate_content_size(result)
```

```diff
 try:
     structured_bytes = pydantic_core.to_json(
         result.structured_content, fallback=str
     )
     total += len(structured_bytes)
-except Exception:
-    pass
+except Exception as e:
+    self._logger.debug(f"Could not serialize structured_content: {e}")
 return total
```
254-258: Consider using a specific type annotation for better type safety.

The base class uses `MiddlewareContext[mt.CallToolRequestParams]` for `on_call_tool`. Using `Any` here loses type information that could catch errors at static analysis time.

🔧 Suggested improvement

```diff
+import mcp.types as mt
+
 ...
 async def on_call_tool(
     self,
-    context: MiddlewareContext[Any],
+    context: MiddlewareContext[mt.CallToolRequestParams],
     call_next: CallNext[Any, ToolResult],
 ) -> ToolResult:
```
Test Failure Analysis

Summary: The static analysis workflow failed due to code formatting and linting issues in the test file.

Root Cause: The test file has an unused import and function signatures that don't match the project's ruff formatting.

Suggested Solution: Apply the formatting fixes that ruff/prek identified. Run locally:

`uv run prek run --all-files`

This will automatically fix the unused import and the signature formatting. After running this command, the changes should be committed and pushed to pass CI.

Detailed Analysis

The prek pre-commit hooks ran with:
- Ruff check: Found 1 error (auto-fixed)
- Ruff format: Reformatted 1 file
The exact diff needed:

```diff
-from fastmcp.client.client import CallToolResult

-    async def test_custom_truncation_suffix(
-        self, mcp_server: FastMCP, large_text: str
-    ):
+    async def test_custom_truncation_suffix(self, mcp_server: FastMCP, large_text: str):

-    async def test_size_meta_added_to_result(self, mcp_server: FastMCP, small_text: str):
+    async def test_size_meta_added_to_result(
+        self, mcp_server: FastMCP, small_text: str
+    ):
```

Related Files
Test Failure Analysis

Summary: Test
Root Cause: The
Suggested Solution: Fix the
Files to modify:

Update (2026-02-04 16:02 UTC): This failure persists after commit 918c3b5. The test continues to fail across all Python versions (3.10, 3.13) and platforms (ubuntu, windows). No code changes have been made to address the root cause since my initial analysis.

Detailed Analysis

Test Setup (lines 339-356):

```python
mcp.add_middleware(ResponseLimitingMiddleware(max_size=100))

@mcp.tool()
def multi_block() -> ToolResult:
    return ToolResult(
        content=[
            TextContent(type="text", text="First block " + "x" * 500),
            TextContent(type="text", text="Second block"),
        ]
    )
```

Actual Result:
Why it fails:

Related Files
Test Failure Analysis

Summary: The failing test
Root Cause: The

The timeout occurs during subprocess cleanup after the test completes successfully (the assertion at line 54 passes). The stack trace shows the process is hanging in

Why This Isn't Related to Your Changes:

Suggested Solution: Re-run the failed workflow. This is a transient infrastructure issue, not a code problem. The test is flaky due to process cleanup timing in the CI environment.

Detailed Analysis

Log Evidence: The test successfully completes its work. The timeout occurs during teardown.

Evidence of Flakiness

Related Files

Recommendation: ✅ Safe to ignore this failure and merge once the re-run succeeds. The PR code is solid.
Test Failure Analysis

Summary: The Windows Python 3.10 test is timing out on
Root Cause: The failing test (
Evidence:

Suggested Solution: The PR changes are sound. The test failure is a known Windows flakiness issue. You have two options:

Detailed Analysis

The timeout occurs in this test flow:

From the logs: The test gets stuck after "Stdio transport connected" but before the actual tool listing completes.

Related Files

Changed by this PR (all unrelated to the failure):
Failing test:
Thanks @dgenio! This is a great idea: response size limiting is a natural fit for the middleware system, and the structured vs. unstructured distinction is the right design axis. A few things to address from automated review before merge:
Truncation math mismatch
_truncate_text_content computes excess = current_size - self.max_size where current_size is the full serialized ToolResult (JSON envelope, content array, keys, etc.), but then subtracts excess + suffix_bytes from the raw text byte length. These aren't in the same coordinate space — the JSON overhead doesn't cancel out cleanly, so for small max_size values the truncated result could still exceed the limit. The post-truncation size check catches this and raises an error, but it means truncation silently degrades to an error for small limits. Consider accounting for serialization overhead in the truncation target, or iteratively measure-and-trim. (or document the gap, which IMO is acceptable)
Type safety in on_call_tool
The base class uses MiddlewareContext[mt.CallToolRequestParams] and CallNext[mt.CallToolRequestParams, ToolResult] — this PR uses Any for both, which loses static type checking. Should match the base class signature.
hasattr/getattr in `_estimate_content_size`

```python
elif hasattr(block, "data") and self.include_binary:
    data = getattr(block, "data", None)
```

Project convention is isinstance checks with type narrowing. Use `isinstance(block, ImageContent)` (and `EmbeddedResource` if needed).
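As a hedged illustration of that convention (the content classes here are minimal stand-ins for the real `mcp.types` models, not the actual definitions):

```python
from dataclasses import dataclass

# Minimal stand-ins for illustration only; the real TextContent/ImageContent
# live in mcp.types and carry more fields.
@dataclass
class TextContent:
    text: str

@dataclass
class ImageContent:
    data: str  # base64-encoded payload

def estimate_size(block: object) -> int:
    # isinstance narrowing: each branch gives the type checker (and the
    # reader) a concrete type, unlike hasattr/getattr probing
    if isinstance(block, TextContent):
        return len(block.text.encode("utf-8"))
    if isinstance(block, ImageContent):
        return len(block.data)
    return 0
```

With isinstance, a static checker can verify that `.text` and `.data` actually exist on the narrowed type; `getattr(block, "data", None)` defers that check to runtime.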
Wrapped-result detection is fragile
_has_structured_content inspects the dict shape ({"result": <value>}) to detect FastMCP's x-fastmcp-wrap-result pattern. Any user tool that naturally returns {"result": "some_string"} as structured output would be misidentified as unstructured and truncated.
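To make the failure mode concrete, here is a small self-contained sketch; the shape heuristic below is an assumed simplification of what `_has_structured_content` checks, not the actual implementation:

```python
def looks_wrapped(structured: object) -> bool:
    # Assumed shape-based heuristic: a dict whose only key is "result"
    return isinstance(structured, dict) and set(structured) == {"result"}

# FastMCP's wrap pattern and a user's legitimate structured output are
# indistinguishable by shape alone:
assert looks_wrapped({"result": "wrapped scalar"})      # intended match
assert looks_wrapped({"result": "some_string"})         # false positive
assert not looks_wrapped({"result": 1, "extra": True})  # mixed keys pass through
```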
The tool's `output_schema` is actually reachable through public API — `context.fastmcp_context.fastmcp.get_tool(name)` returns the `Tool` object, which has `output_schema` with the `x-fastmcp-wrap-result` key when wrapping is in play. Something like:

```python
ctx = context.fastmcp_context
if ctx:
    tool = await ctx.fastmcp.get_tool(tool_name)
    if tool and tool.output_schema:
        is_wrapped = tool.output_schema.get("x-fastmcp-wrap-result", False)
```

This would make the detection reliable instead of inferring from dict shape.
Test assertion too loose
```python
assert len(result.content[0].text.encode("utf-8")) <= max_size + 500
```

A 500-byte tolerance on a 1000-byte limit means the response could be 50% over. The middleware should guarantee the serialized result fits within `max_size`, and the test should verify that.
Remove custom_logger and log_level parameters
The logging/timing middleware have these because logging is their purpose. For ResponseLimitingMiddleware, logging is incidental — just use the module-level logger at appropriate levels (WARNING for truncation events). Users can control it through standard Python logging config on the fastmcp.server.middleware.response_limiting logger. We don't want to establish a convention where every middleware that happens to emit a log line needs logger configurability.
Minor
- The binary search in `_truncate_utf8` works, but `text.encode('utf-8')[:max_bytes].decode('utf-8', errors='ignore')` is simpler with the same correctness.
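The suggested one-liner can be checked in isolation; this snippet just illustrates the reviewer's point and is not code from the PR:

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    # Slicing the byte string can split a multi-byte character;
    # errors="ignore" drops the dangling partial bytes, so the result
    # is always valid UTF-8 and re-encodes to at most max_bytes.
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")

s = "héllo"  # "é" occupies 2 bytes in UTF-8
assert truncate_utf8(s, 2) == "h"   # the sliced-off half of "é" is dropped
assert truncate_utf8(s, 3) == "hé"
assert all(len(truncate_utf8(s, n).encode("utf-8")) <= n for n in range(7))
```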
Design is right and docs are well-written. Happy to re-review once these are addressed.
Test Failure Analysis

Summary: Test in is timing out on Windows during SQLite database initialization in the .

Root Cause: The test times out while initializing an , which internally creates a for OAuth state storage. On Windows, SQLite operations can hang during concurrent database access due to file locking issues. The timeout occurs at

This is unrelated to the PR changes (which add ResponseLimitingMiddleware). This appears to be a flaky test that occasionally fails on Windows due to SQLite/diskcache timing issues.

Suggested Solution: The test needs to be marked with a longer timeout or skipped on Windows. Add a pytest marker:

```python
import sys
import pytest

class TestAWSCognitoProvider:
    # ... other tests ...

    @pytest.mark.timeout(15)  # Increase timeout for Windows SQLite operations
    @pytest.mark.skipif(sys.platform == "win32", reason="Flaky on Windows due to SQLite locking")
    def test_oidc_discovery_integration(self):
        """Test that OIDC discovery endpoints are used correctly."""
        # ... rest of test ...
```

Alternatively, if the test needs to run on Windows, consider using an in-memory key-value store instead of DiskStore for this specific test to avoid SQLite file locking issues.

Detailed Analysis

The timeout occurs during provider initialization: the stack trace shows the process is stuck in SQLite's connection setup, waiting for a database lock.

This is a known issue with diskcache and SQLite on Windows — see python-diskcache Issue #85 for details about timeout issues during initialization from multiple threads/processes.

Related Files
Thanks for the thorough review @jlowin! I've addressed all feedback in commit d6793bb:
@dgenio sorry, in my previous review I focused on the correctness of the code but not on what it actually does: the design is way too complex for what this needs to do. This should be ~10 lines, not ~400.
The middleware currently maintains separate code paths for structured vs unstructured content, binary inclusion/exclusion, wrapped-value detection via schema inspection, iterative truncation loops with magic numbers, structured content syncing — all to preserve response structure during truncation. That's the wrong goal. If a response is too big, the right thing is to flatten it to text and truncate. There's definitionally no need to preserve the structure of something that's being truncated anyway.
The simpler approach:
```python
async def on_call_tool(self, context, call_next):
    result = await call_next(context)
    if self.tools is not None and context.message.name not in self.tools:
        return result
    serialized = pydantic_core.to_json(result, fallback=str)
    if len(serialized) <= self.max_size:
        return result
    # Over limit: extract text, truncate, return single TextContent
    texts = [b.text for b in result.content if isinstance(b, TextContent)]
    text = "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
    return self._truncate_to_result(text)
```

This eliminates `raise_on_structured`, `raise_on_unstructured`, `include_binary`, `_has_structured_content`, `_is_wrapped_simple_value`, `_estimate_content_size`, the iterative truncation loop, the wrapped-value schema inspection, and the structured content syncing logic. The config surface drops to three parameters: `max_size`, `truncation_suffix`, `tools`.
The whole file should be the class with init, on_call_tool, and one helper to do the truncate-and-wrap math. Tests and docs simplify accordingly.
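The proposed control flow can be exercised without FastMCP at all. This plain-dict sketch uses `json.dumps` standing in for `pydantic_core.to_json` and a hypothetical `limit_response` standing in for the middleware method; it mirrors the pass-through/flatten/truncate decision:

```python
import json

MAX_SIZE = 100
SUFFIX = "... [truncated]"

def limit_response(result: dict) -> dict:
    serialized = json.dumps(result).encode("utf-8")
    if len(serialized) <= MAX_SIZE:
        return result  # under the limit: pass through untouched
    # Over limit: flatten text blocks, or fall back to the raw serialization
    texts = [b["text"] for b in result.get("content", []) if b.get("type") == "text"]
    text = "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
    # Measure the empty wrapper rather than hardcoding its overhead
    # (assumes the truncated text needs no JSON escaping; robust code
    # should re-measure the final candidate)
    overhead = len(json.dumps({"content": [{"type": "text", "text": ""}]}).encode("utf-8"))
    budget = max(0, MAX_SIZE - overhead - len(SUFFIX.encode("utf-8")))
    truncated = text.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return {"content": [{"type": "text", "text": truncated + SUFFIX}]}
```

Small results are returned unchanged; oversized ones come back as a single text block whose serialized form fits the limit (for text that JSON does not need to escape).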
Completely rewrote per your feedback. The diff is now -664 / +119 lines from the previous version. Core change: the entire `on_call_tool` is now:

```python
async def on_call_tool(self, context, call_next):
    result = await call_next(context)
    if self.tools is not None and context.message.name not in self.tools:
        return result
    serialized = pydantic_core.to_json(result, fallback=str)
    if len(serialized) <= self.max_size:
        return result
    texts = [b.text for b in result.content if isinstance(b, TextContent)]
    text = "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
    return self._truncate_to_result(text)
```

Removed:

🤖 Generated with Claude
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/fastmcp/server/middleware/response_limiting.py (2)
68-89: Hardcoded `overhead = 50` is fragile and the final result is never verified against `max_size`.

The JSON wrapper overhead depends on the `ToolResult` model's serialization (field names, default values like `isError`, potential `meta`, etc.) and can change across library versions. A hardcoded constant makes the guarantee that the truncated result fits within `max_size` unreliable.

A more robust approach: build the candidate `ToolResult`, serialize it, and if it still exceeds `max_size`, trim further in a loop (as the PR comments describe was intended).

Sketch of a verify-after-truncation approach

```diff
 def _truncate_to_result(self, text: str) -> ToolResult:
     """Truncate text to fit within max_size and wrap in ToolResult."""
     suffix_bytes = len(self.truncation_suffix.encode("utf-8"))
-    # Account for JSON wrapper overhead: {"content":[{"type":"text","text":"..."}]}
-    overhead = 50
-    target_size = self.max_size - suffix_bytes - overhead
-
-    if target_size <= 0:
-        # Edge case: max_size too small for even the suffix
-        truncated = self.truncation_suffix
-    else:
-        # Truncate to target size, preserving UTF-8 boundaries
-        encoded = text.encode("utf-8")
-        if len(encoded) <= target_size:
-            truncated = text + self.truncation_suffix
-        else:
-            truncated = (
-                encoded[:target_size].decode("utf-8", errors="ignore")
-                + self.truncation_suffix
-            )
-
-    return ToolResult(content=[TextContent(type="text", text=truncated)])
+    # Build an initial candidate, then measure and trim if needed
+    encoded = text.encode("utf-8")
+    # Start with a conservative estimate; refine via actual measurement
+    candidate_text = (
+        encoded[: max(0, self.max_size)].decode("utf-8", errors="ignore")
+        + self.truncation_suffix
+    )
+    for _ in range(5):  # bounded iterations
+        candidate = ToolResult(
+            content=[TextContent(type="text", text=candidate_text)]
+        )
+        serialized_size = len(pydantic_core.to_json(candidate, fallback=str))
+        if serialized_size <= self.max_size:
+            return candidate
+        # Trim further
+        overshoot = serialized_size - self.max_size
+        trim_target = len(candidate_text.encode("utf-8")) - overshoot - suffix_bytes
+        if trim_target <= 0:
+            candidate_text = self.truncation_suffix
+            break
+        candidate_text = (
+            candidate_text.encode("utf-8")[:trim_target].decode(
+                "utf-8", errors="ignore"
+            )
+            + self.truncation_suffix
+        )
+    return ToolResult(content=[TextContent(type="text", text=candidate_text)])
```
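The measure-and-trim idea can be demonstrated standalone; the snippet below uses `json.dumps` in place of `pydantic_core.to_json` and a plain dict wrapper, so it is a sketch of the approach rather than the suggested patch itself:

```python
import json

def truncate_to_fit(text: str, max_size: int, suffix: str = "...") -> str:
    """Trim text until the serialized wrapper fits within max_size bytes."""
    suffix_bytes = len(suffix.encode("utf-8"))
    candidate = text
    for _ in range(10):  # bounded: each pass shrinks by the measured overshoot
        wrapper = {"content": [{"type": "text", "text": candidate}]}
        size = len(json.dumps(wrapper).encode("utf-8"))
        if size <= max_size:
            return candidate
        overshoot = size - max_size
        trim_target = len(candidate.encode("utf-8")) - overshoot - suffix_bytes
        if trim_target <= 0:
            return suffix  # max_size too small for anything but the suffix
        candidate = (
            candidate.encode("utf-8")[:trim_target].decode("utf-8", errors="ignore")
            + suffix
        )
    return candidate
```

Because each pass re-measures the actual serialization, wrapper fields and JSON escaping are accounted for automatically instead of via a hardcoded constant.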
116-123: Non-text content (images, embedded resources) is silently discarded without logging.

When the response exceeds `max_size` and contains mixed content types, only `TextContent` blocks are kept — everything else is dropped with no log entry or indication in the returned result. This could be surprising if a tool returns images alongside text.

Consider logging the count/types of dropped content blocks so operators can diagnose unexpected data loss.

Suggested enhancement

```diff
 texts = [b.text for b in result.content if isinstance(b, TextContent)]
+dropped = [b for b in result.content if not isinstance(b, TextContent)]
+if dropped:
+    logger.warning(
+        "Tool %r: dropping %d non-text content block(s) during truncation",
+        context.message.name,
+        len(dropped),
+    )
 text = (
     "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
 )
Large tool responses can overwhelm LLM context windows or cause memory issues. This PR adds middleware to enforce configurable size limits on tool outputs, with intelligent handling of structured vs unstructured responses.
The middleware truncates text responses that exceed the limit while preserving UTF-8 character boundaries. For structured responses (tools with output_schema returning complex objects), it raises a ToolError since truncation would corrupt the schema. Both behaviors are configurable.
Key features:

- `tools` parameter
- `meta` field for monitoring
- `raise_on_structured` and `raise_on_unstructured` behavior

Closes #2004