Add ResponseLimitingMiddleware for tool response size control #3072
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7d3ba5fbb4
Walkthrough

Adds a new ResponseLimitingMiddleware and docs. The middleware enforces a configurable byte limit on tool responses.

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/fastmcp/server/middleware/response_limiting.py (2)
87-114: Silent exception handling may obscure serialization issues.

Both `_measure_size` and `_estimate_content_size` catch broad exceptions silently. While this provides resilience, it can hide unexpected serialization problems. Consider logging at DEBUG level for troubleshooting.

🔧 Suggested improvement

```diff
 def _measure_size(self, result: ToolResult) -> int:
     """Measure the serialized size of a ToolResult in bytes."""
     try:
         serialized = pydantic_core.to_json(result, fallback=str)
         return len(serialized)
-    except Exception:
+    except Exception as e:
         # Fallback: estimate from content
+        self._logger.debug(f"Serialization failed, using estimate: {e}")
         return self._estimate_content_size(result)
```

```diff
 try:
     structured_bytes = pydantic_core.to_json(
         result.structured_content, fallback=str
     )
     total += len(structured_bytes)
-except Exception:
-    pass
+except Exception as e:
+    self._logger.debug(f"Could not serialize structured_content: {e}")
 return total
```
254-258: Consider using a specific type annotation for better type safety.

The base class uses `MiddlewareContext[mt.CallToolRequestParams]` for `on_call_tool`. Using `Any` here loses type information that could catch errors at static analysis time.

🔧 Suggested improvement

```diff
+import mcp.types as mt
+
 ...
 async def on_call_tool(
     self,
-    context: MiddlewareContext[Any],
+    context: MiddlewareContext[mt.CallToolRequestParams],
     call_next: CallNext[Any, ToolResult],
 ) -> ToolResult:
```
Test Failure Analysis

Summary: The static analysis workflow failed due to code formatting and linting issues in the test file.

Root Cause: The test file has an unused import and function signatures that don't match the project's ruff formatting.

Suggested Solution: Apply the formatting fixes that ruff/prek identified. Run locally:

`uv run prek run --all-files`

This will automatically fix the unused import and the signature formatting. After running this command, the changes should be committed and pushed to pass CI.

Detailed Analysis

The prek pre-commit hooks ran with:
- Ruff check: Found 1 error (auto-fixed)
- Ruff format: Reformatted 1 file
The exact diff needed:

```diff
-from fastmcp.client.client import CallToolResult

-    async def test_custom_truncation_suffix(
-        self, mcp_server: FastMCP, large_text: str
-    ):
+    async def test_custom_truncation_suffix(self, mcp_server: FastMCP, large_text: str):

-    async def test_size_meta_added_to_result(self, mcp_server: FastMCP, small_text: str):
+    async def test_size_meta_added_to_result(
+        self, mcp_server: FastMCP, small_text: str
+    ):
```

Related Files
Test Failure Analysis

Summary: Test
Root Cause: The
Suggested Solution: Fix the
Files to modify:

Update (2026-02-04 16:02 UTC): This failure persists after commit 918c3b5. The test continues to fail across all Python versions (3.10, 3.13) and platforms (ubuntu, windows). No code changes have been made to address the root cause since my initial analysis.

Detailed Analysis

Test Setup (lines 339-356):

```python
mcp.add_middleware(ResponseLimitingMiddleware(max_size=100))

@mcp.tool()
def multi_block() -> ToolResult:
    return ToolResult(
        content=[
            TextContent(type="text", text="First block " + "x" * 500),
            TextContent(type="text", text="Second block"),
        ]
    )
```

Actual Result:
Why it fails:

Related Files
Test Failure Analysis

Summary: The failing test
Root Cause: The

The timeout occurs during subprocess cleanup after the test completes successfully (the assertion at line 54 passes). The stack trace shows the process is hanging in

Why This Isn't Related to Your Changes:

Suggested Solution: Re-run the failed workflow. This is a transient infrastructure issue, not a code problem. The test is flaky due to process cleanup timing in the CI environment.

Detailed Analysis

Log Evidence: The test successfully completes its work. The timeout occurs during teardown.

Evidence of Flakiness

Related Files

Recommendation: ✅ Safe to ignore this failure and merge once the re-run succeeds. The PR code is solid.
Test Failure Analysis

Summary: The Windows Python 3.10 test is timing out on
Root Cause: The failing test (
Evidence:

Suggested Solution: The PR changes are sound. The test failure is a known Windows flakiness issue. You have two options:

Detailed Analysis

The timeout occurs in this test flow:

From the logs: The test gets stuck after "Stdio transport connected" but before the actual tool listing completes.

Related Files

Changed by this PR (all unrelated to the failure):
Failing test:
Thanks @dgenio! This is a great idea: response size limiting is a natural fit for the middleware system, and the structured vs. unstructured distinction is the right design axis. A few things to address from automated review before merge:
Truncation math mismatch
_truncate_text_content computes excess = current_size - self.max_size where current_size is the full serialized ToolResult (JSON envelope, content array, keys, etc.), but then subtracts excess + suffix_bytes from the raw text byte length. These aren't in the same coordinate space — the JSON overhead doesn't cancel out cleanly, so for small max_size values the truncated result could still exceed the limit. The post-truncation size check catches this and raises an error, but it means truncation silently degrades to an error for small limits. Consider accounting for serialization overhead in the truncation target, or iteratively measure-and-trim. (or document the gap, which IMO is acceptable)
Type safety in on_call_tool
The base class uses MiddlewareContext[mt.CallToolRequestParams] and CallNext[mt.CallToolRequestParams, ToolResult] — this PR uses Any for both, which loses static type checking. Should match the base class signature.
hasattr/getattr in `_estimate_content_size`

```python
elif hasattr(block, "data") and self.include_binary:
    data = getattr(block, "data", None)
```

Project convention is isinstance checks with type narrowing. Use `isinstance(block, ImageContent)` (and `EmbeddedResource` if needed).
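As a hedged illustration of that convention (the content classes here are minimal stand-ins for the real `mcp.types` models, not the actual definitions):

```python
from dataclasses import dataclass

# Minimal stand-ins for illustration only; the real TextContent/ImageContent
# live in mcp.types and carry more fields.
@dataclass
class TextContent:
    text: str

@dataclass
class ImageContent:
    data: str  # base64-encoded payload

def estimate_size(block: object) -> int:
    # isinstance narrowing: each branch gives the type checker (and the
    # reader) a concrete type, unlike hasattr/getattr probing
    if isinstance(block, TextContent):
        return len(block.text.encode("utf-8"))
    if isinstance(block, ImageContent):
        return len(block.data)
    return 0
```

With isinstance, a static checker can verify that `.text` and `.data` actually exist on the narrowed type; `getattr(block, "data", None)` defers that check to runtime.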
Wrapped-result detection is fragile
_has_structured_content inspects the dict shape ({"result": <value>}) to detect FastMCP's x-fastmcp-wrap-result pattern. Any user tool that naturally returns {"result": "some_string"} as structured output would be misidentified as unstructured and truncated.
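To make the failure mode concrete, here is a small self-contained sketch; the shape heuristic below is an assumed simplification of what `_has_structured_content` checks, not the actual implementation:

```python
def looks_wrapped(structured: object) -> bool:
    # Assumed shape-based heuristic: a dict whose only key is "result"
    return isinstance(structured, dict) and set(structured) == {"result"}

# FastMCP's wrap pattern and a user's legitimate structured output are
# indistinguishable by shape alone:
assert looks_wrapped({"result": "wrapped scalar"})      # intended match
assert looks_wrapped({"result": "some_string"})         # false positive
assert not looks_wrapped({"result": 1, "extra": True})  # mixed keys pass through
```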
The tool's `output_schema` is actually reachable through public API — `context.fastmcp_context.fastmcp.get_tool(name)` returns the `Tool` object, which has `output_schema` with the `x-fastmcp-wrap-result` key when wrapping is in play. Something like:

```python
ctx = context.fastmcp_context
if ctx:
    tool = await ctx.fastmcp.get_tool(tool_name)
    if tool and tool.output_schema:
        is_wrapped = tool.output_schema.get("x-fastmcp-wrap-result", False)
```

This would make the detection reliable instead of inferring from dict shape.
Test assertion too loose
```python
assert len(result.content[0].text.encode("utf-8")) <= max_size + 500
```

A 500-byte tolerance on a 1000-byte limit means the response could be 50% over. The middleware should guarantee the serialized result fits within `max_size`, and the test should verify that.
Remove custom_logger and log_level parameters
The logging/timing middleware have these because logging is their purpose. For ResponseLimitingMiddleware, logging is incidental — just use the module-level logger at appropriate levels (WARNING for truncation events). Users can control it through standard Python logging config on the fastmcp.server.middleware.response_limiting logger. We don't want to establish a convention where every middleware that happens to emit a log line needs logger configurability.
Minor
- The binary search in `_truncate_utf8` works, but `text.encode('utf-8')[:max_bytes].decode('utf-8', errors='ignore')` is simpler with the same correctness.
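The suggested one-liner can be checked in isolation; this snippet just illustrates the reviewer's point and is not code from the PR:

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    # Slicing the byte string can split a multi-byte character;
    # errors="ignore" drops the dangling partial bytes, so the result
    # is always valid UTF-8 and re-encodes to at most max_bytes.
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")

s = "héllo"  # "é" occupies 2 bytes in UTF-8
assert truncate_utf8(s, 2) == "h"   # the sliced-off half of "é" is dropped
assert truncate_utf8(s, 3) == "hé"
assert all(len(truncate_utf8(s, n).encode("utf-8")) <= n for n in range(7))
```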
Design is right and docs are well-written. Happy to re-review once these are addressed.
Test Failure Analysis

Summary: Test in is timing out on Windows during SQLite database initialization in the .

Root Cause: The test times out while initializing an , which internally creates a for OAuth state storage. On Windows, SQLite operations can hang during concurrent database access due to file locking issues. The timeout occurs at

This is unrelated to the PR changes (which add ResponseLimitingMiddleware). This appears to be a flaky test that occasionally fails on Windows due to SQLite/diskcache timing issues.

Suggested Solution: The test needs to be marked with a longer timeout or skipped on Windows. Add a pytest marker:

```python
import sys
import pytest

class TestAWSCognitoProvider:
    # ... other tests ...

    @pytest.mark.timeout(15)  # Increase timeout for Windows SQLite operations
    @pytest.mark.skipif(sys.platform == "win32", reason="Flaky on Windows due to SQLite locking")
    def test_oidc_discovery_integration(self):
        """Test that OIDC discovery endpoints are used correctly."""
        # ... rest of test ...
```

Alternatively, if the test needs to run on Windows, consider using an in-memory key-value store instead of DiskStore for this specific test to avoid SQLite file locking issues.

Detailed Analysis

The timeout occurs during provider initialization: the stack trace shows the process is stuck in SQLite's connection setup, waiting for a database lock.

This is a known issue with diskcache and SQLite on Windows — see python-diskcache Issue #85 for details about timeout issues during initialization from multiple threads/processes.

Related Files
Thanks for the thorough review @jlowin! I've addressed all feedback in commit d6793bb:
@dgenio sorry, in my previous review I focused on the correctness of the code but not on what it actually does: the design is way too complex for what this needs to do. This should be ~10 lines, not ~400.
The middleware currently maintains separate code paths for structured vs unstructured content, binary inclusion/exclusion, wrapped-value detection via schema inspection, iterative truncation loops with magic numbers, structured content syncing — all to preserve response structure during truncation. That's the wrong goal. If a response is too big, the right thing is to flatten it to text and truncate. There's definitionally no need to preserve the structure of something that's being truncated anyway.
The simpler approach:
```python
async def on_call_tool(self, context, call_next):
    result = await call_next(context)
    if self.tools is not None and context.message.name not in self.tools:
        return result
    serialized = pydantic_core.to_json(result, fallback=str)
    if len(serialized) <= self.max_size:
        return result
    # Over limit: extract text, truncate, return single TextContent
    texts = [b.text for b in result.content if isinstance(b, TextContent)]
    text = "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
    return self._truncate_to_result(text)
```

This eliminates `raise_on_structured`, `raise_on_unstructured`, `include_binary`, `_has_structured_content`, `_is_wrapped_simple_value`, `_estimate_content_size`, the iterative truncation loop, the wrapped-value schema inspection, and the structured content syncing logic. The config surface drops to three parameters: `max_size`, `truncation_suffix`, `tools`.
The whole file should be the class with init, on_call_tool, and one helper to do the truncate-and-wrap math. Tests and docs simplify accordingly.
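The proposed control flow can be exercised without FastMCP at all. This plain-dict sketch uses `json.dumps` standing in for `pydantic_core.to_json` and a hypothetical `limit_response` standing in for the middleware method; it mirrors the pass-through/flatten/truncate decision:

```python
import json

MAX_SIZE = 100
SUFFIX = "... [truncated]"

def limit_response(result: dict) -> dict:
    serialized = json.dumps(result).encode("utf-8")
    if len(serialized) <= MAX_SIZE:
        return result  # under the limit: pass through untouched
    # Over limit: flatten text blocks, or fall back to the raw serialization
    texts = [b["text"] for b in result.get("content", []) if b.get("type") == "text"]
    text = "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
    # Measure the empty wrapper rather than hardcoding its overhead
    # (assumes the truncated text needs no JSON escaping; robust code
    # should re-measure the final candidate)
    overhead = len(json.dumps({"content": [{"type": "text", "text": ""}]}).encode("utf-8"))
    budget = max(0, MAX_SIZE - overhead - len(SUFFIX.encode("utf-8")))
    truncated = text.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return {"content": [{"type": "text", "text": truncated + SUFFIX}]}
```

Small results are returned unchanged; oversized ones come back as a single text block whose serialized form fits the limit (for text that JSON does not need to escape).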
Completely rewrote per your feedback. The diff is now -664 / +119 lines from the previous version. Core change: the entire `on_call_tool` is now:

```python
async def on_call_tool(self, context, call_next):
    result = await call_next(context)
    if self.tools is not None and context.message.name not in self.tools:
        return result
    serialized = pydantic_core.to_json(result, fallback=str)
    if len(serialized) <= self.max_size:
        return result
    texts = [b.text for b in result.content if isinstance(b, TextContent)]
    text = "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
    return self._truncate_to_result(text)
```

Removed:

🤖 Generated with Claude
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/fastmcp/server/middleware/response_limiting.py (2)
68-89: Hardcoded `overhead = 50` is fragile and the final result is never verified against `max_size`.

The JSON wrapper overhead depends on the `ToolResult` model's serialization (field names, default values like `isError`, potential `meta`, etc.) and can change across library versions. A hardcoded constant makes the guarantee that the truncated result fits within `max_size` unreliable.

A more robust approach: build the candidate `ToolResult`, serialize it, and if it still exceeds `max_size`, trim further in a loop (as the PR comments describe was intended).

Sketch of a verify-after-truncation approach

```diff
 def _truncate_to_result(self, text: str) -> ToolResult:
     """Truncate text to fit within max_size and wrap in ToolResult."""
     suffix_bytes = len(self.truncation_suffix.encode("utf-8"))
-    # Account for JSON wrapper overhead: {"content":[{"type":"text","text":"..."}]}
-    overhead = 50
-    target_size = self.max_size - suffix_bytes - overhead
-
-    if target_size <= 0:
-        # Edge case: max_size too small for even the suffix
-        truncated = self.truncation_suffix
-    else:
-        # Truncate to target size, preserving UTF-8 boundaries
-        encoded = text.encode("utf-8")
-        if len(encoded) <= target_size:
-            truncated = text + self.truncation_suffix
-        else:
-            truncated = (
-                encoded[:target_size].decode("utf-8", errors="ignore")
-                + self.truncation_suffix
-            )
-
-    return ToolResult(content=[TextContent(type="text", text=truncated)])
+    # Build an initial candidate, then measure and trim if needed
+    encoded = text.encode("utf-8")
+    # Start with a conservative estimate; refine via actual measurement
+    candidate_text = (
+        encoded[: max(0, self.max_size)].decode("utf-8", errors="ignore")
+        + self.truncation_suffix
+    )
+    for _ in range(5):  # bounded iterations
+        candidate = ToolResult(
+            content=[TextContent(type="text", text=candidate_text)]
+        )
+        serialized_size = len(pydantic_core.to_json(candidate, fallback=str))
+        if serialized_size <= self.max_size:
+            return candidate
+        # Trim further
+        overshoot = serialized_size - self.max_size
+        trim_target = len(candidate_text.encode("utf-8")) - overshoot - suffix_bytes
+        if trim_target <= 0:
+            candidate_text = self.truncation_suffix
+            break
+        candidate_text = (
+            candidate_text.encode("utf-8")[:trim_target].decode(
+                "utf-8", errors="ignore"
+            )
+            + self.truncation_suffix
+        )
+    return ToolResult(content=[TextContent(type="text", text=candidate_text)])
```
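The measure-and-trim idea can be demonstrated standalone; the snippet below uses `json.dumps` in place of `pydantic_core.to_json` and a plain dict wrapper, so it is a sketch of the approach rather than the suggested patch itself:

```python
import json

def truncate_to_fit(text: str, max_size: int, suffix: str = "...") -> str:
    """Trim text until the serialized wrapper fits within max_size bytes."""
    suffix_bytes = len(suffix.encode("utf-8"))
    candidate = text
    for _ in range(10):  # bounded: each pass shrinks by the measured overshoot
        wrapper = {"content": [{"type": "text", "text": candidate}]}
        size = len(json.dumps(wrapper).encode("utf-8"))
        if size <= max_size:
            return candidate
        overshoot = size - max_size
        trim_target = len(candidate.encode("utf-8")) - overshoot - suffix_bytes
        if trim_target <= 0:
            return suffix  # max_size too small for anything but the suffix
        candidate = (
            candidate.encode("utf-8")[:trim_target].decode("utf-8", errors="ignore")
            + suffix
        )
    return candidate
```

Because each pass re-measures the actual serialization, wrapper fields and JSON escaping are accounted for automatically instead of via a hardcoded constant.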
116-123: Non-text content (images, embedded resources) is silently discarded without logging.

When the response exceeds `max_size` and contains mixed content types, only `TextContent` blocks are kept — everything else is dropped with no log entry or indication in the returned result. This could be surprising if a tool returns images alongside text.

Consider logging the count/types of dropped content blocks so operators can diagnose unexpected data loss.

Suggested enhancement

```diff
 texts = [b.text for b in result.content if isinstance(b, TextContent)]
+dropped = [b for b in result.content if not isinstance(b, TextContent)]
+if dropped:
+    logger.warning(
+        "Tool %r: dropping %d non-text content block(s) during truncation",
+        context.message.name,
+        len(dropped),
+    )
 text = (
     "\n\n".join(texts) if texts else serialized.decode("utf-8", errors="replace")
 )
Large tool responses can overwhelm LLM context windows or cause memory issues. This PR adds middleware to enforce configurable size limits on tool outputs, with intelligent handling of structured vs unstructured responses.
The middleware truncates text responses that exceed the limit while preserving UTF-8 character boundaries. For structured responses (tools with output_schema returning complex objects), it raises a ToolError since truncation would corrupt the schema. Both behaviors are configurable.
Key features:

- `tools` parameter
- `meta` field for monitoring
- `raise_on_structured` and `raise_on_unstructured` behavior

Closes #2004