Support ImageContent and AudioContent in sampling handlers#3550

Merged
jlowin merged 4 commits into main from feat/image-audio-sampling-handlers
Mar 18, 2026

@jlowin jlowin commented Mar 18, 2026

The MCP specification explicitly includes ImageContent and AudioContent in sampling messages — a server can send an image to the client and ask the LLM to analyze it via sampling/createMessage. But all three sampling handlers (OpenAI, Anthropic, Google GenAI) only handled TextContent, silently dropping multimodal content in list contexts or raising opaque errors for single content.

Each handler now converts these content types to its provider's native format, with MIME type validation that fails early with clear messages rather than letting the provider API reject them later. Audio support varies by provider — OpenAI supports wav/mp3, Google GenAI accepts any audio MIME type, and Anthropic doesn't support audio at all (raises ValueError immediately).

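import base64

from mcp.types import ImageContent, SamplingMessage, TextContent

# Placeholder payload (assumption): a real server would encode actual PNG bytes
base64_data = base64.b64encode(b"<png bytes>").decode()

# MCP server can now include images in sampling requests
message = SamplingMessage(
    role="user",
    content=[
        TextContent(type="text", text="What do you see in this image?"),
        ImageContent(type="image", data=base64_data, mimeType="image/png"),
    ],
)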
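
The early MIME validation described above might look roughly like the sketch below, using OpenAI audio as the example. The names `OPENAI_AUDIO_FORMATS` and `audio_mime_to_openai_format` are hypothetical, and the list of MIME aliases is an assumption; only the wav/mp3 restriction comes from the description.

```python
# Hypothetical mapping from MIME type to OpenAI's audio format name.
# The aliases listed here are an assumption, not the handler's actual table.
OPENAI_AUDIO_FORMATS = {
    "audio/wav": "wav",
    "audio/x-wav": "wav",
    "audio/mpeg": "mp3",
    "audio/mp3": "mp3",
}


def audio_mime_to_openai_format(mime_type: str) -> str:
    """Return the OpenAI audio format for a MIME type, or raise early."""
    normalized = mime_type.strip().lower()
    try:
        return OPENAI_AUDIO_FORMATS[normalized]
    except KeyError:
        supported = ", ".join(sorted(OPENAI_AUDIO_FORMATS))
        raise ValueError(
            f"Unsupported audio MIME type {mime_type!r}; supported: {supported}"
        ) from None
```

Raising `ValueError` before any request is built keeps the failure local and names the offending type, instead of surfacing as a provider-side rejection.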

Closes #3544

@jlowin jlowin added the feature Major new functionality. Reserved for 2-4 significant PRs per release. Not for issues. label Mar 18, 2026
@marvin-context-protocol marvin-context-protocol Bot added enhancement Improvement to existing functionality. For issues and smaller PR improvements. client Related to the FastMCP client SDK or client-side functionality. labels Mar 18, 2026

marvin-context-protocol Bot commented Mar 18, 2026

Test Failure Analysis

Summary: Two failures: (1) the Tests with lowest-direct dependencies job fails because Base64ImageSourceParam does not exist in the minimum declared version, anthropic>=0.40.0; (2) the Integration tests job hit a flaky timeout in test_github_api_schema_performance (the network download exceeded 10s in CI).

Root Cause:

  1. lowest-direct job — The PR adds Base64ImageSourceParam to imports in src/fastmcp/client/sampling/handlers/anthropic.py. This type was introduced in anthropic==0.48.0. When the suite runs against the minimum declared dependency (anthropic==0.40.0), the import fails and pytest cannot even collect test_anthropic_handler.py.

  2. Integration test timeout: TestOpenAPIPerformance::test_github_api_schema_performance downloads ~10MB from raw.githubusercontent.com and must complete in 10 seconds. The test itself notes CI can take 6–7s; this run hit the 10s wall. This is unrelated to the PR changes (network variability in GHA).

Suggested Solution:

For the real bug (#1), bump the minimum anthropic version in pyproject.toml:

# pyproject.toml
anthropic = ["anthropic>=0.48.0"]

The integration test timeout (#2) is a pre-existing flaky test. No action required from this PR.

Detailed Analysis

Failure 1: Tests with lowest-direct dependencies

The PR adds these imports inside the try block in src/fastmcp/client/sampling/handlers/anthropic.py:

from anthropic.types import (
    Base64ImageSourceParam,  # ← Added by this PR
    ImageBlockParam,
    ...
)

Log excerpt:

src/fastmcp/client/sampling/handlers/anthropic.py:25: in <module>
    from anthropic.types import (
ImportError: cannot import name 'Base64ImageSourceParam' from 'anthropic.types'
...
ERROR tests/client/sampling/handlers/test_anthropic_handler.py
======= 5001 passed, 1 skipped, 1 xfailed, 1 error in 132.69s

Base64ImageSourceParam first appears in anthropic==0.48.0; the current floor >=0.40.0 is too old.
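
The floor comparison behind this diagnosis can be sketched as follows. `parse_version` and `meets_floor` are hypothetical helpers, not part of FastMCP or the anthropic package, and they assume plain dotted numeric releases with the same number of components (no pre-release suffixes). A real check would more likely rely on the `packaging` library.

```python
def parse_version(version: str) -> tuple:
    """Parse a plain dotted release such as '0.48.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_floor(installed: str, floor: str) -> bool:
    """Return True when the installed release is at least the required floor."""
    return parse_version(installed) >= parse_version(floor)
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.100.0" would sort before "0.48.0".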

Failure 2: Integration test timeout

FAILED tests/server/providers/openapi/test_openapi_performance.py::TestOpenAPIPerformance::test_github_api_schema_performance
- Failed: Timeout (>10.0s) from pytest-timeout.

The test hits raw.githubusercontent.com to download a ~10MB schema. The test docstring already acknowledges CI can take 6–7s; this run exceeded 10s due to network conditions. Not related to this PR.

Related Files
  • src/fastmcp/client/sampling/handlers/anthropic.py — imports Base64ImageSourceParam from anthropic.types
  • pyproject.toml — defines anthropic>=0.40.0 (needs bumping to >=0.48.0)
  • tests/client/sampling/handlers/test_anthropic_handler.py — fails to collect due to import error
  • tests/server/providers/openapi/test_openapi_performance.py — flaky network-dependent test (pre-existing)

Note: This comment was edited to reflect workflow run 23256501982. The fix for the lowest-direct failure is unchanged.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15196dabe6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 251 to 255
# When there are tool calls, content must be text-only
text_str = "\n".join(text_parts) or None
openai_messages.append(
    ChatCompletionAssistantMessageParam(
        role="assistant",

P2: Reject assistant image/audio parts when tool_calls are present

If an assistant turn contains ToolUseContent plus ImageContent/AudioContent, this branch only serializes text_parts and never validates or forwards the non-text content_parts. The non-tool_calls path already raises for assistant multimodal content, and OpenAI's chat-completions assistant messages only accept text/refusal parts, so this now silently drops image/audio context instead of failing fast. That will change the conversation state for any mixed assistant turn and is very hard to diagnose.

jlowin (Member, Author) replied:

Good catch — fixed in aa76c47. Both the tool_calls and non-tool_calls branches now raise ValueError for image/audio in assistant messages.
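
A minimal sketch of the fail-fast behavior described in this reply, assuming simplified stand-in classes rather than the real `mcp.types` models. `Text`, `Media`, and `assistant_text` are hypothetical names used only for illustration, not the handler's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Text:
    """Stand-in for mcp.types.TextContent (assumption)."""
    text: str


@dataclass
class Media:
    """Stand-in for ImageContent / AudioContent (assumption)."""
    mime_type: str


def assistant_text(parts: list) -> Optional[str]:
    """Join the text parts of an assistant turn; raise if it carries media."""
    text_parts = []
    for part in parts:
        if isinstance(part, Media):
            # Fail locally instead of silently dropping the media.
            raise ValueError(
                f"Assistant messages cannot carry {part.mime_type} content for OpenAI"
            )
        if isinstance(part, Text):
            text_parts.append(part.text)
    # Mirrors the excerpt above: an empty join becomes None.
    return "\n".join(text_parts) or None
```

The same check would apply whether or not the turn also contains tool calls, which is the point of the fix.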

Comment on lines +202 to +206
if isinstance(content, ImageContent):
    return Part(
        inline_data=Blob(
            data=base64.b64decode(content.data),
            mime_type=content.mimeType,

P2: Validate Gemini media MIME types before building inline_data

Gemini only accepts a fixed allowlist of image and audio MIME types, but this new conversion path passes content.mimeType straight through without any checks (the audio branch immediately below does the same). Inputs like image/bmp or common mobile recordings such as audio/m4a will now get all the way to generate_content() and fail with provider-side 400s instead of the early, clear ValueError the other handlers return for unsupported media.

jlowin (Member, Author) replied:

Gemini supports a much wider range of media types than OpenAI/Anthropic — a restrictive allowlist would reject valid inputs. The API validates and returns a clear error for unsupported types.
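
The pass-through approach defended here can be illustrated with a small sketch: the base64 payload is decoded to raw bytes and the MIME type is forwarded untouched, leaving type validation to the provider. `to_inline_data` is a hypothetical helper, and a plain dict stands in for the `Blob` type seen in the excerpt. The strict `validate=True` decoding is an addition for illustration; the excerpt above calls `base64.b64decode` without it.

```python
import base64
import binascii


def to_inline_data(data_b64: str, mime_type: str) -> dict:
    """Decode base64 media to bytes, passing the MIME type through unchanged."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(data_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"Invalid base64 payload for {mime_type}") from exc
    # No MIME allowlist here: the provider API decides what it accepts.
    return {"data": raw, "mime_type": mime_type}
```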

@jlowin jlowin force-pushed the feat/image-audio-sampling-handlers branch from 15196da to db9c1e2 Compare March 18, 2026 16:33

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa76c47704

Comment on lines +301 to +305
if isinstance(content, ImageContent):
    anthropic_messages.append(
        MessageParam(
            role=message.role,
            content=[_image_content_to_anthropic_block(content)],

P2: Reject assistant-side image blocks before calling Anthropic

_convert_to_anthropic_messages() now forwards ImageContent with role=message.role, but Anthropic's Messages API only accepts image blocks in user turns; assistant turns are limited to model text/tool-use content. If a sampling history includes an assistant ImageContent (or a list-content assistant turn with an image), this code will build an invalid payload and the next request fails with a provider-side 400 instead of a local validation error.

jlowin (Member, Author) replied:

Good catch — fixed in 2298e76. Both single-content and list-content paths now validate that ImageContent is only in user messages.
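
As a rough illustration of the validation described in this reply, the sketch below builds an Anthropic-style base64 image block as a plain dict and rejects non-user roles first. `ANTHROPIC_IMAGE_TYPES` and `image_block` are hypothetical names; the actual handler uses typed params such as `ImageBlockParam`, and its exact checks may differ.

```python
# Image media types accepted by Anthropic's Messages API.
ANTHROPIC_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_block(role: str, data_b64: str, mime_type: str) -> dict:
    """Build a base64 image block, validating role and MIME type locally."""
    if role != "user":
        raise ValueError(
            "ImageContent is only supported in user messages for Anthropic"
        )
    if mime_type not in ANTHROPIC_IMAGE_TYPES:
        supported = ", ".join(sorted(ANTHROPIC_IMAGE_TYPES))
        raise ValueError(
            f"Unsupported image MIME type {mime_type!r}; supported: {supported}"
        )
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": mime_type, "data": data_b64},
    }
```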

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2298e7639a

Comment on lines +226 to +229
elif isinstance(item, ImageContent):
    content_parts.append(_image_content_to_openai_part(item))
elif isinstance(item, AudioContent):
    content_parts.append(_audio_content_to_openai_part(item))

P2: Reject non-text media in OpenAI tool results

This change makes ImageContent/AudioContent valid elsewhere in _convert_to_openai_messages(), but both ToolResultContent branches still build role="tool" messages from TextContent only. When an MCP tool returns content=[ImageContent(...)] or AudioContent(...), the payload is silently stripped and OpenAI receives an empty tool result, so the model never sees the screenshot/audio it asked for. OpenAI's chat-completions docs say tool messages only support text parts, so this path should fail fast instead of dropping the media.

jlowin (Member, Author) replied:

Pre-existing limitation — ToolResultContent has always only extracted TextContent. This PR adds multimodal support for messages, not tool results. Valid follow-up but out of scope here.

Comment on lines +204 to +208
elif isinstance(item, ImageContent):
    if message.role != "user":
        raise ValueError(
            "ImageContent is only supported in user messages "
            "for Anthropic"

P2: Preserve images inside Anthropic tool_result blocks

Anthropic's tool-use API allows tool_result.content to contain nested text or image blocks, but _convert_to_anthropic_messages() still serializes ToolResultContent.content by collecting only TextContent. If a FastMCP tool returns content=[ImageContent(...)], Claude will receive an empty tool result instead of the image it requested. Since this commit adds ImageContent support for user messages, the same media needs to be forwarded (or explicitly rejected) in ToolResultContent as well.

jlowin (Member, Author) replied:

Same — pre-existing, unchanged by this PR. Anthropic does support images in tool_result blocks so it's a valid follow-up.
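
One possible shape for that follow-up, sketched under assumptions: tool result items arrive as dicts mirroring MCP content JSON (`type`, `text`, `data`, `mimeType`), and the output is a plain-dict Anthropic `tool_result` block with nested text and image blocks. `tool_result_block` is a hypothetical helper, not code from this PR.

```python
def tool_result_block(tool_use_id: str, items: list) -> dict:
    """Convert MCP-style content dicts into an Anthropic tool_result block."""
    content = []
    for item in items:
        if item["type"] == "text":
            content.append({"type": "text", "text": item["text"]})
        elif item["type"] == "image":
            content.append(
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": item["mimeType"],
                        "data": item["data"],
                    },
                }
            )
        else:
            # Reject explicitly rather than dropping unsupported content.
            raise ValueError(
                f"Unsupported tool result content type: {item['type']!r}"
            )
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
```

Raising on unknown types keeps the behavior consistent with the rest of the thread: unsupported media fails loudly instead of producing an empty tool result.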

Comment on lines +202 to +206
if isinstance(content, ImageContent):
    return Part(
        inline_data=Blob(
            data=base64.b64decode(content.data),
            mime_type=content.mimeType,

P2: Handle media blocks in Gemini function responses too

The new ImageContent/AudioContent branches only cover top-level message parts. _sampling_content_to_google_genai_part() still raises from its ToolResultContent branch on any non-TextContent, so a tool returning ImageContent or AudioContent will still fail during sampling. Gemini's FunctionResponsePart supports inlineData blobs, so this leaves the new multimodal support incomplete for the common tool-result path.

jlowin (Member, Author) replied:

Same — pre-existing, unchanged by this PR. Multimodal tool results are a valid follow-up.

@jlowin jlowin merged commit 734b93b into main Mar 18, 2026
7 of 15 checks passed
@jlowin jlowin deleted the feat/image-audio-sampling-handlers branch March 18, 2026 19:25
@jlowin jlowin removed the feature Major new functionality. Reserved for 2-4 significant PRs per release. Not for issues. label Mar 18, 2026

Labels

client Related to the FastMCP client SDK or client-side functionality. enhancement Improvement to existing functionality. For issues and smaller PR improvements.

Development

Successfully merging this pull request may close these issues.

Support ImageContent and AudioContent in OpenAISamplingHandler

1 participant