@PratyushNag PratyushNag commented May 19, 2025

Output Schema Support with Semantic Metadata Enhancement

This PR adds automatic JSON Schema generation for tool return types with intelligent semantic metadata enhancement, enabling LLMs and client applications to understand both the structure and semantic meaning of data they'll receive from tools.

Motivation and Context

  • Previously, tools only had input schemas but no structured metadata about their outputs
  • This helps LLMs better understand the structure of data they'll receive from tools
  • Particularly valuable for complex return types like Pydantic models and nested structures
  • Improves the developer experience by making tools more self-documenting
  • NEW: Adds semantic metadata enhancement that automatically detects semantic meaning from field names
  • NEW: Enables client applications to provide intelligent UI rendering and formatting based on semantic types
  • NEW: Supports 13+ semantic types including URLs, emails, datetime fields, media formats, currencies, and more

Implementation Details

Core Schema Enhancement System

  • New Module: python-sdk/src/mcp/server/fastmcp/utilities/schema.py
    • detect_semantic_format(): Analyzes field names and types to detect semantic meaning
    • enhance_output_schema(): Embeds semantic metadata within JSON Schema properties
    • Maintains full JSON Schema compliance while adding optional semantic information
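A minimal sketch of how metadata might be embedded into schema properties, not the PR's actual code. The names `sketch_enhance_output_schema`, `toy_detector`, and the `Detector` signature are hypothetical, and the detector here only recognises email fields.

```python
from copy import deepcopy
from typing import Any, Callable, Optional

# Hypothetical detector signature: (field_name, json_type) -> metadata or None.
Detector = Callable[[str, Optional[str]], Optional[dict[str, Any]]]


def toy_detector(name: str, json_type: Optional[str]) -> Optional[dict[str, Any]]:
    """Stand-in for a real detector; recognises only email-like string fields."""
    if json_type == "string" and "email" in name.lower():
        return {"semantic_type": "email"}
    return None


def sketch_enhance_output_schema(
    schema: dict[str, Any], detect: Detector = toy_detector
) -> dict[str, Any]:
    """Return a copy of `schema` with metadata merged into top-level properties."""
    enhanced = deepcopy(schema)  # leave the caller's schema untouched
    for name, prop in enhanced.get("properties", {}).items():
        if not isinstance(prop, dict):
            continue
        extra = detect(name, prop.get("type"))
        if extra:
            prop.update(extra)  # new keys sit beside "type" and "title"
    return enhanced


schema = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "title": "Email"},
        "age": {"type": "integer", "title": "Age"},
    },
}
result = sketch_enhance_output_schema(schema)
```

Because the extra keys live inside each property object, the standard `type` and `title` keywords are left exactly as generated.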

Supported Semantic Types

  • Communication: email, url (including uri, link, href variations)
  • DateTime: datetime with subtypes (date_only, time_only, datetime)
  • Media: audio, video, image with format detection (audio_file, video_file, image_file)
  • System: file_path, identifier (id, uuid, guid), status, color
  • Numeric: currency, percentage (validated for numeric types only)
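The categories above could be detected roughly as follows. This is an illustrative sketch only: `sketch_detect` is a hypothetical name, the keyword sets (for example `price`, `amount`, `timestamp`) are assumptions rather than the PR's real patterns, and only a subset of the listed types is covered.

```python
import re
from typing import Any, Optional

NUMERIC_TYPES = {"number", "integer"}


def sketch_detect(name: str, json_type: Optional[str]) -> Optional[dict[str, Any]]:
    """Guess semantic metadata from a field name and its JSON Schema type."""
    # Lower-casing makes matching case-insensitive; splitting into tokens
    # avoids false hits such as "id" inside "video".
    tokens = set(re.split(r"[^a-z0-9]+", name.lower()))

    # Numeric semantic types apply only to numeric fields.
    if json_type in NUMERIC_TYPES:
        if tokens & {"price", "cost", "balance", "amount"}:  # assumed keywords
            return {"semantic_type": "currency"}
        if tokens & {"percent", "percentage"}:
            return {"semantic_type": "percentage"}
        return None

    if json_type != "string":
        return None
    if "email" in tokens:
        return {"semantic_type": "email"}
    if tokens & {"url", "uri", "link", "href"}:
        return {"semantic_type": "url"}
    for media in ("audio", "video", "image"):
        if media in tokens:
            return {"semantic_type": media, "media_format": f"{media}_file"}
    if tokens & {"datetime", "timestamp"}:  # "timestamp" is an assumed keyword
        return {"semantic_type": "datetime", "datetime_type": "datetime"}
    if "date" in tokens:
        return {"semantic_type": "datetime", "datetime_type": "date_only"}
    if "time" in tokens:
        return {"semantic_type": "datetime", "datetime_type": "time_only"}
    if tokens & {"id", "uuid", "guid"}:
        return {"semantic_type": "identifier"}
    return None
```

Token matching is a design choice made for this sketch; the PR may well use substring or regex patterns instead.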

Automatic Integration

  • Tool Creation: Integrated into tools/base.py via Tool.from_function()
  • Zero Configuration: Works automatically with existing tools
  • Backward Compatible: All existing functionality preserved

How Has This Been Tested?

Comprehensive Test Suite (29 Total Tests)

  • Unit Tests (test_schema_utilities.py): 21 tests covering all semantic detection scenarios
    • Semantic type detection with positive/negative cases
    • Schema structure preservation and edge case handling
    • Media format detection, datetime subtyping, numeric validation
  • Integration Tests (test_tool_manager.py): 8 tests in TestOutputSchema class
    • 3 new tests for advanced semantic enhancement scenarios
    • 5 updated existing tests to verify semantic enhancement while maintaining backward compatibility
  • End-to-End Testing: Verified integration with FastMCP tool listing
  • Example Application: Tested with enhanced output_schema_demo.py

Schema Enhancement Examples

Before and After Transformations

// Basic field enhancement
// Before
{"type": "string", "title": "Email"}

// After
{"type": "string", "title": "Email", "semantic_type": "email"}

// DateTime with subtyping
// Before
{"type": "string", "title": "Created Date"}

// After
{"type": "string", "title": "Created Date", "semantic_type": "datetime", "datetime_type": "date_only"}

// Media with format detection
// Before
{"type": "string", "title": "Audio Mp3"}

// After
{"type": "string", "title": "Audio Mp3", "semantic_type": "audio", "media_format": "audio_file"}

// Numeric semantic types
// Before
{"type": "number", "title": "Account Balance"}

// After
{"type": "number", "title": "Account Balance", "semantic_type": "currency"}

Client Application Benefits

  • Smart UI Rendering: Email fields can show mail icons, URLs become clickable links
  • Format Validation: Date fields get date pickers, currency fields show proper formatting
  • Media Handling: Audio/video fields can show appropriate players or thumbnails
  • Enhanced UX: Status fields can show colored indicators, percentages get progress bars
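On the client side, the `semantic_type` key could drive a simple rendering dispatch. The sketch below is hypothetical (`render_value` is not part of any SDK) and produces HTML strings purely for illustration.

```python
import html
from typing import Any


def render_value(prop_schema: dict[str, Any], value: Any) -> str:
    """Render one field using the optional semantic_type hint in its schema."""
    semantic = prop_schema.get("semantic_type")
    text = html.escape(str(value))  # escape before embedding in markup
    if semantic == "email":
        return f'<a href="mailto:{text}">{text}</a>'
    if semantic == "url":
        return f'<a href="{text}">{text}</a>'
    if semantic == "currency" and isinstance(value, (int, float)):
        return f"{value:,.2f}"  # thousands separator, two decimals
    # Unknown or missing hints fall back to plain text.
    return text
```

Clients that do not understand the hint can ignore it and still render the raw value, which is what the final fallback branch models.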

Breaking Changes

  • Fully Non-Breaking: Adds functionality without modifying existing behavior
  • Automatic Enhancement: Existing tools automatically gain semantic metadata
  • Zero Migration: No changes required to existing code
  • Backward Compatible: All original schema properties preserved

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional Context and Implementation Notes

Core Architecture

  • Type Extraction: Uses Python's type annotations and inspect module to extract return type information
  • Schema Generation: Maps primitive types directly to JSON Schema, leverages Pydantic for complex models
  • Enhancement Pipeline: Automatically applies semantic detection during tool registration
  • Graceful Fallback: Handles cases where schema generation isn't possible, falling back to None
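The extraction and fallback steps above can be sketched as follows. `sketch_output_schema` and `PRIMITIVE_SCHEMAS` are hypothetical names; the Pydantic branch assumes Pydantic v2's `model_json_schema()` classmethod and is duck-typed so the sketch runs without Pydantic installed.

```python
import inspect
from typing import Any, Callable, Optional, get_type_hints

# Direct mapping for primitive return types.
PRIMITIVE_SCHEMAS: dict[type, dict[str, Any]] = {
    str: {"type": "string"},
    int: {"type": "integer"},
    float: {"type": "number"},
    bool: {"type": "boolean"},
}


def sketch_output_schema(fn: Callable[..., Any]) -> Optional[dict[str, Any]]:
    """Derive a JSON Schema from fn's return annotation, or None on failure."""
    if inspect.signature(fn).return_annotation is inspect.Signature.empty:
        return None  # no return annotation at all
    try:
        return_type = get_type_hints(fn).get("return")
    except Exception:
        return None  # unresolved forward references and similar problems
    if isinstance(return_type, type) and return_type in PRIMITIVE_SCHEMAS:
        return dict(PRIMITIVE_SCHEMAS[return_type])
    # Pydantic v2 models expose model_json_schema(); duck-type on that.
    to_schema = getattr(return_type, "model_json_schema", None)
    if callable(to_schema):
        return to_schema()
    return None  # anything this sketch does not understand
```

Returning `None` rather than raising keeps tool registration working for functions whose return type cannot be described.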

Design Decisions

  • Embedded Metadata: Semantic information is embedded directly within JSON Schema properties rather than separate top-level fields, maintaining JSON Schema compliance
  • Case-Insensitive Detection: Field name pattern matching works regardless of case (email, EMAIL, Email)
  • Type-Specific Validation: Currency and percentage detection only applies to numeric types
  • Required Keyword Removal: The 'required' list is automatically stripped from output schemas, on the assumption that tools always return complete objects
  • Extensible Architecture: New semantic types can be easily added to the detection system
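As a rough illustration of the 'required' handling described above, a helper along these lines would strip the keyword without touching anything else. `drop_required` is a hypothetical name, and this sketch only handles the top level of the schema.

```python
from copy import deepcopy
from typing import Any


def drop_required(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `schema` without its top-level 'required' list."""
    cleaned = deepcopy(schema)
    cleaned.pop("required", None)  # no error if the key is absent
    return cleaned


original = {
    "type": "object",
    "properties": {"email": {"type": "string", "title": "Email"}},
    "required": ["email"],
}
cleaned = drop_required(original)
```

All other keywords, including the per-property definitions, pass through unchanged.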

Performance Considerations

  • Minimal Overhead: Schema enhancement occurs only during tool registration, not during execution
  • Caching: Enhanced schemas are cached as part of the tool definition
  • Lazy Evaluation: Enhancement only runs when output schemas are successfully generated
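The registration-time behaviour can be pictured with a toy registry. `ToolSketch` and `RegistrySketch` are hypothetical and are not the SDK's `Tool` or tool manager classes; the point is only that the schema is built once and stored on the tool definition.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

SchemaBuilder = Callable[[Callable[..., Any]], Optional[dict[str, Any]]]


@dataclass(frozen=True)
class ToolSketch:
    name: str
    fn: Callable[..., Any]
    output_schema: Optional[dict[str, Any]]  # computed once, then reused


class RegistrySketch:
    def __init__(self, build_schema: SchemaBuilder) -> None:
        self._build_schema = build_schema
        self._tools: dict[str, ToolSketch] = {}

    def register(self, fn: Callable[..., Any]) -> ToolSketch:
        # Schema generation and enhancement happen here, exactly once.
        tool = ToolSketch(fn.__name__, fn, self._build_schema(fn))
        self._tools[tool.name] = tool
        return tool

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Execution never touches the schema builder.
        return self._tools[name].fn(*args, **kwargs)


build_calls: list[str] = []


def counting_builder(fn: Callable[..., Any]) -> Optional[dict[str, Any]]:
    build_calls.append(fn.__name__)
    return {"type": "integer"}


def add(a: int, b: int) -> int:
    return a + b


registry = RegistrySketch(counting_builder)
tool = registry.register(add)
totals = [registry.call("add", i, 1) for i in range(3)]
```

After three calls the builder has still run only once, which is the property the bullets above describe.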

Future Extensibility

  • Plugin Architecture: Detection patterns can be extended for domain-specific semantic types
  • Configuration Options: Future versions could allow customization of detection rules
  • Nested Enhancement: Current implementation focuses on top-level properties, with potential for nested object enhancement
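One possible shape for the nested enhancement mentioned above is a recursive walk over `properties` and array `items`. This is a speculative sketch with hypothetical names (`enhance_nested`, `_walk`); it does not resolve `$ref` or `$defs`, which nested Pydantic models typically produce.

```python
from copy import deepcopy
from typing import Any, Callable, Optional

Detector = Callable[[str, Optional[str]], Optional[dict[str, Any]]]


def _walk(node: dict[str, Any], detect: Detector) -> None:
    """Annotate properties in place, descending into objects and arrays."""
    for name, prop in node.get("properties", {}).items():
        if not isinstance(prop, dict):
            continue
        extra = detect(name, prop.get("type"))
        if extra:
            prop.update(extra)
        if prop.get("type") == "object":
            _walk(prop, detect)  # nested object
        items = prop.get("items")
        if prop.get("type") == "array" and isinstance(items, dict):
            _walk(items, detect)  # array of objects


def enhance_nested(schema: dict[str, Any], detect: Detector) -> dict[str, Any]:
    """Return a deep copy of `schema` with metadata added at every level."""
    enhanced = deepcopy(schema)
    _walk(enhanced, detect)
    return enhanced


def email_only(name: str, json_type: Optional[str]) -> Optional[dict[str, Any]]:
    if json_type == "string" and "email" in name.lower():
        return {"semantic_type": "email"}
    return None


nested = {
    "type": "object",
    "properties": {
        "owner": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
        },
        "members": {
            "type": "array",
            "items": {"type": "object", "properties": {"email": {"type": "string"}}},
        },
    },
}
nested_result = enhance_nested(nested, email_only)
```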

Fixes #754

@ihrpr ihrpr added this to the r-06-25 milestone May 23, 2025
@PratyushNag
Author

The failing test seems a bit flaky to me. Could someone please rerun it? I think it might pass on retry.

@felixweinberger added the "needs SEP" (Major changes to structs and features generally require a SEP to be approved.) and "needs more work" (Not ready to be merged yet, needs additional changes.) labels Sep 5, 2025
@felixweinberger
Contributor

Hi @PratyushNag, thank you for your contribution!

Apologies for the time it took to get back to this - this looks like a significant enhancement that should likely go through a SEP review to ensure we maintain parity between SDKs and only take on new features in the protocol that we have the capacity to maintain actively.

If you'd like some lightweight feedback on your proposal before investing time in a SEP, you could post in the community Discord with a link to this PR so the maintainer group can take a look and give you some direction.

Contributor

@felixweinberger felixweinberger left a comment

Requesting changes for now to put this back in your queue pending a response on the SEP - feel free to ask questions if anything is unclear about next steps.

@felixweinberger
Contributor

Closing this one for now as it's been a while - we're currently going through older PRs to ensure we prioritize limited maintainer time appropriately.

@PratyushNag
Author

Apologies for the delay, @felixweinberger. Would it be possible to go through a SEP review now?
I was a bit confused about how I should go about submitting a SEP and what the appropriate location would be.
Should I create a new GitHub issue and reference this PR there?

@felixweinberger
Contributor

felixweinberger commented Sep 26, 2025

Hi @PratyushNag, no worries - yes, opening an issue on this repo first would be a good start. This is a major refactor and change, so it's worth getting some feedback first. Maintainers have limited capacity to review everything inbound, so for large changes like this one it's best to get agreement before submitting.

For now, I'd recommend limiting your time to the absolute core changes you're proposing and the motivation for why this feature is valuable to have in the protocol, without heavy implementation details. You can link back to this PR.

Development

Successfully merging this pull request may close these issues.

Functionality to fetch output schemas as well on list_tools()