[GPT-OSS] Fix Pydantic union resolution for ResponseFunctionToolCall in Responses API#24158

Closed
JasonZhu1313 wants to merge 3 commits into vllm-project:main from JasonZhu1313:jaszhu/pydantic_deserialization_fix

Conversation


@JasonZhu1313 JasonZhu1313 commented Sep 3, 2025

Purpose

When using GPT OSS with tool calling, the vLLM Responses API takes an input list containing a mix of message types:

  • Regular user/assistant messages in dictionary format:
{"role": "user", "content": "What is the weather in Tokyo?"}
  • Structured objects such as ResponseFunctionToolCall and ResponseReasoningItem, which represent tool call content and reasoning content.

These objects are serialized to JSON for the HTTP request and then deserialized back into Pydantic objects on the server side when the request hits FastAPI. During this serialization/deserialization round trip, I observed that ResponseFunctionToolCall is demoted to a plain dictionary, losing its type information in the request payload.
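The loss can be reproduced with a minimal sketch (the ToolCall model below is a stand-in, not vLLM's actual class): once a Pydantic model is dumped for the HTTP request, only a plain dict arrives at the server, and the original Python type has to be recovered from the payload's fields alone.

```python
from pydantic import BaseModel

class ToolCall(BaseModel):
    """Stand-in for ResponseFunctionToolCall; field names are illustrative."""
    type: str = "function_call"
    call_id: str
    name: str

msg = ToolCall(call_id="call_1", name="get_weather")
payload = msg.model_dump()  # what actually travels over the wire as JSON

# Server side: the type information is gone unless validation restores it.
assert isinstance(payload, dict)
assert not isinstance(payload, ToolCall)
```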

This causes downstream errors because:

  • The code later checks for messages specifically of type ResponseFunctionToolCall. code pointer

  • Tool call IDs must be tracked across messages (tool output IDs must match their corresponding tool call IDs).

  • If the type is lost, ID matching fails, leading to an exception being raised: raise ValueError(f"No call message found for {call_id}")

Issues

Input messages are prepared for sending to the server with the correct type:

Input messages: [{'role': 'user', 'content': 'What is the weather now in Tokyo?'}, ResponseReasoningItem(id='rs_490563de9bb0413ea31a9fd737a45a0c', summary=[], type='reasoning', content=[Content(text='User asks for weather in Tokyo. I need to use function get_weather.', type='reasoning_text')], encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{"location": "Tokyo"}', call_id='call_bb25f8057add4af4ab507d8aefb9601f', name='get_weather', type='function_call', id=None, status=None), {'type': 'function_call_output', 'call_id': 'call_bb25f8057add4af4ab507d8aefb9601f', 'output': "It's sunny."}]

As soon as the request reaches the server, the logged ResponsesRequest shows that the type information for ResponseFunctionToolCall is dropped and it becomes a regular dict:

ResponsesRequest(background=False, include=None, input=[{'content': 'What is the weather now in Tokyo?', 'role': 'user'}, ResponseReasoningItem(id='rs_f5baa78573c94a92b18cd811baead509', content=[ResponseReasoningTextContent(text='The user asks "What is the weather now in Tokyo?" We should use the get_weather function.', type='reasoning_text')], summary=[], type='reasoning', encrypted_content=None, status=None), {'arguments': '{"location": "Tokyo"}', 'call_id': 'call_7410db2534124acdbbaa91a120945a55', 'name': 'get_weather', 'type': 'function_call', 'id': 'ft_7410db2534124acdbbaa91a120945a55'}, {'call_id': 'call_7410db2534124acdbbaa91a120945a55', 'output': "It's sunny.", 'type': 'function_call_output'}], instructions=None, max_output_tokens=None, max_tool_calls=None, metadata=None, model='/shared/public/elr-models/openai/gpt-oss-20b/6cd4d0ffba39483fe4fb0f5637831f717dafca35', parallel_tool_calls=True, previous_response_id=None, prompt=None, reasoning=None, service_tier='auto', store=True, stream=False, temperature=None, text=None, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'The location to get weather for'}}, 'required': ['location'], 'additionalProperties': False}, strict=True, type='function', description='Get the weather at a location.')], top_logprobs=0, top_p=None, truncation='disabled', user=None, request_id='resp_35abb850516e42e28b3201cb558404b5', mm_processor_kwargs=None, priority=0, cache_salt=None)

This causes the downstream code to fail.

Root Cause

The issue occurs during Pydantic union type resolution in ResponseInputOutputItem. When validating function call data, Pydantic matches ResponseFunctionToolCallParam (a TypedDict within the broad ResponseInputItemParam union) before reaching ResponseFunctionToolCall (the specific Pydantic model).

The union type resolution order:

  1. ResponseInputItemParam (contains ResponseFunctionToolCallParam TypedDict) ✅ Matches first
  2. ResponseReasoningItem
  3. ResponseFunctionToolCall ❌ Never reached

Since TypedDict validation is more permissive than Pydantic model validation, function calls are parsed as plain dictionaries instead of ResponseFunctionToolCall objects.
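The effect can be demonstrated in isolation with a stand-in union (the classes below are illustrative, not vLLM's actual types; union_mode="left_to_right" is used to make the resolution order explicit):

```python
from typing import Union
from pydantic import BaseModel, Field
from typing_extensions import TypedDict

class FunctionCallDict(TypedDict):
    """Stand-in for ResponseFunctionToolCallParam (permissive TypedDict)."""
    type: str
    name: str

class FunctionCall(BaseModel):
    """Stand-in for ResponseFunctionToolCall (the Pydantic model)."""
    type: str
    name: str

class Request(BaseModel):
    # The TypedDict member is listed first, mirroring ResponseInputItemParam
    # preceding ResponseFunctionToolCall in the real union.
    item: Union[FunctionCallDict, FunctionCall] = Field(union_mode="left_to_right")

req = Request(item={"type": "function_call", "name": "get_weather"})
assert isinstance(req.item, dict)              # parsed as a plain dict...
assert not isinstance(req.item, FunctionCall)  # ...never as the model
```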

Solution

Add a pre-processing validator to ResponsesRequest that explicitly converts function call dictionaries to ResponseFunctionToolCall objects before Pydantic's union resolution.
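A minimal sketch of such a validator (stand-in classes only; the real fix operates on ResponsesRequest and the ResponseFunctionToolCall type from the openai package):

```python
from typing import Any
from pydantic import BaseModel, model_validator

class FunctionCall(BaseModel):
    """Stand-in for ResponseFunctionToolCall."""
    type: str
    call_id: str
    name: str
    arguments: str

class RequestSketch(BaseModel):
    input: list[Any]

    @model_validator(mode="before")
    @classmethod
    def function_call_parsing(cls, data: Any) -> Any:
        # Runs before field validation, so function-call dicts are promoted
        # to typed objects ahead of any union resolution.
        if isinstance(data, dict) and isinstance(data.get("input"), list):
            data["input"] = [
                FunctionCall(**item)
                if isinstance(item, dict) and item.get("type") == "function_call"
                else item
                for item in data["input"]
            ]
        return data

req = RequestSketch(input=[
    {"role": "user", "content": "What is the weather in Tokyo?"},
    {"type": "function_call", "call_id": "call_1", "name": "get_weather",
     "arguments": '{"location": "Tokyo"}'},
])
assert isinstance(req.input[1], FunctionCall)  # type preserved
```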

Test Plan

A simple script to reproduce the issue:


from openai.types.responses.response_function_tool_call import ResponseFunctionToolCall
from vllm.entrypoints.openai.protocol import ResponsesRequest

# Test the exact data from the logs above
test_data = {
    'arguments': '{"location": "Tokyo"}',
    'call_id': 'call_a90e65e81cd84553855dde1733e77749',
    'name': 'get_weather',
    'type': 'function_call',
    'id': 'ft_a90e65e81cd84553855dde1733e77749',
    'status': 'completed'
}

print('Testing function call dict:')
print(test_data)
print()

# Test with ResponsesRequest
request_data = {
    'input': [test_data],
    'model': 'test'
}

request = ResponsesRequest(**request_data)
parsed = request.input[0]

print('Parsed result:')
print(f'Type: {type(parsed)}')
print(f'Value: {parsed}')
print(f'Is ResponseFunctionToolCall: {isinstance(parsed, ResponseFunctionToolCall)}')

Test Result

The same code works after the fix; the log shows the type is preserved:


create_responses api_server: background=False include=None input=[{'content': 'What is the weather now in Tokyo?', 'role': 'user'}, ResponseReasoningItem(id='rs_c112b1030bb04067aba7d2ddb19be5ed', content=[ResponseReasoningTextContent(text='User asks for current weather in Tokyo. We have a function "get_weather" that returns current temperature. We need to call that.', type='reasoning_text')], summary=[], type='reasoning', encrypted_content=None, status=None), ResponseFunctionToolCall(arguments='{"location": "Tokyo"}', call_id='call_2362dba4d733467998a83a142047cafc', name='get_weather', type='function_call', id='ft_2362dba4d733467998a83a142047cafc', status='completed'), {'call_id': 'call_2362dba4d733467998a83a142047cafc', 'output': "It's sunny.", 'type': 'function_call_output'}] instructions=None max_output_tokens=None max_tool_calls=None metadata=None model='/shared/public/elr-models/openai/gpt-oss-20b/6cd4d0ffba39483fe4fb0f5637831f717dafca35' parallel_tool_calls=True previous_response_id=None prompt=None reasoning=None service_tier='auto' store=True stream=False temperature=None text=None tool_choice='auto' tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'The location to get weather for'}}, 'required': ['location'], 'additionalProperties': False}, strict=True, type='function', description='Get current temperature for provided coordinates in celsius.')] top_logprobs=0 top_p=None truncation='disabled' user=None request_id='resp_070bc21e77b34e03b3fb67c9c2dfe5cc' mm_processor_kwargs=None priority=0 cache_salt=None

Final result:

Response: Response(id='resp_239fe88f23834698969024bbf49e7c5e', created_at=1756883917.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/shared/public/elr-models/openai/gpt-oss-20b/6cd4d0ffba39483fe4fb0f5637831f717dafca35', object='response', output=[ResponseReasoningItem(id='rs_1198ca88f0564c348c2cdf0e26e3a062', summary=[], type='reasoning', content=[Content(text='We have the answer: It\'s sunny. But maybe provide temperature as well? The function might return only "It\'s sunny." So answer accordingly.', type='reasoning_text')], encrypted_content=None, status=None), ResponseOutputMessage(id='msg_2602171dc0bf4019866f5c5785075a95', content=[ResponseOutputText(annotations=[], text='The current weather in Tokyo is **sunny**.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[FunctionTool(name='get_weather', parameters={'type': 'object', 'properties': {'location': {'type': 'string', 'description': 'The location to get weather for'}}, 'required': ['location'], 'additionalProperties': False}, strict=True, type='function', description='Get current temperature for provided coordinates in celsius.')], top_p=1.0, background=False, conversation=None, max_output_tokens=130895, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=0, truncation='disabled', usage=ResponseUsage(input_tokens=0, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=0, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=0), user=None)
================================== Ai Message ==================================

The current weather in Tokyo is **sunny**.


Running the test script before the fix:

INFO 09-03 07:00:31 [__init__.py:241] Automatically detected platform cuda.
Testing function call dict:
{'arguments': '{"location": "Tokyo"}', 'call_id': 'call_a90e65e81cd84553855dde1733e77749', 'name': 'get_weather', 'type': 'function_call', 'id': 'ft_a90e65e81cd84553855dde1733e77749', 'status': 'completed'}

Parsed result:
Type: <class 'dict'>
Value: {'arguments': '{"location": "Tokyo"}', 'call_id': 'call_a90e65e81cd84553855dde1733e77749', 'name': 'get_weather', 'type': 'function_call', 'id': 'ft_a90e65e81cd84553855dde1733e77749', 'status': 'completed'}
Is ResponseFunctionToolCall: False

Running the test script after the fix:

INFO 09-03 07:00:12 [__init__.py:241] Automatically detected platform cuda.
Testing function call dict:
{'arguments': '{"location": "Tokyo"}', 'call_id': 'call_a90e65e81cd84553855dde1733e77749', 'name': 'get_weather', 'type': 'function_call', 'id': 'ft_a90e65e81cd84553855dde1733e77749', 'status': 'completed'}

Parsed result:
Type: <class 'openai.types.responses.response_function_tool_call.ResponseFunctionToolCall'>
Value: ResponseFunctionToolCall(arguments='{"location": "Tokyo"}', call_id='call_a90e65e81cd84553855dde1733e77749', name='get_weather', type='function_call', id='ft_a90e65e81cd84553855dde1733e77749', status='completed')
Is ResponseFunctionToolCall: True



@mergify mergify bot added frontend gpt-oss Related to GPT-OSS models labels Sep 3, 2025
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly identifies and fixes a Pydantic union resolution issue for ResponseFunctionToolCall by introducing a pre-processing validator. The fix is well-described and the test plan is clear. My review includes a suggestion to improve the implementation of the new validator for better performance and robustness by addressing an inefficient import and overly broad exception handling.

Comment on lines 406 to 427:

Severity: high

The implementation of function_call_parsing can be improved for efficiency and robustness.

  1. Inefficient Import: The from openai.types.responses.response_function_tool_call import ResponseFunctionToolCall is inside a loop. Since ResponseFunctionToolCall is already imported at the top of the file (line 21), this local import is redundant and should be removed.
  2. Broad Exception Handling: Using except Exception: is too broad as it can catch and silence unexpected errors, making debugging harder. It's better to catch specific exceptions like pydantic.ValidationError and TypeError which are expected from ResponseFunctionToolCall(**item).

Here is a refactored version of the function that addresses these points and improves readability. Note that ValidationError will need to be imported from pydantic (e.g., by adding it to the import on line 36).

def function_call_parsing(cls, data):
    """Function call parsing to ensure ResponseFunctionToolCall objects are created."""
    from pydantic import ValidationError

    input_data = data.get("input")
    if not isinstance(input_data, list):
        return data

    new_input = []
    for item in input_data:
        if isinstance(item, dict) and item.get("type") == "function_call":
            try:
                # ResponseFunctionToolCall is already imported at the top.
                new_input.append(ResponseFunctionToolCall(**item))
            except (ValidationError, TypeError):
                # If conversion fails, keep the original dict.
                new_input.append(item)
        else:
            new_input.append(item)
    data["input"] = new_input
    return data

@DarkLight1337 DarkLight1337 (Member) left a comment

Thanks for fixing, can you add a unit test to avoid regressions?

@heheda12345 heheda12345 (Collaborator) left a comment

Thanks for the exploration. Can we fix it by changing the order of the Union?

ResponseInputOutputItem: TypeAlias = Union[ResponseInputItemParam,
                                           ResponseFunctionToolCall,
                                           ResponseReasoningItem,
                                          ]

@JasonZhu1313 JasonZhu1313 (Contributor Author)

Thanks for the exploration. Can we fix it by changing the order of the Union?

ResponseInputOutputItem: TypeAlias = Union[ResponseInputItemParam,
                                           ResponseFunctionToolCall,
                                           ResponseReasoningItem,
                                          ]

jobuser [ ~ ]$ python /home/jobuser/test_fix/test_changes.py
INFO 09-05 06:00:47 [__init__.py:241] Automatically detected platform cuda.
Testing function call dict:
{'arguments': '{"location": "Tokyo"}', 'call_id': 'call_a90e65e81cd84553855dde1733e77749', 'name': 'get_weather', 'type': 'function_call', 'id': 'ft_a90e65e81cd84553855dde1733e77749', 'status': 'completed'}

Parsed result:
Type: <class 'dict'>
Value: {'arguments': '{"location": "Tokyo"}', 'call_id': 'call_a90e65e81cd84553855dde1733e77749', 'name': 'get_weather', 'type': 'function_call', 'id': 'ft_a90e65e81cd84553855dde1733e77749', 'status': 'completed'}
Is ResponseFunctionToolCall: False

I tried it and it doesn't work; it seems explicit validation is needed.

@JasonZhu1313 JasonZhu1313 (Contributor Author)

Thanks for fixing, can you add a unit test to avoid regressions?

Added a unit test, thanks for the reminder.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 5, 2025 06:42
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 5, 2025
auto-merge was automatically disabled September 6, 2025 00:39

Head branch was pushed to by a user without write access

@JasonZhu1313 JasonZhu1313 force-pushed the jaszhu/pydantic_deserialization_fix branch from b62ed35 to 0a0faac Compare September 7, 2025 16:50
@mergify mergify bot added the tpu Related to Google TPUs label Sep 8, 2025

mergify bot commented Sep 8, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @JasonZhu1313.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 8, 2025
@heheda12345 heheda12345 (Collaborator)

Can you revert unrelated changes?

Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com>
@JasonZhu1313 JasonZhu1313 force-pushed the jaszhu/pydantic_deserialization_fix branch from 599b98a to efaa967 Compare September 8, 2025 21:56
@mergify mergify bot removed tpu Related to Google TPUs needs-rebase labels Sep 8, 2025
@JasonZhu1313 JasonZhu1313 (Contributor Author)

Can you revert unrelated changes?

Reverted; I was following the DCO suggestions and ended up with this.

Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com>
@JasonZhu1313 JasonZhu1313 (Contributor Author)

  • Tests aren't run in CI
  • Please fix pre-commit

[2025-09-08T22:29:13Z] -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
--
  | [2025-09-08T22:29:13Z] =========================== short test summary info ============================
  | [2025-09-08T22:29:13Z] FAILED v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output[Qwen/Qwen2.5-1.5B-Instruct-lm-format-enforcer-auto-None] - json.decoder.JSONDecodeError: Invalid control character at: line 7 column 67 (char 80)
  | [2025-09-08T22:29:13Z] !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
  | [2025-09-08T22:29:13Z] ============= 1 failed, 5 passed, 11 warnings in 328.83s (0:05:28) =============
  | [2025-09-08T22:29:13Z] (EngineCore_0 pid=934) DEBUG 09-08 15:29:13 [core.py:714] EngineCore exiting.
  | [2025-09-08T22:29:13Z] ERROR 09-08 15:29:13 [core_client.py:562] Engine core proc EngineCore_0 died unexpectedly, shutting down client.
  | [2025-09-08T22:29:16Z] 🚨 Error: The command exited with status 1

Fixed pre-commit. I don't think this failure is related to my PR; correct me if I'm wrong.

@aarnphm aarnphm (Collaborator) left a comment

cc @chaunceyjiang wrt #20874 compatibility.

Should we just parse the dict into the response tool instead? Below:

parse_response_input(response_msg, prev_outputs))

def function_call_parsing(cls, data):
"""Function call parsing to ensure ResponseFunctionToolCall objects
are created."""
input_data = data.get("input")
@aarnphm aarnphm (Collaborator) commented Sep 9, 2025:
I'm confused: this seems to live under ChatCompletionRequest, but the PR seems to be Responses-related only.

I haven't looked at the responses code yet, but do we use this object in the responses API as well?

@JasonZhu1313 JasonZhu1313 (Contributor Author) commented Sep 9, 2025:

Last week tool calling was not supported, so I had to use the Responses API, which supports tool calls.

Contributor Author:

c29fb54: looks like it's supported now.

Collaborator:

I think this is a great improvement.

I also encountered this issue here: https://github.com/vllm-project/vllm/pull/20874/files#diff-31e6bd0df09a47b5587701203d558701ac46e4f85bf7db83632da9990eaef198R1546.
I wanted to use isinstance(input, ResponseFunctionToolCall) for the check, but since input is a dict, isinstance cannot be used, so I had to resort to input.get().

However, I'm confused: why is function_call_parsing located under ChatCompletionRequest rather than ResponsesRequest? Shouldn't it belong to ResponsesRequest?

Collaborator:

@JasonZhu1313 could you share your full client code? This change doesn't seem to have an effect. Why do we depend on ChatCompletionRequest?

@aarnphm aarnphm requested a review from hmellor September 9, 2025 00:41
'{"query": "weather", "filters": ["temperature", "humidity"], "count": 5}',
'{"complex": {"nested": {"data": true}}, "array": [1, 2, 3]}'
])
def test_function_call_with_complex_arguments(arguments):
Collaborator:

Refer to https://platform.openai.com/docs/guides/function-calling?api-mode=responses

FunctionCallOutput will also be used as input. Could you also add a test for FunctionCallOutput?

@heheda12345 heheda12345 (Collaborator)

Also CC @yeqcharlotte


@strinczer strinczer (Contributor)

I also hit this and would love a fix for it. @JasonZhu1313, are you still looking into this?

@strinczer strinczer (Contributor)

I reached out to @JasonZhu1313, who said he did not have time to finish the PR. I opened a new one and addressed the comments from this PR: #26706.

I would appreciate a review on that to unblock function calling using the Responses API.

@github-actions

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale Over 90 days of inactivity label Jan 12, 2026
@github-actions

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!

@github-actions github-actions bot closed this Feb 11, 2026
@github-project-automation github-project-automation bot moved this from In progress to Done in gpt-oss Issues & Enhancements Feb 11, 2026


Projects

Status: Done


8 participants