[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens #37258

Merged
DarkLight1337 merged 3 commits into vllm-project:main from chaunceyjiang:response_required_max_tokens
Mar 17, 2026

Conversation

@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Mar 17, 2026

Purpose

Follow-up to #36841.

Fixes this CI failure: https://buildkite.com/vllm/ci/builds/56537?group_by=test#019cf9dc-06da-4341-aa86-6e0d6cb06ec8


[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     return await self.responses_full_generator(
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/responses/serving.py", line 711, in responses_full_generator
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     output = self._make_response_output_items(request, final_output, tokenizer)
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/responses/serving.py", line 904, in _make_response_output_items
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     return parser.extract_response_outputs(
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/vllm/parser/abstract_parser.py", line 325, in extract_response_outputs
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     tool_calls, content = self._parse_tool_calls(
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]                           ^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/vllm/parser/abstract_parser.py", line 426, in _parse_tool_calls
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     tool_calls = TypeAdapter(list[FunctionDefinition]).validate_json(content)
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]   File "/usr/local/lib/python3.12/dist-packages/pydantic/type_adapter.py", line 492, in validate_json
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]     return self.validator.validate_json(
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2026-03-17T04:29:59Z] (APIServer pid=2510) ERROR 03-17 04:29:59 [server_utils.py:374] pydantic_core._pydantic_core.ValidationError: 1 validation error for list[function-wrap[__log_extra_fields__()]]



Test Plan

See the e2e test.

For comparison, test gpt-5 with the OpenAI client:

response = client.responses.create(
    model="gpt-5",
    input=prompt,
    tools=tools,
    tool_choice="required",
    max_output_tokens=1,
)

Test Result

gpt-5

{
    "id": "resp_0c59c8e31591e43d0069b8f1e2a17c8190bffa061a344becb7",
    "created_at": 1773728226.0,
    "error": null,
    "incomplete_details": {
        "reason": "max_output_tokens"
    },
    "instructions": null,
    "metadata": {},
    "model": "gpt-5",
    "object": "response",
    "output": [
        {
            "id": "rs_0c59c8e31591e43d0069b8f1e3de108190b7204a20852a1dca",
            "summary": [],
            "type": "reasoning",
            "content": null,
            "encrypted_content": null,
            "status": null
        }
    ],
    "parallel_tool_calls": true,
    "temperature": 1.0,
    "tool_choice": "required",
....
}

vllm

{
    "id": "resp_89d52120b02c63ff",
    "created_at": 1773728620.0,
    "error": null,
    "incomplete_details": {
        "reason": "max_output_tokens"
    },
    "instructions": null,
    "metadata": null,
    "model": "my-model",
    "object": "response",
    "output": [
        {
            "id": "rs_a1ff3b9137e892f3",
            "summary": [],
            "type": "reasoning",
            "content": [
                {
                    "text": "The",
                    "type": "reasoning_text"
                }
            ],
            "encrypted_content": null,
            "status": null
        }
    ],
    "parallel_tool_calls": true,
    "temperature": 1.0,
    "tool_choice": "required",
....
}
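
Both results above have the same shape: `incomplete_details.reason` is `max_output_tokens` and `output` contains only a reasoning item, with no tool call. A minimal sketch of a check for that shape follows. The helper name `summarize_truncated_response` is hypothetical and not part of vLLM or the OpenAI client. It assumes the response has already been converted to a plain dict, and that tool calls would appear as output items with `type == "function_call"`.

```python
def summarize_truncated_response(response: dict) -> dict:
    """Summarize whether a Responses API result was cut off by the token limit.

    Hypothetical helper for illustration; it only reads plain dict fields.
    """
    # incomplete_details is null for complete responses, so fall back to {}.
    details = response.get("incomplete_details") or {}
    output = response.get("output") or []
    return {
        "truncated": details.get("reason") == "max_output_tokens",
        "output_types": [item.get("type") for item in output],
        # Assumption: function tool calls use the item type "function_call".
        "has_tool_call": any(item.get("type") == "function_call" for item in output),
    }


# Trimmed copy of the vllm result shown above.
vllm_result = {
    "error": None,
    "incomplete_details": {"reason": "max_output_tokens"},
    "output": [
        {
            "type": "reasoning",
            "summary": [],
            "content": [{"text": "The", "type": "reasoning_text"}],
        }
    ],
    "tool_choice": "required",
}

summary = summarize_truncated_response(vllm_result)
```

With this input, `summary["truncated"]` is true and `summary["has_tool_call"]` is false, which matches the behavior the test plan looks for: an incomplete response instead of a server error.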


…x_output_tokens

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@mergify mergify bot added the bug label Mar 17, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a crash in the Responses API when tool_choice="required" and the generated output for the tool call exceeds max_output_tokens. The fix correctly handles potential ValidationError during JSON parsing of the model's output by suppressing the exception. This prevents the crash and ensures that if the tool call JSON is invalid or truncated, no tool call is returned, which is the desired behavior. A new test case is added to validate this fix, confirming that the system remains stable under these conditions.
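
The approach described in the review, suppressing the validation error so that truncated tool-call JSON yields no tool calls, can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: the traceback shows vLLM validating with pydantic's `TypeAdapter(...).validate_json`, whereas this sketch swaps in the standard-library `json` module so it runs without dependencies. The names `ToolCall` and `parse_required_tool_calls` are hypothetical.

```python
import json
from contextlib import suppress
from dataclasses import dataclass


@dataclass
class ToolCall:
    # Hypothetical stand-in for the validated tool-call model.
    name: str
    parameters: dict


def parse_required_tool_calls(content: str) -> list[ToolCall]:
    """Parse a JSON array of tool calls, returning [] if it is truncated or malformed."""
    tool_calls: list[ToolCall] = []
    # json.JSONDecodeError subclasses ValueError; KeyError and TypeError cover
    # JSON that parses but does not have the expected shape.
    with suppress(ValueError, KeyError, TypeError):
        raw = json.loads(content)
        if not isinstance(raw, list):
            raise TypeError("expected a JSON array of tool calls")
        tool_calls = [
            ToolCall(name=item["name"], parameters=item["parameters"])
            for item in raw
        ]
    return tool_calls


# Output cut off by a small max_output_tokens: no crash, no tool calls.
truncated = parse_required_tool_calls('[{"name": "get_weather", "par')

# Complete output parses normally.
complete = parse_required_tool_calls(
    '[{"name": "get_weather", "parameters": {"city": "Paris"}}]'
)
```

Because `tool_calls` is assigned only after the whole list has been built, a failure partway through leaves the empty default in place instead of a partial result.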

…x_output_tokens

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…x_output_tokens

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@chaunceyjiang
Collaborator Author

/cc @DarkLight1337 PTAL.

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 17, 2026 07:00
@github-actions github-actions bot added the ready label Mar 17, 2026
@DarkLight1337 DarkLight1337 merged commit 132bfd4 into vllm-project:main Mar 17, 2026
47 checks passed
@chaunceyjiang chaunceyjiang deleted the response_required_max_tokens branch March 17, 2026 09:03
zhenwei-intel pushed a commit to zhenwei-intel/vllm that referenced this pull request Mar 17, 2026
…x_output_tokens (vllm-project#37258)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
…x_output_tokens (vllm-project#37258)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
andylolu2 pushed a commit to andylolu2/vllm that referenced this pull request Mar 18, 2026
…x_output_tokens (vllm-project#37258)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
…x_output_tokens (vllm-project#37258)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
…x_output_tokens (vllm-project#37258)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
…x_output_tokens (vllm-project#37258)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
…x_output_tokens (vllm-project#37258)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
JiantaoXu pushed a commit to JiantaoXu/vllm that referenced this pull request Mar 28, 2026
…x_output_tokens (vllm-project#37258)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Labels

bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed)

2 participants