[Bugfix] Fix crash when tool_choice=required exceeds max_tokens #36841

Merged
vllm-bot merged 5 commits into vllm-project:main from chaunceyjiang:required_max_tokens
Mar 12, 2026
Conversation

Collaborator

@chaunceyjiang chaunceyjiang commented Mar 12, 2026

Purpose

FIX #36794

Test Plan

See the e2e test added in this PR.

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
@mergify mergify bot added frontend tool-calling bug Something isn't working labels Mar 12, 2026
@chaunceyjiang chaunceyjiang marked this pull request as ready for review March 12, 2026 03:20
@chaunceyjiang chaunceyjiang added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 12, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a crash that occurs when tool_choice="required" is used with a max_tokens value too small for the tool call generation. The fix correctly handles potential JSON parsing errors from truncated model outputs by using a try...except block (via contextlib.suppress) and removing a problematic assertion. The changes in vllm/entrypoints/openai/engine/serving.py and vllm/entrypoints/openai/chat_completion/serving.py are sound. However, the new test case intended to verify this fix appears to be flawed, as it uses conflicting parameters that prevent it from testing the intended scenario. I've provided a suggestion to correct the test.
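The parsing pattern described in the review can be sketched as follows. This is a minimal illustration, not the actual vLLM code: the function name and the list-of-dicts output format are assumptions.

```python
import contextlib
import json


def parse_tool_calls(model_output: str) -> list[dict]:
    """Parse tool calls from model output, tolerating truncation.

    When tool_choice="required" forces a JSON tool-call generation but
    max_tokens cuts it short, json.loads raises JSONDecodeError; suppress
    it and fall back to an empty list instead of crashing on an assertion.
    """
    tool_calls: list[dict] = []
    with contextlib.suppress(json.JSONDecodeError):
        parsed = json.loads(model_output)
        if isinstance(parsed, list):
            tool_calls = parsed
    return tool_calls


# A complete generation parses normally; a truncated one yields [].
print(parse_tool_calls('[{"name": "get_weather", "arguments": "{}"}]'))
print(parse_tool_calls('[{"name": "get_wea'))  # truncated mid-generation
```

The key change relative to the pre-fix behavior is that a truncated output degrades to "no tool calls" rather than raising.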


mergify bot commented Mar 12, 2026

Hi @chaunceyjiang, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Contributor

@alvinttang alvinttang left a comment


The fix correctly addresses the crash by replacing the hard assert with a graceful fallback — tool_calls or [] is the right idiom here. However, the change in engine/serving.py uses contextlib.suppress(ValidationError) which silently swallows any JSON parse errors; it might be worth logging a debug/warning in the suppressed block so these failures aren't invisible in production. The added test simultaneously passes both max_tokens=5 and max_completion_tokens=150, which makes the failure mode ambiguous — testing just the max_tokens path in isolation would be cleaner. Overall the core fix is sound and the behavior now matches OpenAI semantics for this edge case.

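The logging suggestion above could look like the sketch below; the function name is hypothetical and this is not the PR's actual code:

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_tool_calls_logged(model_output: str) -> list[dict]:
    """Like the contextlib.suppress variant, but leaves a trace on failure."""
    try:
        parsed = json.loads(model_output)
        return parsed if isinstance(parsed, list) else []
    except json.JSONDecodeError as exc:
        # Truncated or malformed output is still dropped, but not invisibly.
        logger.warning("Discarding unparseable tool-call output: %s", exc)
        return []
```

The trade-off is a small amount of log noise on every truncated generation in exchange for a visible signal when clients set max_tokens too low.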
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
)
# When `tool_choice="required"` and the tokens of `tools` exceed `max_tokens`,
# both `tool_calls` and `content` should be empty.
# This behavior should be consistent with OpenAI.
Member


Have you confirmed that OpenAI does this as well?

Collaborator Author


chat_completion = client.chat.completions.create(
    messages=messages,
    model="gpt-5",
    tools=tools,
    tool_choice="required",
    max_completion_tokens=10,
)
print(chat_completion)


ChatCompletion(id='chatcmpl-DIU1M3ic0iPTxtnxit59rWDxUpaEH', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content='', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1773297678, model='gpt-5', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=10, prompt_tokens=340, total_tokens=350, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=10, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0), input_tokens=0, output_tokens=0, input_tokens_details=None))

Collaborator Author


Have you confirmed that OpenAI does this as well?

This is the result from my test with GPT-5.
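The expected shape of the truncated response can be checked as follows. The field values are taken from the GPT-5 output pasted above; the plain dict is a stand-in for the real response object.

```python
# Shape of a truncated tool_choice="required" response, per the output above.
truncated = {
    "choices": [
        {
            "finish_reason": "length",
            "message": {"role": "assistant", "content": "", "tool_calls": None},
        }
    ]
}

choice = truncated["choices"][0]
assert choice["finish_reason"] == "length"   # generation hit max_tokens
assert not choice["message"]["tool_calls"]   # no partial tool calls surfaced
assert not choice["message"]["content"]      # and no content either
```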

Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
Monishver11 pushed a commit to Monishver11/vllm that referenced this pull request Mar 27, 2026
…-project#36841)

Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>

Labels

bug Something isn't working frontend ready ONLY add when PR is ready to merge/full CI is needed tool-calling

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Feature]: Improve Error Message When Max Length Reached With Tool Call=Required

4 participants