[Bugfix] Fix crash when tool_choice=required exceeds max_tokens #36841
vllm-bot merged 5 commits into vllm-project:main
Conversation
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Code Review
This pull request addresses a crash that occurs when tool_choice="required" is used with a max_tokens value too small for the tool call generation. The fix correctly handles potential JSON parsing errors from truncated model outputs by using a try...except block (via contextlib.suppress) and removing a problematic assertion. The changes in vllm/entrypoints/openai/engine/serving.py and vllm/entrypoints/openai/chat_completion/serving.py are sound. However, the new test case intended to verify this fix appears to be flawed, as it uses conflicting parameters that prevent it from testing the intended scenario. I've provided a suggestion to correct the test.
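A minimal sketch of the fallback idiom the review describes, assuming a stdlib-only setup (the function name and parsing shape are illustrative, not the actual vLLM code, which also suppresses pydantic's ValidationError):

```python
import contextlib
import json


def parse_tool_calls(raw_output: str):
    """Parse model output into tool calls, tolerating truncation.

    When tool_choice="required" forces a tool call but max_tokens cuts
    generation short, raw_output may be incomplete JSON. Instead of an
    assert that crashes the server, suppress the parse error and fall
    back to an empty list.
    """
    tool_calls = None
    # json.JSONDecodeError is a subclass of ValueError, so this also
    # covers malformed JSON from a truncated completion.
    with contextlib.suppress(ValueError):
        tool_calls = json.loads(raw_output)
    # `or []` turns a failed parse (still None) into an empty list
    return tool_calls or []


# Truncated output no longer raises:
parse_tool_calls('[{"name": "get_weather"')  # -> []
```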
Hi @chaunceyjiang, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
alvinttang
left a comment
The fix correctly addresses the crash by replacing the hard assert with a graceful fallback — tool_calls or [] is the right idiom here. However, the change in engine/serving.py uses contextlib.suppress(ValidationError) which silently swallows any JSON parse errors; it might be worth logging a debug/warning in the suppressed block so these failures aren't invisible in production. The added test simultaneously passes both max_tokens=5 and max_completion_tokens=150, which makes the failure mode ambiguous — testing just the max_tokens path in isolation would be cleaner. Overall the core fix is sound and the behavior now matches OpenAI semantics for this edge case.
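The reviewer's suggestion to log rather than silently suppress could look like this (a hypothetical sketch, not the vLLM code; the function name and log wording are made up):

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_tool_calls_logged(raw_output: str):
    """Graceful fallback that keeps a trace of discarded output.

    Same empty-list behavior as contextlib.suppress, but a debug record
    makes truncated or malformed tool-call JSON visible in production
    logs instead of disappearing silently.
    """
    try:
        tool_calls = json.loads(raw_output)
    except ValueError as exc:  # includes json.JSONDecodeError
        logger.debug("Discarding unparsable tool-call output: %s", exc)
        tool_calls = None
    return tool_calls or []
```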
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
# When `tool_choice="required"` and the tokens of `tools` exceed `max_tokens`,
# both `tool_calls` and `content` should be empty.
# This behavior should be consistent with OpenAI.
Have you confirmed that OpenAI does this as well?
chat_completion = client.chat.completions.create(
messages=messages,
model="gpt-5",
tools=tools,
tool_choice="required",
max_completion_tokens=10,
)
print(chat_completion)
ChatCompletion(id='chatcmpl-DIU1M3ic0iPTxtnxit59rWDxUpaEH', choices=[Choice(finish_reason='length', index=0, logprobs=None, message=ChatCompletionMessage(content='', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1773297678, model='gpt-5', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=10, prompt_tokens=340, total_tokens=350, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=10, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0), input_tokens=0, output_tokens=0, input_tokens_details=None))
Have you confirmed that OpenAI does this as well?
This is the result from my test with GPT-5.
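The GPT-5 response above (finish_reason='length', empty content, no tool_calls) is the shape the fix now reproduces. A small predicate makes the expectation concrete — the field names come from the pasted response, but the helper itself is hypothetical:

```python
def is_truncated_required_response(choice: dict) -> bool:
    """Check the OpenAI-consistent shape for a tool_choice="required"
    request truncated by max_tokens: finish_reason is "length", content
    is empty, and no tool calls were emitted."""
    msg = choice["message"]
    return (
        choice["finish_reason"] == "length"
        and (msg.get("content") or "") == ""
        and msg.get("tool_calls") is None
    )


# Shape observed in the GPT-5 response pasted above:
choice = {"finish_reason": "length", "message": {"content": "", "tool_calls": None}}
```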
…-project#36841) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…-project#36841) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Monishver Chandrasekaran <monishverchandrasekaran@gmail.com>
Purpose
FIX #36794
Test Plan
see e2e
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.