[bugfix] Fix online serving crash when text type response_format is received by cjackal · Pull Request #26822 · vllm-project/vllm

cjackal · 2025-10-14T16:28:53Z

Purpose

In particular, this PR adds proper input validation to StructuredOutputsParams so that unschedulable structured outputs params can no longer be generated, and adjusts the sampling parameter generation logic in ChatCompletionRequest.to_sampling_params() and OpenAIServingResponses.create_responses() to reflect the change and be more robust.

Co-authored by @j0shuajun, who first reported the issue and provided minimal reproducible examples on our side.

Test Plan

Pass a chat completion request with "response_format": {"type": "text"}:

curl -XPOST http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"openai/gpt-oss-120b","messages":[{"role":"user","content":"hello"}],"max_tokens":2048,"stream":false,"response_format":{"type":"text"}}'

Also pass a chat completion request with "response_format": {"type": "json_object"} to check for regressions:

curl -XPOST http://localhost:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{"model":"openai/gpt-oss-120b","messages":[{"role":"user","content":"hello"}],"max_tokens":2048,"stream":false,"response_format":{"type":"json_object"}}'
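The two curl requests above can also be scripted. Below is a minimal sketch using only the Python standard library, assuming the same server address and model as in the commands; the helper names `build_payload` and `post_chat_completion` are hypothetical and not part of vLLM.

```python
import json
import urllib.request

# Assumed to match the curl commands in the test plan.
BASE_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "openai/gpt-oss-120b"


def build_payload(response_format_type: str) -> dict:
    """Build the chat completion body used in the test plan."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": "hello"}],
        "max_tokens": 2048,
        "stream": False,
        "response_format": {"type": response_format_type},
    }


def post_chat_completion(payload: dict, url: str = BASE_URL) -> dict:
    """POST the payload to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, calling `post_chat_completion(build_payload("text"))` exercises the case that used to crash, and `post_chat_completion(build_payload("json_object"))` is the regression check.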

Test Result

A valid response is returned in both cases.

# response_format text - respond with normal text
{"id":"chatcmpl-b6a22dfca265460ab316d7ea490d974b","object":"chat.completion","created":1760457543,"model":"openai/gpt-oss-120b","choices":[{"index":0,"messages":{"role":"assistant","content":"Hello! How can I assist you today?","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":"We need to respond: greeting. No instructions."},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":70,"total_tokens":99,"completion_tokens":29,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_tokens_ids":null,"kv_transfer_params":null}

# response_format json_object - respond with JSON object
{"id":"chatcmpl-9643189ba2674cc2b471bd9a63472e69","object":"chat.completion","created":1760457566,"model":"openai/gpt-oss-120b","choices":[{"index":0,"messages":{"role":"assistant","content":"{\"response\":\"Hello! How can I assist you today?\"}","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":"The user says \"hello\". We just respond politely."},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":70,"total_tokens":104,"completion_tokens":34,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_tokens_ids":null,"kv_transfer_params":null}

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results

gemini-code-assist

Code Review

This pull request addresses a crash in the online serving endpoint when a chat completion request includes response_format: {"type": "text"}. The fix correctly prevents the creation of an invalid, empty StructuredOutputsParams object by adding stricter validation and refining the parameter handling logic. The changes are well-implemented. However, I've identified a critical regression where the new validation can cause crashes for requests using certain deprecated parameters. A fix is suggested in the detailed comment.

gemini-code-assist · 2025-10-14T16:30:35Z

vllm/sampling_params.py

+        if count < 1:
+            raise ValueError(
+                "You must use one kind of structured outputs constraint "
+                f"but none are specified: {self.__dict__}"
+            )


This new validation is a great improvement for ensuring StructuredOutputsParams is always in a valid state. However, it introduces a potential regression for deprecated parameters.

Specifically, the logic for handling deprecated guided_* parameters in vllm/entrypoints/openai/protocol.py (lines 794-806) can now raise this ValueError. If a user provides only guided_whitespace_pattern (which maps to whitespace_pattern), the code will attempt to create StructuredOutputsParams with only a non-constraint parameter. This will cause count to be 0 here, triggering this error.

While the problematic code is not in this diff, this change makes it faulty. To prevent this regression, the logic for handling deprecated parameters should be updated to only construct StructuredOutputsParams if at least one constraint parameter (e.g., guided_json, guided_regex) is provided.

For example, in ChatCompletionRequest.to_sampling_params in vllm/entrypoints/openai/protocol.py, the logic could be adjusted:

# ... inside to_sampling_params, after collecting kwargs from deprecated params
kwargs = {k: v for k, v in kwargs.items() if v is not None}
constraint_keys = {'json', 'regex', 'choice', 'grammar', 'structural_tag'}
if any(k in constraint_keys for k in kwargs):
    self.structured_outputs = StructuredOutputsParams(**kwargs)

This would ensure backward compatibility for the deprecated parameters while upholding the new, stricter validation.
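To make the interaction concrete, here is a minimal, self-contained sketch of the two pieces discussed in this comment: a count-based check in `__post_init__` and a guard that only builds the params when a constraint is present. `ToyStructuredOutputsParams` and `build_from_deprecated` are hypothetical stand-ins, not vLLM's actual classes, and the field list is an assumption based on the constraint keys named in the suggestion.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumed constraint fields, taken from the constraint_keys set in the suggestion.
CONSTRAINT_KEYS = ("json", "regex", "choice", "grammar", "structural_tag")


@dataclass
class ToyStructuredOutputsParams:
    json: Optional[str] = None
    regex: Optional[str] = None
    choice: Optional[list] = None
    grammar: Optional[str] = None
    structural_tag: Optional[str] = None
    # Not a constraint on its own; it only modifies how a constraint is applied.
    whitespace_pattern: Optional[str] = None

    def __post_init__(self) -> None:
        # Count how many constraint fields were actually provided.
        count = sum(getattr(self, key) is not None for key in CONSTRAINT_KEYS)
        if count < 1:
            raise ValueError(
                "You must use one kind of structured outputs constraint "
                f"but none are specified: {self.__dict__}"
            )


def build_from_deprecated(
    kwargs: dict[str, Any],
) -> Optional[ToyStructuredOutputsParams]:
    """Return params only if at least one constraint is set, else None."""
    kwargs = {k: v for k, v in kwargs.items() if v is not None}
    if any(k in CONSTRAINT_KEYS for k in kwargs):
        return ToyStructuredOutputsParams(**kwargs)
    return None
```

In this toy model, passing only `whitespace_pattern` to `build_from_deprecated` yields `None` instead of raising, while constructing `ToyStructuredOutputsParams` directly with no constraint raises the `ValueError`.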

cjackal · 2025-10-14

AFAIK guided_whitespace_pattern must be given together with one of the structural constraints; otherwise the same "no structured parameter" error is raised. So this is not a newly introduced regression but a pre-existing bug.

That said, I'd welcome a better way to validate the structured outputs params.

cjackal · 2025-10-16T14:56:38Z

I think we can do much better response_format validation after #26519; will rework if this PR does not get merged before #26519

cjackal · 2025-10-22T16:25:07Z

@chaunceyjiang Would you mind having a look at this PR? As more clients are using response_format, this bug is increasingly disruptive in terms of server stability. I think StructuredOutputsParams without a compilable grammar, which crashes the server at the grammar compilation stage, should not be allowed to be created in the first place.

zifeitong · 2025-10-28T21:43:56Z

vllm/entrypoints/openai/protocol.py

-            # we must enable it for these features to work
-            if self.structured_outputs is None:
-                self.structured_outputs = StructuredOutputsParams()
+            kwargs_changes = dict[str, Any]()

            # Set structured output params for response format
            if response_format is not None:


This if clause is redundant

cjackal · 2025-10-29

Indeed, a recent upstream code change makes this if clause funny 😄. Thank you for pointing it out.

zifeitong · 2025-10-28T21:45:23Z

vllm/entrypoints/openai/protocol.py


            # Set structured output params for response format
            if response_format is not None:
                if response_format.type == "json_object":
-                    self.structured_outputs.json_object = True
+                    kwargs_changes["json_object"] = True


Is there any need to introduce kwargs_changes?

How about just replacing self.structured_outputs.json_object = True with self.structured_outputs = StructuredOutputsParams(json_object=True)?

The same applies to all the other cases.

cjackal · 2025-10-29

We'd like to inherit from the original StructuredOutputsParams because of its other options such as whitespace_pattern, so I update the existing parameters rather than creating a new object.

Or would we rather ignore these options on the response_format code path?
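The reason for going through dataclasses.replace can be illustrated with a small sketch. `ToyParams` is a hypothetical stand-in, not vLLM's StructuredOutputsParams, and its "exactly one constraint" rule is an assumption made for the example. Direct attribute assignment bypasses `__post_init__`, whereas `dataclasses.replace()` constructs a new instance through `__init__`, so the validation runs again and untouched options carry over.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyParams:
    json_object: Optional[bool] = None
    regex: Optional[str] = None
    whitespace_pattern: Optional[str] = None

    def __post_init__(self) -> None:
        # Toy rule: exactly one constraint field must be set.
        constraints = [self.json_object, self.regex]
        if sum(c is not None for c in constraints) != 1:
            raise ValueError("exactly one constraint must be set")


base = ToyParams(regex="[a-z]+", whitespace_pattern=" ")

# Direct assignment skips __post_init__, so the invalid state goes unnoticed.
mutated = ToyParams(regex="[a-z]+")
mutated.json_object = True  # two constraints are now set, but no error is raised

# dataclasses.replace() goes through __init__, so __post_init__ runs again,
# and whitespace_pattern is inherited from the original object.
kwargs_changes = {"json_object": True, "regex": None}
updated = dataclasses.replace(base, **kwargs_changes)
```

Here `dataclasses.replace(base, json_object=True)` would raise, because `regex` is still set on `base`; the toy rule rejects the combination at construction time instead of at a later stage.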

cjackal · 2025-11-01

I'd like to mention that the whole logic around kwargs_changes and dataclasses.replace is there only to validate StructuredOutputsParams. We can achieve the same effect by validating on assignment; Pydantic dataclasses already support this by adding ConfigDict(validate_assignment=True) and replacing __post_init__ with a method decorated with @pydantic.model_validator.

cc @hmellor Is this the way to go from the point of view of the pydantic validation refactoring? Most models in vllm.sampling_params use msgspec, and currently StructuredOutputsParams is the only exception, with a short comment saying "maybe make msgspec". If we can keep StructuredOutputsParams as a pydantic dataclass, the mess caused by StructuredOutputsParams not being validated on creation or modification goes away nicely.

chaunceyjiang · 2026-01-13T02:32:03Z

@cjackal I’m really sorry—I missed this PR. Could you rebase main onto your branch so we can move forward?

mergify · 2026-01-13T03:44:15Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @cjackal.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

cjackal · 2026-01-13T04:33:35Z

@cjackal I’m really sorry—I missed this PR. Could you rebase main onto your branch so we can move forward?

No worries, I will rebase tonight. I have also granted push permission to maintainers, so feel free to rebase it yourself if it is urgent.

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>

vllm/entrypoints/openai/completion/protocol.py


chaunceyjiang · 2026-01-15T10:59:05Z

After #32127 is merged, vLLM will no longer crash. However, I still think this PR is a nice improvement.

cjackal · 2026-01-15T11:20:10Z

After #32127 is merged, vLLM will no longer crash. However, I still think this PR is a nice improvement.

Indeed, this PR looks more like a general code quality improvement + unit test addition for now 😄 Thanks for the review!

chaunceyjiang

thanks~

…eceived (vllm-project#26822) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>

…eceived (vllm-project#26822) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>


cjackal requested review from aarnphm and chaunceyjiang as code owners October 14, 2025 16:28

mergify bot added the frontend label Oct 14, 2025

cjackal changed the title ~~Fix online serving shutdown when chat completion with text type response_format is received~~ [bugfix] Fix online serving shutdown when chat completion with text type response_format is received Oct 14, 2025

gemini-code-assist bot reviewed Oct 14, 2025


cjackal changed the title ~~[bugfix] Fix online serving shutdown when chat completion with text type response_format is received~~ [bugfix] Fix online serving crash when chat completion with text type response_format is received Oct 16, 2025

cjackal force-pushed the validate-structured-output branch from 247ab09 to 5fc84eb Compare October 22, 2025 15:07

cjackal changed the title ~~[bugfix] Fix online serving crash when chat completion with text type response_format is received~~ [bugfix] Fix online serving crash when text type response_format is received Oct 22, 2025

cjackal force-pushed the validate-structured-output branch from 8c4b3e3 to 3a5976d Compare October 28, 2025 14:29

cjackal requested review from DarkLight1337, NickLucche, robertgshaw2-redhat and simon-mo as code owners October 28, 2025 14:29

DarkLight1337 requested a review from russellb October 28, 2025 14:41

zifeitong reviewed Oct 28, 2025


cjackal force-pushed the validate-structured-output branch 2 times, most recently from 6df7424 to 7e58459 Compare November 6, 2025 14:28

mergify bot added the tool-calling label Nov 6, 2025

github-project-automation bot added this to Tool Calling Nov 6, 2025

cjackal mentioned this pull request Nov 7, 2025

[Bugfix] Prevent crash on empty grammar string #28210

Merged

cjackal mentioned this pull request Nov 21, 2025

[RFC]: SamplingParams should raise a warning when modified via direct assignment on the user side #29081

Closed

1 task

zifeitong mentioned this pull request Jan 13, 2026

[BugFix] Fix engine crash caused by chat tools + response_format #32127

Merged

mergify bot added the needs-rebase label Jan 13, 2026

cjackal force-pushed the validate-structured-output branch from 7e58459 to 7f389dd Compare January 13, 2026 14:26

mergify bot removed the needs-rebase label Jan 13, 2026

cjackal and others added 6 commits January 15, 2026 09:55

rebase

94578a7

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>

cjackal force-pushed the validate-structured-output branch from 17505a7 to d96e330 Compare January 15, 2026 09:59

mergify bot removed the needs-rebase label Jan 15, 2026

rebase

2ae7b6b

Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>

chaunceyjiang reviewed Jan 15, 2026


vllm/entrypoints/openai/completion/protocol.py (outdated, resolved)

chaunceyjiang added the ready label Jan 15, 2026

chaunceyjiang enabled auto-merge (squash) January 16, 2026 04:23

chaunceyjiang approved these changes Jan 16, 2026


chaunceyjiang merged commit 35bf5d0 into vllm-project:main Jan 16, 2026
50 checks passed

github-project-automation bot moved this to Done in Tool Calling Jan 16, 2026

cjackal deleted the validate-structured-output branch January 18, 2026 02:14

This was referenced Mar 3, 2026

[Responses API] Structured output + reasoning via structural tag embedding #35873

Closed

[Responses API] Structured output + reasoning via structural tag embedding #35904

Open
