[bugfix] Fix online serving crash when text type response_format is received#26822
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses a crash in the online serving endpoint when a chat completion request includes response_format: {"type": "text"}. The fix correctly prevents the creation of an invalid, empty StructuredOutputsParams object by adding stricter validation and refining the parameter handling logic. The changes are well-implemented. However, I've identified a critical regression where the new validation can cause crashes for requests using certain deprecated parameters. A fix is suggested in the detailed comment.
| if count < 1: | ||
| raise ValueError( | ||
| "You must use one kind of structured outputs constraint " | ||
| f"but none are specified: {self.__dict__}" | ||
| ) |
There was a problem hiding this comment.
This new validation is a great improvement for ensuring StructuredOutputsParams is always in a valid state. However, it introduces a potential regression for deprecated parameters.
Specifically, the logic for handling deprecated guided_* parameters in vllm/entrypoints/openai/protocol.py (lines 794-806) can now raise this ValueError. If a user provides only guided_whitespace_pattern (which maps to whitespace_pattern), the code will attempt to create StructuredOutputsParams with only a non-constraint parameter. This will cause count to be 0 here, triggering this error.
While the problematic code is not in this diff, this change makes it faulty. To prevent this regression, the logic for handling deprecated parameters should be updated to only construct StructuredOutputsParams if at least one constraint parameter (e.g., guided_json, guided_regex) is provided.
For example, in ChatCompletionRequest.to_sampling_params in vllm/entrypoints/openai/protocol.py, the logic could be adjusted:
# ... inside to_sampling_params, after collecting kwargs from deprecated params
kwargs = {k: v for k, v in kwargs.items() if v is not None}
constraint_keys = {'json', 'regex', 'choice', 'grammar', 'structural_tag'}
if any(k in constraint_keys for k in kwargs):
self.structured_outputs = StructuredOutputsParams(**kwargs)This would ensure backward compatibility for the deprecated parameters while upholding the new, stricter validation.
There was a problem hiding this comment.
AFAIK guided_whitespace_pattern must be given with one of the structural constraints, otherwise the same no structured parameter error is raised (so it is not a newly introduced regression but pre-existing bug).
Though I'd welcome a better way to validate the structured outputs params.
247ab09 to
5fc84eb
Compare
|
@chaunceyjiang Would you mind having a look at this PR? As more clients are using response_format, this bug is increasingly disruptive in terms of server stability. I think |
8c4b3e3 to
3a5976d
Compare
vllm/entrypoints/openai/protocol.py
Outdated
| # we must enable it for these features to work | ||
| if self.structured_outputs is None: | ||
| self.structured_outputs = StructuredOutputsParams() | ||
| kwargs_changes = dict[str, Any]() | ||
|
|
||
| # Set structured output params for response format | ||
| if response_format is not None: |
There was a problem hiding this comment.
This if clause is redundant
There was a problem hiding this comment.
Indeed, recent code change in upstream makes this if clause funny 😄. Thank you for pointing it.
vllm/entrypoints/openai/protocol.py
Outdated
|
|
||
| # Set structured output params for response format | ||
| if response_format is not None: | ||
| if response_format.type == "json_object": | ||
| self.structured_outputs.json_object = True | ||
| kwargs_changes["json_object"] = True |
There was a problem hiding this comment.
There is no need to introduce kwargs_changes?
How about just replace self.structured_outputs.json_object = True with self.structured_outputs = StructuredOutputsParams(json_object=True)
The same applies to all the other cases.
There was a problem hiding this comment.
We'd like to inherit from the original StructuredOutputsParams due to the other options like whitespace_pattern, so I updated over the existing parameters, not newly create it.
Would we like to ignore these options for the response_format codepath?
There was a problem hiding this comment.
I'd like to mention that the whole logic around kwargs_changes and dataclasses.replace is there just to validate StructuredOutputParams. We can achieve the same effect by validating on assignment; Pydantic dataclasses already support this by adding ConfigDict(validate_assignment=True) and replacing __post_init__ to @pydantic.model_validator decorator.
cc @hmellor Is it the way to go from the point of pydantic validation refactoring? Most models on vllm.sampling_params are using msgpack and currently StructuredOutputParams is the only exception with a small comment on "maybe make msgpack". If we can keep the StructuredOutputParams pydantic, all the mess around the current state of StructuredOutputParams not validated during the creation/modification can be nicely gone.
6df7424 to
7e58459
Compare
|
@cjackal I’m really sorry—I missed this PR. Could you rebase |
|
This pull request has merge conflicts that must be resolved before it can be |
No worries, I will rebase tonight. Or I have granted push permission to maintainers, feel free to rebase by yourself if urgent. |
7e58459 to
7f389dd
Compare
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
17505a7 to
d96e330
Compare
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
|
After #32127 is merged, vLLM will no longer crash. However, I still think this PR is a nice improvement. |
Indeed, this PR looks more like a general code quality improvement + unit test addition for now 😄 Thanks for the review! |
…eceived (vllm-project#26822) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
…eceived (vllm-project#26822) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
…eceived (vllm-project#26822) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com> Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com> Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
Purpose
Fix #26639.
Especially, this PR adds a proper input validation to
StructuredOutpusParamsnot to generate unschedulable structured outputs params, and adjust sampling parameter generation logic inChatCompletionRequest.to_sampling_parameters()andOpenAIServingResponses.create_responses()to reflect the change and be more robust.co-authored by @j0shuajun who first reported the issue and minimal reproducible examples on our side.
Test Plan
Pass a chat completion request with
"response_format": {"type": "text"}:Plus, pass a chat completion request with
"response_format": {"type": "json_object"}to check for regressions.Test Result
Return a valid response in both cases.
Essential Elements of an Effective PR Description Checklist