[gpt-oss] Add IncompleteDetails to ResponsesRepsonse #24561
Conversation
Signed-off-by: Andrew Xia <[email protected]>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Andrew Xia <[email protected]>
Force-pushed from 1fe6099 to 1718d05
Signed-off-by: Andrew Xia <[email protected]>
yeqcharlotte left a comment:
thanks for the test plan! cc: @heheda12345 @houseroad
    # TODO: implement the other reason for incomplete_details,
    # which is content_filter
    # incomplete_details = IncompleteDetails(reason='content_filter')
What's missing from the current logic, btw?
I don't think vLLM's baseline implementation currently supports content filter as an abort reason: https://github.com/vllm-project/vllm/blob/main/vllm/v1/request.py#L206
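For context, the mapping this PR adds for the one supported reason might look like the following minimal sketch. The `IncompleteDetails` stand-in and the helper name are illustrative only, not the exact vLLM or OpenAI SDK code:

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Minimal stand-in for the Responses API IncompleteDetails model, so the
# mapping can be shown without a server or SDK dependency.
@dataclass
class IncompleteDetails:
    reason: Literal["max_output_tokens", "content_filter"]

def incomplete_details_from_finish_reason(
    finish_reason: Optional[str],
) -> Optional[IncompleteDetails]:
    # "length" is the finish reason emitted when generation hits the
    # max token limit, which maps to reason="max_output_tokens".
    if finish_reason == "length":
        return IncompleteDetails(reason="max_output_tokens")
    # "content_filter" is the other Responses API reason, but per this
    # discussion vLLM does not currently abort requests for content
    # filtering, so nothing maps to it yet.
    return None
```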
vllm/entrypoints/harmony_utils.py (outdated)
If the parser still has messages (i.e. the generation got cut off abruptly), this should be incomplete and not completed.
Makes sense, please move your comment to the code :)
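The reviewer's check could be sketched roughly as below; the function and parameter names are hypothetical, not the real harmony parser API:

```python
def final_response_status(parser_has_unfinished_message: bool) -> str:
    # If the parser still holds a partially generated message (the output
    # was cut off mid-message), report "incomplete" rather than "completed".
    if parser_has_unfinished_message:
        return "incomplete"
    return "completed"
```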
Signed-off-by: Andrew Xia <[email protected]>
@qandrew @houseroad this needs a rebase over the structured output disable commit
Commits included in this branch:
- [gpt-oss] Add IncompleteDetails to ResponsesRepsonse (vllm-project#24561), Signed-off-by: Andrew Xia, bbartels
- [gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (vllm-project#24759), Signed-off-by: Andrew Xia
- [Performance] Remove redundant clone() calls in cutlass_mla (vllm-project#24891)
- [Bug] Fix Cutlass Scaled MM Compilation Error (vllm-project#24887), Signed-off-by: yewentao256
- [ci] fix wheel names for arm wheels (vllm-project#24898), Signed-off-by: simon-mo
- [Tests] fix initialization of kv hash in tests (vllm-project#24273), Signed-off-by: Mickael Seznec
- [Compile] Fix noop_elimination pass and add tests for noop_elimination (vllm-project#24880), Signed-off-by: zjy0516
- Propagate entire tokens to connector for resumed preemptions, Signed-off-by: Qier Li
- Fix pre-commit, Signed-off-by: Qier Li
- Rename field and nullify empty lists, Signed-off-by: Qier Li
- Update vllm/v1/core/sched/scheduler.py, Co-authored-by: Nick Hill, Signed-off-by: Qier Li
- Add unit test for preemption resumption, Signed-off-by: Qier Li
Purpose
I realize there are other things we still need to fix for IncompleteDetails; for now we guarantee that if incomplete_details is output, it is accurate. We still need to test flows such as the generator being interrupted abruptly, and output the reason why. Also, I don't think gpt-oss does content filtering for now, but that is another IncompleteDetails reason to implement.
Test Plan
Server
Client
Client, where we don't hit max token limit
Test Result
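The expected shape of the relevant fields in the two client flows might look like this sketch (only the fields this PR touches are shown; this is an illustration, not captured server output). When the client hits the max token limit:

```json
{
  "status": "incomplete",
  "incomplete_details": {
    "reason": "max_output_tokens"
  }
}
```

When the token limit is not hit, the response should instead carry `"status": "completed"` with `"incomplete_details": null`.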
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.