Conversation

@qandrew (Contributor) commented Sep 10, 2025

Purpose

  • add IncompleteDetails to vLLM's implementation of ResponsesResponse.
  • first use case: if the max token length has been hit, we add the reason for incompleteness. This is done in the non-streaming version; the streaming version will be a follow-up PR (because it needs to be based off of [gpt-oss][1][bugfix] fix streaming final output #24466).
  • fix some formatting issues

I realize there are other things we need to fix for IncompleteDetails; for now we guarantee that IF incomplete_details is output, it is correct. We still need to test flows such as the generator being interrupted abruptly, and output the reason why. And I don't think GPT-OSS does content filtering for now, but that would be another IncompleteDetails case to implement.
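The gist of the non-streaming change can be sketched as follows (a minimal sketch only; the function and field names here are illustrative, not vLLM's actual internals):

```python
from typing import Optional

# Illustrative sketch only: build_incomplete_details is a hypothetical
# helper, not vLLM's actual implementation.
def build_incomplete_details(finish_reason: str) -> Optional[dict]:
    """Map a generation finish reason onto the Responses API
    incomplete_details field."""
    if finish_reason == "length":
        # Generation stopped because max_output_tokens was reached.
        return {"reason": "max_output_tokens"}
    # TODO: "content_filter" is the other reason the Responses API
    # defines, but vLLM has no content-filter abort path today.
    return None

details = build_incomplete_details("length")
status = "incomplete" if details is not None else "completed"
```

When the model stops for any other reason (e.g. a natural stop token), the helper returns None and the response stays "completed", matching the two test runs below.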

Test Plan

Server

(gpt_oss_edit) [[email protected] /data/users/axia/gitrepos/vllm (andrew/incomplete-details)]$ CUDA_VISIBLE_DEVICES=2,3 with-proxy vllm serve "/data/users/axia/checkpoints/gpt-oss-120b" -tp 2 --port 20001

Client

(gpt_oss_edit) [[email protected] /data/users/axia/gitrepos/vllm (andrew/incomplete-details)]$ curl http://localhost:20001/v1/responses   -H "Content-Type: application/json"   -N   -d '{
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "input": [
        {
            "role": "user",
            "content": "Write two paragraphs on the weather."
        }
    ],
    "temperature": 0.7,
    "max_output_tokens": 256
}' | jq
# output
{
  "id": "resp_5b6f93e8efa4445497f6b6bc052b6dac",
  "created_at": 1757701130,
  "incomplete_details": {
    "reason": "max_output_tokens" # we get incomplete reasons here
  },
  "instructions": null,
  "metadata": null,
  "model": "/data/users/axia/checkpoints/gpt-oss-120b",
  "object": "response",
  "output": [
    {
      "id": "rs_541bc7e4806f4630b04703e72e14e025",
      "summary": [],
      "type": "reasoning",
      "content": [
        {
          "text": "User asks: \"Write two paragraphs on the weather.\" We need to produce two paragraphs about weather. Should be descriptive, could talk about different aspects. Probably just two paragraphs. Ensure it's coherent. No extra instructions.",
          "type": "reasoning_text"
        }
      ],
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_122778466dba478aa761b0542f2cf81f",
      "content": [
        {
          "annotations": [],
          "text": "The sky stretched a bruised violet this morning, the first hints of sunrise shyly peeking through a thin veil of low‑lying clouds. A gentle breeze whispered through the oak leaves, carrying the faint scent of damp earth and pine sap, while droplets of drizzle clung to the windowpanes like tiny crystal beads. The temperature hovered just above the dew point, making the air feel cool enough to pull a light sweater from the closet, yet not so cold as to bite the skin. As the sun climbed higher, its golden rays began to melt the lingering mist, turning the wet streets into shimmering ribbons that reflected the city’s hurried rhythm.\n\nBy afternoon, the weather shifted dramatically. Dark, billowing cumulonimbus towers gathered on the horizon, their edges tinged with electric blue, signalling an approaching thunderstorm. A sudden gust surged through the streets, rattling shutters and sending loose papers swirling in a chaotic dance. The first crack of thunder rolled like distant drums, followed quickly by a cascade of rain that",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 0.7,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": 256,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "reasoning": null,
  "service_tier": "auto",
  "status": "incomplete",
  "text": null,
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 72,
    "input_tokens_details": {
      "cached_tokens": 64
    },
    "output_tokens": 256,
    "output_tokens_details": {
      "reasoning_tokens": 44,
      "tool_output_tokens": 0
    },
    "total_tokens": 328
  },
  "user": null
}
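A client can detect the truncation programmatically from the fields above; a minimal sketch (the payload is abbreviated from the response shown; in practice it would come from the curl call or `requests.post(...).json()`):

```python
import json

# Abbreviated from the truncated response above; in practice this would
# come from requests.post("http://localhost:20001/v1/responses", ...).json()
payload = json.loads("""
{
  "status": "incomplete",
  "incomplete_details": {"reason": "max_output_tokens"},
  "usage": {"output_tokens": 256}
}
""")

if payload["status"] == "incomplete":
    reason = payload["incomplete_details"]["reason"]
    print(f"response truncated: {reason}")  # response truncated: max_output_tokens
```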

Client, where we don't hit the max token limit

(gpt_oss_edit) [[email protected] /data/users/axia/gitrepos/vllm (andrew/incomplete-details)]$ curl http://localhost:20001/v1/responses   -H "Content-Type: application/json"   -N   -d '{
    "model": "/data/users/axia/checkpoints/gpt-oss-120b",
    "input": [
        {
            "role": "user",
            "content": "Write two words on the weather."
        }
    ],
    "temperature": 0.7,
    "max_output_tokens": 256
}' 
# output
{
  "id": "resp_60215e7212c044669bf5b85015fe19e4",
  "created_at": 1757701190,
  "incomplete_details": null,
  "instructions": null,
  "metadata": null,
  "model": "/data/users/axia/checkpoints/gpt-oss-120b",
  "object": "response",
  "output": [
    {
      "id": "rs_02262a1cf9cf4ff3891d07d282f2e176",
      "summary": [],
      "type": "reasoning",
      "content": [
        {
          "text": "The user asks: \"Write two words on the weather.\" Probably they want a short phrase of two words describing the weather. Could be \"sunny skies\", \"stormy night\", etc. Provide two words. Could also be a short description: \"Cloudy day\". Provide two words. Probably just output two words. I'll respond with two words.",
          "type": "reasoning_text"
        }
      ],
      "encrypted_content": null,
      "status": null
    },
    {
      "id": "msg_52743c005ca94adf9687a4ffebb0107e",
      "content": [
        {
          "annotations": [],
          "text": "Sunny skies.",
          "type": "output_text",
          "logprobs": null
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 0.7,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "background": false,
  "max_output_tokens": 256,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "reasoning": null,
  "service_tier": "auto",
  "status": "completed",
  "text": null,
  "top_logprobs": null,
  "truncation": "disabled",
  "usage": {
    "input_tokens": 72,
    "input_tokens_details": {
      "cached_tokens": 64
    },
    "output_tokens": 84,
    "output_tokens_details": {
      "reasoning_tokens": 72,
      "tool_output_tokens": 0
    },
    "total_tokens": 156
  },
  "user": null
}
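Comparing the two runs: the truncated response consumes exactly max_output_tokens (256 output tokens) and reports status "incomplete", while the short run stops at 84 tokens and stays "completed". A quick sanity check over the numbers copied from the two usage blocks above:

```python
# Numbers copied from the two usage blocks above.
runs = [
    {"output_tokens": 256, "status": "incomplete"},  # "two paragraphs" run
    {"output_tokens": 84, "status": "completed"},    # "two words" run
]
MAX_OUTPUT_TOKENS = 256  # from both requests

for run in runs:
    hit_limit = run["output_tokens"] >= MAX_OUTPUT_TOKENS
    # A run is incomplete exactly when it hit the token limit.
    assert hit_limit == (run["status"] == "incomplete")
```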

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the frontend, gpt-oss (Related to GPT-OSS models), and v1 labels Sep 10, 2025
Signed-off-by: Andrew Xia <[email protected]>
mergify bot commented Sep 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @qandrew.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Sep 12, 2025
Signed-off-by: Andrew Xia <[email protected]>
@mergify mergify bot removed the needs-rebase label Sep 12, 2025
Signed-off-by: Andrew Xia <[email protected]>
@qandrew qandrew force-pushed the andrew/incomplete-details branch from 1fe6099 to 1718d05 Compare September 12, 2025 22:35
Signed-off-by: Andrew Xia <[email protected]>
Signed-off-by: Andrew Xia <[email protected]>
@yeqcharlotte (Collaborator) left a comment


thanks for the test plan! cc: @heheda12345 @houseroad

Comment on lines +1900 to +1902
# TODO: implement the other reason for incomplete_details,
# which is content_filter
# incomplete_details = IncompleteDetails(reason='content_filter')
Collaborator:

What's missing from the current logic, btw?

@qandrew (author):

I don't think VLLM baseline implementation supports content filter as an abort reason currently: https://github.com/vllm-project/vllm/blob/main/vllm/v1/request.py#L206

@qandrew (author):

If the parser still has messages (i.e. if the generator got cut off abruptly), the response should be marked incomplete and not completed.

Collaborator:

Makes sense, please move your comment to the code :)


@houseroad houseroad added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 14, 2025
Signed-off-by: Andrew Xia <[email protected]>
@yeqcharlotte (Collaborator):

@qandrew @houseroad this needs a rebase over the structured output disable commit

@houseroad houseroad merged commit 25aba2b into vllm-project:main Sep 15, 2025
45 checks passed
tlrmchlsmth pushed a commit to tlrmchlsmth/vllm that referenced this pull request Sep 15, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
QierLi pushed a commit to QierLi/vllm that referenced this pull request Oct 5, 2025
[gpt-oss] Add IncompleteDetails to ResponsesRepsonse (vllm-project#24561)

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025