
@wuhang2014
Contributor

@wuhang2014 wuhang2014 commented Aug 29, 2025

Purpose

Fixes #23295, in particular for the streaming response mode.

Test Plan

import asyncio

import openai

MODEL = "/home/models/gpt-oss-20b"
client = openai.AsyncOpenAI(api_key="my_key", base_url="http://127.0.0.1:8000/v1")


async def main_stream_background():
    # Start a background response and stream its events live.
    response = await client.responses.create(
        model=MODEL,
        input="Multiply 64548*15151 using python code interpreter.",
        tools=[{
            "type": "code_interpreter",
            "container": {
                "type": "auto"
            }
        }],
        stream=True,
        background=True,
    )

    # Consume the live stream, then stop after sequence_number 100
    # to simulate a client that disconnects mid-generation.
    cursor = None
    resp_id = None
    async for event in response:
        cursor = event.sequence_number
        if event.type == "response.created":
            resp_id = event.response.id
        print(f"{event=}")
        if cursor == 100:
            break

    print("="*36)

    # Reconnect to the same background response and replay every event
    # after sequence_number 90 (so events 91 to 100 are delivered again).
    async with await client.responses.retrieve(response_id=resp_id, stream=True, starting_after=90) as stream:
        async for event in stream:
            print(f"{event=}")

# Run the async function
if __name__ == "__main__":
    asyncio.run(main_stream_background())

Test Result

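Streaming response output. The row of = signs marks where the client stopped reading (after sequence_number 100) and reconnected with starting_after=90, so events 91 to 100 appear twice: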

event=ResponseCreatedEvent(response=Response(id='resp_05238f1b33304f71995c210942619de4', created_at=1756464260.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/home/models/gpt-oss-20b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[CodeInterpreter(container=CodeInterpreterContainerCodeInterpreterToolAuto(type='auto', file_ids=None), type='code_interpreter')], top_p=1.0, background=True, max_output_tokens=130927, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=0, type='response.created')
event=ResponseInProgressEvent(response=Response(id='resp_05238f1b33304f71995c210942619de4', created_at=1756464260.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/home/models/gpt-oss-20b', object='response', output=[], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[CodeInterpreter(container=CodeInterpreterContainerCodeInterpreterToolAuto(type='auto', file_ids=None), type='code_interpreter')], top_p=1.0, background=True, max_output_tokens=130927, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='in_progress', text=None, top_logprobs=None, truncation='disabled', usage=None, user=None), sequence_number=1, type='response.in_progress')
event=ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=0, sequence_number=2, type='response.output_item.added')
event=ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=0, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=3, type='response.content_part.added')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='We', item_id='', output_index=0, sequence_number=4, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' need', item_id='', output_index=0, sequence_number=5, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' to', item_id='', output_index=0, sequence_number=6, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' multiply', item_id='', output_index=0, sequence_number=7, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' ', item_id='', output_index=0, sequence_number=8, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='645', item_id='', output_index=0, sequence_number=9, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='48', item_id='', output_index=0, sequence_number=10, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' *', item_id='', output_index=0, sequence_number=11, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' ', item_id='', output_index=0, sequence_number=12, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='151', item_id='', output_index=0, sequence_number=13, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='51', item_id='', output_index=0, sequence_number=14, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=0, sequence_number=15, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' Use', item_id='', output_index=0, sequence_number=16, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' python', item_id='', output_index=0, sequence_number=17, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' code', item_id='', output_index=0, sequence_number=18, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' interpreter', item_id='', output_index=0, sequence_number=19, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=0, sequence_number=20, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' Just', item_id='', output_index=0, sequence_number=21, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' compute', item_id='', output_index=0, sequence_number=22, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' product', item_id='', output_index=0, sequence_number=23, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=0, sequence_number=24, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=" Let's", item_id='', output_index=0, sequence_number=25, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' calculate', item_id='', output_index=0, sequence_number=26, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=0, sequence_number=27, type='response.reasoning_text.delta')
event=ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=1, sequence_number=28, text="We need to multiply 64548 * 15151. Use python code interpreter. Just compute product. Let's calculate.", type='response.reasoning_text.done')
event=ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text="We need to multiply 64548 * 15151. Use python code interpreter. Just compute product. Let's calculate.", type='reasoning_text')], encrypted_content=None, status='completed'), output_index=1, sequence_number=29, type='response.output_item.done')
event=ResponseOutputItemAddedEvent(item=ResponseCodeInterpreterToolCall(id='', code=None, container_id='auto', outputs=None, status='in_progress', type='code_interpreter_call'), output_index=1, sequence_number=30, type='response.output_item.added')
event=ResponseCodeInterpreterCallInProgressEvent(item_id='', output_index=1, sequence_number=31, type='response.code_interpreter_call.in_progress')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='645', item_id='', output_index=1, sequence_number=32, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='48', item_id='', output_index=1, sequence_number=33, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta=' *', item_id='', output_index=1, sequence_number=34, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta=' ', item_id='', output_index=1, sequence_number=35, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='151', item_id='', output_index=1, sequence_number=36, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='51', item_id='', output_index=1, sequence_number=37, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDoneEvent(code='64548 * 15151', item_id='', output_index=2, sequence_number=38, type='response.code_interpreter_call_code.done')
event=ResponseCodeInterpreterCallInterpretingEvent(item_id='', output_index=2, sequence_number=39, type='response.code_interpreter_call.interpreting')
event=ResponseCodeInterpreterCallCompletedEvent(item_id='', output_index=2, sequence_number=40, type='response.code_interpreter_call.completed')
event=ResponseOutputItemDoneEvent(item=ResponseCodeInterpreterToolCall(id='', code='64548 * 15151', container_id='auto', outputs=[], status='completed', type='code_interpreter_call'), output_index=2, sequence_number=41, type='response.output_item.done')
event=ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=2, sequence_number=42, type='response.output_item.added')
event=ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=2, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=43, type='response.content_part.added')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta="Let's", item_id='', output_index=2, sequence_number=44, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' print', item_id='', output_index=2, sequence_number=45, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' result', item_id='', output_index=2, sequence_number=46, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=2, sequence_number=47, type='response.reasoning_text.delta')
event=ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=3, sequence_number=48, text="Let's print result.", type='response.reasoning_text.done')
event=ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text="Let's print result.", type='reasoning_text')], encrypted_content=None, status='completed'), output_index=3, sequence_number=49, type='response.output_item.done')
event=ResponseOutputItemAddedEvent(item=ResponseCodeInterpreterToolCall(id='', code=None, container_id='auto', outputs=None, status='in_progress', type='code_interpreter_call'), output_index=3, sequence_number=50, type='response.output_item.added')
event=ResponseCodeInterpreterCallInProgressEvent(item_id='', output_index=3, sequence_number=51, type='response.code_interpreter_call.in_progress')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='645', item_id='', output_index=3, sequence_number=52, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='48', item_id='', output_index=3, sequence_number=53, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta=' *', item_id='', output_index=3, sequence_number=54, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta=' ', item_id='', output_index=3, sequence_number=55, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='151', item_id='', output_index=3, sequence_number=56, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='51', item_id='', output_index=3, sequence_number=57, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDoneEvent(code='64548 * 15151', item_id='', output_index=4, sequence_number=58, type='response.code_interpreter_call_code.done')
event=ResponseCodeInterpreterCallInterpretingEvent(item_id='', output_index=4, sequence_number=59, type='response.code_interpreter_call.interpreting')
event=ResponseCodeInterpreterCallCompletedEvent(item_id='', output_index=4, sequence_number=60, type='response.code_interpreter_call.completed')
event=ResponseOutputItemDoneEvent(item=ResponseCodeInterpreterToolCall(id='', code='64548 * 15151', container_id='auto', outputs=[], status='completed', type='code_interpreter_call'), output_index=4, sequence_number=61, type='response.output_item.done')
event=ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=4, sequence_number=62, type='response.output_item.added')
event=ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=4, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=63, type='response.content_part.added')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='Maybe', item_id='', output_index=4, sequence_number=64, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' we', item_id='', output_index=4, sequence_number=65, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' need', item_id='', output_index=4, sequence_number=66, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' to', item_id='', output_index=4, sequence_number=67, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' run', item_id='', output_index=4, sequence_number=68, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' correctly', item_id='', output_index=4, sequence_number=69, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=4, sequence_number=70, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=" Let's", item_id='', output_index=4, sequence_number=71, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' do', item_id='', output_index=4, sequence_number=72, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' print', item_id='', output_index=4, sequence_number=73, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=4, sequence_number=74, type='response.reasoning_text.delta')
event=ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=5, sequence_number=75, text="Maybe we need to run correctly. Let's do print.", type='response.reasoning_text.done')
event=ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text="Maybe we need to run correctly. Let's do print.", type='reasoning_text')], encrypted_content=None, status='completed'), output_index=5, sequence_number=76, type='response.output_item.done')
event=ResponseOutputItemAddedEvent(item=ResponseCodeInterpreterToolCall(id='', code=None, container_id='auto', outputs=None, status='in_progress', type='code_interpreter_call'), output_index=5, sequence_number=77, type='response.output_item.added')
event=ResponseCodeInterpreterCallInProgressEvent(item_id='', output_index=5, sequence_number=78, type='response.code_interpreter_call.in_progress')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='print', item_id='', output_index=5, sequence_number=79, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='(', item_id='', output_index=5, sequence_number=80, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='645', item_id='', output_index=5, sequence_number=81, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='48', item_id='', output_index=5, sequence_number=82, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta=' *', item_id='', output_index=5, sequence_number=83, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta=' ', item_id='', output_index=5, sequence_number=84, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='151', item_id='', output_index=5, sequence_number=85, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta='51', item_id='', output_index=5, sequence_number=86, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDeltaEvent(delta=')', item_id='', output_index=5, sequence_number=87, type='response.code_interpreter_call_code.delta')
event=ResponseCodeInterpreterCallCodeDoneEvent(code='print(64548 * 15151)', item_id='', output_index=6, sequence_number=88, type='response.code_interpreter_call_code.done')
event=ResponseCodeInterpreterCallInterpretingEvent(item_id='', output_index=6, sequence_number=89, type='response.code_interpreter_call.interpreting')
event=ResponseCodeInterpreterCallCompletedEvent(item_id='', output_index=6, sequence_number=90, type='response.code_interpreter_call.completed')
event=ResponseOutputItemDoneEvent(item=ResponseCodeInterpreterToolCall(id='', code='print(64548 * 15151)', container_id='auto', outputs=[], status='completed', type='code_interpreter_call'), output_index=6, sequence_number=91, type='response.output_item.done')
event=ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=6, sequence_number=92, type='response.output_item.added')
event=ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=6, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=93, type='response.content_part.added')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='Thus', item_id='', output_index=6, sequence_number=94, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' product', item_id='', output_index=6, sequence_number=95, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='=', item_id='', output_index=6, sequence_number=96, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='977', item_id='', output_index=6, sequence_number=97, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=',', item_id='', output_index=6, sequence_number=98, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='966', item_id='', output_index=6, sequence_number=99, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=',', item_id='', output_index=6, sequence_number=100, type='response.reasoning_text.delta')
====================================
event=ResponseOutputItemDoneEvent(item=ResponseCodeInterpreterToolCall(id='', code='print(64548 * 15151)', container_id='auto', outputs=[], status='completed', type='code_interpreter_call'), output_index=6, sequence_number=91, type='response.output_item.done')
event=ResponseOutputItemAddedEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=None, encrypted_content=None, status='in_progress'), output_index=6, sequence_number=92, type='response.output_item.added')
event=ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=6, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=93, type='response.content_part.added')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='Thus', item_id='', output_index=6, sequence_number=94, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' product', item_id='', output_index=6, sequence_number=95, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='=', item_id='', output_index=6, sequence_number=96, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='977', item_id='', output_index=6, sequence_number=97, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=',', item_id='', output_index=6, sequence_number=98, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='966', item_id='', output_index=6, sequence_number=99, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=',', item_id='', output_index=6, sequence_number=100, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='748', item_id='', output_index=6, sequence_number=101, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=6, sequence_number=102, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta=' Provide', item_id='', output_index=6, sequence_number=103, type='response.reasoning_text.delta')
event=ResponseReasoningTextDeltaEvent(content_index=0, delta='.', item_id='', output_index=6, sequence_number=104, type='response.reasoning_text.delta')
event=ResponseReasoningTextDoneEvent(content_index=0, item_id='', output_index=7, sequence_number=105, text='Thus product=977,966,748. Provide.', type='response.reasoning_text.done')
event=ResponseOutputItemDoneEvent(item=ResponseReasoningItem(id='', summary=[], type='reasoning', content=[Content(text='Thus product=977,966,748. Provide.', type='reasoning_text')], encrypted_content=None, status='completed'), output_index=7, sequence_number=106, type='response.output_item.done')
event=ResponseOutputItemAddedEvent(item=ResponseOutputMessage(id='', content=[], role='assistant', status='in_progress', type='message'), output_index=7, sequence_number=107, type='response.output_item.added')
event=ResponseContentPartAddedEvent(content_index=0, item_id='', output_index=7, part=ResponseOutputText(annotations=[], text='', type='output_text', logprobs=[]), sequence_number=108, type='response.content_part.added')
event=ResponseTextDeltaEvent(content_index=0, delta='Multip', item_id='', logprobs=[], output_index=7, sequence_number=109, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='lying', item_id='', logprobs=[], output_index=7, sequence_number=110, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' the', item_id='', logprobs=[], output_index=7, sequence_number=111, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' two', item_id='', logprobs=[], output_index=7, sequence_number=112, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' numbers', item_id='', logprobs=[], output_index=7, sequence_number=113, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=':\n\n', item_id='', logprobs=[], output_index=7, sequence_number=114, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='\\', item_id='', logprobs=[], output_index=7, sequence_number=115, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='[\n', item_id='', logprobs=[], output_index=7, sequence_number=116, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='645', item_id='', logprobs=[], output_index=7, sequence_number=117, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='48', item_id='', logprobs=[], output_index=7, sequence_number=118, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' \\', item_id='', logprobs=[], output_index=7, sequence_number=119, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='times', item_id='', logprobs=[], output_index=7, sequence_number=120, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' ', item_id='', logprobs=[], output_index=7, sequence_number=121, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='151', item_id='', logprobs=[], output_index=7, sequence_number=122, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='51', item_id='', logprobs=[], output_index=7, sequence_number=123, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' =', item_id='', logprobs=[], output_index=7, sequence_number=124, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' ', item_id='', logprobs=[], output_index=7, sequence_number=125, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='977', item_id='', logprobs=[], output_index=7, sequence_number=126, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=',', item_id='', logprobs=[], output_index=7, sequence_number=127, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='966', item_id='', logprobs=[], output_index=7, sequence_number=128, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=',', item_id='', logprobs=[], output_index=7, sequence_number=129, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='748', item_id='', logprobs=[], output_index=7, sequence_number=130, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='\n', item_id='', logprobs=[], output_index=7, sequence_number=131, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='\\', item_id='', logprobs=[], output_index=7, sequence_number=132, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=']\n\n', item_id='', logprobs=[], output_index=7, sequence_number=133, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='So', item_id='', logprobs=[], output_index=7, sequence_number=134, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' the', item_id='', logprobs=[], output_index=7, sequence_number=135, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' product', item_id='', logprobs=[], output_index=7, sequence_number=136, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' is', item_id='', logprobs=[], output_index=7, sequence_number=137, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=' **', item_id='', logprobs=[], output_index=7, sequence_number=138, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='977', item_id='', logprobs=[], output_index=7, sequence_number=139, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=',', item_id='', logprobs=[], output_index=7, sequence_number=140, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='966', item_id='', logprobs=[], output_index=7, sequence_number=141, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta=',', item_id='', logprobs=[], output_index=7, sequence_number=142, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='748', item_id='', logprobs=[], output_index=7, sequence_number=143, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='**', item_id='', logprobs=[], output_index=7, sequence_number=144, type='response.output_text.delta')
event=ResponseTextDeltaEvent(content_index=0, delta='.', item_id='', logprobs=[], output_index=7, sequence_number=145, type='response.output_text.delta')
event=ResponseTextDoneEvent(content_index=0, item_id='', logprobs=[], output_index=8, sequence_number=146, text='Multiplying the two numbers:\n\n\\[\n64548 \\times 15151 = 977,966,748\n\\]\n\nSo the product is **977,966,748**.', type='response.output_text.done')
event=ResponseContentPartDoneEvent(content_index=0, item_id='', output_index=8, part=ResponseOutputText(annotations=[], text='Multiplying the two numbers:\n\n\\[\n64548 \\times 15151 = 977,966,748\n\\]\n\nSo the product is **977,966,748**.', type='output_text', logprobs=None), sequence_number=147, type='response.content_part.done')
event=ResponseOutputItemDoneEvent(item=ResponseOutputMessage(id='', content=[ResponseOutputText(annotations=[], text='Multiplying the two numbers:\n\n\\[\n64548 \\times 15151 = 977,966,748\n\\]\n\nSo the product is **977,966,748**.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message'), output_index=8, sequence_number=148, type='response.output_item.done')
event=ResponseCompletedEvent(response=Response(id='resp_05238f1b33304f71995c210942619de4', created_at=1756464260.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='/home/models/gpt-oss-20b', object='response', output=[ResponseReasoningItem(id='rs_f401c9d2a7c246c6a1ee2ed527fdc87c', summary=[], type='reasoning', content=[Content(text="Let's print result.", type='reasoning_text')], encrypted_content=None, status=None), ResponseReasoningItem(id='rs_b1859a33d4a547538c907db5ff2fc56d', summary=[], type='reasoning', content=[Content(text='64548 * 15151', type='reasoning_text')], encrypted_content=None, status=None), ResponseReasoningItem(id='rs_4f316d86b9964e50b91763b1875d4fc1', summary=[], type='reasoning', content=[Content(text="Maybe we need to run correctly. Let's do print.", type='reasoning_text')], encrypted_content=None, status=None), ResponseReasoningItem(id='rs_d24120aa7b6d4683b4545a2fa280421f', summary=[], type='reasoning', content=[Content(text='print(64548 * 15151)', type='reasoning_text')], encrypted_content=None, status=None), ResponseReasoningItem(id='rs_956b4a673c3141edbbb968d72e5e7baa', summary=[], type='reasoning', content=[Content(text='Thus product=977,966,748. Provide.', type='reasoning_text')], encrypted_content=None, status=None), ResponseOutputMessage(id='msg_13a366e0c30a45289327065411b4fa41', content=[ResponseOutputText(annotations=[], text='Multiplying the two numbers:\n\n\\[\n64548 \\times 15151 = 977,966,748\n\\]\n\nSo the product is **977,966,748**.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[CodeInterpreter(container=CodeInterpreterContainerCodeInterpreterToolAuto(type='auto', file_ids=None), type='code_interpreter')], top_p=1.0, background=True, max_output_tokens=130915, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier='auto', status='completed', text=None, top_logprobs=None, truncation='disabled', usage=ResponseUsage(input_tokens=475, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=162, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=637), user=None), sequence_number=149, type='response.completed')
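Each *.delta event in the log carries a fragment and the matching *.done event carries the full text, so a consumer can rebuild the text by concatenating deltas. A minimal sketch, assuming the deltas for one content part arrive contiguously before its done event (true in the log above). It pairs deltas with the next done event by stream order rather than by output_index, because in this log each done event reports an output_index one higher than its deltas. The function name `reassemble_text` is hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Iterable

DELTA_TYPES = {"response.reasoning_text.delta", "response.output_text.delta"}
DONE_TYPES = {"response.reasoning_text.done", "response.output_text.done"}


def reassemble_text(events: Iterable[Any]) -> list[tuple[str, str]]:
    """Return (concatenated deltas, done.text) pairs in stream order."""
    pairs: list[tuple[str, str]] = []
    buffer: list[str] = []
    for event in events:
        if event.type in DELTA_TYPES:
            buffer.append(event.delta)
        elif event.type in DONE_TYPES:
            pairs.append(("".join(buffer), event.text))
            buffer = []  # start a fresh content part
    return pairs


# Events copied from sequence_numbers 44 to 48 of the log above.
sample = [
    SimpleNamespace(type="response.reasoning_text.delta", delta=d)
    for d in ["Let's", " print", " result", "."]
] + [SimpleNamespace(type="response.reasoning_text.done", text="Let's print result.")]

for built, final in reassemble_text(sample):
    print(built == final, repr(built))  # True "Let's print result."
```

Comparing the rebuilt string with the done text is a cheap consistency check on a resumed stream.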



@mergify mergify bot added the frontend label Aug 29, 2025
@wuhang2014 wuhang2014 marked this pull request as ready for review August 29, 2025 10:48
@wuhang2014 wuhang2014 requested a review from aarnphm as a code owner August 29, 2025 10:48
@wuhang2014
Contributor Author

cc @heheda12345

@wuhang2014 wuhang2014 changed the title [Feature][Responses API]Support MCP tools with streaming response in background mode [Feature][Responses API]Support MCP tools with streaming mode + background mode Aug 30, 2025
Collaborator

@heheda12345 heheda12345 left a comment


@heheda12345
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for MCP tools with streaming and background modes in the Responses API. The changes involve updating the retrieve_responses API endpoint to handle streaming and introducing new logic to manage background streaming requests.
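Conceptually, a streaming retrieve of a background response replays the buffered events whose sequence_number is greater than starting_after and encodes each as a server-sent event. The sketch below illustrates only that idea and is not the code from this PR: the name `encode_replay` is hypothetical, and it assumes events are plain dicts with `type` and `sequence_number` keys, framed as an SSE `event:`/`data:` pair.

```python
import json
from typing import Any, Iterator, Optional


def encode_replay(
    buffered: list[dict[str, Any]], starting_after: Optional[int] = None
) -> Iterator[str]:
    """Yield SSE frames for buffered events after starting_after (hypothetical)."""
    for event in buffered:
        if starting_after is not None and event["sequence_number"] <= starting_after:
            continue  # the client has already seen this event
        # One SSE frame: event name, JSON payload, blank-line terminator.
        yield f"event: {event['type']}\ndata: {json.dumps(event)}\n\n"


events = [
    {"type": "response.created", "sequence_number": 0},
    {"type": "response.in_progress", "sequence_number": 1},
    {"type": "response.output_item.added", "sequence_number": 2},
]
frames = list(encode_replay(events, starting_after=0))
print(len(frames))  # 2
print(frames[0].splitlines()[0])  # event: response.in_progress
```

Omitting starting_after replays the whole buffer, which matches a client that connects for the first time.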

My review focuses on two main points:

  1. A potential deadlock in the new background stream generator due to a race condition. I've provided a suggestion to fix this.
  2. A memory leak in the new event_store, which could be problematic in production.

Overall, the changes look good and align with the feature goal, but these two issues should be addressed.

Copy link
Collaborator

@heheda12345 heheda12345 left a comment


LGTM! Thanks for the great job!

@heheda12345 heheda12345 enabled auto-merge (squash) September 2, 2025 23:36
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 2, 2025
@heheda12345 heheda12345 merged commit a38f8bd into vllm-project:main Sep 4, 2025
39 checks passed
@wuhang2014 wuhang2014 deleted the streamingmode_responsesapi branch September 4, 2025 04:53
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025

Labels

frontend ready ONLY add when PR is ready to merge/full CI is needed


Development

Successfully merging this pull request may close these issues.

[Feature][Responses API] Support MCP tool in background mode

2 participants