
[Bug]: Streaming=True causes missing or scrambled tokens with GPT-OSS 120B on vLLM v0.11.0 #28635

@cxz1418

Description


Your current environment

  • NVIDIA driver: 550.127.05, CUDA 12.4 (from nvidia-smi)
  • vLLM versions: v0.10.2 and v0.11.0
  • Model: GPT-OSS 120B
  • Streaming: True
  • --enforce-eager: tested (produces correct output, but slower)

🐛 Describe the bug

Issue:
When using Streaming=True, some tokens are missing from the output or arrive out of order.

  • With Streaming=False, all tokens are generated correctly.
  • Using --enforce-eager produces the correct token sequence but significantly slows down generation.

This issue occurs in both v0.10.2 and v0.11.0.

Expected behavior:
Streaming should produce all tokens in the correct order, as --enforce-eager does, but without the performance penalty.

Steps to reproduce:

  1. Run GPT-OSS 120B with vLLM v0.11.0 (or v0.10.2)
  2. Enable streaming (Streaming=True)
  3. Generate text and observe missing or out-of-order tokens (see the reproduction sketch below)
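
Below is a minimal client-side sketch of the reproduction. It is not from the original report: it assumes the server from the command example below (http://localhost:20003/v1), the OpenAI Python SDK, and an illustrative prompt. With the bug, the concatenated streamed text does not match the non-streaming reference.

# Reproduction sketch. Assumptions: server at localhost:20003 (see the
# command example below), OpenAI Python SDK installed, illustrative prompt.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:20003/v1", api_key="EMPTY")
prompt = "Count from 1 to 50, separated by commas."

# Non-streaming reference: complete, correctly ordered output.
reference = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
).choices[0].message.content

# Streaming run: concatenate deltas in arrival order.
streamed = ""
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        streamed += chunk.choices[0].delta.content

# With the bug, tokens are missing or out of order in the streamed text.
print("match:", streamed == reference)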

Additional notes:
The problem appears to be specific to asynchronous streaming (the server is launched with --async-scheduling; see the command example below). Using eager execution (--enforce-eager) ensures correct token order but reduces generation speed.

Command Example

CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve openai/gpt-oss-120b \
   --tensor-parallel-size 4 \
   --enable-expert-parallel \
   --tool-call-parser openai \
   --reasoning-parser openai_gptoss \
   --enable-auto-tool-choice \
   --async-scheduling \
   --max-model-len 131072 \
   --gpu-memory-utilization 0.90 \
   --max-num-seqs 32 \
   --host 0.0.0.0 \
   --max-num-batched-tokens 8192 \
   --port 20003
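
To rule out client-side SDK buffering, the raw SSE stream can also be inspected directly. The sketch below is an assumption: it targets the /v1/completions endpoint of the server launched above with an illustrative prompt, and prints each text fragment in arrival order.

# Sketch: dump raw SSE chunks to inspect token order directly.
# Assumptions: server from the command above, illustrative prompt.
import json
import requests

resp = requests.post(
    "http://localhost:20003/v1/completions",
    json={
        "model": "openai/gpt-oss-120b",
        "prompt": "Count from 1 to 50, separated by commas.",
        "max_tokens": 256,
        "temperature": 0,
        "stream": True,
    },
    stream=True,
)

for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    # Print each fragment as it arrives; with the bug, fragments are
    # missing or appear out of sequence.
    print(repr(chunk["choices"][0]["text"]))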
