[CI/BugFix] Fix Flaky Test for Qwen Omni Perf#2754

Merged
Gaohan123 merged 1 commit into
vllm-project:mainfrom
alex-jw-brooks:fix_stream_parse
Apr 14, 2026

@alex-jw-brooks
Contributor

@alex-jw-brooks alex-jw-brooks commented Apr 13, 2026

Purpose

Fixes the flaky test in the build linked here: #2752 #2389

Streaming requests in vLLM / vLLM Omni follow the Server-Sent Events (SSE) specification. Since we largely send data fields, this mostly means that we are sending things like:

b'data: {JSON}\n\n'

Importantly, the space after the : matters here. In our performance script, we currently call .strip() on every incoming chunk. This is the underlying cause of the erratic CI failures, because higher concurrency in streaming requests can split a single event across chunks, like this:

chunk1: b'data: '
chunk2: b'{JSON}\n\n'

When we hit this case, add_chunks on the handler adds the stripped chunks, giving data:{JSON}\n\n. As a result, chunk = message.removeprefix("data: ") later in our script is a no-op, so the script tries to decode the JSON with data: still in front, which causes the parsing error.
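
For illustration, here is a minimal sketch of the failure mode described above. It is not the code from the performance script: join_chunks and parse_message are hypothetical stand-ins, and the JSON payload is made up. It only shows why stripping each chunk before joining defeats the later removeprefix("data: ") call.

```python
import json


def join_chunks(chunks, strip_each):
    """Concatenate raw byte chunks, optionally stripping each one first.

    Hypothetical stand-in for the handler's chunk accumulation.
    """
    parts = [c.strip() if strip_each else c for c in chunks]
    return b"".join(parts).decode("utf-8")


def parse_message(message):
    """Drop the SSE field prefix, then decode the JSON payload."""
    payload = message.removeprefix("data: ").strip()
    return json.loads(payload)


# One SSE event split across two network reads, as in the failure case.
chunks = [b"data: ", b'{"text": "hi"}\n\n']

# Stripping per chunk removes the space after the colon.
stripped = join_chunks(chunks, strip_each=True)
# Joining the raw chunks keeps the prefix intact.
intact = join_chunks(chunks, strip_each=False)

try:
    parse_message(stripped)
    stripped_ok = True
except json.JSONDecodeError:
    # removeprefix("data: ") was a no-op, so "data:" is still in front.
    stripped_ok = False

result = parse_message(intact)
```

In this sketch the stripped variant fails to decode, while the unstripped one parses normally once whitespace is trimmed after the prefix has been removed.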

Reproducing it is a bit difficult, but I logged one of the failed requests and saw the leading data: on it when decoding failed. The best way to reproduce it is likely a higher-concurrency config, e.g., running pytest tests/dfx/perf/scripts/run_benchmark.py -s with the config path pointing to something like the following:

[
    {
        "test_name": "test_qwen3_omni_chunk_stress",
        "server_params": {
            "model": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
            "stage_config_name": "qwen3_omni.yaml",
            "update": {
                "async_chunk": true,
                "stage_args": {
                    "0": {
                        "engine_args.custom_process_next_stage_input_func": "vllm_omni.model_executor.stage_input_processors.qwen3_omni.thinker2talker_async_chunk"
                    },
                    "1": {
                        "engine_args.custom_process_next_stage_input_func": "vllm_omni.model_executor.stage_input_processors.qwen3_omni.talker2code2wav_async_chunk"
                    }
                }
            },
            "delete": {
                "stage_args": {
                    "2": [
                        "custom_process_input_func"
                    ]
                }
            }
        },
        "benchmark_params": [
            {
                "dataset_name": "random",
                "backend": "openai-chat-omni",
                "endpoint": "/v1/chat/completions",
                "num_prompts": 500,
                "max_concurrency": 32,
                "random_input_len": 100,
                "random_output_len": 100,
                "ignore_eos": true,
                "percentile-metrics": "ttft,tpot,itl,e2el,audio_rtf,audio_ttfp,audio_duration",
                "baseline": {
                    "mean_ttft_ms": 10000,
                    "mean_audio_ttfp_ms": 10000,
                    "mean_audio_rtf": 1.0
                }
            }
        ]
    }
]

CC @tzhouam

Signed-off-by: Alex Brooks <albrooks@redhat.com>

@amy-why-3459
Contributor

#2389
Thank you so much for your fix. I believe your fix will also resolve this issue.

@yenuo26 added the nightly-test and ready labels, then removed the nightly-test label, on Apr 14, 2026
@hsliuustc0106
Collaborator

BLOCKER scan:

  • Correctness: PASS
  • Reliability/Safety: PASS
  • Breaking Changes: PASS
  • Test Coverage: PASS (CI tests verify)
  • Documentation: PASS
  • Security: PASS

OVERALL: NO BLOCKERS

VERDICT: COMMENT

Good catch on the TCP fragmentation issue. SSE parsing requires exact handling of whitespace, and stripping can break the protocol. The comment explaining the issue is clear and helpful.
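
As a sketch of what fragmentation-safe handling can look like (not necessarily the change made in this PR), one option is to buffer raw bytes and only parse once a full event, terminated by a blank line, has arrived. SSEBuffer is a hypothetical helper. It assumes "\n" line endings, one JSON object per data line, and an OpenAI-style "[DONE]" sentinel, all of which are simplifications.

```python
import json


class SSEBuffer:
    """Accumulate raw bytes and return complete `data:` payloads.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._buf = b""

    def feed(self, chunk):
        """Append one raw chunk; return payloads of all events completed so far."""
        self._buf += chunk  # never strip: whitespace is part of the framing
        events = []
        while b"\n\n" in self._buf:
            # Split off exactly one complete event; keep the remainder buffered.
            raw, self._buf = self._buf.split(b"\n\n", 1)
            for line in raw.decode("utf-8").split("\n"):
                if line.startswith("data:"):
                    # Remove the field name, then at most one leading space.
                    value = line[len("data:"):].removeprefix(" ")
                    if value != "[DONE]":  # assumed end-of-stream sentinel
                        events.append(json.loads(value))
        return events


buf = SSEBuffer()
first = buf.feed(b"data: ")                        # prefix only, no event yet
second = buf.feed(b'{"n": 1}\n\ndata: {"n"')       # completes event 1
third = buf.feed(b': 2}\n\ndata: [DONE]\n\n')      # completes event 2, then sentinel
```

Because parsing happens only on complete events, it does not matter where the transport splits the byte stream.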

@yenuo26
Collaborator

yenuo26 commented Apr 14, 2026

@Gaohan123 @gcanlin @princepride Please help review whether this is ready to be merged.

Collaborator

@Gaohan123 Gaohan123 left a comment

LGTM. Thanks!

@Gaohan123 Gaohan123 merged commit cf1fcd5 into vllm-project:main Apr 14, 2026
8 checks passed
y123456y78 pushed a commit to y123456y78/vllm-omni that referenced this pull request Apr 15, 2026
Signed-off-by: Alex Brooks <albrooks@redhat.com>
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026
Signed-off-by: Alex Brooks <albrooks@redhat.com>
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
Signed-off-by: Alex Brooks <albrooks@redhat.com>
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: Alex Brooks <albrooks@redhat.com>