[Benchmark] [Diffusion] [Enhancement] Random dataset by Bounty-hunter · Pull Request #1657 · vllm-project/vllm-omni

Bounty-hunter · 2026-03-04T11:32:50Z

Purpose

(1) Diffusion benchmark enhancements:
--- add enable-negative-prompt: support passing a negative prompt for the random dataset
--- add random-request-config: support mixed-resolution requests for the random dataset

(2) Add performance dashboard Markdown docs for qwen-image and wan2.2.
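
As an illustration of the mixed-resolution idea, here is a minimal sketch of how a `--random-request-config` list with `weight` fields could be turned into per-request settings. The helper name `sample_request_configs`, its parameters, and the `negative_prompt` request key are hypothetical and are not taken from the PR's code; only the JSON field names come from the examples in this thread.

```python
import json
import random


def sample_request_configs(config_json, num_prompts, seed=0, negative_prompt=None):
    """Pick one config per request, with probability proportional to `weight`.

    Hypothetical helper: the real RandomDataset in the PR may work differently.
    """
    configs = json.loads(config_json)
    # A missing weight is treated as 1 (an assumption of this sketch).
    weights = [cfg.get("weight", 1) for cfg in configs]
    rng = random.Random(seed)  # seeded so benchmark runs are repeatable
    chosen = rng.choices(configs, weights=weights, k=num_prompts)

    requests = []
    for cfg in chosen:
        # Drop the sampling weight; it is not a generation parameter.
        req = {key: value for key, value in cfg.items() if key != "weight"}
        if negative_prompt is not None:
            req["negative_prompt"] = negative_prompt
        requests.append(req)
    return requests
```

With weights such as 3 and 1, roughly three quarters of the sampled requests would use the first entry, which is one way to model a mixed-resolution workload.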

Test Plan

Tested with qwen-image and wan2.2; the results are recorded in qwen_image_serving_performance.md and wan_2_2_serving_performance.md.

Test Result

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81819cfaf6

Signed-off-by: dengyunyang <584797741@qq.com>

SamitHuang

We need to double-check the AI-generated benchmark configs, such as resolution and frame count.

SamitHuang · 2026-03-05T16:42:16Z

+
+## 3.2 Key Parameters
+
+| Parameter             | Description              |


We should record all necessary configs, including quantization, attention, and cache settings.

SamitHuang · 2026-03-05T16:42:53Z

+]
+```
+
+### Dataset B (1536 Resolution)


Why use 1536x1536 resolution?

I suggest changing it to 1024.

SamitHuang · 2026-03-05T16:44:06Z

+* Mix Resolution
+```
+[
+    {"width":1280,"height":720,"num_inference_steps":6,"num_frames":80,"fps":16,"weight":1}


num_frames should be of the form 4N + 1.

Also, this resolution and frame count can lead to OOM or a very long running time.
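
A minimal sketch of the 4N + 1 constraint mentioned above, useful for checking a benchmark config before running it. Both function names are hypothetical and not part of the benchmark scripts.

```python
def is_valid_num_frames(num_frames: int) -> bool:
    """True when num_frames can be written as 4*N + 1 for some integer N >= 0."""
    return num_frames >= 1 and (num_frames - 1) % 4 == 0


def nearest_valid_num_frames(num_frames: int) -> int:
    """Return a nearest frame count of the form 4*N + 1 (N >= 0)."""
    n = max(0, round((num_frames - 1) / 4))
    return 4 * n + 1
```

Under this rule the values 80 and 120 questioned in the review are rejected, and the nearest accepted values are 81 and 121.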

SamitHuang · 2026-03-05T16:45:51Z

+    --max-concurrency 1 \
+    --enable-negative-prompt \
+    --random-request-config '[
+        {"width":854,"height":480,"num_inference_steps":18,"num_frames":120,"fps":24,"weight":1}


A num_frames value of 120 is not appropriate.
Why does num_inference_steps vary across datasets?

…formance.md Signed-off-by: Samit <285365963@qq.com>

wtomin · 2026-03-09T11:59:06Z

+# 6. Performance Results
+
+| Dataset Configuration | Max Concur. | CFG | Usp | Tp | VAE Parallel | Mean Latency (s) | P99 Latency (s) |
+|-----------------------|-----|-----|-----|----|--------------|------------------|------------------|


I think peak_memory_mb_max and throughput_qps are also valuable metrics that should be recorded.

Done. Both are already included in the metrics; I will update the test data right away.

wtomin · 2026-03-11T08:01:33Z



+async def async_request_v1_videos(
+    input: RequestFuncInput,


diffusion_benchmark_serving.py says the t2v benchmark can also use the vllm-omni backend. Why define another backend here?

The /v1/chat/completions backend does not actually support t2v.

lishunyang12

A few things still open:

wan_2_2 doc still says "Qwen-Image" — the closing line ("official Qwen-Image serving performance reference") wasn't updated. Same issue I flagged last time.
Broken JSON in wan doc example command (line ~133):

{"width":854,"height":480,"num_inference_steps":18,"num_frames": 33,"fps":16"weight":1}

Missing comma between "fps":16 and "weight":1.

Section numbering is off — both docs have two sections labeled "# 5." (Dataset & Workload Settings, then Performance Metrics).
Several of @SamitHuang's and @wtomin's comments still appear unresolved (num_frames values, missing metrics like peak_memory_mb_max/throughput_qps, sgl-diffusion backend question, duplicate backend question). Worth addressing or replying to those.

The code changes (RandomDataset, v1/videos backend, VAE patch parallel CLI flag) look fine.
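
The broken-JSON issue flagged above is the kind of error a quick `json.loads` check catches before an example command goes into a doc. The sketch below uses only the Python standard library; `check_request_config` is a hypothetical helper, not something from the PR.

```python
import json


def check_request_config(text):
    """Return (True, parsed value) for valid JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as exc:
        return False, f"{exc.msg} at column {exc.colno}"


# The entry quoted in the review, with the comma missing after "fps":16.
broken = '{"width":854,"height":480,"num_inference_steps":18,"num_frames": 33,"fps":16"weight":1}'
# The same entry with the comma restored.
fixed = '{"width":854,"height":480,"num_inference_steps":18,"num_frames": 33,"fps":16,"weight":1}'

print(check_request_config(broken)[0])
print(check_request_config(fixed)[0])
```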

yingluosanqian · 2026-03-12T03:32:52Z

+| Metric             | Description                   | Unit    |
+| ------------------ | ----------------------------- | ------- |
+| Mean Latency        | Mean of latency       | seconds |
+| P99 Latency        | P99 of latency             | seconds |


Hi, could we also add P95 latency as a metric? P95 latency is the optimization goal in this issue.

That makes sense. P95 is already included in the metrics.
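
For reference, a minimal sketch of how the latency metrics discussed in this thread (mean, P95, P99, throughput_qps) could be derived from per-request latencies. It uses the nearest-rank percentile definition, which is an assumption; the benchmark script may compute percentiles differently (for example with interpolation). The function names and dictionary keys other than throughput_qps are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latencies(latencies_s, duration_s):
    """Summarize per-request latencies (seconds) over a run lasting duration_s seconds."""
    return {
        "mean_latency_s": sum(latencies_s) / len(latencies_s),
        "p95_latency_s": percentile(latencies_s, 95),
        "p99_latency_s": percentile(latencies_s, 99),
        "throughput_qps": len(latencies_s) / duration_s,
    }
```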

yingluosanqian · 2026-03-12T03:34:48Z

+    --dataset random \
+    --task t2i \
+    --num-prompts 1 \
+    --max-concurrency 1 \


Should we use a larger concurrency value when testing the preemption mechanism?

The test data for different concurrency levels is already shown below.

wtomin · 2026-03-12T03:39:46Z

            ring_degree = kwargs.get("ring_degree") or 1
            sequence_parallel_size = kwargs.get("sequence_parallel_size")
            tensor_parallel_size = kwargs.get("tensor_parallel_size") or 1
+            vae_patch_parallel_size = kwargs.get("vae_patch_parallel_size") or 1


VAE patch parallel has already been added to online serving. Please rebase onto the latest main branch.

Signed-off-by: bjf-frz <frz123db@gmail.com>

wtomin · 2026-03-19T03:02:40Z

 backends_function_mapping = {
    "vllm-omni": (async_request_chat_completions, "/v1/chat/completions"),
    "openai": (async_request_openai_images, "/v1/images/generations"),
+    "v1/videos": (async_request_v1_videos, "/v1/videos"),


The key names in backends_function_mapping are a bit confusing. I think the mapping has two levels:

level 1: task. i2v and t2v map to video generation; t2i and i2i map to image generation.

level 2: framework. vllm-omni and sglang map to different functions.

Let's come up with better naming.

After offline discussion, we decided to make backends_function_mapping a two-level dict, with the first level keyed by "task" and the second by "backend".
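
A minimal sketch of what such a two-level mapping could look like. The request functions here are stand-in stubs, and the specific task and backend keys (including mapping "vllm-omni" under "t2v" to /v1/videos) are guesses for illustration only; the merged code may use different names and pairings.

```python
import asyncio


# Stand-in stubs for the real request functions in benchmarks/diffusion/backends.py.
async def async_request_chat_completions(request):
    return {"endpoint": "/v1/chat/completions", "request": request}


async def async_request_openai_images(request):
    return {"endpoint": "/v1/images/generations", "request": request}


async def async_request_v1_videos(request):
    return {"endpoint": "/v1/videos", "request": request}


# First level: task. Second level: backend. Values: (request function, API path).
BACKENDS_BY_TASK = {
    "t2i": {
        "vllm-omni": (async_request_chat_completions, "/v1/chat/completions"),
        "openai": (async_request_openai_images, "/v1/images/generations"),
    },
    "t2v": {
        "vllm-omni": (async_request_v1_videos, "/v1/videos"),
    },
}


def resolve_backend(task, backend):
    """Look up (function, path) and fail early on an unsupported combination."""
    try:
        return BACKENDS_BY_TASK[task][backend]
    except KeyError:
        raise ValueError(f"backend {backend!r} is not available for task {task!r}") from None


func, path = resolve_backend("t2v", "vllm-omni")
print(path, asyncio.run(func({"prompt": "a cat"}))["endpoint"])
```

Keying by task first lets the same backend name resolve to different request functions per task, and makes an unsupported task and backend pair an explicit error instead of a silent mismatch.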

lishunyang12

The issues I flagged on Mar 11 are still present in the current diff:

wan_2_2_serving_performance.md closing line still says "official Qwen-Image serving performance reference"
Broken JSON in wan doc example command — missing comma between "fps":16 and "weight":1
Both docs have duplicate # 5. section numbering

Please fix these before merge. Also @wtomin's new comment about backend naming is worth addressing.

Signed-off-by: bjf-frz <frz123db@gmail.com>

bjf-frz · 2026-03-20T09:31:01Z

fixed, thanks a lot

wtomin

LGTM.

Signed-off-by: dengyunyang <584797741@qq.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: bjf-frz <frz123db@gmail.com> Co-authored-by: Samit <285365963@qq.com> Co-authored-by: bjf-frz <frz123db@gmail.com>

Bounty-hunter requested a review from hsliuustc0106 as a code owner March 4, 2026 11:32

chatgpt-codex-connector Bot reviewed Mar 4, 2026

Comment thread benchmarks/diffusion/backends.py Outdated

Bounty-hunter force-pushed the 3_4_performance branch 3 times, most recently from 0fdd2fe to ccb812d Compare March 4, 2026 12:25

performance dashborad

7391629

Signed-off-by: dengyunyang <584797741@qq.com>

Bounty-hunter force-pushed the 3_4_performance branch from ccb812d to 7391629 Compare March 4, 2026 12:31

lishunyang12 reviewed Mar 4, 2026

Comment thread benchmarks/diffusion/performance_dashboard/wan_2_2_serving_performance.md

Bounty-hunter added 3 commits March 5, 2026 09:07

fix

bd1101c

Signed-off-by: dengyunyang <584797741@qq.com>

fix

17657ac

Signed-off-by: dengyunyang <584797741@qq.com>

Merge branch 'main' into 3_4_performance

82ab66d

Bounty-hunter mentioned this pull request Mar 5, 2026

[Performance]: Fine-grained Preemptive Scheduling for Diffusion/DiT Inference to Improve SLO #1679

Open

1 task

SamitHuang reviewed Mar 5, 2026

Update benchmarks/diffusion/performance_dashboard/wan_2_2_serving_per…

964ee68

…formance.md Signed-off-by: Samit <285365963@qq.com>

wtomin reviewed Mar 9, 2026

Comment thread benchmarks/diffusion/diffusion_benchmark_serving.py

david6666666 mentioned this pull request Mar 11, 2026

[RFC]: Qwen-Image、Qwen-Image-Layered、Qwen-Image-Edit-Plus、Wan2.2 Production-grade Feature Monitoring JiusiServe/vllm-omni#167

Closed

26 tasks

wtomin reviewed Mar 11, 2026

lishunyang12 reviewed Mar 11, 2026

hsliuustc0106 mentioned this pull request Mar 12, 2026

[Perf] Qwen-Image Performance Nightly CI test #1805

Merged

5 tasks

potatoZhx added a commit to potatoZhx/vllm-omni that referenced this pull request Mar 12, 2026

Merge PR vllm-project#1657 from Bounty-hunter/3_4_performance

4bf6f87

yingluosanqian reviewed Mar 12, 2026

wtomin reviewed Mar 12, 2026

wtomin mentioned this pull request Mar 12, 2026

[RFC]: Diffusion Models Features Supports Plan #814

Open

54 tasks

Merge remote-tracking branch 'upstream/main' into 3_4_performance

0bb5055

bjf-frz force-pushed the 3_4_performance branch 3 times, most recently from 24c51f4 to 7ded0e0 Compare March 14, 2026 08:47

add match between task and backend && fix md

1b174f9

Signed-off-by: bjf-frz <frz123db@gmail.com>

bjf-frz force-pushed the 3_4_performance branch from 7ded0e0 to 1b174f9 Compare March 14, 2026 09:50

Gaohan123 added this to the v0.18.0 milestone Mar 14, 2026

wtomin reviewed Mar 19, 2026

lishunyang12 reviewed Mar 19, 2026

hsliuustc0106 mentioned this pull request Mar 19, 2026

[RFC]: vLLM-Omni 2026 Q1 Roadmap #677

Open

38 tasks

Bounty-hunter mentioned this pull request Mar 19, 2026

[RFC]: HunyuanImage Model deployment optimization #2015

Open

bjf-frz added 2 commits March 20, 2026 17:03

Merge remote-tracking branch 'upstream/main' into 3_4_performance

431dff8

Signed-off-by: bjf-frz <frz123db@gmail.com>

fix: fix /vi/videos api to async mode && fix doc

5361280

Signed-off-by: bjf-frz <frz123db@gmail.com>

wtomin approved these changes Mar 20, 2026

wtomin added the ready label to trigger buildkite CI label Mar 20, 2026

wtomin merged commit ff25479 into vllm-project:main Mar 20, 2026
8 checks passed

8 participants
