[Benchmark] [Diffusion] [Enhancement] Random dataset #1657

Merged
wtomin merged 9 commits into vllm-project:main from Bounty-hunter:3_4_performance on Mar 20, 2026

Conversation

Contributor

@Bounty-hunter Bounty-hunter commented Mar 4, 2026

Purpose

(1) Diffusion benchmark enhancements:
--- add --enable-negative-prompt: supports passing a negative prompt with the random dataset
--- add --random-request-config: supports mixed-resolution requests with the random dataset

(2) Add performance dashboard Markdown docs for qwen-image and wan2.2.
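
To illustrate the mixed-resolution option in (1), here is a minimal sketch of how a weighted request config could be sampled. It assumes that `weight` is a relative sampling weight and that the input has the same list-of-objects shape as the `--random-request-config` JSON quoted later in this thread. `sample_request_configs` and `EXAMPLE_CONFIG` are hypothetical names with illustrative values, not the code added by this PR.

```python
import json
import random

# Illustrative values only; the field names follow the JSON quoted in this thread.
EXAMPLE_CONFIG = """[
  {"width": 854, "height": 480, "num_inference_steps": 18, "num_frames": 33, "fps": 16, "weight": 3},
  {"width": 1280, "height": 720, "num_inference_steps": 18, "num_frames": 81, "fps": 16, "weight": 1}
]"""


def sample_request_configs(config_json: str, num_prompts: int, seed: int = 0) -> list[dict]:
    """Pick one config per prompt, with probability proportional to its weight.

    Assumption: a missing "weight" counts as 1, and the weight itself is
    dropped from the per-request parameters that are returned.
    """
    configs = json.loads(config_json)
    weights = [cfg.get("weight", 1) for cfg in configs]
    rng = random.Random(seed)  # seeded so a benchmark run is repeatable
    picked = rng.choices(configs, weights=weights, k=num_prompts)
    return [{k: v for k, v in cfg.items() if k != "weight"} for cfg in picked]


requests = sample_request_configs(EXAMPLE_CONFIG, num_prompts=8)
```

With the example weights, roughly three out of four sampled requests would use the 854x480 entry, which is one way a single run could mix resolutions.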

Test Plan

Tested with qwen-image and wan2.2. The results are recorded in qwen_image_serving_performance.md and wan_2_2_serving_performance.md.

Test Result




@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81819cfaf6

Comment thread benchmarks/diffusion/backends.py Outdated
@Bounty-hunter Bounty-hunter force-pushed the 3_4_performance branch 3 times, most recently from 0fdd2fe to ccb812d Compare March 4, 2026 12:25
Signed-off-by: dengyunyang <584797741@qq.com>
Signed-off-by: dengyunyang <584797741@qq.com>
Signed-off-by: dengyunyang <584797741@qq.com>
Collaborator

@SamitHuang SamitHuang left a comment

We need to double-check the AI-generated benchmark configs, such as resolution and frame count.


## 3.2 Key Parameters

| Parameter | Description |
Collaborator

We should record all necessary configs, including quantization, attention, and cache settings.

]
```

### Dataset B (1536 Resolution)
Collaborator

Why use a 1536x1536 resolution?

Collaborator

I suggest changing it to 1024.

* Mix Resolution
```
[
{"width":1280,"height":720,"num_inference_steps":6,"num_frames":80,"fps":16,"weight":1}
Collaborator

num_frames should be of the form 4×N + 1.
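
The constraint above can be checked before a config is used. This is a minimal sketch that only encodes the 4×N + 1 rule stated in the comment; the helper names are hypothetical and are not part of the benchmark code.

```python
def is_valid_num_frames(num_frames: int) -> bool:
    """Return True if num_frames equals 4*N + 1 for some integer N >= 0."""
    return num_frames >= 1 and (num_frames - 1) % 4 == 0


def nearest_valid_num_frames(num_frames: int) -> int:
    """Snap a frame count to the closest value of the form 4*N + 1."""
    n = max(0, round((num_frames - 1) / 4))
    return 4 * n + 1


# The values questioned in this review: 80 and 120 do not satisfy the rule, 33 does.
for frames in (80, 120, 33):
    print(frames, is_valid_num_frames(frames), nearest_valid_num_frames(frames))
```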

Collaborator

Btw, this resolution and frame count can lead to OOM or a very long running time.

--max-concurrency 1 \
--enable-negative-prompt \
--random-request-config '[
{"width":854,"height":480,"num_inference_steps":18,"num_frames":120,"fps":24,"weight":1}
Collaborator

A num_frames value of 120 is not valid.
Why does num_inference_steps vary between datasets?

Comment thread benchmarks/diffusion/performance_dashboard/wan_2_2_serving_performance.md Outdated
…formance.md

Signed-off-by: Samit <285365963@qq.com>
# 6. Performance Results

| Dataset Configuration | Max Concur. | CFG | Usp | Tp | VAE Parallel | Mean Latency (s) | P99 Latency (s) |
|-----------------------|-----|-----|-----|----|--------------|------------------|------------------|
Collaborator

I think peak_memory_mb_max and throughput_qps are also valuable metrics that should be recorded.

Contributor

Done. Both are already recorded in the metrics; I will update the test data right away.
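
For readers unfamiliar with the two metrics requested above, here is a minimal sketch of how they could be derived from per-request results. Only the metric names `peak_memory_mb_max` and `throughput_qps` come from this thread; the `RequestResult` type, its fields, and the `summarize` helper are assumptions made for illustration, and the benchmark script may compute them differently.

```python
from dataclasses import dataclass


@dataclass
class RequestResult:
    """Hypothetical per-request record; not the benchmark's actual type."""
    success: bool
    latency_s: float
    peak_memory_mb: float


def summarize(results: list[RequestResult], duration_s: float) -> dict[str, float]:
    """Aggregate throughput and peak memory over the successful requests."""
    completed = [r for r in results if r.success]
    return {
        # Completed requests per second of wall-clock benchmark time.
        "throughput_qps": len(completed) / duration_s if duration_s > 0 else 0.0,
        # Largest per-request peak memory seen during the run.
        "peak_memory_mb_max": max((r.peak_memory_mb for r in completed), default=0.0),
    }
```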

Comment thread benchmarks/diffusion/diffusion_benchmark_serving.py


async def async_request_v1_videos(
    input: RequestFuncInput,
Collaborator

diffusion_benchmark_serving.py says that the t2v benchmark can also use the vllm-omni backend. Why define another backend here?

Contributor Author

The /v1/chat/completions backend does not actually support t2v.

Collaborator

@lishunyang12 lishunyang12 left a comment

A few things still open:

  1. wan_2_2 doc still says "Qwen-Image" — the closing line ("official Qwen-Image serving performance reference") wasn't updated. Same issue I flagged last time.

  2. Broken JSON in wan doc example command (line ~133):

{"width":854,"height":480,"num_inference_steps":18,"num_frames": 33,"fps":16"weight":1}

Missing comma between "fps":16 and "weight":1.

  3. Section numbering is off: both docs have two sections labeled "# 5." (Dataset & Workload Settings, then Performance Metrics).

  4. Several of @SamitHuang's and @wtomin's comments still appear unresolved (num_frames values, missing metrics like peak_memory_mb_max/throughput_qps, sgl-diffusion backend question, duplicate backend question). Worth addressing or replying to those.

The code changes (RandomDataset, v1/videos backend, VAE patch parallel CLI flag) look fine.
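
The broken-JSON item in the list above is the kind of error a quick parse check catches before a doc example is published. A minimal sketch using only the standard library; the `parses` helper is a hypothetical name, and the two strings are the quoted line from the review and the same line with the missing comma added.

```python
import json

# The line quoted in the review, and the same line with the comma restored.
broken = '{"width":854,"height":480,"num_inference_steps":18,"num_frames": 33,"fps":16"weight":1}'
fixed = '{"width":854,"height":480,"num_inference_steps":18,"num_frames": 33,"fps":16,"weight":1}'


def parses(text: str) -> bool:
    """Return True if text is valid JSON, False if json.loads rejects it."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses(broken), parses(fixed))
```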

potatoZhx added a commit to potatoZhx/vllm-omni that referenced this pull request Mar 12, 2026
| Metric | Description | Unit |
| ------------------ | ----------------------------- | ------- |
| Mean Latency | Mean of latency | seconds |
| P99 Latency | P99 of latency | seconds |

Hi, could we also add P95 latency as a metric, since P95 latency is the optimization goal in this issue?

Contributor

That makes sense. P95 is already included in the metrics.
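
As a reference for the latency metrics discussed here, this is a minimal sketch that computes mean, P95, and P99 latency with the nearest-rank method. The percentile method is an assumption (the benchmark may interpolate instead), and `percentile` and `summarize_latency` are hypothetical names.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latency(latencies_s: list[float]) -> dict[str, float]:
    """Return mean, P95, and P99 latency in seconds."""
    return {
        "mean_latency_s": sum(latencies_s) / len(latencies_s),
        "p95_latency_s": percentile(latencies_s, 95),
        "p99_latency_s": percentile(latencies_s, 99),
    }
```

P95 is less sensitive than P99 to a single slow request in a small run, which may be one reason to report both.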

--dataset random \
--task t2i \
--num-prompts 1 \
--max-concurrency 1 \

Should we use a larger concurrency value when testing the preemption mechanism?

Contributor

The test data for the different concurrency levels is already shown below.

Comment thread vllm_omni/entrypoints/async_omni.py Outdated
ring_degree = kwargs.get("ring_degree") or 1
sequence_parallel_size = kwargs.get("sequence_parallel_size")
tensor_parallel_size = kwargs.get("tensor_parallel_size") or 1
vae_patch_parallel_size = kwargs.get("vae_patch_parallel_size") or 1
Collaborator

VAE patch parallel has already been added to online serving. Please rebase onto the latest main branch.

Contributor

done

@bjf-frz bjf-frz force-pushed the 3_4_performance branch 3 times, most recently from 24c51f4 to 7ded0e0 Compare March 14, 2026 08:47
Signed-off-by: bjf-frz <frz123db@gmail.com>
@Gaohan123 Gaohan123 added this to the v0.18.0 milestone Mar 14, 2026
Comment thread benchmarks/diffusion/backends.py Outdated
backends_function_mapping = {
    "vllm-omni": (async_request_chat_completions, "/v1/chat/completions"),
    "openai": (async_request_openai_images, "/v1/images/generations"),
    "v1/videos": (async_request_v1_videos, "/v1/videos"),
Collaborator

The key names of backends_function_mapping are a bit confusing. I think the mapping has two levels:

  • level 1: task, i2v, t2v are mapped to video generation, t2i and i2i are mapped to image generation;
  • level 2: framework, vllm-omni and sglang are mapped to different functions.

Let's come up with a better naming.

Contributor

After an offline discussion, we decided to make backends_function_mapping a two-level dict, with the first level keyed by task and the second level keyed by backend.
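
A minimal sketch of what such a two-level mapping could look like. The function names and endpoints are taken from the diff excerpt above, but the bodies are stubs, and the choice of which task/backend pairs exist is an assumption made for illustration. `resolve_backend` is a hypothetical helper, not part of the merged code.

```python
# Stubs standing in for the real request functions in benchmarks/diffusion/backends.py.
async def async_request_chat_completions(request): ...
async def async_request_openai_images(request): ...
async def async_request_v1_videos(request): ...


# Level 1: task. Level 2: backend. Value: (request function, endpoint path).
# Assumption: the task/backend pairs listed here are illustrative only.
backends_function_mapping = {
    "t2i": {
        "vllm-omni": (async_request_chat_completions, "/v1/chat/completions"),
        "openai": (async_request_openai_images, "/v1/images/generations"),
    },
    "t2v": {
        "vllm-omni": (async_request_v1_videos, "/v1/videos"),
    },
}


def resolve_backend(task: str, backend: str):
    """Look up (request function, endpoint) and fail loudly on an unknown pair."""
    try:
        return backends_function_mapping[task][backend]
    except KeyError:
        raise ValueError(f"unsupported task/backend combination: {task!r}/{backend!r}") from None
```

Keying by task first means a backend name such as "vllm-omni" can point at different endpoints for image and video generation without needing an endpoint-shaped key like "v1/videos".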

Collaborator

@lishunyang12 lishunyang12 left a comment

The issues I flagged on Mar 11 are still present in the current diff:

  1. wan_2_2_serving_performance.md closing line still says "official Qwen-Image serving performance reference"
  2. Broken JSON in wan doc example command — missing comma between "fps":16 and "weight":1
  3. Both docs have duplicate # 5. section numbering

Please fix these before merge. Also @wtomin's new comment about backend naming is worth addressing.

bjf-frz added 2 commits March 20, 2026 17:03
Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
@bjf-frz
Contributor

bjf-frz commented Mar 20, 2026

> The issues I flagged on Mar 11 are still present in the current diff:
>
>   1. wan_2_2_serving_performance.md closing line still says "official Qwen-Image serving performance reference"
>   2. Broken JSON in wan doc example command: missing comma between "fps":16 and "weight":1
>   3. Both docs have duplicate # 5. section numbering
>
> Please fix these before merge. Also @wtomin's new comment about backend naming is worth addressing.

fixed, thanks a lot

Collaborator

@wtomin wtomin left a comment

LGTM.

@wtomin wtomin added the ready label to trigger buildkite CI label Mar 20, 2026
@wtomin wtomin merged commit ff25479 into vllm-project:main Mar 20, 2026
8 checks passed
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Signed-off-by: dengyunyang <584797741@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
Co-authored-by: Samit <285365963@qq.com>
Co-authored-by: bjf-frz <frz123db@gmail.com>

8 participants