[Benchmark] [Diffusion] [Enhancement] Random dataset#1657
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81819cfaf6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
0fdd2fe to
ccb812d
Compare
Signed-off-by: dengyunyang <584797741@qq.com>
ccb812d to
7391629
Compare
SamitHuang
left a comment
There was a problem hiding this comment.
need to double-check the benchmark configs like resolution and frames generated by AI
|
|
||
| ## 3.2 Key Parameters | ||
|
|
||
| | Parameter | Description | |
There was a problem hiding this comment.
we should record all necessary configs, including quant, attention, and cache
| ] | ||
| ``` | ||
|
|
||
| ### Dataset B (1536 Resolution) |
There was a problem hiding this comment.
why use 1536x1536 Resolution?
There was a problem hiding this comment.
I suggest to change to 1024
| * Mix Resolution | ||
| ``` | ||
| [ | ||
| {"width":1280,"height":720,"num_inference_steps":6,"num_frames":80,"fps":16,"weight":1} |
There was a problem hiding this comment.
num_frames should be 4xN + 1
There was a problem hiding this comment.
btw, this resolution and frames can lead to OOM or large running time cost
| --max-concurrency 1 \ | ||
| --enable-negative-prompt \ | ||
| --random-request-config '[ | ||
| {"width":854,"height":480,"num_inference_steps":18,"num_frames":120,"fps":24,"weight":1} |
There was a problem hiding this comment.
num_frames 120 is not proper.
why num_inference_steps varies in each dataset?
…formance.md Signed-off-by: Samit <285365963@qq.com>
| # 6. Performance Results | ||
|
|
||
| | Dataset Configuration | Max Concur. | CFG | Usp | Tp | VAE Parallel | Mean Latency (s) | P99 Latency (s) | | ||
| |-----------------------|-----|-----|-----|----|--------------|------------------|------------------| |
There was a problem hiding this comment.
I think peak_memory_mb_max and throughput_qps are also valuable metrics that should be recorded.
There was a problem hiding this comment.
done, already set in metrics, will modify this testing data at once
|
|
||
|
|
||
| async def async_request_v1_videos( | ||
| input: RequestFuncInput, |
There was a problem hiding this comment.
In diffusion_benchmark_serving.py, it says t2v benchmark can also use vllm-omni backends. Why defining another backend here?
There was a problem hiding this comment.
/v1/chat/complete backends actually not support t2v
There was a problem hiding this comment.
A few things still open:
-
wan_2_2 doc still says "Qwen-Image" — the closing line ("official Qwen-Image serving performance reference") wasn't updated. Same issue I flagged last time.
-
Broken JSON in wan doc example command (line ~133):
{"width":854,"height":480,"num_inference_steps":18,"num_frames": 33,"fps":16"weight":1}
Missing comma between "fps":16 and "weight":1.
-
Section numbering is off — both docs have two sections labeled "# 5." (Dataset & Workload Settings, then Performance Metrics).
-
Several of @SamitHuang's and @wtomin's comments still appear unresolved (num_frames values, missing metrics like
peak_memory_mb_max/throughput_qps, sgl-diffusion backend question, duplicate backend question). Worth addressing or replying to those.
The code changes (RandomDataset, v1/videos backend, VAE patch parallel CLI flag) look fine.
| | Metric | Description | Unit | | ||
| | ------------------ | ----------------------------- | ------- | | ||
| | Mean Latency | Mean of latency | seconds | | ||
| | P99 Latency | P99 of latency | seconds | |
There was a problem hiding this comment.
Hi, could we also add P95 latency as a metric? Since P95 latency is the optimization goal in this issue
There was a problem hiding this comment.
it makes sense, already set P95 in metrics
| --dataset random \ | ||
| --task t2i \ | ||
| --num-prompts 1 \ | ||
| --max-concurrency 1 \ |
There was a problem hiding this comment.
should we use a larger concurrency value when testing the preemption mechanism?
There was a problem hiding this comment.
the testing data in different concurrency nums has already shown below
| ring_degree = kwargs.get("ring_degree") or 1 | ||
| sequence_parallel_size = kwargs.get("sequence_parallel_size") | ||
| tensor_parallel_size = kwargs.get("tensor_parallel_size") or 1 | ||
| vae_patch_parallel_size = kwargs.get("vae_patch_parallel_size") or 1 |
There was a problem hiding this comment.
vae patch parallel is added in online serving. Please rebase to the latest main branch.
24c51f4 to
7ded0e0
Compare
Signed-off-by: bjf-frz <frz123db@gmail.com>
7ded0e0 to
1b174f9
Compare
| backends_function_mapping = { | ||
| "vllm-omni": (async_request_chat_completions, "/v1/chat/completions"), | ||
| "openai": (async_request_openai_images, "/v1/images/generations"), | ||
| "v1/videos": (async_request_v1_videos, "/v1/videos"), |
There was a problem hiding this comment.
The key names of backends_function_mapping are a bit confusing. I think the mapping are two levels:
- level 1: task,
i2v,t2vare mapped to video generation,t2iandi2iare mapped to image generation; - level 2: framework, vllm-omni and sglang are mapped to different functions.
Let's come up with a better naming.
There was a problem hiding this comment.
Offline discussion determined to set the backends_function_mapping as a two-level dict, with the first level as "task" and the second level as "backend".
lishunyang12
left a comment
There was a problem hiding this comment.
The issues I flagged on Mar 11 are still present in the current diff:
wan_2_2_serving_performance.mdclosing line still says "official Qwen-Image serving performance reference"- Broken JSON in wan doc example command — missing comma between
"fps":16and"weight":1 - Both docs have duplicate
# 5.section numbering
Please fix these before merge. Also @wtomin's new comment about backend naming is worth addressing.
Signed-off-by: bjf-frz <frz123db@gmail.com>
Signed-off-by: bjf-frz <frz123db@gmail.com>
fixed, thanks a lot |
Signed-off-by: dengyunyang <584797741@qq.com> Signed-off-by: Samit <285365963@qq.com> Signed-off-by: bjf-frz <frz123db@gmail.com> Co-authored-by: Samit <285365963@qq.com> Co-authored-by: bjf-frz <frz123db@gmail.com>
PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.
Purpose
(1) diffusion benchmark enhancement:
--- add
enable-negative-prompt: support pass negative prompt for random dataset--- add
random-request-config: support mix resolution request for random dataset(2) add qwen-image/wan2.2 performance dashboard md.
Test Plan
Test with qwen-image/wan2.2, and result can be seen in
qwen_image_serving_performance.mdandwan_2_2_serving_performance.mdTest Result
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model. Please runmkdocs serveto sync the documentation editions to./docs.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)