[Perf] Qwen-Image Performance Nightly CI test#1805
Conversation
@congw729 Alicia, how do we solve this email sending problem? python tools/nightly/send_nightly_perf_email.py --report-file tests
Missing required env vars: SMTP_HOST, SMTP_PORT, SMTP_USERNAME, SMTP_PASSWORD, DAILY_EMAIL_LIST. Set them (e.g. in Buildkite secrets). Besides, can you check whether the .xlsx file is correct?
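For local testing, those variables need to be exported before running the script. A minimal pre-flight check, sketched as an assumption about how the error above could be produced (the variable names come from the error message; `check_smtp_env` is a hypothetical helper, not part of the repo):

```python
import os

# Variable names taken from the error message above.
REQUIRED_ENV_VARS = [
    "SMTP_HOST",
    "SMTP_PORT",
    "SMTP_USERNAME",
    "SMTP_PASSWORD",
    "DAILY_EMAIL_LIST",
]

def check_smtp_env(env=None):
    """Return the names of required SMTP env vars that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]

# With only SMTP_HOST set, every other required var is reported missing.
missing = check_smtp_env({"SMTP_HOST": "smtp.example.com"})
```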
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ea794b290
@hsliuustc0106 Can you check whether the dataset configuration and test case coverage are in line with your expectations?
Hi, you can set those using your personal email account token (like Gmail) for your local testing (which will actually send email). Or you can set those env vars to None and use
Can you send me the performance results JSON file? I need to make sure the combined performance results of all the models are shown correctly.
Have you checked #1321? I'm not sure whether you are using the same keys for metrics with the same meaning.
If we only check performance here, we need to use the diffusion profiler for each stage and make the number of inference steps smaller; maybe even 1 or 2 steps is enough.
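A minimal sketch of that idea: wrap each pipeline stage in a timer and aggregate mean/median, in the spirit of the `stage_durations_*` result keys. `profile_stage` and the stage names here are hypothetical, not the repo's actual profiler:

```python
import time
import statistics
from collections import defaultdict
from contextlib import contextmanager

stage_durations = defaultdict(list)

@contextmanager
def profile_stage(name):
    # Record the wall-clock duration of one pipeline stage.
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_durations[name].append(time.perf_counter() - start)

def stage_summary():
    # Aggregate per-stage statistics, mirroring the style of the
    # stage_durations_mean / stage_durations_p50 result keys.
    return {
        name: {"mean": statistics.mean(xs), "p50": statistics.median(xs)}
        for name, xs in stage_durations.items()
    }

# With num-inference-steps set to 1 or 2, even a couple of wrapped runs
# give usable per-stage numbers (sleep stands in for real stage work):
for _ in range(2):
    with profile_stage("diffuse"):
        time.sleep(0.005)
```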
Please check #1657 for datasets A, B, and C.
Sure. Sent to you.
Many metric names of the omni models and the diffusion models do not overlap. An example of a diffusion performance JSON file:
{
"duration": 55.729680337011814,
"completed_requests": 10,
"failed_requests": 0,
"throughput_qps": 0.17943759841304327,
"latency_mean": 5.572730791615323,
"latency_median": 5.546993801603094,
"latency_p99": 5.834732530997135,
"latency_p50": 5.546993801603094,
"peak_memory_mb_max": 0,
"peak_memory_mb_mean": 0,
"peak_memory_mb_median": 0,
"backend": "vllm-omni",
"model": "Qwen/Qwen-Image",
"dataset": "random",
"task": "t2i"
}
I think we cannot use the same key names as the omni models' JSON file.
I see. In that case, we could have two sub-sheets for the different types of models.
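A sketch of that split: group the result dicts by model type before writing, so each sub-sheet gets a consistent set of metric columns. The `task`-based heuristic and the function name are assumptions for illustration, not the repo's actual logic:

```python
def split_results_for_sheets(results):
    """Bucket benchmark results into per-sheet lists so diffusion and omni
    metrics never have to share a header row."""
    sheets = {"diffusion": [], "omni": []}
    for record in results:
        # Heuristic: diffusion results carry a generation task tag like "t2i".
        kind = "diffusion" if record.get("task") in {"t2i", "t2v", "i2v"} else "omni"
        sheets[kind].append(record)
    return sheets

# Illustrative records: one diffusion-style, one omni-style.
results = [
    {"model": "Qwen/Qwen-Image", "task": "t2i", "latency_p99": 5.83},
    {"model": "Qwen/Qwen2.5-Omni", "output_token_throughput": 1234.5},
]
sheets = split_results_for_sheets(results)
```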
@congw729 As requested, the email-sending label has been removed from test-nightly.yml, and the related edits were reverted.
I have another question: in the Omni benchmark tests, we have max_concurrency and num_prompts to control the request frequency. Do we need those configurations in the diffusion models' benchmark tests?
I think the max_concurrency value is only valid when batch requests are supported for diffusion models. Since most diffusion models do not support batch requests (@fhfuih does only Qwen-Image support it? I think Wan2.2 does not), we should always use max_concurrency = 1. Even if we enlarge max_concurrency, the requests are still fed to the pipeline one by one. Please correct me if I am wrong. @Bounty-hunter have you tested max_concurrency > 1 in your PR #1657?
Correct, not all models support batch requests. Among the common Alibaba ones, Wan and Qwen-Image-Edit-* don't support it; Qwen-Image does. Z-Image can run batched, but the internal padding and related handling may not be truly optimized for batch requests.
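If that's right, the benchmark harness could clamp the setting per model. A sketch; the support table reflects this thread's discussion and is illustrative, not authoritative:

```python
# Per-model batch-request support, as described in the discussion above.
BATCH_SUPPORT = {
    "Qwen/Qwen-Image": True,   # supports batched requests
    "Wan2.2": False,           # no batch support
    "Qwen-Image-Edit": False,  # no batch support
    "Z-Image": True,           # runs batched, padding may be suboptimal
}

def effective_max_concurrency(model: str, requested: int) -> int:
    """Without batch support, requests are fed to the pipeline one by one,
    so anything above 1 only adds queueing; clamp it to 1."""
    return requested if BATCH_SUPPORT.get(model, False) else 1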
I suggest testing different resolutions; for each resolution, we only need to select a few parallel strategies (maybe only the lowest-latency one) to monitor the performance.
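That selection could be sketched as follows: given measured latencies per (resolution, parallel strategy), keep only the fastest strategy (or the fastest few) for each resolution. All names and numbers below are illustrative:

```python
def select_strategies(latency_table, keep=1):
    """For each resolution, keep the `keep` lowest-latency parallel
    strategies for nightly monitoring."""
    selected = {}
    for resolution, per_strategy in latency_table.items():
        # Sort strategy names by their measured latency, ascending.
        ranked = sorted(per_strategy, key=per_strategy.get)
        selected[resolution] = ranked[:keep]
    return selected

# Illustrative latencies in seconds per (resolution, strategy).
latency_table = {
    "512x512": {"no_parallel": 5.5, "ulysses2": 3.1, "ulysses2_cfg2": 1.4},
    "1024x1024": {"no_parallel": 21.0, "ulysses2": 11.8, "ulysses2_cfg2": 6.2},
}
```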
Signed-off-by: Didan Deng <33117903+wtomin@users.noreply.github.com>
Hi, as we discussed earlier, would you attach the baseline results to
Purpose
This is one of the PRs addressing RFC #1606.
This PR aims to run a nightly performance benchmark for
Qwen-Image text-to-image tasks and generate performance records in JSON files. The current acceleration combinations include:
The acceleration combinations are defined as "server_params" in
tests/perf/tests/test_qwen_image_vllm_omni.json or tests/perf/tests/test_qwen_image_sglang_diffusion.json. The current dataset configuration:
The dataset params are defined in
tests/perf/tests/benchmark_params.json. The number of test cases equals:
the number of server_params x the number of benchmark_params = 9
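In other words, each test case is one element of the cross product; three server configurations times three benchmark configurations gives nine cases. The entry names below are hypothetical stand-ins for the JSON contents:

```python
from itertools import product

server_params = ["baseline", "ulysses2", "ulysses2_cfg2"]  # hypothetical names
benchmark_params = ["512x512_steps20", "768x768_steps20",
                    "1024x1024_steps20"]                   # hypothetical names

# One test case per (server config, benchmark config) pair.
test_cases = [
    {"server": s, "benchmark": b}
    for s, b in product(server_params, benchmark_params)
]
```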
Test Plan
Only the vllm-omni benchmark is defined in the nightly CI for now; the SGLang benchmark has been verified locally.
If running it locally:
It took about 12 minutes on an H800:
Afterwards, JSON files will be generated, for example:
Test Result
All results for both vllm-omni and sglang are shared at this URL.
An example of json file of vllm-omni:
{
  "test_name": "test_qwen_image_ulysses2_cfg2",
  "backend": "vllm-omni",
  "timestamp": "20260323-100309",
  "benchmark_params": {
    "name": "512x512_steps20",
    "dataset": "random",
    "task": "t2i",
    "width": 512,
    "height": 512,
    "num-inference-steps": 20,
    "num-prompts": 10,
    "max-concurrency": 1,
    "enable-negative-prompt": true,
    "baseline": {
      "throughput_qps": 0.001,
      "latency_p99": 1000.0,
      "peak_memory_mb_max": 400000,
      "peak_memory_mb_mean": 400000
    }
  },
  "result": {
    "duration": 13.625132665038109,
    "completed_requests": 10,
    "failed_requests": 0,
    "throughput_qps": 0.7339378078614863,
    "latency_mean": 1.3623275712132454,
    "latency_median": 1.364011375233531,
    "latency_p99": 1.3710215020179748,
    "latency_p95": 1.3701544940471648,
    "latency_p50": 1.364011375233531,
    "peak_memory_mb_max": 60202.0,
    "peak_memory_mb_mean": 60202.0,
    "peak_memory_mb_median": 60202.0,
    "stage_durations_mean": {
      "QwenImagePipeline.text_encoder.forward": 0.012416277453303337,
      "QwenImagePipeline.diffuse": 1.2124435044825077,
      "QwenImagePipeline.vae.decode": 0.018585569970309735
    },
    "stage_durations_p50": {
      "QwenImagePipeline.text_encoder.forward": 0.012366048991680145,
      "QwenImagePipeline.diffuse": 1.2126157889142632,
      "QwenImagePipeline.vae.decode": 0.01858174055814743
    },
    "stage_durations_p99": {
      "QwenImagePipeline.text_encoder.forward": 0.012640358190983534,
      "QwenImagePipeline.diffuse": 1.2175082486867905,
      "QwenImagePipeline.vae.decode": 0.018637520931661128
    },
    "backend": "vllm-omni",
    "model": "Qwen/Qwen-Image",
    "dataset": "random",
    "task": "t2i"
  },
  "log_file": "/home/fq9hpsac/fq9hpsacuser04/workspace/vllm-omni-ddd/tests/perf/results/logs/test_qwen_image_ulysses2_cfg2_vllm-omni_20260323-100309.log"
}
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
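The baseline block in the result JSON above suggests a simple per-metric pass/fail check. A sketch of one plausible interpretation, where throughput is higher-is-better and latency/memory are lower-is-better; the comparison directions are my assumption, not the repo's documented semantics:

```python
# Metrics where a larger value is better (assumption for this sketch).
HIGHER_IS_BETTER = {"throughput_qps"}

def check_against_baseline(result, baseline):
    """Return the metrics that regressed past their baseline threshold."""
    failures = {}
    for metric, threshold in baseline.items():
        value = result.get(metric)
        if value is None:
            continue
        ok = value >= threshold if metric in HIGHER_IS_BETTER else value <= threshold
        if not ok:
            failures[metric] = (value, threshold)
    return failures

# Values taken from the example JSON above.
baseline = {"throughput_qps": 0.001, "latency_p99": 1000.0,
            "peak_memory_mb_max": 400000, "peak_memory_mb_mean": 400000}
result = {"throughput_qps": 0.7339, "latency_p99": 1.371,
          "peak_memory_mb_max": 60202.0, "peak_memory_mb_mean": 60202.0}
```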