
[Test] Fix HunyuanImage3 nightly perf startup #3819

Closed
TaffyOfficial wants to merge 3 commits into vllm-project:main from TaffyOfficial:codex/nightly-pr3582-tests

@TaffyOfficial
Contributor

@TaffyOfficial TaffyOfficial commented May 22, 2026

Purpose

Fix the HunyuanImage3 nightly perf path so the run-nightly script executes every selected pytest command, starts HunyuanImage3 perf with the intended DiT-only deploy topology, and survives an offline cache that lacks generation_config.json.

What Was Broken

  1. tools/nightly/run_nightly_jobs.sh generated one shell job from a Buildkite step, but some Buildkite steps contain multiple pytest commands. With set -e, the first failing pytest stopped the job and later Hunyuan variants could be skipped.

  2. The HunyuanImage3 perf JSONs launched tencent/HunyuanImage-3.0-Instruct without the DiT-only deploy config. In the single-server perf topology, async chunking is invalid because there is no next-stage input processor.

  3. HunyuanImage3 requires Hugging Face remote code, so the perf server args need --trust-remote-code.

  4. With an offline cache, HunyuanImage3Pipeline.__init__ could fail before loading weights when the snapshot had config.json and tokenizer files but no generation_config.json.

How This PR Fixes It

  • Preserve all pytest commands generated from one Buildkite step and return the combined failure status after every command has run.
  • Use the existing hunyuan_image3_dit.yaml deploy config for all three HunyuanImage3 perf cases, then keep only the per-case CLI overrides for TP/SP/CFG/quantization/profiler/trust-remote-code.
  • Resolve perf deploy configs from the repository vllm_omni/deploy directory so stage_config_name can point at the shared Hunyuan deploy YAML.
  • Add a HunyuanImage3 generation-config loader that preserves real hub/snapshot values when available, but logs a warning and falls back to the bundled Hunyuan defaults when the generation config cannot be loaded from an offline cache.
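The first bullet is implemented in bash in tools/nightly/run_nightly_jobs.sh. As a rough illustration of the same idea only, here is a minimal Python sketch: run every command even after one fails, and fold the exit codes together with bitwise OR. The function name run_all and the two stand-in commands are hypothetical and are not taken from the script.

```python
import subprocess
import sys


def run_all(commands):
    """Run every command, even after a failure, and return the OR of all exit codes."""
    overall_status = 0
    for cmd in commands:
        result = subprocess.run(cmd)
        # Bitwise OR keeps the combined status non-zero once any command has failed.
        overall_status |= result.returncode
    return overall_status


# Hypothetical stand-ins for the pytest commands generated from one Buildkite step.
commands = [
    [sys.executable, "-c", "import sys; sys.exit(1)"],  # first command fails
    [sys.executable, "-c", "print('second command still runs')"],
]
status = run_all(commands)
print("combined status:", status)
```

Unlike a job running under set -e, the loop never stops at the first failure, so later Hunyuan variants would still be exercised while the job as a whole still reports a failure.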

This PR does not change perf baselines or model execution logic. It only fixes nightly job execution and startup metadata/configuration needed for HunyuanImage3 perf startup.
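For the deploy-config resolution mentioned in the bullets above, a minimal sketch of the intent (not the actual run_benchmark.py code) is shown here. The helper name resolve_deploy_config and the repo_root argument are assumptions made for illustration; only the vllm_omni/deploy directory and the hunyuan_image3_dit.yaml file name come from this PR.

```python
from pathlib import Path


def resolve_deploy_config(repo_root, stage_config_name):
    """Return an absolute path to a deploy YAML under <repo_root>/vllm_omni/deploy."""
    candidate = Path(stage_config_name)
    if candidate.is_absolute():
        # An absolute path is assumed to be intentional and is returned unchanged.
        return candidate
    return (Path(repo_root) / "vllm_omni" / "deploy" / candidate).resolve()


# Hypothetical repository root, used only for this example.
path = resolve_deploy_config("/tmp/example-repo", "hunyuan_image3_dit.yaml")
print(path)
```

Resolving against a fixed repository directory, rather than the current working directory, is what lets the same stage_config_name work regardless of where the benchmark is launched from.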

Test Plan

git diff --check
python -m json.tool tests/dfx/perf/tests/test_hunyuan_image_tp4_fp8.json
python -m json.tool tests/dfx/perf/tests/test_hunyuan_image_tp2_fp8_sp2.json
python -m json.tool tests/dfx/perf/tests/test_hunyuan_image_tp2_fp8_cfgp2.json
python -m ruff check vllm_omni/diffusion/models/hunyuan_image3/pipeline_hunyuan_image3.py tests/diffusion/models/hunyuan_image3/test_generation_config.py tests/dfx/perf/scripts/run_benchmark.py
python -m ruff format --check --diff vllm_omni/diffusion/models/hunyuan_image3/pipeline_hunyuan_image3.py tests/diffusion/models/hunyuan_image3/test_generation_config.py tests/dfx/perf/scripts/run_benchmark.py
bash -n tools/nightly/run_nightly_jobs.sh
PYTHONPATH=/data/wzr/wt-nightly-pr3582-tests-codex python3 -m pytest tests/diffusion/models/hunyuan_image3/test_generation_config.py -q
bash tools/nightly/run_nightly_jobs.sh --test-type perf --model-type diffusion --label-substr HunyuanImage3

Remote validation used only /data/wzr for persistent files, with:

HF_HUB_OFFLINE=1
TRANSFORMERS_OFFLINE=1
CUDA_VISIBLE_DEVICES=2,3,4,5

Test Result

Local/static checks:

git diff --check: passed
json.tool for all three Hunyuan perf JSONs: passed
Hunyuan deploy path resolution check: passed
ruff check: All checks passed!
ruff format --check --diff: 3 files already formatted
bash -n tools/nightly/run_nightly_jobs.sh: passed

Local Windows pytest collection is blocked because the full vllm package is not installed in this environment:

ModuleNotFoundError: No module named 'vllm'

Remote unit test before the latest local-path fallback case:

PYTHONPATH=/data/wzr/wt-nightly-pr3582-tests-codex python3 -m pytest tests/diffusion/models/hunyuan_image3/test_generation_config.py -q
..                                                                       [100%]
2 passed, 18 warnings

Remote Hunyuan nightly startup evidence:

Before this PR:
1. failed on async_chunk=True with no next-stage input processor
2. after disabling async chunk, failed because trust_remote_code=True was missing
3. after adding trust_remote_code, failed because generation_config.json was missing in offline cache

After this PR:
1. server args include trust_remote_code=True
2. all three Hunyuan perf configs use the DiT-only deploy config with async_chunk=false
3. the missing generation_config.json failure is gone
4. TP4 proceeds to diffusion weight loading

The full Hunyuan nightly still cannot finish on the tested server because the model cache itself is incomplete there. /data/wzr has no HunyuanImage3 weight files, and the available cached snapshot only contains config/tokenizer files. After this PR, the remaining blocker is:

RuntimeError: Cannot find any model weights with `tencent/HunyuanImage-3.0-Instruct`

That final error is expected for this server state: the code now reaches weight loading, but the required model weights are not present in the allowed /data/wzr path.

@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@TaffyOfficial
Contributor Author

@Bounty-hunter

Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


The nightly script change to preserve all pytest commands per Buildkite step is correct. Using bitwise OR for overall_status ensures any failure propagates. The generation_config fallback with warning is a sensible offline cache workaround. The test coverage for both paths is sufficient.

@Bounty-hunter
Contributor

These changes are too complicated. Could we just specify the deploy config hunyuan_image3_dit.yaml and overwrite fields with the CLI?

@TaffyOfficial TaffyOfficial force-pushed the codex/nightly-pr3582-tests branch from 67d8407 to 48ed2d8 Compare May 26, 2026 02:29
@TaffyOfficial
Contributor Author

@Bounty-hunter Thanks, updated in that direction.

The three HunyuanImage3 perf JSONs now use hunyuan_image3_dit.yaml as the deploy base, and only keep the per-case CLI overrides for TP/SP/CFG/quantization/profiler/trust-remote-code.

I kept two pieces separate because they are outside the deploy-config simplification:

  1. run_nightly_jobs.sh still needs to execute every pytest command from one Buildkite step before returning the combined status.
  2. The generation-config fallback still handles the offline cache case where config/tokenizer files exist but generation_config.json is missing. I also narrowed the fallback path so local snapshots can use _name_or_path from hf_config instead of relying only on the model path string.
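To make item 2 concrete, here is a minimal sketch of the fallback pattern, under stated assumptions. It is not the code in pipeline_hunyuan_image3.py: the function name load_generation_config, the FALLBACK_GENERATION_CONFIG placeholder, and the example key are all hypothetical, and the _name_or_path handling is left out.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)

# Placeholder defaults for illustration only; these are NOT the bundled Hunyuan values.
FALLBACK_GENERATION_CONFIG = {"placeholder_key": "placeholder_value"}


def load_generation_config(snapshot_dir):
    """Return the snapshot's generation config, or fallback defaults if it cannot be read."""
    path = Path(snapshot_dir) / "generation_config.json"
    try:
        with path.open(encoding="utf-8") as f:
            return json.load(f)  # real snapshot values win when the file is present
    except (OSError, json.JSONDecodeError) as exc:
        logger.warning("Could not load %s (%s); using fallback defaults", path, exc)
        return dict(FALLBACK_GENERATION_CONFIG)


with tempfile.TemporaryDirectory() as snapshot:
    # Snapshot without generation_config.json: the fallback path is taken.
    missing = load_generation_config(snapshot)
    # Snapshot with the file present (made-up content): the real values are kept.
    Path(snapshot, "generation_config.json").write_text('{"example_key": 5}', encoding="utf-8")
    present = load_generation_config(snapshot)
```

The point of the pattern is that the fallback only applies when loading fails, so a complete snapshot is never overridden by defaults.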

@TaffyOfficial TaffyOfficial force-pushed the codex/nightly-pr3582-tests branch from 48ed2d8 to f51761a Compare May 27, 2026 07:26
Signed-off-by: TaffyOfficial <2324465096@qq.com>
@TaffyOfficial TaffyOfficial force-pushed the codex/nightly-pr3582-tests branch from f51761a to a8e4bb9 Compare May 27, 2026 07:39
@Bounty-hunter
Contributor

Paste the execution results and compare their execution times.

"server_type": "vllm-omni",
"server_params": {
"model": "tencent/HunyuanImage-3.0-Instruct",
"stage_overrides": {
Contributor


There is a bug (#3483) where stage_overrides cannot overwrite fields correctly; please check it.

Contributor Author


Thanks, pasted the comparable execution numbers below.

For the current HunyuanImage3 perf JSON baselines, all use 1024x1024, 50 steps, 10 prompts, max_concurrency=1:

| config | throughput_qps | latency_p99 | peak_memory_mb_max |
|---|---|---|---|
| tp4_fp8 | 0.0800 | 13.1227s | 46838 |
| tp2_fp8_sp2 | 0.0800 | 12.0731s | 66314 |
| tp2_fp8_cfgp2 | 0.1035 | 9.9057s | 66470 |

Compared with tp4_fp8:

  • tp2_fp8_sp2 has 8.0% lower p99 latency, same recorded throughput, and about 41.6% higher peak memory.
  • tp2_fp8_cfgp2 has 24.5% lower p99 latency, 29.4% higher throughput, and about 41.9% higher peak memory.
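The percentages above are plain relative changes against the tp4_fp8 row. A short sketch of that arithmetic, using only the numbers from the table (the dictionary layout and the helper name pct_change are illustrative choices):

```python
baselines = {
    "tp4_fp8": {"throughput_qps": 0.0800, "latency_p99_s": 13.1227, "peak_memory_mb": 46838},
    "tp2_fp8_sp2": {"throughput_qps": 0.0800, "latency_p99_s": 12.0731, "peak_memory_mb": 66314},
    "tp2_fp8_cfgp2": {"throughput_qps": 0.1035, "latency_p99_s": 9.9057, "peak_memory_mb": 66470},
}


def pct_change(new, ref):
    """Relative change of new versus ref, in percent (negative means lower)."""
    return (new - ref) / ref * 100.0


ref = baselines["tp4_fp8"]
for name in ("tp2_fp8_sp2", "tp2_fp8_cfgp2"):
    cur = baselines[name]
    # One percentage per metric, each relative to the tp4_fp8 baseline.
    changes = {metric: pct_change(cur[metric], ref[metric]) for metric in ref}
    print(name, {metric: f"{value:+.1f}%" for metric, value in changes.items()})
```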

I also checked the #3483 override issue. The broken path there is flat diffusion parallel overrides being left as top-level engine_args fields while diffusion reads engine_args.parallel_config. This PR is not using that flat path for the TP2 cases: it passes a full nested parallel_config through stage_overrides, and a local materialization check gives parallel_config.tensor_parallel_size=2 / parallel_config.cfg_parallel_size=2 with no flat tensor_parallel_size left in engine_args.
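To illustrate the difference between the two override shapes, here is a small sketch that assumes overrides are applied with a recursive dictionary merge. That merge, the deep_merge helper, and the base values are assumptions for illustration and are not the vllm-omni implementation or the contents of hunyuan_image3_dit.yaml.

```python
import copy


def deep_merge(base, overrides):
    """Recursively merge overrides into a copy of base; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base engine_args; the values are made up for this example.
base_engine_args = {"parallel_config": {"tensor_parallel_size": 4, "cfg_parallel_size": 1}}

# Nested override: lands inside parallel_config, which is where diffusion reads it.
nested = deep_merge(
    base_engine_args,
    {"parallel_config": {"tensor_parallel_size": 2, "cfg_parallel_size": 2}},
)

# Flat override: stays as a top-level key and leaves parallel_config untouched.
flat = deep_merge(base_engine_args, {"tensor_parallel_size": 2})

print("nested:", nested)
print("flat:", flat)
```

Under this assumption the flat form reproduces the symptom described for #3483 (a stray top-level tensor_parallel_size with parallel_config unchanged), while the nested form does not.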

If we want to use flat CLI flags instead, then this PR should wait for or rebase on #3483.

@Bounty-hunter
Contributor

LGTM

@TaffyOfficial
Contributor Author

@hsliuustc0106 @Gaohan123

@Bounty-hunter
Contributor

Running it directly results in an error. It seems like the config path needs to be changed to an absolute path.
image

Signed-off-by: TaffyOfficial <2324465096@qq.com>
@TaffyOfficial
Contributor Author

TaffyOfficial commented May 29, 2026

@Bounty-hunter @congw729 fix

@congw729
Collaborator

> @Bounty-hunter @congw729 fix

Thanks. I will try this tonight.

Signed-off-by: TaffyOfficial <2324465096@qq.com>
@Bounty-hunter
Contributor

This change is not correct; try with #3996.
