
Fix OmniGen2 transformer config loading for HF models #1934

Merged

gcanlin merged 4 commits into vllm-project:main from Joshna-Medisetty:omnigen2-config-fix on Mar 19, 2026

Conversation

@Joshna-Medisetty
Contributor

Purpose

When OmniGen2 is loaded using a HuggingFace repo ID (for example OmniGen2/OmniGen2), the previous implementation constructed the transformer config path as:

transformer_config_path = os.path.join(model, "transformer", "config.json")

This assumes that model is a local directory. When a HuggingFace repo ID is provided, however, the model files are stored inside the HuggingFace cache snapshot directory, not under a directory named after the repo ID.
As a result, the pipeline tried to locate the config file at OmniGen2/OmniGen2/transformer/config.json, which does not exist locally, and OmniGen2 initialization failed.
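
For illustration, a minimal sketch of the failure mode (the repo ID and paths are just the example from above; nothing project-specific is assumed):

import os

model = "OmniGen2/OmniGen2"  # a HuggingFace repo ID, not a local directory

# Old behaviour: treat the repo ID as a filesystem path
transformer_config_path = os.path.join(model, "transformer", "config.json")
print(transformer_config_path)                   # OmniGen2/OmniGen2/transformer/config.json
print(os.path.exists(transformer_config_path))   # False, unless such a local directory happens to exist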

Error Logs

Below is the error observed during model loading:

RuntimeError: [DEBUG] model=OmniGen2/OmniGen2,
path=OmniGen2/OmniGen2/transformer/config.json,
exists=False

[Stage-0] ERROR [multiproc_executor.py:118]
Rank 0 scheduler is dead. Please check if there are relevant logs

[Stage-0] ERROR [multiproc_executor.py:120]
Exit code: 1

The logs confirmed that the pipeline attempted to locate transformer/config.json using the HuggingFace repo ID directly instead of resolving the HuggingFace cache snapshot path.


Root Cause

For HuggingFace models, files are downloaded into the HF cache under a snapshot directory such as:

~/.cache/huggingface/hub/models--OmniGen2--OmniGen2/snapshots/<hash>/

The previous implementation did not resolve this snapshot path before constructing the transformer config path.
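
For reference, the snapshot path can be resolved with huggingface_hub directly; this is only a sketch of the underlying mechanism, not the code path this PR takes (the PR goes through get_hf_file_to_dict, see below):

import json
from huggingface_hub import hf_hub_download

# Downloads the file if needed and returns the resolved cache path, e.g.
# ~/.cache/huggingface/hub/models--OmniGen2--OmniGen2/snapshots/<hash>/transformer/config.json
config_path = hf_hub_download(repo_id="OmniGen2/OmniGen2", filename="transformer/config.json")
with open(config_path) as f:
    transformer_config = json.load(f)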


Fix

This PR resolves the transformer config path using the HuggingFace utility:

get_hf_file_to_dict

This ensures that the config file can be correctly located regardless of whether the model is provided as:

  • a local model directory
  • a HuggingFace repo ID

The transformer config is now retrieved through HF utilities that resolve the cached snapshot path automatically.
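
A rough sketch of the resulting call, following the argument order quoted in the review comment below (the exact import path and revision handling are assumptions, not taken from the diff):

from vllm.transformers_utils.config import get_hf_file_to_dict  # assumed import path

# Works for both a local directory and a HF repo ID: the helper resolves the
# cached snapshot path (or reads the local file) before parsing the JSON.
transformer_config = get_hf_file_to_dict("transformer/config.json", model)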


Test Plan

Tested OmniGen2 loading using a HuggingFace repo ID.

Example command used for validation:

python examples/offline_inference/image_to_image/image_edit.py \
  --image qwen-bear.png \
  --model "OmniGen2/OmniGen2" \
  --prompt "Change the background to classroom."

Test Result

Before fix:

RuntimeError: [DEBUG] model=OmniGen2/OmniGen2
path=OmniGen2/OmniGen2/transformer/config.json
exists=False

Rank 0 scheduler is dead

After fix:

Loading checkpoint shards: 100% | 4/4
OmniDiffusion generation completed successfully
Saved edited image to outputs/hf_image_edit.png

Model initialization and inference now complete successfully when loading OmniGen2 using a HuggingFace repo ID.

@Joshna-Medisetty
Contributor Author

cc @xuechendi

@xuechendi
Contributor

@Gaohan123 @gcanlin, could you help take a look?
This is a general bug fix for OmniGen2. The original code does not work when using a direct HF path (--model OmniGen2/OmniGen2).
This PR fixes that issue.

Signed-off-by: Joshna Medisetty <joshna.medisetty@intel.com>
Comment thread on vllm_omni/diffusion/models/omnigen2/pipeline_omnigen2.py (Outdated)
Collaborator

@hsliuustc0106 left a comment


PR Review: Fix OmniGen2 transformer config loading for HF models

Gate Status: PASSING ✓

All CI checks pass.


Analysis

What this PR fixes:

  • Correctly handles HuggingFace repo IDs by using get_hf_file_to_dict instead of os.path.join
  • Before: assumed model was a local directory
  • After: properly resolves HF cache snapshot path

Agree with @gcanlin's feedback:

If od_config.tf_model_config already contains the loaded transformer config at the entrypoint level, the fix should reuse it rather than reload:

# Instead of:
transformer_config = get_hf_file_to_dict("transformer/config.json", model, ...)

# Consider:
transformer_config = od_config.tf_model_config

This would be more efficient and avoid redundant file reads.


Summary

  • Validated: bug correctly identified; fix works for HF repo IDs; test results show the fix works
  • Needs Revision: reuse od_config.tf_model_config instead of reloading

🤖 Generated with Claude Code

Signed-off-by: Joshna Medisetty <joshna.medisetty@intel.com>
Signed-off-by: Joshna Medisetty <joshna.medisetty@intel.com>
@Joshna-Medisetty
Contributor Author

@gcanlin @hsliuustc0106
Addressed feedback: transformer kwargs come from od_config.tf_model_config.params (entrypoint-loaded config), no redundant get_hf_file_to_dict in the pipeline. Verified with HF repo id and local snapshot - works.
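
A minimal sketch of the direction described above (the attribute names are taken from the discussion; the surrounding pipeline wiring is assumed, not quoted from the diff):

def build_transformer_kwargs(od_config):
    # Reuse the transformer config that the entrypoint already loaded,
    # instead of re-reading transformer/config.json inside the pipeline.
    return dict(od_config.tf_model_config.params)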

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin
Collaborator

gcanlin commented Mar 19, 2026

@gcanlin @hsliuustc0106 Addressed feedback: transformer kwargs come from od_config.tf_model_config.params (entrypoint-loaded config), no redundant get_hf_file_to_dict in the pipeline. Verified with HF repo id and local snapshot - works.

Thanks. LGTM overall. I just noticed that the way the config is loaded for OmniGen2 is not consistent with other models; it's better to use get_transformer_config_kwargs. I have pushed a commit to change it. Could you please test it again?

@gcanlin added the ready label (label to trigger buildkite CI) on Mar 19, 2026
@Gaohan123 added this to the v0.18.0 milestone on Mar 19, 2026
@Joshna-Medisetty
Contributor Author

@gcanlin Tested again after your change - everything looks good now. Works well. Thanks!

@gcanlin merged commit ca6c7ad into vllm-project:main on Mar 19, 2026
6 of 7 checks passed
fhfuih pushed a commit to fhfuih/vllm-omni that referenced this pull request Mar 19, 2026

Signed-off-by: Joshna Medisetty <joshna.medisetty@intel.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Hu1Lcode pushed a commit to Hu1Lcode/vllm-omni that referenced this pull request Mar 19, 2026

Signed-off-by: Joshna Medisetty <joshna.medisetty@intel.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: Hui <1779066624@qq.com>
yiliu30 pushed a commit to yiliu30/vllm-omni-fork that referenced this pull request Mar 20, 2026

Signed-off-by: Joshna Medisetty <joshna.medisetty@intel.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>

Signed-off-by: yiliu30 <yi4.liu@intel.com>
hsliuustc0106 added a commit to hsliuustc0106/vllm-omni-skills that referenced this pull request Mar 22, 2026
### vllm-omni-audio-tts
- Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool
- Changes:
  - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool

### vllm-omni-perf
- Source: [PR #2059](vllm-project/vllm-omni#2059) - [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool
- Changes:
  - Bug fix: [BugFix][Qwen3TTS] CodePredictor CudaGraph Pool

### vllm-omni-api
- Source: [PR #2058](vllm-project/vllm-omni#2058) - [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection
- Changes:
  - Bug fix: [Bugfix] Fix Fish Speech and CosyVoice3 online serving - missing is_comprehension and broken model detection

### vllm-omni-contrib
- Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example

### vllm-omni-cicd
- Source: [PR #2045](vllm-project/vllm-omni#2045) - [Voxtral] Improve example

### vllm-omni-api
- Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model
- Changes:
  - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model

### vllm-omni-perf
- Source: [PR #2042](vllm-project/vllm-omni#2042) - [bugfix] /chat/completion doesn't read extra_body for diffusion model
- Changes:
  - Bug fix: [bugfix] /chat/completion doesn't read extra_body for diffusion model

### vllm-omni-contrib
- Source: [PR #2038](vllm-project/vllm-omni#2038) - [Doc] Update docs and dockerfiles for rebase of vllm v0.18.0

### vllm-omni-serving
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-contrib
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-api
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-cicd
- Source: [PR #2037](vllm-project/vllm-omni#2037) - [Rebase] Rebase to vllm v0.18.0

### vllm-omni-cicd
- Source: [PR #2032](vllm-project/vllm-omni#2032) - [CI] Change Bagel online test environment variable `VLLM_TEST_CLEAN_GPU_MEMORY` to `0`

### vllm-omni-cicd
- Source: [PR #2031](vllm-project/vllm-omni#2031) - [CI] Fix test.
- Changes:
  - Bug fix: [CI] Fix test.

### vllm-omni-cicd
- Source: [PR #2017](vllm-project/vllm-omni#2017) - [CI] [ROCm] Setup `test-ready.yml` and `test-merge.yml`

### vllm-omni-cicd
- Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests

### vllm-omni-perf
- Source: [PR #2014](vllm-project/vllm-omni#2014) - [Test] Implement mock HTTP request handling in benchmark CLI tests

### vllm-omni-serving
- Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips
- Changes:
  - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips

### vllm-omni-image-gen
- Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips
- Changes:
  - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips

### vllm-omni-perf
- Source: [PR #2012](vllm-project/vllm-omni#2012) - [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips
- Changes:
  - Bug fix: [Fixbug][Perf] Qwen3-omni: code predictor with re-prefill + SDPA and eliminate decode hot-path CPU round-trips

### vllm-omni-serving
- Source: [PR #2009](vllm-project/vllm-omni#2009) - [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni
- Changes:
  - Bug fix: [Bugfix] revert PR#1758 which introduced the accuracy problem of qwen3-omni

### vllm-omni-image-gen
- Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images
- Changes:
  - Bug fix: [Bugfix]Fix bug of online server can not return mutli images
- Additions:
  - Qwen-Image-Layered

### vllm-omni-api
- Source: [PR #2007](vllm-project/vllm-omni#2007) - [Bugfix]Fix bug of online server can not return mutli images
- Changes:
  - Bug fix: [Bugfix]Fix bug of online server can not return mutli images

### vllm-omni-cicd
- Source: [PR #1998](vllm-project/vllm-omni#1998) - [CI] Split BAGEL tests into dummy/real weight tiers (L2/L3)

### vllm-omni-serving
- Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls
- Changes:
  - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls

### vllm-omni-audio-tts
- Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls
- Changes:
  - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls

### vllm-omni-perf
- Source: [PR #1985](vllm-project/vllm-omni#1985) - [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls
- Changes:
  - Performance improvement: [Perf] [Qwen3-TTS] Keep audio_codes and last_talker_hidden on GPU to eliminate per-step sync stalls

### vllm-omni-serving
- Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue
- Changes:
  - Bug fix: [CI] [ROCm] Bugfix device environment issue

### vllm-omni-api
- Source: [PR #1984](vllm-project/vllm-omni#1984) - [CI] [ROCm] Bugfix device environment issue
- Changes:
  - Bug fix: [CI] [ROCm] Bugfix device environment issue

### vllm-omni-serving
- Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__
- Changes:
  - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__

### vllm-omni-cicd
- Source: [PR #1982](vllm-project/vllm-omni#1982) - [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__
- Changes:
  - Bug fix: [Fix] Fix slow hasattr in CUDAGraphWrapper.__getattr__

### vllm-omni-api
- Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)
- Changes:
  - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)
- Additions:
  - `/v1/chat/completions`

### vllm-omni-perf
- Source: [PR #1979](vllm-project/vllm-omni#1979) - [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)
- Changes:
  - Bug fix: [Bugfix] Fix config misalignment between offline and online diffusion inference (Wan2.2, Qwen-Image series)

### vllm-omni-contrib
- Source: [PR #1976](vllm-project/vllm-omni#1976) - [skip ci][Docs] Update WeChat QR code (fix filename case)
- Changes:
  - Bug fix: [skip ci][Docs] Update WeChat QR code (fix filename case)

### vllm-omni-contrib
- Source: [PR #1974](vllm-project/vllm-omni#1974) - [Docs] Update WeChat QR code for community support

### vllm-omni-cicd
- Source: [PR #1945](vllm-project/vllm-omni#1945) - Fix Base voice clone streaming quality and stop-token crash
- Changes:
  - Bug fix: Fix Base voice clone streaming quality and stop-token crash

### vllm-omni-cicd
- Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models
- Changes:
  - New feature: [Test] L4 complete diffusion feature test for Bagel models

### vllm-omni-perf
- Source: [PR #1938](vllm-project/vllm-omni#1938) - [Test] L4 complete diffusion feature test for Bagel models
- Changes:
  - New feature: [Test] L4 complete diffusion feature test for Bagel models

### vllm-omni-perf
- Source: [PR #1934](vllm-project/vllm-omni#1934) - Fix OmniGen2 transformer config loading for HF models
- Changes:
  - Bug fix: Fix OmniGen2 transformer config loading for HF models

### vllm-omni-audio-tts
- Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request

### vllm-omni-perf
- Source: [PR #1930](vllm-project/vllm-omni#1930) - [Bug][Qwen3TTS][Streaming] remove dynamic initial chunk and only compute on initial request

### vllm-omni-audio-tts
- Source: [PR #1926](vllm-project/vllm-omni#1926) - [Misc] removed qwen3_tts.py as it is out-dated

### vllm-omni-contrib
- Source: [PR #1920](vllm-project/vllm-omni#1920) - [Docs] Add Wan2.1-T2V as supported video generation models
- Changes:
  - New feature: [Docs] Add Wan2.1-T2V as supported video generation models

### vllm-omni-video-gen
- Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device
- Changes:
  - Bug fix: [Bugfix] fix helios video generate use cpu device

### vllm-omni-perf
- Source: [PR #1915](vllm-project/vllm-omni#1915) - [Bugfix] fix helios video generate use cpu device
- Changes:
  - Bug fix: [Bugfix] fix helios video generate use cpu device

### vllm-omni-audio-tts
- Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False

### vllm-omni-perf
- Source: [PR #1913](vllm-project/vllm-omni#1913) - [Optim][Qwen3TTS][CodePredictor] support torch.compile with reduce-overhead and dynamic False

### vllm-omni-api
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-perf
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-contrib
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-serving
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-cicd
- Source: [PR #1908](vllm-project/vllm-omni#1908) - [Entrypoint][Refactor] vLLM-Omni Entrypoint Refactoring

### vllm-omni-image-gen
- Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family
- Changes:
  - New feature: [Feat] support HSDP for Flux family

### vllm-omni-contrib
- Source: [PR #1900](vllm-project/vllm-omni#1900) - [Feat] support HSDP for Flux family
- Changes:
  - New feature: [Feat] support HSDP for Flux family

### vllm-omni-distributed
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-quantization
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-cicd
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-perf
- Source: [PR #1898](vllm-project/vllm-omni#1898) - [Feature]: Remove some useless `hf_overrides` in yaml
- Changes:
  - New feature: [Feature]: Remove some useless `hf_overrides` in yaml

### vllm-omni-contrib
- Source: [PR #1890](vllm-project/vllm-omni#1890) - [NPU] Upgrade to v0.17.0

### vllm-omni-contrib
- Source: [PR #1889](vllm-project/vllm-omni#1889) - Add `Governance` section
- Changes:
  - New feature: Add `Governance` section

### vllm-omni-distributed
- Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism
- Changes:
  - New feature: [Feat] Support T5 Tensor Parallelism

### vllm-omni-cicd
- Source: [PR #1881](vllm-project/vllm-omni#1881) - [Feat] Support T5 Tensor Parallelism
- Changes:
  - New feature: [Feat] Support T5 Tensor Parallelism
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026

Signed-off-by: Joshna Medisetty <joshna.medisetty@intel.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>

Labels

ready label to trigger buildkite CI

5 participants