[Feature][Wan2.2] Speed up diffusion model startup by multi-thread weight loading#1504
Conversation
Signed-off-by: samithuang <285365963@qq.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5ed55a2a5a
| "--disable-multithread-weight-load", | ||
| action="store_false", | ||
| dest="enable_multithread_weight_load", | ||
| default=True, | ||
| help="Disable multi-threaded safetensors loading (default: enabled with 4 threads).", |
Wire new weight-load CLI flags into diffusion engine args
These new serve flags are parsed, but in the async serving path they are not propagated into diffusion engine_args (the default diffusion stage builder in vllm_omni/entrypoints/async_omni.py only forwards a fixed subset and omits enable_multithread_weight_load / num_weight_load_threads). Because _build_od_config only copies fields present in engine_args, --disable-multithread-weight-load (and custom thread counts) are silently ignored for default diffusion serving.
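The gap Codex describes can be modeled with a minimal, hypothetical sketch (the dict-based `build_od_config` and the field names below are illustrative stand-ins for the real `_build_od_config`): because only fields already present in `engine_args` are copied, flags that the stage builder never forwards silently keep their defaults.

```python
# Hypothetical model of the reported gap: the config builder only copies
# fields present in engine_args, so CLI flags that are never forwarded
# into engine_args fall back to their defaults without any warning.
OD_DEFAULTS = {"enable_multithread_weight_load": True, "num_weight_load_threads": 4}

def build_od_config(engine_args: dict) -> dict:
    cfg = dict(OD_DEFAULTS)
    # Only keys that both sides know about are copied over.
    cfg.update({k: v for k, v in engine_args.items() if k in cfg})
    return cfg

# The default diffusion stage builder forwards only a fixed subset of args,
# omitting the weight-load flags entirely:
forwarded = {"model": "Wan2.2-I2V"}  # illustrative subset
cfg = build_od_config(forwarded)
# Even if --disable-multithread-weight-load was passed on the CLI,
# cfg["enable_multithread_weight_load"] is still True here.
```

The fix suggested by the review is simply to include the two new fields in the forwarded subset so `build_od_config` can see them.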
gcanlin
left a comment
Really helpful feature. Could you please add it in docs?
    def _get_weights_iterator(
        self,
        source: "ComponentSource",
        od_config: OmniDiffusionConfig | None = None,
    ) -> Generator[tuple[str, torch.Tensor], None, None]:
I think we can initialize DiffusersPipelineLoader with od_config, so that we don't need to access it from model:
def __init__(self, load_config: LoadConfig, od_config):
    self.load_config = load_config
    self.od_config = od_config
good suggestion. updated
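The updated shape of that suggestion can be sketched as below. This is a self-contained illustration, not the real classes: `OmniDiffusionConfig` here is a hypothetical stand-in dataclass, and `_num_threads` is an invented helper showing how the loader can now read the config it owns instead of reaching into the model.

```python
from dataclasses import dataclass

@dataclass
class OmniDiffusionConfig:
    # Hypothetical stand-in for the real config object.
    enable_multithread_weight_load: bool = True
    num_weight_load_threads: int = 4

class DiffusersPipelineLoader:
    """Sketch of the suggested constructor: od_config is injected up front."""

    def __init__(self, load_config, od_config: OmniDiffusionConfig):
        self.load_config = load_config
        self.od_config = od_config

    def _num_threads(self) -> int:
        # The thread count now comes from the loader's own config,
        # so _get_weights_iterator no longer needs to access the model.
        if self.od_config.enable_multithread_weight_load:
            return self.od_config.num_weight_load_threads
        return 1
```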
Signed-off-by: Samit <285365963@qq.com>
It's enabled by default, but it's fine to mention it in the diffusion acceleration docs. Thanks for the reminder.
### Release note entries (deduplicated by PR)

- [PR #1619](vllm-project/vllm-omni#1619) - [Bugfix] Fix Qwen3-TTS code predictor crash due to missing vLLM config context
- [PR #1615](vllm-project/vllm-omni#1615) - [Doc] Fix links in the configuration doc
- [PR #1609](vllm-project/vllm-omni#1609) - [Bugfix] Fix filepath resolution for model with subdir and GLM-Image generation (adds GLM-Image)
- [PR #1602](vllm-project/vllm-omni#1602) - [Bugfix] fix kernel error for qwen3-omni
- [PR #1598](vllm-project/vllm-omni#1598) - [BugFix] Fix load_weights error when loading HunyuanImage3.0 (adds HunyuanImage-3 / HunyuanImage3Pipeline)
- [PR #1576](vllm-project/vllm-omni#1576) - 0.16.0 release
- [PR #1570](vllm-project/vllm-omni#1570) - [bugfix] Fix unexpected argument 'is_finished' in function llm2code2wav_async_chunk of mimo-audio
- [PR #1566](vllm-project/vllm-omni#1566) - [Bugfix] Import InputPreprocessor into Renderer
- [PR #1565](vllm-project/vllm-omni#1565) - [BugFix]: fix a lot of bug
- [PR #1564](vllm-project/vllm-omni#1564) - [NPU][Bugfix] Align GPU side and recover qwen3-tts
- [PR #1562](vllm-project/vllm-omni#1562) - [BugFix] Fix unexpected crash when init OmniDiffusion
- [PR #1554](vllm-project/vllm-omni#1554) - fix(qwen3-tts): fix Base ICL voice clone producing corrupted audio
- [PR #1543](vllm-project/vllm-omni#1543) - [CI] Modify some CI test cases to run on L4 environment to reduce H100 resource usage
- [PR #1540](vllm-project/vllm-omni#1540) - Fix no embed text spk tokens
- [PR #1539](vllm-project/vllm-omni#1539) - [Debug] Enable curl retry aligned with openai
- [PR #1538](vllm-project/vllm-omni#1538) - [CI][skip ci] Update H100 image link based on #1518
- [PR #1536](vllm-project/vllm-omni#1536) - [Bugfix] Fix transformers 5.x compat issues in online TTS serving
- [PR #1534](vllm-project/vllm-omni#1534) - [Debug] Merge vllm pull 35368
- [PR #1530](vllm-project/vllm-omni#1530) - [Docs] update async chunk docs diagram [skip ci]
- [PR #1524](vllm-project/vllm-omni#1524) - [BugFix] Restore talker's config
- [PR #1522](vllm-project/vllm-omni#1522) - [Bugfix] Use uds for zmq address if not set --stage-id
- [PR #1521](vllm-project/vllm-omni#1521) - Revert gpu_1 job to use regular image
- [PR #1518](vllm-project/vllm-omni#1518) - Use pull through cache image for H100 pool
- [PR #1515](vllm-project/vllm-omni#1515) - [Bugfix] fix offline text_to_image error from #1009 (adds num-images-per-prompt)
- [PR #1509](vllm-project/vllm-omni#1509) - [Chore] remove unused logger in omni_diffusion (#531)
- [PR #1505](vllm-project/vllm-omni#1505) - [Doc] Update installation instructions for vllm 0.16.0
- [PR #1504](vllm-project/vllm-omni#1504) - [Feature][Wan2.2] Speed up diffusion model startup by multi-thread weight loading
- [PR #1500](vllm-project/vllm-omni#1500) - [ROCm] [CI] [Docker] Point to use the latest vLLM v0.16.0 stable version
- [PR #1492](vllm-project/vllm-omni#1492) - [Platform] Enable layerwise offload on all hardware
- [PR #1491](vllm-project/vllm-omni#1491) - [CI] Update Dockerfile for vllm-omni CI image and remove obsolete dep…
- [PR #1488](vllm-project/vllm-omni#1488) - [XPU][NPU][ROCM] enable cpu_offloading flag for non_cuda
- [PR #1482](vllm-project/vllm-omni#1482) - [Fix][Chore] Qwen3-TTS Modeling Minor Code Sanity Improvements
- [PR #1468](vllm-project/vllm-omni#1468) - [BugFix] process request.num_cached_tokens if it equals to the initial value
- [PR #1455](vllm-project/vllm-omni#1455) - [Bugfix] Fix case-sensitive task_type matching in Qwen3TTSModelForGeneration
- [PR #1449](vllm-project/vllm-omni#1449) - [Test] Reduce Perf test case and fix modify stage config
- [PR #1448](vllm-project/vllm-omni#1448) - [Bugfix] Race condition in MultiprocExecutor when concurent access to Scheduler
- [PR #1438](vllm-project/vllm-omni#1438) - [Qwen3TTS][Feat] Streaming output
- [PR #1435](vllm-project/vllm-omni#1435) - [Doc][Test][Misc] ComfyUI test, more screenshot, and code cleaning
- [PR #1433](vllm-project/vllm-omni#1433) - [Debug] Multi-Request for Qwen 3 Omni use_audio_in_video
Document the multi-thread weight loading startup optimization introduced in PR vllm-project#1504, including configuration, CLI flags, usage examples, and benchmark results.
Purpose
Weight loading for large diffusion models is slow: ~3 min for QwenImage and ~5 min for Wan2.2-I2V 14B. This PR reduces weight loading time by loading safetensors shards in parallel with a thread pool instead of sequentially.
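The parallel-loading idea can be sketched in a few lines. This is a minimal illustration, not the PR's actual loader: `load_shard` stands in for the real per-shard reader (e.g. `safetensors.torch.load_file`), and the function name is invented.

```python
from concurrent.futures import ThreadPoolExecutor

def load_shards_parallel(shard_files, load_shard, num_threads=4):
    """Load weight shards concurrently and merge them into one state dict.

    load_shard: callable mapping a shard path to a {name: tensor} dict
    (the real loader would use safetensors here).
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map preserves input order, so merge order is deterministic
        # even though the shards are read concurrently.
        for shard in pool.map(load_shard, shard_files):
            merged.update(shard)
    return merged
```

Threads (rather than processes) work well here because safetensors reads are I/O-bound and release the GIL during file access.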
Helpful in:
API Changes
CLI: adds `--disable-multithread-weight-load` to turn the feature off and `--num-weight-load-threads` to set the thread count. The default is 4 threads, chosen to reduce disk I/O contention on HDD/network storage.
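The flag pattern (a `--disable-*` switch that stores `False` into a positively named destination) can be sketched with plain `argparse`; the parser below is a standalone illustration of the flags this PR adds, not the project's actual argument wiring.

```python
import argparse

parser = argparse.ArgumentParser()
# The disable flag writes False into the positively named dest, so the
# rest of the code can keep checking args.enable_multithread_weight_load.
parser.add_argument(
    "--disable-multithread-weight-load",
    action="store_false",
    dest="enable_multithread_weight_load",
    default=True,
    help="Disable multi-threaded safetensors loading (default: enabled).",
)
parser.add_argument(
    "--num-weight-load-threads",
    type=int,
    default=4,
    help="Thread count for parallel shard loading (default: 4).",
)

on = parser.parse_args([])                                      # defaults
off = parser.parse_args(["--disable-multithread-weight-load"])  # opt out
```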
Test Plan
Test Result
on H800:
Wan2.2 I2V 14B
Loading time cost: 283s -> 56s
QwenImage
Loading time cost: 168s -> 27s
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation edits to `./docs`.