
[Feature][Wan2.2] Speed up diffusion model startup by multi-thread weight loading #1504

Merged

Isotr0py merged 8 commits into vllm-project:main from SamitHuang:dit_load_fast on Mar 2, 2026

Conversation

@SamitHuang
Collaborator


Purpose

Weight loading for large diffusion models is slow: ~3 min for QwenImage and ~5 min for Wan2.2-I2V 14B. This PR reduces the loading time by loading safetensors shards in parallel with a thread pool instead of sequentially.

Helpful for:

  • Reducing wait time in CI and on the benchmarking board
  • Improving startup UX

API Changes

CLI: adds --disable-multithread-weight-load to turn the feature off and --num-weight-load-threads to set the thread count.

The default is 4 threads, to limit disk I/O contention on HDD/network storage.
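
For intuition, here is a minimal sketch of the parallel shard-loading idea (not the PR's actual code; load_shards_parallel, shard_paths, and num_threads are illustrative names):

from concurrent.futures import ThreadPoolExecutor

import torch
from safetensors.torch import load_file


def load_shards_parallel(
    shard_paths: list[str], num_threads: int = 4
) -> dict[str, torch.Tensor]:
    """Load safetensors shards concurrently and merge them into one state dict."""
    state_dict: dict[str, torch.Tensor] = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Shard reads can overlap because most of the loading work happens
        # outside the GIL; a small pool keeps disk contention manageable.
        for shard in pool.map(load_file, shard_paths):
            state_dict.update(shard)
    return state_dict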

Test Plan

vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers --omni 
vllm serve Qwen/Qwen-Image --omni --num-weight-load-threads 4

Test Result

On H800:

Wan2.2 I2V 14B

  • Prev:
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:19<00:00,  6.48s/it]
[Stage-0] INFO 02-26 03:47:27 [platform.py:77] Defaulting to diffusion attention backend FLASH_ATTN
Multi-thread loading safetensors shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [02:05<00:00, 10.48s/it]
Multi-thread loading safetensors shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [02:05<00:00, 10.46s/it]
[Stage-0] INFO 02-26 03:51:45 [diffusers_loader.py:277] Loading weights took 255.55 seconds
[Stage-0] INFO 02-26 03:51:45 [diffusion_model_runner.py:103] Model loading took 64.4626 GiB and 283.140117 seconds
[Stage-0] INFO 02-26 03:51:45 [diffusion_model_runner.py:108] Model runner: Model loaded successfully.
  • Now:
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.53s/it]
[Stage-0] INFO 02-26 04:17:06 [platform.py:77] Defaulting to diffusion attention backend FLASH_ATTN
Multi-thread loading safetensors shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:18<00:00,  1.51s/it]
Multi-thread loading safetensors shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:19<00:00,  1.59s/it]
[Stage-0] INFO 02-26 04:17:53 [diffusers_loader.py:273] Loading weights took 44.93 seconds
[Stage-0] INFO 02-26 04:17:54 [diffusion_model_runner.py:103] Model loading took 64.4626 GiB and 55.996611 seconds
[Stage-0] INFO 02-26 04:17:54 [diffusion_model_runner.py:108] Model runner: Model loaded successfully.

Loading time: 283s -> 56s (~5x faster)

QwenImage

Prev:

Loading safetensors checkpoint shards:   0% Completed | 0/9 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  11% Completed | 1/9 [00:17<02:18, 17.28s/it]
Loading safetensors checkpoint shards:  22% Completed | 2/9 [00:34<02:00, 17.15s/it]
Loading safetensors checkpoint shards:  33% Completed | 3/9 [00:51<01:42, 17.07s/it]
Loading safetensors checkpoint shards:  44% Completed | 4/9 [01:08<01:24, 16.96s/it]
Loading safetensors checkpoint shards:  56% Completed | 5/9 [01:09<00:44, 11.18s/it]
Loading safetensors checkpoint shards:  67% Completed | 6/9 [01:25<00:39, 13.12s/it]
Loading safetensors checkpoint shards:  78% Completed | 7/9 [01:42<00:28, 14.31s/it]
Loading safetensors checkpoint shards:  89% Completed | 8/9 [01:59<00:15, 15.10s/it]
Loading safetensors checkpoint shards: 100% Completed | 9/9 [02:16<00:00, 15.70s/it]
Loading safetensors checkpoint shards: 100% Completed | 9/9 [02:16<00:00, 15.17s/it]

[Stage-0] INFO 02-26 03:16:32 [diffusers_loader.py:227] Loading weights took 136.56 seconds
[Stage-0] INFO 02-26 03:16:33 [diffusion_model_runner.py:103] Model loading took 53.7462 GiB and 167.369310 seconds
[Stage-0] INFO 02-26 03:16:33 [diffusion_model_runner.py:108] Model runner: Model loaded successfully.

Now:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:06<00:00,  1.61s/it]
[Stage-0] INFO 02-26 03:37:10 [platform.py:77] Defaulting to diffusion attention backend FLASH_ATTN
Multi-thread loading safetensors shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:13<00:00,  1.53s/it]
[Stage-0] INFO 02-26 03:37:27 [diffusers_loader.py:277] Loading weights took 15.34 seconds
[Stage-0] INFO 02-26 03:37:28 [diffusion_model_runner.py:103] Model loading took 53.7462 GiB and 27.148050 seconds
[Stage-0] INFO 02-26 03:37:28 [diffusion_model_runner.py:108] Model runner: Model loaded successfully.

Loading time: 168s -> 27s (~6x faster)




@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5ed55a2a5a


Comment thread vllm_omni/diffusion/model_loader/diffusers_loader.py Outdated
Comment on lines +198 to +202
"--disable-multithread-weight-load",
action="store_false",
dest="enable_multithread_weight_load",
default=True,
help="Disable multi-threaded safetensors loading (default: enabled with 4 threads).",

P2: Wire new weight-load CLI flags into diffusion engine args

These new serve flags are parsed, but in the async serving path they are not propagated into diffusion engine_args (the default diffusion stage builder in vllm_omni/entrypoints/async_omni.py only forwards a fixed subset and omits enable_multithread_weight_load / num_weight_load_threads). Because _build_od_config only copies fields present in engine_args, --disable-multithread-weight-load (and custom thread counts) are silently ignored for default diffusion serving.
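
For illustration only, a self-contained sketch of the gap described above (EngineArgs and build_diffusion_stage_engine_args are stand-ins, not the repo's real definitions):

from dataclasses import dataclass


@dataclass
class EngineArgs:
    model: str
    enable_multithread_weight_load: bool = True
    num_weight_load_threads: int = 4


def build_diffusion_stage_engine_args(args: EngineArgs) -> dict:
    # If the builder forwards only a fixed subset of fields, new options are
    # silently dropped before _build_od_config copies them, so the fix is to
    # forward the weight-load fields as well:
    return {
        "model": args.model,
        "enable_multithread_weight_load": args.enable_multithread_weight_load,
        "num_weight_load_threads": args.num_weight_load_threads,
    }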

Collaborator

@gcanlin left a comment


Really helpful feature. Could you please add it to the docs?

Comment on lines +192 to +196
def _get_weights_iterator(
self,
source: "ComponentSource",
od_config: OmniDiffusionConfig | None = None,
) -> Generator[tuple[str, torch.Tensor], None, None]:
Member


I think we can initialize DiffusersPipelineLoader with od_config, so that we don't need to access it from model:

def __init__(self, load_config: LoadConfig, od_config):
    self.load_config = load_config
    self.od_config = od_config

Collaborator Author


good suggestion. updated

@SamitHuang
Collaborator Author

> Really helpful feature. Could you please add it to the docs?

It's enabled by default, but it's fine to mention it in the diffusion acceleration docs. Thanks for the reminder.

Comment thread vllm_omni/diffusion/model_loader/diffusers_loader.py Outdated
@Isotr0py enabled auto-merge (squash) February 28, 2026 07:31
@Isotr0py added the ready label Feb 28, 2026
@Isotr0py merged commit 22a51a7 into vllm-project:main Mar 2, 2026
6 of 7 checks passed
yJader pushed a commit to omni-nicelab/vllm-omni-batching that referenced this pull request Mar 3, 2026
…ight loading (vllm-project#1504)

hsliuustc0106 added a commit to hsliuustc0106/vllm-omni-skills that referenced this pull request Mar 4, 2026
SamitHuang added a commit to SamitHuang/vllm-omni that referenced this pull request Apr 2, 2026
Document the multi-thread weight loading startup optimization
introduced in PR vllm-project#1504, including configuration, CLI flags,
usage examples, and benchmark results.


Labels

ready (label to trigger buildkite CI)
