[Docs] Add multi-thread weight loading documentation#2445

Merged
SamitHuang merged 1 commit into vllm-project:main from SamitHuang:docs/multithread-weight-loading
Apr 9, 2026

Collaborator

@SamitHuang SamitHuang commented Apr 2, 2026

Purpose

Add documentation for the multi-thread weight loading startup optimization introduced in PR #1504. This feature loads safetensors shards in parallel using a thread pool, reducing model startup time by 5-6x for large diffusion models.
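The idea of loading shards in parallel with a thread pool can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from PR #1504: `load_shards_parallel`, `fake_load`, and the shard file names are hypothetical, and `fake_load` stands in for a real safetensors reader so the sketch runs with the standard library alone.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable


def load_shards_parallel(
    shard_paths: Iterable[Path],
    load_one: Callable[[Path], Dict[str, object]],
    num_threads: int = 4,
) -> Dict[str, object]:
    """Load every shard on a thread pool and merge the results into one dict.

    Hypothetical helper for illustration only; it is not vLLM-Omni's loader.
    """
    paths = list(shard_paths)
    merged: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in input order, so the merge is deterministic
        # even though the shards are read concurrently.
        for shard_weights in pool.map(load_one, paths):
            merged.update(shard_weights)
    return merged


def fake_load(path: Path) -> Dict[str, object]:
    # Stand-in for a real reader such as safetensors.torch.load_file(path).
    # It returns one fake "tensor" per shard, keyed by the shard's stem.
    return {f"{path.stem}.weight": path.name}
```

Calling `load_shards_parallel(paths, fake_load)` with a list of shard paths returns a single merged mapping. Threads suit this workload because reading shards is mostly I/O-bound; the speedup actually achieved depends on storage and shard count.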

Updates docs/user_guide/diffusion_features.md to include:

  • A "Startup Optimization" entry in the feature overview table under Lossless Acceleration
  • A dedicated "Multi-Thread Weight Loading" section with:
    • Feature description and default behavior (enabled by default, 4 threads)
    • Configuration table with CLI flags (--disable-multithread-weight-load, --num-weight-load-threads) and Python parameters
    • Online serving and offline inference usage examples
    • Benchmark results (Qwen-Image: 168s → 27s, Wan2.2 I2V 14B: 283s → 56s on H800)
  • A link in the "Learn More" section
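To make the documented defaults concrete, the two CLI flags listed above could be modelled with a small standalone parser. This is only a sketch: `build_parser` and `effective_threads` are hypothetical names, this is not vLLM-Omni's actual argument parser, and treating the disabled case as a single loader thread is an assumption rather than documented behavior.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Standalone parser mirroring the two flags named in this PR (illustrative)."""
    parser = argparse.ArgumentParser(description="Illustrative flag parsing only")
    parser.add_argument(
        "--disable-multithread-weight-load",
        action="store_true",
        help="Turn off parallel shard loading (it is enabled by default).",
    )
    parser.add_argument(
        "--num-weight-load-threads",
        type=int,
        default=4,  # default thread count stated in the PR description
        help="Number of threads used to load weight shards.",
    )
    return parser


def effective_threads(args: argparse.Namespace) -> int:
    # Assumption: disabling the feature is equivalent to sequential loading,
    # modelled here as one thread.
    if args.disable_multithread_weight_load:
        return 1
    return args.num_weight_load_threads
```

With no flags, `effective_threads(build_parser().parse_args([]))` gives the default of 4 threads; passing `--num-weight-load-threads 8` raises it to 8.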

Test Plan

Documentation-only change. Verified markdown rendering and internal anchor links are correct.

Test Result

N/A (docs only).


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts and test commands. If your code doesn't require additional test scripts, please state why. For test file guidelines, please check the test style doc.
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

Document the multi-thread weight loading startup optimization
introduced in PR vllm-project#1504, including configuration, CLI flags,
usage examples, and benchmark results.

Made-with: Cursor

Signed-off-by: samithuang <285365963@qq.com>
@SamitHuang SamitHuang requested a review from Gaohan123 April 2, 2026 08:34
Collaborator

@gcanlin gcanlin left a comment

LGTM

@SamitHuang SamitHuang added the ready label to trigger buildkite CI label Apr 3, 2026
@SamitHuang SamitHuang merged commit a7bf405 into vllm-project:main Apr 9, 2026
6 checks passed
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
Sy0307 pushed a commit to Sy0307/vllm-omni that referenced this pull request Apr 10, 2026
daixinning pushed a commit to daixinning/vllm-omni that referenced this pull request Apr 13, 2026
lengrongfu pushed a commit to lengrongfu/vllm-omni that referenced this pull request May 1, 2026
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
Labels

ready label to trigger buildkite CI

2 participants