[Feature] Add InstantTensor weight loader #36139

Merged: mgoin merged 2 commits into vllm-project:main from arlo-scitix:instanttensor on Mar 14, 2026

arlo-scitix (Contributor) commented Mar 5, 2026

PR for this RFC.

Purpose

Speed up model loading and fully utilize the bandwidth of high-speed storage (e.g., 400 Gbps networked storage).

Test Plan

Load any model with any parallelism setting, for example on H20 (141 GB) GPUs:

vllm serve Qwen/Qwen3-30B-A3B --load-format instanttensor
vllm serve deepseek-ai/DeepSeek-R1 --load-format instanttensor --tensor-parallel-size 8 --enable-expert-parallel

Test Result

See the README of our repo.



mergify bot commented Mar 5, 2026

Documentation preview: https://vllm--36139.org.readthedocs.build/en/36139/

mergify bot added the documentation and ci/build labels on Mar 5, 2026

github-actions bot commented Mar 5, 2026

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces support for InstantTensor to accelerate model weight loading. The changes span dependency management, configuration, model loading logic, and documentation. My review focuses on ensuring the CUDA-specific nature of this new feature is made clear to users. I've provided suggestions to add an explicit check for CUDA availability in the code and to update the associated docstrings and documentation to mention this requirement. These changes will help prevent runtime errors and improve the user experience for those on non-CUDA platforms.
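
A guard of the kind this review describes could look roughly like the following. This is a minimal sketch and not the code in this PR: the function name `check_instanttensor_supported` and the error message are hypothetical, and the CUDA flag is passed in as a plain boolean so the example runs without a GPU or PyTorch installed.

```python
def check_instanttensor_supported(load_format: str, cuda_available: bool) -> None:
    """Fail early if the InstantTensor load format is requested without CUDA.

    Hypothetical helper for illustration only; the name and the message
    are assumptions, not taken from the vLLM code base.
    """
    if load_format == "instanttensor" and not cuda_available:
        raise RuntimeError(
            "The 'instanttensor' load format requires CUDA, "
            "but no CUDA device is available."
        )
    # Any other load format, or a CUDA-capable platform, passes through.


if __name__ == "__main__":
    # A CUDA-capable platform: the check is silent.
    check_instanttensor_supported("instanttensor", cuda_available=True)

    # A non-CUDA platform: the error surfaces before any weights are read.
    try:
        check_instanttensor_supported("instanttensor", cuda_available=False)
    except RuntimeError as err:
        print(f"rejected: {err}")
```

In a real integration the boolean would presumably come from something like `torch.cuda.is_available()`. Raising at configuration time rather than partway through loading is the design choice being illustrated.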

robertgshaw2-redhat (Collaborator) commented

Is it possible to make this the default? What considerations should we have for its usage?

arlo-scitix (Contributor, Author) commented

@robertgshaw2-redhat, thanks. We’ve responded in the RFC.

arlo-scitix force-pushed the instanttensor branch 4 times, most recently from 6a7903e to c198df6, on March 7, 2026 07:28
mgoin added the ready and nvidia labels on Mar 9, 2026

mgoin (Member) left a comment

LGTM for the initial integration, thank you!

github-project-automation bot moved this to Ready in NVIDIA on Mar 10, 2026

mergify bot commented Mar 10, 2026

Hi @arlo-aisys, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

arlo-scitix force-pushed the instanttensor branch 6 times, most recently from 3164795 to 2f1e5c4, on March 11, 2026 07:49

arlo-scitix (Contributor, Author) commented

@mgoin
We’ve fixed all issues that were causing the CI tests to fail. The remaining test failures don’t appear to be caused by InstantTensor. Who should we contact next among the maintainers to help move this PR forward?

bbartels (Contributor) commented

> @mgoin
> We’ve fixed all issues that were causing the CI tests to fail. The remaining test failures don’t appear to be caused by InstantTensor. Who should we contact next among the maintainers to help move this PR forward?

You can post in the PR reviews channel on the vLLM Slack.

Signed-off-by: arlo <264998716+arlo-aisys@users.noreply.github.com>

mergify bot commented Mar 14, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @arlo-aisys.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label on Mar 14, 2026
Signed-off-by: arlo <arlo@scitix.ai>
mergify bot removed the needs-rebase label on Mar 14, 2026

mgoin (Member) left a comment

LGTM, thanks a lot!

mgoin changed the title from "[Feature] Add support for InstantTensor" to "[Feature] Add InstantTensor weight loader" on Mar 14, 2026
mgoin removed the cpu label on Mar 14, 2026
mergify bot added the cpu label on Mar 14, 2026
mgoin merged commit 8c29042 into vllm-project:main on Mar 14, 2026
128 checks passed
github-project-automation bot moved this from Ready to Done in NVIDIA on Mar 14, 2026
athrael-soju pushed a commit to athrael-soju/vllm that referenced this pull request Mar 16, 2026
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026

Labels: ci/build, cpu, documentation, nvidia, ready

Project status: Done

4 participants