[Feature] Add InstantTensor weight loader by arlo-scitix · Pull Request #36139 · vllm-project/vllm

arlo-scitix · 2026-03-05T13:00:22Z

PR for RFC #36091.

Purpose

Speed up model loading and fully utilize the bandwidth of high-speed storage (e.g., 400 Gbps networked storage).
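For scale, here is a rough back-of-the-envelope sketch of what saturating such a link would mean for load time. The checkpoint sizes are illustrative assumptions, not measurements from this PR, and `min_load_seconds` is a hypothetical helper name.

```python
def min_load_seconds(checkpoint_gb: float, link_gbps: float) -> float:
    """Lower bound on load time if the storage link is fully saturated.

    checkpoint_gb: checkpoint size in gigabytes (GB).
    link_gbps: link bandwidth in gigabits per second (Gbps).
    """
    if checkpoint_gb < 0 or link_gbps <= 0:
        raise ValueError("size must be >= 0 and bandwidth must be > 0")
    link_gb_per_s = link_gbps / 8.0  # 8 bits per byte: 400 Gbps -> 50 GB/s
    return checkpoint_gb / link_gb_per_s


if __name__ == "__main__":
    # Assumed checkpoint sizes, chosen only to illustrate the arithmetic.
    for size_gb in (60.0, 700.0):
        seconds = min_load_seconds(size_gb, 400.0)
        print(f"{size_gb:.0f} GB over 400 Gbps: at least {seconds:.1f} s")
```

This bound ignores protocol overhead, host-to-device copies, and any per-tensor processing, so real load times will be higher.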

Test Plan

Load any model with any parallelism setting, for example on an H20 (141 GB):

vllm serve Qwen/Qwen3-30B-A3B --load-format instanttensor

vllm serve deepseek-ai/DeepSeek-R1 --load-format instanttensor --tensor-parallel-size 8 --enable-expert-parallel

Test Result

See the README of our repo.

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

mergify · 2026-03-05T13:01:05Z

Documentation preview: https://vllm--36139.org.readthedocs.build/en/36139/

github-actions · 2026-03-05T13:06:11Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

gemini-code-assist

Code Review

This pull request introduces support for InstantTensor to accelerate model weight loading. The changes span dependency management, configuration, model loading logic, and documentation. My review focuses on ensuring the CUDA-specific nature of this new feature is made clear to users. I've provided suggestions to add an explicit check for CUDA availability in the code and to update the associated docstrings and documentation to mention this requirement. These changes will help prevent runtime errors and improve the user experience for those on non-CUDA platforms.
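The kind of explicit check the review asks for might look roughly like the sketch below. This is not the code merged in this PR: `cuda_is_available` and `check_load_format` are hypothetical names, and the only facts taken from the thread are the `instanttensor` load-format string and the CUDA requirement.

```python
import importlib.util


def cuda_is_available() -> bool:
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the sketch also runs without PyTorch

    return torch.cuda.is_available()


def check_load_format(load_format: str, cuda_available: bool) -> None:
    """Fail early, with a clear message, instead of erroring deep inside loading.

    Hypothetical guard: rejects the CUDA-only format on non-CUDA platforms.
    """
    if load_format == "instanttensor" and not cuda_available:
        raise ValueError(
            "load_format='instanttensor' requires a CUDA platform; "
            "choose another load format on this machine."
        )


if __name__ == "__main__":
    try:
        check_load_format("instanttensor", cuda_is_available())
        print("instanttensor can be used on this machine")
    except ValueError as err:
        print(f"configuration rejected: {err}")
```

Passing the availability flag in as an argument keeps the guard testable on machines without a GPU.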

docs/models/extensions/instanttensor.md

vllm/config/load.py

vllm/model_executor/model_loader/weight_utils.py

robertgshaw2-redhat · 2026-03-05T14:12:54Z

Is it possible to make this the default? What considerations should we have for its usage?

arlo-scitix · 2026-03-06T05:37:18Z

@robertgshaw2-redhat, thanks. We’ve responded in the RFC.

requirements/test.in

vllm/model_executor/model_loader/weight_utils.py

docs/models/extensions/instanttensor.md

mgoin

LGTM for the initial integration, thank you!

mergify · 2026-03-10T10:17:45Z

Hi @arlo-aisys, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

arlo-scitix · 2026-03-11T15:17:29Z

@mgoin
We’ve fixed all issues that were causing the CI tests to fail. The remaining test failures don’t appear to be caused by InstantTensor. Who should we contact next among the maintainers to help move this PR forward?

bbartels · 2026-03-12T01:13:12Z

> @mgoin
> We’ve fixed all issues that were causing the CI tests to fail. The remaining test failures don’t appear to be caused by InstantTensor. Who should we contact next among the maintainers to help move this PR forward?

You can post in the PR reviews channel on the vllm slack

mergify · 2026-03-14T01:34:47Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @arlo-aisys.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mgoin

LGTM, thanks a lot!

arlo-scitix requested review from 22quinn, ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tlrmchlsmth, yewentao256 and youkaichao as code owners March 5, 2026 13:00

mergify bot added the documentation (Improvements or additions to documentation) and ci/build labels Mar 5, 2026

arlo-scitix mentioned this pull request Mar 5, 2026

[RFC]: Add InstantTensor Support in vLLM #36091 (Open, 1 task)

gemini-code-assist bot reviewed Mar 5, 2026

docs/models/extensions/instanttensor.md (outdated, resolved)

vllm/config/load.py (outdated, resolved)

vllm/model_executor/model_loader/weight_utils.py (outdated, resolved)

arlo-scitix force-pushed the instanttensor branch 4 times, most recently from 6a7903e to c198df6 Compare March 7, 2026 07:28

mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) and nvidia labels Mar 9, 2026

github-project-automation bot added this to NVIDIA Mar 9, 2026

mgoin reviewed Mar 9, 2026

requirements/test.in (outdated, resolved)

vllm/model_executor/model_loader/weight_utils.py (outdated, resolved)

docs/models/extensions/instanttensor.md (resolved)

arlo-scitix force-pushed the instanttensor branch from c198df6 to 33305dd Compare March 10, 2026 07:42

mgoin approved these changes Mar 10, 2026

github-project-automation bot moved this to Ready in NVIDIA Mar 10, 2026

arlo-scitix force-pushed the instanttensor branch 6 times, most recently from 3164795 to 2f1e5c4 Compare March 11, 2026 07:49

arlo-scitix force-pushed the instanttensor branch from 2f1e5c4 to 5200dce Compare March 11, 2026 19:41

arlo-scitix force-pushed the instanttensor branch from 5200dce to 2fd9c82 Compare March 12, 2026 17:33

[Feature] Add support for InstantTensor

73fc7c7

Signed-off-by: arlo <264998716+arlo-aisys@users.noreply.github.com>

arlo-scitix force-pushed the instanttensor branch from 2fd9c82 to 73fc7c7 Compare March 14, 2026 01:26

mergify bot added the needs-rebase label Mar 14, 2026

Merge branch 'main' into instanttensor

155a83a

Signed-off-by: arlo <arlo@scitix.ai>

mergify bot removed the needs-rebase label Mar 14, 2026

mgoin approved these changes Mar 14, 2026

mgoin changed the title from "[Feature] Add support for InstantTensor" to "[Feature] Add InstantTensor weight loader" Mar 14, 2026

mgoin removed the cpu (Related to CPU backends) label Mar 14, 2026

mergify bot added the cpu (Related to CPU backends) label Mar 14, 2026

mgoin merged commit 8c29042 into vllm-project:main Mar 14, 2026
128 checks passed

github-project-automation bot moved this from Ready to Done in NVIDIA Mar 14, 2026

athrael-soju pushed a commit to athrael-soju/vllm that referenced this pull request Mar 16, 2026

[Feature] Add InstantTensor weight loader (vllm-project#36139)

4a60dfa

Signed-off-by: Athrael Soju <athrael.soju@gmail.com>

mgoin mentioned this pull request Mar 17, 2026

[Don't Merge] Test CI with InstantTensor as default load format #37309 (Closed)

Lucaskabela pushed a commit to Lucaskabela/vllm that referenced this pull request Mar 17, 2026

[Feature] Add InstantTensor weight loader (vllm-project#36139)

4d90c84

wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026

[Feature] Add InstantTensor weight loader (vllm-project#36139)

40e1661

Signed-off-by: wendyliu235 <wenjun.liu@intel.com>

lgeiger mentioned this pull request Mar 18, 2026

Publish prebuilt wheels scitix/InstantTensor#3 (Closed)

fxdawnn pushed a commit to fxdawnn/vllm that referenced this pull request Mar 19, 2026

[Feature] Add InstantTensor weight loader (vllm-project#36139)

43f2bbc

lgeiger mentioned this pull request Mar 20, 2026

Setup cibuildwheel to pre-build wheels scitix/InstantTensor#5 (Merged)

khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026

[Feature] Add InstantTensor weight loader (vllm-project#36139)

49df234
