[Quantization] - Added uses_meta_device_weights to quant config #34645
vllm-bot merged 7 commits into vllm-project:main
Conversation
Signed-off-by: Josephasafg <ajgard7@gmail.com>
Code Review
This pull request introduces a uses_meta_device_weights method to the quantization configuration, providing a more robust way to handle online quantization. The change refactors a hardcoded check for fp8 quantization to a more generic mechanism. The implementation is sound, but there's a critical issue where a None value for model_config.quantization could cause a crash. I've provided a suggestion to handle this case gracefully.
@vkuzo Thanks for the review! Who should trigger the CI?
@Josephasafg @vkuzo I would prefer to keep the information on the linear method itself rather than on the top-level quant config. What do you think about this proposal?
Something like this:

```python
def initialize_dummy_weights(model, model_config, ...):
    meta_device_params: set[int] = set()
    for module in model.modules():
        qm = getattr(module, "quant_method", None)
        if qm is not None and getattr(qm, "uses_meta_device", False):
            for param in module.parameters(recurse=False):
                meta_device_params.add(id(param))
    for param in model.state_dict().values():
        if id(param) in meta_device_params \
                and param.device == torch.device("meta"):
            continue
        initialize_single_dummy_weight(param, low, high, seed)
```
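A torch-free sketch of the per-parameter bookkeeping this proposal describes. All class names here are illustrative stand-ins for the real torch/vLLM types, and the string `"meta"` stands in for `torch.device("meta")`.

```python
class Param:
    def __init__(self, device="cpu"):
        self.device = device
        self.initialized = False

class MetaQuantMethod:
    # The flag the proposal attaches to the linear method itself.
    uses_meta_device = True

class Module:
    def __init__(self, params, quant_method=None):
        self.params = params
        self.quant_method = quant_method

def initialize_dummy_weights(modules):
    # Pass 1: record ids of params owned by meta-device quant methods.
    meta_device_params = set()
    for module in modules:
        qm = getattr(module, "quant_method", None)
        if qm is not None and getattr(qm, "uses_meta_device", False):
            for param in module.params:
                meta_device_params.add(id(param))
    # Pass 2: initialize everything except the recorded meta params.
    for module in modules:
        for param in module.params:
            if id(param) in meta_device_params and param.device == "meta":
                continue  # deferred to process_weights_after_loading
            param.initialized = True

dense = Module([Param()])
quantized = Module([Param(device="meta")], MetaQuantMethod())
initialize_dummy_weights([dense, quantized])
```

After the call, the dense layer's parameter is initialized while the quantized layer's meta parameter is left untouched for `process_weights_after_loading`.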
@mgoin @vkuzo I made the change but made it a little simpler. How does this look?

```python
def initialize_dummy_weights(
    model: torch.nn.Module,
    model_config: ModelConfig,
    low: float = -1e-3,
    high: float = 1e-3,
    seed: int = 1234,
) -> None:
    def uses_meta_device(module: torch.nn.Module) -> bool:
        quant_method = getattr(module, "quant_method", None)
        return getattr(quant_method, "uses_meta_device", False)

    has_online_quant = any(uses_meta_device(m) for m in model.modules())
    for param in model.state_dict().values():
        if has_online_quant and param.device == torch.device("meta"):
            # For online quantization, weights are created on meta device and
            # dummy weight init will happen in `process_weights_after_loading`.
            continue
        initialize_single_dummy_weight(param, low, high, seed)
```
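The simplified version can be exercised with the same kind of torch-free stand-ins. Again, `Param` and `Layer` are illustrative assumptions; the real code walks `model.state_dict()` and compares against `torch.device("meta")`.

```python
class Param:
    def __init__(self, device="cpu"):
        self.device = device
        self.filled = False

class OnlineQuantMethod:
    uses_meta_device = True

class Layer:
    def __init__(self, param, quant_method=None):
        self.param = param
        self.quant_method = quant_method

def initialize_dummy_weights(layers):
    def uses_meta_device(layer):
        qm = getattr(layer, "quant_method", None)
        return getattr(qm, "uses_meta_device", False)

    # One model-wide flag instead of per-parameter bookkeeping.
    has_online_quant = any(uses_meta_device(l) for l in layers)
    for layer in layers:
        if has_online_quant and layer.param.device == "meta":
            continue  # filled later in process_weights_after_loading
        layer.param.filled = True

dense = Layer(Param())
quantized = Layer(Param(device="meta"), OnlineQuantMethod())
initialize_dummy_weights([dense, quantized])
```

The trade-off versus the earlier proposal: once any module opts in, every meta-device parameter in the model is skipped, which is simpler but coarser than tracking individual parameter ids.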
mgoin left a comment
Nice work, I'm quite happy with this!
Purpose
As more quant methods start to support online quantization, we need a more robust way to check that they load dummy weights consistently using `process_weights_after_loading`.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.