
[Quantization] - Added uses_meta_device_weights to quant config#34645

Merged
vllm-bot merged 7 commits into vllm-project:main from Josephasafg:add_meta_weights_check
Feb 18, 2026

Conversation

@Josephasafg
Contributor

@Josephasafg Josephasafg commented Feb 16, 2026

Purpose

As more quant methods begin to support online quantization, we need a more robust way to check that they load dummy weights consistently via `process_weights_after_loading`.
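As a rough illustration of the config-level hook this PR describes, a quant config could expose a boolean method that loaders query before initializing dummy weights. The class and method names below are simplified assumptions for illustration, not the exact vLLM code:

```python
# Hypothetical sketch: a base quant config defaults to "no meta-device
# weights"; online-quantization configs override it. Names are assumptions.
class QuantizationConfig:
    def uses_meta_device_weights(self) -> bool:
        # By default, weights are materialized normally at load time.
        return False


class Fp8OnlineConfig(QuantizationConfig):
    def uses_meta_device_weights(self) -> bool:
        # Online quantization creates weights on device="meta" and defers
        # real initialization to process_weights_after_loading.
        return True
```

A loader can then branch on `config.uses_meta_device_weights()` instead of hardcoding a check for a specific method name like `"fp8"`.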

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Josephasafg <ajgard7@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a uses_meta_device_weights method to the quantization configuration, providing a more robust way to handle online quantization. The change refactors a hardcoded check for fp8 quantization to a more generic mechanism. The implementation is sound, but there's a critical issue where a None value for model_config.quantization could cause a crash. I've provided a suggestion to handle this case gracefully.
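The crash the bot flags comes from treating `model_config.quantization` as a string when it can be `None` (the unquantized case). A minimal sketch of the kind of guard it suggests, with an illustrative helper name and method set:

```python
# Sketch of guarding against quantization being None before any string or
# membership check. The function name and the set of online-quant methods
# are illustrative assumptions, not the exact vLLM suggestion.
def quant_uses_meta_weights(quantization: "str | None") -> bool:
    if quantization is None:
        # No quantization configured: weights are loaded normally.
        return False
    return quantization in {"fp8"}  # illustrative online-quant methods
```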

Josephasafg and others added 2 commits February 17, 2026 14:11
Signed-off-by: Josephasafg <ajgard7@gmail.com>
@Josephasafg
Contributor Author

Josephasafg commented Feb 17, 2026

@vkuzo Thanks for the review!

Who should trigger the CI?

Signed-off-by: Josephasafg <ajgard7@gmail.com>
@Josephasafg Josephasafg requested a review from mgoin February 17, 2026 20:46
@mgoin
Member

mgoin commented Feb 17, 2026

@Josephasafg @vkuzo I would prefer to keep the information on the linear method itself rather than the top-level quant config. What do you think about this proposal?

  1. Add uses_meta_device: bool = False to QuantizeMethodBase in base_config.py
  2. Set uses_meta_device = True on Fp8OnlineLinearMethod and Fp8OnlineMoEMethod (the methods that actually create weights on device="meta")
  3. In initialize_dummy_weights, instead of checking the quant config class, iterate over model.modules() and check if any module has a quant_method with uses_meta_device = True

Something like this:

def initialize_dummy_weights(model, model_config, ...):
    meta_device_params: set[int] = set()
    for module in model.modules():
        qm = getattr(module, "quant_method", None)
        if qm is not None and getattr(qm, "uses_meta_device", False):
            for param in module.parameters(recurse=False):
                meta_device_params.add(id(param))

    for param in model.state_dict().values():
        if id(param) in meta_device_params \
                and param.device == torch.device("meta"):
            continue
        initialize_single_dummy_weight(param, low, high, seed)

Signed-off-by: Josephasafg <ajgard7@gmail.com>
@Josephasafg
Contributor Author

@mgoin @vkuzo I made the change but made it a little simpler. How does this look?

def initialize_dummy_weights(
    model: torch.nn.Module,
    model_config: ModelConfig,
    low: float = -1e-3,
    high: float = 1e-3,
    seed: int = 1234,
) -> None:
    def uses_meta_device(module: torch.nn.Module) -> bool:
        quant_method = getattr(module, "quant_method", None)
        return getattr(quant_method, "uses_meta_device", False)

    has_online_quant = any(uses_meta_device(m) for m in model.modules())

    for param in model.state_dict().values():
        if has_online_quant and param.device == torch.device("meta"):
            # For online quantization, weights are created on meta device and
            # dummy weight init will happen in `process_weights_after_loading`.
            continue

        initialize_single_dummy_weight(param, low, high, seed)
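To see how the skip logic in this final version behaves, here is a torch-free stand-in: fake modules and parameters replace `torch.nn.Module` and real tensors, and every name below is a simplified assumption rather than the actual vLLM objects. The control flow mirrors the function above: if any module's `quant_method` sets `uses_meta_device`, parameters already on the meta device are left untouched for `process_weights_after_loading`.

```python
# Torch-free stand-in for the skip logic; all classes here are mock
# assumptions, not real vLLM/torch types.
from dataclasses import dataclass


@dataclass
class FakeParam:
    device: str = "cpu"
    value: "float | None" = None  # None means "not yet initialized"


class MetaQuantMethod:
    uses_meta_device = True  # mirrors the flag on Fp8Online*Method


@dataclass
class FakeModule:
    quant_method: "object | None" = None


def initialize_dummy_weights(modules, params, fill=0.5):
    # Does any module's quant method create weights on the meta device?
    has_online_quant = any(
        getattr(getattr(m, "quant_method", None), "uses_meta_device", False)
        for m in modules
    )
    for p in params:
        if has_online_quant and p.device == "meta":
            # Deferred: process_weights_after_loading will initialize it.
            continue
        p.value = fill
    return params
```

With one meta-device parameter and one CPU parameter, only the CPU parameter gets filled; the meta one stays uninitialized, matching the comment in the PR's implementation.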

Member

@mgoin mgoin left a comment


Nice work, I'm quite happy with this!

@mgoin mgoin added ready ONLY add when PR is ready to merge/full CI is needed quantization labels Feb 18, 2026
@mgoin mgoin enabled auto-merge (squash) February 18, 2026 01:01
@vllm-bot vllm-bot merged commit 1faa8cb into vllm-project:main Feb 18, 2026
57 of 62 checks passed
jasonozuzu-cohere pushed a commit to jasonozuzu-cohere/vllm that referenced this pull request Feb 18, 2026
…-project#34645)

Signed-off-by: Josephasafg <ajgard7@gmail.com>
Signed-off-by: Jason Ozuzu <jasonozuzu@cohere.com>
ZJY0516 pushed a commit to ZJY0516/vllm that referenced this pull request Feb 23, 2026
…-project#34645)

Signed-off-by: Josephasafg <ajgard7@gmail.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026
…-project#34645)

Signed-off-by: Josephasafg <ajgard7@gmail.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026

Labels

quantization ready ONLY add when PR is ready to merge/full CI is needed


4 participants