
[CI] revert initialize_model context manager #38426

Merged
noooop merged 2 commits into vllm-project:main from jikunshang:kunshang/ci_fix_extra_stand on Mar 28, 2026

Conversation

@jikunshang
Collaborator

@jikunshang jikunshang commented Mar 28, 2026

Purpose

Fix 2 failing Language Models Tests (Extra Standard) cases in CI.
#38032 modified the initialize_model context manager, which may cause some memory-related issues.

Failure in the latest nightly: https://buildkite.com/vllm/ci/builds/58555#019d3308-7864-45d2-ac05-e01db010dbd5
Failure in commit 648edcf729: https://buildkite.com/vllm/ci/builds/58506#019d30f7-d28f-42f4-976e-b4f385f2fe42

It seems Language Models Tests (Extra Standard) also failed due to this issue.

Test Plan

Run the 2 failing Language Models Tests (Extra Standard) cases.

Test Result

Passed in https://buildkite.com/vllm/ci/builds/58571#019d33f0-f980-4d77-beb2-6af68ca92aed


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
@jikunshang jikunshang requested a review from 22quinn as a code owner March 28, 2026 10:14

@claude claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 28, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the model initialization logic in base_loader.py by nesting the target_device context manager within the set_default_torch_dtype context manager. Additionally, the log_model_inspection call has been moved outside of the target_device context. I have no feedback to provide as there are no review comments to evaluate.
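
The structure the bot describes can be sketched with plain context managers. The names `set_default_torch_dtype`, `target_device`, `initialize_model`, and `load_weights` follow the vLLM code discussed above, but the bodies here are simplified stand-ins that only record ordering, not real torch behavior:

```python
from contextlib import contextmanager

# Simplified stand-ins for vLLM's helpers: the real ones set the
# default torch dtype and the default tensor device, respectively.
events = []

@contextmanager
def set_default_torch_dtype(dtype):
    events.append(f"enter dtype={dtype}")
    try:
        yield
    finally:
        events.append(f"exit dtype={dtype}")

@contextmanager
def target_device(device):
    events.append(f"enter device={device}")
    try:
        yield
    finally:
        events.append(f"exit device={device}")

def initialize_model():
    events.append("initialize_model")

def load_weights():
    events.append("load_weights")

# Nesting as described: the device context wraps only initialize_model(),
# so load_weights() runs outside it. This is what later broke code that
# relied on the ambient device during weight loading (see comments below).
with set_default_torch_dtype("float16"):
    with target_device("cuda:0"):
        initialize_model()
    load_weights()
```

The event log makes the ordering explicit: the device context is exited before `load_weights()` runs, while the dtype context stays active for both.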

@jikunshang
Collaborator Author

@mgoin PTAL

@noooop noooop enabled auto-merge (squash) March 28, 2026 14:59
@noooop
Collaborator

noooop commented Mar 28, 2026

Hope this PR can also fix the OOM issues for Entrypoints Integration (Pooling) and Language Models Test (Extended Pooling) in the Full CI run - daily.

cc @DarkLight1337

@noooop noooop merged commit aa4eb0d into vllm-project:main Mar 28, 2026
53 checks passed
@kylesayrs
Contributor

Thanks for the fix. This breaks the online reloading logic; I'm working on debugging it.

@jikunshang
Collaborator Author

@kylesayrs You are right. I noticed an online quant case failing in the Intel GPU CI: https://buildkite.com/vllm/intel-ci/builds/552#019d36ad-6cba-46a3-ae74-68b7b34f52f9

haosdent added a commit to haosdent/vllm that referenced this pull request Mar 29, 2026
materialize_meta_tensor() creates tensors via torch.empty_strided()
without a device= argument, relying on the ambient torch.device context.

After vllm-project#38426 moved load_weights outside the device context (to fix OOM
for non-quantized models), the online FP8 quantization path in
_layerwise_process() started materializing tensors on CPU instead of
GPU, causing process_weights_after_loading to fail with:
  NotImplementedError: Could not run '_C::dynamic_scaled_fp8_quant'
  with arguments from the 'CPU' backend.

Fix: add an explicit device parameter to materialize_meta_tensor() and
materialize_layer() (default None preserves backward compatibility),
and pass current_platform.device_type from the call sites in
layerwise.py and dummy_loader.py that run outside a device context.

Signed-off-by: root <root@gb10-runpod.localdomain>

Signed-off-by: haosdent <haosdent@gmail.com>
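
The fix pattern this commit message describes can be illustrated without torch: a function that creates objects either on an explicitly passed device or, when `device` is None, on whatever ambient default a surrounding context has set. The names below (`device_context`, `materialize_meta_tensor`, `_ambient_device`) are illustrative stand-ins, not vLLM's actual implementation:

```python
from contextlib import contextmanager

# Illustrative stand-in for an ambient device context such as
# `with torch.device(...)`; not vLLM's actual code.
_ambient_device = "cpu"

@contextmanager
def device_context(device):
    global _ambient_device
    prev, _ambient_device = _ambient_device, device
    try:
        yield
    finally:
        _ambient_device = prev

def materialize_meta_tensor(shape, device=None):
    # `device=None` preserves the old behavior (use the ambient
    # context); call sites that run outside a device context are
    # expected to pass the device explicitly.
    if device is None:
        device = _ambient_device
    return {"shape": shape, "device": device}

# Inside a device context, the ambient default is picked up...
with device_context("cuda:0"):
    t1 = materialize_meta_tensor((2, 2))

# ...but outside one, the old code silently fell back to CPU,
# which is the failure mode described above:
t2 = materialize_meta_tensor((2, 2))

# The fix: pass the platform device explicitly at such call sites.
t3 = materialize_meta_tensor((2, 2), device="cuda:0")
```

This mirrors the commit's approach: the default argument keeps existing in-context callers working, while the out-of-context callers in `layerwise.py` and `dummy_loader.py` opt in to an explicit device.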
haosdent added a commit to haosdent/vllm that referenced this pull request Mar 29, 2026
After vllm-project#38426 narrowed the `with target_device:` context to only wrap
`initialize_model()`, code that relied on the ambient device context
during `load_weights()` started creating tensors on CPU instead of GPU.

This fixes three locations:

1. `materialize_meta_tensor()` / `materialize_layer()` — accept an
   explicit `device` parameter instead of relying on the ambient
   `torch.device` context.

2. `DummyModelLoader.load_weights()` — passes
   `device=current_platform.device_type` when materializing meta
   tensors.

3. `Fp8OnlineMoEMethod.process_weights_after_loading()` — the
   `torch.ones` calls for `w13_scale` / `w2_scale` now specify
   `device=layer.w13_weight.device` so the scale tensors land on
   the same GPU as the already-materialized weights.

Signed-off-by: haosdent <haosdent@gmail.com>
Elm8116 pushed a commit to Elm8116/vllm that referenced this pull request Mar 30, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Elham Harirpoush <elham.harirpoush@arm.com>
vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
benenzhu pushed a commit to benenzhu/vllm that referenced this pull request Mar 31, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: zhutaoyu <zhutaoyu97@gmail.com>
neweyes pushed a commit to neweyes/vllm that referenced this pull request Mar 31, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: neweyes <328719365@qq.com>
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: EricccYang <yangyang4991@gmail.com>
bhargav-patel-29 pushed a commit to Bharatgen-Tech/vllm that referenced this pull request Apr 1, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>

Labels

ready ONLY add when PR is ready to merge/full CI is needed
