
[CI] revert initialize_model context manager #38426

Merged
noooop merged 2 commits into vllm-project:main from jikunshang:kunshang/ci_fix_extra_stand on Mar 28, 2026

Conversation

@jikunshang
Collaborator

@jikunshang jikunshang commented Mar 28, 2026

Purpose

Fix 2 failing Language Models Tests (Extra Standard) cases in CI.
#38032 modified the initialize_model context manager, which may cause some memory-related issues.

Failure in the latest nightly: https://buildkite.com/vllm/ci/builds/58555#019d3308-7864-45d2-ac05-e01db010dbd5
Failure in commit 648edcf729: https://buildkite.com/vllm/ci/builds/58506#019d30f7-d28f-42f4-976e-b4f385f2fe42

It seems Language Models Tests (Extra Standard) also failed due to this issue.

Test Plan

Run the 2 failing Language Models Tests (Extra Standard) cases.

Test Result

Passed in https://buildkite.com/vllm/ci/builds/58571#019d33f0-f980-4d77-beb2-6af68ca92aed


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
@jikunshang jikunshang requested a review from 22quinn as a code owner March 28, 2026 10:14

@claude claude bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@jikunshang jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 28, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the model initialization logic in base_loader.py by nesting the target_device context manager within the set_default_torch_dtype context manager. Additionally, the log_model_inspection call has been moved outside of the target_device context. I have no feedback to provide as there are no review comments to evaluate.
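
The structure the bot describes can be sketched with plain context managers. The names `set_default_torch_dtype`, `target_device`, `initialize_model`, and `load_weights` follow the vLLM code discussed above, but the bodies here are simplified stand-ins that only record ordering, not real torch behavior:

```python
from contextlib import contextmanager

# Simplified stand-ins for vLLM's helpers: the real ones set the
# default torch dtype and the default tensor device, respectively.
events = []

@contextmanager
def set_default_torch_dtype(dtype):
    events.append(f"enter dtype={dtype}")
    try:
        yield
    finally:
        events.append(f"exit dtype={dtype}")

@contextmanager
def target_device(device):
    events.append(f"enter device={device}")
    try:
        yield
    finally:
        events.append(f"exit device={device}")

def initialize_model():
    events.append("initialize_model")

def load_weights():
    events.append("load_weights")

# Nesting as described: the device context wraps only initialize_model(),
# so load_weights() runs outside it. This is what later broke code that
# relied on the ambient device during weight loading (see comments below).
with set_default_torch_dtype("float16"):
    with target_device("cuda:0"):
        initialize_model()
    load_weights()
```

The event log makes the ordering explicit: the device context is exited before `load_weights()` runs, while the dtype context stays active for both.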

@jikunshang
Collaborator Author

@mgoin PTAL

@noooop noooop enabled auto-merge (squash) March 28, 2026 14:59
@noooop
Collaborator

noooop commented Mar 28, 2026

Hope this PR can also fix the OOM issues for Entrypoints Integration (Pooling) and Language Models Test (Extended Pooling) in the Full CI run - daily.

cc @DarkLight1337

@noooop noooop merged commit aa4eb0d into vllm-project:main Mar 28, 2026
53 checks passed
@kylesayrs
Contributor

Thanks for the fix. This breaks the online reloading logic; I'm working on debugging it.

@jikunshang
Collaborator Author

@kylesayrs You are right. I noticed an online quant case failing in the Intel GPU CI: https://buildkite.com/vllm/intel-ci/builds/552#019d36ad-6cba-46a3-ae74-68b7b34f52f9

haosdent added a commit to haosdent/vllm that referenced this pull request Mar 29, 2026
materialize_meta_tensor() creates tensors via torch.empty_strided()
without a device= argument, relying on the ambient torch.device context.

After vllm-project#38426 moved load_weights outside the device context (to fix OOM
for non-quantized models), the online FP8 quantization path in
_layerwise_process() started materializing tensors on CPU instead of
GPU, causing process_weights_after_loading to fail with:
  NotImplementedError: Could not run '_C::dynamic_scaled_fp8_quant'
  with arguments from the 'CPU' backend.

Fix: add an explicit device parameter to materialize_meta_tensor() and
materialize_layer() (default None preserves backward compatibility),
and pass current_platform.device_type from the call sites in
layerwise.py and dummy_loader.py that run outside a device context.

Signed-off-by: root <root@gb10-runpod.localdomain>

Signed-off-by: haosdent <haosdent@gmail.com>
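
The fix pattern this commit message describes can be illustrated without torch: a function that creates objects either on an explicitly passed device or, when `device` is None, on whatever ambient default a surrounding context has set. The names below (`device_context`, `materialize_meta_tensor`, `_ambient_device`) are illustrative stand-ins, not vLLM's actual implementation:

```python
from contextlib import contextmanager

# Illustrative stand-in for an ambient device context such as
# `with torch.device(...)`; not vLLM's actual code.
_ambient_device = "cpu"

@contextmanager
def device_context(device):
    global _ambient_device
    prev, _ambient_device = _ambient_device, device
    try:
        yield
    finally:
        _ambient_device = prev

def materialize_meta_tensor(shape, device=None):
    # `device=None` preserves the old behavior (use the ambient
    # context); call sites that run outside a device context are
    # expected to pass the device explicitly.
    if device is None:
        device = _ambient_device
    return {"shape": shape, "device": device}

# Inside a device context, the ambient default is picked up...
with device_context("cuda:0"):
    t1 = materialize_meta_tensor((2, 2))

# ...but outside one, the old code silently fell back to CPU,
# which is the failure mode described above:
t2 = materialize_meta_tensor((2, 2))

# The fix: pass the platform device explicitly at such call sites.
t3 = materialize_meta_tensor((2, 2), device="cuda:0")
```

This mirrors the commit's approach: the default argument keeps existing in-context callers working, while the out-of-context callers in `layerwise.py` and `dummy_loader.py` opt in to an explicit device.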
haosdent added a commit to haosdent/vllm that referenced this pull request Mar 29, 2026
After vllm-project#38426 narrowed the `with target_device:` context to only wrap
`initialize_model()`, code that relied on the ambient device context
during `load_weights()` started creating tensors on CPU instead of GPU.

This fixes three locations:

1. `materialize_meta_tensor()` / `materialize_layer()` — accept an
   explicit `device` parameter instead of relying on the ambient
   `torch.device` context.

2. `DummyModelLoader.load_weights()` — passes
   `device=current_platform.device_type` when materializing meta
   tensors.

3. `Fp8OnlineMoEMethod.process_weights_after_loading()` — the
   `torch.ones` calls for `w13_scale` / `w2_scale` now specify
   `device=layer.w13_weight.device` so the scale tensors land on
   the same GPU as the already-materialized weights.

Signed-off-by: haosdent <haosdent@gmail.com>
Elm8116 pushed a commit to Elm8116/vllm that referenced this pull request Mar 30, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Elham Harirpoush <elham.harirpoush@arm.com>
vrdn-23 pushed a commit to vrdn-23/vllm that referenced this pull request Mar 30, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
benenzhu pushed a commit to benenzhu/vllm that referenced this pull request Mar 31, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: zhutaoyu <zhutaoyu97@gmail.com>
neweyes pushed a commit to neweyes/vllm that referenced this pull request Mar 31, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: neweyes <328719365@qq.com>
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: EricccYang <yangyang4991@gmail.com>
bhargav-patel-29 pushed a commit to Bharatgen-Tech/vllm that referenced this pull request Apr 1, 2026
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>

Labels

ready ONLY add when PR is ready to merge/full CI is needed
