[CI] revert initialize_model context manager #38426
Conversation
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Code Review
This pull request refactors the model initialization logic in base_loader.py by nesting the target_device context manager within the set_default_torch_dtype context manager. Additionally, the log_model_inspection call has been moved outside of the target_device context. I have no feedback to provide as there are no review comments to evaluate.
@mgoin PTAL
Hope this PR can also fix the OOM issues for Entrypoints Integration (Pooling) and Language Models Test (Extended Pooling) in the Full CI run - daily.
Thanks for the fix. This breaks the online reloading logic; I'm working on debugging it.
@kylesayrs you are right. I noticed an online quant case failing in the Intel GPU CI: https://buildkite.com/vllm/intel-ci/builds/552#019d36ad-6cba-46a3-ae74-68b7b34f52f9
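The failure mode discussed here comes down to PyTorch's ambient device context. A minimal illustration (not vLLM code; the `meta` device stands in for a GPU so it runs anywhere):

```python
import torch

# Tensors created inside `with torch.device(...)` inherit that device,
# so a call moved outside the context silently changes where it allocates.
with torch.device("meta"):
    inside = torch.empty(2, 3)   # picks up the ambient "meta" device

outside = torch.empty(2, 3)      # falls back to the default device (CPU)

print(inside.device.type, outside.device.type)  # meta cpu
```

This is why narrowing the `with target_device:` scope changed behavior for any code that allocated tensors without an explicit `device=` during weight loading.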
materialize_meta_tensor() creates tensors via torch.empty_strided() without a device= argument, relying on the ambient torch.device context. After vllm-project#38426 moved load_weights outside the device context (to fix OOM for non-quantized models), the online FP8 quantization path in _layerwise_process() started materializing tensors on CPU instead of GPU, causing process_weights_after_loading to fail with:

NotImplementedError: Could not run '_C::dynamic_scaled_fp8_quant' with arguments from the 'CPU' backend.

Fix: add an explicit device parameter to materialize_meta_tensor() and materialize_layer() (a default of None preserves backward compatibility), and pass current_platform.device_type from the call sites in layerwise.py and dummy_loader.py that run outside a device context.

Signed-off-by: root <root@gb10-runpod.localdomain>
Signed-off-by: haosdent <haosdent@gmail.com>
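A hypothetical sketch of the shape of that fix (simplified from the actual vLLM helpers; the function body here is illustrative, not the real implementation): the materialization helper takes an explicit `device` argument, and `device=None` keeps the old behavior of deferring to the ambient context.

```python
import torch

def materialize_meta_tensor(meta: torch.Tensor, device=None) -> torch.Tensor:
    """Allocate a real tensor with the same shape/stride/dtype as `meta`."""
    if device is None:
        # Pre-fix behavior: allocation device comes from the ambient
        # torch.device context, if any.
        return torch.empty_strided(meta.size(), meta.stride(), dtype=meta.dtype)
    # Post-fix behavior: device is pinned explicitly by the caller.
    return torch.empty_strided(
        meta.size(), meta.stride(), dtype=meta.dtype, device=device
    )

meta = torch.empty(4, 8, device="meta")
t = materialize_meta_tensor(meta, device="cpu")  # works outside any device context
print(t.shape, t.device.type)
```

Call sites that run outside a device context (layerwise.py, dummy_loader.py) would pass the platform device explicitly instead of relying on context.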
After vllm-project#38426 narrowed the `with target_device:` context to only wrap `initialize_model()`, code that relied on the ambient device context during `load_weights()` started creating tensors on CPU instead of GPU. This fixes three locations:

1. `materialize_meta_tensor()` / `materialize_layer()` — accept an explicit `device` parameter instead of relying on the ambient `torch.device` context.
2. `DummyModelLoader.load_weights()` — passes `device=current_platform.device_type` when materializing meta tensors.
3. `Fp8OnlineMoEMethod.process_weights_after_loading()` — the `torch.ones` calls for `w13_scale` / `w2_scale` now specify `device=layer.w13_weight.device` so the scale tensors land on the same GPU as the already-materialized weights.

Signed-off-by: haosdent <haosdent@gmail.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: Elham Harirpoush <elham.harirpoush@arm.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: Vinay Damodaran <vrdn@hey.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: zhutaoyu <zhutaoyu97@gmail.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: neweyes <328719365@qq.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: EricccYang <yangyang4991@gmail.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>
Purpose
Fix the 2 failed cases of Language Models Tests (Extra Standard) in CI. #38032 modified the initialize_model context manager, which may cause some memory-related issues.

Fails in the latest nightly: https://buildkite.com/vllm/ci/builds/58555#019d3308-7864-45d2-ac05-e01db010dbd5
Fails in commit 648edcf729: https://buildkite.com/vllm/ci/builds/58506#019d30f7-d28f-42f4-976e-b4f385f2fe42

It seems Language Models Tests (Extra Standard) also failed due to this issue.

Test Plan
Run the 2 cases of Language Models Tests (Extra Standard).

Test Result
pass in https://buildkite.com/vllm/ci/builds/58571#019d33f0-f980-4d77-beb2-6af68ca92aed
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.