Revert "[QeRL] Compose online quantization with quantized reloading" (#38032) #38446
zhewenl wants to merge 1 commit into vllm-project:main
…llm-project#38032)" This reverts commit 648edcf.
Code Review
This pull request refactors the online quantization and weight reloading mechanism by introducing just-in-time weight materialization through a patched weight loader. This replaces the previous layer-wise processing approach for FP8 and MoE quantization methods. Additionally, the dummy weight initialization logic is updated to skip parameters on meta devices when online quantization is enabled. A critical issue was identified regarding the removal of the @torch.no_grad() decorator from the initialize_single_dummy_weight function, which is necessary for safe in-place weight modifications.
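The meta-device skip mentioned in the summary can be illustrated with a minimal sketch. The helper name `init_dummy_weights_sketch`, its `online_quant` flag, and the parameter names are hypothetical and are not taken from the vLLM code; the sketch only assumes that parameters awaiting just-in-time materialization live on PyTorch's `meta` device and therefore have no storage to fill.

```python
import torch


def init_dummy_weights_sketch(params, online_quant, low=-1e-3, high=1e-3):
    """Hypothetical helper: fill parameters with dummy values in place.

    When online_quant is True, parameters still on the meta device are
    skipped, on the assumption that they are materialized later.
    """
    initialized = []
    with torch.no_grad():  # in-place writes to parameters need no autograd tracking
        for name, param in params.items():
            if online_quant and param.is_meta:
                continue  # no real storage yet, nothing to initialize
            param.uniform_(low, high)
            initialized.append(name)
    return initialized


# One materialized parameter and one that only exists on the meta device.
params = {
    "dense.weight": torch.nn.Parameter(torch.empty(2, 2)),
    "quant.weight": torch.nn.Parameter(torch.empty(2, 2, device="meta")),
}
initialized = init_dummy_weights_sketch(params, online_quant=True)
```

In this sketch only `dense.weight` is touched; `quant.weight` stays on the meta device until something else materializes it.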
    initialize_single_dummy_weight(param, low, high, seed)

    @torch.no_grad()
The @torch.no_grad() decorator should not be removed from initialize_single_dummy_weight. This function performs in-place modifications on model parameters (param.uniform_). It's crucial to wrap such operations in torch.no_grad() to prevent unintended side effects with the autograd engine and to adhere to best practices for weight manipulation.
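The reviewer's point can be reproduced with a small, self-contained sketch. The functions below are simplified, hypothetical stand-ins and not vLLM's `initialize_single_dummy_weight`; they only show how PyTorch treats an in-place `uniform_` on a parameter with `requires_grad=True`, with and without `torch.no_grad()`.

```python
import torch


@torch.no_grad()
def init_dummy_weight(param, low=-1e-3, high=1e-3, seed=1234):
    """Hypothetical stand-in: seeded in-place init, guarded by no_grad."""
    generator = torch.Generator(device=param.device)
    generator.manual_seed(seed)
    param.uniform_(low, high, generator=generator)


def init_dummy_weight_unguarded(param, low=-1e-3, high=1e-3):
    """Same in-place write, but without the no_grad guard."""
    param.uniform_(low, high)


weight = torch.nn.Parameter(torch.empty(4, 4))  # requires_grad=True by default

# Guarded version: the in-place write is allowed.
init_dummy_weight(weight)

# Unguarded version: autograd rejects an in-place op on a leaf that requires grad.
try:
    init_dummy_weight_unguarded(weight)
    unguarded_error = None
except RuntimeError as exc:
    unguarded_error = str(exc)
```

Whether the unguarded call fails in a given code path depends on whether the parameter requires grad at that point, which is an assumption of this sketch; keeping the decorator makes the function safe either way.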
latest fix #38442
This pull request has merge conflicts that must be resolved before it can be
Revert of #38032
This reverts #38032 (merge commit 648edcf).
Reason: 2 new CI failures in build #58604 traced to this PR:
- test_fp8_kv_scale_compile
- test_online_quantization, test_online_quant_peak_mem, test_online_quant_load_format_dummy

Root cause: reload/layerwise.py:_layerwise_process() calls process_weights_after_loading(), which invokes ops.scaled_fp8_quant() → dynamic_scaled_fp8_quant on CPU tensors, raising NotImplementedError: Could not run '_C::dynamic_scaled_fp8_quant' with arguments from the 'CPU' backend.

Note: Merge conflict in base_loader.py was auto-resolved (trivial formatting conflict from #38426). The resolution preserves current formatting while undoing only #38032 changes; please review carefully.

Auto-generated by CI failure analyzer