[QeRL] Layerwise Reloading #32133
Conversation
Code Review
This pull request refactors the model reloading logic, replacing the previous torchao-specific online quantization mechanism with a more general, layer-wise approach. The new implementation in reload_utils.py leverages meta tensors to manage weight reloading and processing, aiming for better compatibility, especially with CUDA graphs. While the direction is good, the new reload_utils.py file contains several critical bugs that will prevent it from working correctly, including issues with dictionary manipulation, incorrect variable references, and improper attribute access on torch.nn.Module instances. I've provided specific comments and suggestions to fix these issues.
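The meta-tensor mechanism the review refers to can be illustrated with a minimal sketch. The helper `materialize_and_load` is hypothetical and only demonstrates the pattern — building a layer without storage on the `meta` device, then materializing it and copying real weights in — not the actual contents of `reload_utils.py`:

```python
import torch
import torch.nn as nn

def materialize_and_load(layer: nn.Module, state: dict) -> None:
    """Illustrative only: materialize a meta-device layer and copy
    checkpoint weights into it. vLLM's reload_utils.py is more involved;
    this just shows the meta-tensor mechanism."""
    # to_empty() allocates real (uninitialized) storage for meta tensors
    layer.to_empty(device="cpu")
    with torch.no_grad():
        for name, param in layer.named_parameters():
            param.copy_(state[name].to(param.dtype))

# Parameters built under the meta device have shape/dtype but no storage
with torch.device("meta"):
    layer = nn.Linear(4, 4)
assert layer.weight.is_meta

materialize_and_load(layer, {"weight": torch.randn(4, 4), "bias": torch.randn(4)})
assert not layer.weight.is_meta
```

In a real reload path the target device would be the accelerator rather than `"cpu"`, and the state dict would come from the checkpoint's weights iterator.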
Cursor Bugbot has reviewed your changes and found 2 potential issues.
```python
if weights_path is not None:
    self.model_config.model = weights_path
```
Why is this needed? Also, do we need to do validation on this path? It seems like a late stage to change this shared config.
The ability to pass `weights_path` is a nice-to-have feature; it makes testing easier and allows users to pass directory paths, which is a real use case.
Do you think we need to change anything more than `self.model_config.model`?
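One way to add the validation the review asks about is a small guard before overriding the shared config. `apply_weights_path` is a hypothetical helper, not vLLM API; note also that this check is stricter than vLLM's actual behavior, since `model_config.model` may also be a Hub model ID rather than a local directory:

```python
import os

def apply_weights_path(model_config, weights_path):
    """Hypothetical sketch: override the shared model config with a
    user-supplied weights path, validating it first. Only illustrates
    the pattern under discussion; not the PR's actual code."""
    if weights_path is None:
        return
    # Validate early: accept only an existing checkpoint directory on disk
    if not os.path.isdir(weights_path):
        raise ValueError(
            f"weights_path must be an existing directory: {weights_path!r}"
        )
    model_config.model = weights_path
```

In the PR itself the assignment is unconditional; whether (and where) to validate is the open question in this thread.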
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Summary: vllm-project#32133 missed a rebase on vllm-project#32064; this fixes the attention path import.

Test Plan:

```bash
# before this PR, the test runner failed because the old attention
# import path no longer exists
pytest tests/quantization/test_fp8.py -s -x
```

Signed-off-by: vasiliy <vasiliy@fb.com>
Signed-off-by: Pai <416932041@qq.com>
Purpose
Nomenclature
Details
- "Model format": the weights as they exist after `__init__` but before they are processed with `process_weights_after_loading`. The mapping between "checkpoint format" and "model format" is implemented by `model.load_weights`.
- "Kernel format": the weights after they are processed with `process_weights_after_loading`.

Design
For each layer:
a. the layer is materialized on device
b. the weights are loaded into the layer
c. the layer is processed into kernel format
d. the new tensors are copied into the original tensor storage
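The crucial detail in the steps above is (d): results are copied into the existing tensor storage rather than replacing the parameter objects, so pointers captured elsewhere (e.g. by CUDA graphs) remain valid. A minimal sketch of that step (the `reload_layer` helper is hypothetical, and materialization/kernel processing are reduced to stand-ins):

```python
import torch
import torch.nn as nn

def reload_layer(layer: nn.Module, new_state: dict) -> None:
    """Hypothetical sketch of the per-layer reload cycle. Step (a),
    materialization from the meta device, is omitted here; step (c)'s
    "processing" is just a dtype/device cast standing in for real
    kernel-format conversion."""
    with torch.no_grad():
        for name, param in layer.named_parameters():
            # (b)/(c) load the new weights and convert them in place
            processed = new_state[name].to(dtype=param.dtype, device=param.device)
            # (d) in-place copy_ preserves param.data_ptr(), so CUDA graphs
            # that captured the old tensor address stay valid
            param.copy_(processed)

layer = nn.Linear(8, 8)
ptr_before = layer.weight.data_ptr()
reload_layer(layer, {"weight": torch.randn(8, 8), "bias": torch.randn(8)})
assert layer.weight.data_ptr() == ptr_before  # storage unchanged
```

Replacing the parameters outright (e.g. assigning new `nn.Parameter` objects) would change the storage addresses and invalidate any captured CUDA graph.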
Changes
- `record_metadata_for_reloading` is called at the end of `initialize_model`
- `initialize_layerwise_reload` is called before weights are loaded a second time
- `finalize_layerwise_reload` is called after weights are loaded a second time
- `support_quantized_model_reload_from_hp_weights` is a decorator which calls `initialize_layerwise_reload` and `finalize_layerwise_reload`; it preserves backwards compatibility for torchao users
- `LAYER_RELOADING_INFO`
- `lm.collective_rpc("reload_weights")`
- `weights_iterator` directly
- `weights_path` to point to another checkpoint on disk
- `checkpoint_format` controls whether the weights are provided in model format (default) or kernel format (quantized, renamed, and sharded)
- `generate_prompt_perplexity` by allowing users to pass a mask

Testing
- `test_online_quant_config_dict_json` to actually test online quantization via the `model.load_weights` pathway

Future Work