Skip DeepSpeed ZeRO Stage 3 model initialization when bnb #34395
LysandreJik merged 20 commits into huggingface:main
Conversation
SunMarc
left a comment
Thanks for the PR! Left a few comments.
    if hasattr(config, "quantization_config"):
        vision_config.quantization_config = config.quantization_config
        text_config.quantization_config = config.quantization_config
We don't want to have this in the model config; otherwise, we would have to do it for every model. Also, we shouldn't need it, since we quantize the model from the top level. Maybe we can propagate the quantization_config in from_pretrained?
quantization_config is already passed to the model class via from_pretrained, but the sub-models are instantiated using from_config, which does not receive it. Perhaps we can propagate the quantization information using a context manager.
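The context-manager idea can be sketched as a process-local flag that from_pretrained would set before building sub-models, so that nested from_config calls can consult it. This is a minimal sketch; the names set_quantized_state and is_quantized_state are illustrative assumptions, not the actual transformers API:

```python
from contextlib import contextmanager

# Module-level flag; in a real integration this would live in
# modeling_utils. All names here are hypothetical.
_quantized_state = False

@contextmanager
def set_quantized_state():
    """Mark that the model currently being built is intended to be quantized."""
    global _quantized_state
    _quantized_state = True
    try:
        yield
    finally:
        _quantized_state = False

def is_quantized_state():
    """Queried at init time to decide whether to skip ZeRO-3 partitioned init."""
    return _quantized_state
```

from_pretrained would wrap sub-model construction in set_quantized_state(); any from_config call inside that block then sees the flag and can skip deepspeed.zero.Init.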
src/transformers/modeling_utils.py
Outdated
    is_quantized = hasattr(config, "quantization_config")

    if is_deepspeed_zero3_enabled() and not is_quantized:
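The quoted guard enters DeepSpeed's partitioned initialization only when ZeRO-3 is active and the config carries no quantization_config, so the quantizer sees real (non-partitioned) weights. A minimal, DeepSpeed-free stand-in of that decision, with string return values in place of the real deepspeed.zero.Init() context:

```python
from types import SimpleNamespace

def choose_init(config, zero3_enabled):
    # Mirrors the quoted check: models about to be quantized skip
    # ZeRO-3 partitioned init.
    is_quantized = hasattr(config, "quantization_config")
    if zero3_enabled and not is_quantized:
        return "zero.Init"   # real code: with deepspeed.zero.Init(...)
    return "regular"
```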
Not a huge fan of checking the quantization_config here, since we don't really quantize the model with from_config. However, I'm not sure there is an easier solution. Another option would be to pass an arg in kwargs that we then pop.
If we pass an argument to flag quantization, it would require changes in every composed model.
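For comparison, the rejected kwargs alternative would look roughly like this: every caller, i.e. every composed model, would have to pass the flag explicitly, which is the maintenance cost noted above. The is_quantized keyword and the function name are hypothetical:

```python
def from_config_sketch(config, **kwargs):
    # Pop the flag so it is not forwarded to the model constructor.
    is_quantized = kwargs.pop("is_quantized", False)
    use_zero_init = not is_quantized  # gate for deepspeed.zero.Init()
    return use_zero_init, kwargs
```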
@SunMarc @muellerzr What do you think of this solution?
muellerzr
left a comment
Thanks, I think this solution looks quite nice. cc @ArthurZucker
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
SunMarc
left a comment
Works for me! Thanks for iterating and coming up with this solution!
…e#34395)
* Skip DeepSpeed ZeRO Stage 3 model initialization when it is intended to be quantized.
* Propagate the quantization state using a context manager
* make fixup
What does this PR do?
Skip DeepSpeed ZeRO Stage 3 model initialization when it is intended to be quantized.
Fixes #34378