Support loading hugging face checkpoint #1165
Conversation
* Support loading checkpoint with INC
* load_cp explanation
* Add torch_dtype bf16 for the model (see the sketch below)
* Support getting quantized weight tensor

Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
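The "Add torch_dtype bf16 for the model" item refers to creating the model directly in bfloat16. A minimal sketch of what that typically looks like with transformers; the model id here is a placeholder and the PR itself forwards **model_kwargs assembled by the run script rather than hard-coded arguments:

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: load the model weights in bfloat16 via torch_dtype.
# "gpt2" is a placeholder model id, not the model used in the PR.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    torch_dtype=torch.bfloat16,
)
print(next(model.parameters()).dtype)  # torch.bfloat16
```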
```python
        **model_kwargs,
    )
elif args.load_cp:
    from neural_compressor.torch.quantization import load
```
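For context, this is where the new code path hands model loading off to Intel Neural Compressor (INC) instead of the usual from_pretrained call. Below is a minimal sketch of such a branch, assuming INC's load() accepts a checkpoint path plus an original_model to restore into; the helper name, argument names, and the bf16 original-model construction are illustrative and not copied from the PR:

```python
import torch
from transformers import AutoModelForCausalLM

def build_model(args, model_kwargs):
    # Hypothetical helper showing the dispatch; names are assumptions.
    if args.load_cp:
        # Delegate checkpoint loading to Intel Neural Compressor (INC).
        # The exact load() signature depends on the installed INC version.
        from neural_compressor.torch.quantization import load

        original_model = AutoModelForCausalLM.from_pretrained(
            args.model_name_or_path, torch_dtype=torch.bfloat16
        )
        return load(args.model_name_or_path, original_model=original_model)
    # Default path: plain Hugging Face loading with the prepared kwargs.
    return AutoModelForCausalLM.from_pretrained(
        args.model_name_or_path, **model_kwargs
    )
```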
@ulivne sounds like neural_compressor is missing from the requirements; consider adding it!
neural_compressor is installed automatically as part of the Habana software stack. It replaces habana_quantization_toolkit, which was also not part of the requirements.
Removed the test as it uses a Hugging Face model, and we are not sure that we can use it in our open source code (in terms of license and Intel regulations).
This flag triggers the Neural Compressor load API, which also supports loading high precision models and is planned to support loading fp8 models in the future. It is correct that currently only 4-bit is supported for Gaudi; however, I think we should keep the load_cp name to avoid changing it later when we add loading of fp8 models.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
    action="store_true",
    help="Whether to load model from hugging face checkpoint.",
)
```
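Pieced together, the new argument presumably looks like the snippet below. Only the action and help text appear in the diff fragment above; the flag name comes from args.load_cp in the earlier diff, and the parser setup around it is assumed for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
# Reconstructed from the diff fragment above; the surrounding parser
# setup is an assumption, not part of the PR.
parser.add_argument(
    "--load_cp",
    action="store_true",
    help="Whether to load model from hugging face checkpoint.",
)
args = parser.parse_args()
```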
Co-authored-by: yan tomsinsky <ytomsinsky@habana.ai>
Co-authored-by: Libin Tang <litang@habana.ai>
Support loading checkpoint with INC
load_cp explanation
Add torch_dtype bf16 for the model
Support getting quantized weight tensor