Hqq serialization #33141

SunMarc merged 29 commits into huggingface:main from mobiusml:hqq_serialization
Conversation
SunMarc left a comment:
Nice! Let's fix the issue regarding the torchao backend and we can merge this. I left a few comments.
|
2/3: Multi-gpu loading
3/3: state_dict on the same safetensor chunk (see the sketch after this comment)

model_id = 'meta-llama/Meta-Llama-3-8B-Instruct' # OK
model_id = 'meta-llama/Meta-Llama-3-70B' # OK
model_id = "facebook/opt-125m" # OK
model_id = "meta-llama/Llama-2-13b-chat-hf" # OK
model_id = "microsoft/Phi-3-mini-128k-instruct" # OK
model_id = "google/gemma-2-9b-it" # OK
model_id = "google/gemma-2-2b" # OK

so I think for the moment we can leave it until someone reports an issue; I can't reproduce the problem anyway. Next steps:
|
|
Regarding this: #33141 (comment) |
|
Just out of curiosity, what's missing before this can be merged?
Waiting for @mobicham to check the latest review and give me the heads-up to merge! This should be done soon! Also, it looks like there are some conflicts to fix.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
|
Thanks for iterating @mobicham! Merging! |
* HQQ model serialization attempt
* fix hqq dispatch and unexpected keys
* style
* remove check_old_param
* revert to check HQQLinear in quantizer_hqq.py
* revert to check HQQLinear in quantizer_hqq.py
* update HqqConfig default params
* make ci happy
* make ci happy
* revert to HQQLinear check in quantizer_hqq.py
* check hqq_min version 0.2.0
* set axis=1 as default in quantization_config.py
* validate_env with hqq>=0.2.0 version message
* deprecated hqq kwargs message
* make ci happy
* remove run_expected_keys_check hack + bump to 0.2.1 min hqq version
* fix unexpected_keys hqq update
* add pre_quantized check
* add update_expected_keys to base quantizer
* ci base.py fix?
* ci base.py fix?
* fix "quantization typo" src/transformers/utils/quantization_config.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* fix post merge

Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
|
@mobicham minor documentation issue, but the transformers documentation page for quantization has a giant features matrix which still says serialization of HQQ models is not supported: https://huggingface.co/docs/transformers/main/quantization/overview
|
Would you like to open a PR to fix this @rohit-gupta ? |
|
@rohit-gupta thanks for flagging ! |
|
Now model.save_pretrained(save_path) gives this:
|
@blap is this related to the latest transformers changes? Otherwise, which hqq version causes this?
I think so. I didn't have this problem when hqq support was first released in transformers.
|
Transformers version 4.48.0.dev0 still has this problem... |
|
Can anyone from the HF team track down this problem, please? What changed? Nothing much changed on the hqq lib side.
|
@SunMarc ? |
|
Can you share your script @blap ? I'll have a look asap ! |
Error: |
|
So... |
|
@blap why don't you use the latest release? It worked fine the last time I tried (last week).
Which version do you use? Version 4.45.2 gives me this:
|
@blap |
I just got the same error in this version too. |
# pip install transformers==4.47.0
# pip install hqq --upgrade
##################################################################
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"
quant_model = "quant_model"

quant_config = HqqConfig(nbits=4, group_size=64, axis=1)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    cache_dir=".",
    device_map="cuda:0",
    quantization_config=quant_config,
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.save_pretrained(quant_model)
tokenizer.save_pretrained(quant_model)
|
I found the problem: |
Hmm interesting, thanks for flagging! Fixed here. Would recommend using 64 or 128 though; some of the fast kernels like Marlin in vLLM and TinyGemm in torchao don't support other group sizes.
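To make that concrete, a config following the recommendation might look like this (a sketch; nbits and axis are carried over from the script above):

```python
from transformers import HqqConfig

# group_size=64 (or 128) keeps the quantized layout compatible with fast
# kernels such as Marlin (vLLM) and TinyGemm (torchao).
quant_config = HqqConfig(nbits=4, group_size=64, axis=1)
```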
Follow-up to #32379
The goal of this PR is to add full support for saving/loading HQQ-quantized models directly in transformers. So far, serialization was done on the hqq-lib side via the .pt format, which is not safe and doesn't work with very large models (>100B params) since the model is not sharded.

What was done in this PR:

* Added an update_expected_keys() call in the quantizer. This allows loading quantized models whose modules were initialized with torch.nn.Linear instead of HQQLinear.

Full gist to try it out: https://gist.github.com/mobicham/701dd564c52590203ee09631425ad797
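To round this off, here is a minimal save/load round trip in the spirit of the linked gist (a sketch only; the model id, save directory, and generation call are illustrative and not the exact gist contents):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
save_dir = "llama3-8b-hqq-4bit"

# Quantize on the fly with HQQ, then serialize directly through transformers.
quant_config = HqqConfig(nbits=4, group_size=64, axis=1)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="cuda:0",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained(save_dir)      # sharded safetensors, no .pt file involved
tokenizer.save_pretrained(save_dir)

# Reload the pre-quantized checkpoint; the quantizer's update_expected_keys()
# maps the saved HQQ state dict onto modules that were created as torch.nn.Linear.
reloaded = AutoModelForCausalLM.from_pretrained(
    save_dir,
    torch_dtype=torch.float16,
    device_map="cuda:0",
)
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(reloaded.generate(**inputs, max_new_tokens=16)[0]))
```

On reload, from_pretrained picks up the quantization config stored in the checkpoint, so no HqqConfig needs to be passed again.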