Deprecated `shard_checkpoint`'s replacement `save_torch_state_dict` does not save tied embeddings #35080
Comments
Thanks for the report! The problem is that `save_torch_state_dict` discards duplicated (tied) tensors from the state dict, and it has no way of knowing which of the tied keys should be kept for a given model. You can either do the cleaning step yourself (copy […]) or […]. Also, I wanted to know why you decided not to use […]?
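To make the failure mode concrete, here is a minimal sketch (an illustration, not code from the thread) of what a tied state dict looks like; the key names mirror the Qwen2 layout discussed below:

```python
import torch

embed = torch.nn.Embedding(10, 4)
lm_head = torch.nn.Linear(4, 10, bias=False)
lm_head.weight = embed.weight  # tie the weights, as transformers does for tied embeddings

state_dict = {
    "model.embed_tokens.weight": embed.weight,
    "lm_head.weight": lm_head.weight,
}

# Both keys alias the same storage, so a saver that strips duplicates
# must choose exactly one of them to keep.
print(
    state_dict["model.embed_tokens.weight"].data_ptr()
    == state_dict["lm_head.weight"].data_ptr()
)  # True
```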
I would prefer a fix in huggingface_hub. Why was the behavior changed from `shard_checkpoint`? I'm generally not sure the workaround you suggested is feasible for me: doesn't it require knowing, per model, which keys should not be discarded?
It's been a long time since I implemented `shard_checkpoint`.
cc @hanouticelina, as you've been working on this lately. Would it be possible to check what we can do in `huggingface_hub`?
A fix would be to modify […] so that tied tensors are kept in the saved file. The trade-off is that the resulting safetensors file might be larger. Also, as far as I know, some frameworks (like TensorFlow) don't support shared tensors, which could limit cross-framework compatibility.
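For illustration, a hedged sketch of what keeping the tied weight amounts to from the caller's side today (my sketch, not code from the thread; `model` and `save_dir` are assumed from the AutoAWQ context):

```python
from huggingface_hub import save_torch_state_dict

state_dict = model.state_dict()
# Cloning breaks the storage sharing, so the duplicate-stripping pass no
# longer sees these keys as aliases and both are written to disk
# (at the cost of the larger file discussed above).
state_dict["model.embed_tokens.weight"] = state_dict["model.embed_tokens.weight"].clone()
save_torch_state_dict(state_dict=state_dict, save_directory=save_dir, force_contiguous=True)
```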
I would appreciate if this change could make it into the new method :) |
After reviewing the issue and looking at how AutoAWQ uses `save_torch_state_dict`, we have access to the model itself. So to fix this issue, we will add a helper function in `huggingface_hub` to handle the priority for discarding duplicate keys (similar to transformers' […]). Once this is fixed in huggingface_hub, you'll be able to simply use `save_torch_model`:

```diff
+ from huggingface_hub import save_torch_model
  ...
- save_torch_state_dict(
-     state_dict=self.model.state_dict(),
+ save_torch_model(
+     model=self.model,
      save_directory=save_dir,
      max_shard_size=shard_size,
      safe_serialization=safetensors,
      force_contiguous=True,
  )
```

Would this solution work for you?
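(A note on the design choice: a bare `state_dict` can show that two entries alias the same storage, but not which name downstream loaders expect. With the `nn.Module` itself available, the saver also has the module structure, and potentially config such as `tie_word_embeddings`, to decide which key to keep.)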
@hanouticelina Yes, that would work for me, as long as the model is saved correctly in the case of tied weights. Currently, AutoAWQ quantization is broken from version 0.2.7 because of this issue. So users will have to take extra steps as seen in casper-hansen/AutoAWQ#665 until this fix can be landed in AutoAWQ. |
Could you test with this PR to see if it solves the issue, @casper-hansen?
@casper-hansen After reviewing this further, we decided not to add the duplicate-key handling logic directly in […].
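For readers following along: the released fix, if I recall it correctly, adds a `shared_tensors_to_discard` parameter to `save_torch_state_dict` and `save_torch_model`; treat the parameter name as an assumption and check it against your installed `huggingface_hub`. A minimal sketch (`model` and `save_dir` assumed as elsewhere in the thread):

```python
from huggingface_hub import save_torch_state_dict

save_torch_state_dict(
    state_dict=model.state_dict(),
    save_directory=save_dir,
    # Explicitly drop the head and keep the embedding key that loaders expect.
    shared_tensors_to_discard=["lm_head.weight"],
)
```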
@hanouticelina @SunMarc this looks good. Is my understanding correct that passing in […]?
Ok, so I validated this works when I pass in […]:

```python
import os

import safetensors

quant_path = "Qwen2.5-0.5B-Instruct-awq"

with safetensors.safe_open(
    os.path.join(quant_path, "model.safetensors"), framework="pt", device="cpu"
) as f:
    # With the fix, the tied embedding key is present again.
    print("model.embed_tokens.weight" in f.keys())  # True
```
@casper-hansen we've just released a patch for `huggingface_hub`. Feel free to ping us if there is any additional question or issue related to that!
Thanks @hanouticelina for the quick fix + release. |
System Info

Who can help?

@SunMarc @ArthurZucker

Information

Tasks

- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
`shard_checkpoint` has been deprecated in favor of `save_torch_state_dict`, which is why I updated the saving mechanism in AutoAWQ to use the new method in casper-hansen/AutoAWQ#644. However, there seems to be a problem where tied embeddings are not correctly saved, causing failures at load time in vLLM and potentially other places not yet identified. Overview from casper-hansen/AutoAWQ#665, where you can also see the full reproduction scripts and the issues caused:
- Tied weights: `model.embed_tokens` and `lm_head`
- Model: `Qwen/Qwen2.5-1.5B-Instruct`
- `transformers==4.46.3`
- Load and save with `autoawq==0.2.6` (`shard_checkpoint`) vs. `autoawq==0.2.7.post2` (`save_torch_state_dict`)
Expected behavior

`shard_checkpoint` seems to have saved tied weights, which matter to a lot of engines compatible with Hugging Face transformers. The expected behavior is therefore that `save_torch_state_dict` would also save them, since we are being migrated to this new method.