Fix bugs in initial_load_in_hf when enable_weight_tying=true in Qwen3 #1964
Changes from 3 commits
```diff
@@ -104,6 +104,8 @@ def to_hf(self, state_dict: dict[str, Any]) -> dict[str, Any]:
             else:
                 if key not in to_hf_map:
                     continue
+                if self.model_args.enable_weight_tying and key == "output.weight":
+                    continue
                 new_key = to_hf_map[key]
                 hf_state_dict[new_key] = value
```
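For intuition, here is a minimal, self-contained sketch of the export direction. This is not the torchtitan implementation; the `to_hf_map` entries and the `to_hf_sketch` helper are illustrative. With weight tying enabled, `output.weight` is the same tensor as `tok_embeddings.weight`, and an HF Qwen3 checkpoint with tied embeddings does not store `lm_head.weight`, so the adapter now skips that key on export.

```python
import torch

# Illustrative subset of a TorchTitan -> HF key mapping (hypothetical entries).
to_hf_map = {
    "tok_embeddings.weight": "model.embed_tokens.weight",
    "output.weight": "lm_head.weight",
}

def to_hf_sketch(state_dict: dict, enable_weight_tying: bool) -> dict:
    hf_state_dict = {}
    for key, value in state_dict.items():
        if key not in to_hf_map:
            continue
        # With tied weights, output.weight aliases tok_embeddings.weight and the
        # HF checkpoint carries no lm_head.weight, so skip it on export.
        if enable_weight_tying and key == "output.weight":
            continue
        hf_state_dict[to_hf_map[key]] = value
    return hf_state_dict

emb = torch.randn(8, 4)
tied_sd = {"tok_embeddings.weight": emb, "output.weight": emb}  # tied: same tensor
print(sorted(to_hf_sketch(tied_sd, enable_weight_tying=True)))
# ['model.embed_tokens.weight']
```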
```diff
@@ -118,6 +120,11 @@ def from_hf(self, hf_state_dict: dict[str, Any]) -> dict[str, Any]:
         state_dict = {}
         expert_weights_by_layer = {}  # {layer: {abstract_key: {expert_id: tensor}}}

+        if self.model_args.enable_weight_tying and "lm_head.weight" not in hf_state_dict:
+            if "model.embed_tokens.weight" in hf_state_dict:
+                hf_state_dict = dict(hf_state_dict)  # Make a copy to avoid modifying original
```
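The hunk above is truncated in this view. A plausible completion, sketched under the assumption that the adapter back-fills `lm_head.weight` from the embedding weight when a tied checkpoint omits it (as Qwen3-4B does); the `from_hf_sketch` helper and its mapping are illustrative, not the exact torchtitan code:

```python
def from_hf_sketch(hf_state_dict: dict, enable_weight_tying: bool) -> dict:
    # Hypothetical inverse mapping for illustration only.
    from_hf_map = {
        "model.embed_tokens.weight": "tok_embeddings.weight",
        "lm_head.weight": "output.weight",
    }
    # Tied checkpoints (e.g. Qwen3-4B) omit lm_head.weight entirely, so reuse the
    # embedding weight for the output projection before translating keys.
    if enable_weight_tying and "lm_head.weight" not in hf_state_dict:
        if "model.embed_tokens.weight" in hf_state_dict:
            hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]
    return {from_hf_map[k]: v for k, v in hf_state_dict.items() if k in from_hf_map}
```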
Thanks. The input `hf_state_dict` will not be used after calling the `from_hf()` function:

`state_dict = self.sd_adapter.from_hf(hf_state_dict)`

It should be fine to mutate the dictionary object (`hf_state_dict`) directly instead of making a shallow copy.

Ok, I've removed the shallow copy.
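For reference, the two variants being discussed, as a sketch (the `fill_lm_head_*` helper names are hypothetical):

```python
def fill_lm_head_with_copy(hf_state_dict: dict) -> dict:
    # Earlier revision: shallow-copy first so the caller's dict is left untouched.
    hf_state_dict = dict(hf_state_dict)
    hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]
    return hf_state_dict

def fill_lm_head_in_place(hf_state_dict: dict) -> dict:
    # Final revision: the caller discards hf_state_dict after
    # state_dict = self.sd_adapter.from_hf(hf_state_dict), so mutate it directly.
    hf_state_dict["lm_head.weight"] = hf_state_dict["model.embed_tokens.weight"]
    return hf_state_dict
```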
Checking the 0.6B and 1.7B model weights, they do store separate weights for embed_tokens and lm_head, and I assumed these two weights are identical (please correct me if I am wrong), so loading the same weights twice is fine here.

1.7B: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/model.safetensors.index.json

0.6B: https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/model.safetensors
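A quick way to check which repos actually ship a separate `lm_head.weight` (a sketch, assuming `huggingface_hub` and `safetensors` are installed; the `has_lm_head` helper is illustrative):

```python
import json
from huggingface_hub import hf_hub_download
from safetensors import safe_open

def has_lm_head(repo_id: str) -> bool:
    try:
        # Sharded checkpoints list every parameter in the index's weight_map.
        index_path = hf_hub_download(repo_id, "model.safetensors.index.json")
        with open(index_path) as f:
            keys = json.load(f)["weight_map"].keys()
    except Exception:
        # Single-file checkpoints (e.g. Qwen3-0.6B) have no index; read keys directly.
        shard_path = hf_hub_download(repo_id, "model.safetensors")
        with safe_open(shard_path, framework="pt") as f:
            keys = f.keys()
    return "lm_head.weight" in keys
```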
I see, your change makes sense. Our previous code would fail when loading the 4B model weights: the 4B model doesn't have "lm_head.weight" in its checkpoint files, but our translated `hf_state_dict` would still contain the key `lm_head.weight`. Did you verify the updated code is still on par with the HF forward pass? cc @shuhuayu
Thanks for catching the bug when loading the Qwen3 4B model. I did a forward parity check and it works well.
