11 changes: 11 additions & 0 deletions torchtitan/models/qwen3/model/state_dict_adapter.py
@@ -104,6 +104,8 @@ def to_hf(self, state_dict: dict[str, Any]) -> dict[str, Any]:
else:
if key not in to_hf_map:
continue
if self.model_args.enable_weight_tying and key == "output.weight":
@wwwjn (Contributor) commented on Oct 29, 2025:
Checking the 0.6B and 1.7B model weights, they do have separate tensors for embed_tokens and lm_head, and I assumed these two weights are identical (please correct me if I am wrong), so loading the same weights twice is fine here.

1.7B: https://huggingface.co/Qwen/Qwen3-1.7B/blob/main/model.safetensors.index.json
0.6B: https://huggingface.co/Qwen/Qwen3-0.6B/blob/main/model.safetensors

I see, your change makes sense. Our previous code would fail when loading the 4B model weights: the 4B checkpoint does not contain "lm_head.weight", but our translated hf_state_dict would still have the key lm_head.weight. Did you verify the updated code is still on par with the HF forward pass? cc @shuhuayu

Contributor replied:

Thanks for catching the bug when loading the Qwen3 4B model. I ran a forward parity check and it passes.
[screenshot of the parity-check output]
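The forward parity check mentioned in the reply can be sketched framework-agnostically. The helper `forward_parity` and its callable arguments are hypothetical stand-ins for the torchtitan and HF forward passes; in practice the two callables would each run a model on the same token batch and return flattened logits.

```python
def forward_parity(forward_a, forward_b, inputs, atol=1e-5):
    """Return True if two forward functions agree element-wise within atol.

    forward_a / forward_b are hypothetical callables standing in for the
    torchtitan model and the converted-checkpoint HF model; inputs is a
    shared token batch. Both are assumed to return flat sequences of floats.
    """
    out_a = forward_a(inputs)
    out_b = forward_b(inputs)
    if len(out_a) != len(out_b):
        return False
    return all(abs(a - b) <= atol for a, b in zip(out_a, out_b))
```

A real check would compare full logit tensors (e.g. with `torch.allclose`), but the structure is the same: identical inputs, element-wise tolerance comparison on the outputs.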

continue
new_key = to_hf_map[key]
hf_state_dict[new_key] = value
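The hunk above can be illustrated in isolation. This is a minimal sketch, not the adapter itself: `to_hf_map` here holds only the two keys relevant to weight tying, and the function takes `enable_weight_tying` as a plain argument rather than reading `self.model_args`. Under weight tying, `output.weight` is the same tensor as the embedding, so it is skipped instead of being exported as `lm_head.weight`.

```python
# Hypothetical two-entry mapping; the real to_hf_map covers all model keys.
to_hf_map = {
    "tok_embeddings.weight": "model.embed_tokens.weight",
    "output.weight": "lm_head.weight",
}

def to_hf(state_dict, enable_weight_tying):
    hf_state_dict = {}
    for key, value in state_dict.items():
        if key not in to_hf_map:
            continue
        # Tied-weight checkpoints (e.g. Qwen3-4B) carry no lm_head.weight,
        # so do not emit one when exporting.
        if enable_weight_tying and key == "output.weight":
            continue
        hf_state_dict[to_hf_map[key]] = value
    return hf_state_dict
```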

@@ -118,6 +120,15 @@ def from_hf(self, hf_state_dict: dict[str, Any]) -> dict[str, Any]:
state_dict = {}
expert_weights_by_layer = {} # {layer: {abstract_key: {expert_id: tensor}}}

if (
self.model_args.enable_weight_tying
and "lm_head.weight" not in hf_state_dict
):
if "model.embed_tokens.weight" in hf_state_dict:
hf_state_dict["lm_head.weight"] = hf_state_dict[
"model.embed_tokens.weight"
]

for key, value in hf_state_dict.items():
if "mlp.experts" in key:
abstract_key = re.sub(r"(\d+)", "{}", key, count=2)
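The `from_hf` side of the change can likewise be sketched standalone. This hypothetical helper mirrors the diff's logic with `enable_weight_tying` as a plain argument: if a tied-weight HF checkpoint omits `lm_head.weight`, alias it to the embedding tensor before the usual key translation runs.

```python
def patch_tied_lm_head(hf_state_dict, enable_weight_tying):
    """Alias lm_head.weight to the embedding when a tied checkpoint omits it."""
    if enable_weight_tying and "lm_head.weight" not in hf_state_dict:
        if "model.embed_tokens.weight" in hf_state_dict:
            # Same underlying tensor serves both roles under weight tying.
            hf_state_dict["lm_head.weight"] = hf_state_dict[
                "model.embed_tokens.weight"
            ]
    return hf_state_dict
```

Note this aliases rather than copies, matching the diff: both keys point at the same object, which is exactly the tied-weight invariant.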