Add qwen 2.5 #8355 (Merged)
### Commits (14)

- `fc07cc2` Add qwen 2.5 (jackzhxng)
- `110abd0` Fix output embedding (jackzhxng)
- `42fdb0d` Comment / lint (jackzhxng)
- `3ab0bd9` Add 1.5 config (jackzhxng)
- `0a17e3b` Comment (jackzhxng)
- `a27ed67` Remove qwen rope, use hf rope instead (jackzhxng)
- `8aadf45` Back to meta (jackzhxng)
- `8b0b9f9` Parametrize qkv bias (jackzhxng)
- `52d7a11` Parametrize use hf rope (jackzhxng)
- `347c6fb` Clean up convert_weights (jackzhxng)
- `44aa34d` Add README.md (jackzhxng)
- `93064d2` Bias for static attention (jackzhxng)
- `7f398c5` Merge branch 'main' into jz/export_qwen (jackzhxng)
- `d25aaaa` Merge branch 'main' into jz/export_qwen (jackzhxng)
**examples/models/qwen2_5/1_5b_config.json**

```json
{
  "dim": 1536,
  "ffn_dim_multiplier": 1,
  "hidden_dim": 8960,
  "n_heads": 12,
  "n_kv_heads": 2,
  "n_layers": 28,
  "norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "use_scaled_rope": false,
  "vocab_size": 151936,
  "use_hf_rope": true,
  "attention_qkv_bias": true
}
```
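The attention geometry implied by this config can be sanity-checked with a little arithmetic. A minimal sketch, using values copied from the config above and the usual `dim / n_heads` head-width convention (the per-head width and GQA group size are conventional derivations, not fields the config states explicitly):

```python
# Values copied from 1_5b_config.json above.
config = {
    "dim": 1536,
    "hidden_dim": 8960,
    "n_heads": 12,
    "n_kv_heads": 2,
    "n_layers": 28,
    "vocab_size": 151936,
}

# Assumed conventional derivations:
head_dim = config["dim"] // config["n_heads"]           # width of each attention head
gqa_group = config["n_heads"] // config["n_kv_heads"]   # query heads sharing each KV head

print(head_dim)   # 128
print(gqa_group)  # 6
```

The `n_kv_heads: 2` entry is what makes this a grouped-query attention model: six query heads read from each shared key/value head, which is why the converter below maps separate (and smaller) `wk`/`wv` projections.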
**examples/models/qwen2_5/README.md**

## Summary
Qwen 2.5 is the latest iteration of the Qwen series of large language models (LLMs) developed by Alibaba. At the moment, only the 1.5b variant is supported, with plans to add the 0.5b and 3b versions in the future.

## Instructions

Qwen 2.5 uses the same example code as Llama, while the checkpoint, model params, and tokenizer are different. Please see the [Llama README page](../llama/README.md) for details.

All commands for exporting and running Llama on various backends should also be applicable to Qwen 2.5 by swapping in the following args:
```
--model qwen2_5
--params examples/models/qwen2_5/1_5b_config.json
--checkpoint <path-to-meta-checkpoint>
```

### Generate the Checkpoint
The original checkpoint can be obtained from HuggingFace:
```
huggingface-cli download Qwen/Qwen2.5-1.5B
```

We then convert it to Meta's checkpoint format:
```
python examples/models/qwen2_5/convert_weights.py <path-to-checkpoint-dir> <output-path>
```

### Example export and run
Here is a basic example of exporting and running Qwen 2.5; please refer to the [Llama README page](../llama/README.md) for more advanced usage.

Export to XNNPACK, no quantization:
```
# No quantization
# Set these paths to point to the downloaded files
QWEN_CHECKPOINT=path/to/checkpoint.pth

python -m examples.models.llama.export_llama \
  --model "qwen2_5" \
  --checkpoint "${QWEN_CHECKPOINT:?}" \
  --params examples/models/qwen2_5/1_5b_config.json \
  -kv \
  --use_sdpa_with_kv_cache \
  -d fp32 \
  -X \
  --metadata '{"get_bos_id":151643, "get_eos_ids":[151643]}' \
  --output_name="qwen2_5-1_5b.pte" \
  --verbose
```

Run using the executor runner:
```
# Running with the C++ executor runner is currently a work in progress:
# the HuggingFace JSON tokenizer still needs to be enabled in C++.
# In the meantime, you can run with the example Python runner via pybindings:

python -m examples.models.llama.runner.native \
  --model qwen2_5 \
  --pte <path-to-pte> \
  -kv \
  --tokenizer <path-to-tokenizer>/tokenizer.json \
  --tokenizer_config <path-to-tokenizer>/tokenizer_config.json \
  --prompt "Who is the founder of Meta?" \
  --params examples/models/qwen2_5/1_5b_config.json \
  --max_len 64 \
  --temperature 0
```
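As a side note, the `--metadata` argument in the export command above is a small JSON payload handed to the exporter. A minimal sketch of what it decodes to, with the id `151643` taken verbatim from the command above:

```python
import json

# The exact string passed to --metadata in the export command above.
metadata = json.loads('{"get_bos_id":151643, "get_eos_ids":[151643]}')

# Qwen 2.5 uses a single special-token id here for both BOS and EOS,
# so generation stops once id 151643 is emitted.
print(metadata["get_bos_id"])   # 151643
print(metadata["get_eos_ids"])  # [151643]
```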
**examples/models/qwen2_5/convert_weights.py**

```python
import argparse
from typing import Dict

import torch

from torchtune.models.convert_weights import get_mapped_key
from torchtune.training import FullModelHFCheckpointer

# Standard _FROM_META weight mapping of Meta weights to TorchTune + additional bias weight mappings.
_QWEN_2_FROM_META = {
    "tok_embeddings.weight": "tok_embeddings.weight",
    "norm.weight": "norm.scale",
    "layers.{}.attention.wk.weight": "layers.{}.attn.k_proj.weight",
    "layers.{}.attention.wk.bias": "layers.{}.attn.k_proj.bias",
    "layers.{}.attention.wq.weight": "layers.{}.attn.q_proj.weight",
    "layers.{}.attention.wq.bias": "layers.{}.attn.q_proj.bias",
    "layers.{}.attention.wv.weight": "layers.{}.attn.v_proj.weight",
    "layers.{}.attention.wv.bias": "layers.{}.attn.v_proj.bias",
    "layers.{}.attention.wo.weight": "layers.{}.attn.output_proj.weight",
    "layers.{}.attention_norm.weight": "layers.{}.sa_norm.scale",
    "layers.{}.ffn_norm.weight": "layers.{}.mlp_norm.scale",
    "layers.{}.feed_forward.w1.weight": "layers.{}.mlp.w1.weight",
    "layers.{}.feed_forward.w2.weight": "layers.{}.mlp.w2.weight",
    "layers.{}.feed_forward.w3.weight": "layers.{}.mlp.w3.weight",
}


def qwen_2_tune_to_meta(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    """
    Convert a state dict from torchtune's format to Meta's format. This function
    doesn't handle any sharding or splitting of state dicts. It follows the
    state_dict IN -> state_dict OUT pattern.

    Args:
        state_dict (Dict[str, torch.Tensor]): State dict in torchtune's format.

    Returns:
        Dict[str, torch.Tensor]: State dict in Meta's format.
    """
    converted_state_dict = {}
    inverted_mapping_dict = {v: k for k, v in _QWEN_2_FROM_META.items()}

    for key, value in state_dict.items():
        new_key = get_mapped_key(key, inverted_mapping_dict)
        converted_state_dict[new_key] = value

    # The 0.5b and 1.5b models share the same weights for tok_embeddings and
    # output embeddings, see https://github.com/QwenLM/Qwen2.5/issues/733.
    converted_state_dict["output.weight"] = converted_state_dict[
        "tok_embeddings.weight"
    ]

    return converted_state_dict


def main():
    parser = argparse.ArgumentParser(
        description="Convert Qwen2 weights to Meta format."
    )
    parser.add_argument(
        "input_dir",
        type=str,
        help="Path to directory containing checkpoint files",
    )
    parser.add_argument("output", type=str, help="Path to the output checkpoint")

    args = parser.parse_args()

    # We don't necessarily need to use the TorchTune checkpointer; we could also
    # aggregate the checkpoint files ourselves.
    checkpointer = FullModelHFCheckpointer(
        checkpoint_dir=args.input_dir,
        checkpoint_files=["model.safetensors"],
        output_dir=".",
        model_type="QWEN2",
    )

    print("Loading checkpoint...")
    sd = checkpointer.load_checkpoint()

    print("Converting checkpoint...")
    sd = qwen_2_tune_to_meta(sd["model"])

    torch.save(sd, args.output)
    print(f"Checkpoint saved to {args.output}")


if __name__ == "__main__":
    main()
```
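torchtune's `get_mapped_key` is what lets the `{}`-templated entries in `_QWEN_2_FROM_META` match any layer index. A minimal re-implementation sketch of that idea (this is an illustrative stand-in, not torchtune's actual code), using two entries from the inverted tune-to-meta mapping above:

```python
import re

# Two entries from the inverted (torchtune name -> Meta name) mapping above.
MAPPING = {
    "layers.{}.attn.q_proj.weight": "layers.{}.attention.wq.weight",
    "norm.scale": "norm.weight",
}


def map_key(key: str, mapping: dict) -> str:
    """Abstract the layer index out of `key`, look up the template, substitute back."""
    m = re.search(r"\d+", key)
    if m:
        abstract = re.sub(r"\d+", "{}", key, count=1)   # e.g. "layers.{}.attn.q_proj.weight"
        return mapping[abstract].format(m.group(0))     # re-insert the concrete layer index
    return mapping[key]                                 # non-layered keys map directly


print(map_key("layers.3.attn.q_proj.weight", MAPPING))  # layers.3.attention.wq.weight
print(map_key("norm.scale", MAPPING))                   # norm.weight
```

This is also why the converter only needs one dict for all 28 layers: the layer index never appears in the mapping, only in the keys being translated.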