convert.py : Get rope scale from HuggingFace models #2772
Conversation
I am looking at the

```json
"rope_scaling": {
    "factor": 4.0,
    "type": "linear"
},
```

This PR seems to ignore the `type` field.
Makes sense! I added that check now, and tested it to make sure it is ignored if the type = linear check fails (and that it still works if type is linear).
convert.py
Outdated
```python
n_head = config["num_attention_heads"]
n_head_kv = config["num_key_value_heads"] if "num_key_value_heads" in config else n_head
f_norm_eps = config["rms_norm_eps"]
f_rope_scale = config.get("rope_scaling", {}).get("factor", None) if config.get("rope_scaling", {}).get("type", "") == "linear" else None
```
nit: dict.get returns None as a fallback by default. This would be more readable:
```diff
-f_rope_scale = config.get("rope_scaling", {}).get("factor", None) if config.get("rope_scaling", {}).get("type", "") == "linear" else None
+rope_scaling = config.get("rope_scaling", {})
+f_rope_scale = rope_scaling.get("factor") if rope_scaling.get("type") == "linear" else None
```
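For reference, the suggested extraction can be exercised on its own against the example config quoted at the top of this thread (the dict below is just that snippet, not a full config.json):

```python
# Standalone check of the suggested extraction logic, using the example
# rope_scaling block quoted earlier in this thread.
config = {
    "rope_scaling": {
        "factor": 4.0,
        "type": "linear",
    },
}

rope_scaling = config.get("rope_scaling", {})
f_rope_scale = rope_scaling.get("factor") if rope_scaling.get("type") == "linear" else None

print(f_rope_scale)  # 4.0 here; None whenever "type" is anything other than "linear"
```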
I would be happy with even a couple of plain `if` statements. But yeah, that one-liner is a bit hard to read; anything that improves the readability would be good.
I was overfitting a bit to the other implementations, trying to match the style. I am not a fan of ternary operators or long lines, so I would also prefer an if/else implementation (tested and committed now).
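The committed if/else version is not quoted in this thread, but a sketch along the lines being discussed could look like this (variable names follow the diff above; the exact committed code may differ, and note that it still assumes `rope_scaling`, when present, is a dict; the null case comes up further down):

```python
# Sketch of an if/else rewrite of the one-liner (not necessarily the exact
# committed code): keep the factor only when linear rope scaling is configured.
if "rope_scaling" in config and config["rope_scaling"].get("type") == "linear":
    f_rope_scale = config["rope_scaling"].get("factor")
else:
    f_rope_scale = None
```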
One other thing I had thought of is whether it is worth raising an Exception if the rope scaling type is something other than "linear". Edit: On further thought, it could take a while to see where people converge on rope scaling methods. I get the sense linear might win out. So I think it is fine to merge now, since this fixes a lot of models on HF.
Hey guys, just tried convert.py with a model with no rope_scaling defined and got this error:
I'd suggest using this code instead:
Strange! What model was this? I do suspect there might be some variance in how rope scaling appears in config.json that could be better handled.
It was Samantha-1.11-70B, but what happened here is the default case. Most models currently out there don't have any rope scaling defined. The default in config.json is `"rope_scaling": null`, and that wasn't being handled with the code as it was.
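For illustration, here is a null-safe variant of the extraction. It is a sketch of the kind of check being discussed, not necessarily the exact code that landed in the follow-up fix (`Handle null rope scaling value (ggerganov#2793)` in the merge list below):

```python
# "rope_scaling": null in config.json becomes None in Python, so checking only
# for key presence is not enough: None has no .get(). Treat null like "absent".
rope_scaling = config.get("rope_scaling")
if isinstance(rope_scaling, dict) and rope_scaling.get("type") == "linear":
    f_rope_scale = rope_scaling.get("factor")
else:
    f_rope_scale = None
```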
Ah, that makes sense. Will submit the fix shortly, as soon as I try it on a couple of models.
* master: (773 commits)
  server : add `/detokenize` endpoint (ggerganov#2802)
  convert.py : advanced option (ggerganov#2753)
  llama : use Unicode Escape Sequence to replace encoded characters (ggerganov#2814)
  flake.nix : add rocm support and cleanup (ggerganov#2808)
  llama : move #includes out of _GNU_SOURCE conditional (ggerganov#2817)
  main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggerganov#1528)
  llama : use std::abs in llama_sample_tail_free (ggerganov#2800)
  k-quants : remove unnecessary tensor shape restrictions (ggerganov#2811)
  Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggerganov#2807)
  Fix HellaSwag (ggerganov#2805)
  flake : build llama.cpp on Intel with nix (ggerganov#2795)
  Handle null rope scaling value (ggerganov#2793)
  Fix spm whitespaces (ggerganov#2806)
  examples : skip unnecessary external lib in server README.md how-to (ggerganov#2804)
  llama : fix struct decl (ggerganov#2790)
  Faster perplexity computation (ggerganov#2786)
  llama : add llama_beam_search() (ggerganov#2267)
  convert.py : Get rope scale from HuggingFace models (ggerganov#2772)
  llama-bench : add model sizes (ggerganov#2771)
  convert.py : export rope freq_base when converting CodeLlama from an HF model (ggerganov#2773)
  ...
* Get rope scale from HF models
* Save rope scale only for linear scaling
* Rewrite for clarity
I was trying to convert EverythingLM V2 with 16k context to GGUF and noticed that it generated nonsense. GGUF metadata showed that the rope scale was not kept, and I see it was indeed not read from config.json. This PR fixes that. The rope scale shows up in GGUF metadata (as 0.25 in this case) and model output is now coherent.
Suggesting @slaren for review because I saw you recently made a very related change.