convert.py : Get rope scale from HuggingFace models #2772

Merged
merged 4 commits into ggerganov:master on Aug 25, 2023

Conversation

@pnb (Contributor) commented Aug 24, 2023

I was trying to convert EverythingLM V2 with 16k context to GGUF and noticed that it generated nonsense. GGUF metadata showed that the rope scale was not kept, and I see it was indeed not read from config.json. This PR fixes that. The rope scale shows up in GGUF metadata (as 0.25 in this case) and model output is now coherent.

Suggesting @slaren for review because I saw you recently made a very related change.

@slaren (Collaborator) commented Aug 24, 2023

I am looking at the config.json of the model that you linked, and the rope scale looks like this:

  "rope_scaling": {
    "factor": 4.0,
    "type": "linear"
  },

This PR seems to ignore the type parameter. It would be good to add a check to make sure that it is linear.

@pnb (Contributor, Author) commented Aug 24, 2023

Makes sense! I added that check now, and tested it to make sure it is ignored if the type = linear check fails (and that it still works if type is linear).

convert.py (Outdated)
n_head = config["num_attention_heads"]
n_head_kv = config["num_key_value_heads"] if "num_key_value_heads" in config else n_head
f_norm_eps = config["rms_norm_eps"]
f_rope_scale = config.get("rope_scaling", {}).get("factor", None) if config.get("rope_scaling", {}).get("type", "") == "linear" else None
Collaborator:

nit: dict.get returns None as a fallback by default. This would be more readable:

Suggested change
f_rope_scale = config.get("rope_scaling", {}).get("factor", None) if config.get("rope_scaling", {}).get("type", "") == "linear" else None
rope_scaling = config.get("rope_scaling", {})
f_rope_scale = rope_scaling.get("factor") if rope_scaling.get("type") == "linear" else None

Collaborator:

I would be happy with even a couple of plain if statements. But yeah, that one-liner is a bit hard to read; anything that improves the readability would be good.

Contributor (Author):

I was overfitting a bit to the other implementations, trying to match their style. I am not a fan of ternary operators or long lines, so I would also prefer an if/else implementation (tested and committed now).
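
For reference, the committed version ends up roughly like the sketch below. The condition matches the line quoted in TheBloke's report further down; the else branch and surrounding names are assumptions based on the diff context above.

    # Read the linear rope scaling factor from config.json, if present
    # (sketch; the actual committed code may differ slightly)
    if "rope_scaling" in config and config["rope_scaling"].get("type") == "linear":
        f_rope_scale = config["rope_scaling"].get("factor")
    else:
        f_rope_scale = None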

@pnb (Contributor, Author) commented Aug 24, 2023

One other thing I had thought of is whether it is worth raising an Exception if type is not "linear", rather than silently ignoring rope scaling. I didn't implement that because it doesn't seem like the rest of the code cares about nonlinear scaling, so it appears kind of out of scope -- but I'll add it if that seems like something that could come up.

Edit: On further thought, it could take a while to see where people converge on rope scaling methods. I get the sense linear might win out. So I think it is fine to merge now, since this fixes a lot of models on HF.

@slaren slaren merged commit 28b2c99 into ggerganov:master Aug 25, 2023
4 checks passed
@TheBloke (Contributor) commented Aug 25, 2023

Hey guys, just tried convert.py with a model with no rope_scaling defined and got this error:

    if "rope_scaling" in config and config["rope_scaling"].get("type") == "linear":
AttributeError: 'NoneType' object has no attribute 'get'

I'd suggest using this code instead:

rope_scaling = config.get("rope_scaling", None)
if isinstance(rope_scaling, dict) and rope_scaling.get("type", None) == "linear":
    f_rope_scale = rope_scaling.get("factor", None)
else:
    f_rope_scale = None

@pnb (Contributor, Author) commented Aug 25, 2023

Strange! What model was this? I do suspect there is some variance in how rope scaling appears in config.json that could be handled better, for example if rope_scaling is not a dict, which I gather might be the case from your example code.

@TheBloke (Contributor):

It was Samantha-1.11-70B, but what happened here is the default case. Most models currently out there don't have any rope scaling defined. The default in config.json is:

    "rope_scaling": null,

and that case wasn't handled by the code as it was.
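
To see why this crashes, note that JSON null parses to Python None, on which .get cannot be called. A minimal standalone repro (a hypothetical snippet, not from convert.py):

    import json

    # "rope_scaling": null in config.json becomes None in Python
    config = json.loads('{"rope_scaling": null}')
    print(config["rope_scaling"])  # None

    # The key is present in config, so the old check proceeds and fails:
    config["rope_scaling"].get("type")
    # AttributeError: 'NoneType' object has no attribute 'get'

The isinstance check in the suggested code above covers both cases at once: a missing key (get returns None) and an explicit null.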

@pnb (Contributor, Author) commented Aug 25, 2023

Ah, that makes sense. Will submit the fix shortly, as soon as I try it on a couple of models.

mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Aug 26, 2023
* master: (773 commits)
  server : add `/detokenize` endpoint (ggerganov#2802)
  convert.py : advanced option (ggerganov#2753)
  llama : use Unicode Escape Sequence to replace encoded characters (ggerganov#2814)
  flake.nix : add rocm support and cleanup (ggerganov#2808)
  llama : move #includes out of _GNU_SOURCE conditional (ggerganov#2817)
  main : fix bug (penalize_nl=false doesn't work) + suppress warning on mingw (ggerganov#1528)
  llama : use std::abs in llama_sample_tail_free (ggerganov#2800)
  k-quants : remove unnecessary tensor shape restrictions (ggerganov#2811)
  Better perplexity for 2- and 3-bit quantization for LLaMA-v2-70B (ggerganov#2807)
  Fix HellaSwag (ggerganov#2805)
  flake : build llama.cpp on Intel with nix (ggerganov#2795)
  Handle null rope scaling value (ggerganov#2793)
  Fix spm whitespaces (ggerganov#2806)
  examples : skip unnecessary external lib in server README.md how-to (ggerganov#2804)
  llama : fix struct decl (ggerganov#2790)
  Faster perplexity computation (ggerganov#2786)
  llama : add llama_beam_search() (ggerganov#2267)
  convert.py : Get rope scale from HuggingFace models (ggerganov#2772)
  llama-bench : add model sizes (ggerganov#2771)
  convert.py : export rope freq_base when converting CodeLlama from an HF model (ggerganov#2773)
  ...
akawrykow pushed a commit to akawrykow/llama.cpp that referenced this pull request Aug 29, 2023
* Get rope scale from HF models

* Save rope scale only for linear scaling

* Rewrite for clarity