What happened?
The `lm_head` layer of a Gemma2 LoRA adapter is not converted by `convert_lora_to_gguf.py`, and is therefore not applied at inference (ruining the adapter's performance).
How to reproduce:
- LoRA fine-tune Gemma2 with `pytorch/peft`, including `lm_head` in the `target_modules` param:
  `config = LoraConfig(target_modules=["lm_head"], ...)`
- Save the adapter.
- Convert the adapter:
  ```
  python convert_lora_to_gguf.py <adapter folder> --base <base model folder> --outtype f32
  ```
  When debugging, the `lm_head` layer is skipped by this line in `convert_hf_to_gguf.py` (and no error is raised):
  ```python
  if name == "lm_head.weight":
      logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
      return []
  ```
- Run `llama-cli` to check that indeed no LoRA layer is applied in the respective line in llama.cpp (see also the GGUF inspection sketch right after this list):
  ```
  ./llama-cli -m base/model/path/Base-F32.gguf \
      --lora lora/model/path/Lora-F32-LoRA.gguf \
      -p "Hello Gemma2" -n 50
  ```
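To make the missing tensors visible directly, here is a minimal sketch that lists the tensor names stored in the converted adapter, using the `gguf` Python package that ships with llama.cpp under `gguf-py` (the file path is the placeholder from the command above). If the adapter's `lm_head` LoRA weights had been converted, I would expect a corresponding pair of A/B tensors to show up in this listing.

```python
# Minimal sketch: list the tensors written into the converted LoRA GGUF.
# Uses the gguf-py package bundled with llama.cpp; the path below is a placeholder.
from gguf import GGUFReader

reader = GGUFReader("lora/model/path/Lora-F32-LoRA.gguf")

for tensor in reader.tensors:
    # Each entry is one LoRA matrix written by convert_lora_to_gguf.py.
    print(tensor.name, list(tensor.shape))
```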
Expected behaviour
I think this is a bug because a user might have trained an adapter that is applied to the `lm_head` layer, so skipping it on conversion will destroy the adapter's performance. I think the code should either:
- raise an error saying `Cannot convert Gemma2 adapter with lm_head layer` (a sketch of this option follows below), or
- handle the `lm_head` layer (although this might be tricky when merging adapters, as the `lm_head` layer shares its weights with the `embed` layer in Gemma2, probably requiring a new tensor to be created for `lm_head` before the adapter can be merged into it).
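For the first option, here is a minimal sketch of the kind of change I mean, based on the skip branch quoted in the reproduction steps. The `is_lora` flag is hypothetical; I have not checked what the converter actually exposes at that point to distinguish a LoRA conversion from a full-model conversion, so treat the condition as illustrative only.

```python
# Illustrative only: the skip branch from convert_hf_to_gguf.py quoted above, changed so
# that converting a Gemma2 *adapter* which targets lm_head fails loudly instead of being
# silently dropped. `is_lora` is a hypothetical flag, not necessarily what the code exposes.
if name == "lm_head.weight":
    if getattr(self, "is_lora", False):
        raise ValueError(
            "Cannot convert Gemma2 adapter with lm_head layer: "
            "lm_head shares its weights with the embedding tensor in Gemma2"
        )
    # Full-model conversion: lm_head is tied to the embeddings, so skipping it is intended.
    logger.debug(f"Skipping get tensor {name!r} in safetensors so that convert can end normally.")
    return []
```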
Comments
- I think the script `convert_lora_to_gguf.py` was introduced in PR Refactor lora adapter support #8332, so maybe @ngxson knows whether skipping the `lm_head` is the desired outcome or whether it is actually a bug. Otherwise, I'm happy to try to figure out why this happens.
- This is not the case for, say, Phi3, which converts the `lm_head` LoRA layer correctly.
- I can provide more code/models to reproduce the bug easily if that helps.
 
Name and Version
version: 3524 (bc0f887)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.4.0
What operating system are you seeing the problem on?
macOS, but it should be a platform-independent problem.
Relevant log output
No response