Re-enable manual LoRA adapter free #19983
Open
PopFlamingo wants to merge 2 commits into ggml-org:master from
Conversation
ggerganov reviewed Mar 2, 2026
ggerganov approved these changes Mar 5, 2026
This PR proposes re-enabling manual LoRA adapter free (the llama_adapter_lora_free function), which had previously been deprecated as part of #18490.

Motivation

After reading the discussion on why llama_adapter_lora_free was deprecated and made a no-op, my understanding is that it was considered to have no real use case. However, I am currently working on a project where the ability to unload adapters is very important due to memory constraints (mobile devices). Without llama_adapter_lora_free, we have to fully unload and re-load the model, and lose any contexts (and their cached tokens) associated with it, just to free the memory used by those LoRAs. I am not aware of any other way to achieve this.

Summary of changes

The llama_adapter_lora_free function has been re-enabled and un-deprecated. Calling llama_adapter_lora_free remains optional: the LoRA will still be freed, if necessary, when its parent model is released. Documentation comments have been updated to reflect this new behavior. Concretely, we make sure to remove the freed LoRA from the ownership list of its owning model to prevent double frees.
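The lifecycle described above can be sketched as follows. This is a minimal usage sketch, assuming the llama.cpp C API as declared in llama.h (function names and signatures may differ between versions, and the adapter path is a placeholder):

```c
#include "llama.h"

// Sketch: load, use, and manually free a LoRA adapter without
// tearing down the model or its contexts.
void lora_lifecycle(struct llama_model * model, struct llama_context * ctx) {
    // Load a LoRA adapter; its lifetime is tracked by the parent model.
    // "adapter.gguf" is a placeholder path.
    struct llama_adapter_lora * lora =
        llama_adapter_lora_init(model, "adapter.gguf");
    if (lora == NULL) {
        return; // failed to load the adapter
    }

    // Attach the adapter to a context with a scaling factor.
    llama_set_adapter_lora(ctx, lora, 1.0f);

    // ... run inference with the adapter applied ...

    // Detach the adapter from the context before freeing it.
    llama_rm_adapter_lora(ctx, lora);

    // With this PR, this call reclaims the adapter's memory immediately
    // and removes it from the model's ownership list (avoiding a double
    // free when the model itself is later released).
    llama_adapter_lora_free(lora);

    // Freeing manually remains optional: any adapters still owned by the
    // model when it is released are freed automatically with it.
}
```

The key point of the change is the last two comments: the explicit free becomes a real free again rather than a no-op, while skipping it stays safe.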
Other related issues
Issue #19153 would benefit from this change as well, since I don't think the requested feature could be implemented without llama_adapter_lora_free.