UPSTREAM PR #19280: fix: only reset LoRa configs when they have changed from previous batch#1142
Open
Conversation
No meaningful performance changes were detected across 115,472 analyzed functions in the following binaries: build.bin.llama-cvector-generator, build.bin.llama-tts, build.bin.libllama.so, build.bin.libmtmd.so, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.llama-quantize, build.bin.llama-tokenize, build.bin.llama-qwen2vl-cli, build.bin.llama-bench, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so. 🔎 Full breakdown: Loci Inspector.
Note
Source pull request: ggml-org/llama.cpp#19280
Overview
Fix for ggml-org/llama.cpp#19217
Currently we set the LoRa config for every single token request, even when the configuration has not changed between batches. It appears that in llama-context.cpp, at line 1078, we added schedule reserving for when new LoRa configs are set, so now, whenever a LoRa config is present, we reserve the scheduler for every batch we decode.
This change adds a field to the server slot that holds the previous batch's LoRa config. For each batch, we check whether the requested config matches the one from the previous batch; only if it differs do we set it again, otherwise we proceed without resetting it.
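A minimal sketch of the guard, assuming illustrative names (`lora_info`, `server_slot::prev_lora`, and `maybe_apply_lora` are placeholders rather than the actual server types): the per-slot field remembers what was applied for the last batch, and the setter (which triggers the scheduler reserve) is skipped when nothing has changed.

```cpp
// Sketch only: types and names are illustrative, not the real server code.
#include <vector>

struct lora_info {
    int   id;      // adapter identifier
    float scale;   // adapter scale

    bool operator==(const lora_info & other) const {
        return id == other.id && scale == other.scale;
    }
};

struct server_slot {
    // LoRa config applied to the previous batch; empty until the first batch.
    std::vector<lora_info> prev_lora;
};

// Only push the LoRa update to the context when the requested config differs
// from what was applied for the previous batch.
void maybe_apply_lora(server_slot & slot,
                      const std::vector<lora_info> & requested,
                      void (*apply_fn)(const std::vector<lora_info> &)) {
    if (slot.prev_lora == requested) {
        return;                 // unchanged: skip the set and the scheduler reserve
    }
    apply_fn(requested);        // this is the call that reserves the scheduler
    slot.prev_lora = requested; // remember for the next batch
}
```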
Testing
I was able to reproduce the issue relatively easily with some debug logs, and could see the repeated reserving happening endlessly along with huge memory spikes.
After these changes, the issue no longer appears and memory remains stable.