llama-model: fix inconsistent ctxs <-> bufs order #16581
Conversation
src/llama-model.cpp
Outdated

            buf_map.emplace(idx, buf);
        }
    }
    pimpl->ctxs.emplace_back(ctx);
I don't think moving this here is good; it is too far removed from the place where the context is allocated, and I suspect it may cause a leak if there is an error in between.
I don't quite understand why the order is important to you; can you elaborate on that?
In my prototype I used llama_model like this:
    std::map<ggml_backend_buffer_type_t, size_t> llama_model::memory_breakdown() const {
        std::map<ggml_backend_buffer_type_t, size_t> ret;
        for (size_t i = 0; i < pimpl->bufs.size(); i++) {
            ggml_backend_buffer_t buf = pimpl->bufs[i].get();
            ggml_backend_buffer_type_t buft = ggml_backend_buffer_get_type(buf);
            if (hparams.no_alloc) {
                GGML_ASSERT(ggml_backend_buffer_get_base(buf) == nullptr);
                ret[buft] += ggml_backend_alloc_ctx_tensors_from_buft_size(pimpl->ctxs[i].get(), buft);
            } else {
                GGML_ASSERT(ggml_backend_buffer_get_base(buf) != nullptr);
                ret[buft] += ggml_backend_buffer_get_size(buf);
            }
        }
        return ret;
    }

The code implicitly assumed that pimpl->bufs[i] would correspond to pimpl->ctxs[i]. However, this is not universally true. So when I moved from my single-GPU desktop to another machine with multiple GPUs I suddenly got unexpected results. In my opinion this is a footgun. I think it would be preferable to have a consistent order, but I would also consider it fine to add a comment warning that the order is not guaranteed to be the same. From my end I can write code that will produce correct results regardless of how the vectors are ordered.
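To illustrate the underlying behavior, a self-contained sketch (not llama.cpp code): a std::map keyed by pointers iterates in pointer-value order, not insertion order, so an iteration order that happens to match creation order on one machine can differ on another.

    #include <cstdio>
    #include <map>

    int main() {
        // Stand-ins for opaque handle types such as ggml_backend_buffer_type_t:
        int a, b, c;
        std::map<int *, const char *> m;
        m.emplace(&b, "created 2nd");
        m.emplace(&a, "created 1st");
        m.emplace(&c, "created 3rd");
        // Iteration order follows the pointer values of the keys, not the
        // order in which the entries were inserted, so it depends on where
        // a, b, and c happen to live in memory on a given machine.
        for (const auto & [key, label] : m) {
            std::printf("%p -> %s\n", static_cast<void *>(key), label);
        }
        return 0;
    }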
I don't really agree that this is a footgun: nowhere is it stated that the vectors of contexts and buffers are related to each other, and nothing warrants assuming that they are. I am not completely sure that this is a good approach, but if you want to make the association between context and buffer explicit, then the two vectors could be joined into a single vector, such as std::vector<std::pair<ggml_context_ptr, ggml_backend_buffer_ptr>>.
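A minimal sketch of what that joined vector could look like (the member name ctxs_bufs is hypothetical, not something the PR implements):

    // Hypothetical alternative: store each context together with the buffer
    // allocated from it, so the pairing cannot drift apart.
    std::vector<std::pair<ggml_context_ptr, ggml_backend_buffer_ptr>> ctxs_bufs;

    // Consumers then never need to assume that two separate vectors share
    // an index:
    for (auto & [ctx, buf] : ctxs_bufs) {
        // ctx.get() is exactly the context that buf.get() was allocated from.
    }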
Force-pushed from d8e131f to b53509c.
Thank you for pointing out the potential memory leak, I changed
src/llama-model.cpp
Outdated

         for (auto & it : ctx_map) {
             ggml_backend_buffer_type_t buft = it.first;
    -        ggml_context * ctx = it.second;
    +        ggml_context_ptr & ctx_ptr = it.second;
    +        ggml_context * ctx = ctx_ptr.get();
It may be a matter of preference, but you could use structured bindings such as:

    for (auto & [buft, ctx_ptr] : ctx_map) {
        ggml_context * ctx = ctx_ptr.get();
Thank you, I just wasn't aware that that is valid syntax.
Force-pushed from b53509c to 092e2e2.
While working on code for automating memory allocation across devices I've noticed that the order of the vectors for ggml_context and ggml_backend_buffer_t in llama_model::impl can be inconsistent. To my understanding the reason for this is that ctxs is filled in the order in which the ggml_context structs are created for the model tensors. However, the corresponding backend buffers are then created by iterating over ctx_map. A std::map is sorted by its keys, so ctx_map is sorted by the arrangement of ggml_backend_buffer_type_t in virtual memory. As a consequence the order of bufs can be inconsistent with the order of ctxs.

The way I fixed it in this PR is to define a comparator for the map that sorts it based on the buffer type name, and to defer the population of ctxs until bufs is populated. To my understanding there should be no functional difference relative to the master branch (the order of some prints and of tensors_by_name would change), but I'm spinning this out into a standalone PR regardless since it seems like an easy fix vs. the amount of time required for debugging.
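For illustration, a minimal sketch of the comparator approach described above, assuming ggml_backend_buft_name() as the accessor for the buffer type name (the exact code in the PR may differ):

    #include <cstring>
    #include <map>

    #include "ggml-backend.h"
    #include "ggml-cpp.h"

    // Order map entries by the buffer type's name instead of by the
    // machine-dependent pointer value of the buffer type handle itself.
    struct buft_name_cmp {
        bool operator()(ggml_backend_buffer_type_t a, ggml_backend_buffer_type_t b) const {
            return std::strcmp(ggml_backend_buft_name(a), ggml_backend_buft_name(b)) < 0;
        }
    };

    // With this comparator, iteration over ctx_map (and hence the order in
    // which ctxs and bufs are populated) is stable across machines.
    std::map<ggml_backend_buffer_type_t, ggml_context_ptr, buft_name_cmp> ctx_map;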