llama-model: fix inconsistent ctxs <-> bufs order #16581
Conversation
src/llama-model.cpp
Outdated

            buf_map.emplace(idx, buf);
        }
    }
    pimpl->ctxs.emplace_back(ctx);
I don't think moving this here is good; it is too far removed from the place where the context is allocated, and I suspect it may cause a leak if there is an error in between.
I don't quite understand why the order is important to you; can you elaborate on that?
In my prototype I used llama_model like this:
    std::map<ggml_backend_buffer_type_t, size_t> llama_model::memory_breakdown() const {
        std::map<ggml_backend_buffer_type_t, size_t> ret;
        for (size_t i = 0; i < pimpl->bufs.size(); i++) {
            ggml_backend_buffer_t buf = pimpl->bufs[i].get();
            ggml_backend_buffer_type_t buft = ggml_backend_buffer_get_type(buf);
            if (hparams.no_alloc) {
                GGML_ASSERT(ggml_backend_buffer_get_base(buf) == nullptr);
                ret[buft] += ggml_backend_alloc_ctx_tensors_from_buft_size(pimpl->ctxs[i].get(), buft);
            } else {
                GGML_ASSERT(ggml_backend_buffer_get_base(buf) != nullptr);
                ret[buft] += ggml_backend_buffer_get_size(buf);
            }
        }
        return ret;
    }

The code implicitly assumed that pimpl->bufs[i] would correspond to pimpl->ctxs[i]. However, this is not universally true. So when I moved from my single-GPU desktop to another machine with multiple GPUs I suddenly got unexpected results. In my opinion this is a footgun. I think it would be preferable to have a consistent order, but I would also consider it fine to add a comment warning that the order is not guaranteed to be the same. From my end I can write code that will produce correct results regardless of how the vectors are ordered.
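To illustrate the underlying behavior, a self-contained sketch (not llama.cpp code): a std::map keyed by pointers iterates in pointer-value order, not insertion order, so an iteration order that happens to match creation order on one machine can differ on another.

    #include <cstdio>
    #include <map>

    int main() {
        // Stand-ins for opaque handle types such as ggml_backend_buffer_type_t:
        int a, b, c;
        std::map<int *, const char *> m;
        m.emplace(&b, "created 2nd");
        m.emplace(&a, "created 1st");
        m.emplace(&c, "created 3rd");
        // Iteration order follows the pointer values of the keys, not the
        // order in which the entries were inserted, so it depends on where
        // a, b, and c happen to live in memory on a given machine.
        for (const auto & [key, label] : m) {
            std::printf("%p -> %s\n", static_cast<void *>(key), label);
        }
        return 0;
    }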
I don't really agree that this is a footgun: nowhere is it stated that the vectors of contexts and buffers are related to each other, and nothing warrants assuming that they are. I am not completely sure that this is a good approach, but if you want to make the association between context and buffer explicit, then the two vectors could be joined into a single vector, such as std::vector<std::pair<ggml_context_ptr, ggml_backend_buffer_ptr>>.
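A minimal sketch of what that joined vector could look like (the member name ctxs_bufs is hypothetical, not something the PR implements):

    // Hypothetical alternative: store each context together with the buffer
    // allocated from it, so the pairing cannot drift apart.
    std::vector<std::pair<ggml_context_ptr, ggml_backend_buffer_ptr>> ctxs_bufs;

    // Consumers then never need to assume that two separate vectors share
    // an index:
    for (auto & [ctx, buf] : ctxs_bufs) {
        // ctx.get() is exactly the context that buf.get() was allocated from.
    }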
Force-pushed from d8e131f to b53509c.
Thank you for pointing out the potential memory leak, I changed
src/llama-model.cpp
Outdated

         for (auto & it : ctx_map) {
             ggml_backend_buffer_type_t buft = it.first;
    -        ggml_context * ctx = it.second;
    +        ggml_context_ptr & ctx_ptr = it.second;
    +        ggml_context * ctx = ctx_ptr.get();
It may be a matter of preference, but you could use structured bindings such as:

    for (auto & [buft, ctx_ptr] : ctx_map) {
        ggml_context * ctx = ctx_ptr.get();
Thank you, I just wasn't aware that that is valid syntax.
Force-pushed from b53509c to 092e2e2.
While working on code for automating memory allocation across devices I've noticed that the order of the vectors for ggml_context and ggml_backend_buffer_t in llama_model::impl can be inconsistent. To my understanding the reason for this is that ctxs is filled in the order in which the ggml_context structs are created for the model tensors. However, the corresponding backend buffers are then created by iterating over ctx_map. A std::map is sorted by its keys, so ctx_map is sorted by the arrangement of ggml_backend_buffer_type_t in virtual memory. As a consequence the order of bufs can be inconsistent with the order of ctxs.

The way I fixed it in this PR is to define a comparator for the map that sorts it based on the buffer type name, and to defer the population of ctxs until bufs is populated. To my understanding there should be no functional difference relative to the master branch (the order of some prints and of tensors_by_name would change), but I'm spinning this out into a standalone PR regardless since it seems like an easy fix vs. the amount of time required for debugging.
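For illustration, a minimal sketch of the comparator approach described above, assuming ggml_backend_buft_name() as the accessor for the buffer type name (the exact code in the PR may differ):

    #include <cstring>
    #include <map>

    #include "ggml-backend.h"
    #include "ggml-cpp.h"

    // Order map entries by the buffer type's name instead of by the
    // machine-dependent pointer value of the buffer type handle itself.
    struct buft_name_cmp {
        bool operator()(ggml_backend_buffer_type_t a, ggml_backend_buffer_type_t b) const {
            return std::strcmp(ggml_backend_buft_name(a), ggml_backend_buft_name(b)) < 0;
        }
    };

    // With this comparator, iteration over ctx_map (and hence the order in
    // which ctxs and bufs are populated) is stable across machines.
    std::map<ggml_backend_buffer_type_t, ggml_context_ptr, buft_name_cmp> ctx_map;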