# Expected Behavior
`./simple` (built from examples/simple/simple.cpp) with TheBloke's Llama-2-7b-Chat-GGUF should run without issue.
# Current Behavior
```
./simple ~/.cache/huggingface/hub/models--TheBloke--Llama-2-7b-Chat-GGUF/blobs/08a5566d61d7cb6b420c3e4387a39e0078e1f2fe5f055f3a03887385304d4bfa
```

(model: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF)

results in

```
Hello my name isSegmentation fault (core dumped)
```
The model works fine with `main`. I'm running the latest Ubuntu with everything up to date, compiled with `make` (no CUDA, etc.).
The line that fails is llama.cpp:1453, in `llama_kv_cache_find_slot`:

```cpp
cache.cells[cache.head + i].seq_id.insert(batch.seq_id[i][j]);
```
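For context, that line dereferences `batch.seq_id[i]` as a pointer. My understanding (an assumption on my part, paraphrasing llama.h after the batch API change) is that `seq_id` is now an array of per-token arrays:

```cpp
// paraphrased from llama.h; the exact struct may differ slightly
typedef struct llama_batch {
    int32_t         n_tokens;
    llama_token  *  token;     // token ids
    float        *  embd;      // embeddings (used when token == NULL)
    llama_pos    *  pos;       // positions
    int32_t      *  n_seq_id;  // number of sequence ids per token
    llama_seq_id ** seq_id;    // per-token array of sequence ids
    int8_t       *  logits;    // whether to output logits for each token
} llama_batch;
```

If that reading is right, assigning `batch.seq_id[i] = 0` in the loop below nulls the per-token pointer that `llama_batch_init` allocated, and the `batch.seq_id[i][j]` access above becomes a null-pointer dereference.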
The initialization of `llama_batch::seq_id` in simple.cpp seems suspect, but I'm not knowledgeable enough about what `seq_id` should be to fix it.
```cpp
llama_batch batch = llama_batch_init(512, 0, 1);

// evaluate the initial prompt
batch.n_tokens = tokens_list.size();

for (int32_t i = 0; i < batch.n_tokens; i++) {
    batch.token[i]  = tokens_list[i];
    batch.pos[i]    = i;
    batch.seq_id[i] = 0;
    batch.logits[i] = false;
}

// llama_decode will output logits only for the last token of the prompt
batch.logits[batch.n_tokens - 1] = true;
```

Time permitting, I may take a stab at porting over whatever seems to be working for `main`.
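If my reading above is right, here is a minimal sketch of the fix, assuming `llama_batch_init(512, 0, 1)` allocates a one-element `seq_id` array for each token (this is my guess, not a verified patch):

```cpp
for (int32_t i = 0; i < batch.n_tokens; i++) {
    batch.token[i]     = tokens_list[i];
    batch.pos[i]       = i;
    batch.n_seq_id[i]  = 1;  // each prompt token belongs to one sequence
    batch.seq_id[i][0] = 0;  // write sequence id 0 into the allocated array
    batch.logits[i]    = false;
}
```

With this, `batch.seq_id[i][j]` in `llama_kv_cache_find_slot` reads a valid array element instead of dereferencing a null pointer. The `llama_batch_add` helper in common.h appears to do the equivalent bookkeeping, so porting `main`'s use of it may be the cleaner fix.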