Server: various fixes for the prompt field in /completion #5300
Conversation
server : fix deadlock when prompt array contains strings and numbers
server : removed an unnecessary generation when generating multi-prompts
server : removed an unnecessary assert
Can you also test images as part of the array? (afaik it's "[img-NUM]" + image data in another json entry)
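For context, the server's llava support takes an "[img-ID]" tag in the prompt plus a matching entry in image_data. A minimal sketch of such a request (the tag number and the base64 payload are placeholders):

```sh
curl -w "\n" --request POST --url http://localhost:8080/completion -d '{
  "prompt": "USER: [img-12] Describe the image.\nASSISTANT:",
  "image_data": [{"id": 12, "data": "<BASE64_ENCODED_IMAGE>"}],
  "n_predict": 64
}'
```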
I wanted the old behavior so I just commented out the lines in my local build. We were discussing how to deal with the conflicting intents using the same prompt format, with arrays meaning either multiple prompts or a single concatenated array prompt, see #4476. It feels less intuitive if the treatment of a prompt array depends on the element type of the array. I'm not sure which is better: a 2-level nested array, or separate names ("prompt" and "prompts").
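To illustrate the ambiguity being discussed, here is a sketch of the shapes involved (these payloads are only illustrative, not a settled API):

```sh
# Today the same shape is used for two different intents:
# several independent prompts vs. one prompt concatenated from the pieces.
curl --request POST --url http://localhost:8080/completion \
  -d '{"prompt": ["Hello,", "Goodbye,"], "n_predict": 24}'

# The alternatives mentioned above would disambiguate by nesting or by field name
# (neither is implemented; shapes shown only to illustrate the discussion):
curl --request POST --url http://localhost:8080/completion \
  -d '{"prompt": [["Hello,", "Goodbye,"]], "n_predict": 24}'
curl --request POST --url http://localhost:8080/completion \
  -d '{"prompts": ["Hello,", "Goodbye,"], "n_predict": 24}'
```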
After spending some time testing a llava model (`-m llava-v1.5-7b/ggml-model-q8_0.gguf --mmproj llava-v1.5-7b/mmproj-model-f16.gguf`), I don't think this PR changes any behaviour with regards to images, but I can't be sure. I tested everything mentioned in this post both with and without the patch, and the issues mentioned exist in both cases.

As for what happens when there are images in the prompt array: if the array contains only strings, they get processed by `split_multiprompt_task()`, which feeds them back into `request_completion()` to be processed as single prompts through `queue_tasks.post(task)`.

What works:

What doesn't work:

Where things get murky:

Here's an example of what I mean, with llama.cpp before this PR:
After this PR:
The two images used were a cat and a city skyline. There may still be some similarities in the images, as the cat is on white stairs; however, after dozens of prompts on both repos, with and without the patch, I don't think it matters. Initially I was testing with a picture of a dog rather than a city skyline, which was incredibly frustrating, as the repo without the patch would occasionally distinguish between the two. If anybody else is going to repeat this test, I strongly suggest using two unambiguously different images. To make matters even worse, I added a regular prompt without an [img-id] tag into the mix (see the sketch after this comment):
After patch:
The empty "prompt" response here is indicating that the initial prompt is using image data. This becomes clear if you test just a single prompt with and without image_data. Some things that may have complicated my testing is that I'm using ROCm, I was using the same prompt (with separate image ids) for both of the prompts asking it to describe images, and I was using q8 for the language model and f16 for the image model. I only noticed the latter two issues after the fact, I did briefly test different prompts ( |
I don't think this PR changed any image-related behavior, either. The parsing and processing of …
The two main problems were:

`split_multiprompt_task()` was completely stalling server slots if the prompt array had any numbers/tokens in it; the git commit calls this a deadlock, but that's probably wrong. Either of these issues prevents tokens from being used in the prompt array.
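For illustration, the failing case is a prompt array that mixes plain strings with raw token IDs, along these lines (the token values are placeholders, not taken from the PR):

```sh
curl -w "\n" --request POST --url http://localhost:8080/completion \
  -d '{"prompt": ["Building a website can be done in", 12, 34, 56], "n_predict": 24}'
```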
Commit 48c857a also introduced a bug by removing the `return` before `split_multiprompt_task()`, which caused an unused and unnecessary single-prompt generation of all combined elements in the prompt array.

Finally, an unnecessary assert was removed. This assert wouldn't abort unless `#include <cassert>` was included, but that seems unnecessary. It was changed to `send_error()` instead. The code should never be hit, but if it is, the resulting page will be 404 (as per the following code).

Tested the following prompts with `curl -w "\n" --request POST --url http://localhost:8080/completion -d '{"n_predict":24}'`:

Not tested: /infill with no prompt, cf. #4027 - it's segfaulting for me even with no changes. I don't think these changes will reintroduce that bug.
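The list of tested prompts isn't reproduced above; as a rough sketch of the payload shapes these fixes concern (my own examples, not the author's actual test list):

```sh
# plain string prompt
curl -w "\n" --request POST --url http://localhost:8080/completion \
  -d '{"prompt": "Building a website can be done in 10 simple steps:", "n_predict": 24}'

# array-of-strings multi-prompt
curl -w "\n" --request POST --url http://localhost:8080/completion \
  -d '{"prompt": ["Hello,", "Goodbye,"], "n_predict": 24}'
```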