FEAT: Support rerank #82
Conversation
I formatted the code and modified the rerank API to accept JSON as str or bytes. Since CI only uses bge-reranker-v2-m3-Q8_0.gguf, the Makefile downloads only that model. I pushed the commits directly to your branch.
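The str/bytes handling described above can be sketched as a small normalization step before the request reaches the rerank endpoint. This is a hedged illustration, not xllamacpp's actual handler; the function name and request fields are assumptions.

```python
import json


def parse_rerank_request(body):
    """Normalize a rerank request given as bytes, str, or dict.

    Hypothetical helper illustrating the str/bytes JSON support
    added in this PR; xllamacpp's real API may differ.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8")  # bytes -> str
    if isinstance(body, str):
        body = json.loads(body)      # str -> dict
    return body


req = {"query": "what is a panda?",
       "documents": ["The giant panda is a bear.", "Paris is in France."]}
# All three input forms normalize to the same dict:
assert parse_rerank_request(req) == req
assert parse_rerank_request(json.dumps(req)) == req
assert parse_rerank_request(json.dumps(req).encode("utf-8")) == req
```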
Great! I was going to add support for JSON string/bytes input, as you recently did for the embedding API. Happy to see you already did it.
You may download the rerank model from Hugging Face if the networking issue persists.
Interesting error that only happens on macOS; it seems to be hardware-related according to this.
I reviewed the server binding, and the cleanup is handled correctly; the remaining issues may be caused by llama.cpp itself. Even when I ran test_llama_server_multimodal in a dedicated process, the suite still crashed at test_llama_server_rerank.
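Running a suspect test in a dedicated process, as described above, can be sketched with the stdlib multiprocessing module: a native crash (e.g. a segfault inside llama.cpp) then kills only the child, and the parent can inspect the exit code. The helper below is hypothetical, not part of xllamacpp's test suite.

```python
import multiprocessing as mp


def run_isolated(fn, *args):
    """Run fn in a child process and return its exit code.

    0 means a clean exit; a negative value means the child was
    killed by a signal (e.g. -11 for SIGSEGV), which would crash
    the whole test run if fn ran in-process.
    """
    proc = mp.Process(target=fn, args=args)
    proc.start()
    proc.join()
    return proc.exitcode


def clean_test():
    pass  # stand-in for a test body that exits normally


if __name__ == "__main__":
    print(run_isolated(clean_test))  # 0 on a clean run
```

As observed in this PR, process isolation only contains a crash; it does not prevent it, so the crashing test still fails.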
When I switched to Qwen3-Reranker-0.6B.Q2_K.gguf, I got this error:

/home/runner/work/xllamacpp/xllamacpp/thirdparty/llama.cpp/src/llama-graph.cpp:1907: RANK pooling requires either cls+cls_b or cls_out+cls_out_b

Related issue: https://huggingface.co/Mungert/Qwen3-Reranker-4B-GGUF/discussions/1

Also, the model bge-reranker-v2-m3-Q2_K.gguf crashes on macOS CI. The rerank feature in llama.cpp is still quite experimental; it is not an issue with our binding. I can skip the test and merge this PR.
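Skipping the crashing test on macOS, as proposed above, can be sketched with the stdlib unittest skip machinery (the actual suite may use pytest's skipif marker instead; the class and test names here echo the conversation but are assumptions).

```python
import sys
import unittest


class TestRerank(unittest.TestCase):
    # Gate the rerank test to non-macOS platforms, since the
    # bge-reranker model crashes in llama.cpp on macOS CI.
    @unittest.skipIf(sys.platform == "darwin",
                     "llama.cpp rerank crashes on macOS CI")
    def test_llama_server_rerank(self):
        self.assertTrue(True)  # placeholder for the real rerank assertions


if __name__ == "__main__":
    unittest.main()
```

On macOS the test is reported as skipped rather than failing the run; on other platforms it executes normally.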
Yes, I get the same error on Linux. As mentioned at the very beginning, llama.cpp is still working on supporting Qwen3-Reranker, which is why bge-reranker-v2-m3 is used for testing. I will keep following the progress in llama.cpp.
The original intention was to add support for the Qwen3-Reranker GGUF format in xinference. However, llama.cpp's support for Qwen3-Reranker turned out to be incomplete, so bge-reranker-v2-m3 was used for testing for the time being.
More discussion of llama.cpp's support for Qwen3-Reranker can be found here, and llama.cpp is also working on supporting Qwen3 rerank; see here.