Conversation

@harryzwh

The original intention was to add support for the Qwen3-Reranker GGUF format in xinference. However, it turned out that llama.cpp's support for Qwen3-Reranker was not yet complete, so bge-reranker-v2-m3 is used for testing for the time being.
More discussion of llama.cpp's Qwen3-Reranker support can be found here, and llama.cpp is also working on supporting Qwen3 rerank, see here.

@codingl2k1

I formatted the code and modified the rerank API to support str/bytes json. Since the CI only uses bge-reranker-v2-m3-Q8_0.gguf, the Makefile only downloads bge-reranker-v2-m3-Q8_0.gguf. I pushed the commits directly to your branch.

@harryzwh
Author

> I formatted the code and modified the rerank API to support str/bytes json. Since the CI only uses bge-reranker-v2-m3-Q8_0.gguf, the Makefile only downloads bge-reranker-v2-m3-Q8_0.gguf. I pushed the commits directly to your branch.

Great! I was going to add JSON string/bytes support, matching what you recently did for the embedding API. Happy to see you already did it.
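For readers following the thread: the pattern discussed above (an API accepting a request body as either a str or bytes JSON payload) can be sketched as follows. This is not xllamacpp's actual API; the `parse_rerank_request` helper and the `query`/`documents` field names are hypothetical, chosen only to illustrate the idea.

```python
import json
from typing import Union


def parse_rerank_request(payload: Union[str, bytes]) -> dict:
    """Accept a rerank request body as either str or bytes JSON.

    json.loads handles both str and UTF-8 encoded bytes directly,
    so one code path covers both input types.
    """
    request = json.loads(payload)
    # Hypothetical field names, for illustration only.
    if "query" not in request or "documents" not in request:
        raise ValueError("rerank request needs 'query' and 'documents'")
    return request


# Both call styles yield the same parsed request:
as_str = '{"query": "what is a panda?", "documents": ["a bear", "a car"]}'
as_bytes = as_str.encode("utf-8")
assert parse_rerank_request(as_str) == parse_rerank_request(as_bytes)
```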

@harryzwh
Author

You may download the rerank model from Hugging Face if the networking issue persists:
https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF/resolve/main/bge-reranker-v2-m3-Q2_K.gguf
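As a small aside, fetching the file above amounts to a single HTTP GET against Hugging Face's `resolve` endpoint. A minimal sketch using only the standard library (the `download_model` helper is not part of this repo, just an illustration; `huggingface_hub` or `curl` would work equally well):

```python
from urllib.request import urlretrieve

HF_REPO = "gpustack/bge-reranker-v2-m3-GGUF"
GGUF_FILE = "bge-reranker-v2-m3-Q2_K.gguf"
# Hugging Face serves raw files from the "resolve/<revision>" endpoint.
MODEL_URL = f"https://huggingface.co/{HF_REPO}/resolve/main/{GGUF_FILE}"


def download_model(dest: str = GGUF_FILE) -> str:
    """Download the GGUF file to `dest` and return the local path."""
    path, _headers = urlretrieve(MODEL_URL, dest)
    return path
```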

@harryzwh
Author

Interesting error that only happens on macOS; it seems to be a hardware-related issue, according to this.

@codingl2k1

> Interesting error that only happens on macOS; it seems to be a hardware-related issue, according to this.

I reviewed the server binding, and the cleanup is well handled. There may be some issues caused by llama.cpp. I tested running test_llama_server_multimodal in a dedicated process, and it still crashes at test_llama_server_rerank.
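The isolation experiment described above (running one test in a dedicated process so a native crash cannot take down the main runner) can be sketched like this. The `run_isolated` helper is generic, not code from this repo; the pytest invocation in the comment uses the test names mentioned in the thread with a hypothetical file path:

```python
import subprocess
import sys


def run_isolated(args: list[str]) -> int:
    """Run a command in a child process and report its exit code.

    A segfault in a native extension (e.g. inside llama.cpp) then
    surfaces as a negative return code (the signal number) instead
    of killing the parent process.
    """
    return subprocess.run(args).returncode


# e.g. one test per process so a llama.cpp crash is contained:
# run_isolated([sys.executable, "-m", "pytest",
#               "tests/test_server.py::test_llama_server_rerank"])
```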

@codingl2k1

When I switched to Qwen3-Reranker-0.6B.Q2_K.gguf, I got this error: /home/runner/work/xllamacpp/xllamacpp/thirdparty/llama.cpp/src/llama-graph.cpp:1907: RANK pooling requires either cls+cls_b or cls_out+cls_out_b

related issue: https://huggingface.co/Mungert/Qwen3-Reranker-4B-GGUF/discussions/1

Also, the model bge-reranker-v2-m3-Q2_K.gguf crashes on macOS CI. The rerank feature in llama.cpp is still quite experimental; it is not an issue with our binding. I can skip the test and merge this PR.

@harryzwh
Author

> When I switched to Qwen3-Reranker-0.6B.Q2_K.gguf, I got this error: /home/runner/work/xllamacpp/xllamacpp/thirdparty/llama.cpp/src/llama-graph.cpp:1907: RANK pooling requires either cls+cls_b or cls_out+cls_out_b
>
> related issue: https://huggingface.co/Mungert/Qwen3-Reranker-4B-GGUF/discussions/1
>
> Also, the model bge-reranker-v2-m3-Q2_K.gguf crashes on macOS CI. The rerank feature in llama.cpp is still quite experimental; it is not an issue with our binding. I can skip the test and merge this PR.

Yes, I get the same error on Linux. As mentioned at the very beginning, llama.cpp is still working on Qwen3-Reranker support, which is why bge-reranker-v2-m3 is used for testing. I will keep watching llama.cpp's progress.

@codingl2k1 codingl2k1 merged commit 6a69587 into xorbitsai:main Sep 14, 2025
4 checks passed
@harryzwh harryzwh deleted the rerank branch September 14, 2025 16:46