
Conversation


@BetterAndBetterII BetterAndBetterII commented Jun 30, 2025

Summary

  • Changes: Updated the CrossEncoder class to accept prompt-template parameters and apply them in prediction and ranking.
  • Main purpose: Add support for Qwen3-Reranker.
  • Example: Added a CrossEncoder example demonstrating dynamic prompt templates and default configuration.
  • Test cases: Added test cases verifying the prompt-template functionality and the correctness of the default configuration.

Best Practice

  1. Use the sequence-classification model: https://huggingface.co/tomaarsen/Qwen3-Reranker-0.6B-seq-cls
  2. Modify config.json to add a default prompt template, like this:
{
  ...
  "sentence_transformers": {
    "version": "xxx",
    "prompt_template": "Instruct: {instruction}\nQuery: {query}\nDocument: {document}",
    "prompt_template_kwargs": {
      "instruction": "Given a query, find the most relevant document."
    }
  },
  ...
}
  3. Simply call model.predict as usual; the default template from config.json is applied automatically (see the sketch below).
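
A minimal sketch of step 3, assuming the config above has been added to the checkpoint (the sentence pairs are illustrative):

from sentence_transformers import CrossEncoder

# Load the sequence-classification checkpoint; the default prompt_template and
# prompt_template_kwargs are read from config.json automatically.
model = CrossEncoder("tomaarsen/Qwen3-Reranker-0.6B-seq-cls")

sentence_pairs = [
    ("What is the capital of China?", "The capital of China is Beijing."),
    ("What is the capital of China?", "Gravity is a force that attracts two bodies towards each other."),
]
scores = model.predict(sentence_pairs)  # templating is applied transparently
print(scores)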

More customized usage

# The prefix and suffix reproduce Qwen3-Reranker's chat format around the scored pair.
prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
instruction = "Given a web search query, retrieve relevant passages that answer the query"
# {query} and {document} remain as placeholders, filled in per sentence pair.
query_template = f"{prefix}<Instruct>: {instruction}\n<Query>: {{query}}\n"
document_template = f"<Document>: {{document}}{suffix}"

# The full template is the concatenation of the query and document parts.
template = query_template + document_template
template_scores = model.predict(sentence_pairs, prompt_template=template)

Or, keeping {instruction} itself as a placeholder and supplying it at call time:

prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
suffix = "<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n"
# Here {instruction} is also left as a placeholder; its value comes from prompt_template_kwargs.
instruct_template = f"{prefix}<Instruct>: {{instruction}}\n<Query>: {{query}}\n<Document>: {{document}}{suffix}"
instruct_kwargs = {"instruction": "Given a query, find the most relevant document."}

instruction_scores_1 = model.predict(
    sentence_pairs, prompt_template=instruct_template, prompt_template_kwargs=instruct_kwargs
)
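
The PR applies the same parameters in ranking as well; a hedged sketch, assuming rank forwards them the same way predict does:

query = "What is the capital of China?"
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other.",
]
ranked = model.rank(
    query,
    documents,
    prompt_template=instruct_template,
    prompt_template_kwargs=instruct_kwargs,
)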

Test Cases:

  1. Regression test: ensure that existing functionality is unaffected.
  2. Passing prompt_template: scores should differ depending on whether the template is passed (sketched below).
  3. Passing prompt_template plus prompt_template_kwargs: different prompt_template_kwargs should yield different scores (sketched below).
  4. Reading the default prompt_template and instruction configuration from config.json: different templates and instructions should yield different scores, with no errors raised.
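
A pytest-style sketch of cases 2 and 3, reusing the model, sentence_pairs, template, and instruct_template from above (test names are illustrative):

import numpy as np

def test_prompt_template_changes_scores():
    # Case 2: scores with and without a prompt_template should differ.
    plain = model.predict(sentence_pairs)
    templated = model.predict(sentence_pairs, prompt_template=template)
    assert not np.allclose(plain, templated)

def test_prompt_template_kwargs_change_scores():
    # Case 3: different instructions should yield different scores.
    scores_a = model.predict(
        sentence_pairs,
        prompt_template=instruct_template,
        prompt_template_kwargs={"instruction": "Given a query, find the most relevant document."},
    )
    scores_b = model.predict(
        sentence_pairs,
        prompt_template=instruct_template,
        prompt_template_kwargs={"instruction": "Given a question, find the incorrect answer."},
    )
    assert not np.allclose(scores_a, scores_b)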

Test Result:

  • All CrossEncoder tests passed.

An additional end-to-end demo on Qwen3-Reranker-0.6B:

Note: the instruction and a correct template are very important for Qwen3-Reranker.

--- 1. Reranking without any template (Incorrect Usage of Qwen3 Reranker) ---
Query: What is the capital of China?
0.9746  The capital of China is Beijing.
0.6800  Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.


--- 2. Reranking with a runtime prompt_template ---
Using template: <|im_start|>system
Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>
<|im_start|>user
<Instruct>: Given a web search query, retrieve relevant passages that answer the query
<Query>: {query}
<Document>: {document}<|im_end|>
<|im_start|>assistant
<think>

</think>


Query: What is the capital of China?
0.9995  The capital of China is Beijing.
0.0000  Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.


--- 3. Reranking with a dynamic instruction ---
Using template: <|im_start|>system
Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>
<|im_start|>user
<Instruct>: {instruction}
<Query>: {query}
<Document>: {document}<|im_end|>
<|im_start|>assistant
<think>

</think>


With instruction 1: 'Given a query, find the most relevant document.'
0.9976  The capital of China is Beijing.
0.0000  Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.

With instruction 2: 'Given a question, find the incorrect answer.'
0.9921  The capital of China is Beijing.
0.0001  Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.

@BetterAndBetterII BetterAndBetterII marked this pull request as ready for review June 30, 2025 11:13
@BetterAndBetterII
Author

cc @tomaarsen
ready for review

@tomaarsen
Member

Hello!

This is really cool! I've been planning to add support for the "decoder-style" rerankers in a future version after v5.0. For context, Sentence Transformers v5.0 is scheduled for tomorrow, and in the interest of avoiding feature creep for v5.0, I'll have a look at this in more detail after tomorrow.
Much appreciated!

  • Tom Aarsen

@BetterAndBetterII
Author

Hello!

This is really cool! I've been planning to add support for the "decoder-style" rerankers in a future version after v5.0. For context, Sentence Transformers v5.0 is scheduled for tomorrow, and in the interest of avoiding feature creep for v5.0, I'll have a look at this in more detail after tomorrow. Much appreciated!

  • Tom Aarsen

Thank you very much! Looking forward to it

@BetterAndBetterII
Author

Is the release going smoothly? If you have time, could you review my PR? @tomaarsen

@tomaarsen
Member

tomaarsen commented Jul 10, 2025

So far so good re. the release!
I started looking at this PR yesterday. I like the approach, but I'm considering whether we can upgrade the prompts functionality for all model archetypes in such a way that we allow the complex prompts that are required here.
And beyond that, I think it would be valuable to support the CausalLM-style models out of the box, i.e. without converting them to Sequence Classification models. I'm doing some brainstorming on both.

  • Tom Aarsen

@tomaarsen
Member

I'm looking into potentially reusing apply_chat_template, but I'm unsure whether that would work nicely with truncation. Asking my colleagues about it now.

  • Tom Aarsen

@BetterAndBetterII
Author

BetterAndBetterII commented Jul 10, 2025

I'm looking into potentially reusing apply_chat_template, but I'm unsure whether that would work nicely with truncation. Asking my colleagues about it now.

  • Tom Aarsen

It is indeed more reasonable to reuse apply_chat_template, since you then don't need to care about special tokens such as <|xxx|>. In that case, the incoming Instruct has to be attached to the user message manually; I'm not sure whether that is a good idea, but it is definitely a good thing for servers like vLLM that are compatible with OpenAI endpoints.
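
A minimal sketch of that idea, using the stock transformers tokenizer API (the message contents mirror the Qwen3-Reranker prompt above and are illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B")

instruction = "Given a web search query, retrieve relevant passages that answer the query"
query = "What is the capital of China?"
document = "The capital of China is Beijing."

# The Instruct is attached to the user message by hand, as discussed above;
# apply_chat_template then takes care of the <|im_start|>/<|im_end|> tokens.
messages = [
    {
        "role": "system",
        "content": 'Judge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".',
    },
    {
        "role": "user",
        "content": f"<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}",
    },
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)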

@tomaarsen
Member

P.S. The latest update is that there's no convenient way to handle the truncation, nor a commonly agreed-upon truncation strategy. To be as model-agnostic as possible, we'd have to support various different options, but it gets quite messy quite quickly.

@tomaarsen
Member

Hello @BetterAndBetterII,

Apologies for the delay. Locally, I've been working on this problem some more. Particularly, my goal is to support not just https://huggingface.co/tomaarsen/Qwen3-Reranker-0.6B-seq-cls, but also https://huggingface.co/Qwen/Qwen3-Reranker-0.6B itself. It will require a full refactor of the CrossEncoder class to be more like the SentenceTransformer/SparseEncoder classes, i.e. with modules that are executed sequentially. This allows me to create separate modules wrapping AutoModelForSequenceClassification (the current default) and AutoModelForCausalLM (like Qwen3), and use them as required.

That would then also include support for templating akin to what you proposed here.

  • Tom Aarsen
