
Commit b8266b7

noooop authored and hmellor committed
[Misc] Fix examples openai_pooling_client.py (vllm-project#24853)
Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
1 parent 6c05536 commit b8266b7

17 files changed: +105 −11 lines

docs/models/pooling_models.md

Lines changed: 2 additions & 2 deletions
@@ -228,7 +228,7 @@ outputs = llm.embed(["Follow the white rabbit."],
 print(outputs[0].outputs)
 ```
 
-A code example can be found here: <gh-file:examples/offline_inference/embed_matryoshka_fy.py>
+A code example can be found here: <gh-file:examples/offline_inference/pooling/embed_matryoshka_fy.py>
 
 ### Online Inference

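The matryoshka example referenced above relies on truncating a full embedding to a shorter prefix and re-normalizing it. A minimal numpy sketch of that idea (illustrative only, not vLLM's implementation — in the linked examples the server does this when asked for a reduced dimension):

```python
import numpy as np

def truncate_matryoshka(embedding, dim):
    # Matryoshka-trained models pack coarse-to-fine information into
    # embedding prefixes, so the first `dim` components remain a usable
    # vector once L2-renormalized.
    prefix = np.asarray(embedding, dtype=np.float64)[:dim]
    norm = np.linalg.norm(prefix)
    return prefix / norm if norm > 0 else prefix

full = np.random.default_rng(0).normal(size=1024)   # stand-in for a model embedding
small = truncate_matryoshka(full, 32)
print(small.shape)  # (32,)
```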
@@ -258,4 +258,4 @@ Expected output:
 {"id":"embd-5c21fc9a5c9d4384a1b021daccaf9f64","object":"list","created":1745476417,"model":"jinaai/jina-embeddings-v3","data":[{"index":0,"object":"embedding","embedding":[-0.3828125,-0.1357421875,0.03759765625,0.125,0.21875,0.09521484375,-0.003662109375,0.1591796875,-0.130859375,-0.0869140625,-0.1982421875,0.1689453125,-0.220703125,0.1728515625,-0.2275390625,-0.0712890625,-0.162109375,-0.283203125,-0.055419921875,-0.0693359375,0.031982421875,-0.04052734375,-0.2734375,0.1826171875,-0.091796875,0.220703125,0.37890625,-0.0888671875,-0.12890625,-0.021484375,-0.0091552734375,0.23046875]}],"usage":{"prompt_tokens":8,"total_tokens":8,"completion_tokens":0,"prompt_tokens_details":null}}
 ```
 
-An OpenAI client example can be found here: <gh-file:examples/online_serving/openai_embedding_matryoshka_fy.py>
+An OpenAI client example can be found here: <gh-file:examples/online_serving/pooling/openai_embedding_matryoshka_fy.py>

docs/models/supported_models.md

Lines changed: 1 addition & 1 deletion
@@ -530,7 +530,7 @@ These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) A
 ```
 
 !!! note
-    Load the official original `Qwen3 Reranker` by using the following command. More information can be found at: <gh-file:examples/offline_inference/qwen3_reranker.py>.
+    Load the official original `Qwen3 Reranker` by using the following command. More information can be found at: <gh-file:examples/offline_inference/pooling/qwen3_reranker.py>.
 
 ```bash
 vllm serve Qwen/Qwen3-Reranker-0.6B --hf_overrides '{"architectures": ["Qwen3ForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}'

docs/serving/openai_compatible_server.md

Lines changed: 5 additions & 5 deletions
@@ -239,7 +239,7 @@ you can use the [official OpenAI Python client](https://github.com/openai/openai
 If the model has a [chat template][chat-template], you can replace `inputs` with a list of `messages` (same schema as [Chat API][chat-api])
 which will be treated as a single prompt to the model.
 
-Code example: <gh-file:examples/online_serving/openai_embedding_client.py>
+Code example: <gh-file:examples/online_serving/pooling/openai_embedding_client.py>
 
 #### Multi-modal inputs

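As the hunk above notes, when the served model has a chat template the Embeddings API also accepts Chat-API-style `messages` in place of `inputs`. A sketch of such a request body — the model name and field values are illustrative, not taken from the commit:

```python
import json

# Messages sent to the embeddings route; the server renders them through
# the model's chat template into a single prompt and embeds that prompt.
payload = {
    "model": "intfloat/e5-mistral-7b-instruct",  # assumed served model
    "messages": [
        {"role": "user", "content": "A man is eating food."},
    ],
    "encoding_format": "float",
}
print(json.dumps(payload, indent=2))
```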
@@ -313,7 +313,7 @@ and passing a list of `messages` in the request. Refer to the examples below for
 `MrLight/dse-qwen2-2b-mrl-v1` requires a placeholder image of the minimum image size for text query embeddings. See the full code
 example below for details.
 
-Full example: <gh-file:examples/online_serving/openai_chat_embedding_client_for_multimodal.py>
+Full example: <gh-file:examples/online_serving/pooling/openai_chat_embedding_client_for_multimodal.py>
 
 #### Extra parameters

@@ -421,7 +421,7 @@ Our Pooling API encodes input prompts using a [pooling model](../models/pooling_
 
 The input format is the same as [Embeddings API][embeddings-api], but the output data can contain an arbitrary nested list, not just a 1-D list of floats.
 
-Code example: <gh-file:examples/online_serving/openai_pooling_client.py>
+Code example: <gh-file:examples/online_serving/pooling/openai_pooling_client.py>
 
 [](){ #classification-api }

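Per the Pooling API hunk above, the returned `data` may be arbitrarily nested (e.g. one vector per token) rather than a flat embedding. A small sketch of inspecting such output; the response body here is mocked for illustration, real output comes from the `/pooling` route:

```python
# Mock pooling response: a per-token matrix instead of a flat 1-D embedding.
response = {
    "object": "list",
    "data": [{"index": 0, "data": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]}],
}

def nesting_shape(x):
    # Walk the nested lists and record the length at each depth.
    dims = []
    while isinstance(x, list) and x:
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

print(nesting_shape(response["data"][0]["data"]))  # (3, 2): 3 tokens, 2 dims each
```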
@@ -431,7 +431,7 @@ Our Classification API directly supports Hugging Face sequence-classification mo
 
 We automatically wrap any other transformer via `as_seq_cls_model()`, which pools on the last token, attaches a `RowParallelLinear` head, and applies a softmax to produce per-class probabilities.
 
-Code example: <gh-file:examples/online_serving/openai_classification_client.py>
+Code example: <gh-file:examples/online_serving/pooling/openai_classification_client.py>
 
 #### Example Requests

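The `as_seq_cls_model()` pipeline described above (last-token pooling, linear head, softmax) can be sketched in a few lines of numpy; shapes and names here are illustrative, not vLLM internals:

```python
import numpy as np

rng = np.random.default_rng(42)
hidden_states = rng.normal(size=(7, 64))  # one hidden vector per token
W = rng.normal(size=(64, 3))              # classification head, 3 classes

pooled = hidden_states[-1]                # pool on the last token
logits = pooled @ W                       # linear head (RowParallelLinear in vLLM)
exp = np.exp(logits - logits.max())       # numerically stable softmax
probs = exp / exp.sum()                   # per-class probabilities

print(probs.shape, round(float(probs.sum()), 6))  # (3,) 1.0
```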
@@ -760,7 +760,7 @@ endpoints are compatible with both [Jina AI's re-rank API interface](https://jin
 [Cohere's re-rank API interface](https://docs.cohere.com/v2/reference/rerank) to ensure compatibility with
 popular open-source tools.
 
-Code example: <gh-file:examples/online_serving/jinaai_rerank_client.py>
+Code example: <gh-file:examples/online_serving/pooling/jinaai_rerank_client.py>
 
 #### Example Request

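The rerank endpoints mentioned above take a Jina/Cohere-style body: a query plus candidate documents, which the server scores and returns ranked. A sketch of constructing one (the model name and field values are illustrative assumptions):

```python
import json

# Jina/Cohere-style rerank request: the server scores each document
# against the query and responds with documents ordered by relevance.
payload = {
    "model": "BAAI/bge-reranker-base",  # assumed served reranker
    "query": "What is the capital of France?",
    "documents": [
        "The capital of Brazil is Brasilia.",
        "The capital of France is Paris.",
    ],
    "top_n": 2,  # return at most this many ranked results
}
print(json.dumps(payload, indent=2))
```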
Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Pooling models
+
+## Convert llm model to seq cls
+
+```bash
+# for BAAI/bge-reranker-v2-gemma
+# Caution: "Yes" and "yes" are two different tokens
+python examples/offline_inference/pooling/convert_model_to_seq_cls.py --model_name BAAI/bge-reranker-v2-gemma --classifier_from_tokens '["Yes"]' --method no_post_processing --path ./bge-reranker-v2-gemma-seq-cls
+# for mxbai-rerank-v2
+python examples/offline_inference/pooling/convert_model_to_seq_cls.py --model_name mixedbread-ai/mxbai-rerank-base-v2 --classifier_from_tokens '["0", "1"]' --method from_2_way_softmax --path ./mxbai-rerank-base-v2-seq-cls
+# for Qwen3-Reranker
+python examples/offline_inference/pooling/convert_model_to_seq_cls.py --model_name Qwen/Qwen3-Reranker-0.6B --classifier_from_tokens '["no", "yes"]' --method from_2_way_softmax --path ./Qwen3-Reranker-0.6B-seq-cls
+```
+
+## Embed jina_embeddings_v3 usage
+
+Only text matching task is supported for now. See <gh-pr:16120>
+
+```bash
+python examples/offline_inference/pooling/embed_jina_embeddings_v3.py
+```
+
+## Embed matryoshka dimensions usage
+
+```bash
+python examples/offline_inference/pooling/embed_matryoshka_fy.py
+```
+
+## Qwen3 reranker usage
+
+```bash
+python qwen3_reranker.py
+```

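The `from_2_way_softmax` method in the new README above collapses a two-token ("no"/"yes") LM head into a single classifier row. The identity it leans on: a softmax over two logits equals a sigmoid of their difference, so one merged row behaves like the two-way head. A numpy check of that identity (names and sizes are illustrative, not the converter's actual code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
hidden = rng.normal(size=16)             # final-token hidden state
w_no, w_yes = rng.normal(size=(2, 16))   # lm_head rows for "no" / "yes"

# Probability of "yes" under the original two-way softmax head...
p_two_way = softmax(np.array([hidden @ w_no, hidden @ w_yes]))[1]
# ...equals a sigmoid over the single merged row (w_yes - w_no).
p_merged = 1.0 / (1.0 + np.exp(-hidden @ (w_yes - w_no)))
print(bool(np.isclose(p_two_way, p_merged)))  # True
```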
examples/online_serving/openai_embedding_long_text/service.sh

Lines changed: 1 addition & 1 deletion
@@ -120,7 +120,7 @@ echo " - API Key: $API_KEY"
 echo " - Native Pooling: $POOLING_TYPE | Cross-chunk: MEAN"
 echo ""
 echo "🧪 Test the server with:"
-echo " python examples/online_serving/openai_embedding_long_text_client.py"
+echo " python examples/online_serving/openai_embedding_long_text/client.py"
 echo ""
 echo "📚 Enhanced features enabled:"
 echo " ✅ Intelligent native pooling type detection"
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# Pooling models
+
+## Cohere rerank usage
+
+```bash
+python examples/online_serving/pooling/cohere_rerank_client.py
+```
+
+## Jinaai rerank usage
+
+```bash
+python examples/online_serving/pooling/jinaai_rerank_client.py
+```
+
+## Openai chat embedding for multimodal usage
+
+```bash
+python examples/online_serving/pooling/openai_chat_embedding_client_for_multimodal.py
+```
+
+## Openai classification usage
+
+```bash
+python examples/online_serving/pooling/openai_classification_client.py
+```
+
+## Openai embedding usage
+
+```bash
+python examples/online_serving/pooling/openai_embedding_client.py
+```
+
+## Openai embedding matryoshka dimensions usage
+
+```bash
+python examples/online_serving/pooling/openai_embedding_matryoshka_fy.py
+```
+
+## Openai pooling usage
+
+```bash
+python examples/online_serving/pooling/openai_pooling_client.py
+```
