Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 12 additions & 8 deletions .buildkite/test-pipeline.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -391,20 +391,24 @@ steps:
- examples/
commands:
- pip install tensorizer # for tensorizer test
# for basic
- python3 offline_inference/basic/chat.py
- python3 offline_inference/basic/generate.py --model facebook/opt-125m
- python3 offline_inference/basic/generate.py --model meta-llama/Llama-2-13b-chat-hf --cpu-offload-gb 10
- python3 offline_inference/basic/chat.py
- python3 offline_inference/prefix_caching.py
- python3 offline_inference/llm_engine_example.py
- python3 offline_inference/basic/classify.py
- python3 offline_inference/basic/embed.py
- python3 offline_inference/basic/score.py
# for multi-modal models
- python3 offline_inference/audio_language.py --seed 0
- python3 offline_inference/vision_language.py --seed 0
- python3 offline_inference/vision_language_pooling.py --seed 0
- python3 offline_inference/vision_language_multi_image.py --seed 0
- python3 others/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 others/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- python3 offline_inference/encoder_decoder_multimodal.py --model-type whisper --seed 0
- python3 offline_inference/basic/classify.py
- python3 offline_inference/basic/embed.py
- python3 offline_inference/basic/score.py
# for pooling models
- python3 pooling/pooling/vision_language_pooling.py --seed 0
# for features demo
- python3 offline_inference/prefix_caching.py
- python3 offline_inference/llm_engine_example.py
- python3 others/tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 others/tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- python3 offline_inference/spec_decode.py --test --method eagle --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill --max-model-len 2048
# https://github.com/vllm-project/vllm/pull/26682 uses slightly more memory in PyTorch 2.9+ causing this test to OOM in 1xL4 GPU
- python3 offline_inference/spec_decode.py --test --method eagle3 --num_spec_tokens 3 --dataset-name hf --dataset-path philschmid/mt-bench --num-prompts 80 --temp 0 --top-p 1.0 --top-k -1 --tp 1 --enable-chunked-prefill --max-model-len 1536
Expand Down
4 changes: 2 additions & 2 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
Expand Up @@ -146,10 +146,10 @@ mkdocs.yaml @hmellor
/requirements/kv_connectors.txt @NickLucche

# Pooling models
/examples/*/pooling/ @noooop
/examples/pooling @noooop
/tests/models/*/pooling* @noooop
/tests/entrypoints/pooling @noooop
/vllm/entrypoints/pooling @aarnphm @chaunceyjiang @noooop
/vllm/entrypoints/pooling @noooop
/vllm/config/pooler.py @noooop
/vllm/pooling_params.py @noooop
/vllm/model_executor/layers/pooler.py @noooop
Expand Down
6 changes: 1 addition & 5 deletions docs/.nav.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,7 @@ nav:
- Getting Started:
- getting_started/quickstart.md
- getting_started/installation
- Examples:
- examples/README.md
- Offline Inference: examples/offline_inference
- Online Serving: examples/online_serving
- Others: examples/others
- Examples: examples
- General:
- usage/v1_guide.md
- usage/*
Expand Down
2 changes: 1 addition & 1 deletion docs/design/io_processor_plugins.md
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,7 @@ The `post_process*` methods take `PoolingRequestOutput` objects as input and gen
The `validate_or_generate_params` method is used for validating with the plugin any `SamplingParameters`/`PoolingParameters` received with the user request, or to generate new ones if none are specified. The function always returns the validated/generated parameters.
The `output_to_response` method is used only for online serving and converts the plugin output to the `IOProcessorResponse` type that is then returned by the API Server. The implementation of the `/pooling` serving endpoint is available here [vllm/entrypoints/openai/serving_pooling.py](../../vllm/entrypoints/pooling/pooling/serving.py).

An example implementation of a plugin that enables generating geotiff images with the PrithviGeospatialMAE model is available [here](https://github.com/IBM/terratorch/tree/main/terratorch/vllm/plugins/segmentation). Please, also refer to our online ([examples/online_serving/pooling/prithvi_geospatial_mae.py](../../examples/online_serving/pooling/prithvi_geospatial_mae.py)) and offline ([examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py](../../examples/offline_inference/pooling/prithvi_geospatial_mae_io_processor.py)) inference examples.
An example implementation of a plugin that enables generating geotiff images with the PrithviGeospatialMAE model is available [here](https://github.com/IBM/terratorch/tree/main/terratorch/vllm/plugins/segmentation). Please, also refer to our online ([examples/pooling/plugin/prithvi_geospatial_mae_client.py](../../examples/pooling/plugin/prithvi_geospatial_mae_client.py)) and offline ([examples/pooling/plugin/prithvi_geospatial_mae_io_processor.py](../../examples/pooling/plugin/prithvi_geospatial_mae_io_processor.py)) inference examples.

## Using an IO Processor plugin

Expand Down
154 changes: 84 additions & 70 deletions docs/mkdocs/hooks/generate_examples.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import itertools
import logging
from dataclasses import dataclass, field
from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import Literal

Expand All @@ -16,13 +17,18 @@
EXAMPLE_DOC_DIR = ROOT_DIR / "docs/examples"


def fix_case(text: str) -> str:
def title(text: str) -> str:
# Default title case
text = text.replace("_", " ").replace("/", " - ").title()
# Custom substitutions
subs = {
"io": "IO",
"api": "API",
"cli": "CLI",
"cpu": "CPU",
"llm": "LLM",
"mae": "MAE",
"ner": "NER",
"tpu": "TPU",
"gguf": "GGUF",
"lora": "LoRA",
Expand All @@ -48,71 +54,65 @@ class Example:
Attributes:
path (Path): The path to the main directory or file.
category (str): The category of the document.
main_file (Path): The main file in the directory.
other_files (list[Path]): list of other files in the directory.
title (str): The title of the document.

Properties::
main_file() -> Path | None: Determines the main file in the given path.
other_files() -> list[Path]: Determines other files in the directory excluding
the main file.
title() -> str: Determines the title of the document.

Methods:
__post_init__(): Initializes the main_file, other_files, and title attributes.
determine_main_file() -> Path: Determines the main file in the given path.
determine_other_files() -> list[Path]: Determines other files in the directory excluding the main file.
determine_title() -> str: Determines the title of the document.
generate() -> str: Generates the documentation content.
""" # noqa: E501
"""

path: Path
category: str = None
main_file: Path = field(init=False)
other_files: list[Path] = field(init=False)
title: str = field(init=False)

def __post_init__(self):
self.main_file = self.determine_main_file()
self.other_files = self.determine_other_files()
self.title = self.determine_title()

@property
def is_code(self) -> bool:
return self.main_file.suffix != ".md"

def determine_main_file(self) -> Path:
"""
Determines the main file in the given path.
If the path is a file, it returns the path itself. Otherwise, it searches
for Markdown files (*.md) in the directory and returns the first one found.
Returns:
Path: The main file path, either the original path if it's a file or the first
Markdown file found in the directory.
Raises:
IndexError: If no Markdown files are found in the directory.
""" # noqa: E501
return self.path if self.path.is_file() else list(self.path.glob("*.md")).pop()

def determine_other_files(self) -> list[Path]:
"""
Determine other files in the directory excluding the main file.
category: str

This method checks if the given path is a file. If it is, it returns an empty list.
Otherwise, it recursively searches through the directory and returns a list of all
files that are not the main file.
@cached_property
def main_file(self) -> Path | None:
"""Determines the main file in the given path.

Returns:
list[Path]: A list of Path objects representing the other files in the directory.
""" # noqa: E501
If path is a file, it returns the path itself. If path is a directory, it
searches for Markdown files (*.md) in the directory and returns the first one
found. If no Markdown files are found, it returns None."""
# Single file example
if self.path.is_file():
return self.path
# Multi file example with a README
if md_paths := list(self.path.glob("*.md")):
return md_paths[0]
# Multi file example without a README
return None

@cached_property
def other_files(self) -> list[Path]:
"""Determine other files in the directory excluding the main file.

If path is a file, it returns an empty list. Otherwise, it returns every file
in the directory except the main file in a list."""
# Single file example
if self.path.is_file():
return []
# Multi file example
is_other_file = lambda file: file.is_file() and file != self.main_file
return [file for file in self.path.rglob("*") if is_other_file(file)]

def determine_title(self) -> str:
if not self.is_code:
# Specify encoding for building on Windows
with open(self.main_file, encoding="utf-8") as f:
first_line = f.readline().strip()
match = re.match(r"^#\s+(?P<title>.+)$", first_line)
if match:
return match.group("title")
return fix_case(self.path.stem.replace("_", " ").title())
return sorted(file for file in self.path.rglob("*") if is_other_file(file))

@cached_property
def is_code(self) -> bool:
return self.main_file is not None and self.main_file.suffix != ".md"

@cached_property
def title(self) -> str:
# Generate title from filename if no main md file found
if self.main_file is None or self.is_code:
return title(self.path.stem)
# Specify encoding for building on Windows
with open(self.main_file, encoding="utf-8") as f:
first_line = f.readline().strip()
match = re.match(r"^#\s+(?P<title>.+)$", first_line)
if match:
return match.group("title")
raise ValueError(f"Title not found in {self.main_file}")

def fix_relative_links(self, content: str) -> str:
"""
Expand Down Expand Up @@ -156,24 +156,35 @@ def generate(self) -> str:
# included files containing code fences too
code_fence = "``````"

if self.is_code:
content += (
f"{code_fence}{self.main_file.suffix[1:]}\n"
f'--8<-- "{self.main_file}"\n'
f"{code_fence}\n"
)
if self.main_file is not None:
# Single file example or multi file example with a README
if self.is_code:
content += (
f"{code_fence}{self.main_file.suffix[1:]}\n"
f'--8<-- "{self.main_file}"\n'
f"{code_fence}\n"
)
else:
with open(self.main_file, encoding="utf-8") as f:
# Skip the title from md snippets as it's been included above
main_content = f.readlines()[1:]
content += self.fix_relative_links("".join(main_content))
content += "\n"
else:
with open(self.main_file) as f:
# Skip the title from md snippets as it's been included above
main_content = f.readlines()[1:]
content += self.fix_relative_links("".join(main_content))
content += "\n"
# Multi file example without a README
for file in self.other_files:
file_title = title(str(file.relative_to(self.path).with_suffix("")))
content += f"## {file_title}\n\n"
content += (
f'{code_fence}{file.suffix[1:]}\n--8<-- "{file}"\n{code_fence}\n\n'
)
return content

if not self.other_files:
return content

content += "## Example materials\n\n"
for file in sorted(self.other_files):
for file in self.other_files:
content += f'??? abstract "{file.relative_to(self.path)}"\n'
if file.suffix != ".md":
content += f" {code_fence}{file.suffix[1:]}\n"
Expand All @@ -200,11 +211,13 @@ def on_startup(command: Literal["build", "gh-deploy", "serve"], dirty: bool):
glob_patterns = ["*.py", "*.md", "*.sh"]
# Find categorised examples
for category in categories:
logger.info("Processing category: %s", category.stem)
globs = [category.glob(pattern) for pattern in glob_patterns]
for path in itertools.chain(*globs):
examples.append(Example(path, category.stem))
# Find examples in subdirectories
for path in category.glob("*/*.md"):
globs = [category.glob(f"*/{pattern}") for pattern in glob_patterns]
for path in itertools.chain(*globs):
examples.append(Example(path.parent, category.stem))

# Generate the example documentation
Expand All @@ -217,3 +230,4 @@ def on_startup(command: Literal["build", "gh-deploy", "serve"], dirty: bool):
with open(doc_path, "w+", encoding="utf-8") as f:
f.write(example.generate())
logger.debug("Example generated: %s", doc_path.relative_to(ROOT_DIR))
logger.info("Total examples generated: %d", len(examples))
4 changes: 2 additions & 2 deletions docs/models/pooling_models.md
Original file line number Diff line number Diff line change
Expand Up @@ -274,7 +274,7 @@ outputs = llm.embed(
print(outputs[0].outputs)
```

A code example can be found here: [examples/offline_inference/pooling/embed_matryoshka_fy.py](../../examples/offline_inference/pooling/embed_matryoshka_fy.py)
A code example can be found here: [examples/pooling/embed/embed_matryoshka_fy.py](../../examples/pooling/embed/embed_matryoshka_fy.py)

### Online Inference

Expand Down Expand Up @@ -304,7 +304,7 @@ Expected output:
{"id":"embd-5c21fc9a5c9d4384a1b021daccaf9f64","object":"list","created":1745476417,"model":"jinaai/jina-embeddings-v3","data":[{"index":0,"object":"embedding","embedding":[-0.3828125,-0.1357421875,0.03759765625,0.125,0.21875,0.09521484375,-0.003662109375,0.1591796875,-0.130859375,-0.0869140625,-0.1982421875,0.1689453125,-0.220703125,0.1728515625,-0.2275390625,-0.0712890625,-0.162109375,-0.283203125,-0.055419921875,-0.0693359375,0.031982421875,-0.04052734375,-0.2734375,0.1826171875,-0.091796875,0.220703125,0.37890625,-0.0888671875,-0.12890625,-0.021484375,-0.0091552734375,0.23046875]}],"usage":{"prompt_tokens":8,"total_tokens":8,"completion_tokens":0,"prompt_tokens_details":null}}
```

An OpenAI client example can be found here: [examples/online_serving/pooling/openai_embedding_matryoshka_fy.py](../../examples/online_serving/pooling/openai_embedding_matryoshka_fy.py)
An OpenAI client example can be found here: [examples/pooling/embed/openai_embedding_matryoshka_fy.py](../../examples/pooling/embed/openai_embedding_matryoshka_fy.py)

## Deprecated Features

Expand Down
4 changes: 2 additions & 2 deletions docs/models/supported_models.md
Original file line number Diff line number Diff line change
Expand Up @@ -567,7 +567,7 @@ These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) A
```

!!! note
Load the official original `Qwen3 Reranker` by using the following command. More information can be found at: [examples/offline_inference/pooling/qwen3_reranker.py](../../examples/offline_inference/pooling/qwen3_reranker.py).
Load the official original `Qwen3 Reranker` by using the following command. More information can be found at: [examples/pooling/score/qwen3_reranker.py](../../examples/pooling/score/qwen3_reranker.py).

```bash
vllm serve Qwen/Qwen3-Reranker-0.6B --hf_overrides '{"architectures": ["Qwen3ForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}'
Expand Down Expand Up @@ -605,7 +605,7 @@ These models primarily support the [`LLM.encode`](./pooling_models.md#llmencode)
| `ModernBertForTokenClassification` | ModernBERT-based | `disham993/electrical-ner-ModernBERT-base` | | |

!!! note
Named Entity Recognition (NER) usage, please refer to [examples/offline_inference/pooling/ner.py](../../examples/offline_inference/pooling/ner.py), [examples/online_serving/pooling/ner_client.py](../../examples/online_serving/pooling/ner_client.py).
Named Entity Recognition (NER) usage, please refer to [examples/pooling/token_classify/ner.py](../../examples/pooling/token_classify/ner.py), [examples/pooling/token_classify/ner_client.py](../../examples/pooling/token_classify/ner_client.py).

## List of Multimodal Language Models

Expand Down
Loading