63 changes: 28 additions & 35 deletions docs/models/pooling_models/README.md
Original file line number Diff line number Diff line change
@@ -31,29 +31,28 @@ Of course, we also have "plugin" tasks that allow users to customize input and o

### Pooling Tasks

| Pooling Tasks | Granularity | Outputs |
|-----------------------|---------------|-------------------------------------------------|
| `classify` (see note) | Sequence-wise | probability vector of classes for each sequence |
| `embed` | Sequence-wise | vector representations for each sequence |
| `token_classify` | Token-wise | probability vector of classes for each token |
| `token_embed` | Token-wise | vector representations for each token |
| Pooling Tasks | Granularity | Outputs |
|--------------------|---------------|-------------------------------------------------|
| `classify` | Sequence-wise | probability vector of classes for each sequence |
| `score` (see note) | Sequence-wise | reranker score for each sequence |
| `embed` | Sequence-wise | vector representations for each sequence |
| `token_classify` | Token-wise | probability vector of classes for each token |
| `token_embed` | Token-wise | vector representations for each token |

!!! note
Within classification tasks, there is a specialized subcategory: cross-encoder (aka reranker) models. These are classification models that accept two prompts as input and have `num_labels` equal to 1.

### Score Types

The scoring model is designed to compute similarity scores between two input prompts. It supports three model types (aka `score_type`): `cross-encoder`, `late-interaction`, and `bi-encoder`.

| Pooling Tasks      | Granularity   | Outputs                                         | Score Types        | Scoring Function          |
|--------------------|---------------|-------------------------------------------------|--------------------|---------------------------|
| `classify`         | Sequence-wise | probability vector of classes for each sequence | N/A                | N/A                       |
| `score` (see note) | Sequence-wise | reranker score for each sequence                | `cross-encoder`    | linear classifier         |
| `embed`            | Sequence-wise | vector representations for each sequence        | `bi-encoder`       | cosine similarity         |
| `token_classify`   | Token-wise    | probability vector of classes for each token    | N/A                | N/A                       |
| `token_embed`      | Token-wise    | vector representations for each token           | `late-interaction` | late interaction (MaxSim) |

| Pooling Tasks         | Granularity   | Outputs                                      | Score Types        | Scoring Function          |
|-----------------------|---------------|----------------------------------------------|--------------------|---------------------------|
| `classify` (see note) | Sequence-wise | reranker score for each sequence             | `cross-encoder`    | linear classifier         |
| `embed`               | Sequence-wise | vector representations for each sequence     | `bi-encoder`       | cosine similarity         |
| `token_classify`      | Token-wise    | probability vector of classes for each token | N/A                | N/A                       |
| `token_embed`         | Token-wise    | vector representations for each token        | `late-interaction` | late interaction (MaxSim) |

!!! note
Only when a classification model outputs num_labels equal to 1 can it be used as a scoring model and have its scoring API enabled.
The score model is designed to compute similarity scores between two input prompts. It supports three model types (aka `score_type`): `cross-encoder`, `late-interaction`, and `bi-encoder`.
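The `bi-encoder` and `late-interaction` scoring functions named in the table above can be sketched in plain Python. This is an illustration of the math only, not vLLM's vectorized implementation; the `cross-encoder` case instead feeds both prompts through the model jointly and reads a linear classifier head, so it has no standalone formula to show here.

```python
import math


def cosine_similarity(q: list[float], d: list[float]) -> float:
    """Bi-encoder scoring: cosine similarity between two sequence embeddings."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(a * a for a in d))
    return dot / (norm_q * norm_d)


def late_interaction_maxsim(
    q_tokens: list[list[float]], d_tokens: list[list[float]]
) -> float:
    """Late-interaction scoring (MaxSim): each query token embedding is matched
    against its most similar document token embedding, and the maxima are summed."""
    return sum(
        max(sum(a * b for a, b in zip(qt, dt)) for dt in d_tokens)
        for qt in q_tokens
    )
```
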

### Pooling Usages

@@ -86,16 +85,14 @@ enabling the corresponding APIs.

### Offline APIs corresponding to pooling tasks

| Task | APIs |
|------------------|---------------------------------------------------------------------------------------|
| `embed` | `LLM.embed(...)`, `LLM.encode(..., pooling_task="embed")`, `LLM.score(...)`(see note) |
| `classify` | `LLM.classify(...)`, `LLM.encode(..., pooling_task="classify")`, `LLM.score(...)` |
| `token_classify` | `LLM.reward(...)`, `LLM.encode(..., pooling_task="token_classify")` |
| `token_embed` | `LLM.encode(..., pooling_task="token_embed")`, `LLM.score(...)` |
| `plugin` | `LLM.encode(..., pooling_task="plugin")` |

!!! note
Only when a classification model outputs num_labels equal to 1 can it be used as a scoring model and have its scoring API enabled.
| Task             | APIs                                                                        |
|------------------|-----------------------------------------------------------------------------|
| `embed`          | `LLM.embed(...)`, `LLM.encode(..., pooling_task="embed")`, `LLM.score(...)` |
| `classify`       | `LLM.classify(...)`, `LLM.encode(..., pooling_task="classify")`             |
| `score`          | `LLM.score(...)`                                                            |
| `token_classify` | `LLM.reward(...)`, `LLM.encode(..., pooling_task="token_classify")`         |
| `token_embed`    | `LLM.encode(..., pooling_task="token_embed")`, `LLM.score(...)`             |
| `plugin`         | `LLM.encode(..., pooling_task="plugin")`                                    |

### `LLM.classify`

@@ -209,11 +206,11 @@ If `--runner pooling` has been set (manually or automatically) but the model doe
vLLM will attempt to automatically convert the model according to the architecture names
shown in the table below.

| Architecture | `--convert` | Supported pooling tasks |
|-------------------------------------------------|-------------|------------------------------|
| `*ForTextEncoding`, `*EmbeddingModel`, `*Model` | `embed` | `token_embed`, `embed` |
| `*ForRewardModeling`, `*RewardModel` | `embed` | `token_embed`, `embed` |
| `*For*Classification`, `*ClassificationModel` | `classify` | `token_classify`, `classify` |
| Architecture | `--convert` | Supported pooling tasks |
| ----------------------------------------------- | ----------- | ------------------------------------- |
| `*ForTextEncoding`, `*EmbeddingModel`, `*Model` | `embed` | `token_embed`, `embed` |
| `*ForRewardModeling`, `*RewardModel` | `embed` | `token_embed`, `embed` |
| `*For*Classification`, `*ClassificationModel` | `classify` | `token_classify`, `classify`, `score` |

!!! tip
You can explicitly set `--convert <type>` to specify how to convert the model.
@@ -254,7 +251,3 @@ Pooling models now default support all pooling, you can use it without any setti

- To extract hidden states, prefer the `token_embed` task.
- For Named Entity Recognition (NER) and reward models, prefer the `token_classify` task.
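For instance, the per-token probability vectors returned by the `token_classify` task can be decoded into NER tags with a simple per-token argmax. This is a sketch: the label names in the test are hypothetical, and real NER pipelines typically add BIO-span merging on top.

```python
def decode_token_labels(
    token_probs: list[list[float]], id2label: dict[int, str]
) -> list[str]:
    """Pick the most likely label for each token's class-probability vector."""
    return [
        id2label[max(range(len(probs)), key=probs.__getitem__)]
        for probs in token_probs
    ]
```
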

### Score task

`score` task is deprecated and will be removed in v0.20. Please use `classify` instead. Only when a classification model outputs num_labels equal to 1 can it be used as a scoring model and have its scoring API enabled.
4 changes: 1 addition & 3 deletions docs/models/pooling_models/classify.md
@@ -17,8 +17,6 @@ The key distinction between (sequence) classification and token classification l

Many classification models support both (sequence) classification and token classification. For further details on token classification, please refer to [this page](token_classify.md).

Only when a classification model outputs num_labels equal to 1 can it be used as a scoring model and have its scoring API enabled, please refer to [this page](scoring.md).

## Typical Use Cases

### Classification
@@ -56,7 +54,7 @@ If your model is not in the above list, we will try to automatically convert the

Cross-encoder (aka reranker) models are a subset of classification models that accept two prompts as input and have `num_labels` equal to 1. Most classification models can also be used as [cross-encoder models](scoring.md#cross-encoder-models). For more information on cross-encoder models, please refer to [this page](scoring.md).

--8<-- "docs/models/pooling_models/scoring.md:supported-cross-encoder-models"
--8<-- "docs/models/pooling_models/scoring.md:supported-score-models"

### Reward Models

17 changes: 7 additions & 10 deletions docs/models/pooling_models/scoring.md
@@ -10,28 +10,25 @@ The score model is designed to compute similarity scores between two input prom
- Model Usage: Scoring
- Pooling Task:

| Score Types        | Pooling Tasks         | Scoring Function          |
|--------------------|-----------------------|---------------------------|
| `cross-encoder`    | `classify` (see note) | linear classifier         |
| `late-interaction` | `token_embed`         | late interaction (MaxSim) |
| `bi-encoder`       | `embed`               | cosine similarity         |
| Score Types        | Pooling Tasks | Scoring Function          |
|--------------------|---------------|---------------------------|
| `cross-encoder`    | `score`       | linear classifier         |
| `late-interaction` | `token_embed` | late interaction (MaxSim) |
| `bi-encoder`       | `embed`       | cosine similarity         |

- Offline APIs:
- `LLM.score`
- Online APIs:
- [Score API](scoring.md#score-api) (`/score`)
- [Rerank API](scoring.md#rerank-api) (`/rerank`, `/v1/rerank`, `/v2/rerank`)
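Conceptually, these APIs dispatch from `score_type` to a pooling task exactly as the table above lays out. A minimal sketch of that mapping (an illustration, not vLLM's actual routing code):

```python
# Assumed mapping, taken from the Score Types table above.
SCORE_TYPE_TO_POOLING_TASK = {
    "cross-encoder": "score",
    "bi-encoder": "embed",
    "late-interaction": "token_embed",
}


def resolve_pooling_task(score_type: str) -> str:
    """Return the pooling task a score/rerank request would run under."""
    try:
        return SCORE_TYPE_TO_POOLING_TASK[score_type]
    except KeyError:
        raise ValueError(f"unsupported score_type: {score_type!r}")
```
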

!!! note
Only when a classification model outputs num_labels equal to 1 can it be used as a scoring model and have its scoring API enabled.

## Supported Models

### Cross-encoder models

[Cross-encoder](https://www.sbert.net/examples/applications/cross-encoder/README.html) (aka reranker) models are a subset of classification models that accept two prompts as input and have `num_labels` equal to 1.

--8<-- [start:supported-cross-encoder-models]
--8<-- [start:supported-score-models]

#### Text-only Models

@@ -102,7 +99,7 @@ The score model is designed to compute similarity scores between two input prom
vllm serve Qwen/Qwen3-VL-Reranker-2B --hf_overrides '{"architectures": ["Qwen3VLForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}'
```

--8<-- [end:supported-cross-encoder-models]
--8<-- [end:supported-score-models]

### Late-interaction models

2 changes: 1 addition & 1 deletion tests/test_pooling_params.py
@@ -74,7 +74,7 @@ def test_embed_dimensions(model_info: EmbedModelInfo):
pooling_params.verify(model_config)


@pytest.mark.parametrize("task", ["classify"])
@pytest.mark.parametrize("task", ["score", "classify"])
def test_classify(task):
model_config = MockModelConfig(pooler_config=PoolerConfig(seq_pooling_type="CLS"))

8 changes: 4 additions & 4 deletions vllm/config/model.py
@@ -1435,10 +1435,10 @@ def requires_raw_input_tokens(self) -> bool:
@property
def score_type(self) -> ScoreType:
"""
Scoring API handles score/rerank for:\n
- "classify" task (score_type: cross-encoder models)\n
- "embed" task (score_type: bi-encoder models)\n
- "token_embed" task (score_type: late interaction models)\n
Score API handles score/rerank for:
- "score" task (score_type: cross-encoder models)
- "embed" task (score_type: bi-encoder models)
- "token_embed" task (score_type: late interaction models)
"""
# fixme: self._model_info.score_type is the score type before
# as_seq_cls_model, which is "bi-encoder", rather than the
4 changes: 2 additions & 2 deletions vllm/entrypoints/llm.py
@@ -1477,9 +1477,9 @@ def _cross_encoding_score(
data_1 = data_1 * len(data_2)

if pooling_params is None:
pooling_params = PoolingParams(task="classify")
pooling_params = PoolingParams(task="score")
elif pooling_params.task is None:
pooling_params.task = "classify"
pooling_params.task = "score"

pooling_params_list = list[PoolingParams]()

14 changes: 5 additions & 9 deletions vllm/entrypoints/openai/api_server.py
Expand Up @@ -22,7 +22,7 @@
from starlette.datastructures import State

import vllm.envs as envs
from vllm.config import ModelConfig, VllmConfig
from vllm.config import VllmConfig
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.protocol import EngineClient
from vllm.entrypoints.chat_utils import load_chat_template
@@ -155,9 +155,7 @@


def build_app(
args: Namespace,
supported_tasks: tuple["SupportedTask", ...] | None = None,
model_config: ModelConfig | None = None,
args: Namespace, supported_tasks: tuple["SupportedTask", ...] | None = None
) -> FastAPI:
if supported_tasks is None:
warnings.warn(
@@ -193,7 +191,7 @@ def build_app(
attach_router as register_sagemaker_api_router,
)

register_sagemaker_api_router(app, supported_tasks, model_config)
register_sagemaker_api_router(app, supported_tasks)

if "generate" in supported_tasks:
from vllm.entrypoints.openai.generate.api_router import (
@@ -244,7 +242,7 @@
if any(task in POOLING_TASKS for task in supported_tasks):
from vllm.entrypoints.pooling import register_pooling_api_routers

register_pooling_api_routers(app, supported_tasks, model_config)
register_pooling_api_routers(app, supported_tasks)

app.root_path = args.root_path
app.add_middleware(
@@ -585,10 +583,8 @@ async def build_and_serve(
uvicorn_kwargs["log_config"] = log_config

supported_tasks = await engine_client.get_supported_tasks()
model_config = engine_client.model_config

logger.info("Supported tasks: %s", supported_tasks)
app = build_app(args, supported_tasks, model_config)
app = build_app(args, supported_tasks)
await init_app_state(engine_client, app.state, args, supported_tasks)

logger.info("Starting vLLM server on %s", listen_address)
40 changes: 11 additions & 29 deletions vllm/entrypoints/pooling/__init__.py
@@ -5,9 +5,6 @@

from fastapi import FastAPI

from vllm.config import ModelConfig
from vllm.logger import init_logger

if TYPE_CHECKING:
from argparse import Namespace

@@ -20,30 +17,9 @@
RequestLogger = object
SupportedTask = object

logger = init_logger(__name__)


def enable_scoring_api(
supported_tasks: tuple["SupportedTask", ...],
model_config: ModelConfig | None = None,
) -> bool:
if any(t in supported_tasks for t in ("embed", "token_embed")):
return True

if model_config is not None and "classify" in supported_tasks:
num_labels = getattr(model_config.hf_config, "num_labels", 0)
if num_labels != 1:
logger.debug_once("Score API is only enabled for num_labels == 1.")
return False
return True

return False


def register_pooling_api_routers(
app: FastAPI,
supported_tasks: tuple["SupportedTask", ...],
model_config: ModelConfig | None = None,
app: FastAPI, supported_tasks: tuple["SupportedTask", ...]
):
from vllm.entrypoints.pooling.pooling.api_router import router as pooling_router

@@ -61,7 +37,11 @@ def register_pooling_api_routers(

app.include_router(embed_router)

if enable_scoring_api(supported_tasks, model_config):
# Score API handles score/rerank for:
# - "score" task (score_type: cross-encoder models)
# - "embed" task (score_type: bi-encoder models)
# - "token_embed" task (score_type: late interaction models)
if any(t in supported_tasks for t in ("score", "embed", "token_embed")):
from vllm.entrypoints.pooling.score.api_router import router as score_router

app.include_router(score_router)
@@ -81,8 +61,6 @@ def init_pooling_state(
from vllm.entrypoints.pooling.score.serving import ServingScores
from vllm.tasks import POOLING_TASKS

model_config = engine_client.model_config

resolved_chat_template = load_chat_template(args.chat_template)

state.serving_pooling = (
@@ -124,6 +102,10 @@
if "classify" in supported_tasks
else None
)
# Score API handles score/rerank for:
# - "score" task (score_type: cross-encoder models)
# - "embed" task (score_type: bi-encoder models)
# - "token_embed" task (score_type: late interaction models)
state.serving_scores = (
ServingScores(
engine_client,
@@ -132,6 +114,6 @@
score_template=resolved_chat_template,
log_error_stack=args.log_error_stack,
)
if enable_scoring_api(supported_tasks, model_config)
if any(t in supported_tasks for t in ("embed", "score", "token_embed"))
else None
)
4 changes: 2 additions & 2 deletions vllm/entrypoints/pooling/score/protocol.py
@@ -35,7 +35,7 @@ def build_tok_params(self, model_config: ModelConfig) -> TokenizeParams:
max_total_tokens_param="max_model_len",
)

def to_pooling_params(self, task: PoolingTask = "classify"):
def to_pooling_params(self, task: PoolingTask = "score"):
return PoolingParams(
task=task,
use_activation=self.use_activation,
Expand Down Expand Up @@ -111,7 +111,7 @@ def build_tok_params(self, model_config: ModelConfig) -> TokenizeParams:
max_total_tokens_param="max_model_len",
)

def to_pooling_params(self, task: PoolingTask = "classify"):
def to_pooling_params(self, task: PoolingTask = "score"):
return PoolingParams(
task=task,
use_activation=self.use_activation,
2 changes: 1 addition & 1 deletion vllm/entrypoints/pooling/score/serving.py
@@ -413,7 +413,7 @@ async def _cross_encoding_score(
# Schedule the request and get the result generator.
generators: list[AsyncGenerator[PoolingRequestOutput, None]] = []

default_pooling_params = request.to_pooling_params("classify")
default_pooling_params = request.to_pooling_params("score")

for i, engine_prompt in enumerate(engine_prompts):
request_id_item = f"{request_id}-{i}"