
[Benchmark] Use truncation by default for pooling benchmarks#26992

Merged
DarkLight1337 merged 1 commit into vllm-project:main from DarkLight1337:truncate-prompt-tokens
Oct 16, 2025

Conversation

@DarkLight1337 (Member) commented Oct 16, 2025

Purpose

Enable truncation by default for the benchmarks to avoid dropped requests.

This is also consistent with the processing from Infinity: https://github.com/search?q=repo%3Amichaelfeil%2Finfinity%20max_length&type=code
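As an illustration of what the change amounts to (the `truncate_prompt_tokens` field is vLLM's OpenAI-compatible extension parameter; the helper name here is hypothetical, not from the diff), the benchmark's embedding payload now defaults to server-side truncation:

```python
# Sketch of the benchmark request payload after this change.
# build_embeddings_payload is a hypothetical helper for illustration only.
def build_embeddings_payload(model: str, text: str) -> dict:
    return {
        "model": model,
        "input": text,
        # -1 tells vLLM to truncate the prompt to the model's maximum
        # context length instead of rejecting (dropping) the request.
        "truncate_prompt_tokens": -1,
    }

payload = build_embeddings_payload("BAAI/bge-base-en-v1.5", "a very long document ...")
print(payload["truncate_prompt_tokens"])  # → -1
```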

Related: #24235

cc @noooop @maxdebayser

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 requested a review from noooop October 16, 2025 06:03
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 16, 2025
@mergify mergify bot added the performance Performance-related issues label Oct 16, 2025
@noooop (Collaborator) left a comment:

LGTM

@gemini-code-assist (bot) left a comment:

Code Review

This pull request enables truncation by default for pooling benchmarks to prevent dropped requests for models with short context lengths. This is achieved by adding "truncate_prompt_tokens": -1 to the request payloads for embedding and reranking functions. The change also removes now-redundant manual truncation logic from a preprocessing function. The changes are logical and align with the stated purpose. I've identified one issue where a request function for reranking was missing a call to a common payload update function, which would prevent certain parameters from being passed. Addressing this will improve consistency and correctness.

# this is to avoid dropping some of the requests.
"truncate_prompt_tokens": -1,
}

gemini-code-assist (bot, Contributor) commented with severity high:

This function is missing a call to _update_payload_common, which is present in other similar request functions like async_request_openai_embeddings and async_request_openai_embeddings_chat. This omission causes extra_body and ignore_eos from request_func_input to be ignored for rerank requests, which is likely unintended and prevents passing extra parameters to the rerank endpoint. Please add the call to _update_payload_common here for consistency and correctness.

Suggested change:

    _update_payload_common(payload, request_func_input)

@DarkLight1337 (Member, Author) replied:

cc @maxdebayser is it intended that you don't call this?

@DarkLight1337 (Member, Author) replied:

We can fix it in a separate PR if so

@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 643 to 644:

    # Image input
    request_func_input.prompt = ""


P1: Infinity CLIP benchmark no longer truncates prompts by default

The commit removes the fallback that injected truncate_prompt_tokens=-1 in _preprocess_clip, but async_request_infinity_embeddings_clip still relies on this helper before forwarding to async_request_infinity_embeddings. The Infinity request payload (lines 702‑718) never sets a truncation parameter, so CLIP runs via the Infinity backend now send full prompts even though CLIP models only accept 77 tokens. When running pooling benchmarks against Infinity with dataset entries longer than 77 tokens, requests will again be dropped or fail, which is the regression this change was meant to prevent.


@DarkLight1337 DarkLight1337 merged commit 17838e5 into vllm-project:main Oct 16, 2025
50 checks passed
@DarkLight1337 DarkLight1337 deleted the truncate-prompt-tokens branch October 16, 2025 08:02
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 16, 2025
…oject#26992)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Zhuul pushed a commit to Zhuul/vllm that referenced this pull request Oct 17, 2025
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Oct 17, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…oject#26992)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

performance Performance-related issues ready ONLY add when PR is ready to merge/full CI is needed
