
[Benchmark] Use truncation by default for pooling benchmarks#26992

Merged
DarkLight1337 merged 1 commit into vllm-project:main from DarkLight1337:truncate-prompt-tokens
Oct 16, 2025

Conversation

@DarkLight1337 (Member) commented Oct 16, 2025

Purpose

Enable truncation by default for the benchmarks to avoid dropped requests.

This is also consistent with the processing from Infinity: https://github.com/search?q=repo%3Amichaelfeil%2Finfinity%20max_length&type=code
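As an illustration of what the change amounts to (the `truncate_prompt_tokens` field is vLLM's OpenAI-compatible extension parameter; the helper name here is hypothetical, not from the diff), the benchmark's embedding payload now defaults to server-side truncation:

```python
# Sketch of the benchmark request payload after this change.
# build_embeddings_payload is a hypothetical helper for illustration only.
def build_embeddings_payload(model: str, text: str) -> dict:
    return {
        "model": model,
        "input": text,
        # -1 tells vLLM to truncate the prompt to the model's maximum
        # context length instead of rejecting (dropping) the request.
        "truncate_prompt_tokens": -1,
    }

payload = build_embeddings_payload("BAAI/bge-base-en-v1.5", "a very long document ...")
print(payload["truncate_prompt_tokens"])  # → -1
```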

Related: #24235

cc @noooop @maxdebayser

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@DarkLight1337 DarkLight1337 requested a review from noooop October 16, 2025 06:03
@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 16, 2025
@mergify mergify bot added the performance Performance-related issues label Oct 16, 2025
@noooop (Collaborator) left a comment:

LGTM

@gemini-code-assist (bot) left a comment:

Code Review

This pull request enables truncation by default for pooling benchmarks to prevent dropped requests for models with short context lengths. This is achieved by adding "truncate_prompt_tokens": -1 to the request payloads for embedding and reranking functions. The change also removes now-redundant manual truncation logic from a preprocessing function. The changes are logical and align with the stated purpose. I've identified one issue where a request function for reranking was missing a call to a common payload update function, which would prevent certain parameters from being passed. Addressing this will improve consistency and correctness.

# this is to avoid dropping some of the requests.
"truncate_prompt_tokens": -1,
}

gemini-code-assist (bot, Contributor) commented with severity high:

This function is missing a call to _update_payload_common, which is present in other similar request functions like async_request_openai_embeddings and async_request_openai_embeddings_chat. This omission causes extra_body and ignore_eos from request_func_input to be ignored for rerank requests, which is likely unintended and prevents passing extra parameters to the rerank endpoint. Please add the call to _update_payload_common here for consistency and correctness.

Suggested change:

    _update_payload_common(payload, request_func_input)

@DarkLight1337 (Member, Author) replied:

cc @maxdebayser is it intended that you don't call this?

@DarkLight1337 (Member, Author) replied:

We can fix it in a separate PR if so

@chatgpt-codex-connector (bot) left a comment:

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 643 to 644:

    # Image input
    request_func_input.prompt = ""


P1: Infinity CLIP benchmark no longer truncates prompts by default

The commit removes the fallback that injected truncate_prompt_tokens=-1 in _preprocess_clip, but async_request_infinity_embeddings_clip still relies on this helper before forwarding to async_request_infinity_embeddings. The Infinity request payload (lines 702‑718) never sets a truncation parameter, so CLIP runs via the Infinity backend now send full prompts even though CLIP models only accept 77 tokens. When running pooling benchmarks against Infinity with dataset entries longer than 77 tokens, requests will again be dropped or fail, which is the regression this change was meant to prevent.


@DarkLight1337 DarkLight1337 merged commit 17838e5 into vllm-project:main Oct 16, 2025
50 checks passed
@DarkLight1337 DarkLight1337 deleted the truncate-prompt-tokens branch October 16, 2025 08:02
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 16, 2025
…oject#26992)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Zhuul pushed a commit to Zhuul/vllm that referenced this pull request Oct 17, 2025
BoyuanFeng pushed a commit to BoyuanFeng/vllm that referenced this pull request Oct 17, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…oject#26992)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025

Labels

performance Performance-related issues ready ONLY add when PR is ready to merge/full CI is needed
