[Benchmark] Use truncation by default for pooling benchmarks #26992
DarkLight1337 merged 1 commit into vllm-project:main
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review
This pull request enables truncation by default for pooling benchmarks to prevent dropped requests for models with short context lengths. This is achieved by adding "truncate_prompt_tokens": -1 to the request payloads for embedding and reranking functions. The change also removes now-redundant manual truncation logic from a preprocessing function. The changes are logical and align with the stated purpose. I've identified one issue where a request function for reranking was missing a call to a common payload update function, which would prevent certain parameters from being passed. Addressing this will improve consistency and correctness.
    # this is to avoid dropping some of the requests.
    "truncate_prompt_tokens": -1,
    }
This function is missing a call to _update_payload_common, which is present in other similar request functions like async_request_openai_embeddings and async_request_openai_embeddings_chat. This omission causes extra_body and ignore_eos from request_func_input to be ignored for rerank requests, which is likely unintended and prevents passing extra parameters to the rerank endpoint. Please add the call to _update_payload_common here for consistency and correctness.
    _update_payload_common(payload, request_func_input)
cc @maxdebayser is it intended that you don't call this?
We can fix it in a separate PR if so
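To make the reviewer's point concrete, here is a minimal sketch of a rerank payload builder that both sets the truncation default and forwards the common fields. Everything named `*_sketch` or `RequestInputSketch` is a hypothetical stand-in, and the body of the common-update helper is an assumption inferred from the comment above (that `extra_body` and `ignore_eos` are otherwise ignored), not the benchmark's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RequestInputSketch:
    """Hypothetical stand-in for the benchmark's request input object."""

    model: str
    query: str
    documents: list[str]
    ignore_eos: bool = False
    extra_body: Optional[dict[str, Any]] = None


def update_payload_common_sketch(
    payload: dict[str, Any], req: RequestInputSketch
) -> None:
    """Assumed behaviour: copy ignore_eos and extra_body into the payload."""
    if req.ignore_eos:
        payload["ignore_eos"] = req.ignore_eos
    if req.extra_body:
        payload.update(req.extra_body)


def build_rerank_payload_sketch(req: RequestInputSketch) -> dict[str, Any]:
    """Build a rerank request body with truncation enabled by default."""
    payload: dict[str, Any] = {
        "model": req.model,
        "query": req.query,
        "documents": req.documents,
        # From this PR: truncate by default to avoid dropping requests.
        "truncate_prompt_tokens": -1,
    }
    # The call the review says is missing from the rerank request function.
    update_payload_common_sketch(payload, req)
    return payload


req = RequestInputSketch(
    model="my-reranker",  # placeholder model name
    query="what is pooling?",
    documents=["doc a", "doc b"],
    extra_body={"top_n": 1},  # illustrative extra parameter
)
payload = build_rerank_payload_sketch(req)
```

Without the call to the common-update helper, the `extra_body` entry in this sketch would never reach the payload, which is the inconsistency the review describes.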
💡 Codex Review
Here are some automated review suggestions for this pull request.
    # Image input
    request_func_input.prompt = ""
Infinity CLIP benchmark no longer truncates prompts by default
The commit removes the fallback that injected truncate_prompt_tokens=-1 in _preprocess_clip, but async_request_infinity_embeddings_clip still relies on this helper before forwarding to async_request_infinity_embeddings. The Infinity request payload (lines 702‑718) never sets a truncation parameter, so CLIP runs via the Infinity backend now send full prompts even though CLIP models only accept 77 tokens. When running pooling benchmarks against Infinity with dataset entries longer than 77 tokens, requests will again be dropped or fail, which is the regression this change was meant to prevent.
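For illustration only, one way a client could guard against the 77-token limit mentioned above is to clamp token IDs before sending them. This is a hypothetical fallback, not something this PR or the Infinity backend is known to do; the helper name and the idea of truncating on the client side are assumptions, and a real tokenizer may need special handling (for example, preserving an end-of-text token) that naive slicing ignores.

```python
CLIP_MAX_TOKENS = 77  # limit quoted in the review comment above


def clamp_tokens_sketch(
    token_ids: list[int], max_tokens: int = CLIP_MAX_TOKENS
) -> list[int]:
    """Hypothetical client-side fallback: keep at most max_tokens IDs."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    # Slicing returns a new list and leaves short inputs unchanged.
    return token_ids[:max_tokens]


long_prompt_ids = list(range(100))  # stand-in for tokenizer output
clamped = clamp_tokens_sketch(long_prompt_ids)
```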
…oject#26992) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
…oject#26992) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…oject#26992) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
Enable truncation by default for the benchmarks to avoid dropped requests.
This is also consistent with the processing from Infinity: https://github.com/search?q=repo%3Amichaelfeil%2Finfinity%20max_length&type=code
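As a rough sketch of what the new default looks like in a request body, the snippet below assembles an embeddings payload that includes `truncate_prompt_tokens: -1`. Only that key and value come from this PR; the helper name `build_embedding_payload_sketch`, the `model`/`input` field layout, and the model name are assumptions made for illustration.

```python
from typing import Any


def build_embedding_payload_sketch(model: str, texts: list[str]) -> dict[str, Any]:
    """Hypothetical helper: assemble an embeddings request body."""
    return {
        "model": model,
        "input": texts,
        # From this PR: enable truncation by default so that over-long
        # prompts are not dropped by models with short context lengths.
        "truncate_prompt_tokens": -1,
    }


embedding_payload = build_embedding_payload_sketch(
    "my-embedding-model",  # placeholder model name
    ["a prompt that may exceed the model's context length"],
)
```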
Related: #24235
cc @noooop @maxdebayser
Test Plan
Test Result