Skip to content

[Feature] default --extra-body param to disable thinking in vllm bench serve#26784

Merged
DarkLight1337 merged 5 commits intovllm-project:mainfrom
lengrongfu:feat/opt-bench
Oct 15, 2025
Merged

[Feature] default --extra-body param to disable thinking in vllm bench serve#26784
DarkLight1337 merged 5 commits intovllm-project:mainfrom
lengrongfu:feat/opt-bench

Conversation

@lengrongfu
Copy link
Copy Markdown
Contributor

@lengrongfu lengrongfu commented Oct 14, 2025

Purpose

FIX: #26760

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
@mergify mergify bot added the performance Performance-related issues label Oct 14, 2025
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the benchmark serving script by renaming sampling_params to extra_body for better clarity, as it now includes more than just sampling parameters. It also introduces a change to disable 'thinking' by default in chat templates during benchmarks. My review focuses on a key aspect of this new feature: while disabling thinking by default is a good goal, the current implementation hardcodes this setting, which limits the benchmark's flexibility. I've suggested making this configurable via a command-line argument to maintain the tool's versatility.

Copy link
Copy Markdown
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should not be applied by default. Users should be able to pass --extra-body explicitly via CLI which is merged with the sampling params

@lengrongfu
Copy link
Copy Markdown
Contributor Author

I think this should not be applied by default. Users should be able to pass --extra-body explicitly via CLI which is merged with the sampling params

Ok, i will add --extra-body param. test comand:

$ vllm bench serve --model Qwen/Qwen3-0.6B --backend openai --max-concurrency 1 --num-prompts 10 --extra-body '{"chat_template_kwargs":{"enable_thinking":false}}'

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
@lengrongfu lengrongfu changed the title [Feature] default disable thinking in vllm bench serve [Feature] default --extra-body param to disable thinking in vllm bench serve Oct 14, 2025
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Copy link
Copy Markdown
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 14, 2025 16:10
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 14, 2025
@lengrongfu
Copy link
Copy Markdown
Contributor Author

CI fail not related to the current modification.

@DarkLight1337 DarkLight1337 merged commit a27b288 into vllm-project:main Oct 15, 2025
46 checks passed
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
…h serve (vllm-project#26784)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: bbartels <benjamin@bartels.dev>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
…h serve (vllm-project#26784)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
@lengrongfu lengrongfu deleted the feat/opt-bench branch October 21, 2025 02:54
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
…h serve (vllm-project#26784)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…h serve (vllm-project#26784)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…h serve (vllm-project#26784)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
…h serve (vllm-project#26784)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
…h serve (vllm-project#26784)

Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

frontend performance Performance-related issues ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Option to disable thinking in vllm bench serve

3 participants