[Frontend] Improve the performance of is_reasoning_end #25735
DarkLight1337 merged 8 commits into vllm-project:main
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
This pull request has merge conflicts that must be resolved before it can be
0f33848 to 7b42fff
💡 Codex Review
https://github.com/vllm-project/vllm/blob/0f3384862593027cd7ea8433c9c5314536011ecc/vllm/reasoning/basic_parsers.py#L60-L66
The new incremental
/cc @njhill Due to the various uses of
Can you edit the PR description accordingly?
@DarkLight1337 I've updated the PR description. This optimization provides only a very small performance improvement. 😂
…ct#25735) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
…ct#25735) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev>
…ct#25735) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
…ct#25735) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
is_reasoning_end is executed once for every token generated during the reasoning phase, which results in poor performance. The goal is to optimize it to perform incremental checks instead.

Currently, is_reasoning_end cannot support incremental checking for the following reason:

When stream=true, is_reasoning_end checks output.token_ids and res.prompt_token_ids separately. This leads to non-deterministic inputs to is_reasoning_end within the same request, making incremental checking impossible.

Therefore, the only feasible optimization at present is to modify is_reasoning_end to search backward for the end_token from the end of the token sequence.

Test Plan
main
this pr
Test Result
The performance improvement is very marginal.
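For reference, the backward search described under Purpose can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the standalone function names, the `end_token_id` parameter, and the example token ids are all hypothetical stand-ins for whatever the parser actually holds.

```python
from collections.abc import Sequence


def is_reasoning_end_forward(token_ids: Sequence[int], end_token_id: int) -> bool:
    """Baseline sketch: scan the sequence from the start."""
    return end_token_id in token_ids


def is_reasoning_end_backward(token_ids: Sequence[int], end_token_id: int) -> bool:
    """Sketch of the backward search: walk from the last token toward the
    first and stop as soon as the end token is found."""
    return any(tok == end_token_id for tok in reversed(token_ids))


# Hypothetical token ids; 7 plays the role of the end token here.
tokens = [1, 2, 3, 7, 4]
print(is_reasoning_end_backward(tokens, end_token_id=7))     # True
print(is_reasoning_end_backward([1, 2, 3], end_token_id=7))  # False
```

Both variants return the same answer. The backward scan only exits earlier when the end token sits near the tail of the sequence; while the model is still reasoning and no end token is present, either direction has to visit every token, which fits the marginal improvement reported above.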