[Model] Add Reasoning Parser for Granite Models#14202
DarkLight1337 merged 21 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Force-pushed from 0affa66 to f1d3367
Nice use of this new feature! Will try in a bit, cc @gaocegege
Thanks for the contribution!
Could you please rebase onto upstream? In a previous PR to support reasoning outputs in structured outputs, https://github.com/vllm-project/vllm/pull/12955/files#diff-ea8b8ff63961713ccb62d78e53e96404b587b7828cb9fee08a9e5576bf563673R1065, we moved the CLI argument --reasoning-parser to https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L1076
Thus, you may need to add a new choice there.
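Adding a new choice to an existing CLI flag can be sketched like this (illustrative only — the flag actually lives in vLLM's `vllm/engine/arg_utils.py`, and the other choice names here are assumptions for the sketch):

```python
import argparse

# Hypothetical sketch of registering "granite" as a new --reasoning-parser
# choice; the real definition and choice list live in vllm/engine/arg_utils.py.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--reasoning-parser",
    type=str,
    choices=["deepseek_r1", "granite"],  # "granite" is the new choice
    default=None,
    help="Select the reasoning parser backend.",
)

args = parser.parse_args(["--reasoning-parser", "granite"])
```

With `choices` set, argparse rejects any unknown parser name at startup instead of failing later at parse time.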
Hi, I updated the docs in PR #14114. You may need to rebase the docs too. Just FYI
Force-pushed from 4f4cbe1 to ecadd2d
Adding a warning for now since this is already a large PR, but I think adding a GraniteReasoner for guided decoding could be a follow-up later?
We can just fail for now.
@aarnphm This will behave the same way as models with no reasoner. The intention here was mostly to clarify that there isn't a reasoning backend for Granite, in case users conflate it with --enable-reasoning / --reasoning-parser granite being supported for these models.
Awesome, thanks @gaocegege! It's been rebased 😄
gaocegege
left a comment
Thanks for your contribution! 🎉 👍
@mgoin Please give it another review, thanks!
    response_content = current_text[current_chunk_end + 1:]
    parsed_content = True
The parsed_content flag doesn't seem to be updated here, so it may be helpful to set it?
Very minor suggestion, totally optional.
Hey @b8zhong, thanks for the suggestion! For now, I'd prefer to keep it as is since it returns immediately after parsing the response content. I.e., once this condition is met, there is no need to keep going, so updating the flag won't do anything 🙂
This pull request has merge conflicts that must be resolved before it can be merged.
@alex-jw-brooks Hi, could you please resolve the conflicts?
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
aarnphm
left a comment
Tiny comments, otherwise LGTM.
Can we revert this to reduce the code change? Thanks.
We can just fail for now.
Co-authored-by: Joe Runde <joe@joerun.de> Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Force-pushed from d64254f to ab83ec1
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Force-pushed from a9aeb12 to a73c789
Thanks @aarnphm - it's ready for another look when you have a moment 🙂
Hi @mgoin, can you please take a look at this PR when you have a moment?
DarkLight1337
left a comment
Since this is basically coming from the model vendor, I'll just stamp it. The code looks reasonable to me.
Can you fix the merge conflicts?
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Thanks for the review @DarkLight1337! Sure, it should be resolved now 🤞
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
This PR adds a reasoning parser for Granite 3.2 models! These models have an optional chat template kwarg
`thinking` that changes the system prompt to enable reasoning. 😄 The format of the text is expected to be:
There have been reports of quantized versions of the model using `Here's` instead of `Here is`, though, so this PR matches on both.

Examples
Start the server with a Granite (3.2) language model that has reasoning and the `granite` parser.

Snippets are copied from the docs, with the only change being adding `chat_template_kwargs` with `thinking=True`. Without this, reasoning will be disabled, and it'll generally parse everything into `content`.

No streaming:
With streaming:
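One way to picture the streaming case (again an illustrative sketch under the same assumed section phrases, not vLLM's implementation): re-parse the accumulated text after each delta and emit only the newly available suffixes.

```python
import re

THOUGHT = re.compile(r"Here('s| is) my thought process:")
RESPONSE = re.compile(r"Here('s| is) my response:")

def parse_full(text: str) -> tuple[str, str]:
    """Split accumulated text into (reasoning, content); ("", "") until
    both section markers have been seen."""
    t, r = THOUGHT.search(text), RESPONSE.search(text)
    if t is None or r is None:
        return "", ""
    return text[t.end():r.start()], text[r.end():]

class StreamingSplitter:
    """Simplified: nothing is emitted until the response marker appears;
    a real streaming parser emits reasoning deltas earlier."""

    def __init__(self) -> None:
        self.text = ""
        self.reasoning_sent = ""
        self.content_sent = ""

    def feed(self, delta: str) -> tuple[str, str]:
        """Return the (reasoning_delta, content_delta) newly revealed by
        this chunk."""
        self.text += delta
        reasoning, content = parse_full(self.text)
        new_reasoning = reasoning[len(self.reasoning_sent):]
        new_content = content[len(self.content_sent):]
        self.reasoning_sent, self.content_sent = reasoning, content
        return new_reasoning, new_content

splitter = StreamingSplitter()
chunks = ["Here is my thought proc", "ess: thinking... Here", " is my response: four"]
deltas = [splitter.feed(c) for c in chunks]
```

Note that a chunk boundary can fall inside a marker phrase (as in the example), which is why the sketch keys off the accumulated text rather than individual deltas.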
Example output (run from the streaming snippet above)