[Model] Add Reasoning Parser for Granite Models#14202
DarkLight1337 merged 21 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Force-pushed from 0affa66 to f1d3367
Nice use of this new feature! Will try in a bit, cc @gaocegege
Thanks for the contribution!
Could you please rebase onto upstream? In a previous PR to support reasoning outputs in structured outputs, https://github.com/vllm-project/vllm/pull/12955/files#diff-ea8b8ff63961713ccb62d78e53e96404b587b7828cb9fee08a9e5576bf563673R1065, we moved the CLI argument --reasoning-parser to https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L1076
Thus, you may need to add a new choice there.
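Adding a new choice to an existing CLI flag can be sketched like this (illustrative only — the flag actually lives in vLLM's `vllm/engine/arg_utils.py`, and the other choice names here are assumptions for the sketch):

```python
import argparse

# Hypothetical sketch of registering "granite" as a new --reasoning-parser
# choice; the real definition and choice list live in vllm/engine/arg_utils.py.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--reasoning-parser",
    type=str,
    choices=["deepseek_r1", "granite"],  # "granite" is the new choice
    default=None,
    help="Select the reasoning parser backend.",
)

args = parser.parse_args(["--reasoning-parser", "granite"])
```

With `choices` set, argparse rejects any unknown parser name at startup instead of failing later at parse time.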
Hi, I updated the docs in PR #14114. You may need to rebase the docs too. Just FYI
Force-pushed from 4f4cbe1 to ecadd2d
Adding a warning for now since this is already a large PR, but I think adding a GraniteReasoner for guided decoding could be a follow-up later?
We can just fail for now.
@aarnphm This will behave the same way as models with no reasoner. The intention here was mostly to clarify that there isn't a reasoning backend for Granite, in case users conflate it with --enable-reasoning / --reasoning-parser granite being supported for these models.
Awesome, thanks @gaocegege! It's been rebased 😄
gaocegege
left a comment
Thanks for your contribution! 🎉 👍
@mgoin Please give it another review, thanks!
    response_content = current_text[current_chunk_end + 1:]
    parsed_content = True
The parsed_content flag doesn't seem to be updated here, so it may be helpful to set it?
Very minor suggestion, totally optional.
Hey @b8zhong, thanks for the suggestion! For now, I'd prefer to keep it as is since it returns immediately after parsing the response content. I.e., once this condition is met, there is no need to keep going, so updating the flag won't do anything 🙂
This pull request has merge conflicts that must be resolved before it can be merged.
@alex-jw-brooks Hi, could you please resolve the conflicts?
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
aarnphm
left a comment
Tiny comments, otherwise LGTM.
Can we revert this to reduce the code change? Thanks.
We can just fail for now.
Co-authored-by: Joe Runde <joe@joerun.de> Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Force-pushed from d64254f to ab83ec1
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Force-pushed from a9aeb12 to a73c789
Thanks @aarnphm - it's ready for another look when you have a moment 🙂
Hi @mgoin, can you please take a look at this PR when you have a moment?
DarkLight1337
left a comment
Since this is basically coming from the model vendor, I'll just stamp it. The code looks reasonable to me.
Can you fix the merge conflicts?
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Thanks for the review @DarkLight1337! Sure, it should be resolved now 🤞
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com> Co-authored-by: Joe Runde <joe@joerun.de> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
This PR adds a reasoning parser for Granite 3.2 models! These models have an optional chat template kwarg
`thinking` that changes the system prompt to enable reasoning. 😄 The format of the text is expected to be:
There have been reports of quantized versions of the model using `Here's` instead of `Here is`, though, so this PR matches on both.

Examples
Start the server with a Granite (3.2) language model that has reasoning and the `granite` parser.

Snippets are copied from the docs, with the only change being adding `chat_template_kwargs` with `thinking=True`. Without this, reasoning will be disabled, and it'll generally parse everything into `content`.

No streaming:
With streaming:
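One way to picture the streaming case (again an illustrative sketch under the same assumed section phrases, not vLLM's implementation): re-parse the accumulated text after each delta and emit only the newly available suffixes.

```python
import re

THOUGHT = re.compile(r"Here('s| is) my thought process:")
RESPONSE = re.compile(r"Here('s| is) my response:")

def parse_full(text: str) -> tuple[str, str]:
    """Split accumulated text into (reasoning, content); ("", "") until
    both section markers have been seen."""
    t, r = THOUGHT.search(text), RESPONSE.search(text)
    if t is None or r is None:
        return "", ""
    return text[t.end():r.start()], text[r.end():]

class StreamingSplitter:
    """Simplified: nothing is emitted until the response marker appears;
    a real streaming parser emits reasoning deltas earlier."""

    def __init__(self) -> None:
        self.text = ""
        self.reasoning_sent = ""
        self.content_sent = ""

    def feed(self, delta: str) -> tuple[str, str]:
        """Return the (reasoning_delta, content_delta) newly revealed by
        this chunk."""
        self.text += delta
        reasoning, content = parse_full(self.text)
        new_reasoning = reasoning[len(self.reasoning_sent):]
        new_content = content[len(self.content_sent):]
        self.reasoning_sent, self.content_sent = reasoning, content
        return new_reasoning, new_content

splitter = StreamingSplitter()
chunks = ["Here is my thought proc", "ess: thinking... Here", " is my response: four"]
deltas = [splitter.feed(c) for c in chunks]
```

Note that a chunk boundary can fall inside a marker phrase (as in the example), which is why the sketch keys off the accumulated text rather than individual deltas.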
Example output (run from the streaming snippet above)