Skip to content

Conversation

@ayushag-nv
Copy link
Contributor

@ayushag-nv ayushag-nv commented Sep 29, 2025

Overview:

Earlier reasoning parsing was happening as a part of DeltaCreation Process. Ideally, it is a separate post/pre processing step that should happen either after delta creation or before it. Keeping the design of the code impact, this PR moves the reasoning parsing to a separate transformation step similar to tool call parsing step.

The transformation step happens after delta creation and applies standard stream folding logic.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Optional reasoning-content extraction in streaming responses, exposing separated reasoning and normal text when enabled (supports Basic, Mistral, Kimi formats).
  • Bug Fixes

    • More reliable end-of-stream signaling in chat streaming responses.
    • Preserves whitespace in reasoning content for accurate display.
  • Refactor

    • Simplified streaming path by removing embedded reasoning parsing from the generator; end-user behavior unchanged unless reasoning parsing is enabled.
  • Tests

    • Added comprehensive tests covering reasoning-content parsing across multiple parser styles.

@ayushag-nv ayushag-nv requested a review from GuanLuo September 29, 2025 22:43
@ayushag-nv ayushag-nv requested a review from a team as a code owner September 29, 2025 22:43
@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ayushag-nv ayushag-nv marked this pull request as draft September 29, 2025 22:43
@github-actions github-actions bot added the feat label Sep 29, 2025
@ayushag-nv ayushag-nv changed the title feat: reasoning parser transformation [WIP] feat: reasoning parser transformation Sep 29, 2025
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
@ayushag-nv ayushag-nv changed the title [WIP] feat: reasoning parser transformation feat: reasoning parser transformation Sep 30, 2025
@ayushag-nv ayushag-nv marked this pull request as ready for review September 30, 2025 17:41
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 30, 2025

Walkthrough

Implements a streaming API signature change for choice creation, relocates reasoning-content parsing from the delta generator to the preprocessor with a new streaming wrapper, expands preprocessor public fields and methods, adjusts tests accordingly, and alters a parser to preserve whitespace in reasoning text.

Changes

Cohort / File(s) Summary of Changes
Choice API update (streaming)
lib/llm/src/engines.rs, lib/llm/tests/http-service.rs, lib/llm/tests/http_metrics.rs
Updated calls to create_choice to a four-argument form; finish reason repositioned; removed trailing None(s). Tests aligned with new signature.
Preprocessor reasoning parsing integration
lib/llm/src/preprocessor.rs, lib/llm/tests/test_reasoning_parser.rs
Added on-demand reasoning parsing via parse_reasoning_content_from_stream, ReasoningState, and ReasoningParser integration; expanded OpenAIPreprocessor public fields; added comprehensive tests for multiple parser styles and passthrough behavior.
Delta generator simplification (remove reasoning handling)
lib/llm/src/protocols/openai/chat_completions/delta.rs
Removed reasoning_parser state and related methods; create_choice no longer accepts reasoning_content; streaming deltas no longer include reasoning_content; post-processing path simplified.
Parser behavior adjustment (whitespace)
lib/parsers/src/reasoning/base_parser.rs
BasicReasoningParser now preserves leading/trailing whitespace in reasoning_text (removed trim).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Preprocessor
  participant Stream as Model Stream
  participant DeltaGen as DeltaGenerator
  participant API as OpenAI-Compatible API

  Note over Client,API: New flow (reasoning parsed in preprocessor)

  Client->>Preprocessor: Request chat completion (stream)
  Preprocessor->>Preprocessor: should_parse_reasoning?
  alt Reasoning parser configured
    Preprocessor->>Preprocessor: parse_reasoning_content_from_stream()
    Preprocessor->>Stream: Initialize wrapped stream with ReasoningParser
    loop For each chunk
      Stream-->>Preprocessor: Chunk { content, reasoning_content? }
      Preprocessor->>Preprocessor: Update delta.content and delta.reasoning_content
      Preprocessor-->>DeltaGen: Annotated chunk (content/reasoning split)
      DeltaGen->>API: create_choice(index, text, finish_reason, logprobs)
    end
  else No parser configured
    Preprocessor-->>DeltaGen: Pass-through stream chunks
    loop For each chunk
      DeltaGen->>API: create_choice(index, text, finish_reason, logprobs)
    end
  end
  API-->>Client: Streamed responses
Loading
sequenceDiagram
  autonumber
  participant Client
  participant DeltaGen as DeltaGenerator
  participant API as OpenAI-Compatible API

  Note over Client,API: Previous flow (reasoning parsed in DeltaGenerator)

  Client->>DeltaGen: Request chat completion (stream)
  loop For each chunk
    DeltaGen->>DeltaGen: create_reasoning_content(text, tokens)
    DeltaGen->>API: create_choice(index, normal_text, finish_reason, logprobs)
    Note right of API: reasoning_content previously embedded/handled here
  end
  API-->>Client: Streamed responses
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I thump my paw: the streams now flow,
With thoughts up front, not far below.
The delta’s light, its load made lean,
While preprocessors parse unseen.
Whitespace kept, no trims tonight—
Four args hop in, and all’s alright. 🥕🐇

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The pull request description follows the required template by including Overview and Related Issues sections, but the Details and Where should the reviewer start? sections remain unfilled with placeholder text instead of concrete change descriptions and file references, leaving critical information incomplete. Please update the Details section to describe the specific code changes introduced by this PR and populate the Where should the reviewer start? section with the paths or modules that reviewers should examine first.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title “feat: reasoning parser transformation” succinctly conveys that this pull request introduces a feature-level change centering on refactoring or transforming the reasoning parser, which aligns with the main modifications observed across multiple modules. It follows the conventional commit style with a clear prefix and accurately reflects the primary intent without extraneous details.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 97a9854 and a1cc09a.

📒 Files selected for processing (7)
  • lib/llm/src/engines.rs (1 hunks)
  • lib/llm/src/preprocessor.rs (4 hunks)
  • lib/llm/src/protocols/openai/chat_completions/delta.rs (2 hunks)
  • lib/llm/tests/http-service.rs (1 hunks)
  • lib/llm/tests/http_metrics.rs (1 hunks)
  • lib/llm/tests/test_reasoning_parser.rs (1 hunks)
  • lib/parsers/src/reasoning/base_parser.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.

Applied to files:

  • lib/llm/src/engines.rs
  • lib/llm/tests/http-service.rs
  • lib/llm/src/protocols/openai/chat_completions/delta.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/delta.rs
🧬 Code graph analysis (5)
lib/llm/tests/http_metrics.rs (1)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • create_choice (197-239)
lib/llm/src/preprocessor.rs (2)
lib/llm/src/protocols/openai.rs (1)
  • new (242-247)
lib/parsers/src/reasoning/mod.rs (1)
  • get_reasoning_parser_from_name (178-194)
lib/llm/src/engines.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • create_choice (197-239)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • create_choice (142-170)
lib/llm/tests/http-service.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • create_choice (197-239)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • create_choice (142-170)
lib/llm/tests/test_reasoning_parser.rs (2)
lib/llm/src/preprocessor.rs (2)
  • parse_reasoning_content_from_stream (682-727)
  • new (114-120)
lib/parsers/src/reasoning/base_parser.rs (1)
  • new (17-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)

197-239: Updated four-arg create_choice looks good

Signature and call-site updates line up with the streamlined streaming API, and the delta payload still mirrors the OpenAI contract.

Signed-off-by: ayushag <[email protected]>
Copy link
Contributor

@elyasmnvidian elyasmnvidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a very simple test case with reasoning + tool calling for nemotron and gpt oss. We can try to even rearrange it with (Reasoning, text, tool calling) and do all permutations of order to see if it fails in any case.

@ayushag-nv ayushag-nv force-pushed the ayushag/reasoning-parser-refactor branch from 4193cc0 to ddd0bdf Compare October 1, 2025 04:35
…arser-refactor' into ayushag/reasoning-parser-refactor
@ayushag-nv
Copy link
Contributor Author

@zhongdaor-nv

DEBUG: Reasoning parsed chunk: Annotated {
    data: Some(
        CreateChatCompletionStreamResponse {
            id: "test-id",
            choices: [
                ChatChoiceStream {
                    index: 0,
                    delta: ChatCompletionStreamResponseDelta {
                        content: None,
                        function_call: None,
                        tool_calls: None,
                        role: Some(
                            Assistant,
                        ),
                        refusal: None,
                        reasoning_content: Some(
                            "Let me analyze this request. I need to get the current weather for San Francisco.",
                        ),
                    },
                    finish_reason: None,
                    logprobs: None,
                },
            ],
            created: 1234567890,
            model: "test-model",
            service_tier: None,
            system_fingerprint: Some(
                "test-fingerprint",
            ),
            object: "chat.completion.chunk",
            usage: None,
        },
    ),
    id: Some(
        "test-id",
    ),
    event: None,
    comment: None,
}
DEBUG: Reasoning parsed chunk: Annotated {
    data: Some(
        CreateChatCompletionStreamResponse {
            id: "test-id",
            choices: [
                ChatChoiceStream {
                    index: 0,
                    delta: ChatCompletionStreamResponseDelta {
                        content: None,
                        function_call: None,
                        tool_calls: None,
                        role: Some(
                            Assistant,
                        ),
                        refusal: None,
                        reasoning_content: None,
                    },
                    finish_reason: None,
                    logprobs: None,
                },
            ],
            created: 1234567890,
            model: "test-model",
            service_tier: None,
            system_fingerprint: Some(
                "test-fingerprint",
            ),
            object: "chat.completion.chunk",
            usage: None,
        },
    ),
    id: Some(
        "test-id",
    ),
    event: None,
    comment: None,
}
DEBUG: Reasoning parsed chunk: Annotated {
    data: Some(
        CreateChatCompletionStreamResponse {
            id: "test-id",
            choices: [
                ChatChoiceStream {
                    index: 0,
                    delta: ChatCompletionStreamResponseDelta {
                        content: Some(
                            "I'll check the weather for you.",
                        ),
                        function_call: None,
                        tool_calls: None,
                        role: Some(
                            Assistant,
                        ),
                        refusal: None,
                        reasoning_content: None,
                    },
                    finish_reason: None,
                    logprobs: None,
                },
            ],
            created: 1234567890,
            model: "test-model",
            service_tier: None,
            system_fingerprint: Some(
                "test-fingerprint",
            ),
            object: "chat.completion.chunk",
            usage: None,
        },
    ),
    id: Some(
        "test-id",
    ),
    event: None,
    comment: None,
}

Current reasoning parser implementation does not work well when there is tool calling output in the input stream

@ayushag-nv
Copy link
Contributor Author

Can we add a very simple test case with reasoning + tool calling for nemotron and gpt oss. We can try to even rearrange it with (Reasoning, text, tool calling) and do all permutations of order to see if it fails in any case.

@elyasmnvidian Added nemotron and gpt-oss test. Gpt-oss test fails , I think there is some fundamental issue with the parsing. I will fix in follow up

@elyasmnvidian elyasmnvidian self-requested a review October 1, 2025 16:18
Copy link
Contributor

@elyasmnvidian elyasmnvidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, approved

@ayushag-nv
Copy link
Contributor Author

/ok to test cfd7f8a

@ayushag-nv ayushag-nv requested a review from rmccorm4 October 1, 2025 17:03
@rmccorm4 rmccorm4 requested a review from zhongdaor-nv October 1, 2025 17:05
Copy link
Contributor

@zhongdaor-nv zhongdaor-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ayush!

@rmccorm4 rmccorm4 merged commit 2a2bf58 into main Oct 1, 2025
23 checks passed
@rmccorm4 rmccorm4 deleted the ayushag/reasoning-parser-refactor branch October 1, 2025 17:43
nv-tusharma pushed a commit that referenced this pull request Oct 20, 2025
@rmccorm4 rmccorm4 added the frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` label Nov 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feat frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants