
Conversation

@AyRickk AyRickk commented Nov 26, 2025

Related to #5992


Summary by cubic

Adds configurable thinking output format support for vLLM by parsing custom tags in streamed responses, while preserving vLLM's native reasoning_content during streaming. Models whose output wraps reasoning in custom tags now emit proper "thinking" and "assistant" chunks.

  • New Features
    • Add thinkingOpenTag and thinkingCloseTag to VllmOptions to configure custom thinking tags (both are required).
    • Implement ThinkingTagExtractor to split streamed content into thinking vs. regular content, with partial-tag handling and a flush at stream end (a rough sketch follows below).
    • Override vLLM streaming to emit "thinking" chunks first and pass assistant chunks through; use direct SSE so vLLM's native reasoning_content is preserved. Behavior is unchanged when the options are not set.
    • Add unit and integration tests; update vLLM provider and reference docs with configuration examples.

Written for commit d99b93c. Summary will update automatically on new commits.
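
To make the splitting behavior concrete, here is a minimal TypeScript sketch of the idea described in the summary. The class and method names and the exact buffering strategy are assumptions for illustration, not the PR's actual implementation.

```typescript
// Minimal sketch: split a streamed string into "thinking" vs. regular content,
// holding back a possible partial tag at each chunk boundary.
type Split = { thinking: string; content: string };

class ThinkingTagExtractor {
  private insideThinking = false;
  private buffer = ""; // may end with a partial tag awaiting the next chunk

  constructor(
    private openTag: string,
    private closeTag: string,
  ) {}

  // Feed one streamed chunk; returns the thinking and regular portions found so far.
  process(chunk: string): Split {
    this.buffer += chunk;
    let thinking = "";
    let content = "";

    while (this.buffer.length > 0) {
      const tag = this.insideThinking ? this.closeTag : this.openTag;
      const idx = this.buffer.indexOf(tag);
      if (idx !== -1) {
        const before = this.buffer.slice(0, idx);
        if (this.insideThinking) thinking += before;
        else content += before;
        this.buffer = this.buffer.slice(idx + tag.length);
        this.insideThinking = !this.insideThinking;
      } else {
        // Keep back just enough characters to cover a tag split across chunks.
        const keep = Math.min(this.buffer.length, tag.length - 1);
        const safe = this.buffer.slice(0, this.buffer.length - keep);
        if (this.insideThinking) thinking += safe;
        else content += safe;
        this.buffer = this.buffer.slice(this.buffer.length - keep);
        break;
      }
    }
    return { thinking, content };
  }

  // Called at stream end: emit whatever is still buffered.
  flush(): Split {
    const leftover = this.buffer;
    this.buffer = "";
    return this.insideThinking
      ? { thinking: leftover, content: "" }
      : { thinking: "", content: leftover };
  }
}
```

The holdback of `tag.length - 1` characters is what lets a tag that straddles two SSE chunks still be recognized on the next call.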

@AyRickk AyRickk requested a review from a team as a code owner November 26, 2025 19:51
@AyRickk AyRickk requested review from Patrick-Erichsen and removed request for a team November 26, 2025 19:51
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 26, 2025
github-actions bot commented Nov 26, 2025


Thank you for your submission; we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


1 out of 2 committers has signed the CLA.
✅ [AyRickk](https://github.com/AyRickk)
@continue Agent
Continue Agent does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Author

AyRickk commented Nov 26, 2025

I have read the CLA Document and I hereby sign the CLA

Author

AyRickk commented Nov 26, 2025

recheck

- Add new section in vLLM provider docs explaining thinking output format options
- Document thinkingOpenTag and thinkingCloseTag properties in YAML reference
- Document thinkingOpenTag and thinkingCloseTag properties in JSON reference
- Include configuration examples for both YAML and JSON formats

Co-authored-by: nate <[email protected]>

Generated with [Continue](https://continue.dev)

Co-Authored-By: Continue <[email protected]>
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Nov 26, 2025
Contributor

continue bot commented Nov 26, 2025

Added documentation for the new thinking output format configuration feature:

Changes made:

  1. vLLM Provider Documentation

    • Added new "Thinking output format" section explaining the two supported formats (standard reasoning_content field and custom tags)
    • Included configuration examples for both YAML and JSON formats showing how to use thinkingOpenTag and thinkingCloseTag
    • Added link to vLLM's reasoning outputs documentation
  2. YAML Reference

    • Documented thinkingOpenTag and thinkingCloseTag properties in the models section
    • Added example configuration showing vLLM with custom thinking tags
    • Included cross-reference to vLLM provider docs
  3. JSON Reference

    • Documented thinkingOpenTag and thinkingCloseTag properties for config.json
    • Added cross-reference to vLLM provider docs for examples

The documentation maintains the existing level of detail and focuses on practical usage of the new feature.
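
For reference, here is a hedged sketch of what a configured model entry might look like, written as a TypeScript object with a rough YAML equivalent in the comments. Property names come from the PR summary; the surrounding schema, field names like `name`/`apiBase`, and the example tag values are assumptions, so the docs added in this PR are the authoritative format.

```typescript
// Rough YAML equivalent (config.yaml):
//   models:
//     - name: vLLM Reasoning Model
//       provider: vllm
//       model: my-reasoning-model
//       apiBase: http://localhost:8000/v1
//       thinkingOpenTag: "<reasoning>"
//       thinkingCloseTag: "</reasoning>"
const exampleVllmModelConfig = {
  provider: "vllm",
  model: "my-reasoning-model",
  apiBase: "http://localhost:8000/v1",
  // Both tags must be set; if either is missing, streaming behavior is unchanged.
  thinkingOpenTag: "<reasoning>",
  thinkingCloseTag: "</reasoning>",
};
```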

…eaming

Add comprehensive integration tests to verify the ThinkingTagExtractor works correctly when integrated with BaseLLM's streamChat method. Tests cover:
- Single and multiple chunk scenarios
- Partial tag handling at chunk boundaries
- Flush behavior at stream end
- Multiple thinking blocks
- Custom tag formats
- Interaction with native thinking role chunks

Co-authored-by: nate <[email protected]>
Contributor

continue bot commented Nov 26, 2025

Added integration tests in core/llm/thinkingTagIntegration.vitest.ts to verify ThinkingTagExtractor works correctly with BaseLLM's streaming functionality.

The tests cover:

  • Extraction of thinking content from single and multiple chunks
  • Handling of partial tags at chunk boundaries
  • Flush behavior when stream ends
  • Multiple thinking blocks in a single stream
  • Custom tag format support (e.g., <reasoning>, [THINK])
  • Interaction with native thinking role chunks
  • Correct behavior when thinking tags are not configured

These integration tests complement the existing unit tests for ThinkingTagExtractor by verifying the end-to-end behavior when the extractor is integrated with the actual streaming pipeline.
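
As an illustration of the chunk-boundary case, here is a vitest-style sketch that reuses the hypothetical ThinkingTagExtractor from the earlier sketch. It is not the PR's actual test file, and the tag names and expected splits are assumptions.

```typescript
// Assumes the ThinkingTagExtractor sketched earlier is in scope (or imported
// from wherever that sketch lives).
import { describe, expect, it } from "vitest";

describe("ThinkingTagExtractor (sketch)", () => {
  it("handles an opening tag split across chunk boundaries", () => {
    const extractor = new ThinkingTagExtractor("<think>", "</think>");

    // The opening tag arrives in two pieces across consecutive chunks.
    const first = extractor.process("<thi");
    const second = extractor.process("nk>step 1</think>final answer");
    const flushed = extractor.flush();

    const thinking = first.thinking + second.thinking + flushed.thinking;
    const content = first.content + second.content + flushed.content;

    expect(thinking).toBe("step 1");
    expect(content).toBe("final answer");
  });
});
```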

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 4 files

Prompt for AI agents (all 1 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="core/llm/index.ts">

<violation number="1" location="core/llm/index.ts:1412">
Custom provider streaming now suppresses assistant tool-call messages with empty `content`, so downstream tool executions are lost.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR
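
To illustrate the failure mode cubic flags above: in OpenAI-style streaming, tool-call deltas typically arrive on assistant chunks whose `content` is an empty string, so a guard keyed on content alone drops them. The sketch below shows one way such a guard could also account for tool calls; the types and field names are assumptions, not the actual code in core/llm/index.ts.

```typescript
// Sketch of the issue: tool-call deltas usually ride on assistant chunks with
// empty `content`, so filtering on content alone suppresses them.
interface AssistantChunk {
  role: "assistant";
  content: string;
  toolCalls?: Array<{ id?: string; name?: string; argumentsDelta?: string }>;
}

function shouldYieldAssistantChunk(chunk: AssistantChunk): boolean {
  // Yield when there is visible content OR a tool-call delta, so downstream
  // tool execution is not lost when content is empty.
  return chunk.content !== "" || (chunk.toolCalls?.length ?? 0) > 0;
}
```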

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 8 files

Prompt for AI agents (all 1 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="core/llm/index.ts">

<violation number="1" location="core/llm/index.ts:1413">
Tool-call/usage-only assistant chunks are dropped because they are now yielded only when `content` is non-empty, preventing tools from ever executing.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 7 files

Prompt for AI agents (all 1 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="core/llm/index.ts">

<violation number="1" location="core/llm/index.ts:1412">
Tool/function-call chunks are dropped because new guard filters out assistant messages whose content is an empty string, so streaming tool calls no longer reach the caller.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

Collaborator

@RomneyDa RomneyDa left a comment


@AyRickk see comments, I think we could merge if

  • functionality is siloed to the VLLM class
  • use a single thinkingTagName rather than duplicate tag configs (unless it would be normal for them to differ in vLLM, I'm not sure); see the sketch after this list
  • Update config yaml schema as in comment, and in config-yaml package (or it will not work for yaml configs)
  • On first glance I wonder if the tag parsing logic could be significantly more concise, maybe worth another look/approach?
  • Follow up on/clean up test/docs commits. This is a new feature and we are tweaking when it runs (i.e. we might not run for community PRs) but I think in this case it made sense
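
A rough sketch of the single-option shape suggested in the second bullet, assuming the open and close tags are always a matching XML-style pair (bracket formats like [THINK] would still need separate open/close values):

```typescript
// Derive both tags from one configured name. The XML-style pairing is an
// assumption, not something the PR currently does.
function tagsFromThinkingTagName(thinkingTagName: string): {
  open: string;
  close: string;
} {
  return { open: `<${thinkingTagName}>`, close: `</${thinkingTagName}>` };
}

// e.g. thinkingTagName: "reasoning" -> { open: "<reasoning>", close: "</reasoning>" }
```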

Copilot AI added a commit to AyRickk/continue that referenced this pull request Nov 27, 2025
Per reviewer feedback on PR continuedev#8901:
- Remove ThinkingTagExtractor class from core/llm/index.ts (keep in separate file)
- Remove thinkingOpenTag/thinkingCloseTag from BaseLLM class
- Remove thinking extractor logic from processChatChunk and streamChat in BaseLLM
- Remove thinkingOpenTag/thinkingCloseTag from LLMOptions in core/index.d.ts
- Remove thinkingTagIntegration.vitest.ts (BaseLLM integration test)

The feature is now vLLM-specific only, handled by the Vllm class.

Co-authored-by: AyRickk <[email protected]>
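
To show what keeping the behavior inside the vLLM class only might look like, here is a hedged sketch of a stream wrapper built on the hypothetical extractor from the earlier sketch. Method names and chunk shapes are assumptions, not the actual Vllm class API.

```typescript
// Sketch: wrap the raw vLLM text stream, emitting extracted thinking chunks
// first and passing regular content through as assistant chunks.
type ChatChunk = { role: "thinking" | "assistant"; content: string };

class VllmStreamSketch {
  constructor(private extractor?: ThinkingTagExtractor) {}

  async *wrapStream(raw: AsyncIterable<string>): AsyncGenerator<ChatChunk> {
    if (!this.extractor) {
      // Tags not configured: behavior is unchanged, content passes through.
      for await (const piece of raw) {
        yield { role: "assistant", content: piece };
      }
      return;
    }
    for await (const piece of raw) {
      const { thinking, content } = this.extractor.process(piece);
      if (thinking) yield { role: "thinking", content: thinking };
      if (content) yield { role: "assistant", content };
    }
    // Flush anything still buffered at stream end.
    const rest = this.extractor.flush();
    if (rest.thinking) yield { role: "thinking", content: rest.thinking };
    if (rest.content) yield { role: "assistant", content: rest.content };
  }
}
```
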
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Nov 27, 2025
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Nov 27, 2025
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 2 files (reviewed changes from recent commits).

Prompt for AI agents (all 1 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="core/llm/thinkingTagIntegration.vitest.ts">

<violation number="1" location="core/llm/thinkingTagIntegration.vitest.ts:30">
Tests duplicate the production `_streamChat` logic inside `MockVllm`, so none of these cases exercise the real vLLM streaming implementation—regressions in `Vllm._streamChat` would still pass.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

Author

AyRickk commented Nov 27, 2025

@RomneyDa is it ok?


Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Status: In Progress


2 participants