feat: add configurable thinking output format support for vLLM #8901
base: main
Conversation
I have read the CLA Document and I hereby sign the CLA. (1 out of 2 committers have signed the CLA.)

I have read the CLA Document and I hereby sign the CLA.

recheck
- Add new section in vLLM provider docs explaining thinking output format options
- Document thinkingOpenTag and thinkingCloseTag properties in YAML reference
- Document thinkingOpenTag and thinkingCloseTag properties in JSON reference
- Include configuration examples for both YAML and JSON formats

Co-authored-by: nate <[email protected]>
Generated with [Continue](https://continue.dev)
Co-Authored-By: Continue <[email protected]>
Added documentation for the new thinking output format configuration feature. The documentation maintains the existing level of detail and focuses on practical usage of the new feature.
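For reference, here is a minimal sketch of what the YAML side of this configuration might look like. It assumes the documented `thinkingOpenTag`/`thinkingCloseTag` properties sit directly on the model entry; the model name, apiBase, and tag values are placeholders, not taken from the actual docs commit.

```yaml
# Hypothetical config.yaml sketch; property placement and all values are assumptions.
models:
  - name: Local reasoning model
    provider: vllm
    model: my-reasoning-model
    apiBase: http://localhost:8000/v1
    thinkingOpenTag: "<think>"
    thinkingCloseTag: "</think>"
```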
…eaming

Add comprehensive integration tests to verify the ThinkingTagExtractor works correctly when integrated with BaseLLM's streamChat method. Tests cover:

- Single and multiple chunk scenarios
- Partial tag handling at chunk boundaries
- Flush behavior at stream end
- Multiple thinking blocks
- Custom tag formats
- Interaction with native thinking role chunks

Co-authored-by: nate <[email protected]>
Added integration tests in `thinkingTagIntegration.vitest.ts`. The tests cover the scenarios listed in the commit message above. These integration tests complement the existing unit tests for ThinkingTagExtractor by verifying the end-to-end behavior when the extractor is integrated with the actual streaming pipeline.
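As a rough illustration only (not the file from this PR), a vitest case for the chunk-boundary scenario could look like the sketch below. The import path and the `process`/`flush` interface of `ThinkingTagExtractor` are assumptions inferred from this thread.

```typescript
// Hypothetical sketch: the import path and the process()/flush() API are assumed
// from the discussion above, not copied from the PR diff.
import { describe, expect, it } from "vitest";
import { ThinkingTagExtractor } from "./thinkingTagExtractor";

describe("ThinkingTagExtractor across chunk boundaries", () => {
  it("handles an open tag split across two streamed chunks", () => {
    const extractor = new ThinkingTagExtractor("<think>", "</think>");

    // The first chunk ends in the middle of the open tag, so nothing is classified yet.
    const first = extractor.process("Hello <thi");
    // The second chunk completes the tag, carries the thinking text, and closes it.
    const second = extractor.process("nk>step 1</think> world");
    // Flushing at stream end releases anything still buffered as a possible partial tag.
    const tail = extractor.flush();

    const thinking = first.thinking + second.thinking + tail.thinking;
    const content = first.content + second.content + tail.content;

    expect(thinking).toBe("step 1");
    // Double space is expected: one from each side of the removed thinking block.
    expect(content).toBe("Hello  world");
  });
});
```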
cubic-dev-ai review: 1 issue found across 4 files.

Violation 1 (core/llm/index.ts:1412): Custom provider streaming now suppresses assistant tool-call messages with empty `content`, so downstream tool executions are lost.
cubic-dev-ai review: 1 issue found across 8 files.

Violation 1 (core/llm/index.ts:1413): Tool-call/usage-only assistant chunks are dropped because they are now yielded only when `content` is non-empty, preventing tools from ever executing.
cubic-dev-ai review: 1 issue found across 7 files.

Violation 1 (core/llm/index.ts:1412): Tool/function-call chunks are dropped because the new guard filters out assistant messages whose content is an empty string, so streaming tool calls no longer reach the caller.
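All three findings point at the same pattern: a guard that keys only on non-empty `content` will drop assistant chunks whose only payload is a tool call or usage metadata. Below is a minimal sketch of the kind of guard being asked for; the field names (`content`, `toolCalls`, `usage`) are assumptions, not the exact types from core/llm/index.ts.

```typescript
// Hypothetical sketch; the chunk shape below is an assumption, not the project's types.
interface AssistantChunk {
  role: "assistant";
  content: string;
  toolCalls?: unknown[];
  usage?: unknown;
}

function shouldYieldChunk(chunk: AssistantChunk): boolean {
  // Keep chunks that carry visible text...
  if (chunk.content !== "") {
    return true;
  }
  // ...and never drop chunks whose only payload is tool calls or usage metadata,
  // otherwise downstream tool execution silently stops working.
  return Boolean(chunk.toolCalls?.length) || chunk.usage !== undefined;
}
```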
RomneyDa left a comment:
@AyRickk see comments, I think we could merge if:

- functionality is siloed to the VLLM class
- use a single thinkingTagName, not duplicate tag configs (unless it would be normal for them to differ in VLLM, I'm not sure)
- Update the config yaml schema as in the comment, and in the config-yaml package (or it will not work for YAML configs)
- On first glance I wonder if the tag parsing logic could be significantly more concise, maybe worth another look/approach? (One possible shape is sketched after this list.)
- Follow up on/clean up the test/docs commits. This is a new feature and we are tweaking when it runs (i.e. we might not run it for community PRs), but I think in this case it made sense.
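On the "could the tag parsing be more concise" point, one possible shape is a small stateful extractor that buffers only a potential partial tag between chunks. This is a hypothetical sketch; the class name and method shapes are inferred from this thread, not the actual diff.

```typescript
// Hypothetical sketch of a concise streaming tag extractor; the class name and
// return shape mirror the descriptions in this PR but are not the actual code.
interface ExtractedChunk {
  thinking: string;
  content: string;
}

export class ThinkingTagExtractor {
  private buffer = "";
  private inThinking = false;

  constructor(
    private openTag: string,
    private closeTag: string,
  ) {}

  process(text: string): ExtractedChunk {
    this.buffer += text;
    const out: ExtractedChunk = { thinking: "", content: "" };

    while (this.buffer.length > 0) {
      const tag = this.inThinking ? this.closeTag : this.openTag;
      const idx = this.buffer.indexOf(tag);

      if (idx !== -1) {
        // Everything before the tag belongs to the current mode, then the mode flips.
        this.emit(out, this.buffer.slice(0, idx));
        this.buffer = this.buffer.slice(idx + tag.length);
        this.inThinking = !this.inThinking;
        continue;
      }

      // No complete tag: emit everything except a suffix that could still
      // become the start of the tag in the next chunk.
      const keep = this.partialTagLength(tag);
      this.emit(out, this.buffer.slice(0, this.buffer.length - keep));
      this.buffer = this.buffer.slice(this.buffer.length - keep);
      break;
    }
    return out;
  }

  flush(): ExtractedChunk {
    // Stream ended: whatever is buffered can no longer turn into a tag.
    const out: ExtractedChunk = { thinking: "", content: "" };
    this.emit(out, this.buffer);
    this.buffer = "";
    return out;
  }

  private emit(out: ExtractedChunk, text: string): void {
    if (this.inThinking) {
      out.thinking += text;
    } else {
      out.content += text;
    }
  }

  // Length of the longest buffer suffix that is a proper prefix of `tag`.
  private partialTagLength(tag: string): number {
    const max = Math.min(tag.length - 1, this.buffer.length);
    for (let len = max; len > 0; len--) {
      if (tag.startsWith(this.buffer.slice(this.buffer.length - len))) {
        return len;
      }
    }
    return 0;
  }
}
```

The flush step matters because a stream can end mid-tag; without it, buffered text at stream end would be silently dropped.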
…utput-format-support-for-vLLM
Per reviewer feedback on PR continuedev#8901:

- Remove ThinkingTagExtractor class from core/llm/index.ts (keep in separate file)
- Remove thinkingOpenTag/thinkingCloseTag from BaseLLM class
- Remove thinking extractor logic from processChatChunk and streamChat in BaseLLM
- Remove thinkingOpenTag/thinkingCloseTag from LLMOptions in core/index.d.ts
- Remove thinkingTagIntegration.vitest.ts (BaseLLM integration test)

The feature is now vLLM-specific only, handled by the Vllm class.

Co-authored-by: AyRickk <[email protected]>
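For context on what "handled by the Vllm class" might look like, here is a hypothetical sketch of vLLM-only wiring that reuses the extractor sketched above. The chunk shape and the `withThinkingTags` name are illustration-only assumptions, not the actual `Vllm._streamChat` from the PR.

```typescript
// Hypothetical wrapper, not the real Vllm._streamChat; StreamedMessage and
// withThinkingTags are assumed names. Reuses the ThinkingTagExtractor sketched above.
type StreamedMessage =
  | { role: "assistant"; content: string }
  | { role: "thinking"; content: string };

async function* withThinkingTags(
  upstream: AsyncIterable<StreamedMessage>,
  openTag: string,
  closeTag: string,
): AsyncGenerator<StreamedMessage> {
  const extractor = new ThinkingTagExtractor(openTag, closeTag);

  for await (const chunk of upstream) {
    // Native reasoning_content already arrives as "thinking" chunks; pass it through untouched.
    if (chunk.role !== "assistant") {
      yield chunk;
      continue;
    }
    const { thinking, content } = extractor.process(chunk.content);
    if (thinking) {
      yield { role: "thinking", content: thinking };
    }
    if (content) {
      yield { role: "assistant", content };
    }
  }

  // Flush once the upstream stream ends so buffered partial-tag text is not lost.
  const tail = extractor.flush();
  if (tail.thinking) {
    yield { role: "thinking", content: tail.thinking };
  }
  if (tail.content) {
    yield { role: "assistant", content: tail.content };
  }
}
```

Keeping the wrapper at this level, rather than in BaseLLM, matches the reviewer's request to silo the behavior to the vLLM provider.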
cubic-dev-ai review: 1 issue found across 2 files (reviewed changes from recent commits).

Violation 1 (core/llm/thinkingTagIntegration.vitest.ts:30): Tests duplicate the production `_streamChat` logic inside `MockVllm`, so none of these cases exercise the real vLLM streaming implementation; regressions in `Vllm._streamChat` would still pass.
@RomneyDa is it ok?
Related to #5992
Summary by cubic
Adds configurable thinking output format support for vLLM by parsing custom tags in streamed responses, while preserving vLLM's native reasoning_content during streaming. Models that wrap their reasoning in custom tag pairs now emit proper "thinking" and "assistant" chunks.
Written for commit d99b93c.