Skip to content

[Bugfix] Fix partial tool-call marker leaking as content during reasoning-to-tool transition#40965

Closed
rixav77 wants to merge 3 commits into
vllm-project:mainfrom
rixav77:fix/tool-call-marker-leak-40911
Closed

[Bugfix] Fix partial tool-call marker leaking as content during reasoning-to-tool transition#40965
rixav77 wants to merge 3 commits into
vllm-project:mainfrom
rixav77:fix/tool-call-marker-leak-40911

Conversation

@rixav77
Copy link
Copy Markdown

@rixav77 rixav77 commented Apr 27, 2026

Purpose

Fix a streaming bug where partial tool-call markers (e.g. "<|" from "<|tool_call>") leak into delta.content during the reasoning-to-tool-call transition in DelegatingParser.parse_delta().

This PR fixes #40911.

Why is this not duplicating an existing PR?

No open PRs reference issue #40911. Checked via:

gh pr list --repo vllm-project/vllm --state open --search "40911 in:body"

Root Cause

When the reasoning end token (<channel|>) and the tool-call start token (<|tool_call>) arrive in the same streaming delta, extract_reasoning_streaming() performs a text-based split on the end-token string. The text after the split can contain partial special-token fragments (e.g. "<|" — the beginning of "<|tool_call>"), which the old code passed directly as current_text to the tool parser. Since the tool parser didn't recognize this partial prefix as a tool-call start, it emitted it as plain content.

Fix

Instead of using delta_message.content (text-based split from the reasoning parser), reconstruct the handoff text from the token IDs returned by extract_content_ids() via tokenizer.decode(). Token IDs have exact boundaries, so partial-token text fragments cannot leak.

Changed file: vllm/parser/abstract_parser.py (lines 614-625)

Test Plan

Added regression tests in tests/parser/test_reasoning_tool_transition.py:

pytest tests/parser/test_reasoning_tool_transition.py -v

Three test cases:

  1. Partial marker leak — reasoning end + tool start in same delta; verifies "<|" does not appear as content
  2. Full marker passthrough — complete tool-call token is correctly forwarded to tool parser
  3. Reasoning end only — no spurious content when reasoning ends without immediate tool call

AI Assistance Disclosure

This PR was developed with AI assistance (Claude). The human submitter has reviewed all changed lines.

Copilot AI review requested due to automatic review settings April 27, 2026 05:52
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions
Copy link
Copy Markdown

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

Agent Guidelines

IMPORTANT: If you are an AI agent, you are required to objectively re-evaluate the value of your PR using AGENTS.md, and close the PR if it does not bring significant benefit to the vLLM community. Failure to do so may result in an immediate ban.

🚀

@mergify mergify Bot added the bug Something isn't working label Apr 27, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses issue #40911 by modifying the parse_delta method in vllm/parser/abstract_parser.py to reconstruct text from token IDs, preventing partial special-token fragments from leaking into the content during reasoning-to-tool-call transitions. It also introduces a suite of regression tests. Review feedback points out that delta_message might be overwritten when a tool call is detected in the same delta, leading to lost reasoning content, and suggests using a local subclass in tests to avoid polluting _WrappedParser class attributes.

Comment thread vllm/parser/abstract_parser.py Outdated
else:
current_text = ""
if delta_message:
delta_message.content = None
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The delta_message object modified here is subsequently overwritten at line 639 if a tool call is detected in the same delta. This causes any reasoning content extracted in this delta to be lost. Consider merging the tool call delta into the existing delta_message instead of replacing it to ensure all parts of the response (e.g., final reasoning thoughts) are preserved during the transition.

Comment on lines +35 to +37
_WrappedParser.reasoning_parser_cls = Gemma4ReasoningParser
_WrappedParser.tool_parser_cls = Gemma4ToolParser
return _WrappedParser(tokenizer)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Modifying the class attributes of _WrappedParser directly can lead to test pollution, as these changes persist across tests and may affect other test suites running in the same process. It is safer to use a local subclass to isolate these configurations.

Suggested change
_WrappedParser.reasoning_parser_cls = Gemma4ReasoningParser
_WrappedParser.tool_parser_cls = Gemma4ToolParser
return _WrappedParser(tokenizer)
class TestParser(_WrappedParser):
reasoning_parser_cls = Gemma4ReasoningParser
tool_parser_cls = Gemma4ToolParser
return TestParser(tokenizer)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a streaming edge case in DelegatingParser.parse_delta() where partial tool-call marker text (e.g. "<|") can be emitted as normal content during the reasoning → tool-call transition when both boundaries occur in the same delta.

Changes:

  • Reconstructs the tool-parser handoff text by decoding the post-reasoning token IDs (extract_content_ids(...)) instead of reusing the reasoning parser’s text-split output.
  • Adds regression tests intended to cover the reasoning-end/tool-start transition behavior (Gemma4 tokenizer/tool parser).
  • Adds tests/parser/__init__.py to align the test directory with other test packages.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
vllm/parser/abstract_parser.py Decodes the post-reasoning token IDs for the handoff into tool parsing, preventing partial special-token text fragments from leaking into content.
tests/parser/test_reasoning_tool_transition.py Adds regression tests for the reasoning → tool-call transition (partial marker leak / tool marker handling / reasoning end only).
tests/parser/__init__.py Makes tests/parser a package (consistent with other tests/* subdirectories).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +89 to +96
# Phase 2: reasoning end + tool call start in same delta
combined_ids = [channel_end_id, tool_call_start_id]
combined_text = tokenizer.decode(combined_ids, skip_special_tokens=False)
msg = parser.parse_delta(
delta_text=combined_text,
delta_token_ids=combined_ids,
request=request,
)
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_partial_marker_does_not_leak doesn’t actually simulate the reported failure mode: combined_text is built via tokenizer.decode([channel_end_id, tool_call_start_id]), which yields the full <|tool_call> string (not a partial prefix like <|). The leak in #40911 happens when delta_text contains only a prefix of the tool marker even though the delta_token_ids include the full tool-call token (due to incremental detokenization / prefix-diff behavior). To make this a true regression test, construct combined_text so it ends with a strict prefix of the decoded tool token (e.g. tool_token_text[:2]) while still passing the full combined_ids.

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +155
def test_full_marker_passes_to_tool_parser(self, tokenizer, vocab):
"""
When reasoning ends and a complete tool-call token follows,
the tool parser should receive it (no leak, no suppression).
"""
parser = _make_parser(tokenizer)
request = _request()

channel_start_id = vocab["<|channel>"]
channel_end_id = vocab["<channel|>"]
tool_call_start_id = vocab.get("<|tool_call>")
if tool_call_start_id is None:
pytest.skip("<|tool_call> not in vocab")

# Send reasoning start
delta_text = tokenizer.decode([channel_start_id], skip_special_tokens=False)
parser.parse_delta(
delta_text=delta_text,
delta_token_ids=[channel_start_id],
request=request,
)

# Send reasoning content
content_ids = _encode(tokenizer, "reasoning")
for tid in content_ids:
delta_text = tokenizer.decode([tid], skip_special_tokens=False)
parser.parse_delta(
delta_text=delta_text,
delta_token_ids=[tid],
request=request,
)

# Send reasoning end
delta_text = tokenizer.decode([channel_end_id], skip_special_tokens=False)
parser.parse_delta(
delta_text=delta_text,
delta_token_ids=[channel_end_id],
request=request,
)

# Send tool call start — now in tool phase
delta_text = tokenizer.decode([tool_call_start_id], skip_special_tokens=False)
msg = parser.parse_delta(
delta_text=delta_text,
delta_token_ids=[tool_call_start_id],
request=request,
)

# Should not leak tool marker as content
if msg is not None and msg.content is not None:
assert "<|tool_call>" not in msg.content

Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_full_marker_passes_to_tool_parser doesn’t currently verify the stated behavior (“tool parser should receive it”). With only a standalone <|tool_call> token, Gemma4ToolParser.extract_tool_calls_streaming typically returns None (it waits for more text), so this test can pass without exercising tool-call parsing at all. Consider either (a) renaming the test to reflect that it only checks non-leakage, or (b) extending it to stream a minimal complete call (start + call:...{...} + end) and assert that a DeltaMessage.tool_calls is eventually produced and that no tool marker appears in DeltaMessage.content.

Copilot uses AI. Check for mistakes.
…ing reasoning-to-tool transition

When transitioning from reasoning to tool-call parsing in
DelegatingParser.parse_delta(), the content text was taken from the
reasoning parser's text-based split (delta_message.content). This could
contain partial special-token fragments (e.g. "<|" from "<|tool_call>")
when the reasoning end token and tool-call start token arrived in the
same streaming delta.

Fix: reconstruct the handoff text from extract_content_ids() token IDs
via tokenizer.decode() instead of using the text-split content. Token
IDs have exact boundaries, so partial-token text cannot leak.

Closes vllm-project#40911

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: rixav77 <vermaaryaman230@gmail.com>
@rixav77 rixav77 force-pushed the fix/tool-call-marker-leak-40911 branch from a99a753 to 96fec33 Compare April 27, 2026 06:54
@neon12345
Copy link
Copy Markdown

neon12345 commented Apr 27, 2026

Do you check if the data fed into extract_tool_calls_streaming(...) is consistent? I see "<channel|>" twice in delta_text with only one channel end token in delta_token_ids. Could be on my side.

Edit: There is a strange drift between delta_token_ids and delta_text from parse_delta(...) that makes things complicated. For example I see this sequence:

from parse_delta(...): [101] "nd simply."
before delta_message = self.extract_tool_calls_streaming(...):  [101] "<channel|>"
from parse_delta(...)  [3694] "<cha"
before delta_message = self.extract_tool_calls_streaming(...):  [3694] "<cha"
from parse_delta(...)  [4461] "nnel|>Tes"

101 is <channel|> and this is a problem in my case, because I changed my code to return the split tokens from self.extract_content_ids (there is another patch where <|tool_call> is handled as split token).

…rtial

   text, rename test

Signed-off-by: rixav77 <vermaaryaman230@gmail.com>
@neon12345
Copy link
Copy Markdown

I want to be more clear about the problem I see:

There may be text that was not yet delivered from parse_delta(...delta_text...) but is already delivered as token id from parse_delta(...delta_token_ids...). The tokens are then potentially returned from self.extract_content_ids(delta_token_ids) and added to current_text with this patch. But this text can also later arrive from parse_delta(...delta_text...).

Perhaps someone who knows the code better can comment on the drift between delta_token_ids and delta_text and if this is a problem here.

Comment thread vllm/parser/abstract_parser.py Outdated
Per maintainer feedback, move the token-ID-based content reconstruction
from the shared DelegatingParser into the Gemma4-specific reasoning
parser to avoid propagating model-specific changes to other models.

abstract_parser.py is fully reverted to its original code.

Signed-off-by: Aryaman Verma <vermaaryaman230@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: rixav77 <vermaaryaman230@gmail.com>
@chaunceyjiang
Copy link
Copy Markdown
Collaborator

/cc @sfeng33 PTAL.

@rixav77
Copy link
Copy Markdown
Author

rixav77 commented May 5, 2026

/cc @sfeng33 PTAL.

can you confirm once?

@sfeng33
Copy link
Copy Markdown
Collaborator

sfeng33 commented May 28, 2026

Closing as fixed in #42691

@sfeng33 sfeng33 closed this May 28, 2026
@rixav77
Copy link
Copy Markdown
Author

rixav77 commented May 28, 2026

Closing as fixed in #42691

Closing as fixed in #42691

thanks, glad it got resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Tool call leaks into content

6 participants