[Bugfix] Fix step3p5 reasoning with interleaved thinking #34211
chaunceyjiang merged 2 commits into vllm-project:main from
Conversation
Signed-off-by: mariohong <mariohong128@gmail.com>
Code Review
This pull request fixes a bug in the step3p5 reasoning parser where it failed to correctly detect the end of a reasoning block in multi-turn conversations. The change introduces more robust logic to identify the last <think> or </think> token to determine the reasoning state, which correctly handles prompts containing artifacts from previous turns. A comprehensive set of tests has been added to cover various scenarios, including the multi-turn case that triggered the bug. My review found a gap in the new tests where the streaming logic for is_reasoning_end_streaming is not properly exercised. I've provided a suggestion to improve the test coverage.
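The core idea of the fix — deciding the reasoning state from the position of the *last* `<think>` or `</think>` token rather than the mere presence of an end token — can be sketched as follows. This is a minimal illustration with placeholder token ids, not the actual parser code or vLLM's real token ids:

```python
# Placeholder ids for illustration only; the real parser looks these up
# from the tokenizer's vocabulary.
THINK_START_ID = 1001  # assumed id for <think>
THINK_END_ID = 1002    # assumed id for </think>

def is_reasoning_end(token_ids: list[int]) -> bool:
    """Reasoning has ended only if the most recent think tag is a closing one.

    A check for the mere presence of THINK_END_ID would misfire when a
    </think> left over from a previous turn appears earlier in the sequence.
    """
    last_start = max(
        (i for i, t in enumerate(token_ids) if t == THINK_START_ID), default=-1
    )
    last_end = max(
        (i for i, t in enumerate(token_ids) if t == THINK_END_ID), default=-1
    )
    return last_end > last_start

# Multi-turn sequence: a stale </think> followed by a new, still-open
# <think> block -> reasoning has NOT ended yet.
print(is_reasoning_end([1002, 5, 1001, 7]))        # False
print(is_reasoning_end([1002, 5, 1001, 7, 1002]))  # True
```

A presence-only check would return True for the first sequence, which is exactly the multi-turn bug this PR addresses.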
    if streaming:
        is_reasoning_end = parser.is_reasoning_end(output_ids)
        assert is_reasoning_end == param_dict["is_reasoning_end"]
The test for is_reasoning_end in streaming mode is not correctly testing the streaming behavior. It calls the non-streaming parser.is_reasoning_end(output_ids) method, which evaluates the entire output at once. The stateful, incremental logic of is_reasoning_end_streaming which operates on delta_ids is not being exercised. This is a significant testing gap for a critical part of the bug fix.
To properly test the streaming logic, you should simulate the streaming process and call is_reasoning_end_streaming at each step. Here is an example of how you could structure such a test:
    def run_is_reasoning_end_streaming(parser, output_ids):
        is_end = False
        current_ids = []
        for token_id in output_ids:
            delta_ids = [token_id]
            current_ids.append(token_id)
            is_end = parser.is_reasoning_end_streaming(current_ids, delta_ids)
        return is_end

    # In test_reasoning, under the `if streaming:` block:
    streaming_parser = ReasoningParserManager.get_reasoning_parser(parser_name)(step3p5_tokenizer)
    is_reasoning_end = run_is_reasoning_end_streaming(streaming_parser, output_ids)
    assert is_reasoning_end == param_dict["is_reasoning_end"]

This would ensure the state transitions of the parser are correctly handled in streaming mode.
Much appreciated for this PR, hope it gets merged soon!
Posted about this fix in r/BlackwellPerformance. Someone said this patch still does not work: "Sadly it is still broken with that patch and tool calls do not work." You can easily try the patch yourself: assuming you use a venv, the patched file lives in:
Can you provide more details, such as the input request? I tested locally and it works. Note that it does not support
I'm the Reddit poster and yes, you nailed it: I'm using
Interesting, I didn't know the parsers were so tightly coupled to the API type (openai vs anthropic). That's... unfortunate.
Because
I don't understand. How are you proxying the claude CLI into vLLM such that Steps can even make tool calls? I've tried with LiteLLM, but vLLM barfs with
I believe cc-switch works as a proxy on the machine that you point Claude Code to; it handles these streaming calls the way CC expects for tool calls, if I'm not mistaken. Maybe the issue is with how vLLM parses this? I might give it a go tomorrow if I have time, testing this reasoning parser fix plus that other PR fix with vLLM/LiteLLM to see how it does.
I'll try to test this later today (GMT+1) with the regular openai endpoints (opencode). Edit: Tried it and can confirm the reasoning parser works now. Tested in opencode with the openai endpoint.
…t#34211) Signed-off-by: mariohong <mariohong128@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
…t#34211) Signed-off-by: mariohong <mariohong128@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Purpose
When there are multiple rounds of conversation, the prompt contains `</think>` from the previous round, and the step3p5 reasoning parser failed to correctly determine the end of reasoning.

Test Plan
Add tests: tests/reasoning/test_step3p5_reasoning_parser.py

Test Result
All passed.
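To illustrate the multi-turn scenario behind the bug, here is a minimal, hypothetical prompt layout — not the actual Step3.5 chat template — showing how a stale `</think>` from round one ends up in the round-two prompt:

```python
# Hypothetical multi-turn layout (illustrative only, not the real template).
round1_reply = "<think>plan the answer</think>The answer is 42."
round2_prompt = (
    "User: hi\n"
    f"Assistant: {round1_reply}\n"
    "User: why?\n"
    "Assistant: "
)

# A naive check for the presence of </think> would conclude that reasoning
# has already ended before the new turn even starts generating.
print("</think>" in round2_prompt)  # True
```

This is why the parser must key off the *last* think tag it sees rather than any occurrence of `</think>`.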
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.