fix: deepseek tool parsing fixed #3557
Conversation
Walkthrough

Adds multiple streaming chat-completion test fixtures (SGLang and vLLM), standardizes their JSON schema to use `expected_output` and `input_stream`, introduces a helper CLI script for streaming with optional tools, extends the parser config with separator tokens, and refactors DeepSeek v3.1 tool-call parsing with new block extraction and single-call parsing. Updates tests accordingly.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Tester
    participant TestRunner as Test runner
    participant Fixture as Test fixture (JSON)
    participant Parser as Stream parser
    participant ToolParser as DeepSeek v3.1 tool-call parser
    Tester->>TestRunner: Run E2E streaming tests
    TestRunner->>Fixture: Load expected_output + input_stream
    Note over Fixture: expected_output.{normal_content, reasoning_content, tool_calls}
    TestRunner->>Parser: parse_response_stream(input_stream, opts)
    loop For each chunk
        Parser-->>TestRunner: deltas (reasoning/content/tool signals)
    end
    alt Tool-call tokens present
        Parser->>ToolParser: parse_tool_calls_deepseek_v3_1(text)
        ToolParser->>ToolParser: extract_tool_call_blocks(start_tokens, end_tokens)
        ToolParser->>ToolParser: parse_single_tool_call(block, separator_tokens)
        ToolParser-->>Parser: ToolCallResponse(s) with UUIDs
    else No tool-call tokens
        Parser-->>TestRunner: Aggregated text only
    end
    TestRunner->>Fixture: Compare aggregated vs expected_output
    TestRunner-->>Tester: Assertions pass/fail
```

```mermaid
sequenceDiagram
    autonumber
    participant Client as CLI helper
    participant OpenAI as OpenAI client
    participant Server as LLM server
    Client->>OpenAI: chat.completions.create(stream=true, tools?)
    OpenAI->>Server: Request (model, messages, params)
    loop Stream
        Server-->>OpenAI: chat.completion.chunk
        OpenAI-->>Client: yield chunk
        Client->>Client: count += 1
    end
    Client-->>Client: print total chunks
```
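In code, the helper's counting loop in the second diagram reduces to a few lines. A minimal Python sketch, assuming any iterable of chunks; the real helper would iterate `client.chat.completions.create(..., stream=True)` from the openai package, and `fake_stream` below is invented for illustration:

```python
def count_stream_chunks(stream) -> int:
    """Consume a chat-completion stream, counting chunks as they arrive."""
    chunk_count = 0
    for _chunk in stream:  # chunk contents unused; underscore keeps ruff B007 quiet
        chunk_count += 1
    print("total chunks:", chunk_count)
    return chunk_count

# Stand-in for a server stream; a real run would iterate SSE chunks instead.
fake_stream = iter([{"delta": "Hel"}, {"delta": "lo"}, {"delta": ""}])
```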
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
🧹 Nitpick comments (9)
lib/llm/tests/test_parsers_e2e.rs (3)
29-57: Loader enforces new format only; comment is misleading.

Code requires `expected_output` and `input_stream`; the "supports both new and legacy formats" comment is inaccurate. Either update the comment or add a legacy fallback.

Apply one of the following. Update the comment:

```diff
- // Extract expected values (supports both new and legacy formats)
+ // Extract expected values (new format only: requires `expected_output` and `input_stream`)
```

Or add a tolerant fallback (example sketch):

```diff
- let expected = parsed_json
-     .get("expected_output")
-     .expect("No 'expected_output' object found in JSON");
+ let expected = parsed_json
+     .get("expected_output")
+     .or_else(|| Some(&parsed_json))
+     .expect("No 'expected_output' object found in JSON or legacy root fields missing");
  ...
- let data_chunks = parsed_json
-     .get("input_stream")
-     .and_then(|v| v.as_array())
-     .expect("No 'input_stream' array found in JSON");
+ let data_chunks = parsed_json
+     .get("input_stream")
+     .or_else(|| parsed_json.get("data"))
+     .and_then(|v| v.as_array())
+     .expect("No 'input_stream' (new) or 'data' (legacy) array found in JSON");
```
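To make the fallback concrete, here is a Python sketch of the same tolerance; the field names follow the fixture schema discussed in this PR, and the legacy `data` key follows the review suggestion (the actual loader is Rust):

```python
import json

def load_fixture(text: str):
    """Load a streaming test fixture, accepting both new and legacy layouts."""
    parsed = json.loads(text)
    # New format nests expectations under `expected_output`; legacy keeps them at the root.
    expected = parsed.get("expected_output", parsed)
    # New format uses an `input_stream` array; legacy fixtures used a `data` array.
    chunks = parsed.get("input_stream") or parsed.get("data")
    if not isinstance(chunks, list):
        raise ValueError("no 'input_stream' (new) or 'data' (legacy) array found")
    return expected, chunks

new_style = '{"expected_output": {"normal_content": "hi"}, "input_stream": ["a"]}'
legacy_style = '{"normal_content": "hi", "data": ["a"]}'
```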
165-184: Tool-call equality is brittle across streamed chunks.

Streamed tool_calls may repeat across chunks. Consider deduping by (function.name, arguments) before comparing lengths. Example:

```diff
- assert_eq!(actual_tool_calls.len(), expected_tool_calls.len());
+ let mut unique_actual = actual_tool_calls.to_vec();
+ unique_actual.dedup_by(|a, b| a["function"]["name"] == b["function"]["name"]
+     && a["function"]["arguments"] == b["function"]["arguments"]);
+ assert_eq!(unique_actual.len(), expected_tool_calls.len());
```
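A Python sketch of the same dedup-by-key idea; note it dedups globally, which is slightly stronger than Rust's `dedup_by` (adjacent repeats only), and the chunk shapes are assumed to follow the OpenAI tool-call layout:

```python
def dedup_tool_calls(tool_calls):
    """Drop repeated tool calls sharing (function.name, function.arguments)."""
    seen, unique = set(), []
    for call in tool_calls:
        key = (call["function"]["name"], call["function"]["arguments"])
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique

streamed = [
    {"function": {"name": "get_weather", "arguments": '{"city": "SF"}'}},
    {"function": {"name": "get_weather", "arguments": '{"city": "SF"}'}},  # repeated chunk
    {"function": {"name": "get_time", "arguments": "{}"}},
]
```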
353-360: Test flag/comment mismatch.

Comment says "parsing disabled" but you pass `true, true` (tool_parse_enable, reasoning_enable). Fix the comment or set both flags to false.

```diff
- // Parse the response stream with reasoning and tool parsing disabled
+ // Parse the response stream with reasoning and tool parsing enabled
```

Or:

```diff
- true,
- true,
+ false,
+ false,
```

lib/llm/tests/scripts/tool_call_stream_request_helper.py (3)
17-20: Remove unused import and let isort/black settle imports.

`pathlib.Path` is unused; ruff will flag it.

```diff
-import argparse
-from openai import OpenAI
-from pathlib import Path
+import argparse
+from openai import OpenAI
```
64-69: Rename unused loop variable to appease ruff (B007).

```diff
- for chunk in stream:
+ for _chunk in stream:
      chunk_count += 1
```
162-164: Avoid f-string without placeholders (F541).

```diff
- print(f"✓ Request completed successfully")
+ print("✓ Request completed successfully")
```

lib/parsers/src/tool_calling/json/deepseek_parser.rs (3)
63-66: Avoid an early return that misses mixed-token matches.

Returning after the first token-pair match can skip valid blocks if models mix token variants across chunks. Aggregate across pairs and dedup.

```diff
- // If we found matches with this token pair, don't try other combinations
- if !blocks.is_empty() {
-     return blocks;
- }
+ // Continue scanning other token pairs; we'll aggregate and dedup below
```

Optionally dedup at the end:

```rust
blocks.sort();
blocks.dedup();
```
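To illustrate why aggregating beats the early return, here is a small Python model of block extraction over multiple token pairs; the pairs and input text are invented for the example, and the real tokens come from the parser config:

```python
def extract_blocks(text, token_pairs):
    """Collect substrings between each (start, end) token pair, across ALL pairs."""
    blocks = []
    for start_tok, end_tok in token_pairs:
        pos = 0
        while True:
            s = text.find(start_tok, pos)
            if s == -1:
                break
            e = text.find(end_tok, s + len(start_tok))
            if e == -1:
                break
            blocks.append(text[s + len(start_tok):e])
            pos = e + len(end_tok)
    # Aggregate across pairs instead of returning after the first match, then dedup.
    seen, unique = set(), []
    for b in blocks:
        if b not in seen:
            seen.add(b)
            unique.append(b)
    return unique

pairs = [("<A>", "</A>"), ("<B>", "</B>")]  # mixed token variants in one response
mixed = "x<A>one</A>y<B>two</B>z"
```

An early return after the first pair would yield only `one`; aggregating also recovers `two`.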
150-166: Pick the earliest start index for normal_text.

The current find_map returns the first configured token found, not the earliest occurrence. Compute the min index across tokens to avoid picking the wrong split point when multiple variants exist.

```diff
- let normal_text = if !wrapper_tokens.is_empty() {
-     wrapper_tokens
-         .iter()
-         .find_map(|token| {
-             trimmed
-                 .find(token.as_str())
-                 .map(|idx| trimmed[..idx].to_string())
-         })
-         .unwrap_or_else(String::new)
+ let normal_text = if !wrapper_tokens.is_empty() {
+     let idx = wrapper_tokens
+         .iter()
+         .filter_map(|t| trimmed.find(t.as_str()))
+         .min();
+     idx.map(|i| trimmed[..i].to_string()).unwrap_or_default()
  } else {
      // Fallback to first individual call token if no wrapper found
-     tool_call_start_tokens
-         .iter()
-         .filter(|token| !token.is_empty())
-         .find_map(|token| trimmed.find(token).map(|idx| trimmed[..idx].to_string()))
-         .unwrap_or_else(String::new)
+     let idx = tool_call_start_tokens
+         .iter()
+         .filter(|token| !token.is_empty())
+         .filter_map(|t| trimmed.find(t))
+         .min();
+     idx.map(|i| trimmed[..i].to_string()).unwrap_or_default()
  };
```
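The earliest-index selection amounts to a min over all matches. A Python analogue, with invented token strings; returning an empty string on no match mirrors `unwrap_or_default`:

```python
def split_normal_text(text, start_tokens):
    """Return the text before the EARLIEST occurrence of any start token."""
    indices = [i for i in (text.find(t) for t in start_tokens if t) if i != -1]
    if not indices:
        return ""
    return text[:min(indices)]

tokens = ["<TOOL_B>", "<TOOL_A>"]  # listed out of positional order on purpose
sample = "hello <TOOL_A>call</TOOL_A> tail <TOOL_B>"
```

A first-token-found scan would split at `<TOOL_B>` (index 33) simply because it is listed first; the min over indices correctly splits at `<TOOL_A>` (index 6).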
215-233: Simplify partial-token detection and reduce allocations.

The current loop builds Strings per char. Use starts_with/ends_with checks.

```diff
- config.tool_call_start_tokens.iter().any(|token| {
-     if token.is_empty() {
-         return false;
-     }
-     // Check if the chunk could be a prefix of this start token
-     // Handle Unicode character boundaries properly
-     for i in 1..=token.chars().count() {
-         if let Some(prefix) = token.chars().take(i).collect::<String>().get(..) {
-             let prefix_str = &prefix[..prefix.len()];
-             if trimmed == prefix_str || trimmed.ends_with(prefix_str) {
-                 return true;
-             }
-         }
-     }
-     false
- })
+ config.tool_call_start_tokens.iter().any(|token| {
+     !token.is_empty() && (token.starts_with(trimmed) || trimmed.ends_with(token))
+ })
```
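The simplified condition can be modeled in Python: a chunk passes if it is a prefix of some start token, or if it already ends with a full token. One trade-off to note: a chunk whose tail is a partial token after other text (e.g. `text<TO`) is not caught by this version either.

```python
def could_be_tool_call_start(trimmed, start_tokens):
    """True if the chunk may begin a tool call: the chunk is a prefix of
    some start token, or the chunk already ends with a full start token."""
    return any(
        tok and (tok.startswith(trimmed) or trimmed.endswith(tok))
        for tok in start_tokens
    )

tokens = ["<TOOL>"]  # illustrative token, not the real DeepSeek marker
```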
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)

- lib/llm/tests/data/sglang/gpt-oss-20b/chat_completion_stream_19c97899-tool.json (1 hunks)
- lib/llm/tests/data/sglang/gpt-oss-20b/chat_completion_stream_675195a8-no-tool.json (1 hunks)
- lib/llm/tests/data/sglang/qwen3-0.6B/chat_completion_stream_c42ba578-tool.json (1 hunks)
- lib/llm/tests/data/vllm/gpt-oss-20b/chat_completion_stream_49f581c1-no-tool.json (11 hunks)
- lib/llm/tests/data/vllm/gpt-oss-20b/chat_completion_stream_f0c86d72-tool.json (1 hunks)
- lib/llm/tests/data/vllm/nemotron-49b/chat_completion_stream_3d40f925-tool.json (1 hunks)
- lib/llm/tests/data/vllm/qwen3-0.6B/chat_completion_stream_5627a4c6-no-tool.json (2 hunks)
- lib/llm/tests/data/vllm/qwen3-0.6B/chat_completion_stream_8f33c28b-tool.json (1 hunks)
- lib/llm/tests/parallel_tool_call_integration.rs (1 hunks)
- lib/llm/tests/scripts/tool_call_stream_request_helper.py (1 hunks)
- lib/llm/tests/test_parsers_e2e.rs (2 hunks)
- lib/parsers/src/tool_calling/config.rs (3 hunks)
- lib/parsers/src/tool_calling/json/deepseek_parser.rs (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)

lib/llm/tests/parallel_tool_call_integration.rs (2)
- lib/parsers/src/tool_calling/parsers.rs (1): detect_and_parse_tool_call (80-104)
- lib/parsers/src/tool_calling/json/deepseek_parser.rs (2): serde_json (95-95), serde_json (107-107)

lib/llm/tests/scripts/tool_call_stream_request_helper.py (1)
- lib/async-openai/src/client.rs (1): chat (96-98)

lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)
- lib/parsers/src/tool_calling/config.rs (1): deepseek_v3_1 (155-189)
🪛 GitHub Actions: Copyright Checks
lib/llm/tests/scripts/tool_call_stream_request_helper.py
[error] 1-1: Copyright header check failed. Missing/invalid header detected in this file.
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3557/merge) by elyasmnvidian.
lib/llm/tests/scripts/tool_call_stream_request_helper.py
[error] 1-1: isort reformatted the file during pre-commit (files were modified by this hook).
[error] 1-1: Black formatting reformatted the file during pre-commit (reformatted lib/llm/tests/scripts/tool_call_stream_request_helper.py).
[error] 1-1: Ruff linting found issues and fixed them during pre-commit (2 errors fixed).
🪛 Ruff (0.14.0)

lib/llm/tests/scripts/tool_call_stream_request_helper.py
- 65-65: Loop control variable `chunk` not used within loop body. Rename unused `chunk` to `_chunk`. (B007)
- 162-162: f-string without any placeholders. Remove extraneous `f` prefix. (F541)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: trtllm (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: sglang
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (12)
lib/llm/tests/data/vllm/nemotron-49b/chat_completion_stream_3d40f925-tool.json (1)
3-18: LGTM! Test data restructuring is consistent.

The restructuring wraps response fields into `expected_output` and introduces `input_stream`, aligning with the broader test fixture standardization across the PR.

lib/llm/tests/data/vllm/qwen3-0.6B/chat_completion_stream_8f33c28b-tool.json (1)
3-18: LGTM! Consistent test data restructuring.

The changes align with the standardized test fixture format across the PR.
lib/llm/tests/data/sglang/qwen3-0.6B/chat_completion_stream_c42ba578-tool.json (1)
1-132: LGTM! Well-structured SGLang test fixture.

The new test fixture follows the standardized format with `expected_output` and `input_stream`, providing comprehensive test coverage for SGLang streaming with tool calls.

lib/llm/tests/data/vllm/qwen3-0.6B/chat_completion_stream_5627a4c6-no-tool.json (1)
4-4: LGTM! Improved emoji representation.

The change from a Unicode escape sequence to the direct emoji character is valid and improves readability.
Also applies to: 167-167
lib/llm/tests/data/vllm/gpt-oss-20b/chat_completion_stream_f0c86d72-tool.json (1)
3-18: LGTM! Consistent test fixture restructuring.

The standardized `expected_output` and `input_stream` structure aligns with the broader test fixture updates.

lib/llm/tests/parallel_tool_call_integration.rs (1)
385-479: LGTM! Comprehensive DeepSeek v3.1 parsing test.

The new test provides thorough coverage of DeepSeek v3.1 tool-call parsing, including:
- Parser invocation and error handling
- Content extraction (remaining vs. tool calls)
- Tool call structure validation
- Detailed argument validation with nested array checks

Error handling with `.expect()` is appropriate for tests, and the assertions are specific and meaningful.

lib/llm/tests/data/sglang/gpt-oss-20b/chat_completion_stream_19c97899-tool.json (1)
1-83: LGTM! Complete SGLang streaming fixture.

The new test fixture is well-structured and follows the standardized format with `expected_output` and `input_stream`, supporting end-to-end testing of SGLang streaming with tool invocation.

lib/parsers/src/tool_calling/config.rs (2)
26-29: LGTM! Well-designed separator token support.

The new `tool_call_separator_tokens` field is:
- Properly documented with clear examples
- Backward compatible (defaults to empty vector)
- Appropriately scoped for models that need it (like DeepSeek v3.1)

Also applies to: 49-49

159-184: LGTM! Comprehensive DeepSeek v3.1 token variants.

The expanded token variants support different tokenizer representations:
- Fullwidth-bracket style: `<｜...｜>`
- ASCII-bracket style: `<|...|>`
- Underscores vs. Unicode separators: `tool_calls_begin` vs. `tool▁calls▁begin`

This ensures robust parsing across tokenizer variations.
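One way to handle such variants uniformly is to normalize them before matching, rather than enumerating every combination. A hedged sketch; the specific characters (fullwidth `｜` U+FF5C and `▁` U+2581) are assumptions about how tokenizers render these markers, and the parser itself takes the enumeration approach instead:

```python
def normalize_token_text(text):
    """Map known tokenizer variant characters onto a canonical form so one
    set of start/end tokens matches any variant."""
    return (
        text.replace("｜", "|")      # fullwidth vertical bar -> ASCII bar
            .replace("\u2581", "_")  # '▁' (lower one-eighth block) -> underscore
    )

variant = "<｜tool▁calls▁begin｜>"
canonical = normalize_token_text(variant)
```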
lib/llm/tests/data/vllm/gpt-oss-20b/chat_completion_stream_49f581c1-no-tool.json (1)
3-7: Fixture shape LGTM (expected_output + input_stream).

Structure matches the loader (expected_output.normal_content/reasoning_content/tool_calls + input_stream). Unicode punctuation looks consistent; the repeated id is correct. No issues from me.
Also applies to: 8-256
lib/llm/tests/data/sglang/gpt-oss-20b/chat_completion_stream_675195a8-no-tool.json (1)
1-16: SGLang fixture looks good.

expected_output + input_stream align with the tests. Extra event/comment fields are harmless. No tool_calls present, as expected.
Also applies to: 8-135
lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)
19-24: Helper visibility differs from summary.

`extract_tool_call_blocks` is private (`fn …`), but the PR summary claims it's public. Confirm the intended visibility and adjust if needed.
This branch contains parser implementation changes:
- Updated deepseek_parser.rs with bug fixes
- Modified tool_calling/config.rs
- Added parallel_tool_call_integration.rs test
- Added tool_call_stream_request_helper.py script
Looks Good
Signed-off-by: athreesh <[email protected]>
Fixes: #3599
Summary
This PR fixes critical issues with the DeepSeek tool call parser for streaming JSON responses and adds comprehensive integration testing infrastructure.
Key Changes
Problem Statement
The DeepSeek tool call parser was failing to correctly parse streaming JSON responses, particularly when handling tool calls with complex arguments. The parser underwent several iterations to handle edge cases in the streaming format, requiring significant refactoring of the JSON parsing logic.
Changes
1. DeepSeek Parser Refactoring (lib/parsers/src/tool_calling/json/deepseek_parser.rs)
2. Configuration Updates (lib/parsers/src/tool_calling/config.rs)
3. Parallel Tool Call Integration Tests (lib/llm/tests/parallel_tool_call_integration.rs)
4. Helper Tooling (lib/llm/tests/scripts/tool_call_stream_request_helper.py)

Testing
The changes have been validated with:
Implementation Details
The parser now correctly:
Summary by CodeRabbit