-
Notifications
You must be signed in to change notification settings - Fork 720
chore: added utility to detect possible tool call start for a chunk #2923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: ayushag <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
lib/parsers/src/tool_calling/json/base_json_parser.rs (2)
309-329: Ignore empty tokens, trim input, and prefer char searchPrevents accidental always-true when a config carries an empty start token and avoids scanning whitespace-only chunks.
pub fn detect_tool_call_start_basic_json( chunk: &str, config: &JsonParserConfig, ) -> anyhow::Result<bool> { - // Case 1: If there is any of the start tokens in the chunk, return true - if config - .tool_call_start_tokens - .iter() - .any(|token| chunk.contains(token)) - { + let trimmed = chunk.trim(); + if trimmed.is_empty() { + return Ok(false); + } + // Case 1: If there is any of the start tokens in the chunk, return true + if config + .tool_call_start_tokens + .iter() + .filter(|t| !t.is_empty()) + .any(|token| trimmed.contains(token)) + { return Ok(true); } // Case 2: If there is any "{" or "[" in the chunk, return true // This case will lead to false positives for those models which does not emit tool call start tokens - if chunk.contains("{") || chunk.contains("[") { + if trimmed.contains('{') || trimmed.contains('[') { return Ok(true); } Ok(false) }
331-446: Nice coverage; consider an explicit empty-chunk testAdd one test asserting empty input returns false to lock in the new trim behavior.
lib/parsers/src/tool_calling/json/deepseek_parser.rs (2)
104-117: Harden detector (trim + ignore empty tokens)Mirror Harmony/Basic JSON behavior to avoid whitespace and empty-token pitfalls.
pub fn detect_tool_call_start_deepseek_v3_1( chunk: &str, config: &JsonParserConfig, ) -> anyhow::Result<bool> { - // if chunk contains tool_call_start_tokens then return true - if config - .tool_call_start_tokens - .iter() - .any(|token| chunk.contains(token)) - { + let trimmed = chunk.trim(); + if trimmed.is_empty() { + return Ok(false); + } + // if chunk contains tool_call_start_tokens then return true + if config + .tool_call_start_tokens + .iter() + .filter(|t| !t.is_empty()) + .any(|token| trimmed.contains(token)) + { return Ok(true); } Ok(false) }
234-272: Add an empty-chunk negative testOne quick test for chunk = "" → false would lock behavior and document the trim.
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
157-173: Nit: ignore empty tokens in the contains checkBring this in line with JSON/DeepSeek detectors.
- if config - .tool_call_start_tokens - .iter() - .any(|token| trimmed.contains(token)) + if config + .tool_call_start_tokens + .iter() + .filter(|t| !t.is_empty()) + .any(|token| trimmed.contains(token)) { return Ok(true); }lib/parsers/src/tool_calling/parsers.rs (2)
89-114: Consider non-fatal behavior for unimplemented formats in detectionFor detection-only flows, returning Ok(false) for Typescript/Xml (instead of an error) can be friendlier in streaming pipelines that probe multiple parsers. If you prefer current erroring semantics, no change needed.
- ToolCallParserType::Typescript => { - anyhow::bail!("Typescript parser not implemented"); - } - ToolCallParserType::Xml => { - anyhow::bail!("Xml parser not implemented"); - } + ToolCallParserType::Typescript => Ok(false), + ToolCallParserType::Xml => Ok(false),Add a small test asserting detect_tool_call_start(..., Some("unknown")) returns an error (current behavior) and that TS/XML return false (if you adopt the change).
1220-1280: Great end-to-end coverageNice spread across parsers. Consider adding:
- default (None) path detection test.
- negative test for unknown parser string.
lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2)
190-199: Tighten heuristic minimally; fix comment typo; prefer char searchThe permissive heuristic is fine for “possible start” detection. Two small tweaks:
- Use a char pattern for contains() (slightly more idiomatic/efficient).
- Fix “atleast” → “at least”.
pub fn detect_tool_call_start_pythonic(chunk: &str) -> anyhow::Result<bool> { // Format Structure: [tool1(arg1=val1, arg2=val2), tool2(arg1=val3)] - // Check if the chunk contains atleast "[" - if !chunk.contains("[") { + // Check if the chunk contains at least '[' + if !chunk.contains('[') { return Ok(false); } Ok(true) }Optional (if you want fewer false positives at near-zero cost), add a tiny lookahead that still defaults to current behavior:
- Ok(true) + if let Some(i) = chunk.find('[') { + if let Some(c) = chunk[i+1..].chars().skip_while(|c| c.is_whitespace()).next() { + if c.is_ascii_alphabetic() || c == '_' { + return Ok(true); + } + } + } + // Maintain permissive behavior for now + Ok(true)
367-399: Good coverage; add two edge-case tests to lock in behaviorNice set of tests, including an explicit false-positive case. Consider adding:
- Empty chunk → false
- Lone "[" (stream split) → true (documents intended permissiveness)
mod detect_parser_tests { use super::*; @@ fn test_detect_tool_call_start_pythonic_false_positive() { // Since we detect just "[" as tool call start token, this will be a false positive let text = r#"Hey [ There is one tool call here . foo(a=1, b=2)"#; let result = detect_tool_call_start_pythonic(text).unwrap(); assert!(result); } + + #[test] + fn test_detect_tool_call_start_pythonic_empty_chunk() { + let text = ""; + let result = detect_tool_call_start_pythonic(text).unwrap(); + assert!(!result); + } + + #[test] + fn test_detect_tool_call_start_pythonic_lone_open_bracket() { + let text = "["; + let result = detect_tool_call_start_pythonic(text).unwrap(); + assert!(result); + } }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs(3 hunks)lib/parsers/src/tool_calling/harmony/mod.rs(1 hunks)lib/parsers/src/tool_calling/json/base_json_parser.rs(1 hunks)lib/parsers/src/tool_calling/json/deepseek_parser.rs(3 hunks)lib/parsers/src/tool_calling/json/mod.rs(2 hunks)lib/parsers/src/tool_calling/parsers.rs(3 hunks)lib/parsers/src/tool_calling/pythonic/mod.rs(1 hunks)lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
lib/parsers/src/tool_calling/pythonic/mod.rs (1)
lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2)
detect_tool_call_start_pythonic(190-198)try_tool_call_parse_pythonic(162-188)
lib/parsers/src/tool_calling/harmony/mod.rs (1)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
detect_tool_call_start_harmony(157-173)parse_tool_calls_harmony(22-155)
lib/parsers/src/tool_calling/parsers.rs (4)
lib/parsers/src/tool_calling/config.rs (2)
harmony(145-154)pythonic(138-143)lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
detect_tool_call_start_harmony(157-173)parse_tool_calls_harmony(22-155)lib/parsers/src/tool_calling/json/mod.rs (2)
detect_tool_call_start_json(38-43)try_tool_call_parse_json(28-36)lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2)
detect_tool_call_start_pythonic(190-198)try_tool_call_parse_pythonic(162-188)
lib/parsers/src/tool_calling/json/base_json_parser.rs (1)
lib/parsers/src/tool_calling/config.rs (2)
default(45-55)default(68-73)
lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)
lib/parsers/src/tool_calling/config.rs (2)
default(45-55)default(68-73)
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)
lib/parsers/src/tool_calling/config.rs (2)
default(45-55)default(68-73)
lib/parsers/src/tool_calling/json/mod.rs (2)
lib/parsers/src/tool_calling/json/base_json_parser.rs (1)
detect_tool_call_start_basic_json(310-329)lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)
detect_tool_call_start_deepseek_v3_1(104-117)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (9)
lib/parsers/src/tool_calling/harmony/mod.rs (1)
7-7: Re-export looks goodPublicly surfacing the detector alongside the parser aligns with the central dispatcher usage.
lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)
46-48: Good refactor to reuse detector in parserEarly-out via detect_tool_call_start_deepseek_v3_1 keeps parsing fast on non-matching chunks.
lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)
32-33: Good: parser now gates on detector resultThis avoids unnecessary Harmony parsing when tokens are absent.
292-315: Detector tests look solidCovers presence/absence and middle-of-text cases.
lib/parsers/src/tool_calling/pythonic/mod.rs (1)
7-7: Re-export is consistent with other modulesMatches Harmony/JSON pattern; enables central dispatcher.
lib/parsers/src/tool_calling/json/mod.rs (2)
8-9: Public exports LGTMExposing both parse and detect keeps API symmetric.
38-43: Dispatcher looks correctSwitch on JsonParserType and forward to the right detector; no changes requested.
lib/parsers/src/tool_calling/parsers.rs (1)
5-8: Imports grouped appropriatelyKeeps the dispatcher readable.
lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (1)
190-199: Keep-> anyhow::Result<bool>ondetect_tool_call_start_pythonic
Alldetect_tool_call_start_*functions (JSON, Harmony, Pythonic) uniformly returnanyhow::Result<bool>, and the centraldetect_tool_call_startdispatcher expects aResult<bool>to propagate errors consistently. No change needed.
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
…2923) Signed-off-by: ayushag <[email protected]> Signed-off-by: Indrajit Bhosale <[email protected]>
…2923) Signed-off-by: ayushag <[email protected]>
…2923) Signed-off-by: ayushag <[email protected]> Signed-off-by: hongkuanz <[email protected]>
Overview:
PR adds a utility method to detect possibility of tool call start from a streaming chunk. Adds extensive test cases.
Usage
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit