chore: added utility to detect possible tool call start for a chunk #2923

ayushag-nv · 2025-09-07T21:43:31Z

Overview:

PR adds a utility method to detect possibility of tool call start from a streaming chunk. Adds extensive test cases.

Usage

parsers::tool_calling::detect_tool_call_start(chunk, parser_str)

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Unified tool-call start detection across parsers with a single entry point and per-parser detectors (JSON, DeepSeek v3.1, Harmony, Pythonic).
- Automatic parser selection with clearer errors for unsupported types.
Refactor
- Parsers now rely on shared detection logic, improving consistency and returning normal content when no tool call is detected.
Tests
- Added unit and end-to-end tests covering multiple models and formats to verify detection accuracy and behavior.

Signed-off-by: ayushag <[email protected]>

copy-pr-bot · 2025-09-07T21:43:35Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-09-07T21:52:29Z

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (9)

lib/parsers/src/tool_calling/json/base_json_parser.rs (2)

309-329: Ignore empty tokens, trim input, and prefer char search

Prevents accidental always-true when a config carries an empty start token and avoids scanning whitespace-only chunks.

 pub fn detect_tool_call_start_basic_json(
     chunk: &str,
     config: &JsonParserConfig,
 ) -> anyhow::Result<bool> {
-    // Case 1: If there is any of the start tokens in the chunk, return true
-    if config
-        .tool_call_start_tokens
-        .iter()
-        .any(|token| chunk.contains(token))
-    {
+    let trimmed = chunk.trim();
+    if trimmed.is_empty() {
+        return Ok(false);
+    }
+    // Case 1: If there is any of the start tokens in the chunk, return true
+    if config
+        .tool_call_start_tokens
+        .iter()
+        .filter(|t| !t.is_empty())
+        .any(|token| trimmed.contains(token))
+    {
         return Ok(true);
     }
 
     // Case 2: If there is any "{" or "[" in the chunk, return true
     // This case will lead to false positives for those models which does not emit tool call start tokens
-    if chunk.contains("{") || chunk.contains("[") {
+    if trimmed.contains('{') || trimmed.contains('[') {
         return Ok(true);
     }
     Ok(false)
 }

331-446: Nice coverage; consider an explicit empty-chunk test

Add one test asserting empty input returns false to lock in the new trim behavior.

lib/parsers/src/tool_calling/json/deepseek_parser.rs (2)

104-117: Harden detector (trim + ignore empty tokens)

Mirror Harmony/Basic JSON behavior to avoid whitespace and empty-token pitfalls.

 pub fn detect_tool_call_start_deepseek_v3_1(
     chunk: &str,
     config: &JsonParserConfig,
 ) -> anyhow::Result<bool> {
-    // if chunk contains tool_call_start_tokens then return true
-    if config
-        .tool_call_start_tokens
-        .iter()
-        .any(|token| chunk.contains(token))
-    {
+    let trimmed = chunk.trim();
+    if trimmed.is_empty() {
+        return Ok(false);
+    }
+    // if chunk contains tool_call_start_tokens then return true
+    if config
+        .tool_call_start_tokens
+        .iter()
+        .filter(|t| !t.is_empty())
+        .any(|token| trimmed.contains(token))
+    {
         return Ok(true);
     }
     Ok(false)
 }

234-272: Add an empty-chunk negative test

One quick test for chunk = "" → false would lock behavior and document the trim.

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)

157-173: Nit: ignore empty tokens in the contains check

Bring this in line with JSON/DeepSeek detectors.

-    if config
-        .tool_call_start_tokens
-        .iter()
-        .any(|token| trimmed.contains(token))
+    if config
+        .tool_call_start_tokens
+        .iter()
+        .filter(|t| !t.is_empty())
+        .any(|token| trimmed.contains(token))
     {
         return Ok(true);
     }

lib/parsers/src/tool_calling/parsers.rs (2)

89-114: Consider non-fatal behavior for unimplemented formats in detection

For detection-only flows, returning Ok(false) for Typescript/Xml (instead of an error) can be friendlier in streaming pipelines that probe multiple parsers. If you prefer current erroring semantics, no change needed.
-            ToolCallParserType::Typescript => {
-                anyhow::bail!("Typescript parser not implemented");
-            }
-            ToolCallParserType::Xml => {
-                anyhow::bail!("Xml parser not implemented");
-            }
+            ToolCallParserType::Typescript => Ok(false),
+            ToolCallParserType::Xml => Ok(false),
Add a small test asserting detect_tool_call_start(..., Some("unknown")) returns an error (current behavior) and that TS/XML return false (if you adopt the change).

1220-1280: Great end-to-end coverage

Nice spread across parsers. Consider adding:

default (None) path detection test.

negative test for unknown parser string.

lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2)

190-199: Tighten heuristic minimally; fix comment typo; prefer char search

The permissive heuristic is fine for “possible start” detection. Two small tweaks:

Use a char pattern for contains() (slightly more idiomatic/efficient).
Fix “atleast” → “at least”.

 pub fn detect_tool_call_start_pythonic(chunk: &str) -> anyhow::Result<bool> {
   // Format Structure: [tool1(arg1=val1, arg2=val2), tool2(arg1=val3)]
 
-  // Check if the chunk contains atleast "["
-  if !chunk.contains("[") {
+  // Check if the chunk contains at least '['
+  if !chunk.contains('[') {
       return Ok(false);
   }
   Ok(true)
 }

Optional (if you want fewer false positives at near-zero cost), add a tiny lookahead that still defaults to current behavior:

-  Ok(true)
+  if let Some(i) = chunk.find('[') {
+      if let Some(c) = chunk[i+1..].chars().skip_while(|c| c.is_whitespace()).next() {
+          if c.is_ascii_alphabetic() || c == '_' {
+              return Ok(true);
+          }
+      }
+  }
+  // Maintain permissive behavior for now
+  Ok(true)

367-399: Good coverage; add two edge-case tests to lock in behavior

Nice set of tests, including an explicit false-positive case. Consider adding:

Empty chunk → false
Lone "[" (stream split) → true (documents intended permissiveness)

 mod detect_parser_tests {
     use super::*;
 
@@
     fn test_detect_tool_call_start_pythonic_false_positive() {
         // Since we detect just "[" as tool call start token, this will be a false positive
         let text = r#"Hey [ There is one tool call here . foo(a=1, b=2)"#;
         let result = detect_tool_call_start_pythonic(text).unwrap();
         assert!(result);
     }
+
+    #[test]
+    fn test_detect_tool_call_start_pythonic_empty_chunk() {
+        let text = "";
+        let result = detect_tool_call_start_pythonic(text).unwrap();
+        assert!(!result);
+    }
+
+    #[test]
+    fn test_detect_tool_call_start_pythonic_lone_open_bracket() {
+        let text = "[";
+        let result = detect_tool_call_start_pythonic(text).unwrap();
+        assert!(result);
+    }
 }

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1477f6e and 1b2b891.

📒 Files selected for processing (8)

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (3 hunks)
lib/parsers/src/tool_calling/harmony/mod.rs (1 hunks)
lib/parsers/src/tool_calling/json/base_json_parser.rs (1 hunks)
lib/parsers/src/tool_calling/json/deepseek_parser.rs (3 hunks)
lib/parsers/src/tool_calling/json/mod.rs (2 hunks)
lib/parsers/src/tool_calling/parsers.rs (3 hunks)
lib/parsers/src/tool_calling/pythonic/mod.rs (1 hunks)
lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (7)

lib/parsers/src/tool_calling/pythonic/mod.rs (1)

lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2)

detect_tool_call_start_pythonic (190-198)

try_tool_call_parse_pythonic (162-188)

lib/parsers/src/tool_calling/harmony/mod.rs (1)

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)

detect_tool_call_start_harmony (157-173)

parse_tool_calls_harmony (22-155)

lib/parsers/src/tool_calling/parsers.rs (4)

lib/parsers/src/tool_calling/config.rs (2)

harmony (145-154)

pythonic (138-143)

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)

detect_tool_call_start_harmony (157-173)

parse_tool_calls_harmony (22-155)

lib/parsers/src/tool_calling/json/mod.rs (2)

detect_tool_call_start_json (38-43)

try_tool_call_parse_json (28-36)

lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (2)

detect_tool_call_start_pythonic (190-198)

try_tool_call_parse_pythonic (162-188)

lib/parsers/src/tool_calling/json/base_json_parser.rs (1)

lib/parsers/src/tool_calling/config.rs (2)

default (45-55)

default (68-73)

lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)

lib/parsers/src/tool_calling/config.rs (2)

default (45-55)

default (68-73)

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (1)

lib/parsers/src/tool_calling/config.rs (2)

default (45-55)

default (68-73)

lib/parsers/src/tool_calling/json/mod.rs (2)

lib/parsers/src/tool_calling/json/base_json_parser.rs (1)

detect_tool_call_start_basic_json (310-329)

lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)

detect_tool_call_start_deepseek_v3_1 (104-117)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Build and Test - dynamo
GitHub Check: pre-merge-rust (lib/runtime/examples)
GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: pre-merge-rust (.)

🔇 Additional comments (9)

lib/parsers/src/tool_calling/harmony/mod.rs (1)

7-7: Re-export looks good

Publicly surfacing the detector alongside the parser aligns with the central dispatcher usage.

lib/parsers/src/tool_calling/json/deepseek_parser.rs (1)

46-48: Good refactor to reuse detector in parser

Early-out via detect_tool_call_start_deepseek_v3_1 keeps parsing fast on non-matching chunks.

lib/parsers/src/tool_calling/harmony/harmony_parser.rs (2)

32-33: Good: parser now gates on detector result

This avoids unnecessary Harmony parsing when tokens are absent.

292-315: Detector tests look solid

Covers presence/absence and middle-of-text cases.

lib/parsers/src/tool_calling/pythonic/mod.rs (1)

7-7: Re-export is consistent with other modules

Matches Harmony/JSON pattern; enables central dispatcher.

lib/parsers/src/tool_calling/json/mod.rs (2)

8-9: Public exports LGTM

Exposing both parse and detect keeps API symmetric.

38-43: Dispatcher looks correct

Switch on JsonParserType and forward to the right detector; no changes requested.

lib/parsers/src/tool_calling/parsers.rs (1)

5-8: Imports grouped appropriately

Keeps the dispatcher readable.

lib/parsers/src/tool_calling/pythonic/pythonic_parser.rs (1)

190-199: Keep -> anyhow::Result<bool> on detect_tool_call_start_pythonic
All detect_tool_call_start_* functions (JSON, Harmony, Pythonic) uniformly return anyhow::Result<bool>, and the central detect_tool_call_start dispatcher expects a Result<bool> to propagate errors consistently. No change needed.

Signed-off-by: ayushag <[email protected]>

lib/parsers/src/tool_calling/harmony/harmony_parser.rs

Signed-off-by: ayushag <[email protected]>

…2923) Signed-off-by: ayushag <[email protected]> Signed-off-by: Indrajit Bhosale <[email protected]>

…2923) Signed-off-by: ayushag <[email protected]>

…2923) Signed-off-by: ayushag <[email protected]> Signed-off-by: hongkuanz <[email protected]>

chore: added utility to detect possible tool call start for a chunk

1b2b891

Signed-off-by: ayushag <[email protected]>

ayushag-nv self-assigned this Sep 7, 2025

ayushag-nv requested a review from a team as a code owner September 7, 2025 21:43

pull-request-size bot added the size/L label Sep 7, 2025

github-actions bot added the chore label Sep 7, 2025

ayushag-nv requested review from elyasmnvidian and grahamking September 7, 2025 21:44

coderabbitai bot reviewed Sep 7, 2025

View reviewed changes

fix: add trim logic for chunks

8e53371

Signed-off-by: ayushag <[email protected]>

grahamking reviewed Sep 8, 2025

View reviewed changes

lib/parsers/src/tool_calling/harmony/harmony_parser.rs Show resolved Hide resolved

grahamking approved these changes Sep 8, 2025

View reviewed changes

elyasmnvidian mentioned this pull request Sep 8, 2025

feat: Tool calling support stream=True #2932

Merged

chore: optimizations

8cd5b8d

Signed-off-by: ayushag <[email protected]>

ayushag-nv enabled auto-merge (squash) September 8, 2025 19:21

ayushag-nv added 3 commits September 8, 2025 13:02

Merge branch 'main' into ayushag/streaming-tool-call-util

a6e12db

Merge branch 'main' into ayushag/streaming-tool-call-util

5883b54

Merge branch 'main' into ayushag/streaming-tool-call-util

633e548

ayushag-nv merged commit b929e13 into main Sep 8, 2025
11 checks passed

ayushag-nv deleted the ayushag/streaming-tool-call-util branch September 8, 2025 23:01

indrajit96 pushed a commit that referenced this pull request Sep 9, 2025

chore: added utility to detect possible tool call start for a chunk (#…

31228c5

…2923) Signed-off-by: ayushag <[email protected]> Signed-off-by: Indrajit Bhosale <[email protected]>

indrajit96 pushed a commit that referenced this pull request Sep 9, 2025

chore: added utility to detect possible tool call start for a chunk (#…

98dacfb

…2923) Signed-off-by: ayushag <[email protected]>

tedzhouhk pushed a commit that referenced this pull request Sep 10, 2025

chore: added utility to detect possible tool call start for a chunk (#…

8b1b24c

…2923) Signed-off-by: ayushag <[email protected]> Signed-off-by: hongkuanz <[email protected]>

This was referenced Sep 16, 2025

feat: JailedStream #3034

Merged

fix: harmony parser streaming fix #3074

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore: added utility to detect possible tool call start for a chunk #2923

chore: added utility to detect possible tool call start for a chunk #2923

Uh oh!

ayushag-nv commented Sep 7, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

copy-pr-bot bot commented Sep 7, 2025

Uh oh!

coderabbitai bot commented Sep 7, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

chore: added utility to detect possible tool call start for a chunk #2923

chore: added utility to detect possible tool call start for a chunk #2923

Uh oh!

Conversation

ayushag-nv commented Sep 7, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Sep 7, 2025

Uh oh!

coderabbitai bot commented Sep 7, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ayushag-nv commented Sep 7, 2025 •

edited by coderabbitai bot

Loading