Skip to content

feat: add tool_choice support#4722

Merged
grahamking merged 20 commits intoai-dynamo:mainfrom
vladnosiv:tool-choice-support
Dec 9, 2025
Merged

feat: add tool_choice support#4722
grahamking merged 20 commits intoai-dynamo:mainfrom
vladnosiv:tool-choice-support

Conversation

@vladnosiv
Copy link
Copy Markdown
Contributor

@vladnosiv vladnosiv commented Dec 3, 2025

Overview:

Implements OpenAI tool_choice parameter support for Chat Completions API.

Details:

Jail Stream Enhancement:

  • Added JailMode enum with MarkerBased (traditional) and Immediate (tool_choice) variants

JSON Schema Generation:

  • New module for deriving JSON schemas from tool_choice + tools
  • Integrated into CommonExtProvider::get_guided_json() for guided decoding
  • Supports none, auto, required, and named tool choice modes

Request Preprocessing:

  • Routes requests to appropriate JailMode based on tool_choice option

Finish Reasons Handling:

  • fix_finish_reason() post-processor adjusts finish reasons after jail stream

Where should the reviewer start?

lib/llm/src/protocols/openai/chat_completions/jail.rs - JailMode::Immediate implementation
lib/llm/src/protocols/openai/tools.rs - JSON schema generation for guided decoding

Future Work

Per-call streaming for required mode to emit individual tool calls as they complete, rather than waiting for the entire array.

Related Issues:

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for Named and Required tool choice modes, enabling more explicit control over which tools the model uses.
    • Implemented automatic JSON schema derivation from tool definitions for guided decoding.
    • Enhanced tool calling configuration with new builder methods for specifying tool choice requirements.
  • Improvements

    • Improved finish reason handling for tool-based completion scenarios.
    • Refined tool jailing logic to support multiple activation modes.

✏️ Tip: You can customize this high-level summary in your review settings.

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Dec 3, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Dec 3, 2025

👋 Hi vladnosiv! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution Pull request is from an external contributor label Dec 3, 2025
@grahamking
Copy link
Copy Markdown
Contributor

@ayushag-nv Could you to an early review of the approach here?

@vladnosiv
Copy link
Copy Markdown
Contributor Author

I noticed here that the usual FC in dynamo are not streamed like OpenAI, but wait for the full generation of the entire FC

Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@vladnosiv vladnosiv force-pushed the tool-choice-support branch from b732106 to cdddc24 Compare December 3, 2025 15:57
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@vladnosiv
Copy link
Copy Markdown
Contributor Author

To be honest, I expected that fewer changes would be required.
PR can be split into several to simplify the review, if the approach itself is considered viable.

@ayushag-nv
Copy link
Copy Markdown
Contributor

will review today !

@ayushag-nv ayushag-nv self-requested a review December 4, 2025 16:42
@ayushag-nv
Copy link
Copy Markdown
Contributor

Implements OpenAI tool_choice parameter support for Chat/Completions API with incremental streaming. Enables function calling with proper tool_calls deltas, parallel function calling, and OpenAI API compatibility.

@vladnosiv Yes thats the jail streaming part that allows us to use non-streaming parsers. The normal content gets streamed as it is but it blocks the stream as soon as it detects potential start of tool calls. Then tries to find the complete tool call and then stream it.

@vladnosiv
Copy link
Copy Markdown
Contributor Author

@vladnosiv Yes thats the jail streaming part that allows us to use non-streaming parsers. The normal content gets streamed as it is but it blocks the stream as soon as it detects potential start of tool calls. Then tries to find the complete tool call and then stream it.

Okay, I really overlooked that the current FC implementation does not involve honest FC streaming. In the current version of this PR, in the case of tool_choice != none/auto I added honest streaming.

If dynamo does not plan to support FC streaming, then it makes sense to remove this part from here. Otherwise, it would be possible to generalize the approach with partial Json parsing for streaming FC-deltas in the jail streams component.

In addition, I realized here that perhaps the case of FC through the JSON schema can be tried to squeeze into the code path with a jail streaming. It may turn out to be a little more hacky than with honest tags, especially in the case of a parallel call in a required scenario.

@ayushag-nv
Copy link
Copy Markdown
Contributor

@vladnosiv Yes thats the jail streaming part that allows us to use non-streaming parsers. The normal content gets streamed as it is but it blocks the stream as soon as it detects potential start of tool calls. Then tries to find the complete tool call and then stream it.

Okay, I really overlooked that the current FC implementation does not involve honest FC streaming. In the current version of this PR, in the case of tool_choice != none/auto I added honest streaming.

If dynamo does not plan to support FC streaming, then it makes sense to remove this part from here. Otherwise, it would be possible to generalize the approach with partial Json parsing for streaming FC-deltas in the jail streams component.

In addition, I realized here that perhaps the case of FC through the JSON schema can be tried to squeeze into the code path with a jail streaming. It may turn out to be a little more hacky than with honest tags, especially in the case of a parallel call in a required scenario.

yeah right now no plans and requirements also to support honest FC streaming. So, you can update your PR accordingly.

Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@vladnosiv
Copy link
Copy Markdown
Contributor Author

Hi @ayushag-nv !
I fixed the code, now the code follows the path of the jail stream. The only significant change in this part is that JailMode = Immediate has been added, which does not wait for the start tokens (and does not delete them for parsing). It also helps to know the function name when using named tool_choice, because LLM generates only arguments in this scenario.

Honest FC streaming has been removed. In addition, to simplify the current PR in required mode, with parallel FC in streaming, all chunks with FC will return at the end. This simplification can be adjusted for per-call streaming in another PR.

@vladnosiv vladnosiv marked this pull request as ready for review December 5, 2025 19:40
@vladnosiv vladnosiv requested a review from a team as a code owner December 5, 2025 19:40
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Dec 5, 2025

Walkthrough

This PR implements comprehensive tool_choice support for the OpenAI-compatible API. It extends the jail-based tool call interception mechanism with configurable modes (Named, Required, MarkerBased), converts tool_choice configurations into JSON schemas for guided decoding, and updates multiple public function signatures to accept optional parameters for tool parsing and tool choice handling.

Changes

Cohort / File(s) Summary
Preprocessor core logic
lib/llm/src/preprocessor.rs
Expanded jail decision logic to trigger immediate jail for Named or Required tool_choice modes even without a parser. Updated should_apply_tool_jail and apply_tool_calling_jail signatures to accept Option<String> for parser and Option<ChatCompletionToolChoiceOption> for tool choice. Call sites now always invoke jail application with both optional parser and tool choice.
Protocol-level API surface
lib/llm/src/protocols/openai.rs, lib/llm/src/protocols/openai/common_ext.rs
Added public tools module export. Changed CommonExtProvider::get_guided_json() return type from Option<&serde_json::Value> to Option<serde_json::Value> across trait and implementations.
Tool schema derivation
lib/llm/src/protocols/openai/tools.rs
New module providing get_json_schema_from_tools() to derive JSON schemas from tool_choice and tools definitions. Introduced public ToolChoiceError enum with variants for missing tools, conflicts, and empty toolsets. Includes helpers for finding tools, cloning parameters, and merging schema definitions.
Chat completions implementation
lib/llm/src/protocols/openai/chat_completions.rs
Updated get_guided_json() to return owned Option<serde_json::Value>. Implemented fallback logic to derive JSON schema from tool_choice and tools via tools::get_json_schema_from_tools() when no explicit guided JSON is present.
Completions implementation
lib/llm/src/protocols/openai/completions.rs
Updated get_guided_json() to return owned Option<serde_json::Value>, replacing as_ref() with clone() for parameter access.
Jail mechanism enhancements
lib/llm/src/protocols/openai/chat_completions/jail.rs
Introduced public JailMode enum with MarkerBased and Immediate { format: ToolChoiceFormat } variants. Added public ToolChoiceFormat enum with SingleObject { tool_name } and ArrayOfTools variants. Extended ChoiceJailState and ChoiceJailStateCollection to accept starts_jailed: bool. Enhanced JailedStream and JailedStreamBuilder with jail_mode field and builder methods tool_choice_named() and tool_choice_required(). Updated finish reason handling and stream processing logic to honor new jail modes, including JSON parsing for Immediate mode tool choices.
Test infrastructure updates
lib/llm/tests/test_common_ext.rs, lib/llm/tests/test_reasoning_parser.rs, lib/llm/tests/test_streaming_tool_parsers.rs
Updated test expectations and call sites to use new apply_tool_calling_jail signature accepting Option<String> for tool parser and Option<ChatCompletionToolChoiceOption> for tool choice. Modified assertions to reflect owned JSON value returns.
New test modules
lib/llm/tests/tool_choice.rs, lib/llm/tests/tool_choice_finish_reasons.rs
Added comprehensive test suites for tool choice handling, including parsing Named and Required tool choices from JSON, handling parse failures, verifying finish reasons (Stop, ToolCalls, Length, ContentFilter) across different tool choice modes, and validating tool call details.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Jail mechanism redesign: The jail.rs file contains substantial logic density with new enum variants, builder patterns, state initialization changes, and dual-mode handling (MarkerBased vs Immediate) affecting multiple code paths.
  • Schema derivation logic: The new tools.rs module implements non-trivial JSON schema merging and validation logic with error handling for conflicting definitions and missing tools.
  • API signature propagation: Multiple public function signatures have changed across preprocessor.rs, common_ext.rs, and chat_completions.rs, requiring careful tracking of how optional parameters flow through the call chain.
  • Test coverage expansion: New test modules add significant logic that exercises tool choice modes, JSON parsing, and finish reason transformations, warranting close verification.

Areas requiring extra attention:

  • Jail mode initialization and state management in ChoiceJailState::new() and ChoiceJailStateCollection::get_or_create_state()
  • JSON schema derivation logic in tools.rs, particularly the $defs merging and conflict detection
  • Tool choice JSON parsing in parse_tool_choice_json() and its interaction with Immediate mode handling
  • Finish reason transformation logic in fix_finish_reason() when switching between modes

Poem

🐰 With tool_choice now in full bloom,
Named, Required—no more gloom!
Schemas dance through JSON streams,
Jailed and parsed—fulfilling dreams!
The rabbit hops with glee so grand,
Tool calling works across the land! 🎉

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 72.88% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely identifies the main feature being added: tool_choice support, which matches the primary objective of the pull request.
Description check ✅ Passed The description follows the template structure with Overview, Details, and Where to Start sections. It clearly explains the implementation approach, key components added, and future work planned.
Linked Issues check ✅ Passed The PR successfully implements all key requirements from #4721: support for tool_choice modes (none, auto, required, named), JSON schema generation from tool_choice+tools, guided decoding integration, and partial JSON response parsing via jail stream.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the tool_choice feature scope. The PR modifies jail streams, adds JSON schema generation, updates preprocessor logic, and adjusts finish reasons—all supporting the linked issue #4721 requirements without introducing unrelated functionality.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
lib/llm/src/protocols/openai/tools.rs (2)

58-63: Consider whether default schema needs additionalProperties: false.

The fallback schema for tools without parameters allows any additional properties by default. If strict guided decoding is intended, consider adding "additionalProperties": false:

 fn clone_parameters(function: &FunctionObject) -> Value {
     function
         .parameters
         .clone()
-        .unwrap_or_else(|| json!({"type": "object", "properties": {}}))
+        .unwrap_or_else(|| json!({"type": "object", "properties": {}, "additionalProperties": false}))
 }

However, this may be intentional to allow flexibility when no schema is defined.


201-262: Good test coverage for core scenarios.

Tests cover the main happy paths and key error conditions. Consider adding tests for edge cases like tool_choice=None returning Ok(None) and tool_choice=required with tools=None returning MissingTools error for completeness.

lib/llm/src/protocols/openai/chat_completions/jail.rs (1)

889-897: Tool call IDs are non-unique across requests.

The hardcoded "call-1" and format!("call-{}", idx + 1) IDs differ from the marker-based path which uses parser-generated IDs. Consider using UUIDs for consistency with typical OpenAI responses:

-                    Ok(vec![ChatCompletionMessageToolCallChunk {
-                        index: 0,
-                        id: Some("call-1".to_string()),
+                    Ok(vec![ChatCompletionMessageToolCallChunk {
+                        index: 0,
+                        id: Some(format!("call_{}", uuid::Uuid::new_v4())),

This ensures tool call IDs are globally unique, which some clients may rely on for tracking.

lib/llm/tests/tool_choice_finish_reasons.rs (3)

54-87: Consider adding error context to unwraps for easier test debugging.

The chained unwraps at line 86 could make it hard to diagnose test failures. Consider using expect() with descriptive messages.

-    output_stream.next().await.unwrap().data.unwrap()
+    output_stream
+        .next()
+        .await
+        .expect("jail stream should emit at least one item")
+        .data
+        .expect("jail output should contain data")

198-225: Missing jail-based test for named tool choice with Stop finish reason.

This test verifies the generator behavior (Stop remains Stop before jail), but doesn't verify the jail behavior. Test 6 shows that Required+Stop becomes ToolCalls through jail. There should be a corresponding async test verifying that Named+Stop remains Stop (or becomes ToolCalls) through jail.

Consider adding an async version that applies jail transformation:

#[tokio::test]
async fn test_named_tool_choice_normal_stop_becomes_stop_through_jail() {
    let mut request = create_test_request();
    let tool_choice = Some(ChatCompletionToolChoiceOption::Named(
        ChatCompletionNamedToolChoice {
            r#type: ChatCompletionToolType::Function,
            function: FunctionName {
                name: "get_weather".to_string(),
            },
        },
    ));
    request.inner.tool_choice = tool_choice.clone();

    let mut generator = request.response_generator("req-stop-jail".to_string());
    let backend_output = build_backend_output_with_finish(
        r#"{"location":"Paris","unit":"celsius"}"#,
        common::FinishReason::Stop,
    );

    let raw_response = generator
        .choice_from_postprocessor(backend_output)
        .expect("choice generation");

    let response = apply_jail_transformation(raw_response, tool_choice).await;

    // Verify jail behavior for named tool choice with Stop
    assert_eq!(
        response.choices[0].finish_reason,
        Some(dynamo_async_openai::types::FinishReason::Stop), // or ToolCalls?
    );
}

122-196: Consider adding jail-based tests for ContentFilter preservation.

Tests 3 and 4 verify ContentFilter preservation at the generator level, but don't verify jail behavior. If the jail has different handling for ContentFilter vs Length, these cases would be uncovered. Consider adding async versions similar to test 1.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8c4a5f and bb95bb1.

📒 Files selected for processing (12)
  • lib/llm/src/preprocessor.rs (3 hunks)
  • lib/llm/src/protocols/openai.rs (2 hunks)
  • lib/llm/src/protocols/openai/chat_completions.rs (2 hunks)
  • lib/llm/src/protocols/openai/chat_completions/jail.rs (17 hunks)
  • lib/llm/src/protocols/openai/common_ext.rs (1 hunks)
  • lib/llm/src/protocols/openai/completions.rs (1 hunks)
  • lib/llm/src/protocols/openai/tools.rs (1 hunks)
  • lib/llm/tests/test_common_ext.rs (1 hunks)
  • lib/llm/tests/test_reasoning_parser.rs (2 hunks)
  • lib/llm/tests/test_streaming_tool_parsers.rs (1 hunks)
  • lib/llm/tests/tool_choice.rs (1 hunks)
  • lib/llm/tests/tool_choice_finish_reasons.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/protocols/openai/chat_completions/aggregator.rs:66-86
Timestamp: 2025-09-10T05:04:58.417Z
Learning: In the dynamo codebase, tool call chunks from streaming responses always contain complete tool calls (one chunk = one tool call), unlike standard OpenAI streaming where tool calls can be fragmented across multiple chunks. The convert_tool_chunk_to_message_tool_call function correctly assumes complete tool call data in each chunk.
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.

Applied to files:

  • lib/llm/tests/tool_choice.rs
  • lib/llm/src/protocols/openai/completions.rs
  • lib/llm/src/protocols/openai/chat_completions.rs
  • lib/llm/tests/tool_choice_finish_reasons.rs
  • lib/llm/src/protocols/openai/tools.rs
  • lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.

Applied to files:

  • lib/llm/tests/tool_choice.rs
  • lib/llm/tests/test_streaming_tool_parsers.rs
  • lib/llm/src/protocols/openai/tools.rs
  • lib/llm/tests/test_reasoning_parser.rs
  • lib/llm/src/protocols/openai/chat_completions/jail.rs
  • lib/llm/src/preprocessor.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.

Applied to files:

  • lib/llm/tests/tool_choice.rs
  • lib/llm/src/protocols/openai/completions.rs
  • lib/llm/src/protocols/openai/chat_completions.rs
  • lib/llm/src/protocols/openai/tools.rs
  • lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-09-10T05:04:58.417Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/protocols/openai/chat_completions/aggregator.rs:66-86
Timestamp: 2025-09-10T05:04:58.417Z
Learning: In the dynamo codebase, tool call chunks from streaming responses always contain complete tool calls (one chunk = one tool call), unlike standard OpenAI streaming where tool calls can be fragmented across multiple chunks. The convert_tool_chunk_to_message_tool_call function correctly assumes complete tool call data in each chunk.

Applied to files:

  • lib/llm/tests/tool_choice.rs
  • lib/llm/src/protocols/openai/chat_completions/jail.rs
  • lib/llm/src/preprocessor.rs
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

  • lib/llm/tests/tool_choice.rs
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

  • lib/llm/src/protocols/openai/common_ext.rs
  • lib/llm/src/protocols/openai/chat_completions.rs
  • lib/llm/tests/test_common_ext.rs
  • lib/llm/src/protocols/openai/tools.rs
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
Repo: ai-dynamo/dynamo PR: 2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.

Applied to files:

  • lib/llm/tests/test_streaming_tool_parsers.rs
  • lib/llm/tests/test_reasoning_parser.rs
  • lib/llm/src/protocols/openai/chat_completions/jail.rs
  • lib/llm/src/preprocessor.rs
📚 Learning: 2025-08-22T20:10:09.345Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/parsers/src/reasoning/gpt_oss_parser.rs:31-37
Timestamp: 2025-08-22T20:10:09.345Z
Learning: StreamableParser from openai-harmony does not implement the Debug trait, which prevents storing it as a field in structs that derive Debug in lib/parsers/src/reasoning/gpt_oss_parser.rs.

Applied to files:

  • lib/llm/tests/test_streaming_tool_parsers.rs
  • lib/llm/tests/test_reasoning_parser.rs
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.

Applied to files:

  • lib/llm/tests/test_reasoning_parser.rs
🧬 Code graph analysis (4)
lib/llm/tests/tool_choice.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • new (103-134)
lib/llm/src/protocols/openai/chat_completions/jail.rs (5)
  • default (1192-1194)
  • builder (471-473)
  • new (126-135)
  • new (425-427)
  • new (1026-1034)
lib/llm/src/protocols/openai/common_ext.rs (2)
lib/llm/src/protocols/openai/chat_completions.rs (1)
  • get_guided_json (162-180)
lib/llm/src/protocols/openai/completions.rs (1)
  • get_guided_json (186-188)
lib/llm/tests/tool_choice_finish_reasons.rs (1)
lib/llm/src/protocols/openai/chat_completions/jail.rs (2)
  • default (1192-1194)
  • builder (471-473)
lib/llm/src/preprocessor.rs (3)
lib/llm/src/protocols/openai/chat_completions/jail.rs (2)
  • tool_call_parser (1069-1072)
  • builder (471-473)
lib/bindings/python/rust/llm/local_model.rs (1)
  • tool_call_parser (97-99)
lib/llm/src/preprocessor/prompt/template/oai.rs (1)
  • tool_choice (188-194)
🔇 Additional comments (30)
lib/llm/src/protocols/openai/common_ext.rs (1)

97-97: Trait signature change to owned Value is appropriate.

Returning an owned serde_json::Value enables implementations to construct new schemas dynamically (e.g., deriving from tool_choice + tools), which wouldn't be possible with a borrowed reference. The relevant implementations in chat_completions.rs and completions.rs have been updated accordingly.

lib/llm/src/protocols/openai/tools.rs (1)

65-106: Well-structured schema generation for required tool_choice.

The anyOf approach correctly allows the model to select any tool while enforcing the schema for each tool's parameters. The $defs merging with conflict detection is a good safeguard.

lib/llm/src/protocols/openai/chat_completions/jail.rs (4)

767-792: Immediate mode jail termination logic is correct.

The JSON parsing approach correctly waits for complete, valid JSON before ending the jail. The non-empty array check for ArrayOfTools aligns with the minItems: 1 constraint in the generated schema.


987-1000: Finish reason handling aligns with OpenAI semantics.

Named tool choice keeping Stop and required tool choice changing to ToolCalls matches OpenAI's behavior where forced tool invocation via named choice is considered a normal completion, while required mode indicates explicit tool call handling.


543-562: Emission filtering for jailed choices prevents aggregator issues.

The logic correctly filters out empty deltas for choices that were jailed, avoiding issues where the aggregator receives deltas without roles after unjailing. Based on learnings, this aligns with the jail stream's design.


1092-1106: Builder methods for immediate jail modes are clean.

The tool_choice_named and tool_choice_required methods provide a clear API for configuring immediate jail modes. Note that calling these will override any previously set jail_mode.

lib/llm/src/protocols/openai.rs (2)

20-20: New tools module correctly exported.

The public export enables access to get_json_schema_from_tools and ToolChoiceError for external schema derivation needs.


134-141: Guided JSON now passed as owned value.

The change from .cloned() to direct pass-through is correct since get_guided_json() now returns an owned Option<serde_json::Value>. This simplifies the code and avoids an unnecessary clone.

lib/llm/src/protocols/openai/completions.rs (1)

186-188: Implementation correctly updated for owned return type.

The .clone() is necessary since the field is stored as Option<serde_json::Value> and the trait now requires returning an owned value.

lib/llm/tests/test_common_ext.rs (1)

93-96: Test expectation correctly updated for owned return type.

The assertion now expects Some(serde_json::json!({"key": "value"})) (owned) instead of a borrowed reference, aligning with the updated get_guided_json() signature that returns Option<serde_json::Value>.

lib/llm/tests/test_reasoning_parser.rs (2)

486-490: Call site correctly updated for new API signature.

The apply_tool_calling_jail call now passes the tool parser as Some(...) and None for tool_choice, matching the updated function signature. This is appropriate since this test focuses on reasoning/tool parsing without tool_choice configuration.


599-603: Consistent API adaptation for the ignored test.

Same signature update applied correctly. The None for tool_choice maintains the existing test behavior.

lib/llm/src/protocols/openai/chat_completions.rs (1)

162-180: Well-structured guided JSON derivation with tool_choice fallback.

The implementation correctly:

  1. Prioritizes explicit guided_json from the request (cloned for ownership)
  2. Falls back to deriving JSON schema from tool_choice + tools when available
  3. Gracefully handles derivation failures with a warning log and None return

The ownership change from borrowed to owned is necessary to support the new derivation path that constructs values rather than referencing existing fields.

lib/llm/tests/test_streaming_tool_parsers.rs (1)

159-164: Call site correctly adapted with clear documentation.

The update wraps tool_parser in Some(...) and passes None for tool_choice, matching the new API. The inline comment clarifies intent for future readers.

lib/llm/tests/tool_choice.rs (8)

1-35: Clean test setup with minimal request construction.

The imports and create_test_request helper provide a solid foundation for the test suite. The request construction uses sensible defaults with only required fields populated.


37-70: Single-response jail transformation helper is well-structured.

The helper correctly uses the builder pattern and handles both Named and Required tool choice variants. The unwrap() on line 69 is acceptable in test code.


72-110: Streaming variant correctly collects all transformed responses.

The implementation mirrors the single-response variant but collects all outputs via filter_map and collect, which is appropriate for streaming tests.


173-234: Thorough coverage of parallel tool calls with required tool choice.

The test properly validates:

  • Correct FinishReason::ToolCalls
  • Two tool calls with sequential indices (0, 1)
  • Correct IDs (call-1, call-2)
  • Proper function names and arguments extraction

The JSON array parsing for parallel tool calls is well-tested.


236-255: Good coverage of parse failure fallback behavior.

Tests the important edge case where JSON parsing fails, ensuring the raw content is returned rather than being silently dropped. The comment on lines 251-252 clarifies this matches marker-based FC behavior.


257-325: Streaming buffering test validates correct accumulation.

Tests that partial JSON chunks are buffered until complete, then emitted as a single tool call. The chunk boundary ({"location":" / Paris","unit":" / celsius"}) is a realistic test case.


327-406: Parallel tool calls in streaming mode well-tested.

Validates that the jail stream correctly buffers and parses a JSON array split across chunks, producing multiple tool calls in a single response.


408-436: Synchronous test for no-tool-choice baseline behavior.

Good to have this baseline test confirming normal text output flows through without tool call interpretation when tool_choice is not set.

lib/llm/tests/tool_choice_finish_reasons.rs (5)

1-15: LGTM on imports and license header.

Imports are well-organized and include all necessary types for testing the jail transformation and finish_reason handling.


16-37: LGTM on test request helper.

Clean minimal setup for test requests.


39-52: LGTM on backend output helper.

Correctly constructs minimal backend output for finish_reason testing.


89-120: LGTM on named tool choice length preservation test.

Correctly verifies that the jail transformation preserves Length finish reason for named tool choice.


227-250: LGTM on required tool choice stop-to-tool_calls transformation test.

Correctly verifies that the jail transforms Stop to ToolCalls for required tool choice.

lib/llm/src/preprocessor.rs (3)

754-757: LGTM on enabling jail for Named/Required without parser.

This correctly enables immediate jail mode for Named and Required tool_choice, which don't require a parser for marker-based detection since they operate in immediate mode.


780-814: LGTM on jail configuration logic.

The match-based configuration correctly routes:

  • Named → immediate jail with specific function name
  • Required → immediate jail in required mode
  • Auto/None/unspecified → traditional marker-based jail with parser

The fallthrough case at lines 802-809 is safe because should_apply_tool_jail already returns false when there's no parser and tool_choice is Auto/None.


974-979: LGTM on updated call site.

The call correctly passes both tool_call_parser and tool_choice to the updated apply_tool_calling_jail function.

Copy link
Copy Markdown
Contributor

@ayushag-nv ayushag-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with the implementation. @grahamking can u give your rust expert view on this once ?

Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
@grahamking
Copy link
Copy Markdown
Contributor

/ok to test b3c69af

@grahamking
Copy link
Copy Markdown
Contributor

Thanks @vladnosiv !

@grahamking grahamking enabled auto-merge (squash) December 9, 2025 17:30
@grahamking grahamking merged commit 5585f80 into ai-dynamo:main Dec 9, 2025
37 of 38 checks passed
karen-sy pushed a commit that referenced this pull request Dec 9, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
ryanolson added a commit that referenced this pull request Dec 10, 2025
Merged 238 commits from main branch to bring the feature branch up to date.

Key conflicts resolved:
- Removed lib/kvbm-kernels references (deleted in main)
- Kept nova/nova-backend/kvbm workspace members from feature branch
- Maintained v2 module API refactoring from feature branch
- Updated Cargo.lock files to reflect new dependencies

Major updates from main include:
- LoRA support for vLLM (#4810)
- Multimodal documentation (#4510)
- Scaling adapter features (#4699, #4825)
- Tool calling support (#4822, #4722)
- NIXL connect improvements (#4433)

Signed-off-by: Ryan Olson <rolson@nvidia.com>
zxue2 pushed a commit to zxue2/dynamo that referenced this pull request Dec 11, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
smatta-star pushed a commit to smatta-star/dynamo that referenced this pull request Dec 19, 2025
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contribution Pull request is from an external contributor feat size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add tool_choice support

3 participants