feat: add tool_choice support #4722
@ayushag-nv Could you do an early review of the approach here?

I noticed that the usual FC (function calling) in dynamo is not streamed the way OpenAI streams it; instead it waits for the entire FC to be generated.
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
Force-pushed from b732106 to cdddc24
To be honest, I expected that fewer changes would be required.

Will review today!
@vladnosiv Yes, that's the jail streaming part that allows us to use non-streaming parsers. Normal content is streamed as-is, but the stream is blocked as soon as a potential start of a tool call is detected. It then tries to find the complete tool call and streams it.
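A minimal sketch of the blocking behavior described above. This is illustrative only; the marker strings and types are hypothetical, not dynamo's actual code, and a real implementation must also hold back chunks that end in a partial marker prefix.

```rust
const START: &str = "<tool_call>";
const END: &str = "</tool_call>";

#[derive(Default)]
struct Jail {
    buffer: String,
    jailed: bool,
}

impl Jail {
    /// Feed one streamed chunk; returns text that is safe to forward now.
    fn push(&mut self, chunk: &str) -> Option<String> {
        self.buffer.push_str(chunk);
        if !self.jailed {
            if let Some(pos) = self.buffer.find(START) {
                // Forward everything before the marker; jail the rest.
                let rest = self.buffer.split_off(pos);
                let passthrough = std::mem::replace(&mut self.buffer, rest);
                self.jailed = true;
                return (!passthrough.is_empty()).then_some(passthrough);
            }
            // No marker seen: forward the whole buffer.
            let out = std::mem::take(&mut self.buffer);
            return (!out.is_empty()).then_some(out);
        }
        // Jailed: release only once the complete tool call has arrived.
        if self.buffer.contains(END) {
            self.jailed = false;
            return Some(std::mem::take(&mut self.buffer));
        }
        None
    }
}
```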
Okay, I really overlooked that the current FC implementation does not do true FC streaming. In the current version of this PR, for the case of tool_choice != none/auto, I added true streaming. If dynamo does not plan to support FC streaming, then it makes sense to remove this part. Otherwise, the partial-JSON-parsing approach for streaming FC deltas could be generalized in the jail streams component. In addition, I realized that the JSON-schema FC case could perhaps be squeezed into the jail-streaming code path. It may turn out to be a little more hacky than with explicit tags, especially for a parallel call in the required scenario.
Yeah, right now there are no plans or requirements to support true FC streaming, so you can update your PR accordingly.
Hi @ayushag-nv! True FC streaming has been removed. In addition, to simplify the current PR, in required mode with parallel FC in streaming, all chunks with FC are returned at the end. This simplification can be changed to per-call streaming in another PR.
Walkthrough

This PR implements comprehensive tool_choice support for the OpenAI-compatible API. It extends the jail-based tool call interception mechanism with configurable modes (Named, Required, MarkerBased), converts tool_choice configurations into JSON schemas for guided decoding, and updates multiple public function signatures to accept optional parameters for tool parsing and tool choice handling.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (6)
lib/llm/src/protocols/openai/tools.rs (2)
58-63: Consider whether the default schema needs `additionalProperties: false`.

The fallback schema for tools without parameters allows any additional properties by default. If strict guided decoding is intended, consider adding `"additionalProperties": false`:

```diff
 fn clone_parameters(function: &FunctionObject) -> Value {
     function
         .parameters
         .clone()
-        .unwrap_or_else(|| json!({"type": "object", "properties": {}}))
+        .unwrap_or_else(|| json!({"type": "object", "properties": {}, "additionalProperties": false}))
 }
```

However, this may be intentional to allow flexibility when no schema is defined.
201-262: Good test coverage for core scenarios.

Tests cover the main happy paths and key error conditions. Consider adding tests for edge cases like `tool_choice=None` returning `Ok(None)` and `tool_choice=required` with `tools=None` returning a `MissingTools` error for completeness.

lib/llm/src/protocols/openai/chat_completions/jail.rs (1)
889-897: Tool call IDs are non-unique across requests.

The hardcoded `"call-1"` and `format!("call-{}", idx + 1)` IDs differ from the marker-based path, which uses parser-generated IDs. Consider using UUIDs for consistency with typical OpenAI responses:

```diff
-Ok(vec![ChatCompletionMessageToolCallChunk {
-    index: 0,
-    id: Some("call-1".to_string()),
+Ok(vec![ChatCompletionMessageToolCallChunk {
+    index: 0,
+    id: Some(format!("call_{}", uuid::Uuid::new_v4())),
```

This ensures tool call IDs are globally unique, which some clients may rely on for tracking.
lib/llm/tests/tool_choice_finish_reasons.rs (3)
54-87: Consider adding error context to unwraps for easier test debugging.

The chained unwraps at line 86 could make it hard to diagnose test failures. Consider using `expect()` with descriptive messages:

```diff
-output_stream.next().await.unwrap().data.unwrap()
+output_stream
+    .next()
+    .await
+    .expect("jail stream should emit at least one item")
+    .data
+    .expect("jail output should contain data")
```
198-225: Missing jail-based test for named tool choice with Stop finish reason.

This test verifies the generator behavior (Stop remains Stop before jail), but doesn't verify the jail behavior. Test 6 shows that Required+Stop becomes ToolCalls through jail. There should be a corresponding async test verifying that Named+Stop remains Stop (or becomes ToolCalls) through jail.

Consider adding an async version that applies jail transformation:

```rust
#[tokio::test]
async fn test_named_tool_choice_normal_stop_becomes_stop_through_jail() {
    let mut request = create_test_request();
    let tool_choice = Some(ChatCompletionToolChoiceOption::Named(
        ChatCompletionNamedToolChoice {
            r#type: ChatCompletionToolType::Function,
            function: FunctionName {
                name: "get_weather".to_string(),
            },
        },
    ));
    request.inner.tool_choice = tool_choice.clone();
    let mut generator = request.response_generator("req-stop-jail".to_string());
    let backend_output = build_backend_output_with_finish(
        r#"{"location":"Paris","unit":"celsius"}"#,
        common::FinishReason::Stop,
    );
    let raw_response = generator
        .choice_from_postprocessor(backend_output)
        .expect("choice generation");
    let response = apply_jail_transformation(raw_response, tool_choice).await;
    // Verify jail behavior for named tool choice with Stop
    assert_eq!(
        response.choices[0].finish_reason,
        Some(dynamo_async_openai::types::FinishReason::Stop), // or ToolCalls?
    );
}
```
122-196: Consider adding jail-based tests for ContentFilter preservation.

Tests 3 and 4 verify ContentFilter preservation at the generator level, but don't verify jail behavior. If the jail has different handling for ContentFilter vs Length, these cases would be uncovered. Consider adding async versions similar to test 1.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- lib/llm/src/preprocessor.rs (3 hunks)
- lib/llm/src/protocols/openai.rs (2 hunks)
- lib/llm/src/protocols/openai/chat_completions.rs (2 hunks)
- lib/llm/src/protocols/openai/chat_completions/jail.rs (17 hunks)
- lib/llm/src/protocols/openai/common_ext.rs (1 hunks)
- lib/llm/src/protocols/openai/completions.rs (1 hunks)
- lib/llm/src/protocols/openai/tools.rs (1 hunks)
- lib/llm/tests/test_common_ext.rs (1 hunks)
- lib/llm/tests/test_reasoning_parser.rs (2 hunks)
- lib/llm/tests/test_streaming_tool_parsers.rs (1 hunks)
- lib/llm/tests/tool_choice.rs (1 hunks)
- lib/llm/tests/tool_choice_finish_reasons.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/protocols/openai/chat_completions/aggregator.rs:66-86
Timestamp: 2025-09-10T05:04:58.417Z
Learning: In the dynamo codebase, tool call chunks from streaming responses always contain complete tool calls (one chunk = one tool call), unlike standard OpenAI streaming where tool calls can be fragmented across multiple chunks. The convert_tool_chunk_to_message_tool_call function correctly assumes complete tool call data in each chunk.
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.
Applied to files:
lib/llm/tests/tool_choice.rs, lib/llm/src/protocols/openai/completions.rs, lib/llm/src/protocols/openai/chat_completions.rs, lib/llm/tests/tool_choice_finish_reasons.rs, lib/llm/src/protocols/openai/tools.rs, lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.
Applied to files:
lib/llm/tests/tool_choice.rs, lib/llm/tests/test_streaming_tool_parsers.rs, lib/llm/src/protocols/openai/tools.rs, lib/llm/tests/test_reasoning_parser.rs, lib/llm/src/protocols/openai/chat_completions/jail.rs, lib/llm/src/preprocessor.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Applied to files:
lib/llm/tests/tool_choice.rs, lib/llm/src/protocols/openai/completions.rs, lib/llm/src/protocols/openai/chat_completions.rs, lib/llm/src/protocols/openai/tools.rs, lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-09-10T05:04:58.417Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/protocols/openai/chat_completions/aggregator.rs:66-86
Timestamp: 2025-09-10T05:04:58.417Z
Learning: In the dynamo codebase, tool call chunks from streaming responses always contain complete tool calls (one chunk = one tool call), unlike standard OpenAI streaming where tool calls can be fragmented across multiple chunks. The convert_tool_chunk_to_message_tool_call function correctly assumes complete tool call data in each chunk.
Applied to files:
lib/llm/tests/tool_choice.rs, lib/llm/src/protocols/openai/chat_completions/jail.rs, lib/llm/src/preprocessor.rs
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
Applied to files:
lib/llm/tests/tool_choice.rs
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/llm/src/protocols/openai/common_ext.rs, lib/llm/src/protocols/openai/chat_completions.rs, lib/llm/tests/test_common_ext.rs, lib/llm/src/protocols/openai/tools.rs
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
Repo: ai-dynamo/dynamo PR: 2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.
Applied to files:
lib/llm/tests/test_streaming_tool_parsers.rs, lib/llm/tests/test_reasoning_parser.rs, lib/llm/src/protocols/openai/chat_completions/jail.rs, lib/llm/src/preprocessor.rs
📚 Learning: 2025-08-22T20:10:09.345Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/parsers/src/reasoning/gpt_oss_parser.rs:31-37
Timestamp: 2025-08-22T20:10:09.345Z
Learning: StreamableParser from openai-harmony does not implement the Debug trait, which prevents storing it as a field in structs that derive Debug in lib/parsers/src/reasoning/gpt_oss_parser.rs.
Applied to files:
lib/llm/tests/test_streaming_tool_parsers.rs, lib/llm/tests/test_reasoning_parser.rs
📚 Learning: 2025-09-08T21:18:43.478Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2936
File: lib/parsers/src/reasoning/granite_parser.rs:42-46
Timestamp: 2025-09-08T21:18:43.478Z
Learning: In GraniteReasoningParser in lib/parsers/src/reasoning/granite_parser.rs, the think_start_tokens and think_end_tokens are hardcoded in the constructor with fixed values, so unwrap() calls on these vectors are safe and won't panic.
Applied to files:
lib/llm/tests/test_reasoning_parser.rs
🧬 Code graph analysis (4)
lib/llm/tests/tool_choice.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
new (103-134)

lib/llm/src/protocols/openai/chat_completions/jail.rs (5)
default (1192-1194), builder (471-473), new (126-135), new (425-427), new (1026-1034)
lib/llm/src/protocols/openai/common_ext.rs (2)
lib/llm/src/protocols/openai/chat_completions.rs (1)
get_guided_json (162-180)

lib/llm/src/protocols/openai/completions.rs (1)
get_guided_json (186-188)
lib/llm/tests/tool_choice_finish_reasons.rs (1)
lib/llm/src/protocols/openai/chat_completions/jail.rs (2)
default (1192-1194), builder (471-473)
lib/llm/src/preprocessor.rs (3)
lib/llm/src/protocols/openai/chat_completions/jail.rs (2)
tool_call_parser (1069-1072), builder (471-473)

lib/bindings/python/rust/llm/local_model.rs (1)
tool_call_parser (97-99)

lib/llm/src/preprocessor/prompt/template/oai.rs (1)
tool_choice (188-194)
🔇 Additional comments (30)
lib/llm/src/protocols/openai/common_ext.rs (1)
97-97: Trait signature change to owned `Value` is appropriate.

Returning an owned `serde_json::Value` enables implementations to construct new schemas dynamically (e.g., deriving from `tool_choice` + `tools`), which wouldn't be possible with a borrowed reference. The relevant implementations in `chat_completions.rs` and `completions.rs` have been updated accordingly.

lib/llm/src/protocols/openai/tools.rs (1)
65-106: Well-structured schema generation for `required` tool_choice.

The `anyOf` approach correctly allows the model to select any tool while enforcing the schema for each tool's parameters. The `$defs` merging with conflict detection is a good safeguard.

lib/llm/src/protocols/openai/chat_completions/jail.rs (4)
767-792: Immediate mode jail termination logic is correct.

The JSON parsing approach correctly waits for complete, valid JSON before ending the jail. The non-empty array check for `ArrayOfTools` aligns with the `minItems: 1` constraint in the generated schema.
987-1000: Finish reason handling aligns with OpenAI semantics.

Named tool choice keeping `Stop` and required tool choice changing to `ToolCalls` matches OpenAI's behavior, where forced tool invocation via named choice is considered a normal completion, while `required` mode indicates explicit tool call handling.
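The mapping described above can be sketched as a small post-processing function. The enum and function names here are illustrative, not dynamo's actual API:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum ToolChoiceMode {
    Named,
    Required,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum FinishReason {
    Stop,
    Length,
    ContentFilter,
    ToolCalls,
}

/// Adjust the finish reason after the jail stream, per the semantics above:
/// Required + Stop becomes ToolCalls; everything else is preserved as-is.
fn fix_finish_reason(mode: ToolChoiceMode, reason: FinishReason) -> FinishReason {
    match (mode, reason) {
        // Required mode: a normal Stop means the forced tool call completed,
        // so report ToolCalls to the client.
        (ToolChoiceMode::Required, FinishReason::Stop) => FinishReason::ToolCalls,
        // Named mode keeps Stop; Length/ContentFilter are always preserved.
        (_, other) => other,
    }
}
```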
543-562: Emission filtering for jailed choices prevents aggregator issues.

The logic correctly filters out empty deltas for choices that were jailed, avoiding issues where the aggregator receives deltas without roles after unjailing. Based on learnings, this aligns with the jail stream's design.
1092-1106: Builder methods for immediate jail modes are clean.

The `tool_choice_named` and `tool_choice_required` methods provide a clear API for configuring immediate jail modes. Note that calling these will override any previously set `jail_mode`.

lib/llm/src/protocols/openai.rs (2)
20-20: New `tools` module correctly exported.

The public export enables access to `get_json_schema_from_tools` and `ToolChoiceError` for external schema derivation needs.
134-141: Guided JSON now passed as owned value.

The change from `.cloned()` to direct pass-through is correct since `get_guided_json()` now returns an owned `Option<serde_json::Value>`. This simplifies the code and avoids an unnecessary clone.

lib/llm/src/protocols/openai/completions.rs (1)
186-188: Implementation correctly updated for owned return type.

The `.clone()` is necessary since the field is stored as `Option<serde_json::Value>` and the trait now requires returning an owned value.

lib/llm/tests/test_common_ext.rs (1)
93-96: Test expectation correctly updated for owned return type.

The assertion now expects `Some(serde_json::json!({"key": "value"}))` (owned) instead of a borrowed reference, aligning with the updated `get_guided_json()` signature that returns `Option<serde_json::Value>`.

lib/llm/tests/test_reasoning_parser.rs (2)
486-490: Call site correctly updated for new API signature.

The `apply_tool_calling_jail` call now passes the tool parser as `Some(...)` and `None` for `tool_choice`, matching the updated function signature. This is appropriate since this test focuses on reasoning/tool parsing without tool_choice configuration.
599-603: Consistent API adaptation for the ignored test.

Same signature update applied correctly. The `None` for tool_choice maintains the existing test behavior.

lib/llm/src/protocols/openai/chat_completions.rs (1)
162-180: Well-structured guided JSON derivation with tool_choice fallback.

The implementation correctly:
- Prioritizes explicit `guided_json` from the request (cloned for ownership)
- Falls back to deriving a JSON schema from `tool_choice` + `tools` when available
- Gracefully handles derivation failures with a warning log and a `None` return

The ownership change from borrowed to owned is necessary to support the new derivation path that constructs values rather than referencing existing fields.

lib/llm/tests/test_streaming_tool_parsers.rs (1)
lib/llm/tests/test_streaming_tool_parsers.rs (1)
159-164: Call site correctly adapted with clear documentation.

The update wraps `tool_parser` in `Some(...)` and passes `None` for `tool_choice`, matching the new API. The inline comment clarifies intent for future readers.

lib/llm/tests/tool_choice.rs (8)
1-35: Clean test setup with minimal request construction.

The imports and `create_test_request` helper provide a solid foundation for the test suite. The request construction uses sensible defaults with only required fields populated.
37-70: Single-response jail transformation helper is well-structured.

The helper correctly uses the builder pattern and handles both `Named` and `Required` tool choice variants. The `unwrap()` on line 69 is acceptable in test code.
72-110: Streaming variant correctly collects all transformed responses.

The implementation mirrors the single-response variant but collects all outputs via `filter_map` and `collect`, which is appropriate for streaming tests.
173-234: Thorough coverage of parallel tool calls with required tool choice.

The test properly validates:
- Correct `FinishReason::ToolCalls`
- Two tool calls with sequential indices (0, 1)
- Correct IDs (`call-1`, `call-2`)
- Proper function names and arguments extraction
The JSON array parsing for parallel tool calls is well-tested.
236-255: Good coverage of parse failure fallback behavior.

Tests the important edge case where JSON parsing fails, ensuring the raw content is returned rather than being silently dropped. The comment on lines 251-252 clarifies this matches marker-based FC behavior.
257-325: Streaming buffering test validates correct accumulation.

Tests that partial JSON chunks are buffered until complete, then emitted as a single tool call. The chunk boundaries, splitting `{"location":"Paris","unit":"celsius"}` mid-string across chunks, make this a realistic test case.
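A simple way to decide when buffered chunks form complete JSON, as this test exercises, is to track brace/bracket depth outside of string literals. This is a hedged, dependency-free sketch of the idea, not dynamo's actual completeness check:

```rust
/// Returns true once the buffered text forms a complete JSON value:
/// depth of `{}`/`[]` returns to zero and no string literal is left open.
fn json_is_complete(buf: &str) -> bool {
    let (mut depth, mut in_str, mut escape) = (0i32, false, false);
    let mut seen_any = false;
    for c in buf.chars() {
        if escape {
            // Previous char was a backslash inside a string: skip this one.
            escape = false;
            continue;
        }
        match c {
            '\\' if in_str => escape = true,
            '"' => in_str = !in_str,
            '{' | '[' if !in_str => {
                depth += 1;
                seen_any = true;
            }
            '}' | ']' if !in_str => depth -= 1,
            _ => {}
        }
    }
    seen_any && depth == 0 && !in_str
}
```

A jail in immediate mode can call this on its accumulated buffer after each chunk and release the buffered tool call only when it returns true.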
327-406: Parallel tool calls in streaming mode well-tested.

Validates that the jail stream correctly buffers and parses a JSON array split across chunks, producing multiple tool calls in a single response.
408-436: Synchronous test for no-tool-choice baseline behavior.

Good to have this baseline test confirming normal text output flows through without tool call interpretation when `tool_choice` is not set.

lib/llm/tests/tool_choice_finish_reasons.rs (5)
1-15: LGTM on imports and license header.

Imports are well-organized and include all necessary types for testing the jail transformation and finish_reason handling.
16-37: LGTM on test request helper.

Clean minimal setup for test requests.
39-52: LGTM on backend output helper.

Correctly constructs minimal backend output for finish_reason testing.
89-120: LGTM on named tool choice length preservation test.

Correctly verifies that the jail transformation preserves the `Length` finish reason for named tool choice.
227-250: LGTM on required tool choice stop-to-tool_calls transformation test.

Correctly verifies that the jail transforms `Stop` to `ToolCalls` for required tool choice.

lib/llm/src/preprocessor.rs (3)
754-757: LGTM on enabling jail for Named/Required without parser.

This correctly enables immediate jail mode for Named and Required tool_choice, which don't require a parser for marker-based detection since they operate in immediate mode.
780-814: LGTM on jail configuration logic.

The match-based configuration correctly routes:
- Named → immediate jail with a specific function name
- Required → immediate jail in required mode
- Auto/None/unspecified → traditional marker-based jail with parser

The fallthrough case at lines 802-809 is safe because `should_apply_tool_jail` already returns `false` when there's no parser and tool_choice is Auto/None.
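The routing described in that review item can be sketched as a small match. Enum and function names here are hypothetical illustrations of the logic, not dynamo's actual types:

```rust
#[derive(Debug, PartialEq)]
enum ToolChoice {
    Named(String),
    Required,
    Auto,
    None,
}

#[derive(Debug, PartialEq)]
enum JailMode {
    ImmediateNamed(String),
    ImmediateRequired,
    MarkerBased,
}

/// Choose a jail mode from tool_choice; marker-based jailing needs a parser,
/// while the immediate modes do not.
fn select_jail_mode(choice: &ToolChoice, has_parser: bool) -> Option<JailMode> {
    match choice {
        ToolChoice::Named(name) => Some(JailMode::ImmediateNamed(name.clone())),
        ToolChoice::Required => Some(JailMode::ImmediateRequired),
        // Auto/None use the traditional marker-based jail, which needs a parser.
        ToolChoice::Auto | ToolChoice::None if has_parser => Some(JailMode::MarkerBased),
        // No parser and no forced tool choice: no jail is applied at all.
        _ => None,
    }
}
```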
974-979: LGTM on updated call site.

The call correctly passes both `tool_call_parser` and `tool_choice` to the updated `apply_tool_calling_jail` function.
ayushag-nv left a comment
I am OK with the implementation. @grahamking, can you give your Rust expert view on this once?
/ok to test b3c69af

Thanks @vladnosiv!
Merged 238 commits from main branch to bring the feature branch up to date.

Key conflicts resolved:
- Removed lib/kvbm-kernels references (deleted in main)
- Kept nova/nova-backend/kvbm workspace members from feature branch
- Maintained v2 module API refactoring from feature branch
- Updated Cargo.lock files to reflect new dependencies

Major updates from main include:
- LoRA support for vLLM (#4810)
- Multimodal documentation (#4510)
- Scaling adapter features (#4699, #4825)
- Tool calling support (#4822, #4722)
- NIXL connect improvements (#4433)

Signed-off-by: Ryan Olson <rolson@nvidia.com>
Overview:

Implements OpenAI `tool_choice` parameter support for the Chat Completions API.

Details:

Jail Stream Enhancement:
- `JailMode` enum with `MarkerBased` (traditional) and `Immediate` (tool_choice) variants

JSON Schema Generation:
- Derives a JSON schema from `tool_choice` + `tools` via `CommonExtProvider::get_guided_json()` for guided decoding
- Handles `none`, `auto`, `required`, and named tool choice modes

Request Preprocessing:
- Selects the `JailMode` based on the `tool_choice` option

Finish Reasons Handling:
- `fix_finish_reason()` post-processor adjusts finish reasons after the jail stream

Where should the reviewer start?
- lib/llm/src/protocols/openai/chat_completions/jail.rs - `JailMode::Immediate` implementation
- lib/llm/src/protocols/openai/tools.rs - JSON schema generation for guided decoding

Future Work

Per-call streaming for `required` mode to emit individual tool calls as they complete, rather than waiting for the entire array.

Related Issues: