Conversation
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
nemo_skills/inference/model/tool_call.py (1)
71-75: Handle non-string tool arguments and use `LOG.exception`. This is a duplicate of a previous review comment that remains unaddressed. The code has two issues:
- `json.loads(tool_args)` raises `TypeError` (not `JSONDecodeError`) when `tool_args` is already a dict/list. Some providers return pre-parsed arguments, so this exception escapes the handler and breaks subsequent tool calls.
- Line 73 uses `LOG.error` instead of `LOG.exception`, making debugging harder.

Apply this diff to fix both issues:

```diff
 try:
-    tool_args = json.loads(tool_args)
+    if isinstance(tool_args, str):
+        tool_args = json.loads(tool_args)
-except json.decoder.JSONDecodeError as e:
-    LOG.error(f"Tool arguments are not in JSON format: {tool_args}")
-    LOG.exception(e)
+except (json.JSONDecodeError, TypeError) as e:
+    LOG.exception(f"Tool arguments are not in JSON format: {tool_args}")
     return {"error": "Tool argument parsing failed."}
```
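The failure mode is easy to reproduce standalone. Below is a hedged sketch of the fixed handler; `parse_tool_args` is a hypothetical wrapper for illustration, not the actual method in `tool_call.py`:

```python
import json

# Hypothetical wrapper illustrating the fix above: json.loads accepts only
# str/bytes/bytearray, so a pre-parsed dict argument would raise TypeError,
# which a handler catching only json.JSONDecodeError lets escape.
def parse_tool_args(tool_args):
    try:
        if isinstance(tool_args, str):
            tool_args = json.loads(tool_args)
    except (json.JSONDecodeError, TypeError):
        return {"error": "Tool argument parsing failed."}
    return tool_args

print(parse_tool_args('{"x": 1}'))  # JSON string is parsed
print(parse_tool_args({"x": 1}))    # pre-parsed dict passes through untouched
print(parse_tool_args("not json"))  # malformed string yields the error dict
```

The `isinstance` guard means already-parsed arguments skip `json.loads` entirely, so the `TypeError` branch only matters for exotic argument types.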
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/inference/model/tool_call.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/model/tool_call.py (2)
nemo_skills/mcp/adapters.py (3)
format_tool_list_by_completion_type (51-75), format_tool_response_by_completion_type (95-110), get_tool_details_by_completion_type (113-123)
nemo_skills/inference/model/base.py (1)
CompletionType(37-40)
🪛 Ruff (0.13.3)
nemo_skills/inference/model/tool_call.py
73-73: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
95-95: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🔇 Additional comments (2)
nemo_skills/inference/model/tool_call.py (2)
23-31: LGTM! Good use of adapter functions. The imports align well with the PR's objective to add `completion_type` support. The new adapter functions properly encapsulate the completion_type-specific formatting logic.
98-158: LGTM! Clean completion_type integration. The method correctly threads `completion_type` through the entire tool-calling flow. The use of adapter functions for tool formatting and the `serialized_output` approach aligns well with the PR's objectives.
Actionable comments posted: 0
🧹 Nitpick comments (1)
nemo_skills/inference/generate.py (1)
227-229: LGTM with optional refinement. The hard deprecation approach is correct, providing clear migration guidance to users. The error message appropriately directs users to the new `++inference.completion_type=text` parameter.

Optional: The static analysis tool suggests moving the error message to a constant to follow best practices (TRY003). While not critical, this could improve consistency if similar patterns appear elsewhere:

```diff
+_DEPRECATED_USE_COMPLETIONS_API_MSG = "use_completions_api is deprecated, please use ++inference.completion_type=text instead."
+
 def _post_init_deprecated_params(self):
     if self.use_completions_api:
-        raise ValueError("use_completions_api is deprecated, please use ++inference.completion_type=text instead.")
+        raise ValueError(_DEPRECATED_USE_COMPLETIONS_API_MSG)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/inference/generate.py (10 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/base.py (1)
CompletionType(37-40)
🪛 Ruff (0.13.3)
nemo_skills/inference/generate.py
229-229: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (6)
nemo_skills/inference/generate.py (6)
43-43: LGTM! The import of `CompletionType` is correctly placed and necessary for the new completion-type logic.
60-66: LGTM! The `completion_type` field is well-defined with clear documentation covering all three modes (chat, text, responses). The default of `CompletionType.chat` maintains backward compatibility.
87-89: LGTM! The deprecation comment clearly directs users to the new `completion_type` parameter. The field is retained to provide a meaningful error message when used, which aligns with the hard deprecation strategy discussed in previous reviews.
190-190: LGTM! The call to `_post_init_deprecated_params()` ensures that deprecated parameter usage is caught during initialization, providing early feedback to users.
211-211: LGTM! Correctly sets `completion_type` to `CompletionType.text` for the Megatron server, replacing the old `use_completions_api` behavior. This aligns with Megatron's lack of chat completion support.
277-277: LGTM! All logic correctly migrates from the deprecated `use_completions_api` to the new `completion_type` parameter:
- Line 277: Chat template kwargs handling excludes text completions
- Line 289: Tokenizer setup includes text completions (plus soft_fail and count_prompt_tokens cases)
- Line 301: Tokens_to_generate validation applies only to text completions
- Line 358: Tokenizer passed to prompt setup only for text completions
The logic treats "responses" mode similarly to "chat" mode in this file, with responses-specific handling presumably implemented in the model classes.
Also applies to: 289-289, 301-302, 358-358
Kipok
left a comment
Could you please test if responses api is supported in https://build.nvidia.com/openai/gpt-oss-20b? If so, then we can add it as a simple generation test in here to ensure this functionality doesn't break https://github.com/NVIDIA/NeMo-Skills/blob/main/tests/test_generation.py
requirements/main.txt
Outdated
```diff
 ipython
 iso639-lang
-litellm[caching] < 1.75.0 # some bug with asyncio.run hanging forever
+litellm[caching] == 1.77.5 # Requires patching the logging worker
```
do we have to pin to exact version? What happens if we use latest?
Because the patch explicitly changes one of its classes, so if they modify that class in the future, we need to change that class's modification too.
ok, can you please update the comment to add a little more detail, so that when we look at this line in the future it's easy to remember why we pinned this and what we should test to unpin.
And the behavior here is that this keeps slurm jobs from exiting and we waste gpus? But it only happens in some cases, not always?
yes, left more comments about that.
I tested that endpoint, and it seems to be worse than vllm: its function call responses are empty!
ok, but should we still have a responses test with that endpoint, but without function calls? Just to ensure there are no obvious mistakes in the code, like some parameter not passed or some other issue like this?
Actionable comments posted: 0
♻️ Duplicate comments (2)
nemo_skills/mcp/adapters.py (1)
119-119: Replace assert with explicit validation. The `assert` statement for validating `tool_call["type"]` can be disabled with Python's `-O` optimization flag, which could allow invalid tool calls to proceed silently in production environments. Apply this diff to use explicit validation:

```diff
 elif endpoint_type == EndpointType.responses:
-    assert tool_call["type"] == "function_call", "Tool call must be a function call"
+    if tool_call.get("type") != "function_call":
+        raise ValueError(f"Tool call must be a function call, got type: {tool_call.get('type')}")
     tool_name = tool_call["name"]
     tool_args = tool_call["arguments"]
```

Based on past review comments and security best practices.
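For context on why the review flags `assert` here: assertions are stripped from bytecode under `python -O`, while an explicit check always runs. A small illustrative helper (the function name is ours, not the adapter's):

```python
# Illustrative only: an explicit check raises regardless of the -O flag,
# whereas an `assert` statement would be compiled away under optimization.
def validate_tool_call(tool_call: dict) -> None:
    if tool_call.get("type") != "function_call":
        raise ValueError(f"Tool call must be a function call, got type: {tool_call.get('type')}")

validate_tool_call({"type": "function_call", "name": "add"})  # passes silently
try:
    validate_tool_call({"type": "message"})
except ValueError as err:
    print(err)  # raised even in optimized mode, unlike an assert
```

Using `.get("type")` also avoids a `KeyError` when a malformed tool call omits the key entirely.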
nemo_skills/inference/model/tool_call.py (1)
93-96: Add `strict=True` to zip() for safety. The `zip()` call should use `strict=True` to ensure `tool_calls` and `tool_results` have matching lengths. This prevents silent bugs if the lists don't match (e.g., if a tool call fails to return a result). Apply this diff:

```diff
 return [
     format_tool_response_by_endpoint_type(tool_call, tool_result, endpoint_type)
-    for tool_call, tool_result in zip(tool_calls, tool_results)
+    for tool_call, tool_result in zip(tool_calls, tool_results, strict=True)
 ]
```

Based on past review comments and static analysis hints.
🧹 Nitpick comments (4)
docs/basics/prompt-format.md (1)
111-112: Reflect the new `completion_type` flag in this doc snippet. The CLI now steers completions vs. responses via the `completion_type` setting (per this PR's goals). This example still references only `++inference.endpoint_type=text`, which will leave readers without guidance on the new flag. Update the prose (and sample command, if applicable) to mention the `completion_type` setting alongside the endpoint type so the instructions stay accurate.
nemo_skills/inference/eval/bfcl.py (1)
120-135: Improve error logging in exception handler. The exception handler logs the error but can be simplified by using `logging.exception()`, which automatically includes the traceback and exception details. Apply this diff to improve the logging:

```diff
 try:
     fmted_prompt = self.message_formatter(messages, tools=tools)
 except Exception as e:
     # Sometimes the parsed tool-call is a string, which is not JSON serializable
     # Putting a debugging here in case it happens in the future and we need to address it.
     LOG.info(f"Messages: {messages}, Tools: {tools}")
-    LOG.error(f"Error formatting prompt: {e}")
-    raise e
+    LOG.exception("Error formatting prompt")
+    raise
```

Based on static analysis hints.
nemo_skills/inference/model/base.py (1)
445-445: Consider removing unused `kwargs` parameter. The `kwargs` parameter is not used in this method. If it's included for API consistency with other parse methods, that's acceptable, but consider adding a comment explaining this or removing it if not needed. Based on static analysis hints.
nemo_skills/inference/model/tool_call.py (1)
71-75: Improve JSON parsing error handling. The current error handling only catches `JSONDecodeError` but doesn't handle cases where `tool_args` is already a parsed dict/list (not a string), which would raise `TypeError`. Additionally, the logging can be simplified. Apply this diff to handle both cases:

```diff
 try:
-    tool_args = json.loads(tool_args)
+    if isinstance(tool_args, str):
+        tool_args = json.loads(tool_args)
+    elif not isinstance(tool_args, dict):
+        raise ValueError(f"Tool arguments must be a string or dict, got {type(tool_args)}")
 except json.decoder.JSONDecodeError as e:
-    LOG.error(f"Tool arguments are not in JSON format: {tool_args}")
-    LOG.exception(e)
+    LOG.exception("Tool arguments are not in valid JSON format: %s", tool_args)
     return {"error": "Tool argument parsing failed."}
```

Based on past review comments and static analysis hints.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- docs/basics/prompt-format.md (1 hunks)
- docs/pipelines/generation.md (7 hunks)
- docs/releases/openmathinstruct2/dataset.md (6 hunks)
- docs/releases/openmathreasoning/evaluation.md (2 hunks)
- docs/tutorials/posts/gpt-oss-python.md (2 hunks)
- nemo_skills/dataset/ruler/prepare.py (1 hunks)
- nemo_skills/inference/eval/bfcl.py (2 hunks)
- nemo_skills/inference/generate.py (10 hunks)
- nemo_skills/inference/model/base.py (8 hunks)
- nemo_skills/inference/model/code_execution.py (8 hunks)
- nemo_skills/inference/model/openai.py (1 hunks)
- nemo_skills/inference/model/parallel_thinking.py (2 hunks)
- nemo_skills/inference/model/tool_call.py (6 hunks)
- nemo_skills/mcp/adapters.py (4 hunks)
- tests/slurm-tests/gpt_oss_python_aime25/run_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- docs/releases/openmathreasoning/evaluation.md
- nemo_skills/dataset/ruler/prepare.py
- tests/slurm-tests/gpt_oss_python_aime25/run_test.py
- docs/releases/openmathinstruct2/dataset.md
- docs/pipelines/generation.md
- docs/tutorials/posts/gpt-oss-python.md
🧰 Additional context used
🧬 Code graph analysis (8)
nemo_skills/inference/model/code_execution.py (1)
nemo_skills/inference/model/base.py (2)
BaseModel (43-509), EndpointType (37-40)
nemo_skills/inference/model/parallel_thinking.py (1)
nemo_skills/inference/model/base.py (2)
BaseModel (43-509), EndpointType (37-40)
nemo_skills/mcp/adapters.py (1)
nemo_skills/inference/model/base.py (1)
EndpointType(37-40)
nemo_skills/inference/model/openai.py (2)
nemo_skills/inference/model/base.py (2)
_build_responses_request_params (205-206), _build_chat_request_params (198-199)
nemo_skills/inference/model/vllm.py (2)
_build_responses_request_params (142-148), _build_chat_request_params (102-140)
nemo_skills/inference/model/base.py (4)
nemo_skills/inference/patch_litellm_logging.py (1)
patch_litellm_logging_worker (59-71)
nemo_skills/inference/model/openai.py (1)
_build_responses_request_params (155-160)
nemo_skills/inference/model/vllm.py (1)
_build_responses_request_params (142-148)
nemo_skills/inference/model/tool_call.py (1)
generate_async(98-160)
nemo_skills/inference/model/tool_call.py (2)
nemo_skills/mcp/adapters.py (3)
format_tool_list_by_endpoint_type (51-76), format_tool_response_by_endpoint_type (96-111), get_tool_details_by_endpoint_type (114-124)
nemo_skills/inference/model/base.py (1)
EndpointType(37-40)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/base.py (1)
EndpointType(37-40)
nemo_skills/inference/eval/bfcl.py (1)
nemo_skills/inference/model/base.py (1)
EndpointType(37-40)
🪛 Ruff (0.13.3)
nemo_skills/mcp/adapters.py
76-76: Avoid specifying long messages outside the exception class
(TRY003)
111-111: Avoid specifying long messages outside the exception class
(TRY003)
123-123: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/inference/model/base.py
291-291: Avoid specifying long messages outside the exception class
(TRY003)
445-445: Unused method argument: kwargs
(ARG002)
502-502: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/inference/model/tool_call.py
73-73: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
95-95: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
nemo_skills/inference/generate.py
229-229: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/inference/eval/bfcl.py
126-126: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
127-127: Use raise without specifying exception name
Remove exception name
(TRY201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (7)
nemo_skills/inference/model/base.py (1)
491-503: Robust serialization with clear error handling. The `_serialize_output` method handles both responses and chat completion formats cleanly. The error message for unsupported types is clear and actionable.
nemo_skills/inference/model/openai.py (1)
155-160: Clean adapter for responses API format. The method correctly delegates to the existing chat request builder and remaps the keys to match the responses API surface (`messages` → `input`, `max_completion_tokens` → `max_output_tokens`). This avoids code duplication while providing the necessary format transformation.
nemo_skills/inference/model/parallel_thinking.py (1)
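The delegation pattern the comment describes can be sketched as follows; the builder bodies and parameter names here are illustrative, and only the two key renames (`messages` → `input`, `max_completion_tokens` → `max_output_tokens`) come from the review text:

```python
# Hedged sketch of delegating to the chat builder and renaming keys for
# the responses API surface. Not the real method bodies from openai.py.
def build_chat_request_params(messages, max_tokens):
    return {"messages": messages, "max_completion_tokens": max_tokens}

def build_responses_request_params(messages, max_tokens):
    params = build_chat_request_params(messages, max_tokens)
    # Rename the chat-style keys to the responses-style keys in place.
    params["input"] = params.pop("messages")
    params["max_output_tokens"] = params.pop("max_completion_tokens")
    return params

params = build_responses_request_params([{"role": "user", "content": "hi"}], 64)
print(sorted(params))  # ['input', 'max_output_tokens']
```

Building the params once and renaming keys keeps shared options (temperature, stop strings, etc.) in a single place, which is the deduplication benefit the review highlights.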
55-55: Clean migration to endpoint_type enum. The replacement of `use_completions_api: bool = False` with `endpoint_type: EndpointType = EndpointType.chat` is a clean refactor that maintains backward compatibility in behavior while providing better clarity and extensibility.
nemo_skills/inference/model/code_execution.py (1)
68-68: Proper endpoint_type propagation through code execution flow. The `endpoint_type` parameter is correctly threaded through the entire code execution call chain (`_generate_single`, `generate_async`, `_stream_single`) and included in request dictionaries, ensuring consistent behavior with the new endpoint-type based routing.
Also applies to: 113-113, 263-263, 321-321, 336-336
nemo_skills/inference/generate.py (2)
60-66: Well-documented endpoint_type configuration. The new `endpoint_type` field is clearly documented with examples for each endpoint type ("chat", "text", "responses") and includes guidance on tokenizer override behavior. This provides good clarity for users migrating from the deprecated `use_completions_api` flag.
227-229: Explicit deprecation prevents silent errors. The hard deprecation of `use_completions_api` with a clear error message and migration guidance is the right approach. This addresses the concerns raised in past review comments about silent failures being worse than breaking changes, ensuring users are immediately aware they need to update their configuration.
nemo_skills/inference/model/tool_call.py (1)
112-113: Solid tool invocation flow with endpoint-type awareness. The tool invocation logic correctly:
- Formats tools based on endpoint type
- Propagates endpoint_type through all generation calls
- Serializes outputs appropriately
- Handles tool call execution with proper error handling
The integration of endpoint_type throughout the flow maintains consistency with the broader refactor.
Also applies to: 138-147
Actionable comments posted: 0
🧹 Nitpick comments (1)
nemo_skills/inference/patch_litellm_logging.py (1)
28-59: Consider adding type hints for completeness. While the current implementation is functional, adding complete type hints would improve code clarity and maintainability. Consider adding return type hints (e.g., `-> None`) to all methods. Example:

```diff
-    def start(self):
+    def start(self) -> None:
         pass

-    async def stop(self):
+    async def stop(self) -> None:
         pass
```

Note: Some methods already have `-> None`, so this would ensure consistency across all methods.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- nemo_skills/inference/patch_litellm_logging.py (1 hunks)
- requirements/main.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- requirements/main.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
🔇 Additional comments (3)
nemo_skills/inference/patch_litellm_logging.py (3)
15-23: Past review comment addressed. The docstring now provides the requested details about why this patch is needed, including the specific symptoms (async_loop gets stuck), impact (slurm jobs don't exit, GPU waste), and environment differences (Docker vs. local).
43-49: Coroutine closing prevents resource leaks. The `enqueue` and `ensure_initialized_and_enqueue` methods correctly close coroutines to prevent resource leaks and suppress "coroutine was never awaited" warnings. This is the appropriate approach for a no-op implementation meant to disable functionality.
61-73: LGTM! The patch function correctly replaces both the `LoggingWorker` class and the `GLOBAL_LOGGING_WORKER` instance, and gracefully handles version compatibility by catching `ModuleNotFoundError`. The implementation is safe to call during module import.
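The coroutine-closing behavior these comments describe can be sketched like this; the class below is an illustrative stand-in for the patch's no-op worker, not litellm's actual `LoggingWorker`:

```python
import asyncio

# Minimal no-op worker sketch: enqueue() receives coroutine objects and
# closes them so Python never warns that they were "never awaited" and
# no pending coroutine is leaked.
class NoOpLoggingWorker:
    def start(self) -> None:
        pass

    async def stop(self) -> None:
        pass

    def enqueue(self, coro) -> None:
        if asyncio.iscoroutine(coro):
            coro.close()  # discard without leaking a pending coroutine

async def log_event():
    return "logged"

worker = NoOpLoggingWorker()
worker.enqueue(log_event())  # silently discarded, no RuntimeWarning
print("enqueued and closed")
```

Closing rather than simply dropping the coroutine matters: a dropped, never-awaited coroutine triggers a `RuntimeWarning` at garbage collection, which is exactly the noise the patch avoids.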
Kipok
left a comment
lgtm, as long as tests pass
…sponses-api Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Actionable comments posted: 0
🧹 Nitpick comments (3)
requirements/main.txt (1)
30-30: Document the pin rationale. Please capture the concrete failure mode and what to retest before unpinning so future upgrades aren't blind. Right now the comment just says the patch exists, but not why it matters.

```diff
-litellm[caching] == 1.77.5 # Requires patching the logging worker (See nemo_skills/inference/patch_litellm_logging.py)
+litellm[caching] == 1.77.5 # Pin because litellm's logging worker can block Slurm job teardown; we patch it (see nemo_skills/inference/patch_litellm_logging.py) and must re-run the multi-GPU teardown test before unpinning.
```

nemo_skills/inference/eval/bfcl.py (1)
133-140: Consider using `logging.exception` and bare `raise`. The try/except block is useful for debugging, but consider these minor improvements:
- Line 139: Use `logging.exception(...)` instead of `logging.error(...)` to automatically include the exception traceback.
- Line 140: Use bare `raise` instead of `raise e` for cleaner exception re-raising.

Apply this diff:

```diff
 try:
     fmted_prompt = self.message_formatter(messages, tools=tools)
 except Exception as e:
     # Sometimes the parsed tool-call is a string, which is not JSON serializable
     # Putting a debugging here in case it happens in the future and we need to address it.
     LOG.info(f"Messages: {messages}, Tools: {tools}")
-    LOG.error(f"Error formatting prompt: {e}")
-    raise e
+    LOG.exception("Error formatting prompt")
+    raise
```

nemo_skills/inference/generate.py (1)
227-229: Hard deprecation approach aligns with past discussion. The validation correctly raises an error when `use_completions_api=True`, forcing users to migrate to `endpoint_type`. This aligns with the team's decision to hard-deprecate rather than maintain backward compatibility. The error message clearly directs users to the new parameter.

Optional: Address static analysis hint. Ruff suggests avoiding long error messages inline (TRY003). If preferred, you could extract the message to a constant:

```diff
+    _DEPRECATED_USE_COMPLETIONS_API_MSG = (
+        "use_completions_api is deprecated, please use ++inference.endpoint_type=text instead."
+    )
+
     def _post_init_deprecated_params(self):
         if self.use_completions_api:
-            raise ValueError("use_completions_api is deprecated, please use ++inference.endpoint_type=text instead.")
+            raise ValueError(self._DEPRECATED_USE_COMPLETIONS_API_MSG)
```

However, this is a minor style preference and not required.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- nemo_skills/inference/eval/bfcl.py (2 hunks)
- nemo_skills/inference/generate.py (10 hunks)
- requirements/main.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/base.py (1)
EndpointType(37-40)
nemo_skills/inference/eval/bfcl.py (1)
nemo_skills/inference/model/base.py (1)
EndpointType(37-40)
🪛 Ruff (0.13.3)
nemo_skills/inference/generate.py
229-229: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/inference/eval/bfcl.py
139-139: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
140-140: Use raise without specifying exception name
Remove exception name
(TRY201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
🔇 Additional comments (5)
nemo_skills/inference/eval/bfcl.py (2)
41-41: LGTM! The import is necessary for the `endpoint_type` setting added to `construct_input_dict`.
141-148: LGTM! Explicitly setting `endpoint_type = EndpointType.text` is correct for BFCL's use case and aligns with the broader shift toward endpoint-type aware configurations across the codebase. Based on past review comments, this prevents unwanted assumptions about completion type based on input format.
nemo_skills/inference/generate.py (3)
60-66: LGTM! Clean migration to enum-based endpoint configuration. The `endpoint_type` field with the `EndpointType` enum is a significant improvement over the boolean `use_completions_api`. The comprehensive documentation clearly explains the behavior for each endpoint type (chat, text, responses), and the enum provides better type safety and extensibility.
211-211: LGTM! Correct migration for Megatron server. Setting `endpoint_type = EndpointType.text` for the Megatron server correctly replaces the previous `use_completions_api` logic.
277-277: LGTM! Consistent logic updates across all usage points. All conditional checks have been correctly updated to use `endpoint_type`:
- Line 277: `endpoint_type != EndpointType.text` for chat_template_kwargs (includes chat and responses)
- Line 289: `endpoint_type == EndpointType.text` for tokenizer setup
- Line 301: `endpoint_type == EndpointType.text` for tokens_to_generate validation
- Line 361: `endpoint_type == EndpointType.text` for prompt tokenizer passing

The logic consistently treats the responses API the same as the chat API in this file, with model-specific implementation details handled in the model classes.
Also applies to: 289-289, 301-302, 361-361
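The checks listed above can be mirrored with a minimal string-backed enum. This is an inferred sketch; the real `EndpointType` lives at `nemo_skills/inference/model/base.py` (lines 37-40) and may differ in detail, though the member names come from the review text:

```python
from enum import Enum

# Inferred sketch of the enum the conditionals reference; member names
# ("chat", "text", "responses") come from the review comments.
class EndpointType(str, Enum):
    chat = "chat"
    text = "text"
    responses = "responses"

# e.g. parsed from a CLI flag like ++inference.endpoint_type=text
endpoint_type = EndpointType("text")

# Mirrors the conditionals listed above: text completions need a tokenizer,
# while chat and responses rely on the server-side chat template.
needs_tokenizer = endpoint_type == EndpointType.text
uses_chat_template = endpoint_type != EndpointType.text
print(needs_tokenizer, uses_chat_template)  # True False
```

A `str` mixin keeps the enum round-trippable through config files and error messages, since each member compares equal to its plain string value.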
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
`use_completions_api` is deprecated in favour of `completion_type`.

Running example:
Limitations with VLLM+GPT-OSS:
- `<channel>` or other template-related keywords in tool names

Summary by CodeRabbit
New Features
Deprecations / Breaking Changes
Documentation
Chores