Conversation
Signed-off-by: George Armstrong <georgea@nvidia.com>
Walkthrough

Adds a Responses API pathway and pluggable client-handler system for inference, new conversation managers/adapters for tool calling, a GenerationDefaults module, and a use_responses_api flag propagated through model factories and generation setup. OpenAI model logic is updated for reasoning-aware parameter handling.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant User
    participant GenerationTask
    participant Factory as Model Factory
    participant Model as BaseModel
    participant Handler as ClientHandler
    participant Provider as LLM API
    User->>GenerationTask: run(config.use_responses_api)
    GenerationTask->>Factory: get_*_model(use_responses_api)
    Factory-->>GenerationTask: model instance
    GenerationTask->>Model: generate_async(prompt, params)
    Model->>Handler: call_api_async(prompt, params)
    Handler->>Handler: build request (chat/responses)
    Handler->>Provider: API call
    Provider-->>Handler: response
    Handler->>Model: parse_response(...)
    Model-->>GenerationTask: generation (content, tool_calls, serialized_output)
```

```mermaid
sequenceDiagram
    autonumber
    participant TC as ToolCallingWrapper
    participant Conv as ConversationManager
    participant CI as CallInterpreter
    participant Tools as Tools Runtime
    Note over TC: Init chooses adapters<br/>based on use_responses_api
    TC->>Conv: add_assistant_response(conversation, generation)
    TC->>CI: parse(tool_calls)
    CI-->>TC: normalized tool_calls
    TC->>Tools: _execute_tool_calls(...)
    Tools-->>TC: tool_results
    TC->>Conv: add_tool_results(conversation, tool_calls_msg, formatted_results)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_skills/pipeline/utils/server.py (1)
155-169: Fix server startup for "responses"

serve_vllm.py doesn't expose a `/v1/responses` endpoint, so launching it for `server_type="responses"` will break ResponsesModel clients. Either add `/v1/responses` support to serve_vllm.py or invoke a dedicated serve_responses entrypoint, and update any server_type validations to include `"responses"`.
🧹 Nitpick comments (7)
nemo_skills/inference/model/tool_call.py (2)
71-89: Prefer capability flag over isinstance() to select adapters.

Avoid tight coupling to ResponsesModel; check a capability flag on the model instead.
Apply this diff:
```diff
-        # Import here to avoid circular imports
-        from .responses import ResponsesModel
-
-        # Detect model type and configure adapters
-        if isinstance(self.model, ResponsesModel):
+        # Detect model type and configure adapters
+        if getattr(self.model, "USES_RESPONSES_API", False):
             # Responses API model - uses flatter tool schema format
             self.schema_adapter = ResponsesSchemaAdapter()
             self.call_interpreter = ResponsesCallInterpreter()
             self.response_formatter = ResponsesResponseFormatter()
             self.conversation_manager = ResponsesConversationManager()
         else:
             # Chat completion model (default) - uses nested function format
             self.schema_adapter = OpenAISchemaAdapter()
             self.call_interpreter = ChatCompletionCallInterpreter()
             self.response_formatter = ChatCompletionResponseFormatter()
             self.conversation_manager = CompletionConversationManager()
```

Also set a flag on the model (outside this file):

```python
# in nemo_skills/inference/model/responses.py
class ResponsesModel(BaseModel):
    USES_RESPONSES_API = True
```
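The capability-flag pattern suggested above can be exercised on its own; the sketch below uses hypothetical stand-in classes (not the real nemo_skills adapters) to show why `getattr` with a default keeps the dispatch duck-typed and import-free:

```python
# Minimal sketch of capability-flag adapter selection.
# ChatAdapters/ResponsesAdapters/ChatModel/ResponsesLikeModel are stand-ins.
class ChatAdapters:
    name = "chat"

class ResponsesAdapters:
    name = "responses"

class ChatModel:
    pass  # no flag -> falls back to chat-completions adapters

class ResponsesLikeModel:
    USES_RESPONSES_API = True  # capability flag instead of isinstance()

def select_adapters(model):
    # Any model class can opt in by defining USES_RESPONSES_API = True;
    # no import of ResponsesModel is needed, so circular imports disappear.
    if getattr(model, "USES_RESPONSES_API", False):
        return ResponsesAdapters()
    return ChatAdapters()
```

The same check also works for third-party model classes that were never part of the original hierarchy, which isinstance() cannot do.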
99-103: Return richer error context on JSON arg parse failure.

Include tool name and raw args; avoid noisy stack traces for expected parse errors.

```diff
-        except json.decoder.JSONDecodeError as e:
-            LOG.exception(e)
-            return {"error": "Tool argument parsing failed."}
+        except json.decoder.JSONDecodeError:
+            LOG.warning("Tool argument parsing failed for %s", tool_name)
+            return {"error": "Tool argument parsing failed.", "tool": tool_name, "raw_args": tool_args}
```

nemo_skills/mcp/adapters.py (2)
186-197: Silence Ruff ARG002: unused parameter.

Rename the unused parameter to underscore.

```diff
     def add_tool_results(
         self,
         conversation: List[Dict[str, Any]],
-        tool_calls_message: Dict[str, Any],
+        _tool_calls_message: Dict[str, Any],
         formatted_results: List[Dict[str, Any]],
     ) -> None:
```
228-235: Align subclass signature with base interface to avoid type-checker conflicts.

The base ToolResponseFormatter.format expects a ChatCompletionMessageToolCall; this subclass narrows the type. Either relax the base interface or drop annotations here.

```diff
-    def format(self, tool_call: Dict[str, Any], result: Dict[str, Any]) -> Dict[str, Any]:
+    def format(self, tool_call, result):
         """Format the response from a tool call for responses API."""
         return {
             "type": "function_call_output",
             "call_id": tool_call["id"],
             "output": json.dumps(result) if not isinstance(result, str) else result,
         }
```

Optionally, broaden the base interface (outside this hunk) to accept Any for tool_call/result.
nemo_skills/inference/model/responses.py (3)
131-131: Rename unused parameter to underscore.

Prevents ARG002 lint and clarifies intent.

```diff
-    def _parse_responses_response(self, response, include_response: bool = False, **kwargs) -> dict:
+    def _parse_responses_response(self, response, include_response: bool = False, **_kwargs) -> dict:
```
270-272: Reduce response logging to debug (optional).

Logging full response objects at INFO can be noisy and may include sensitive content. Consider downgrading to DEBUG or summarizing.

```diff
-        LOG.info(f"Response from server: {response}")
-        LOG.info(f"Response type: {type(response)}")
+        LOG.debug("Response type: %s", type(response))
```
337-339: Reduce sync response logging to debug (optional).

Same rationale as the async path.

```diff
-        LOG.info(f"Response from server: {response}")
-        LOG.info(f"Response type: {type(response)}")
+        LOG.debug("Response type: %s", type(response))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- nemo_skills/inference/model/__init__.py (2 hunks)
- nemo_skills/inference/model/responses.py (1 hunks)
- nemo_skills/inference/model/tool_call.py (4 hunks)
- nemo_skills/mcp/adapters.py (3 hunks)
- nemo_skills/pipeline/utils/server.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/model/responses.py (4)
- nemo_skills/utils.py (1): get_logger_name (130-131)
- nemo_skills/inference/model/base.py (3): BaseModel (32-513), _build_request_params (193-200), _maybe_apply_stop_phrase_removal (149-153)
- nemo_skills/inference/model/context_retry.py (1): with_context_retry (162-182)
- nemo_skills/inference/model/tool_call.py (1): generate_async (122-183)
nemo_skills/inference/model/tool_call.py (1)
- nemo_skills/mcp/adapters.py (18): CompletionConversationManager (145-170), OpenAISchemaAdapter (70-83), ResponsesCallInterpreter (204-222), ResponsesConversationManager (173-196), ResponsesResponseFormatter (225-234), ResponsesSchemaAdapter (237-250), ChatCompletionCallInterpreter (103-122), ChatCompletionResponseFormatter (125-137), add_assistant_response (50-52, 148-156, 176-183), parse (35-36, 87-90, 112-122, 207-222), add_tool_results (55-62, 158-170, 185-196)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/responses.py
52-52: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
58-58: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
103-103: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
131-131: Unused method argument: kwargs
(ARG002)
204-204: Do not catch blind exception: Exception
(BLE001)
221-221: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
230-230: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
238-238: Prefer TypeError exception for invalid type
(TRY004)
238-238: Avoid specifying long messages outside the exception class
(TRY003)
291-291: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
300-300: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
308-308: Prefer TypeError exception for invalid type
(TRY004)
308-308: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/mcp/adapters.py
188-188: Unused method argument: tool_calls_message
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (3)
nemo_skills/inference/model/__init__.py (1)
31-31: Wire-up for ResponsesModel looks good.

Import and registry exposure for "responses" are correct.
Also applies to: 51-51
nemo_skills/pipeline/utils/server.py (1)
126-127: LGTM: mount check allowlist updated.

Including "responses" alongside vllm/sglang/trtllm is appropriate here.
nemo_skills/inference/model/tool_call.py (1)
159-161: Good centralization via conversation_manager.

Appending assistant responses through the manager keeps formats consistent across model types.
Signed-off-by: George Armstrong <georgea@nvidia.com>
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
nemo_skills/inference/generate.py (1)
193-202: Fix attribute access: self.cfg.use_completions_api → self.use_completions_api

GenerateSolutionsConfig has no cfg attribute. This will crash on the Megatron path.

Apply this diff:

```diff
-        self.cfg.use_completions_api = True
+        self.use_completions_api = True
```

nemo_skills/inference/model/tool_call.py (1)
92-100: Handle non-string tool arguments (dict) to avoid TypeError

Responses API may supply already-parsed dicts. Current code only catches JSONDecodeError and will raise on non-str types.

Apply this diff:

```diff
-            tool_args = tool_call["function"]["arguments"]
-
-            ##
-            # TODO(sanyamk): Not all tool arguments might necessarily be in JSON format.
-            # Kept here to handle errors for now.
-            try:
-                tool_args = json.loads(tool_args)
-            except json.decoder.JSONDecodeError as e:
-                LOG.exception(e)
-                return {"error": "Tool argument parsing failed."}
+            tool_args = tool_call["function"]["arguments"]
+            # Accept both JSON strings and dicts
+            if isinstance(tool_args, str):
+                try:
+                    tool_args = json.loads(tool_args)
+                except json.JSONDecodeError as e:
+                    LOG.exception("Tool argument parsing failed: %s", e)
+                    return {"error": "Tool argument parsing failed."}
+            elif not isinstance(tool_args, dict):
+                LOG.warning("Unexpected tool argument type: %s", type(tool_args).__name__)
+                tool_args = {}
```

nemo_skills/inference/model/base.py (1)
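The str-or-dict handling above can be factored into a small standalone helper; this is a sketch under the same assumptions, with a hypothetical function name (not part of the PR):

```python
import json
import logging

LOG = logging.getLogger(__name__)

def normalize_tool_args(raw):
    """Accept tool arguments as a JSON string or an already-parsed dict.

    Returns a dict of arguments, or a dict with an "error" key on parse
    failure, mirroring the behavior suggested in the review comment.
    """
    if isinstance(raw, dict):
        return raw  # Responses API may hand us parsed arguments already
    if isinstance(raw, str):
        try:
            return json.loads(raw)
        except json.JSONDecodeError as e:
            LOG.warning("Tool argument parsing failed: %s", e)
            return {"error": "Tool argument parsing failed."}
    # Anything else (int, None, ...) is unexpected: log and default to empty
    LOG.warning("Unexpected tool argument type: %s", type(raw).__name__)
    return {}
```

Keeping the normalization in one function makes the chat-completions and responses code paths share identical argument semantics.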
641-719: Handle potential UnboundLocalError in retry loop.

If the while loop exhausts all retries without returning, the result variable will be undefined on line 719, causing an UnboundLocalError.

Apply this diff to fix the UnboundLocalError:

```diff
 # TODO: remove this after we no longer use gpt-oss or it's fixed in vllm
 max_retries = 2
 retry_count = 0
+result = None
 while retry_count <= max_retries:
     try:
         # Delegate to client handler using public API
         response = await self.client_handler.call_api_async(prompt, **kwargs)
         if stream:
             return self._handle_streaming_response(response)
         else:
             result = self.client_handler.parse_response(response, **kwargs)
             if remove_stop_phrases:
                 self._maybe_apply_stop_phrase_removal(result, remove_stop_phrases, stop_phrases)
             return result
     except openai.BadRequestError as e:
         if "output messages (reasoning and final)" in str(e):
             if retry_count < max_retries:
                 retry_count += 1
                 LOG.warning(f"BadRequestError, retrying {retry_count}/{max_retries}: {e}")
                 continue
             LOG.error(f"BadRequestError after {max_retries} retries, returning empty response: {e}")
             return {"generation": "", "reasoning_content": "", "num_generated_tokens": 0}
         else:
             raise e
-return result
+# This should never be reached unless retry logic is incorrect
+LOG.error("Unexpected end of retry loop without result")
+return {"generation": "", "num_generated_tokens": 0}
```
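The failure mode above is generic: a loop whose only successful exit is a return inside the body leaves `result` unbound if every iteration hits `continue`. A minimal sketch of the safe shape, with a toy `call_api` callable standing in for the real client and `ValueError` standing in for the retryable API error:

```python
def call_with_retries(call_api, max_retries=2):
    """Retry call_api up to max_retries times; every code path returns
    an explicit value, so no unbound variable can escape the loop."""
    retry_count = 0
    while retry_count <= max_retries:
        try:
            return call_api()  # success: return directly from the loop body
        except ValueError:  # stand-in for the retryable error class
            if retry_count < max_retries:
                retry_count += 1
                continue
            # retries exhausted: return an explicit empty response
            return {"generation": "", "num_generated_tokens": 0}
    # Defensive fallback; unreachable with the logic above.
    return {"generation": "", "num_generated_tokens": 0}
```

Returning a sentinel after the loop (or initializing `result = None` before it, as the diff does) are equivalent fixes; the key invariant is that no path falls off the loop with an unbound name.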
🧹 Nitpick comments (5)
nemo_skills/inference/model/defaults.py (1)
41-79: De-duplicate param sets to prevent drift between CHAT_COMPLETION_PARAMS and RESPONSES_PARAMS

Both sets are currently identical. Define a single base set and derive others (add/remove as needed) to avoid future divergence bugs.

nemo_skills/inference/model/__init__.py (1)
nemo_skills/inference/model/__init__.py (1)
58-58: Optional: shorten exception message to appease Ruff TRY003

The current message length triggers TRY003. Consider a shorter message or a custom exception type.

nemo_skills/inference/model/tool_call.py (1)
nemo_skills/inference/model/tool_call.py (1)
45-46: Update outdated docstring

The wrapper now supports both chat-completions and responses via adapters; update the TODO.

nemo_skills/inference/model/base.py (2)
nemo_skills/inference/model/base.py (2)
593-615: Consider improving the fallback serialization strategy.

The current implementation silently catches all exceptions and falls back to string representation. Consider logging more specific error types or providing more structured fallback data.

Consider this improved error handling approach:

```diff
 def _serialize_response_output(self, response):
     """Serialize response output objects using model_dump() for conversation history."""
     serialized_output = []
     if hasattr(response, "output") and response.output:
         for output_item in response.output:
             try:
                 # Try to use model_dump() method if available (Pydantic models)
                 if hasattr(output_item, "model_dump"):
                     serialized_output.append(output_item.model_dump())
                 # Fallback to dict conversion
                 elif hasattr(output_item, "__dict__"):
                     serialized_output.append(output_item.__dict__)
                 # Last resort: convert to string representation
                 else:
                     serialized_output.append({"content": str(output_item), "type": "unknown"})
-            except Exception as e:
-                LOG.warning(f"Failed to serialize output item: {e}")
-                # Fallback serialization
-                serialized_output.append({"content": str(output_item), "type": "error", "error": str(e)})
+            except (AttributeError, TypeError) as e:
+                LOG.warning(f"Failed to serialize output item of type {type(output_item).__name__}: {e}")
+                # Fallback serialization with type information
+                serialized_output.append({
+                    "content": str(output_item),
+                    "type": "serialization_error",
+                    "original_type": type(output_item).__name__,
+                    "error": str(e),
+                })
+            except Exception as e:
+                LOG.error(f"Unexpected error serializing output item: {e}")
+                # Generic fallback for unexpected errors
+                serialized_output.append({"content": str(output_item), "type": "error", "error": str(e)})
     return serialized_output
```
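The fallback chain (model_dump → __dict__ → str) is easy to verify in isolation. A sketch with plain objects, where a duck-typed `model_dump` method simulates a Pydantic model (the class names here are illustrative only):

```python
def serialize_item(item):
    """Serialize one output item: model_dump() -> __dict__ -> str fallback."""
    try:
        if hasattr(item, "model_dump"):
            return item.model_dump()  # Pydantic-style models
        if hasattr(item, "__dict__"):
            return dict(item.__dict__)  # plain objects with attributes
        return {"content": str(item), "type": "unknown"}  # last resort
    except (AttributeError, TypeError) as e:
        # Structured fallback so conversation history records the failure
        return {"content": str(item), "type": "serialization_error", "error": str(e)}

class DumpableItem:
    def model_dump(self):
        return {"type": "message", "content": "hi"}

class PlainItem:
    def __init__(self):
        self.type = "reasoning"
```

Note that builtins like `int` have neither `model_dump` nor an instance `__dict__`, so they land in the string-fallback branch, which is exactly the behavior the review comment wants made explicit.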
405-429: Consider extracting SSH tunnel configuration to a dataclass.

The SSH tunnel setup logic could be encapsulated in a configuration dataclass for better maintainability and reusability.

Would you like me to create a separate SSHTunnelConfig dataclass to encapsulate the SSH configuration logic and make it more reusable?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- nemo_skills/inference/generate.py (2 hunks)
- nemo_skills/inference/model/__init__.py (2 hunks)
- nemo_skills/inference/model/base.py (13 hunks)
- nemo_skills/inference/model/defaults.py (1 hunks)
- nemo_skills/inference/model/openai.py (4 hunks)
- nemo_skills/inference/model/tool_call.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
nemo_skills/inference/model/tool_call.py (1)
- nemo_skills/mcp/adapters.py (18): CompletionConversationManager (145-170), OpenAISchemaAdapter (70-83), ResponsesCallInterpreter (204-222), ResponsesConversationManager (173-196), ResponsesResponseFormatter (225-234), ResponsesSchemaAdapter (237-250), ChatCompletionCallInterpreter (103-122), ChatCompletionResponseFormatter (125-137), add_assistant_response (50-52, 148-156, 176-183), parse (35-36, 87-90, 112-122, 207-222), add_tool_results (55-62, 158-170, 185-196)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/__init__.py (3)
get_code_execution_model (62-70), get_tool_calling_model (97-113), get_model (53-59)
nemo_skills/inference/model/openai.py (1)
nemo_skills/inference/model/base.py (5)
get_supported_params (44-46, 150-151, 257-258, 501-503), apply_model_specific_params (505-507)
nemo_skills/inference/model/base.py (3)
nemo_skills/inference/model/openai.py (3)
get_supported_params (81-92), apply_model_specific_params (94-109), _get_api_key (60-72)

nemo_skills/inference/model/defaults.py (1)
GenerationDefaults (20-38)

nemo_skills/inference/model/context_retry.py (1)
ContextLimitRetryConfig (118-159)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/__init__.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/inference/model/base.py
73-75: Avoid specifying long messages outside the exception class
(TRY003)
310-310: Unused method argument: prompt
(ARG002)
310-310: Unused method argument: params
(ARG002)
312-312: Avoid specifying long messages outside the exception class
(TRY003)
314-314: Unused method argument: prompt
(ARG002)
318-318: Unused method argument: prompt
(ARG002)
400-400: Avoid specifying long messages outside the exception class
(TRY003)
505-505: Unused method argument: params
(ARG002)
609-609: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (9)
nemo_skills/inference/generate.py (2)
95-97: LGTM: client_type surfaced in config

Propagating client_type at the config level is correct and consistent with model init.

342-346: LGTM: client_type forwarded to code execution model

Forwarding client_type here maintains consistent handler selection.

nemo_skills/inference/model/tool_call.py (1)
nemo_skills/inference/model/tool_call.py (1)
71-86: Adapters selection looks correct

Using client_type to choose schema/interpreter/formatter/conversation manager is appropriate and avoids import-time cycles.

nemo_skills/inference/model/base.py (6)
nemo_skills/inference/model/base.py (6)
32-138: Well-designed handler abstraction with clear public API.

The new BaseClientHandler class provides a clean separation of concerns, clearly distinguishing between public methods that models can override and internal implementation details. The parameter extraction/validation logic with support for model-specific restrictions is particularly well thought out.

140-245: Clean implementation of chat completion handler.

The ChatCompletionHandler properly encapsulates litellm-specific logic, handles both chat and completion APIs, and correctly manages extra_body parameters for non-standard fields. The setup of httpx sessions with appropriate connection limits ensures efficient connection pooling.

247-325: Comprehensive ResponsesHandler implementation for the new API format.

The handler correctly uses the OpenAI client directly and properly structures requests for the responses API, including proper handling of tools and extra_body parameters.

517-592: Robust response parsing for the Responses API with proper fallbacks.

The implementation handles various response structures gracefully, including tool calls, reasoning content, and message content. The serialization logic for conversation history is comprehensive with good error handling.

375-404: Well-structured initialization with proper separation of concerns.

The initialization sequence is logical: set up networking, resolve API configuration, then initialize the handler. The public client_handler attribute allows for clean delegation while maintaining encapsulation.

48-86: extract_and_validate_params correctly applies model-level filtering

extract_and_validate_params intersects handler-level and OpenAI's get_supported_params (non-empty) and skips filtering for base models. No issues found.
MAINT switch to use_responses_api parameter

Signed-off-by: George Armstrong <georgea@nvidia.com>
Actionable comments posted: 2
🧹 Nitpick comments (2)
nemo_skills/inference/model/base.py (2)
310-313: Silence linters for unused args in completion-path error

Avoid ARG002/TRY003 nits; keep behavior unchanged.

```diff
-    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:
+    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:  # noqa: ARG002
         """Responses API doesn't support completion - raise error"""
         raise ValueError("ResponsesHandler only supports message lists, not string prompts")
```
711-736: Remove unreachable trailing return

result is always returned inside the loop; the trailing return can be dropped.

```diff
-    return result
+    # Unreachable
+    # return result
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- nemo_skills/inference/generate.py (2 hunks)
- nemo_skills/inference/model/__init__.py (2 hunks)
- nemo_skills/inference/model/base.py (13 hunks)
- nemo_skills/inference/model/openai.py (4 hunks)
- nemo_skills/inference/model/tool_call.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- nemo_skills/inference/generate.py
🧰 Additional context used
🧬 Code graph analysis (4)
nemo_skills/inference/model/tool_call.py (1)
- nemo_skills/mcp/adapters.py (18): CompletionConversationManager (145-170), OpenAISchemaAdapter (70-83), ResponsesCallInterpreter (204-222), ResponsesConversationManager (173-196), ResponsesResponseFormatter (225-234), ResponsesSchemaAdapter (237-250), ChatCompletionCallInterpreter (103-122), ChatCompletionResponseFormatter (125-137), add_assistant_response (50-52, 148-156, 176-183), parse (35-36, 87-90, 112-122, 207-222), add_tool_results (55-62, 158-170, 185-196)
nemo_skills/inference/model/__init__.py (1)
nemo_skills/inference/chat_interface/core.py (2)
get (136-151), sandbox (177-178)
nemo_skills/inference/model/openai.py (1)
nemo_skills/inference/model/base.py (5)
get_supported_params (44-46, 150-151, 257-258, 518-520), apply_model_specific_params (522-524)
nemo_skills/inference/model/base.py (3)
nemo_skills/inference/model/openai.py (3)
get_supported_params (81-92), apply_model_specific_params (94-109), _get_api_key (60-72)

nemo_skills/inference/model/defaults.py (1)
GenerationDefaults (20-38)

nemo_skills/inference/model/context_retry.py (1)
ContextLimitRetryConfig (118-159)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/__init__.py
65-65: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/inference/model/base.py
73-75: Avoid specifying long messages outside the exception class
(TRY003)
310-310: Unused method argument: prompt
(ARG002)
310-310: Unused method argument: params
(ARG002)
312-312: Avoid specifying long messages outside the exception class
(TRY003)
314-314: Unused method argument: prompt
(ARG002)
318-318: Unused method argument: prompt
(ARG002)
415-417: Avoid specifying long messages outside the exception class
(TRY003)
522-522: Unused method argument: params
(ARG002)
626-626: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
🔇 Additional comments (4)
nemo_skills/inference/model/__init__.py (2)
53-66: Normalize server_type, add unknown-type guard, and make trtllm check case-insensitive

Prevents KeyError on unknown inputs and fixes mixed-case trtllm from bypassing the guard.

```diff
-def get_model(server_type, use_responses_api=False, tokenizer=None, **kwargs):
+def get_model(server_type, use_responses_api=False, tokenizer=None, **kwargs):
 @@
-    model_class = models[server_type.lower()]
-    if server_type == "trtllm" and kwargs.get("enable_soft_fail", False):
+    st = server_type.lower()
+    if st not in models:
+        raise ValueError(f"Unknown server_type: {server_type}. Choices: {list(models.keys())}")
+    model_class = models[st]
+    if st == "trtllm" and kwargs.get("enable_soft_fail", False):
         if kwargs.get("context_limit_retry_strategy", None) is not None:
             raise ValueError("context_limit_retry_strategy is not supported for trtllm")
     return model_class(use_responses_api=use_responses_api, tokenizer=tokenizer, **kwargs)
```
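The normalize-once-then-guard pattern can be shown standalone; a sketch with a toy registry (string values stand in for the real model classes):

```python
# Hypothetical registry; real code maps names to model classes.
MODELS = {"openai": "OpenAIModel", "vllm": "VLLMModel", "trtllm": "TRTLLMModel"}

def resolve_model_class(server_type):
    """Lowercase the key once, validate against the registry, and reuse
    the normalized name for any type-specific checks afterward."""
    st = server_type.lower()
    if st not in MODELS:
        # Explicit error beats a bare KeyError: it lists valid choices
        raise ValueError(f"Unknown server_type: {server_type}. Choices: {list(MODELS)}")
    return MODELS[st]
```

Because every later comparison uses the normalized `st`, mixed-case inputs like "TRTLLM" can no longer slip past branch conditions that compare against the raw string.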
122-135: Fix incorrect keyword when delegating to get_model

Passing model=... raises TypeError; get_model expects server_type.

```diff
-    if isinstance(model, str):
-        model = get_model(model=model, use_responses_api=use_responses_api, tokenizer=tokenizer, **kwargs)
+    if isinstance(model, str):
+        model = get_model(server_type=model, use_responses_api=use_responses_api, tokenizer=tokenizer, **kwargs)
```

nemo_skills/inference/model/tool_call.py (1)
68-86: Adapters setup and conversation manager integration look good

Good separation by responses vs chat and centralized conversation updates.
Also applies to: 156-168
nemo_skills/inference/model/openai.py (1)
81-92: Replace self.model_name_or_path with self.model in reasoning checks

self.model_name_or_path is not defined on OpenAIModel; this will raise AttributeError at runtime.

```diff
-        if self.is_reasoning_model(self.model_name_or_path):
+        if self.is_reasoning_model(self.model):
             base_params -= OPENAI_REASONING_UNSUPPORTED
 @@
-        if self.is_reasoning_model(self.model_name_or_path):
+        if self.is_reasoning_model(self.model):
             # Reasoning model specific handling
             if "messages" in request:
                 request["messages"] = [
                     {**msg, "role": "developer"} if msg.get("role") == "system" else msg
                     for msg in request["messages"]
                 ]
             if params.get("reasoning_effort"):
                 request["reasoning_effort"] = params["reasoning_effort"]
```

Also applies to: 94-110
Signed-off-by: George Armstrong <georgea@nvidia.com>
Moves timeout to the main body parameters

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Actionable comments posted: 1
🧹 Nitpick comments (4)
nemo_skills/mcp/adapters.py (1)
181-189: Silence unused param in ResponsesConversationManager.add_tool_results.

Rename to underscore to satisfy Ruff ARG002 and clarify intent.

Apply this diff:

```diff
     def add_tool_results(
         self,
         conversation: List[Dict[str, Any]],
-        tool_calls_message: Dict[str, Any],
+        _tool_calls_message: Dict[str, Any],
         formatted_results: List[Dict[str, Any]],
     ) -> None:
```

As per static analysis.
nemo_skills/inference/model/base.py (3)
308-315: Suppress unused-arg warnings in ResponsesHandler.make_*_call.

The prompt argument isn't used; rename to underscore.

Apply this diff:

```diff
-    async def make_async_call(self, request: dict, prompt):
+    async def make_async_call(self, request: dict, _prompt):  # noqa: ARG002
 @@
-    def make_sync_call(self, request: dict, prompt):
+    def make_sync_call(self, request: dict, _prompt):  # noqa: ARG002
```
597-619: Narrow overly broad exception in _serialize_response_output.

Catching Exception hides real failures and trips linters.

Apply this diff:

```diff
-            except Exception as e:
+            except (AttributeError, TypeError, ValueError) as e:
                 LOG.warning(f"Failed to serialize output item: {e}")
                 # Fallback serialization
                 serialized_output.append({"content": str(output_item), "type": "error", "error": str(e)})
```
519-522: Suppress unused-arg warning for base apply_model_specific_params.

Subclasses use params; the base implementation doesn't.

Apply this diff:

```diff
-    def apply_model_specific_params(self, request: dict, params: dict) -> dict:
+    def apply_model_specific_params(self, request: dict, params: dict) -> dict:  # noqa: ARG002
         """Public method for model-specific parameter handling - override in subclasses"""
         return request
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- nemo_skills/inference/model/base.py (13 hunks)
- nemo_skills/mcp/adapters.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/model/base.py (6)
nemo_skills/inference/model/openai.py (3)
get_supported_params (79-90), apply_model_specific_params (92-107), _get_api_key (58-70)

nemo_skills/inference/model/defaults.py (1)
GenerationDefaults (20-38)

nemo_skills/inference/model/context_retry.py (1)
ContextLimitRetryConfig (118-159)

nemo_skills/inference/model/azure.py (1)
_get_api_key (32-39)

nemo_skills/inference/model/gemini.py (1)
_get_api_key (32-39)

nemo_skills/inference/model/megatron.py (1)
_parse_completion_response (72-106)
🪛 Ruff (0.13.1)
nemo_skills/mcp/adapters.py
184-184: Unused method argument: tool_calls_message
(ARG002)
nemo_skills/inference/model/base.py
69-71: Avoid specifying long messages outside the exception class
(TRY003)
304-304: Unused method argument: prompt
(ARG002)
304-304: Unused method argument: params
(ARG002)
306-306: Avoid specifying long messages outside the exception class
(TRY003)
308-308: Unused method argument: prompt
(ARG002)
312-312: Unused method argument: prompt
(ARG002)
412-414: Avoid specifying long messages outside the exception class
(TRY003)
519-519: Unused method argument: params
(ARG002)
613-613: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
🔇 Additional comments (5)
nemo_skills/mcp/adapters.py (2)
146-171: ConversationManager abstraction and default/chat implementations look solid.

Interfaces and responsibilities are clear; integration points are cohesive.
200-219: Do not silently fall back to 'unknown' for tool_call id/name; raise on malformed input.

Fail fast if a tool_call lacks id or name to avoid corrupting conversation state later.

Apply this diff:

```diff
 class ResponsesCallInterpreter(ToolCallInterpreter):
 @@
     def parse(self, tool_calls: List[Any]) -> Dict[str, Any]:
         """Parse tool calls from responses API format."""
         parsed_calls = []
         for tool_call in tool_calls:
-            parsed_call = {
-                "type": "function",
-                "id": getattr(tool_call, "call_id", getattr(tool_call, "id", "unknown")),
-                "function": {
-                    "name": getattr(tool_call, "name", "unknown"),
-                    "arguments": getattr(tool_call, "arguments", "{}"),
-                },
-            }
+            call_id = getattr(tool_call, "call_id", getattr(tool_call, "id", None))
+            name = getattr(tool_call, "name", None)
+            args = getattr(tool_call, "arguments", "{}")
+            if not call_id or not name:
+                raise ValueError(f"Malformed tool_call: missing id or name: {tool_call}")
+            parsed_call = {
+                "type": "function",
+                "id": call_id,
+                "function": {"name": name, "arguments": args},
+            }
             parsed_calls.append(parsed_call)
         return {"role": "assistant", "tool_calls": parsed_calls}
```

nemo_skills/inference/model/base.py (3)
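The fail-fast validation suggested above is straightforward to isolate; a sketch using `SimpleNamespace` objects to mimic the attribute-style tool-call objects the Responses API returns (function name is hypothetical):

```python
from types import SimpleNamespace

def parse_tool_call(tool_call):
    """Normalize one responses-API tool call; raise on missing id or name
    instead of silently inserting "unknown" placeholders."""
    call_id = getattr(tool_call, "call_id", getattr(tool_call, "id", None))
    name = getattr(tool_call, "name", None)
    args = getattr(tool_call, "arguments", "{}")
    if not call_id or not name:
        # Raising here surfaces the bug at parse time, before the bad
        # entry can corrupt the conversation history.
        raise ValueError(f"Malformed tool_call: missing id or name: {tool_call!r}")
    return {"type": "function", "id": call_id, "function": {"name": name, "arguments": args}}
```

Raising at the parse boundary keeps downstream code (tool execution, result formatting) free of defensive checks for placeholder ids.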
304-306: Suppress unused-arg warnings in ResponsesHandler.build_completion_request_structure.

Keep the signature but mark unused arguments to satisfy linters.

Apply this diff:

```diff
-    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:
+    def build_completion_request_structure(self, _prompt: str, _params: dict) -> dict:  # noqa: ARG002
         """Responses API doesn't support completion - raise error"""
-        raise ValueError("ResponsesHandler only supports message lists, not string prompts")
+        raise ValueError("ResponsesHandler only supports message lists, not string prompts")  # noqa: TRY003
```
62-71: Param validation correctly allows None-equals-default for unsupported params. Prevents false positives when callers pass default None. Good.
261-301: ResponsesHandler: move timeout to top-level; block stream until supported. Timeout in extra_body is ignored by the OpenAI client, and streaming for Responses isn't parsed by _handle_streaming_response and will break.

Apply this diff:

```diff
     def build_chat_request_structure(self, messages: list, params: dict) -> dict:
         """Build responses API request structure"""
         # Use proper list of dicts format for vLLM servers
         request = {
             "input": messages,
             "max_output_tokens": params["tokens_to_generate"],
             "temperature": params["temperature"],
             "top_p": params["top_p"],
             "stream": params["stream"],
         }
+
+        # Pass timeout to client at top-level
+        if params["timeout"] is not None:
+            request["timeout"] = params["timeout"]
+
+        # Guard until streaming parsing for Responses is implemented
+        if params["stream"]:
+            raise ValueError("stream=True is not supported for ResponsesHandler yet.")
@@
-        if params["timeout"] is not None:
-            extra_body["timeout"] = params["timeout"]
```
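As a standalone illustration of the behavior this suggestion asks for, here is a minimal sketch of such a builder. It is not the PR's exact code; the parameter names simply mirror the ones used in the diff above:

```python
def build_chat_request_structure(messages: list, params: dict) -> dict:
    """Sketch of a Responses-style request builder (illustrative only)."""
    # Reject streaming up front until Responses streaming is parsed downstream.
    if params.get("stream"):
        raise ValueError("stream=True is not supported for ResponsesHandler yet.")
    request = {
        "input": messages,
        "max_output_tokens": params["tokens_to_generate"],
        "temperature": params["temperature"],
        "top_p": params["top_p"],
        "stream": False,
    }
    # Timeout goes top-level so the client honors it (extra_body would be ignored).
    if params.get("timeout") is not None:
        request["timeout"] = params["timeout"]
    return request

req = build_chat_request_structure(
    [{"role": "user", "content": "hi"}],
    {"tokens_to_generate": 64, "temperature": 0.0, "top_p": 1.0, "stream": False, "timeout": 30},
)
```

With these inputs, `req` carries the timeout at the top level and never places it in `extra_body`.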
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
nemo_skills/inference/model/base.py (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/model/base.py (5)
nemo_skills/inference/model/defaults.py (1)
  GenerationDefaults (20-38)
nemo_skills/inference/model/context_retry.py (1)
  ContextLimitRetryConfig (118-159)
nemo_skills/inference/model/gemini.py (1)
  _get_api_key (32-39)
nemo_skills/inference/model/azure.py (1)
  _get_api_key (32-39)
nemo_skills/inference/model/megatron.py (1)
  _parse_completion_response (72-106)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/base.py
69-71: Avoid specifying long messages outside the exception class
(TRY003)
278-278: Avoid specifying long messages outside the exception class
(TRY003)
312-312: Unused method argument: prompt
(ARG002)
312-312: Unused method argument: params
(ARG002)
314-314: Avoid specifying long messages outside the exception class
(TRY003)
316-316: Unused method argument: prompt
(ARG002)
320-320: Unused method argument: prompt
(ARG002)
420-422: Avoid specifying long messages outside the exception class
(TRY003)
527-527: Unused method argument: params
(ARG002)
621-621: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
```python
        for param_name in all_param_names:
            if param_name in kwargs:
                params[param_name] = kwargs[param_name]
            else:
                params[param_name] = getattr(self.defaults, param_name)

        return params
```
Restore defaults when params are left unset
When a caller omits tokens_to_generate (the common path, since the public API defaults it to None), extract_and_validate_params now forwards None straight through. That means the subsequent request builders send max_tokens=None / max_output_tokens=None, which OpenAI/vLLM reject with a 400. We need to reapply the GenerationDefaults value whenever the provided value is None and the default is not, so the happy path keeps working.
```diff
-        for param_name in all_param_names:
-            if param_name in kwargs:
-                params[param_name] = kwargs[param_name]
-            else:
-                params[param_name] = getattr(self.defaults, param_name)
+        for param_name in all_param_names:
+            if param_name in kwargs:
+                value = kwargs[param_name]
+                if value is None and hasattr(self.defaults, param_name):
+                    default_value = getattr(self.defaults, param_name)
+                    if default_value is not None:
+                        params[param_name] = default_value
+                        continue
+                params[param_name] = value
+            else:
+                params[param_name] = getattr(self.defaults, param_name)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        for param_name in all_param_names:
-            if param_name in kwargs:
-                params[param_name] = kwargs[param_name]
-            else:
-                params[param_name] = getattr(self.defaults, param_name)
-
-        return params
+        for param_name in all_param_names:
+            if param_name in kwargs:
+                value = kwargs[param_name]
+                if value is None and hasattr(self.defaults, param_name):
+                    default_value = getattr(self.defaults, param_name)
+                    if default_value is not None:
+                        params[param_name] = default_value
+                        continue
+                params[param_name] = value
+            else:
+                params[param_name] = getattr(self.defaults, param_name)
+
+        return params
```
🤖 Prompt for AI Agents
In nemo_skills/inference/model/base.py around lines 76 to 82, the current logic
copies values from kwargs even when they are present but set to None, which
causes None to be forwarded for parameters like tokens_to_generate and leads to
downstream 400 errors; change the logic so that for each param_name you use
kwargs[param_name] only if it exists in kwargs and is not None, otherwise fall
back to getattr(self.defaults, param_name) (i.e., treat explicit None as "unset"
and restore the GenerationDefaults value).
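The fallback behavior this review asks for can be sketched in isolation. The dataclass below is a hypothetical stand-in for `GenerationDefaults` (field names and values are made up for illustration):

```python
from dataclasses import dataclass


@dataclass
class Defaults:
    # Hypothetical stand-in for GenerationDefaults.
    tokens_to_generate: int = 512
    temperature: float = 0.0


def resolve_params(defaults: Defaults, **kwargs) -> dict:
    """Treat an explicit None the same as 'unset': fall back to the default."""
    params = {}
    for name in ("tokens_to_generate", "temperature"):
        value = kwargs.get(name)
        if value is None:  # omitted or explicitly None -> restore the default
            value = getattr(defaults, name)
        params[name] = value
    return params


defaults = Defaults()
restored = resolve_params(defaults, tokens_to_generate=None)  # None falls back
explicit = resolve_params(defaults, tokens_to_generate=128)   # explicit value wins
```

This keeps the happy path working: a caller who omits `tokens_to_generate` (or passes the public API's default of `None`) still gets a concrete value in the request.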
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: George Armstrong <georgea@nvidia.com>
Actionable comments posted: 2
🧹 Nitpick comments (5)
nemo_skills/inference/model/tool_call.py (1)
105-107: Avoid logging.exception(e); let the logger attach the traceback. Pass a message, not the exception object.

```diff
-        except Exception as e:
-            LOG.exception(e)
+        except Exception:
+            LOG.exception("Tool execution failed")
             return {"error": "Tool execution failed."}
```

Based on static analysis hints
nemo_skills/inference/model/base.py (4)
308-311: Remove duplicate return. The second return is unreachable.

```diff
         return request
-        return request
```
316-323: Silence unused-arg warnings in ResponsesHandler call methods. `prompt` isn't used here.

```diff
-    async def make_async_call(self, request: dict, prompt):
+    async def make_async_call(self, request: dict, prompt):  # noqa: ARG002
@@
-    def make_sync_call(self, request: dict, prompt):
+    def make_sync_call(self, request: dict, prompt):  # noqa: ARG002
```
452-461: Fix the return type to allow None from `_setup_base_url` when base_url is ''. Prevents type mismatches.

```diff
-def _setup_base_url(self, base_url: str | None, use_v1_endpoint: bool) -> str:
+def _setup_base_url(self, base_url: str | None, use_v1_endpoint: bool) -> str | None:
```
628-634: Async streaming should use async chunk iterators. `_generate_async` returns a generator; consider picking async iterators here to avoid blocking on async streams.

```diff
-        if hasattr(response, "choices") and hasattr(response.choices[0], "message"):
-            return self._stream_chat_chunks_sync(response)
-        else:
-            return self._stream_completion_chunks_sync(response)
+        if hasattr(response, "choices") and hasattr(response.choices[0], "message"):
+            return self._stream_chat_chunks_async(response)
+        else:
+            return self._stream_completion_chunks_async(response)
```
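To illustrate why the async path matters: an async stream must be consumed with `async for`, so a sync iterator over it would block the event loop. A minimal self-contained sketch (the names are illustrative, not the PR's):

```python
import asyncio


async def stream_chunks_async(chunks):
    # An async generator yields as chunks arrive without blocking the loop.
    async for chunk in chunks:
        yield chunk.upper()


async def fake_stream():
    # Stand-in for an async streaming API response.
    for piece in ("a", "b"):
        await asyncio.sleep(0)  # simulate awaiting the next chunk
        yield piece


async def main():
    return [c async for c in stream_chunks_async(fake_stream())]


result = asyncio.run(main())
```

A sync `for` over `fake_stream()` would raise a `TypeError`, which is the failure mode the review is warning about.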
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
nemo_skills/inference/model/base.py (13 hunks)
nemo_skills/inference/model/tool_call.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/model/tool_call.py (1)
nemo_skills/mcp/adapters.py (18)
  CompletionConversationManager (145-170), OpenAISchemaAdapter (70-83), ResponsesCallInterpreter (200-218), ResponsesConversationManager (173-192), ResponsesResponseFormatter (221-230), ResponsesSchemaAdapter (233-246), ChatCompletionCallInterpreter (103-122), ChatCompletionResponseFormatter (125-137), add_assistant_response (50-52, 148-156, 176-179), parse (35-36, 87-90, 112-122, 203-218), add_tool_results (55-62, 158-170, 181-192)
nemo_skills/inference/model/base.py (4)
nemo_skills/inference/model/openai.py (3)
get_supported_params(79-90)apply_model_specific_params(92-107)_get_api_key(58-70)nemo_skills/inference/model/defaults.py (1)
GenerationDefaults(20-38)nemo_skills/evaluation/metrics/base.py (1)
update(145-189)nemo_skills/inference/model/context_retry.py (1)
ContextLimitRetryConfig(118-159)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/tool_call.py
98-98: Redundant exception object included in logging.exception call
(TRY401)
nemo_skills/inference/model/base.py
69-71: Avoid specifying long messages outside the exception class
(TRY003)
278-278: Avoid specifying long messages outside the exception class
(TRY003)
312-312: Unused method argument: prompt
(ARG002)
312-312: Unused method argument: params
(ARG002)
314-314: Avoid specifying long messages outside the exception class
(TRY003)
316-316: Unused method argument: prompt
(ARG002)
320-320: Unused method argument: prompt
(ARG002)
420-422: Avoid specifying long messages outside the exception class
(TRY003)
527-527: Unused method argument: params
(ARG002)
621-621: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
🔇 Additional comments (5)
nemo_skills/inference/model/tool_call.py (2)
71-85: Good adapter/conversation manager selection. Clean split between chat-completions and responses API with centralized conversation management. This will keep the tool-calling flow consistent across clients.
156-168: LGTM: unified conversation updates. Using ConversationManager for both assistant and tool-result updates simplifies the flow and keeps formats correct for each client type.
nemo_skills/inference/model/base.py (3)
261-279: Timeout placement and stream guard for Responses API look correct. Setting timeout as a top-level request option and explicitly rejecting stream=True until implemented is the right call.
312-315: Silence unused-arg warnings on unsupported completion path. Keep the signature but mark the args as intentionally unused.

```diff
-    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:
+    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:  # noqa: ARG002
```
76-82: Treat explicit None as "unset" to restore defaults. Forwarding None (e.g., tokens_to_generate=None) breaks downstream APIs. Fall back to GenerationDefaults when the provided value is None and the default is not None.

```diff
-        for param_name in all_param_names:
-            if param_name in kwargs:
-                params[param_name] = kwargs[param_name]
-            else:
-                params[param_name] = getattr(self.defaults, param_name)
+        for param_name in all_param_names:
+            if param_name in kwargs:
+                value = kwargs[param_name]
+                if value is None and hasattr(self.defaults, param_name):
+                    default_value = getattr(self.defaults, param_name)
+                    if default_value is not None:
+                        params[param_name] = default_value
+                        continue
+                params[param_name] = value
+            else:
+                params[param_name] = getattr(self.defaults, param_name)
```
```python
    def _resolve_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
        """Resolve API key from various sources"""
        resolved_key = self._get_api_key(api_key, api_key_env_var, base_url)
        if resolved_key is None:  # self-hosted models don't need the key
            resolved_key = "EMPTY"
        return resolved_key

    def _get_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
        if api_key:  # explicit cmd argument always takes precedence
```
🧩 Analysis chain
Allow None base_url in API-key resolution; this avoids a TypeError in subclasses.
_base_url may be None; loosen the types accordingly and update overrides to match.

```diff
-def _resolve_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
+def _resolve_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str | None) -> str | None:
@@
-    def _get_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
+    def _get_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str | None) -> str | None:
```

Please align subclass overrides (e.g., openai.py, azure.py, gemini.py) to accept `base_url: str | None`.
🏁 Script executed:

```bash
rg -n "def _.*api_key" -g "nemo_skills/inference/model/*.py"
```

Length of output: 740
Loosen base_url type to allow None:
- In nemo_skills/inference/model/base.py, change both `_resolve_api_key` and `_get_api_key` signatures to `base_url: str | None`.
- In nemo_skills/inference/model/openai.py, azure.py, and gemini.py, update their `_get_api_key` methods to accept `base_url: str | None` as well.
🧰 Tools
🪛 Ruff (0.13.1)
469-469: Unused method argument: base_url
(ARG002)
🤖 Prompt for AI Agents
In nemo_skills/inference/model/base.py around lines 462 to 470, loosen the type
of the base_url parameter from str to str | None in both _resolve_api_key and
_get_api_key signatures (i.e., change base_url: str to base_url: str | None) and
adjust any internal uses to handle None safely; also update the corresponding
_get_api_key method signatures in nemo_skills/inference/model/openai.py,
nemo_skills/inference/model/azure.py, and nemo_skills/inference/model/gemini.py
to accept base_url: str | None so the types align across implementations.
```diff
             try:
                 tool_args = json.loads(tool_args)
             except json.decoder.JSONDecodeError as e:
-                LOG.exception(e)
+                LOG.exception(f"Failed to parse tool arguments {tool_args}: {e}")
                 return {"error": "Tool argument parsing failed."}
```
Handle non-string tool args and avoid JSONDecode-only catch
Responses API may pass dict args; json.loads on dict raises TypeError. Also avoid embedding the exception in logging.exception.
Apply:
```diff
-    try:
-        tool_args = json.loads(tool_args)
-    except json.decoder.JSONDecodeError as e:
-        LOG.exception(f"Failed to parse tool arguments {tool_args}: {e}")
-        return {"error": "Tool argument parsing failed."}
+    try:
+        tool_args = json.loads(tool_args) if isinstance(tool_args, str) else tool_args
+    except (json.JSONDecodeError, TypeError):
+        LOG.exception("Failed to parse tool arguments %r", tool_args)
+        return {"error": "Tool argument parsing failed."}
```
+ return {"error": "Tool argument parsing failed."}Based on static analysis hints
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-            try:
-                tool_args = json.loads(tool_args)
-            except json.decoder.JSONDecodeError as e:
-                LOG.exception(f"Failed to parse tool arguments {tool_args}: {e}")
-                return {"error": "Tool argument parsing failed."}
+            try:
+                tool_args = json.loads(tool_args) if isinstance(tool_args, str) else tool_args
+            except (json.JSONDecodeError, TypeError):
+                LOG.exception("Failed to parse tool arguments %r", tool_args)
+                return {"error": "Tool argument parsing failed."}
```
🧰 Tools
🪛 Ruff (0.13.1)
98-98: Redundant exception object included in logging.exception call
(TRY401)
🤖 Prompt for AI Agents
In nemo_skills/inference/model/tool_call.py around lines 95-99, the code
currently calls json.loads on tool_args and only catches JSONDecodeError, which
fails for non-string inputs (e.g., dict) and incorrectly embeds the exception in
LOG.exception; update the logic to first check the type: if tool_args is already
a dict, use it directly; if it's a string, call json.loads inside a try/except
that catches json.decoder.JSONDecodeError and logs with LOG.exception("Failed to
parse tool arguments") without interpolating the exception object; for any other
types return a clear error or coerce safely (e.g., convert bytes to str before
parsing) so TypeError is avoided and logging follows best practices.
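A minimal self-contained version of the type-aware parsing this comment describes (function name and shape are illustrative, not the PR's exact code):

```python
import json
import logging

LOG = logging.getLogger(__name__)


def parse_tool_args(tool_args):
    """Accept dict args as-is (Responses API) and JSON-decode string args."""
    if isinstance(tool_args, dict):
        return tool_args
    try:
        return json.loads(tool_args)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers non-str/bytes inputs such as ints or lists.
        LOG.exception("Failed to parse tool arguments %r", tool_args)
        return {"error": "Tool argument parsing failed."}
```

Dict inputs pass straight through, valid JSON strings are decoded, and anything unparseable is logged with the traceback and turned into an error payload instead of crashing.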
Thanks for adding the responses api! I just want to share two small suggestions, in case they might be helpful:
Kipok
left a comment
@smahdavi4 could you also review this please? I looked through the changes on the high level, but didn't have a chance to dive into details
@smahdavi4 for your current comment I think @gwarmstrong found that responses api in litellm is in a fairly bad state and it's really hard to make it work, so going through a separate client is basically necessary until they fix it. About completion_type, I think maybe we can do it in another PR, that would be a breaking change and it's not fully clear how responses api will be eventually supported in litellm or other providers in a stable way, so we can also wait before doing a "first-class" support for it. But if we find the current structure inconvenient, we can definitely make this change
@gwarmstrong Since the changes are fairly big and touch many parts, could you please run slurm tests from this branch? Let me know if you need details on how to do this. I will also share another test offline that I will ask you to run for a non-public model that goes through tool-calling api to verify that nothing is broken there
```python
        if extra_body:
            request["extra_body"] = extra_body

        return request
```
```python
        """Public method for model-specific parameter handling - override in subclasses"""
        return request

    def parse_chat_completion_response(self, response, **kwargs) -> dict:
```
Is this function inherited anywhere? If not, we can remove this function and directly call _parse_chat_completion_response
```python
    """A helper function to create a tool calling model.

    Args:
        model: Model name/path or model instance
```
Some of the function docstrings and some code comments are not informative enough (probably LLM artifacts). Maybe we could remove them to avoid redundancy?
```python
                if hasattr(output_item, "content") and output_item.content:
                    for content_item in output_item.content:
                        if hasattr(content_item, "text"):
                            reasoning_content += content_item.text + "\n"
```
Shall we somehow keep the order of events? If we want to consider a scenario where we want to distill the outputs of a model into a another model, we need a list of interleaved text and tools.
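One way to preserve that order is a single pass over the output items that records each event as it appears. This is only a sketch with hypothetical stand-in item types, not the actual Responses API objects:

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    # Hypothetical stand-in for a Responses text output item.
    text: str


@dataclass
class ToolCallItem:
    # Hypothetical stand-in for a Responses tool-call output item.
    name: str
    arguments: str


def collect_events(output_items):
    """Walk the output list once, keeping text and tool calls interleaved in order."""
    events = []
    for item in output_items:
        if isinstance(item, TextItem):
            events.append({"type": "text", "text": item.text})
        elif isinstance(item, ToolCallItem):
            events.append({"type": "tool_call", "name": item.name, "arguments": item.arguments})
    return events


items = [TextItem("thinking..."), ToolCallItem("search", '{"q": "x"}'), TextItem("done")]
events = collect_events(items)
```

A distillation pipeline could then replay `events` verbatim, rather than concatenating all text into one blob and losing where each tool call happened.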
The PR looks great. Left some minor comments. For the litellm responses, it might be a good idea to comment somewhere in the code on the complications of the API, so we have a reference for when we come back to it and unify all the APIs. I tried the latest litellm with vllm tool calling and it seems to work very well (so I am not sure what there is to fix later).
Closed in favor of #889 |
Add Responses API Support for GPT-OSS Generation
This PR adds support for the OpenAI Responses API in nemo-skills, enabling generation with `gpt-oss` models that use the responses format instead of the standard chat completions API.

Key Changes
1. Client Handler Architecture
- `BaseClientHandler` system in `nemo_skills/inference/model/base.py`
- `ChatCompletionHandler`: handles standard chat completion and text completion APIs via litellm
- `ResponsesHandler`: handles the OpenAI Responses API with direct OpenAI client integration

2. Unified BaseModel with Client Type Support
- Added a `use_responses_api` parameter to BaseModel (default: false)

3. Strict Parameter Validation System
- Added `defaults.py` with shared parameter defaults and validation schemas

4. Enhanced Configuration Support
- Added `use_responses_api` to GenerateSolutionsConfig for direct configuration control
- Updated model factory helpers (`get_model`, `get_tool_calling_model`, etc.) to support `use_responses_api`

Usage Examples
Pipeline Integration
Summary by CodeRabbit
New Features
Refactor