Add Responses API #844

Closed
gwarmstrong wants to merge 21 commits into main from georgea/response-api

Conversation


@gwarmstrong gwarmstrong commented Sep 24, 2025

Add Responses API Support for GPT-OSS Generation

This PR adds support for the OpenAI Responses API in nemo-skills, enabling generation with gpt-oss models that use the responses format instead of the standard chat completions API.

Key Changes

1. Client Handler Architecture

  • New BaseClientHandler system in nemo_skills/inference/model/base.py
  • ChatCompletionHandler: Handles standard chat completion and text completion APIs via litellm
  • ResponsesHandler: Handles OpenAI Responses API with direct OpenAI client integration
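
The handler split described above can be sketched roughly as follows. This is an illustrative outline, not the actual implementation: class names match the PR, but method signatures and request/response shapes are simplified assumptions.

```python
from abc import ABC, abstractmethod


class BaseClientHandler(ABC):
    """Builds requests and parses responses for one API style (illustrative)."""

    @abstractmethod
    def build_request(self, prompt, params: dict) -> dict: ...

    @abstractmethod
    def parse_response(self, response) -> dict: ...


class ChatCompletionHandler(BaseClientHandler):
    def build_request(self, prompt, params):
        # Chat completions take a list of role/content messages
        return {"messages": prompt, **params}

    def parse_response(self, response):
        return {"generation": response["choices"][0]["message"]["content"]}


class ResponsesHandler(BaseClientHandler):
    def build_request(self, prompt, params):
        # The Responses API uses "input" instead of "messages"
        return {"input": prompt, **params}

    def parse_response(self, response):
        return {"generation": response["output_text"]}


# Registry keyed by the use_responses_api flag
CLIENT_HANDLERS = {False: ChatCompletionHandler, True: ResponsesHandler}
```

The point of the abstraction is that callers never branch on the API style; they pick a handler once and use the same two methods everywhere.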

2. Unified BaseModel with Client Type Support

  • Added use_responses_api parameter to BaseModel (default: False)
  • Removed litellm coupling from BaseModel - moved to ChatCompletionHandler
  • Unified generate methods that delegate to client handlers
  • Two-stage request building: Handler builds structure, model applies customizations
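
A minimal sketch of the two-stage flow, with a stub handler standing in for the real client handlers (method names here are illustrative):

```python
class _StubHandler:
    """Stand-in for a client handler (illustrative only)."""

    def build_request(self, prompt, params):
        return {"messages": prompt, **params}


class BaseModel:
    def __init__(self, use_responses_api: bool = False):
        # Stage 0: choose the handler once at construction time
        self.use_responses_api = use_responses_api
        self.client_handler = _StubHandler()

    def build_request(self, prompt, params: dict) -> dict:
        # Stage 1: the handler builds the API-specific request structure
        request = self.client_handler.build_request(prompt, params)
        # Stage 2: the model applies its own customizations on top
        return self.apply_model_specific_params(request)

    def apply_model_specific_params(self, request: dict) -> dict:
        # Subclasses override, e.g. to remap roles or rename parameters
        return request
```

A subclass only overrides stage 2; the handler owns the wire format.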

3. Strict Parameter Validation System

  • New defaults.py with shared parameter defaults and validation schemas
  • Explicit parameter handling with strict validation (errors on unsupported non-default params)
  • Model-specific restrictions (e.g., OpenAI reasoning models) in respective model files
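
The "errors on unsupported non-default params" behavior might look like this; the parameter names and defaults below are placeholders, not the actual contents of defaults.py:

```python
SUPPORTED_PARAMS = {"temperature", "top_p", "tokens_to_generate", "stop_phrases"}
DEFAULTS = {"temperature": 0.0, "top_p": 0.95, "tokens_to_generate": 2048, "stop_phrases": None}


def validate_params(params: dict, supported: set = SUPPORTED_PARAMS) -> dict:
    """Raise if an unsupported parameter is set to a non-default value.

    Unsupported params left at their defaults are silently dropped, so shared
    configs don't break when one backend lacks a knob another one has.
    """
    for name, value in params.items():
        if name not in supported and value != DEFAULTS.get(name):
            raise ValueError(f"Parameter {name!r} is not supported by this model/API")
    return {k: v for k, v in params.items() if k in supported}
```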

4. Enhanced Configuration Support

  • Added use_responses_api to GenerateSolutionsConfig for direct configuration control
  • Updated all factory functions (get_model, get_tool_calling_model, etc.) to support use_responses_api
  • Full tool calling support with responses API

Usage Examples

Pipeline Integration

from nemo_skills.pipeline.cli import generate, wrap_arguments

generate(
    ctx=wrap_arguments(
        "++use_responses_api=true "
        "++prompt_config=your/prompt/config "
        "++inference.tokens_to_generate=8192 "
        "++tool_modules=[nemo_skills.mcp.servers.python_tool::PythonTool] "
        "++inference.temperature=0.6 "
    ),
    input_file='input.jsonl',
    cluster='local',
    model='openai/gpt-oss-20b',
    server_type='vllm',  # Model provider
    server_gpus=2,
    output_dir='outputs/',
    num_random_seeds=1,
    with_sandbox=True,
)

Summary by CodeRabbit

  • New Features

    • Optional Responses API mode available across standard, tool-calling, and code-execution flows via a new toggle.
    • Unified tool-calling with improved conversation management and support for serialized/structured tool outputs.
    • Exposed generation config option to toggle Responses API and added shared generation defaults (tokens, temperature, stop phrases, streaming, tools, etc.).
  • Refactor

    • Pluggable request/response handlers for consistent API calls, parameter validation, and unified response parsing.
    • Enhanced compatibility with reasoning-aware models and automatic parameter mapping.
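
The "automatic parameter mapping" for reasoning-aware models could be sketched as below. The exact remapping rules (system → developer, dropping presence_penalty) are assumptions taken from the change summary, not verbatim code:

```python
def apply_reasoning_model_params(request: dict, is_reasoning_model: bool) -> dict:
    """Sketch of reasoning-aware parameter mapping (rules assumed, not verbatim)."""
    if not is_reasoning_model:
        return request
    remapped = dict(request)
    # Reasoning models take instructions under the "developer" role, not "system"
    remapped["messages"] = [
        {**msg, "role": "developer"} if msg.get("role") == "system" else msg
        for msg in remapped.get("messages", [])
    ]
    # presence_penalty is not accepted by these models; drop it rather than erroring
    remapped.pop("presence_penalty", None)
    return remapped
```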

Signed-off-by: George Armstrong <georgea@nvidia.com>

coderabbitai bot commented Sep 24, 2025

Walkthrough

Adds a Responses API pathway and pluggable client-handler system for inference, new conversation managers/adapters for tool calling, a GenerationDefaults module, and a use_responses_api flag propagated through model factories and generation setup. OpenAI model logic updated for reasoning-aware parameter handling.

Changes

  • Factory API updates (nemo_skills/inference/model/__init__.py): Added use_responses_api parameter to get_model, get_code_execution_model, and get_tool_calling_model; forwarded the flag through creation paths and updated docstrings.
  • Client-handler architecture (nemo_skills/inference/model/base.py): Added BaseClientHandler, ChatCompletionHandler, ResponsesHandler, and CLIENT_HANDLERS. BaseModel accepts use_responses_api, selects a handler, delegates request construction/calls/parsing, and exposes helper parsing/support methods.
  • OpenAI model param handling (nemo_skills/inference/model/openai.py): OpenAIModel now accepts use_responses_api; added reasoning-model detection, get_supported_params, and apply_model_specific_params to adjust request shapes (developer/system remapping, reasoning_effort, presence_penalty).
  • Tool calling adapters & managers (nemo_skills/inference/model/tool_call.py): Added _setup_adapters() to ToolCallingWrapper to pick schema/call/response adapters and a conversation_manager based on use_responses_api. generate_async now routes assistant responses and tool-result integration through the conversation manager; improved JSON error logging.
  • MCP adapters and conversation managers (nemo_skills/mcp/adapters.py): Added ConversationManager (abstract), CompletionConversationManager, ResponsesConversationManager, ResponsesSchemaAdapter, ResponsesCallInterpreter, and ResponsesResponseFormatter; expanded typing imports (Any, Dict).
  • Generation config plumbing (nemo_skills/inference/generate.py): Added GenerateSolutionsConfig.use_responses_api and forwarded it to get_model, get_code_execution_model, and get_tool_calling_model in GenerationTask.setup_llm().
  • Defaults module (nemo_skills/inference/model/defaults.py): New GenerationDefaults dataclass with shared generation parameters and parameter sets: CHAT_COMPLETION_PARAMS, RESPONSES_PARAMS, COMPLETION_PARAMS.
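
A rough sketch of what that defaults module could look like; the field values and exact set contents below are illustrative guesses, not the project's actual defaults:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenerationDefaults:
    # Values are illustrative placeholders, not the real project defaults
    tokens_to_generate: int = 2048
    temperature: float = 0.0
    top_p: float = 0.95
    stop_phrases: List[str] = field(default_factory=list)
    stream: bool = False
    tools: Optional[list] = None


# Which fields each API style accepts
COMPLETION_PARAMS = {"tokens_to_generate", "temperature", "top_p", "stop_phrases", "stream"}
CHAT_COMPLETION_PARAMS = COMPLETION_PARAMS | {"tools"}
RESPONSES_PARAMS = COMPLETION_PARAMS | {"tools"}
```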

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant GenerationTask
  participant Factory as Model Factory
  participant Model as BaseModel
  participant Handler as ClientHandler
  participant Provider as LLM API

  User->>GenerationTask: run(config.use_responses_api)
  GenerationTask->>Factory: get_*_model(use_responses_api)
  Factory-->>GenerationTask: model instance
  GenerationTask->>Model: generate_async(prompt, params)
  Model->>Handler: call_api_async(prompt, params)
  Handler->>Handler: build request (chat/responses)
  Handler->>Provider: API call
  Provider-->>Handler: response
  Handler->>Model: parse_response(...)
  Model-->>GenerationTask: generation (content, tool_calls, serialized_output)
sequenceDiagram
  autonumber
  participant TC as ToolCallingWrapper
  participant Conv as ConversationManager
  participant CI as CallInterpreter
  participant Tools as Tools Runtime

  Note over TC: Init chooses adapters<br/>based on use_responses_api
  TC->>Conv: add_assistant_response(conversation, generation)
  TC->>CI: parse(tool_calls)
  CI-->>TC: normalized tool_calls
  TC->>Tools: _execute_tool_calls(...)
  Tools-->>TC: tool_results
  TC->>Conv: add_tool_results(conversation, tool_calls_msg, formatted_results)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

I twitch my ears at toggled flows,
Two paths now ripple where one rose.
Handlers hop, adapters chime,
Conversations keep their time.
With tools and responses in one warren deep,
I ship this patch—and take a leap. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 70.59%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title "Add Responses API" succinctly reflects the primary goal of the changeset (introducing support for the Responses API) and is clear, concise, and directly related to the main feature.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_skills/pipeline/utils/server.py (1)

155-169: Fix server startup for “responses”
serve_vllm.py doesn’t expose a /v1/responses endpoint, so launching it for server_type="responses" will break ResponsesModel clients. Either add /v1/responses support to serve_vllm.py or invoke a dedicated serve_responses entrypoint, and update any server_type validations to include "responses".

🧹 Nitpick comments (7)
nemo_skills/inference/model/tool_call.py (2)

71-89: Prefer capability flag over isinstance() to select adapters.

Avoid tight coupling to ResponsesModel; check a capability flag on the model instead.

Apply this diff:

-        # Import here to avoid circular imports
-        from .responses import ResponsesModel
-
-        # Detect model type and configure adapters
-        if isinstance(self.model, ResponsesModel):
+        # Detect model type and configure adapters
+        if getattr(self.model, "USES_RESPONSES_API", False):
             # Responses API model - uses flatter tool schema format
             self.schema_adapter = ResponsesSchemaAdapter()
             self.call_interpreter = ResponsesCallInterpreter()
             self.response_formatter = ResponsesResponseFormatter()
             self.conversation_manager = ResponsesConversationManager()
         else:
             # Chat completion model (default) - uses nested function format
             self.schema_adapter = OpenAISchemaAdapter()
             self.call_interpreter = ChatCompletionCallInterpreter()
             self.response_formatter = ChatCompletionResponseFormatter()
             self.conversation_manager = CompletionConversationManager()

Also set a flag on the model (outside this file):

# in nemo_skills/inference/model/responses.py
class ResponsesModel(BaseModel):
    USES_RESPONSES_API = True

99-103: Return richer error context on JSON arg parse failure.

Include tool name and raw args; avoid noisy stack traces for expected parse errors.

-        except json.decoder.JSONDecodeError as e:
-            LOG.exception(e)
-            return {"error": "Tool argument parsing failed."}
+        except json.decoder.JSONDecodeError:
+            LOG.warning("Tool argument parsing failed for %s", tool_name)
+            return {"error": "Tool argument parsing failed.", "tool": tool_name, "raw_args": tool_args}
nemo_skills/mcp/adapters.py (2)

186-197: Silence Ruff ARG002: unused parameter.

Rename the unused parameter to underscore.

-    def add_tool_results(
+    def add_tool_results(
         self,
-        conversation: List[Dict[str, Any]],
-        tool_calls_message: Dict[str, Any],
-        formatted_results: List[Dict[str, Any]],
+        conversation: List[Dict[str, Any]],
+        _tool_calls_message: Dict[str, Any],
+        formatted_results: List[Dict[str, Any]],
     ) -> None:

228-235: Align subclass signature with base interface to avoid type-checker conflicts.

The base ToolResponseFormatter.format expects a ChatCompletionMessageToolCall; this subclass narrows the type. Either relax the base interface or drop annotations here.

-    def format(self, tool_call: Dict[str, Any], result: Dict[str, Any]) -> Dict[str, Any]:
+    def format(self, tool_call, result):
         """Format the response from a tool call for responses API."""
         return {
             "type": "function_call_output",
             "call_id": tool_call["id"],
             "output": json.dumps(result) if not isinstance(result, str) else result,
         }

Optionally, broaden the base interface (outside this hunk) to accept Any for tool_call/result.

nemo_skills/inference/model/responses.py (3)

131-131: Rename unused parameter to underscore.

Prevents ARG002 lint and clarifies intent.

-    def _parse_responses_response(self, response, include_response: bool = False, **kwargs) -> dict:
+    def _parse_responses_response(self, response, include_response: bool = False, **_kwargs) -> dict:

270-272: Reduce response logging to debug (optional).

Logging full response objects at INFO can be noisy and may include sensitive content. Consider downgrading to DEBUG or summarizing.

-        LOG.info(f"Response from server: {response}")
-        LOG.info(f"Response type: {type(response)}")
+        LOG.debug("Response type: %s", type(response))

337-339: Reduce sync response logging to debug (optional).

Same rationale as async path.

-        LOG.info(f"Response from server: {response}")
-        LOG.info(f"Response type: {type(response)}")
+        LOG.debug("Response type: %s", type(response))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c1c65b and 97b4f2f.

📒 Files selected for processing (5)
  • nemo_skills/inference/model/__init__.py (2 hunks)
  • nemo_skills/inference/model/responses.py (1 hunks)
  • nemo_skills/inference/model/tool_call.py (4 hunks)
  • nemo_skills/mcp/adapters.py (3 hunks)
  • nemo_skills/pipeline/utils/server.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/model/responses.py (4)
nemo_skills/utils.py (1)
  • get_logger_name (130-131)
nemo_skills/inference/model/base.py (3)
  • BaseModel (32-513)
  • _build_request_params (193-200)
  • _maybe_apply_stop_phrase_removal (149-153)
nemo_skills/inference/model/context_retry.py (1)
  • with_context_retry (162-182)
nemo_skills/inference/model/tool_call.py (1)
  • generate_async (122-183)
nemo_skills/inference/model/tool_call.py (1)
nemo_skills/mcp/adapters.py (18)
  • CompletionConversationManager (145-170)
  • OpenAISchemaAdapter (70-83)
  • ResponsesCallInterpreter (204-222)
  • ResponsesConversationManager (173-196)
  • ResponsesResponseFormatter (225-234)
  • ResponsesSchemaAdapter (237-250)
  • ChatCompletionCallInterpreter (103-122)
  • ChatCompletionResponseFormatter (125-137)
  • add_assistant_response (50-52)
  • add_assistant_response (148-156)
  • add_assistant_response (176-183)
  • parse (35-36)
  • parse (87-90)
  • parse (112-122)
  • parse (207-222)
  • add_tool_results (55-62)
  • add_tool_results (158-170)
  • add_tool_results (185-196)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/responses.py

52-52: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


58-58: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


103-103: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


131-131: Unused method argument: kwargs

(ARG002)


204-204: Do not catch blind exception: Exception

(BLE001)


221-221: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


230-230: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


238-238: Prefer TypeError exception for invalid type

(TRY004)


238-238: Avoid specifying long messages outside the exception class

(TRY003)


291-291: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


300-300: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


308-308: Prefer TypeError exception for invalid type

(TRY004)


308-308: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/mcp/adapters.py

188-188: Unused method argument: tool_calls_message

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (3)
nemo_skills/inference/model/__init__.py (1)

31-31: Wire-up for ResponsesModel looks good.

Import and registry exposure for "responses" are correct.

Also applies to: 51-51

nemo_skills/pipeline/utils/server.py (1)

126-127: LGTM: mount check allowlist updated.

Including "responses" alongside vllm/sglang/trtllm is appropriate here.

nemo_skills/inference/model/tool_call.py (1)

159-161: Good centralization via conversation_manager.

Appending assistant responses through the manager keeps formats consistent across model types.

Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
nemo_skills/inference/generate.py (1)

193-202: Fix attribute access: self.cfg.use_completions_api → self.use_completions_api

GenerateSolutionsConfig has no cfg attribute. This will crash on Megatron path.

Apply this diff:

-            self.cfg.use_completions_api = True
+            self.use_completions_api = True
nemo_skills/inference/model/tool_call.py (1)

92-100: Handle non-string tool arguments (dict) to avoid TypeError

Responses API may supply already-parsed dicts. Current code only catches JSONDecodeError and will raise on non-str types.

Apply this diff:

-        tool_args = tool_call["function"]["arguments"]
-
-        ##
-        # TODO(sanyamk): Not all tool arguments might necessarily be in JSON format.
-        #   Kept here to handle errors for now.
-        try:
-            tool_args = json.loads(tool_args)
-        except json.decoder.JSONDecodeError as e:
-            LOG.exception(e)
-            return {"error": "Tool argument parsing failed."}
+        tool_args = tool_call["function"]["arguments"]
+        # Accept both JSON strings and dicts
+        if isinstance(tool_args, str):
+            try:
+                tool_args = json.loads(tool_args)
+            except json.JSONDecodeError as e:
+                LOG.exception("Tool argument parsing failed: %s", e)
+                return {"error": "Tool argument parsing failed."}
+        elif not isinstance(tool_args, dict):
+            LOG.warning("Unexpected tool argument type: %s", type(tool_args).__name__)
+            tool_args = {}
nemo_skills/inference/model/base.py (1)

641-719: Handle potential UnboundLocalError in retry loop.

If the while loop exhausts all retries without returning, the result variable will be undefined on line 719, causing an UnboundLocalError.

Apply this diff to fix the UnboundLocalError:

 # TODO: remove this after we no longer use gpt-oss or it's fixed in vllm
 max_retries = 2
 retry_count = 0
+result = None
 
 while retry_count <= max_retries:
     try:
         # Delegate to client handler using public API
         response = await self.client_handler.call_api_async(prompt, **kwargs)
 
         if stream:
             return self._handle_streaming_response(response)
         else:
             result = self.client_handler.parse_response(response, **kwargs)
             if remove_stop_phrases:
                 self._maybe_apply_stop_phrase_removal(result, remove_stop_phrases, stop_phrases)
             return result
 
     except openai.BadRequestError as e:
         if "output messages (reasoning and final)" in str(e):
             if retry_count < max_retries:
                 retry_count += 1
                 LOG.warning(f"BadRequestError, retrying {retry_count}/{max_retries}: {e}")
                 continue
 
             LOG.error(f"BadRequestError after {max_retries} retries, returning empty response: {e}")
             return {"generation": "", "reasoning_content": "", "num_generated_tokens": 0}
         else:
             raise e
 
-return result
+# This should never be reached unless retry logic is incorrect
+LOG.error("Unexpected end of retry loop without result")
+return {"generation": "", "num_generated_tokens": 0}
🧹 Nitpick comments (5)
nemo_skills/inference/model/defaults.py (1)

41-79: De-duplicate param sets to prevent drift between CHAT_COMPLETION_PARAMS and RESPONSES_PARAMS

Both sets are currently identical. Define a single base set and derive others (add/remove as needed) to avoid future divergence bugs.

nemo_skills/inference/model/__init__.py (1)

58-58: Optional: shorten exception message to appease Ruff TRY003

Current message length triggers TRY003. Consider a shorter message or a custom exception type.

nemo_skills/inference/model/tool_call.py (1)

45-46: Update outdated docstring

The wrapper now supports both chat-completions and responses via adapters; update the TODO.

nemo_skills/inference/model/base.py (2)

593-615: Consider improving the fallback serialization strategy.

The current implementation silently catches all exceptions and falls back to string representation. Consider logging more specific error types or providing more structured fallback data.

Consider this improved error handling approach:

 def _serialize_response_output(self, response):
     """Serialize response output objects using model_dump() for conversation history."""
     serialized_output = []
 
     if hasattr(response, "output") and response.output:
         for output_item in response.output:
             try:
                 # Try to use model_dump() method if available (Pydantic models)
                 if hasattr(output_item, "model_dump"):
                     serialized_output.append(output_item.model_dump())
                 # Fallback to dict conversion
                 elif hasattr(output_item, "__dict__"):
                     serialized_output.append(output_item.__dict__)
                 # Last resort: convert to string representation
                 else:
                     serialized_output.append({"content": str(output_item), "type": "unknown"})
-            except Exception as e:
-                LOG.warning(f"Failed to serialize output item: {e}")
-                # Fallback serialization
-                serialized_output.append({"content": str(output_item), "type": "error", "error": str(e)})
+            except (AttributeError, TypeError) as e:
+                LOG.warning(f"Failed to serialize output item of type {type(output_item).__name__}: {e}")
+                # Fallback serialization with type information
+                serialized_output.append({
+                    "content": str(output_item), 
+                    "type": "serialization_error", 
+                    "original_type": type(output_item).__name__,
+                    "error": str(e)
+                })
+            except Exception as e:
+                LOG.error(f"Unexpected error serializing output item: {e}")
+                # Generic fallback for unexpected errors
+                serialized_output.append({"content": str(output_item), "type": "error", "error": str(e)})
 
     return serialized_output

405-429: Consider extracting SSH tunnel configuration to a dataclass.

The SSH tunnel setup logic could be encapsulated in a configuration dataclass for better maintainability and reusability.

Would you like me to create a separate SSHTunnelConfig dataclass to encapsulate the SSH configuration logic and make it more reusable?
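
A possible shape for such a dataclass; every field name here is hypothetical, since the actual SSH setup code isn't shown in this comment:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SSHTunnelConfig:
    """Bundle of SSH tunnel settings (field names are hypothetical)."""

    host: str
    user: str
    port: int = 22
    key_path: Optional[str] = None

    @property
    def enabled(self) -> bool:
        # A tunnel is only meaningful when both host and user are provided
        return bool(self.host and self.user)
```

Freezing the dataclass keeps the tunnel settings immutable once the model is constructed.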

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 97b4f2f and 5688efe.

📒 Files selected for processing (6)
  • nemo_skills/inference/generate.py (2 hunks)
  • nemo_skills/inference/model/__init__.py (2 hunks)
  • nemo_skills/inference/model/base.py (13 hunks)
  • nemo_skills/inference/model/defaults.py (1 hunks)
  • nemo_skills/inference/model/openai.py (4 hunks)
  • nemo_skills/inference/model/tool_call.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
nemo_skills/inference/model/tool_call.py (1)
nemo_skills/mcp/adapters.py (18)
  • CompletionConversationManager (145-170)
  • OpenAISchemaAdapter (70-83)
  • ResponsesCallInterpreter (204-222)
  • ResponsesConversationManager (173-196)
  • ResponsesResponseFormatter (225-234)
  • ResponsesSchemaAdapter (237-250)
  • ChatCompletionCallInterpreter (103-122)
  • ChatCompletionResponseFormatter (125-137)
  • add_assistant_response (50-52)
  • add_assistant_response (148-156)
  • add_assistant_response (176-183)
  • parse (35-36)
  • parse (87-90)
  • parse (112-122)
  • parse (207-222)
  • add_tool_results (55-62)
  • add_tool_results (158-170)
  • add_tool_results (185-196)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/__init__.py (3)
  • get_code_execution_model (62-70)
  • get_tool_calling_model (97-113)
  • get_model (53-59)
nemo_skills/inference/model/openai.py (1)
nemo_skills/inference/model/base.py (5)
  • get_supported_params (44-46)
  • get_supported_params (150-151)
  • get_supported_params (257-258)
  • get_supported_params (501-503)
  • apply_model_specific_params (505-507)
nemo_skills/inference/model/base.py (3)
nemo_skills/inference/model/openai.py (3)
  • get_supported_params (81-92)
  • apply_model_specific_params (94-109)
  • _get_api_key (60-72)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/__init__.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/inference/model/base.py

73-75: Avoid specifying long messages outside the exception class

(TRY003)


310-310: Unused method argument: prompt

(ARG002)


310-310: Unused method argument: params

(ARG002)


312-312: Avoid specifying long messages outside the exception class

(TRY003)


314-314: Unused method argument: prompt

(ARG002)


318-318: Unused method argument: prompt

(ARG002)


400-400: Avoid specifying long messages outside the exception class

(TRY003)


505-505: Unused method argument: params

(ARG002)


609-609: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (9)
nemo_skills/inference/generate.py (2)

95-97: LGTM: client_type surfaced in config

Propagating client_type at the config level is correct and consistent with model init.


342-346: LGTM: client_type forwarded to code execution model

Forwarding client_type here maintains consistent handler selection.

nemo_skills/inference/model/tool_call.py (1)

71-86: Adapters selection looks correct

Using client_type to choose schema/interpreter/formatter/conversation manager is appropriate and avoids import-time cycles.

nemo_skills/inference/model/base.py (6)

32-138: Well-designed handler abstraction with clear public API.

The new BaseClientHandler class provides a clean separation of concerns, clearly distinguishing between public methods that models can override and internal implementation details. The parameter extraction/validation logic with support for model-specific restrictions is particularly well thought out.


140-245: Clean implementation of chat completion handler.

The ChatCompletionHandler properly encapsulates litellm-specific logic, handles both chat and completion APIs, and correctly manages extra_body parameters for non-standard fields. The setup of httpx sessions with appropriate connection limits ensures efficient connection pooling.


247-325: Comprehensive ResponsesHandler implementation for the new API format.

The handler correctly uses the OpenAI client directly and properly structures requests for the responses API, including proper handling of tools and extra_body parameters.


517-592: Robust response parsing for the Responses API with proper fallbacks.

The implementation handles various response structures gracefully, including tool calls, reasoning content, and message content. The serialization logic for conversation history is comprehensive with good error handling.


375-404: Well-structured initialization with proper separation of concerns.

The initialization sequence is logical - setup networking, resolve API configuration, then initialize the handler. The public client_handler attribute allows for clean delegation while maintaining encapsulation.


48-86: extract_and_validate_params correctly applies model-level filtering
extract_and_validate_params intersects handler-level and OpenAI’s get_supported_params (non-empty) and skips filtering for base models. No issues found.

Signed-off-by: George Armstrong <georgea@nvidia.com>

MAINT switch to use_responses_api parameter

Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
nemo_skills/inference/model/base.py (2)

310-313: Silence linters for unused args in completion-path error

Avoid ARG002/TRY003 nits; keep behavior unchanged.

-    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:
+    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:  # noqa: ARG002
         """Responses API doesn't support completion - raise error"""
         raise ValueError("ResponsesHandler only supports message lists, not string prompts")

711-736: Remove unreachable trailing return

result is always returned inside the loop; trailing return can be dropped.

-        return result
+        # Unreachable
+        # return result
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 5688efe and 4e3a5a4.

📒 Files selected for processing (5)
  • nemo_skills/inference/generate.py (2 hunks)
  • nemo_skills/inference/model/__init__.py (2 hunks)
  • nemo_skills/inference/model/base.py (13 hunks)
  • nemo_skills/inference/model/openai.py (4 hunks)
  • nemo_skills/inference/model/tool_call.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • nemo_skills/inference/generate.py
🧰 Additional context used
🧬 Code graph analysis (4)
nemo_skills/inference/model/tool_call.py (1)
nemo_skills/mcp/adapters.py (18)
  • CompletionConversationManager (145-170)
  • OpenAISchemaAdapter (70-83)
  • ResponsesCallInterpreter (204-222)
  • ResponsesConversationManager (173-196)
  • ResponsesResponseFormatter (225-234)
  • ResponsesSchemaAdapter (237-250)
  • ChatCompletionCallInterpreter (103-122)
  • ChatCompletionResponseFormatter (125-137)
  • add_assistant_response (50-52)
  • add_assistant_response (148-156)
  • add_assistant_response (176-183)
  • parse (35-36)
  • parse (87-90)
  • parse (112-122)
  • parse (207-222)
  • add_tool_results (55-62)
  • add_tool_results (158-170)
  • add_tool_results (185-196)
nemo_skills/inference/model/__init__.py (1)
nemo_skills/inference/chat_interface/core.py (2)
  • get (136-151)
  • sandbox (177-178)
nemo_skills/inference/model/openai.py (1)
nemo_skills/inference/model/base.py (5)
  • get_supported_params (44-46)
  • get_supported_params (150-151)
  • get_supported_params (257-258)
  • get_supported_params (518-520)
  • apply_model_specific_params (522-524)
nemo_skills/inference/model/base.py (3)
nemo_skills/inference/model/openai.py (3)
  • get_supported_params (81-92)
  • apply_model_specific_params (94-109)
  • _get_api_key (60-72)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/__init__.py

65-65: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/inference/model/base.py

73-75: Avoid specifying long messages outside the exception class

(TRY003)


310-310: Unused method argument: prompt

(ARG002)


310-310: Unused method argument: params

(ARG002)


312-312: Avoid specifying long messages outside the exception class

(TRY003)


314-314: Unused method argument: prompt

(ARG002)


318-318: Unused method argument: prompt

(ARG002)


415-417: Avoid specifying long messages outside the exception class

(TRY003)


522-522: Unused method argument: params

(ARG002)


626-626: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (4)
nemo_skills/inference/model/__init__.py (2)

53-66: Normalize server_type, add unknown-type guard, and make trtllm check case-insensitive

Prevents a KeyError on unknown inputs and stops mixed-case "trtllm" from bypassing the guard.

-def get_model(server_type, use_responses_api=False, tokenizer=None, **kwargs):
+def get_model(server_type, use_responses_api=False, tokenizer=None, **kwargs):
@@
-    model_class = models[server_type.lower()]
-    if server_type == "trtllm" and kwargs.get("enable_soft_fail", False):
+    st = server_type.lower()
+    if st not in models:
+        raise ValueError(f"Unknown server_type: {server_type}. Choices: {list(models.keys())}")
+    model_class = models[st]
+    if st == "trtllm" and kwargs.get("enable_soft_fail", False):
         if kwargs.get("context_limit_retry_strategy", None) is not None:
             raise ValueError("context_limit_retry_strategy is not supported for trtllm")
     return model_class(use_responses_api=use_responses_api, tokenizer=tokenizer, **kwargs)

122-135: Fix incorrect keyword when delegating to get_model

Passing model=... raises TypeError; get_model expects server_type.

-    if isinstance(model, str):
-        model = get_model(model=model, use_responses_api=use_responses_api, tokenizer=tokenizer, **kwargs)
+    if isinstance(model, str):
+        model = get_model(server_type=model, use_responses_api=use_responses_api, tokenizer=tokenizer, **kwargs)
nemo_skills/inference/model/tool_call.py (1)

68-86: Adapters setup and conversation manager integration look good

Good separation by responses vs chat and centralized conversation updates.

Also applies to: 156-168

nemo_skills/inference/model/openai.py (1)

81-92: Replace self.model_name_or_path with self.model in reasoning checks

self.model_name_or_path is not defined on OpenAIModel; this will raise AttributeError at runtime.

-        if self.is_reasoning_model(self.model_name_or_path):
+        if self.is_reasoning_model(self.model):
             base_params -= OPENAI_REASONING_UNSUPPORTED
@@
-        if self.is_reasoning_model(self.model_name_or_path):
+        if self.is_reasoning_model(self.model):
             # Reasoning model specific handling
             if "messages" in request:
                 request["messages"] = [
                     {**msg, "role": "developer"} if msg.get("role") == "system" else msg for msg in request["messages"]
                 ]
             if params.get("reasoning_effort"):
                 request["reasoning_effort"] = params["reasoning_effort"]

Also applies to: 94-110

gwarmstrong and others added 5 commits September 25, 2025 15:21
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Moves timeout to the main body parameters

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
nemo_skills/mcp/adapters.py (1)

181-189: Silence unused param in ResponsesConversationManager.add_tool_results.

Rename to underscore to satisfy Ruff ARG002 and clarify intent.

Apply this diff:

-    def add_tool_results(
-        self,
-        conversation: List[Dict[str, Any]],
-        tool_calls_message: Dict[str, Any],
-        formatted_results: List[Dict[str, Any]],
-    ) -> None:
+    def add_tool_results(
+        self,
+        conversation: List[Dict[str, Any]],
+        _tool_calls_message: Dict[str, Any],
+        formatted_results: List[Dict[str, Any]],
+    ) -> None:  # noqa: ARG002

As per static analysis.

nemo_skills/inference/model/base.py (3)

308-315: Suppress unused-arg warnings in ResponsesHandler.make_*_call.

prompt isn’t used; rename to underscore.

Apply this diff:

-    async def make_async_call(self, request: dict, prompt):
+    async def make_async_call(self, request: dict, _prompt):  # noqa: ARG002
@@
-    def make_sync_call(self, request: dict, prompt):
+    def make_sync_call(self, request: dict, _prompt):  # noqa: ARG002

597-619: Narrow overly broad exception in _serialize_response_output.

Catching Exception hides real failures and trips linters.

Apply this diff:

-                except Exception as e:
+                except (AttributeError, TypeError, ValueError) as e:
                     LOG.warning(f"Failed to serialize output item: {e}")
                     # Fallback serialization
                     serialized_output.append({"content": str(output_item), "type": "error", "error": str(e)})

519-522: Suppress unused-arg warning for base apply_model_specific_params.

Subclasses use params; base impl doesn’t.

Apply this diff:

-    def apply_model_specific_params(self, request: dict, params: dict) -> dict:
+    def apply_model_specific_params(self, request: dict, params: dict) -> dict:  # noqa: ARG002
         """Public method for model-specific parameter handling - override in subclasses"""
         return request
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 8428242 and f8c6367.

📒 Files selected for processing (2)
  • nemo_skills/inference/model/base.py (13 hunks)
  • nemo_skills/mcp/adapters.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/model/base.py (6)
nemo_skills/inference/model/openai.py (3)
  • get_supported_params (79-90)
  • apply_model_specific_params (92-107)
  • _get_api_key (58-70)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
nemo_skills/inference/model/azure.py (1)
  • _get_api_key (32-39)
nemo_skills/inference/model/gemini.py (1)
  • _get_api_key (32-39)
nemo_skills/inference/model/megatron.py (1)
  • _parse_completion_response (72-106)
🪛 Ruff (0.13.1)
nemo_skills/mcp/adapters.py

184-184: Unused method argument: tool_calls_message

(ARG002)

nemo_skills/inference/model/base.py

69-71: Avoid specifying long messages outside the exception class

(TRY003)


304-304: Unused method argument: prompt

(ARG002)


304-304: Unused method argument: params

(ARG002)


306-306: Avoid specifying long messages outside the exception class

(TRY003)


308-308: Unused method argument: prompt

(ARG002)


312-312: Unused method argument: prompt

(ARG002)


412-414: Avoid specifying long messages outside the exception class

(TRY003)


519-519: Unused method argument: params

(ARG002)


613-613: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (5)
nemo_skills/mcp/adapters.py (2)

146-171: ConversationManager abstraction and default/chat implementations look solid.

Interfaces and responsibilities are clear; integration points are cohesive.


200-219: Do not silently fall back to 'unknown' for tool_call id/name; raise on malformed input.

Fail fast if a tool_call lacks id or name to avoid corrupting conversation state later.

Apply this diff:

 class ResponsesCallInterpreter(ToolCallInterpreter):
@@
-    def parse(self, tool_calls: List[Any]) -> Dict[str, Any]:
+    def parse(self, tool_calls: List[Any]) -> Dict[str, Any]:
         """Parse tool calls from responses API format."""
         parsed_calls = []
 
         for tool_call in tool_calls:
-            parsed_call = {
-                "type": "function",
-                "id": getattr(tool_call, "call_id", getattr(tool_call, "id", "unknown")),
-                "function": {
-                    "name": getattr(tool_call, "name", "unknown"),
-                    "arguments": getattr(tool_call, "arguments", "{}"),
-                },
-            }
+            call_id = getattr(tool_call, "call_id", getattr(tool_call, "id", None))
+            name = getattr(tool_call, "name", None)
+            args = getattr(tool_call, "arguments", "{}")
+            if not call_id or not name:
+                raise ValueError(f"Malformed tool_call: missing id or name: {tool_call}")
+            parsed_call = {
+                "type": "function",
+                "id": call_id,
+                "function": {"name": name, "arguments": args},
+            }
             parsed_calls.append(parsed_call)
 
         return {"role": "assistant", "tool_calls": parsed_calls}
nemo_skills/inference/model/base.py (3)

304-306: Suppress unused-arg warnings in ResponsesHandler.build_completion_request_structure.

Keep signature but mark unused to satisfy linters.

Apply this diff:

-    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:
+    def build_completion_request_structure(self, _prompt: str, _params: dict) -> dict:  # noqa: ARG002
         """Responses API doesn't support completion - raise error"""
-        raise ValueError("ResponsesHandler only supports message lists, not string prompts")
+        raise ValueError("ResponsesHandler only supports message lists, not string prompts")  # noqa: TRY003

62-71: Param validation correctly allows None-equals-default for unsupported params.

Prevents false positives when callers pass default None. Good.


261-301: ResponsesHandler: move timeout to top-level; block stream until supported.

Timeout in extra_body is ignored by OpenAI client; streaming for Responses isn't parsed by _handle_streaming_response and will break.

Apply this diff:

     def build_chat_request_structure(self, messages: list, params: dict) -> dict:
         """Build responses API request structure"""
         # Use proper list of dicts format for vLLM servers
         request = {
             "input": messages,
             "max_output_tokens": params["tokens_to_generate"],
             "temperature": params["temperature"],
             "top_p": params["top_p"],
-            "stream": params["stream"],
+            "stream": params["stream"],
         }
+
+        # Pass timeout to client at top-level
+        if params["timeout"] is not None:
+            request["timeout"] = params["timeout"]
+
+        # Guard until streaming parsing for Responses is implemented
+        if params["stream"]:
+            raise ValueError("stream=True is not supported for ResponsesHandler yet.")
@@
-        if params["timeout"] is not None:
-            extra_body["timeout"] = params["timeout"]

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between f8c6367 and ab7ad54.

📒 Files selected for processing (1)
  • nemo_skills/inference/model/base.py (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/model/base.py (5)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
nemo_skills/inference/model/gemini.py (1)
  • _get_api_key (32-39)
nemo_skills/inference/model/azure.py (1)
  • _get_api_key (32-39)
nemo_skills/inference/model/megatron.py (1)
  • _parse_completion_response (72-106)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/base.py

69-71: Avoid specifying long messages outside the exception class

(TRY003)


278-278: Avoid specifying long messages outside the exception class

(TRY003)


312-312: Unused method argument: prompt

(ARG002)


312-312: Unused method argument: params

(ARG002)


314-314: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Unused method argument: prompt

(ARG002)


320-320: Unused method argument: prompt

(ARG002)


420-422: Avoid specifying long messages outside the exception class

(TRY003)


527-527: Unused method argument: params

(ARG002)


621-621: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests

Comment on lines +76 to +82
        for param_name in all_param_names:
            if param_name in kwargs:
                params[param_name] = kwargs[param_name]
            else:
                params[param_name] = getattr(self.defaults, param_name)

        return params

⚠️ Potential issue | 🔴 Critical

Restore defaults when params are left unset

When a caller omits tokens_to_generate (the common path, since the public API defaults it to None), extract_and_validate_params now forwards None straight through. That means the subsequent request builders send max_tokens=None / max_output_tokens=None, which OpenAI/vLLM reject with a 400. We need to reapply the GenerationDefaults value whenever the provided value is None and the default is not, so the happy path keeps working.

-        for param_name in all_param_names:
-            if param_name in kwargs:
-                params[param_name] = kwargs[param_name]
-            else:
-                params[param_name] = getattr(self.defaults, param_name)
+        for param_name in all_param_names:
+            if param_name in kwargs:
+                value = kwargs[param_name]
+                if value is None and hasattr(self.defaults, param_name):
+                    default_value = getattr(self.defaults, param_name)
+                    if default_value is not None:
+                        params[param_name] = default_value
+                        continue
+                params[param_name] = value
+            else:
+                params[param_name] = getattr(self.defaults, param_name)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        for param_name in all_param_names:
            if param_name in kwargs:
                params[param_name] = kwargs[param_name]
            else:
                params[param_name] = getattr(self.defaults, param_name)

        return params

        for param_name in all_param_names:
            if param_name in kwargs:
                value = kwargs[param_name]
                if value is None and hasattr(self.defaults, param_name):
                    default_value = getattr(self.defaults, param_name)
                    if default_value is not None:
                        params[param_name] = default_value
                        continue
                params[param_name] = value
            else:
                params[param_name] = getattr(self.defaults, param_name)

        return params
🤖 Prompt for AI Agents
In nemo_skills/inference/model/base.py around lines 76 to 82, the current logic
copies values from kwargs even when they are present but set to None, which
causes None to be forwarded for parameters like tokens_to_generate and leads to
downstream 400 errors; change the logic so that for each param_name you use
kwargs[param_name] only if it exists in kwargs and is not None, otherwise fall
back to getattr(self.defaults, param_name) (i.e., treat explicit None as "unset"
and restore the GenerationDefaults value).
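The fix reduces to a "None means unset" merge over the defaults. A standalone sketch of that pattern, with hypothetical names (`GenerationDefaultsSketch` and `merge_params` are illustrative stand-ins, not the actual nemo-skills API):

```python
from dataclasses import dataclass


@dataclass
class GenerationDefaultsSketch:
    # Hypothetical stand-in for GenerationDefaults; field names illustrative.
    tokens_to_generate: int = 2048
    temperature: float = 0.0


def merge_params(defaults, **kwargs):
    """Merge caller kwargs over defaults, treating an explicit None as unset."""
    merged = {}
    for name in vars(defaults):
        value = kwargs.get(name)
        # An absent key and an explicit None both fall back to the default.
        merged[name] = getattr(defaults, name) if value is None else value
    return merged
```

Note this sketch assumes no default is legitimately None; a field whose default is None would need the extra `default_value is not None` check from the diff above.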

gwarmstrong and others added 3 commits September 26, 2025 13:03
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
nemo_skills/inference/model/tool_call.py (1)

105-107: Avoid logging.exception(e); let logger attach the traceback

Pass a message, not the exception object.

-        except Exception as e:
-            LOG.exception(e)
+        except Exception:
+            LOG.exception("Tool execution failed")
             return {"error": "Tool execution failed."}

Based on static analysis hints

nemo_skills/inference/model/base.py (4)

308-311: Remove duplicate return

Second return is unreachable.

-        return request
-
         return request

316-323: Silence unused-arg warnings in ResponsesHandler call methods

Prompt isn’t used here.

-    async def make_async_call(self, request: dict, prompt):
+    async def make_async_call(self, request: dict, prompt):  # noqa: ARG002
@@
-    def make_sync_call(self, request: dict, prompt):
+    def make_sync_call(self, request: dict, prompt):  # noqa: ARG002

452-461: Fix return type to allow None from _setup_base_url when base_url is ''

Prevents type mismatches.

-def _setup_base_url(self, base_url: str | None, use_v1_endpoint: bool) -> str:
+def _setup_base_url(self, base_url: str | None, use_v1_endpoint: bool) -> str | None:

628-634: Async streaming should use async chunk iterators

_generate_async returns a generator; consider picking async iterators here to avoid blocking on async streams.

-        if hasattr(response, "choices") and hasattr(response.choices[0], "message"):
-            return self._stream_chat_chunks_sync(response)
-        else:
-            return self._stream_completion_chunks_sync(response)
+        if hasattr(response, "choices") and hasattr(response.choices[0], "message"):
+            return self._stream_chat_chunks_async(response)
+        else:
+            return self._stream_completion_chunks_async(response)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between ab7ad54 and 503a9f1.

📒 Files selected for processing (2)
  • nemo_skills/inference/model/base.py (13 hunks)
  • nemo_skills/inference/model/tool_call.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/model/tool_call.py (1)
nemo_skills/mcp/adapters.py (18)
  • CompletionConversationManager (145-170)
  • OpenAISchemaAdapter (70-83)
  • ResponsesCallInterpreter (200-218)
  • ResponsesConversationManager (173-192)
  • ResponsesResponseFormatter (221-230)
  • ResponsesSchemaAdapter (233-246)
  • ChatCompletionCallInterpreter (103-122)
  • ChatCompletionResponseFormatter (125-137)
  • add_assistant_response (50-52)
  • add_assistant_response (148-156)
  • add_assistant_response (176-179)
  • parse (35-36)
  • parse (87-90)
  • parse (112-122)
  • parse (203-218)
  • add_tool_results (55-62)
  • add_tool_results (158-170)
  • add_tool_results (181-192)
nemo_skills/inference/model/base.py (4)
nemo_skills/inference/model/openai.py (3)
  • get_supported_params (79-90)
  • apply_model_specific_params (92-107)
  • _get_api_key (58-70)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/evaluation/metrics/base.py (1)
  • update (145-189)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/tool_call.py

98-98: Redundant exception object included in logging.exception call

(TRY401)

nemo_skills/inference/model/base.py

69-71: Avoid specifying long messages outside the exception class

(TRY003)


278-278: Avoid specifying long messages outside the exception class

(TRY003)


312-312: Unused method argument: prompt

(ARG002)


312-312: Unused method argument: params

(ARG002)


314-314: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Unused method argument: prompt

(ARG002)


320-320: Unused method argument: prompt

(ARG002)


420-422: Avoid specifying long messages outside the exception class

(TRY003)


527-527: Unused method argument: params

(ARG002)


621-621: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (5)
nemo_skills/inference/model/tool_call.py (2)

71-85: Good adapter/conversation manager selection

Clean split between chat-completions and responses API with centralized conversation management. This will keep tool-calling flow consistent across clients.


156-168: LGTM: unified conversation updates

Using ConversationManager for both assistant and tool-result updates simplifies the flow and keeps formats correct for each client type.

nemo_skills/inference/model/base.py (3)

261-279: Timeout placement and stream guard for Responses API look correct

Setting timeout as a top-level request option and explicitly rejecting stream=True until implemented is the right call.


312-315: Silence unused-arg warnings on unsupported completion path

Keep signature but mark args as intentionally unused.

-    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:
+    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:  # noqa: ARG002

76-82: Treat explicit None as “unset” to restore defaults

Forwarding None (e.g., tokens_to_generate=None) breaks downstream APIs. Fall back to GenerationDefaults when provided value is None and default is not None.

-        for param_name in all_param_names:
-            if param_name in kwargs:
-                params[param_name] = kwargs[param_name]
-            else:
-                params[param_name] = getattr(self.defaults, param_name)
+        for param_name in all_param_names:
+            if param_name in kwargs:
+                value = kwargs[param_name]
+                if value is None and hasattr(self.defaults, param_name):
+                    default_value = getattr(self.defaults, param_name)
+                    if default_value is not None:
+                        params[param_name] = default_value
+                        continue
+                params[param_name] = value
+            else:
+                params[param_name] = getattr(self.defaults, param_name)

Comment on lines +462 to 470
    def _resolve_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
        """Resolve API key from various sources"""
        resolved_key = self._get_api_key(api_key, api_key_env_var, base_url)
        if resolved_key is None:  # self-hosted models don't need the key
            resolved_key = "EMPTY"
        return resolved_key

    def _get_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
        if api_key:  # explicit cmd argument always takes precedence

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Allow None base_url in API-key resolution; avoids TypeError in subclasses

_base_url may be None; loosen types accordingly. Update overrides to match.

-def _resolve_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
+def _resolve_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str | None) -> str | None:
@@
-    def _get_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
+    def _get_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str | None) -> str | None:

Please align subclass overrides (e.g., openai.py, azure.py, gemini.py) to accept base_url: str | None.


🏁 Script executed:

#!/bin/bash
rg -n "def _.*api_key" -g "nemo_skills/inference/model/*.py"

Length of output: 740


Loosen base_url type to allow None

  • In nemo_skills/inference/model/base.py, change both _resolve_api_key and _get_api_key signatures to base_url: str | None.
  • In nemo_skills/inference/model/openai.py, azure.py, and gemini.py, update their _get_api_key methods to accept base_url: str | None as well.
🧰 Tools
🪛 Ruff (0.13.1)

469-469: Unused method argument: base_url

(ARG002)

🤖 Prompt for AI Agents
In nemo_skills/inference/model/base.py around lines 462 to 470, loosen the type
of the base_url parameter from str to str | None in both _resolve_api_key and
_get_api_key signatures (i.e., change base_url: str to base_url: str | None) and
adjust any internal uses to handle None safely; also update the corresponding
_get_api_key method signatures in nemo_skills/inference/model/openai.py,
nemo_skills/inference/model/azure.py, and nemo_skills/inference/model/gemini.py
to accept base_url: str | None so the types align across implementations.

Comment on lines 95 to 99
        try:
            tool_args = json.loads(tool_args)
        except json.decoder.JSONDecodeError as e:
            LOG.exception(e)
            LOG.exception(f"Failed to parse tool arguments {tool_args}: {e}")
            return {"error": "Tool argument parsing failed."}

⚠️ Potential issue | 🔴 Critical

Handle non-string tool args and avoid JSONDecode-only catch

Responses API may pass dict args; json.loads on dict raises TypeError. Also avoid embedding the exception in logging.exception.

Apply:

-        try:
-            tool_args = json.loads(tool_args)
-        except json.decoder.JSONDecodeError as e:
-            LOG.exception(f"Failed to parse tool arguments {tool_args}: {e}")
-            return {"error": "Tool argument parsing failed."}
+        try:
+            tool_args = json.loads(tool_args) if isinstance(tool_args, str) else tool_args
+        except (json.JSONDecodeError, TypeError):
+            LOG.exception("Failed to parse tool arguments %r", tool_args)
+            return {"error": "Tool argument parsing failed."}

Based on static analysis hints

📝 Committable suggestion


Suggested change
        try:
            tool_args = json.loads(tool_args)
        except json.decoder.JSONDecodeError as e:
            LOG.exception(e)
            LOG.exception(f"Failed to parse tool arguments {tool_args}: {e}")
            return {"error": "Tool argument parsing failed."}

        try:
            tool_args = json.loads(tool_args) if isinstance(tool_args, str) else tool_args
        except (json.JSONDecodeError, TypeError):
            LOG.exception("Failed to parse tool arguments %r", tool_args)
            return {"error": "Tool argument parsing failed."}
🧰 Tools
🪛 Ruff (0.13.1)

98-98: Redundant exception object included in logging.exception call

(TRY401)

🤖 Prompt for AI Agents
In nemo_skills/inference/model/tool_call.py around lines 95-99, the code
currently calls json.loads on tool_args and only catches JSONDecodeError, which
fails for non-string inputs (e.g., dict) and incorrectly embeds the exception in
LOG.exception; update the logic to first check the type: if tool_args is already
a dict, use it directly; if it's a string, call json.loads inside a try/except
that catches json.decoder.JSONDecodeError and logs with LOG.exception("Failed to
parse tool arguments") without interpolating the exception object; for any other
types return a clear error or coerce safely (e.g., convert bytes to str before
parsing) so TypeError is avoided and logging follows best practices.

@smahdavi4

Thanks for adding the responses api! I just want to share two small suggestions, in case they might be helpful:

  • Generalizing text completion, chat completion, and responses into a “completion_type” and calling the litellm text completion, chat completion, or responses API depending on its value. This allows all completion types with a single model instance through a single argument to generate_async (so we don’t have to pass the responses type to the model initializer), and also allows reusing the HTTP clients instead of reinitializing an OpenAI client.
  • deprecating “use_completions_api” in favour of a tri-state “completion_type”
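A minimal sketch of the tri-state dispatch the suggestion describes (names like `CompletionType` and `build_request` are hypothetical, not the actual nemo-skills API):

```python
from enum import Enum


class CompletionType(str, Enum):
    TEXT = "text"
    CHAT = "chat"
    RESPONSES = "responses"


def build_request(completion_type, payload, params):
    """Dispatch on a single tri-state argument instead of boolean flags
    baked into the model instance."""
    if completion_type is CompletionType.TEXT:
        return {"prompt": payload, "max_tokens": params["tokens_to_generate"]}
    if completion_type is CompletionType.CHAT:
        return {"messages": payload, "max_tokens": params["tokens_to_generate"]}
    if completion_type is CompletionType.RESPONSES:
        return {"input": payload, "max_output_tokens": params["tokens_to_generate"]}
    raise ValueError(f"Unknown completion_type: {completion_type}")
```

Because the enum subclasses `str`, `CompletionType("responses")` also parses a plain string from a config file.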

@Kipok Kipok left a comment

@smahdavi4 could you also review this please? I looked through the changes at a high level but didn't have a chance to dive into the details.

@smahdavi4 for your current comment: I think @gwarmstrong found that the responses api in litellm is in a fairly bad state and it's really hard to make it work, so going through a separate client is basically necessary until they fix it. As for completion_type, I think we can do that in another PR. It would be a breaking change, and it's not fully clear how the responses api will eventually be supported in litellm or other providers in a stable way, so we can also wait before adding "first-class" support for it. But if we find the current structure inconvenient, we can definitely make this change.

@gwarmstrong Since the changes are fairly big and touch many parts, could you please run the slurm tests from this branch? Let me know if you need details on how to do this. I will also share another test offline for a non-public model that goes through the tool-calling api, to verify that nothing is broken there.

        if extra_body:
            request["extra_body"] = extra_body

        return request

double return?

        """Public method for model-specific parameter handling - override in subclasses"""
        return request

    def parse_chat_completion_response(self, response, **kwargs) -> dict:

Is this function overridden anywhere? If not, we can remove it and call _parse_chat_completion_response directly.

    """A helper function to create a tool calling model.

    Args:
        model: Model name/path or model instance

Some of the function docstrings and some code comments are not informative enough (probably LLM artifacts). Maybe we could remove them to avoid redundancy?

        if hasattr(output_item, "content") and output_item.content:
            for content_item in output_item.content:
                if hasattr(content_item, "text"):
                    reasoning_content += content_item.text + "\n"

Shall we somehow keep the order of events? If we want to support distilling the outputs of one model into another model, we need a list of interleaved text and tool calls.
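One way to preserve the interleaving is to serialize output items into a single ordered event list instead of concatenating all text; a sketch under assumed item shapes (`serialize_output_in_order` and the field names are illustrative, not the actual nemo-skills code):

```python
def _get(item, key, default=None):
    """Read a field from either an attribute-style SDK object or a plain dict."""
    if isinstance(item, dict):
        return item.get(key, default)
    return getattr(item, key, default)


def serialize_output_in_order(output_items):
    """Flatten Responses API output items into one ordered event list,
    keeping text and tool calls interleaved as they were produced."""
    events = []
    for item in output_items:
        if _get(item, "type") == "function_call":
            events.append({
                "type": "tool_call",
                "name": _get(item, "name"),
                "arguments": _get(item, "arguments", "{}"),
            })
        else:
            content = _get(item, "content") or []
            text = "".join(t for t in (_get(c, "text") for c in content) if t)
            if text:
                events.append({"type": "text", "text": text})
    return events
```

A distillation pipeline could then replay `events` in order rather than reconstructing the sequence from separate reasoning and tool-call strings.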

@smahdavi4

The PR looks great. Left some minor comments.

For the litellm responses api, it might be a good idea to comment somewhere in the code on the complications of the API, so we have a reference when we come back to unify all the APIs. I tried the latest litellm with vllm tool calling and it seems to work very well (so I am not sure what is left to fix later).

@gwarmstrong

Closed in favor of #889

@gwarmstrong gwarmstrong closed this Oct 7, 2025