Add Responses API #844

Closed
gwarmstrong wants to merge 21 commits into main from georgea/response-api

Conversation


@gwarmstrong gwarmstrong commented Sep 24, 2025

Add Responses API Support for GPT-OSS Generation

This PR adds support for the OpenAI Responses API in nemo-skills, enabling generation with gpt-oss models that use the responses format instead of the standard chat completions API.

Key Changes

1. Client Handler Architecture

  • New BaseClientHandler system in nemo_skills/inference/model/base.py
  • ChatCompletionHandler: Handles standard chat completion and text completion APIs via litellm
  • ResponsesHandler: Handles OpenAI Responses API with direct OpenAI client integration
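
The handler split described above can be sketched roughly as follows. This is an illustrative outline, not the actual implementation: class names match the PR, but method signatures and request/response shapes are simplified assumptions.

```python
from abc import ABC, abstractmethod


class BaseClientHandler(ABC):
    """Builds requests and parses responses for one API style (illustrative)."""

    @abstractmethod
    def build_request(self, prompt, params: dict) -> dict: ...

    @abstractmethod
    def parse_response(self, response) -> dict: ...


class ChatCompletionHandler(BaseClientHandler):
    def build_request(self, prompt, params):
        # Chat completions take a list of role/content messages
        return {"messages": prompt, **params}

    def parse_response(self, response):
        return {"generation": response["choices"][0]["message"]["content"]}


class ResponsesHandler(BaseClientHandler):
    def build_request(self, prompt, params):
        # The Responses API uses "input" instead of "messages"
        return {"input": prompt, **params}

    def parse_response(self, response):
        return {"generation": response["output_text"]}


# Registry keyed by the use_responses_api flag
CLIENT_HANDLERS = {False: ChatCompletionHandler, True: ResponsesHandler}
```

The point of the abstraction is that callers never branch on the API style; they pick a handler once and use the same two methods everywhere.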

2. Unified BaseModel with Client Type Support

  • Added use_responses_api parameter to BaseModel (default: False)
  • Removed litellm coupling from BaseModel - moved to ChatCompletionHandler
  • Unified generate methods that delegate to client handlers
  • Two-stage request building: Handler builds structure, model applies customizations
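
A minimal sketch of the two-stage flow, with a stub handler standing in for the real client handlers (method names here are illustrative):

```python
class _StubHandler:
    """Stand-in for a client handler (illustrative only)."""

    def build_request(self, prompt, params):
        return {"messages": prompt, **params}


class BaseModel:
    def __init__(self, use_responses_api: bool = False):
        # Stage 0: choose the handler once at construction time
        self.use_responses_api = use_responses_api
        self.client_handler = _StubHandler()

    def build_request(self, prompt, params: dict) -> dict:
        # Stage 1: the handler builds the API-specific request structure
        request = self.client_handler.build_request(prompt, params)
        # Stage 2: the model applies its own customizations on top
        return self.apply_model_specific_params(request)

    def apply_model_specific_params(self, request: dict) -> dict:
        # Subclasses override, e.g. to remap roles or rename parameters
        return request
```

A subclass only overrides stage 2; the handler owns the wire format.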

3. Strict Parameter Validation System

  • New defaults.py with shared parameter defaults and validation schemas
  • Explicit parameter handling with strict validation (errors on unsupported non-default params)
  • Model-specific restrictions (e.g., OpenAI reasoning models) in respective model files
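
The "errors on unsupported non-default params" behavior might look like this; the parameter names and defaults below are placeholders, not the actual contents of defaults.py:

```python
SUPPORTED_PARAMS = {"temperature", "top_p", "tokens_to_generate", "stop_phrases"}
DEFAULTS = {"temperature": 0.0, "top_p": 0.95, "tokens_to_generate": 2048, "stop_phrases": None}


def validate_params(params: dict, supported: set = SUPPORTED_PARAMS) -> dict:
    """Raise if an unsupported parameter is set to a non-default value.

    Unsupported params left at their defaults are silently dropped, so shared
    configs don't break when one backend lacks a knob another one has.
    """
    for name, value in params.items():
        if name not in supported and value != DEFAULTS.get(name):
            raise ValueError(f"Parameter {name!r} is not supported by this model/API")
    return {k: v for k, v in params.items() if k in supported}
```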

4. Enhanced Configuration Support

  • Added use_responses_api to GenerateSolutionsConfig for direct configuration control
  • Updated all factory functions (get_model, get_tool_calling_model, etc.) to support use_responses_api
  • Full tool calling support with responses API

Usage Examples

Pipeline Integration

from nemo_skills.pipeline.cli import generate, wrap_arguments

generate(
    ctx=wrap_arguments(
        "++use_responses_api=true "
        "++prompt_config=your/prompt/config "
        "++inference.tokens_to_generate=8192 "
        "++tool_modules=[nemo_skills.mcp.servers.python_tool::PythonTool] "
        "++inference.temperature=0.6 "
    ),
    input_file='input.jsonl',
    cluster='local',
    model='openai/gpt-oss-20b',
    server_type='vllm',  # Model provider
    server_gpus=2,
    output_dir='outputs/',
    num_random_seeds=1,
    with_sandbox=True,
)

Summary by CodeRabbit

  • New Features

    • Optional Responses API mode available across standard, tool-calling, and code-execution flows via a new toggle.
    • Unified tool-calling with improved conversation management and support for serialized/structured tool outputs.
    • Exposed generation config option to toggle Responses API and added shared generation defaults (tokens, temperature, stop phrases, streaming, tools, etc.).
  • Refactor

    • Pluggable request/response handlers for consistent API calls, parameter validation, and unified response parsing.
    • Enhanced compatibility with reasoning-aware models and automatic parameter mapping.
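
The "automatic parameter mapping" for reasoning-aware models could be sketched as below. The exact remapping rules (system → developer, dropping presence_penalty) are assumptions taken from the change summary, not verbatim code:

```python
def apply_reasoning_model_params(request: dict, is_reasoning_model: bool) -> dict:
    """Sketch of reasoning-aware parameter mapping (rules assumed, not verbatim)."""
    if not is_reasoning_model:
        return request
    remapped = dict(request)
    # Reasoning models take instructions under the "developer" role, not "system"
    remapped["messages"] = [
        {**msg, "role": "developer"} if msg.get("role") == "system" else msg
        for msg in remapped.get("messages", [])
    ]
    # presence_penalty is not accepted by these models; drop it rather than erroring
    remapped.pop("presence_penalty", None)
    return remapped
```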

Signed-off-by: George Armstrong <georgea@nvidia.com>

coderabbitai bot commented Sep 24, 2025

Walkthrough

Adds a Responses API pathway and pluggable client-handler system for inference, new conversation managers/adapters for tool calling, a GenerationDefaults module, and a use_responses_api flag propagated through model factories and generation setup. OpenAI model logic updated for reasoning-aware parameter handling.

Changes

  • Factory API updates (nemo_skills/inference/model/__init__.py): Added use_responses_api parameter to get_model, get_code_execution_model, and get_tool_calling_model; forwarded the flag through creation paths and updated docstrings.
  • Client-handler architecture (nemo_skills/inference/model/base.py): Added BaseClientHandler, ChatCompletionHandler, ResponsesHandler, and CLIENT_HANDLERS. BaseModel accepts use_responses_api, selects a handler, delegates request construction/calls/parsing, and exposes helper parsing/support methods.
  • OpenAI model param handling (nemo_skills/inference/model/openai.py): OpenAIModel now accepts use_responses_api; added reasoning-model detection, get_supported_params, and apply_model_specific_params to adjust request shapes (developer/system remapping, reasoning_effort, presence_penalty).
  • Tool calling adapters & managers (nemo_skills/inference/model/tool_call.py): Added _setup_adapters() to ToolCallingWrapper to pick schema/call/response adapters and a conversation_manager based on use_responses_api. generate_async now routes assistant responses and tool-result integration through the conversation manager; improved JSON error logging.
  • MCP adapters and conversation managers (nemo_skills/mcp/adapters.py): Added ConversationManager (abstract), CompletionConversationManager, ResponsesConversationManager, ResponsesSchemaAdapter, ResponsesCallInterpreter, and ResponsesResponseFormatter; expanded typing imports (Any, Dict).
  • Generation config plumbing (nemo_skills/inference/generate.py): Added GenerateSolutionsConfig.use_responses_api and forwarded it to get_model, get_code_execution_model, and get_tool_calling_model in GenerationTask.setup_llm().
  • Defaults module (nemo_skills/inference/model/defaults.py): New GenerationDefaults dataclass with shared generation parameters and parameter sets: CHAT_COMPLETION_PARAMS, RESPONSES_PARAMS, COMPLETION_PARAMS.
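
A rough sketch of what that defaults module could look like; the field values and exact set contents below are illustrative guesses, not the project's actual defaults:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenerationDefaults:
    # Values are illustrative placeholders, not the real project defaults
    tokens_to_generate: int = 2048
    temperature: float = 0.0
    top_p: float = 0.95
    stop_phrases: List[str] = field(default_factory=list)
    stream: bool = False
    tools: Optional[list] = None


# Which fields each API style accepts
COMPLETION_PARAMS = {"tokens_to_generate", "temperature", "top_p", "stop_phrases", "stream"}
CHAT_COMPLETION_PARAMS = COMPLETION_PARAMS | {"tools"}
RESPONSES_PARAMS = COMPLETION_PARAMS | {"tools"}
```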

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant User
  participant GenerationTask
  participant Factory as Model Factory
  participant Model as BaseModel
  participant Handler as ClientHandler
  participant Provider as LLM API

  User->>GenerationTask: run(config.use_responses_api)
  GenerationTask->>Factory: get_*_model(use_responses_api)
  Factory-->>GenerationTask: model instance
  GenerationTask->>Model: generate_async(prompt, params)
  Model->>Handler: call_api_async(prompt, params)
  Handler->>Handler: build request (chat/responses)
  Handler->>Provider: API call
  Provider-->>Handler: response
  Handler->>Model: parse_response(...)
  Model-->>GenerationTask: generation (content, tool_calls, serialized_output)
sequenceDiagram
  autonumber
  participant TC as ToolCallingWrapper
  participant Conv as ConversationManager
  participant CI as CallInterpreter
  participant Tools as Tools Runtime

  Note over TC: Init chooses adapters<br/>based on use_responses_api
  TC->>Conv: add_assistant_response(conversation, generation)
  TC->>CI: parse(tool_calls)
  CI-->>TC: normalized tool_calls
  TC->>Tools: _execute_tool_calls(...)
  Tools-->>TC: tool_results
  TC->>Conv: add_tool_results(conversation, tool_calls_msg, formatted_results)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

I twitch my ears at toggled flows,
Two paths now ripple where one rose.
Handlers hop, adapters chime,
Conversations keep their time.
With tools and responses in one warren deep,
I ship this patch—and take a leap. 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 70.59%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title "Add Responses API" succinctly reflects the primary goal of the changeset (introducing support for the Responses API) and is clear, concise, and directly related to the main feature.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_skills/pipeline/utils/server.py (1)

155-169: Fix server startup for “responses”
serve_vllm.py doesn’t expose a /v1/responses endpoint, so launching it for server_type="responses" will break ResponsesModel clients. Either add /v1/responses support to serve_vllm.py or invoke a dedicated serve_responses entrypoint, and update any server_type validations to include "responses".

🧹 Nitpick comments (7)
nemo_skills/inference/model/tool_call.py (2)

71-89: Prefer capability flag over isinstance() to select adapters.

Avoid tight coupling to ResponsesModel; check a capability flag on the model instead.

Apply this diff:

-        # Import here to avoid circular imports
-        from .responses import ResponsesModel
-
-        # Detect model type and configure adapters
-        if isinstance(self.model, ResponsesModel):
+        # Detect model type and configure adapters
+        if getattr(self.model, "USES_RESPONSES_API", False):
             # Responses API model - uses flatter tool schema format
             self.schema_adapter = ResponsesSchemaAdapter()
             self.call_interpreter = ResponsesCallInterpreter()
             self.response_formatter = ResponsesResponseFormatter()
             self.conversation_manager = ResponsesConversationManager()
         else:
             # Chat completion model (default) - uses nested function format
             self.schema_adapter = OpenAISchemaAdapter()
             self.call_interpreter = ChatCompletionCallInterpreter()
             self.response_formatter = ChatCompletionResponseFormatter()
             self.conversation_manager = CompletionConversationManager()

Also set a flag on the model (outside this file):

# in nemo_skills/inference/model/responses.py
class ResponsesModel(BaseModel):
    USES_RESPONSES_API = True

99-103: Return richer error context on JSON arg parse failure.

Include tool name and raw args; avoid noisy stack traces for expected parse errors.

-        except json.decoder.JSONDecodeError as e:
-            LOG.exception(e)
-            return {"error": "Tool argument parsing failed."}
+        except json.decoder.JSONDecodeError:
+            LOG.warning("Tool argument parsing failed for %s", tool_name)
+            return {"error": "Tool argument parsing failed.", "tool": tool_name, "raw_args": tool_args}
nemo_skills/mcp/adapters.py (2)

186-197: Silence Ruff ARG002: unused parameter.

Rename the unused parameter to underscore.

-    def add_tool_results(
+    def add_tool_results(
         self,
-        conversation: List[Dict[str, Any]],
-        tool_calls_message: Dict[str, Any],
-        formatted_results: List[Dict[str, Any]],
+        conversation: List[Dict[str, Any]],
+        _tool_calls_message: Dict[str, Any],
+        formatted_results: List[Dict[str, Any]],
     ) -> None:

228-235: Align subclass signature with base interface to avoid type-checker conflicts.

The base ToolResponseFormatter.format expects a ChatCompletionMessageToolCall; this subclass narrows the type. Either relax the base interface or drop annotations here.

-    def format(self, tool_call: Dict[str, Any], result: Dict[str, Any]) -> Dict[str, Any]:
+    def format(self, tool_call, result):
         """Format the response from a tool call for responses API."""
         return {
             "type": "function_call_output",
             "call_id": tool_call["id"],
             "output": json.dumps(result) if not isinstance(result, str) else result,
         }

Optionally, broaden the base interface (outside this hunk) to accept Any for tool_call/result.

nemo_skills/inference/model/responses.py (3)

131-131: Rename unused parameter to underscore.

Prevents ARG002 lint and clarifies intent.

-    def _parse_responses_response(self, response, include_response: bool = False, **kwargs) -> dict:
+    def _parse_responses_response(self, response, include_response: bool = False, **_kwargs) -> dict:

270-272: Reduce response logging to debug (optional).

Logging full response objects at INFO can be noisy and may include sensitive content. Consider downgrading to DEBUG or summarizing.

-        LOG.info(f"Response from server: {response}")
-        LOG.info(f"Response type: {type(response)}")
+        LOG.debug("Response type: %s", type(response))

337-339: Reduce sync response logging to debug (optional).

Same rationale as async path.

-        LOG.info(f"Response from server: {response}")
-        LOG.info(f"Response type: {type(response)}")
+        LOG.debug("Response type: %s", type(response))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c1c65b and 97b4f2f.

📒 Files selected for processing (5)
  • nemo_skills/inference/model/__init__.py (2 hunks)
  • nemo_skills/inference/model/responses.py (1 hunks)
  • nemo_skills/inference/model/tool_call.py (4 hunks)
  • nemo_skills/mcp/adapters.py (3 hunks)
  • nemo_skills/pipeline/utils/server.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/model/responses.py (4)
nemo_skills/utils.py (1)
  • get_logger_name (130-131)
nemo_skills/inference/model/base.py (3)
  • BaseModel (32-513)
  • _build_request_params (193-200)
  • _maybe_apply_stop_phrase_removal (149-153)
nemo_skills/inference/model/context_retry.py (1)
  • with_context_retry (162-182)
nemo_skills/inference/model/tool_call.py (1)
  • generate_async (122-183)
nemo_skills/inference/model/tool_call.py (1)
nemo_skills/mcp/adapters.py (18)
  • CompletionConversationManager (145-170)
  • OpenAISchemaAdapter (70-83)
  • ResponsesCallInterpreter (204-222)
  • ResponsesConversationManager (173-196)
  • ResponsesResponseFormatter (225-234)
  • ResponsesSchemaAdapter (237-250)
  • ChatCompletionCallInterpreter (103-122)
  • ChatCompletionResponseFormatter (125-137)
  • add_assistant_response (50-52)
  • add_assistant_response (148-156)
  • add_assistant_response (176-183)
  • parse (35-36)
  • parse (87-90)
  • parse (112-122)
  • parse (207-222)
  • add_tool_results (55-62)
  • add_tool_results (158-170)
  • add_tool_results (185-196)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/responses.py

52-52: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


58-58: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


103-103: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


131-131: Unused method argument: kwargs

(ARG002)


204-204: Do not catch blind exception: Exception

(BLE001)


221-221: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


230-230: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


238-238: Prefer TypeError exception for invalid type

(TRY004)


238-238: Avoid specifying long messages outside the exception class

(TRY003)


291-291: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


300-300: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)


308-308: Prefer TypeError exception for invalid type

(TRY004)


308-308: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/mcp/adapters.py

188-188: Unused method argument: tool_calls_message

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (3)
nemo_skills/inference/model/__init__.py (1)

31-31: Wire-up for ResponsesModel looks good.

Import and registry exposure for "responses" are correct.

Also applies to: 51-51

nemo_skills/pipeline/utils/server.py (1)

126-127: LGTM: mount check allowlist updated.

Including "responses" alongside vllm/sglang/trtllm is appropriate here.

nemo_skills/inference/model/tool_call.py (1)

159-161: Good centralization via conversation_manager.

Appending assistant responses through the manager keeps formats consistent across model types.

Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
nemo_skills/inference/generate.py (1)

193-202: Fix attribute access: self.cfg.use_completions_api → self.use_completions_api

GenerateSolutionsConfig has no cfg attribute. This will crash on Megatron path.

Apply this diff:

-            self.cfg.use_completions_api = True
+            self.use_completions_api = True
nemo_skills/inference/model/tool_call.py (1)

92-100: Handle non-string tool arguments (dict) to avoid TypeError

Responses API may supply already-parsed dicts. Current code only catches JSONDecodeError and will raise on non-str types.

Apply this diff:

-        tool_args = tool_call["function"]["arguments"]
-
-        ##
-        # TODO(sanyamk): Not all tool arguments might necessarily be in JSON format.
-        #   Kept here to handle errors for now.
-        try:
-            tool_args = json.loads(tool_args)
-        except json.decoder.JSONDecodeError as e:
-            LOG.exception(e)
-            return {"error": "Tool argument parsing failed."}
+        tool_args = tool_call["function"]["arguments"]
+        # Accept both JSON strings and dicts
+        if isinstance(tool_args, str):
+            try:
+                tool_args = json.loads(tool_args)
+            except json.JSONDecodeError as e:
+                LOG.exception("Tool argument parsing failed: %s", e)
+                return {"error": "Tool argument parsing failed."}
+        elif not isinstance(tool_args, dict):
+            LOG.warning("Unexpected tool argument type: %s", type(tool_args).__name__)
+            tool_args = {}
nemo_skills/inference/model/base.py (1)

641-719: Handle potential UnboundLocalError in retry loop.

If the while loop exhausts all retries without returning, the result variable will be undefined on line 719, causing an UnboundLocalError.

Apply this diff to fix the UnboundLocalError:

 # TODO: remove this after we no longer use gpt-oss or it's fixed in vllm
 max_retries = 2
 retry_count = 0
+result = None
 
 while retry_count <= max_retries:
     try:
         # Delegate to client handler using public API
         response = await self.client_handler.call_api_async(prompt, **kwargs)
 
         if stream:
             return self._handle_streaming_response(response)
         else:
             result = self.client_handler.parse_response(response, **kwargs)
             if remove_stop_phrases:
                 self._maybe_apply_stop_phrase_removal(result, remove_stop_phrases, stop_phrases)
             return result
 
     except openai.BadRequestError as e:
         if "output messages (reasoning and final)" in str(e):
             if retry_count < max_retries:
                 retry_count += 1
                 LOG.warning(f"BadRequestError, retrying {retry_count}/{max_retries}: {e}")
                 continue
 
             LOG.error(f"BadRequestError after {max_retries} retries, returning empty response: {e}")
             return {"generation": "", "reasoning_content": "", "num_generated_tokens": 0}
         else:
             raise e
 
-return result
+# This should never be reached unless retry logic is incorrect
+LOG.error("Unexpected end of retry loop without result")
+return {"generation": "", "num_generated_tokens": 0}
🧹 Nitpick comments (5)
nemo_skills/inference/model/defaults.py (1)

41-79: De-duplicate param sets to prevent drift between CHAT_COMPLETION_PARAMS and RESPONSES_PARAMS

Both sets are currently identical. Define a single base set and derive others (add/remove as needed) to avoid future divergence bugs.

nemo_skills/inference/model/__init__.py (1)

58-58: Optional: shorten exception message to appease Ruff TRY003

Current message length triggers TRY003. Consider a shorter message or a custom exception type.

nemo_skills/inference/model/tool_call.py (1)

45-46: Update outdated docstring

The wrapper now supports both chat-completions and responses via adapters; update the TODO.

nemo_skills/inference/model/base.py (2)

593-615: Consider improving the fallback serialization strategy.

The current implementation silently catches all exceptions and falls back to string representation. Consider logging more specific error types or providing more structured fallback data.

Consider this improved error handling approach:

 def _serialize_response_output(self, response):
     """Serialize response output objects using model_dump() for conversation history."""
     serialized_output = []
 
     if hasattr(response, "output") and response.output:
         for output_item in response.output:
             try:
                 # Try to use model_dump() method if available (Pydantic models)
                 if hasattr(output_item, "model_dump"):
                     serialized_output.append(output_item.model_dump())
                 # Fallback to dict conversion
                 elif hasattr(output_item, "__dict__"):
                     serialized_output.append(output_item.__dict__)
                 # Last resort: convert to string representation
                 else:
                     serialized_output.append({"content": str(output_item), "type": "unknown"})
-            except Exception as e:
-                LOG.warning(f"Failed to serialize output item: {e}")
-                # Fallback serialization
-                serialized_output.append({"content": str(output_item), "type": "error", "error": str(e)})
+            except (AttributeError, TypeError) as e:
+                LOG.warning(f"Failed to serialize output item of type {type(output_item).__name__}: {e}")
+                # Fallback serialization with type information
+                serialized_output.append({
+                    "content": str(output_item), 
+                    "type": "serialization_error", 
+                    "original_type": type(output_item).__name__,
+                    "error": str(e)
+                })
+            except Exception as e:
+                LOG.error(f"Unexpected error serializing output item: {e}")
+                # Generic fallback for unexpected errors
+                serialized_output.append({"content": str(output_item), "type": "error", "error": str(e)})
 
     return serialized_output

405-429: Consider extracting SSH tunnel configuration to a dataclass.

The SSH tunnel setup logic could be encapsulated in a configuration dataclass for better maintainability and reusability.

Would you like me to create a separate SSHTunnelConfig dataclass to encapsulate the SSH configuration logic and make it more reusable?
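
A possible shape for such a dataclass; every field name here is hypothetical, since the actual SSH setup code isn't shown in this comment:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SSHTunnelConfig:
    """Bundle of SSH tunnel settings (field names are hypothetical)."""

    host: str
    user: str
    port: int = 22
    key_path: Optional[str] = None

    @property
    def enabled(self) -> bool:
        # A tunnel is only meaningful when both host and user are provided
        return bool(self.host and self.user)
```

Freezing the dataclass keeps the tunnel settings immutable once the model is constructed.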

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 97b4f2f and 5688efe.

📒 Files selected for processing (6)
  • nemo_skills/inference/generate.py (2 hunks)
  • nemo_skills/inference/model/__init__.py (2 hunks)
  • nemo_skills/inference/model/base.py (13 hunks)
  • nemo_skills/inference/model/defaults.py (1 hunks)
  • nemo_skills/inference/model/openai.py (4 hunks)
  • nemo_skills/inference/model/tool_call.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
nemo_skills/inference/model/tool_call.py (1)
nemo_skills/mcp/adapters.py (18)
  • CompletionConversationManager (145-170)
  • OpenAISchemaAdapter (70-83)
  • ResponsesCallInterpreter (204-222)
  • ResponsesConversationManager (173-196)
  • ResponsesResponseFormatter (225-234)
  • ResponsesSchemaAdapter (237-250)
  • ChatCompletionCallInterpreter (103-122)
  • ChatCompletionResponseFormatter (125-137)
  • add_assistant_response (50-52)
  • add_assistant_response (148-156)
  • add_assistant_response (176-183)
  • parse (35-36)
  • parse (87-90)
  • parse (112-122)
  • parse (207-222)
  • add_tool_results (55-62)
  • add_tool_results (158-170)
  • add_tool_results (185-196)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/__init__.py (3)
  • get_code_execution_model (62-70)
  • get_tool_calling_model (97-113)
  • get_model (53-59)
nemo_skills/inference/model/openai.py (1)
nemo_skills/inference/model/base.py (5)
  • get_supported_params (44-46)
  • get_supported_params (150-151)
  • get_supported_params (257-258)
  • get_supported_params (501-503)
  • apply_model_specific_params (505-507)
nemo_skills/inference/model/base.py (3)
nemo_skills/inference/model/openai.py (3)
  • get_supported_params (81-92)
  • apply_model_specific_params (94-109)
  • _get_api_key (60-72)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/__init__.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/inference/model/base.py

73-75: Avoid specifying long messages outside the exception class

(TRY003)


310-310: Unused method argument: prompt

(ARG002)


310-310: Unused method argument: params

(ARG002)


312-312: Avoid specifying long messages outside the exception class

(TRY003)


314-314: Unused method argument: prompt

(ARG002)


318-318: Unused method argument: prompt

(ARG002)


400-400: Avoid specifying long messages outside the exception class

(TRY003)


505-505: Unused method argument: params

(ARG002)


609-609: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (9)
nemo_skills/inference/generate.py (2)

95-97: LGTM: client_type surfaced in config

Propagating client_type at the config level is correct and consistent with model init.


342-346: LGTM: client_type forwarded to code execution model

Forwarding client_type here maintains consistent handler selection.

nemo_skills/inference/model/tool_call.py (1)

71-86: Adapters selection looks correct

Using client_type to choose schema/interpreter/formatter/conversation manager is appropriate and avoids import-time cycles.

nemo_skills/inference/model/base.py (6)

32-138: Well-designed handler abstraction with clear public API.

The new BaseClientHandler class provides a clean separation of concerns, clearly distinguishing between public methods that models can override and internal implementation details. The parameter extraction/validation logic with support for model-specific restrictions is particularly well thought out.


140-245: Clean implementation of chat completion handler.

The ChatCompletionHandler properly encapsulates litellm-specific logic, handles both chat and completion APIs, and correctly manages extra_body parameters for non-standard fields. The setup of httpx sessions with appropriate connection limits ensures efficient connection pooling.


247-325: Comprehensive ResponsesHandler implementation for the new API format.

The handler correctly uses the OpenAI client directly and properly structures requests for the responses API, including proper handling of tools and extra_body parameters.


517-592: Robust response parsing for the Responses API with proper fallbacks.

The implementation handles various response structures gracefully, including tool calls, reasoning content, and message content. The serialization logic for conversation history is comprehensive with good error handling.


375-404: Well-structured initialization with proper separation of concerns.

The initialization sequence is logical - setup networking, resolve API configuration, then initialize the handler. The public client_handler attribute allows for clean delegation while maintaining encapsulation.


48-86: extract_and_validate_params correctly applies model-level filtering
extract_and_validate_params intersects handler-level and OpenAI’s get_supported_params (non-empty) and skips filtering for base models. No issues found.

Signed-off-by: George Armstrong <georgea@nvidia.com>

MAINT switch to use_responses_api parameter

Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
nemo_skills/inference/model/base.py (2)

310-313: Silence linters for unused args in completion-path error

Avoid ARG002/TRY003 nits; keep behavior unchanged.

-    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:
+    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:  # noqa: ARG002
         """Responses API doesn't support completion - raise error"""
         raise ValueError("ResponsesHandler only supports message lists, not string prompts")

711-736: Remove unreachable trailing return

result is always returned inside the loop; trailing return can be dropped.

-        return result
+        # Unreachable
+        # return result
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 5688efe and 4e3a5a4.

📒 Files selected for processing (5)
  • nemo_skills/inference/generate.py (2 hunks)
  • nemo_skills/inference/model/__init__.py (2 hunks)
  • nemo_skills/inference/model/base.py (13 hunks)
  • nemo_skills/inference/model/openai.py (4 hunks)
  • nemo_skills/inference/model/tool_call.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • nemo_skills/inference/generate.py
🧰 Additional context used
🧬 Code graph analysis (4)
nemo_skills/inference/model/tool_call.py (1)
nemo_skills/mcp/adapters.py (18)
  • CompletionConversationManager (145-170)
  • OpenAISchemaAdapter (70-83)
  • ResponsesCallInterpreter (204-222)
  • ResponsesConversationManager (173-196)
  • ResponsesResponseFormatter (225-234)
  • ResponsesSchemaAdapter (237-250)
  • ChatCompletionCallInterpreter (103-122)
  • ChatCompletionResponseFormatter (125-137)
  • add_assistant_response (50-52)
  • add_assistant_response (148-156)
  • add_assistant_response (176-183)
  • parse (35-36)
  • parse (87-90)
  • parse (112-122)
  • parse (207-222)
  • add_tool_results (55-62)
  • add_tool_results (158-170)
  • add_tool_results (185-196)
nemo_skills/inference/model/__init__.py (1)
nemo_skills/inference/chat_interface/core.py (2)
  • get (136-151)
  • sandbox (177-178)
nemo_skills/inference/model/openai.py (1)
nemo_skills/inference/model/base.py (5)
  • get_supported_params (44-46)
  • get_supported_params (150-151)
  • get_supported_params (257-258)
  • get_supported_params (518-520)
  • apply_model_specific_params (522-524)
nemo_skills/inference/model/base.py (3)
nemo_skills/inference/model/openai.py (3)
  • get_supported_params (81-92)
  • apply_model_specific_params (94-109)
  • _get_api_key (60-72)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/__init__.py

65-65: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/inference/model/base.py

73-75: Avoid specifying long messages outside the exception class

(TRY003)


310-310: Unused method argument: prompt

(ARG002)


310-310: Unused method argument: params

(ARG002)


312-312: Avoid specifying long messages outside the exception class

(TRY003)


314-314: Unused method argument: prompt

(ARG002)


318-318: Unused method argument: prompt

(ARG002)


415-417: Avoid specifying long messages outside the exception class

(TRY003)


522-522: Unused method argument: params

(ARG002)


626-626: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (4)
nemo_skills/inference/model/__init__.py (2)

53-66: Normalize server_type, add unknown-type guard, and make trtllm check case-insensitive

Prevents a KeyError on unknown inputs and stops mixed-case "trtllm" from bypassing the guard.

-def get_model(server_type, use_responses_api=False, tokenizer=None, **kwargs):
+def get_model(server_type, use_responses_api=False, tokenizer=None, **kwargs):
@@
-    model_class = models[server_type.lower()]
-    if server_type == "trtllm" and kwargs.get("enable_soft_fail", False):
+    st = server_type.lower()
+    if st not in models:
+        raise ValueError(f"Unknown server_type: {server_type}. Choices: {list(models.keys())}")
+    model_class = models[st]
+    if st == "trtllm" and kwargs.get("enable_soft_fail", False):
         if kwargs.get("context_limit_retry_strategy", None) is not None:
             raise ValueError("context_limit_retry_strategy is not supported for trtllm")
     return model_class(use_responses_api=use_responses_api, tokenizer=tokenizer, **kwargs)

122-135: Fix incorrect keyword when delegating to get_model

Passing model=... raises TypeError; get_model expects server_type.

-    if isinstance(model, str):
-        model = get_model(model=model, use_responses_api=use_responses_api, tokenizer=tokenizer, **kwargs)
+    if isinstance(model, str):
+        model = get_model(server_type=model, use_responses_api=use_responses_api, tokenizer=tokenizer, **kwargs)
nemo_skills/inference/model/tool_call.py (1)

68-86: Adapters setup and conversation manager integration look good

Good separation by responses vs chat and centralized conversation updates.

Also applies to: 156-168

nemo_skills/inference/model/openai.py (1)

81-92: Replace self.model_name_or_path with self.model in reasoning checks

self.model_name_or_path is not defined on OpenAIModel; this will raise AttributeError at runtime.

-        if self.is_reasoning_model(self.model_name_or_path):
+        if self.is_reasoning_model(self.model):
             base_params -= OPENAI_REASONING_UNSUPPORTED
@@
-        if self.is_reasoning_model(self.model_name_or_path):
+        if self.is_reasoning_model(self.model):
             # Reasoning model specific handling
             if "messages" in request:
                 request["messages"] = [
                     {**msg, "role": "developer"} if msg.get("role") == "system" else msg for msg in request["messages"]
                 ]
             if params.get("reasoning_effort"):
                 request["reasoning_effort"] = params["reasoning_effort"]

Also applies to: 94-110

gwarmstrong and others added 5 commits September 25, 2025 15:21
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Moves timeout to the main body parameters

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
nemo_skills/mcp/adapters.py (1)

181-189: Silence unused param in ResponsesConversationManager.add_tool_results.

Rename to underscore to satisfy Ruff ARG002 and clarify intent.

Apply this diff:

-    def add_tool_results(
-        self,
-        conversation: List[Dict[str, Any]],
-        tool_calls_message: Dict[str, Any],
-        formatted_results: List[Dict[str, Any]],
-    ) -> None:
+    def add_tool_results(
+        self,
+        conversation: List[Dict[str, Any]],
+        _tool_calls_message: Dict[str, Any],
+        formatted_results: List[Dict[str, Any]],
+    ) -> None:  # noqa: ARG002

As per static analysis.

nemo_skills/inference/model/base.py (3)

308-315: Suppress unused-arg warnings in ResponsesHandler.make_*_call.

prompt isn’t used; rename to underscore.

Apply this diff:

-    async def make_async_call(self, request: dict, prompt):
+    async def make_async_call(self, request: dict, _prompt):  # noqa: ARG002
@@
-    def make_sync_call(self, request: dict, prompt):
+    def make_sync_call(self, request: dict, _prompt):  # noqa: ARG002

597-619: Narrow overly broad exception in _serialize_response_output.

Catching Exception hides real failures and trips linters.

Apply this diff:

-                except Exception as e:
+                except (AttributeError, TypeError, ValueError) as e:
                     LOG.warning(f"Failed to serialize output item: {e}")
                     # Fallback serialization
                     serialized_output.append({"content": str(output_item), "type": "error", "error": str(e)})

519-522: Suppress unused-arg warning for base apply_model_specific_params.

Subclasses use params; base impl doesn’t.

Apply this diff:

-    def apply_model_specific_params(self, request: dict, params: dict) -> dict:
+    def apply_model_specific_params(self, request: dict, params: dict) -> dict:  # noqa: ARG002
         """Public method for model-specific parameter handling - override in subclasses"""
         return request
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 8428242 and f8c6367.

📒 Files selected for processing (2)
  • nemo_skills/inference/model/base.py (13 hunks)
  • nemo_skills/mcp/adapters.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/model/base.py (6)
nemo_skills/inference/model/openai.py (3)
  • get_supported_params (79-90)
  • apply_model_specific_params (92-107)
  • _get_api_key (58-70)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
nemo_skills/inference/model/azure.py (1)
  • _get_api_key (32-39)
nemo_skills/inference/model/gemini.py (1)
  • _get_api_key (32-39)
nemo_skills/inference/model/megatron.py (1)
  • _parse_completion_response (72-106)
🪛 Ruff (0.13.1)
nemo_skills/mcp/adapters.py

184-184: Unused method argument: tool_calls_message

(ARG002)

nemo_skills/inference/model/base.py

69-71: Avoid specifying long messages outside the exception class

(TRY003)


304-304: Unused method argument: prompt

(ARG002)


304-304: Unused method argument: params

(ARG002)


306-306: Avoid specifying long messages outside the exception class

(TRY003)


308-308: Unused method argument: prompt

(ARG002)


312-312: Unused method argument: prompt

(ARG002)


412-414: Avoid specifying long messages outside the exception class

(TRY003)


519-519: Unused method argument: params

(ARG002)


613-613: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (5)
nemo_skills/mcp/adapters.py (2)

146-171: ConversationManager abstraction and default/chat implementations look solid.

Interfaces and responsibilities are clear; integration points are cohesive.


200-219: Do not silently fall back to 'unknown' for tool_call id/name; raise on malformed input.

Fail fast if a tool_call lacks id or name to avoid corrupting conversation state later.

Apply this diff:

 class ResponsesCallInterpreter(ToolCallInterpreter):
@@
-    def parse(self, tool_calls: List[Any]) -> Dict[str, Any]:
+    def parse(self, tool_calls: List[Any]) -> Dict[str, Any]:
         """Parse tool calls from responses API format."""
         parsed_calls = []
 
         for tool_call in tool_calls:
-            parsed_call = {
-                "type": "function",
-                "id": getattr(tool_call, "call_id", getattr(tool_call, "id", "unknown")),
-                "function": {
-                    "name": getattr(tool_call, "name", "unknown"),
-                    "arguments": getattr(tool_call, "arguments", "{}"),
-                },
-            }
+            call_id = getattr(tool_call, "call_id", getattr(tool_call, "id", None))
+            name = getattr(tool_call, "name", None)
+            args = getattr(tool_call, "arguments", "{}")
+            if not call_id or not name:
+                raise ValueError(f"Malformed tool_call: missing id or name: {tool_call}")
+            parsed_call = {
+                "type": "function",
+                "id": call_id,
+                "function": {"name": name, "arguments": args},
+            }
             parsed_calls.append(parsed_call)
 
         return {"role": "assistant", "tool_calls": parsed_calls}
nemo_skills/inference/model/base.py (3)

304-306: Suppress unused-arg warnings in ResponsesHandler.build_completion_request_structure.

Keep signature but mark unused to satisfy linters.

Apply this diff:

-    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:
+    def build_completion_request_structure(self, _prompt: str, _params: dict) -> dict:  # noqa: ARG002
         """Responses API doesn't support completion - raise error"""
-        raise ValueError("ResponsesHandler only supports message lists, not string prompts")
+        raise ValueError("ResponsesHandler only supports message lists, not string prompts")  # noqa: TRY003

62-71: Param validation correctly allows None-equals-default for unsupported params.

Prevents false positives when callers pass default None. Good.


261-301: ResponsesHandler: move timeout to top-level; block stream until supported.

Timeout in extra_body is ignored by OpenAI client; streaming for Responses isn't parsed by _handle_streaming_response and will break.

Apply this diff:

     def build_chat_request_structure(self, messages: list, params: dict) -> dict:
         """Build responses API request structure"""
         # Use proper list of dicts format for vLLM servers
         request = {
             "input": messages,
             "max_output_tokens": params["tokens_to_generate"],
             "temperature": params["temperature"],
             "top_p": params["top_p"],
-            "stream": params["stream"],
+            "stream": params["stream"],
         }
+
+        # Pass timeout to client at top-level
+        if params["timeout"] is not None:
+            request["timeout"] = params["timeout"]
+
+        # Guard until streaming parsing for Responses is implemented
+        if params["stream"]:
+            raise ValueError("stream=True is not supported for ResponsesHandler yet.")
@@
-        if params["timeout"] is not None:
-            extra_body["timeout"] = params["timeout"]

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between f8c6367 and ab7ad54.

📒 Files selected for processing (1)
  • nemo_skills/inference/model/base.py (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/model/base.py (5)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
nemo_skills/inference/model/gemini.py (1)
  • _get_api_key (32-39)
nemo_skills/inference/model/azure.py (1)
  • _get_api_key (32-39)
nemo_skills/inference/model/megatron.py (1)
  • _parse_completion_response (72-106)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/base.py

69-71: Avoid specifying long messages outside the exception class

(TRY003)


278-278: Avoid specifying long messages outside the exception class

(TRY003)


312-312: Unused method argument: prompt

(ARG002)


312-312: Unused method argument: params

(ARG002)


314-314: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Unused method argument: prompt

(ARG002)


320-320: Unused method argument: prompt

(ARG002)


420-422: Avoid specifying long messages outside the exception class

(TRY003)


527-527: Unused method argument: params

(ARG002)


621-621: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests

Comment on lines +76 to +82
        for param_name in all_param_names:
            if param_name in kwargs:
                params[param_name] = kwargs[param_name]
            else:
                params[param_name] = getattr(self.defaults, param_name)

        return params

⚠️ Potential issue | 🔴 Critical

Restore defaults when params are left unset

When a caller omits tokens_to_generate (the common path, since the public API defaults it to None), extract_and_validate_params now forwards None straight through. That means the subsequent request builders send max_tokens=None / max_output_tokens=None, which OpenAI/vLLM reject with a 400. We need to reapply the GenerationDefaults value whenever the provided value is None and the default is not, so the happy path keeps working.

-        for param_name in all_param_names:
-            if param_name in kwargs:
-                params[param_name] = kwargs[param_name]
-            else:
-                params[param_name] = getattr(self.defaults, param_name)
+        for param_name in all_param_names:
+            if param_name in kwargs:
+                value = kwargs[param_name]
+                if value is None and hasattr(self.defaults, param_name):
+                    default_value = getattr(self.defaults, param_name)
+                    if default_value is not None:
+                        params[param_name] = default_value
+                        continue
+                params[param_name] = value
+            else:
+                params[param_name] = getattr(self.defaults, param_name)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
        for param_name in all_param_names:
            if param_name in kwargs:
                params[param_name] = kwargs[param_name]
            else:
                params[param_name] = getattr(self.defaults, param_name)

        return params

        for param_name in all_param_names:
            if param_name in kwargs:
                value = kwargs[param_name]
                if value is None and hasattr(self.defaults, param_name):
                    default_value = getattr(self.defaults, param_name)
                    if default_value is not None:
                        params[param_name] = default_value
                        continue
                params[param_name] = value
            else:
                params[param_name] = getattr(self.defaults, param_name)

        return params
🤖 Prompt for AI Agents
In nemo_skills/inference/model/base.py around lines 76 to 82, the current logic
copies values from kwargs even when they are present but set to None, which
causes None to be forwarded for parameters like tokens_to_generate and leads to
downstream 400 errors; change the logic so that for each param_name you use
kwargs[param_name] only if it exists in kwargs and is not None, otherwise fall
back to getattr(self.defaults, param_name) (i.e., treat explicit None as "unset"
and restore the GenerationDefaults value).
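The fix reduces to a "None means unset" merge over the defaults. A standalone sketch of that pattern, with hypothetical names (`GenerationDefaultsSketch` and `merge_params` are illustrative stand-ins, not the actual nemo-skills API):

```python
from dataclasses import dataclass


@dataclass
class GenerationDefaultsSketch:
    # Hypothetical stand-in for GenerationDefaults; field names illustrative.
    tokens_to_generate: int = 2048
    temperature: float = 0.0


def merge_params(defaults, **kwargs):
    """Merge caller kwargs over defaults, treating an explicit None as unset."""
    merged = {}
    for name in vars(defaults):
        value = kwargs.get(name)
        # An absent key and an explicit None both fall back to the default.
        merged[name] = getattr(defaults, name) if value is None else value
    return merged
```

Note this sketch assumes no default is legitimately None; a field whose default is None would need the extra `default_value is not None` check from the diff above.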

gwarmstrong and others added 3 commits September 26, 2025 13:03
Signed-off-by: George Armstrong <georgea@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
nemo_skills/inference/model/tool_call.py (1)

105-107: Avoid logging.exception(e); let logger attach the traceback

Pass a message, not the exception object.

-        except Exception as e:
-            LOG.exception(e)
+        except Exception:
+            LOG.exception("Tool execution failed")
             return {"error": "Tool execution failed."}

Based on static analysis hints

nemo_skills/inference/model/base.py (4)

308-311: Remove duplicate return

Second return is unreachable.

-        return request
-
         return request

316-323: Silence unused-arg warnings in ResponsesHandler call methods

Prompt isn’t used here.

-    async def make_async_call(self, request: dict, prompt):
+    async def make_async_call(self, request: dict, prompt):  # noqa: ARG002
@@
-    def make_sync_call(self, request: dict, prompt):
+    def make_sync_call(self, request: dict, prompt):  # noqa: ARG002

452-461: Fix return type to allow None from _setup_base_url when base_url is ''

Prevents type mismatches.

-def _setup_base_url(self, base_url: str | None, use_v1_endpoint: bool) -> str:
+def _setup_base_url(self, base_url: str | None, use_v1_endpoint: bool) -> str | None:

628-634: Async streaming should use async chunk iterators

_generate_async returns a generator; consider picking async iterators here to avoid blocking on async streams.

-        if hasattr(response, "choices") and hasattr(response.choices[0], "message"):
-            return self._stream_chat_chunks_sync(response)
-        else:
-            return self._stream_completion_chunks_sync(response)
+        if hasattr(response, "choices") and hasattr(response.choices[0], "message"):
+            return self._stream_chat_chunks_async(response)
+        else:
+            return self._stream_completion_chunks_async(response)
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between ab7ad54 and 503a9f1.

📒 Files selected for processing (2)
  • nemo_skills/inference/model/base.py (13 hunks)
  • nemo_skills/inference/model/tool_call.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/model/tool_call.py (1)
nemo_skills/mcp/adapters.py (18)
  • CompletionConversationManager (145-170)
  • OpenAISchemaAdapter (70-83)
  • ResponsesCallInterpreter (200-218)
  • ResponsesConversationManager (173-192)
  • ResponsesResponseFormatter (221-230)
  • ResponsesSchemaAdapter (233-246)
  • ChatCompletionCallInterpreter (103-122)
  • ChatCompletionResponseFormatter (125-137)
  • add_assistant_response (50-52)
  • add_assistant_response (148-156)
  • add_assistant_response (176-179)
  • parse (35-36)
  • parse (87-90)
  • parse (112-122)
  • parse (203-218)
  • add_tool_results (55-62)
  • add_tool_results (158-170)
  • add_tool_results (181-192)
nemo_skills/inference/model/base.py (4)
nemo_skills/inference/model/openai.py (3)
  • get_supported_params (79-90)
  • apply_model_specific_params (92-107)
  • _get_api_key (58-70)
nemo_skills/inference/model/defaults.py (1)
  • GenerationDefaults (20-38)
nemo_skills/evaluation/metrics/base.py (1)
  • update (145-189)
nemo_skills/inference/model/context_retry.py (1)
  • ContextLimitRetryConfig (118-159)
🪛 Ruff (0.13.1)
nemo_skills/inference/model/tool_call.py

98-98: Redundant exception object included in logging.exception call

(TRY401)

nemo_skills/inference/model/base.py

69-71: Avoid specifying long messages outside the exception class

(TRY003)


278-278: Avoid specifying long messages outside the exception class

(TRY003)


312-312: Unused method argument: prompt

(ARG002)


312-312: Unused method argument: params

(ARG002)


314-314: Avoid specifying long messages outside the exception class

(TRY003)


316-316: Unused method argument: prompt

(ARG002)


320-320: Unused method argument: prompt

(ARG002)


420-422: Avoid specifying long messages outside the exception class

(TRY003)


527-527: Unused method argument: params

(ARG002)


621-621: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (5)
nemo_skills/inference/model/tool_call.py (2)

71-85: Good adapter/conversation manager selection

Clean split between chat-completions and responses API with centralized conversation management. This will keep tool-calling flow consistent across clients.


156-168: LGTM: unified conversation updates

Using ConversationManager for both assistant and tool-result updates simplifies the flow and keeps formats correct for each client type.

nemo_skills/inference/model/base.py (3)

261-279: Timeout placement and stream guard for Responses API look correct

Setting timeout as a top-level request option and explicitly rejecting stream=True until implemented is the right call.


312-315: Silence unused-arg warnings on unsupported completion path

Keep signature but mark args as intentionally unused.

-    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:
+    def build_completion_request_structure(self, prompt: str, params: dict) -> dict:  # noqa: ARG002

76-82: Treat explicit None as “unset” to restore defaults

Forwarding None (e.g., tokens_to_generate=None) breaks downstream APIs. Fall back to GenerationDefaults when provided value is None and default is not None.

-        for param_name in all_param_names:
-            if param_name in kwargs:
-                params[param_name] = kwargs[param_name]
-            else:
-                params[param_name] = getattr(self.defaults, param_name)
+        for param_name in all_param_names:
+            if param_name in kwargs:
+                value = kwargs[param_name]
+                if value is None and hasattr(self.defaults, param_name):
+                    default_value = getattr(self.defaults, param_name)
+                    if default_value is not None:
+                        params[param_name] = default_value
+                        continue
+                params[param_name] = value
+            else:
+                params[param_name] = getattr(self.defaults, param_name)

Comment on lines +462 to 470
    def _resolve_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
        """Resolve API key from various sources"""
        resolved_key = self._get_api_key(api_key, api_key_env_var, base_url)
        if resolved_key is None:  # self-hosted models don't need the key
            resolved_key = "EMPTY"
        return resolved_key

    def _get_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
        if api_key:  # explicit cmd argument always takes precedence

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Allow None base_url in API-key resolution; avoids TypeError in subclasses

_base_url may be None; loosen types accordingly. Update overrides to match.

-def _resolve_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
+def _resolve_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str | None) -> str | None:
@@
-    def _get_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str) -> str | None:
+    def _get_api_key(self, api_key: str | None, api_key_env_var: str | None, base_url: str | None) -> str | None:

Please align subclass overrides (e.g., openai.py, azure.py, gemini.py) to accept base_url: str | None.


🏁 Script executed:

#!/bin/bash
rg -n "def _.*api_key" -g "nemo_skills/inference/model/*.py"

Length of output: 740


Loosen base_url type to allow None

  • In nemo_skills/inference/model/base.py, change both _resolve_api_key and _get_api_key signatures to base_url: str | None.
  • In nemo_skills/inference/model/openai.py, azure.py, and gemini.py, update their _get_api_key methods to accept base_url: str | None as well.
🧰 Tools
🪛 Ruff (0.13.1)

469-469: Unused method argument: base_url

(ARG002)

🤖 Prompt for AI Agents
In nemo_skills/inference/model/base.py around lines 462 to 470, loosen the type
of the base_url parameter from str to str | None in both _resolve_api_key and
_get_api_key signatures (i.e., change base_url: str to base_url: str | None) and
adjust any internal uses to handle None safely; also update the corresponding
_get_api_key method signatures in nemo_skills/inference/model/openai.py,
nemo_skills/inference/model/azure.py, and nemo_skills/inference/model/gemini.py
to accept base_url: str | None so the types align across implementations.

Comment on lines 95 to 99
        try:
            tool_args = json.loads(tool_args)
        except json.decoder.JSONDecodeError as e:
            LOG.exception(e)
            LOG.exception(f"Failed to parse tool arguments {tool_args}: {e}")
            return {"error": "Tool argument parsing failed."}

⚠️ Potential issue | 🔴 Critical

Handle non-string tool args and avoid JSONDecode-only catch

Responses API may pass dict args; json.loads on dict raises TypeError. Also avoid embedding the exception in logging.exception.

Apply:

-        try:
-            tool_args = json.loads(tool_args)
-        except json.decoder.JSONDecodeError as e:
-            LOG.exception(f"Failed to parse tool arguments {tool_args}: {e}")
-            return {"error": "Tool argument parsing failed."}
+        try:
+            tool_args = json.loads(tool_args) if isinstance(tool_args, str) else tool_args
+        except (json.JSONDecodeError, TypeError):
+            LOG.exception("Failed to parse tool arguments %r", tool_args)
+            return {"error": "Tool argument parsing failed."}

Based on static analysis hints

📝 Committable suggestion


Suggested change
        try:
            tool_args = json.loads(tool_args)
        except json.decoder.JSONDecodeError as e:
            LOG.exception(e)
            LOG.exception(f"Failed to parse tool arguments {tool_args}: {e}")
            return {"error": "Tool argument parsing failed."}

        try:
            tool_args = json.loads(tool_args) if isinstance(tool_args, str) else tool_args
        except (json.JSONDecodeError, TypeError):
            LOG.exception("Failed to parse tool arguments %r", tool_args)
            return {"error": "Tool argument parsing failed."}
🧰 Tools
🪛 Ruff (0.13.1)

98-98: Redundant exception object included in logging.exception call

(TRY401)

🤖 Prompt for AI Agents
In nemo_skills/inference/model/tool_call.py around lines 95-99, the code
currently calls json.loads on tool_args and only catches JSONDecodeError, which
fails for non-string inputs (e.g., dict) and incorrectly embeds the exception in
LOG.exception; update the logic to first check the type: if tool_args is already
a dict, use it directly; if it's a string, call json.loads inside a try/except
that catches json.decoder.JSONDecodeError and logs with LOG.exception("Failed to
parse tool arguments") without interpolating the exception object; for any other
types return a clear error or coerce safely (e.g., convert bytes to str before
parsing) so TypeError is avoided and logging follows best practices.

@smahdavi4

Thanks for adding the responses api! I just want to share two small suggestions, in case they might be helpful:

  • Generalizing text completion, chat completion, and responses into a “completion_type” and calling the litellm text completion, chat completion, or responses API depending on its value. This allows all completion types with a single model instance through a single argument to generate_async (so we don’t have to pass the responses type to the model initializer), and also allows reusing the HTTP clients instead of reinitializing an OpenAI client.
  • deprecating “use_completions_api” in favour of a tri-state “completion_type”
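A minimal sketch of the tri-state dispatch the suggestion describes (names like `CompletionType` and `build_request` are hypothetical, not the actual nemo-skills API):

```python
from enum import Enum


class CompletionType(str, Enum):
    TEXT = "text"
    CHAT = "chat"
    RESPONSES = "responses"


def build_request(completion_type, payload, params):
    """Dispatch on a single tri-state argument instead of boolean flags
    baked into the model instance."""
    if completion_type is CompletionType.TEXT:
        return {"prompt": payload, "max_tokens": params["tokens_to_generate"]}
    if completion_type is CompletionType.CHAT:
        return {"messages": payload, "max_tokens": params["tokens_to_generate"]}
    if completion_type is CompletionType.RESPONSES:
        return {"input": payload, "max_output_tokens": params["tokens_to_generate"]}
    raise ValueError(f"Unknown completion_type: {completion_type}")
```

Because the enum subclasses `str`, `CompletionType("responses")` also parses a plain string from a config file.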

@Kipok Kipok left a comment

@smahdavi4 could you also review this please? I looked through the changes at a high level but didn't have a chance to dive into the details.

@smahdavi4 for your current comment: I think @gwarmstrong found that the responses api in litellm is in a fairly bad state and it's really hard to make it work, so going through a separate client is basically necessary until they fix it. As for completion_type, I think we can do that in another PR. It would be a breaking change, and it's not fully clear how the responses api will eventually be supported in litellm or other providers in a stable way, so we can also wait before adding "first-class" support for it. But if we find the current structure inconvenient, we can definitely make this change.

@gwarmstrong Since the changes are fairly big and touch many parts, could you please run the slurm tests from this branch? Let me know if you need details on how to do this. I will also share another test offline for a non-public model that goes through the tool-calling api, to verify that nothing is broken there.

        if extra_body:
            request["extra_body"] = extra_body

        return request

double return?

        """Public method for model-specific parameter handling - override in subclasses"""
        return request

    def parse_chat_completion_response(self, response, **kwargs) -> dict:

Is this function overridden anywhere? If not, we can remove it and call _parse_chat_completion_response directly.

    """A helper function to create a tool calling model.

    Args:
        model: Model name/path or model instance

Some of the function docstrings and some code comments are not informative enough (probably LLM artifacts). Maybe we could remove them to avoid redundancy?

        if hasattr(output_item, "content") and output_item.content:
            for content_item in output_item.content:
                if hasattr(content_item, "text"):
                    reasoning_content += content_item.text + "\n"

Shall we somehow keep the order of events? If we want to support distilling the outputs of one model into another model, we need a list of interleaved text and tool calls.
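One way to preserve the interleaving is to serialize output items into a single ordered event list instead of concatenating all text; a sketch under assumed item shapes (`serialize_output_in_order` and the field names are illustrative, not the actual nemo-skills code):

```python
def _get(item, key, default=None):
    """Read a field from either an attribute-style SDK object or a plain dict."""
    if isinstance(item, dict):
        return item.get(key, default)
    return getattr(item, key, default)


def serialize_output_in_order(output_items):
    """Flatten Responses API output items into one ordered event list,
    keeping text and tool calls interleaved as they were produced."""
    events = []
    for item in output_items:
        if _get(item, "type") == "function_call":
            events.append({
                "type": "tool_call",
                "name": _get(item, "name"),
                "arguments": _get(item, "arguments", "{}"),
            })
        else:
            content = _get(item, "content") or []
            text = "".join(t for t in (_get(c, "text") for c in content) if t)
            if text:
                events.append({"type": "text", "text": text})
    return events
```

A distillation pipeline could then replay `events` in order rather than reconstructing the sequence from separate reasoning and tool-call strings.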

@smahdavi4

The PR looks great. Left some minor comments.

For the litellm responses api, it might be a good idea to comment somewhere in the code on the complications of the API, so we have a reference when we come back to unify all the APIs. I tried the latest litellm with vllm tool calling and it seems to work very well (so I am not sure what is left to fix later).

@gwarmstrong

Closed in favor of #889

@gwarmstrong gwarmstrong closed this Oct 7, 2025