Add responses api type #889

Merged
Kipok merged 66 commits into main from smahdavi/responses-api on Oct 7, 2025
Conversation


@smahdavi4 smahdavi4 commented Oct 3, 2025

  • Adding a responses API type (currently for vLLM only)
  • Deprecating use_completions_api in favour of completion_type
  • Removing all the sync logic
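The three modes could be modeled as a small string-valued enum. This is only a sketch based on the EndpointType references in the reviews below (the real class lives in nemo_skills/inference/model/base.py, lines 37-40); the exact names and values here are assumptions:

```python
from enum import Enum


class EndpointType(str, Enum):
    # Values assumed from the review excerpts; the real enum lives in
    # nemo_skills/inference/model/base.py (lines 37-40).
    text = "text"            # legacy /completions-style endpoint
    chat = "chat"            # /chat/completions endpoint
    responses = "responses"  # Responses API endpoint


# A str-valued enum parses directly from CLI overrides such as
# ++inference.endpoint_type=responses.
print(EndpointType("responses") is EndpointType.responses)
```

Making the enum a `str` subclass keeps Hydra/CLI overrides trivial, since the raw string round-trips through the enum constructor.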

Running example:

ns eval \
  --benchmarks aime25:2 \
  --cluster slurm \
  --model /hf_models/gpt-oss-20b \
  --server_gpus 8 \
  --server_type vllm \
  --output_dir /workspace/aime-responses \
  ++tool_modules="[nemo_skills.mcp.servers.python_tool::PythonTool]" \
  ++inference.tokens_to_generate=100000 \
  ++inference.endpoint_type=responses \
  --with_sandbox

Limitations with vLLM + GPT-OSS:

  • vLLM fails to parse most of the tool calls, usually putting <channel> or other template-related keywords into the tool name.
  • The first tool call usually succeeds, but on the second call vLLM often fails even worse: it returns the tool call as text reasoning content.

Summary by CodeRabbit

  • New Features

    • Added explicit endpoint types (text, chat, responses), Responses API support, expanded async generation, and improved tool-calling with serialized outputs.
  • Deprecations / Breaking Changes

    • Legacy completions flag replaced by inference.endpoint_type; legacy flag deprecated and mapped to text with migration guidance.
  • Documentation

    • Examples updated to use async invocation and the new inference.endpoint_type syntax.
  • Chores

    • Pinned a dependency version and disabled noisy logging worker for stability.

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
nemo_skills/inference/model/tool_call.py (1)

71-75: Handle non-string tool arguments and use LOG.exception.

This is a duplicate of a previous review comment that remains unaddressed. The code has two issues:

  1. json.loads(tool_args) raises TypeError (not JSONDecodeError) when tool_args is already a dict/list. Some providers return pre-parsed arguments, so this exception escapes the handler and breaks subsequent tool calls.
  2. Line 73 uses LOG.error instead of LOG.exception, making debugging harder.

Apply this diff to fix both issues:

-        try:
-            tool_args = json.loads(tool_args)
-        except json.decoder.JSONDecodeError as e:
-            LOG.error(f"Tool arguments are not in JSON format: {tool_args}")
-            LOG.exception(e)
+        try:
+            if isinstance(tool_args, str):
+                tool_args = json.loads(tool_args)
+        except (json.JSONDecodeError, TypeError) as e:
+            LOG.exception(f"Tool arguments are not in JSON format: {tool_args}")
             return {"error": "Tool argument parsing failed."}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51f7d80 and 77fc5c9.

📒 Files selected for processing (1)
  • nemo_skills/inference/model/tool_call.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/model/tool_call.py (2)
nemo_skills/mcp/adapters.py (3)
  • format_tool_list_by_completion_type (51-75)
  • format_tool_response_by_completion_type (95-110)
  • get_tool_details_by_completion_type (113-123)
nemo_skills/inference/model/base.py (1)
  • CompletionType (37-40)
🪛 Ruff (0.13.3)
nemo_skills/inference/model/tool_call.py

73-73: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


95-95: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🔇 Additional comments (2)
nemo_skills/inference/model/tool_call.py (2)

23-31: LGTM! Good use of adapter functions.

The imports align well with the PR's objective to add completion_type support. The new adapter functions properly encapsulate the completion_type-specific formatting logic.


98-158: LGTM! Clean completion_type integration.

The method correctly threads completion_type through the entire tool calling flow. The use of adapter functions for tool formatting and the serialized_output approach align well with the PR's objectives.

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
nemo_skills/inference/generate.py (1)

227-229: LGTM with optional refinement.

The hard deprecation approach is correct, providing clear migration guidance to users. The error message appropriately directs users to the new ++inference.completion_type=text parameter.

Optional: The static analysis tool suggests moving the error message to a constant to follow best practices (TRY003). While not critical, this could improve consistency if similar patterns appear elsewhere:

+_DEPRECATED_USE_COMPLETIONS_API_MSG = "use_completions_api is deprecated, please use ++inference.completion_type=text instead."
+
 def _post_init_deprecated_params(self):
     if self.use_completions_api:
-        raise ValueError("use_completions_api is deprecated, please use ++inference.completion_type=text instead.")
+        raise ValueError(_DEPRECATED_USE_COMPLETIONS_API_MSG)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77fc5c9 and 3c15220.

📒 Files selected for processing (1)
  • nemo_skills/inference/generate.py (10 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/base.py (1)
  • CompletionType (37-40)
🪛 Ruff (0.13.3)
nemo_skills/inference/generate.py

229-229: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (6)
nemo_skills/inference/generate.py (6)

43-43: LGTM!

The import of CompletionType is correctly placed and necessary for the new completion type logic.


60-66: LGTM!

The completion_type field is well-defined with clear documentation covering all three modes (chat, text, responses). The default of CompletionType.chat maintains backward compatibility.


87-89: LGTM!

The deprecation comment clearly directs users to the new completion_type parameter. The field is retained to provide a meaningful error message when used, which aligns with the hard deprecation strategy discussed in previous reviews.


190-190: LGTM!

The call to _post_init_deprecated_params() ensures that deprecated parameter usage is caught during initialization, providing early feedback to users.


211-211: LGTM!

Correctly sets completion_type to CompletionType.text for Megatron server, replacing the old use_completions_api behavior. This aligns with Megatron's lack of chat completion support.


277-277: LGTM!

All logic correctly migrates from the deprecated use_completions_api to the new completion_type parameter:

  • Line 277: Chat template kwargs handling excludes text completions
  • Line 289: Tokenizer setup includes text completions (plus soft_fail and count_prompt_tokens cases)
  • Line 301: Tokens_to_generate validation applies only to text completions
  • Line 358: Tokenizer passed to prompt setup only for text completions

The logic treats "responses" mode similarly to "chat" mode in this file, with responses-specific handling presumably implemented in the model classes.

Also applies to: 289-289, 301-302, 358-358

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
@gwarmstrong gwarmstrong mentioned this pull request Oct 7, 2025

@Kipok Kipok left a comment


Could you please test if the responses API is supported in https://build.nvidia.com/openai/gpt-oss-20b? If so, then we can add a simple generation test to ensure this functionality doesn't break: https://github.com/NVIDIA/NeMo-Skills/blob/main/tests/test_generation.py

ipython
iso639-lang
litellm[caching] < 1.75.0 # some bug with asyncio.run hanging forever
litellm[caching] == 1.77.5 # Requires patching the logging worker
Collaborator


Do we have to pin to an exact version? What happens if we use the latest?

Collaborator Author


Because the patch explicitly modifies one of litellm's classes; if they change that class in a future release, we would need to update our modification as well.

Collaborator


OK, can you please update the comment to add a little more detail, so that when we look at this line in the future it's easy to remember why we pinned this and what we should test to unpin?

And the behavior here is that this keeps slurm jobs from exiting and we waste GPUs? But it only happens in some cases, not always?

Collaborator Author


Yes, I left more comments about that.

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
@smahdavi4
Collaborator Author

Could you please test if responses api is supported in https://build.nvidia.com/openai/gpt-oss-20b? If so, then we can add it as a simple generation test in here to ensure this functionality doesn't break https://github.com/NVIDIA/NeMo-Skills/blob/main/tests/test_generation.py

I tested that endpoint, and it seems to be worse than vLLM: its function call responses are empty!


Kipok commented Oct 7, 2025

OK, but should we still have a responses test with that endpoint, just without function calls? Just to ensure there are no obvious mistakes in the code, like some parameter not being passed or some other issue like that?

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (2)
nemo_skills/mcp/adapters.py (1)

119-119: Replace assert with explicit validation.

The assert statement for validating tool_call["type"] can be disabled with Python's -O optimization flag, which could allow invalid tool calls to proceed silently in production environments.

Apply this diff to use explicit validation:

     elif endpoint_type == EndpointType.responses:
-        assert tool_call["type"] == "function_call", "Tool call must be a function call"
+        if tool_call.get("type") != "function_call":
+            raise ValueError(f"Tool call must be a function call, got type: {tool_call.get('type')}")
         tool_name = tool_call["name"]
         tool_args = tool_call["arguments"]

Based on past review comments and security best practices.

nemo_skills/inference/model/tool_call.py (1)

93-96: Add strict=True to zip() for safety.

The zip() call should use strict=True to ensure tool_calls and tool_results have matching lengths. This prevents silent bugs if the lists don't match (e.g., if a tool call fails to return a result).

Apply this diff:

         return [
             format_tool_response_by_endpoint_type(tool_call, tool_result, endpoint_type)
-            for tool_call, tool_result in zip(tool_calls, tool_results)
+            for tool_call, tool_result in zip(tool_calls, tool_results, strict=True)
         ]

Based on past review comments and static analysis hints.

🧹 Nitpick comments (4)
docs/basics/prompt-format.md (1)

111-112: Reflect the new completion_type flag in this doc snippet

The CLI now steers completions vs. responses via the completion_type setting (per this PR’s goals). This example still references only ++inference.endpoint_type=text, which will leave readers without guidance on the new flag. Update the prose (and sample command, if applicable) to mention the completion_type setting alongside the endpoint type so the instructions stay accurate.

nemo_skills/inference/eval/bfcl.py (1)

120-135: Improve error logging in exception handler.

The exception handler logs the error but can be simplified by using logging.exception() which automatically includes the traceback and exception details.

Apply this diff to improve the logging:

         try:
             fmted_prompt = self.message_formatter(messages, tools=tools)
         except Exception as e:
-            # Sometimes the parsed tool-call is a string, which is not JSON serializable
-            # Putting a debugging here in case it happens in the future and we need to address it.
             LOG.info(f"Messages: {messages}, Tools: {tools}")
-            LOG.error(f"Error formatting prompt: {e}")
-            raise e
+            LOG.exception("Error formatting prompt")
+            raise

Based on static analysis hints.

nemo_skills/inference/model/base.py (1)

445-445: Consider removing unused kwargs parameter.

The kwargs parameter is not used in this method. If it's included for API consistency with other parse methods, that's acceptable, but consider adding a comment explaining this or removing it if not needed.

Based on static analysis hints.

nemo_skills/inference/model/tool_call.py (1)

71-75: Improve JSON parsing error handling.

The current error handling only catches JSONDecodeError but doesn't handle cases where tool_args is already a parsed dict/list (not a string), which would raise TypeError. Additionally, the logging can be simplified.

Apply this diff to handle both cases:

         try:
-            tool_args = json.loads(tool_args)
+            if isinstance(tool_args, str):
+                tool_args = json.loads(tool_args)
+            elif not isinstance(tool_args, dict):
+                raise ValueError(f"Tool arguments must be a string or dict, got {type(tool_args)}")
         except json.decoder.JSONDecodeError as e:
-            LOG.error(f"Tool arguments are not in JSON format: {tool_args}")
-            LOG.exception(e)
+            LOG.exception("Tool arguments are not in valid JSON format: %s", tool_args)
             return {"error": "Tool argument parsing failed."}

Based on past review comments and static analysis hints.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5046adb and 3a5b96b.

📒 Files selected for processing (15)
  • docs/basics/prompt-format.md (1 hunks)
  • docs/pipelines/generation.md (7 hunks)
  • docs/releases/openmathinstruct2/dataset.md (6 hunks)
  • docs/releases/openmathreasoning/evaluation.md (2 hunks)
  • docs/tutorials/posts/gpt-oss-python.md (2 hunks)
  • nemo_skills/dataset/ruler/prepare.py (1 hunks)
  • nemo_skills/inference/eval/bfcl.py (2 hunks)
  • nemo_skills/inference/generate.py (10 hunks)
  • nemo_skills/inference/model/base.py (8 hunks)
  • nemo_skills/inference/model/code_execution.py (8 hunks)
  • nemo_skills/inference/model/openai.py (1 hunks)
  • nemo_skills/inference/model/parallel_thinking.py (2 hunks)
  • nemo_skills/inference/model/tool_call.py (6 hunks)
  • nemo_skills/mcp/adapters.py (4 hunks)
  • tests/slurm-tests/gpt_oss_python_aime25/run_test.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • docs/releases/openmathreasoning/evaluation.md
  • nemo_skills/dataset/ruler/prepare.py
  • tests/slurm-tests/gpt_oss_python_aime25/run_test.py
  • docs/releases/openmathinstruct2/dataset.md
  • docs/pipelines/generation.md
  • docs/tutorials/posts/gpt-oss-python.md
🧰 Additional context used
🧬 Code graph analysis (8)
nemo_skills/inference/model/code_execution.py (1)
nemo_skills/inference/model/base.py (2)
  • BaseModel (43-509)
  • EndpointType (37-40)
nemo_skills/inference/model/parallel_thinking.py (1)
nemo_skills/inference/model/base.py (2)
  • BaseModel (43-509)
  • EndpointType (37-40)
nemo_skills/mcp/adapters.py (1)
nemo_skills/inference/model/base.py (1)
  • EndpointType (37-40)
nemo_skills/inference/model/openai.py (2)
nemo_skills/inference/model/base.py (2)
  • _build_responses_request_params (205-206)
  • _build_chat_request_params (198-199)
nemo_skills/inference/model/vllm.py (2)
  • _build_responses_request_params (142-148)
  • _build_chat_request_params (102-140)
nemo_skills/inference/model/base.py (4)
nemo_skills/inference/patch_litellm_logging.py (1)
  • patch_litellm_logging_worker (59-71)
nemo_skills/inference/model/openai.py (1)
  • _build_responses_request_params (155-160)
nemo_skills/inference/model/vllm.py (1)
  • _build_responses_request_params (142-148)
nemo_skills/inference/model/tool_call.py (1)
  • generate_async (98-160)
nemo_skills/inference/model/tool_call.py (2)
nemo_skills/mcp/adapters.py (3)
  • format_tool_list_by_endpoint_type (51-76)
  • format_tool_response_by_endpoint_type (96-111)
  • get_tool_details_by_endpoint_type (114-124)
nemo_skills/inference/model/base.py (1)
  • EndpointType (37-40)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/base.py (1)
  • EndpointType (37-40)
nemo_skills/inference/eval/bfcl.py (1)
nemo_skills/inference/model/base.py (1)
  • EndpointType (37-40)
🪛 Ruff (0.13.3)
nemo_skills/mcp/adapters.py

76-76: Avoid specifying long messages outside the exception class

(TRY003)


111-111: Avoid specifying long messages outside the exception class

(TRY003)


123-123: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/inference/model/base.py

291-291: Avoid specifying long messages outside the exception class

(TRY003)


445-445: Unused method argument: kwargs

(ARG002)


502-502: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/inference/model/tool_call.py

73-73: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


95-95: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

nemo_skills/inference/generate.py

229-229: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/inference/eval/bfcl.py

126-126: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


127-127: Use raise without specifying exception name

Remove exception name

(TRY201)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (7)
nemo_skills/inference/model/base.py (1)

491-503: Robust serialization with clear error handling.

The _serialize_output method handles both responses and chat completion formats cleanly. The error message for unsupported types is clear and actionable.
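A minimal sketch of what such a serializer might look like, normalizing both plain-dict responses payloads and chat completion objects (names and type checks are assumptions; the actual _serialize_output in nemo_skills/inference/model/base.py may differ):

```python
def serialize_output(output):
    """Normalize a completion object to a plain dict for downstream use.

    Hypothetical sketch of the pattern described above; the real
    _serialize_output in nemo_skills/inference/model/base.py may differ.
    """
    if isinstance(output, dict):
        # Responses-style payloads may already arrive as plain dicts.
        return output
    if hasattr(output, "model_dump"):
        # openai-python pydantic models expose model_dump().
        return output.model_dump()
    raise ValueError(f"Cannot serialize output of type {type(output).__name__}")
```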

nemo_skills/inference/model/openai.py (1)

155-160: Clean adapter for responses API format.

The method correctly delegates to the existing chat request builder and remaps the keys to match the responses API surface (messages → input, max_completion_tokens → max_output_tokens). This avoids code duplication while providing the necessary format transformation.
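The delegation-plus-remap idea can be sketched as follows (a hypothetical standalone function; the real _build_responses_request_params in nemo_skills/inference/model/openai.py is a method and may handle more keys):

```python
def build_responses_request_params(chat_params: dict) -> dict:
    """Remap chat-completions request keys to the Responses API surface.

    Sketch of the delegation described above; key names follow the review
    comment, everything else is an assumption.
    """
    params = dict(chat_params)  # avoid mutating the caller's dict
    # The Responses API takes `input` where chat completions take `messages`...
    params["input"] = params.pop("messages")
    # ...and `max_output_tokens` instead of `max_completion_tokens`.
    params["max_output_tokens"] = params.pop("max_completion_tokens")
    return params


req = build_responses_request_params(
    {"messages": [{"role": "user", "content": "hi"}], "max_completion_tokens": 64}
)
print(sorted(req))  # → ['input', 'max_output_tokens']
```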

nemo_skills/inference/model/parallel_thinking.py (1)

55-55: Clean migration to endpoint_type enum.

The replacement of use_completions_api: bool = False with endpoint_type: EndpointType = EndpointType.chat is a clean refactor that maintains backward compatibility in behavior while providing better clarity and extensibility.

nemo_skills/inference/model/code_execution.py (1)

68-68: Proper endpoint_type propagation through code execution flow.

The endpoint_type parameter is correctly threaded through the entire code execution call chain (_generate_single, generate_async, _stream_single) and included in request dictionaries, ensuring consistent behavior with the new endpoint-type based routing.

Also applies to: 113-113, 263-263, 321-321, 336-336

nemo_skills/inference/generate.py (2)

60-66: Well-documented endpoint_type configuration.

The new endpoint_type field is clearly documented with examples for each endpoint type ("chat", "text", "responses") and includes guidance on tokenizer override behavior. This provides good clarity for users migrating from the deprecated use_completions_api flag.


227-229: Explicit deprecation prevents silent errors.

The hard deprecation of use_completions_api with a clear error message and migration guidance is the right approach. This addresses the concerns raised in past review comments about silent failures being worse than breaking changes, ensuring users are immediately aware they need to update their configuration.

nemo_skills/inference/model/tool_call.py (1)

112-113: Solid tool invocation flow with endpoint-type awareness.

The tool invocation logic correctly:

  • Formats tools based on endpoint type
  • Propagates endpoint_type through all generation calls
  • Serializes outputs appropriately
  • Handles tool call execution with proper error handling

The integration of endpoint_type throughout the flow maintains consistency with the broader refactor.

Also applies to: 138-147


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
nemo_skills/inference/patch_litellm_logging.py (1)

28-59: Consider adding type hints for completeness.

While the current implementation is functional, adding complete type hints would improve code clarity and maintainability. Consider adding return type hints (e.g., -> None) to all methods.

Example:

-    def start(self):
+    def start(self) -> None:
         pass

-    async def stop(self):
+    async def stop(self) -> None:
         pass

Note: Some methods already have -> None, so this would ensure consistency across all methods.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a5b96b and 3383408.

📒 Files selected for processing (2)
  • nemo_skills/inference/patch_litellm_logging.py (1 hunks)
  • requirements/main.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • requirements/main.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (3)
nemo_skills/inference/patch_litellm_logging.py (3)

15-23: Past review comment addressed.

The docstring now provides the requested details about why this patch is needed, including the specific symptoms (async_loop gets stuck), impact (slurm jobs don't exit, GPU waste), and environment differences (Docker vs. local).


43-49: Coroutine closing prevents resource leaks.

The enqueue and ensure_initialized_and_enqueue methods correctly close coroutines to prevent resource leaks and suppress "coroutine was never awaited" warnings. This is the appropriate approach for a no-op implementation meant to disable functionality.


61-73: LGTM!

The patch function correctly replaces both the LoggingWorker class and the GLOBAL_LOGGING_WORKER instance, and gracefully handles version compatibility by catching ModuleNotFoundError. The implementation is safe to call during module import.
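The pattern the review describes — a no-op worker that closes enqueued coroutines, plus a patch function that tolerates missing modules across litellm versions — can be sketched like this (class names, method names, and the module path are assumed from the excerpts; the real code is in nemo_skills/inference/patch_litellm_logging.py):

```python
import inspect


class NoOpLoggingWorker:
    """No-op stand-in for litellm's background logging worker (sketch)."""

    def start(self) -> None:
        pass

    async def stop(self) -> None:
        pass

    def enqueue(self, coro) -> None:
        # Close the coroutine instead of scheduling it, so Python does not
        # emit "coroutine was never awaited" warnings.
        coro.close()

    def ensure_initialized_and_enqueue(self, coro) -> None:
        self.enqueue(coro)


def patch_litellm_logging_worker() -> None:
    """Swap in the no-op worker class and global instance (assumed path)."""
    try:
        from litellm.litellm_core_utils import logging_worker  # assumed path
    except ModuleNotFoundError:
        return  # other litellm versions: nothing to patch
    logging_worker.LoggingWorker = NoOpLoggingWorker
    logging_worker.GLOBAL_LOGGING_WORKER = NoOpLoggingWorker()
```

Closing the coroutine (rather than simply dropping it) is what keeps the no-op safe: an un-awaited, un-closed coroutine would otherwise trigger runtime warnings and leak frames.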


@Kipok Kipok left a comment


lgtm, as long as tests pass

…sponses-api

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
@Kipok Kipok merged commit 475b036 into main Oct 7, 2025
5 of 6 checks passed
@Kipok Kipok deleted the smahdavi/responses-api branch October 7, 2025 21:59

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
requirements/main.txt (1)

30-30: Document the pin rationale.

Please capture the concrete failure mode and what to retest before unpinning so future upgrades aren’t blind. Right now the comment just says the patch exists, but not why it matters.

-litellm[caching] == 1.77.5 # Requires patching the logging worker (See nemo_skills/inference/patch_litellm_logging.py)
+litellm[caching] == 1.77.5 # Pin because litellm's logging worker can block Slurm job teardown; we patch it (see nemo_skills/inference/patch_litellm_logging.py) and must re-run the multi-GPU teardown test before unpinning.
nemo_skills/inference/eval/bfcl.py (1)

133-140: Consider using logging.exception and bare raise.

The try/except block is useful for debugging, but consider these minor improvements:

  • Line 139: Use logging.exception(...) instead of logging.error(...) to automatically include the exception traceback.
  • Line 140: Use bare raise instead of raise e for cleaner exception re-raising.

Apply this diff:

         try:
             fmted_prompt = self.message_formatter(messages, tools=tools)
         except Exception as e:
             # Sometimes the parsed tool-call is a string, which is not JSON serializable
             # Putting a debugging here in case it happens in the future and we need to address it.
             LOG.info(f"Messages: {messages}, Tools: {tools}")
-            LOG.error(f"Error formatting prompt: {e}")
-            raise e
+            LOG.exception("Error formatting prompt")
+            raise
nemo_skills/inference/generate.py (1)

227-229: Hard deprecation approach aligns with past discussion.

The validation correctly raises an error when use_completions_api=True, forcing users to migrate to endpoint_type. This aligns with the team's decision to hard-deprecate rather than maintain backward compatibility. The error message clearly directs users to the new parameter.

Optional: Address static analysis hint.

Ruff suggests avoiding long error messages inline (TRY003). If preferred, you could extract the message to a constant:

+    _DEPRECATED_USE_COMPLETIONS_API_MSG = (
+        "use_completions_api is deprecated, please use ++inference.endpoint_type=text instead."
+    )
+
     def _post_init_deprecated_params(self):
         if self.use_completions_api:
-            raise ValueError("use_completions_api is deprecated, please use ++inference.endpoint_type=text instead.")
+            raise ValueError(self._DEPRECATED_USE_COMPLETIONS_API_MSG)

However, this is a minor style preference and not required.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3383408 and fc8aca5.

📒 Files selected for processing (3)
  • nemo_skills/inference/eval/bfcl.py (2 hunks)
  • nemo_skills/inference/generate.py (10 hunks)
  • requirements/main.txt (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/inference/generate.py (1)
nemo_skills/inference/model/base.py (1)
  • EndpointType (37-40)
nemo_skills/inference/eval/bfcl.py (1)
nemo_skills/inference/model/base.py (1)
  • EndpointType (37-40)
🪛 Ruff (0.13.3)
nemo_skills/inference/generate.py

229-229: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/inference/eval/bfcl.py

139-139: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


140-140: Use raise without specifying exception name

Remove exception name

(TRY201)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: unit-tests
🔇 Additional comments (5)
nemo_skills/inference/eval/bfcl.py (2)

41-41: LGTM!

The import is necessary for the endpoint_type setting added to construct_input_dict.


141-148: LGTM!

Explicitly setting endpoint_type = EndpointType.text is correct for BFCL's use case and aligns with the broader shift toward endpoint-type aware configurations across the codebase. Based on past review comments, this prevents unwanted assumptions about completion type based on input format.

nemo_skills/inference/generate.py (3)

60-66: LGTM! Clean migration to enum-based endpoint configuration.

The endpoint_type field with the EndpointType enum is a significant improvement over the boolean use_completions_api. The comprehensive documentation clearly explains the behavior for each endpoint type (chat, text, responses), and the enum provides better type safety and extensibility.


211-211: LGTM! Correct migration for Megatron server.

Setting endpoint_type = EndpointType.text for Megatron server correctly replaces the previous use_completions_api logic.


277-277: LGTM! Consistent logic updates across all usage points.

All conditional checks have been correctly updated to use endpoint_type:

  • Line 277: endpoint_type != EndpointType.text for chat_template_kwargs (includes chat and responses)
  • Line 289: endpoint_type == EndpointType.text for tokenizer setup
  • Line 301: endpoint_type == EndpointType.text for tokens_to_generate validation
  • Line 361: endpoint_type == EndpointType.text for prompt tokenizer passing

The logic consistently treats the responses API the same as the chat API in this file, with model-specific implementation details handled in the model classes.

Also applies to: 289-289, 301-302, 361-361

SeanNaren pushed a commit to SeanNaren/NeMo-Skills that referenced this pull request Oct 9, 2025
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
SeanNaren pushed a commit that referenced this pull request Oct 9, 2025
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Oct 9, 2025
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Oct 29, 2025
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
@coderabbitai coderabbitai bot mentioned this pull request Dec 25, 2025