Conversation
Signed-off-by: George Armstrong <georgea@nvidia.com>
📝 Walkthrough

This pull request adds dynamic model class loading support to the inference API, introduces a new SGLangModel variant that customizes tool request handling, and adds GPU-based tests to validate tool-calling behavior across VLLM and SGLang server types.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
nemo_skills/inference/model/sglang.py (1)

25-63: Clean implementation with correct tool_choice handling.

The override properly delegates to the parent class and conditionally injects `tool_choice="auto"` only when tools are provided. The docstring clearly documents the SGLang vs VLLM behavioral difference. One minor fix per the static analysis hint:

```diff
- extra_body: dict = None,
+ extra_body: dict | None = None,
```

nemo_skills/inference/model/__init__.py (1)
60-80: Clean implementation of dynamic model class loading.

The `model_class` parameter provides good flexibility for custom model implementations without modifying the registry. The docstring clearly documents both supported path syntaxes. One edge case to consider: if `model_class` is `None` and `server_type` is invalid, `models[server_type.lower()]` will raise a `KeyError`. This was pre-existing behavior, but you might want to provide a more descriptive error message:

```diff
 if model_class is not None:
     loaded_class = locate(model_class)
 else:
+    server_key = server_type.lower()
+    if server_key not in models:
+        raise ValueError(f"Unknown server_type '{server_type}'. Choices: {list(models.keys())}")
-    loaded_class = models[server_type.lower()]
+    loaded_class = models[server_key]
```
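For context, the "both supported path syntaxes" mentioned above (dotted path vs colon-delimited) can be resolved roughly like the following toy sketch. This is an illustration only, not the actual `nemo_skills.mcp.utils.locate`:

```python
# Toy stand-in for a locate() helper that resolves either "pkg.mod.Class"
# (dotted) or "pkg.mod:Class" (colon-delimited) to the class object.
import importlib


def locate(path: str):
    """Resolve 'pkg.mod:Class' or 'pkg.mod.Class' to an attribute of a module."""
    if ":" in path:
        module_name, _, attr = path.partition(":")
    else:
        module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Both syntaxes resolve to the same object for a stdlib class.
assert locate("collections:OrderedDict") is locate("collections.OrderedDict")
```

The colon form is unambiguous when the target is nested (e.g. a class attribute), while the dotted form is what `pydoc.locate`-style helpers traditionally accept; supporting both keeps CLI overrides flexible.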
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- nemo_skills/inference/model/__init__.py (3 hunks)
- nemo_skills/inference/model/sglang.py (1 hunks)
- tests/gpu-tests/run_qwen.sh (1 hunks)
- tests/gpu-tests/test_tool_calling.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/gpu-tests/test_tool_calling.py (2)
tests/gpu-tests/utils.py (1)
require_env_var (18-23)
tests/conftest.py (1)
docker_rm(64-66)
nemo_skills/inference/model/__init__.py (2)
nemo_skills/mcp/utils.py (1)
locate (32-53)
nemo_skills/inference/model/sglang.py (1)
SGLangModel(18-63)
nemo_skills/inference/model/sglang.py (1)
nemo_skills/inference/model/vllm.py (1)
VLLMModel(27-148)
🪛 Ruff (0.14.7)
tests/gpu-tests/test_tool_calling.py
77-77: subprocess call with shell=True identified, security issue
(S602)
114-114: Probable insecure usage of temporary file or directory: "/tmp/nemo-skills-tests/"
(S108)
127-127: Probable insecure usage of temporary file or directory: "/tmp/nemo-skills-tests/"
(S108)
nemo_skills/inference/model/__init__.py
79-79: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/inference/model/sglang.py
41-41: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: gpu-tests-qwen
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (5)
tests/gpu-tests/run_qwen.sh (1)
20-22: LGTM! Good placement of the tool-calling tests after the contamination test, reusing the same Qwen3-4B-Instruct model. The comment correctly documents the model dependency.
tests/gpu-tests/test_tool_calling.py (3)
41-47: LGTM! Good use of `tempfile.mkstemp` with the proper `os.fdopen` pattern to avoid file descriptor leaks.
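The pattern being praised here, as a minimal sketch (the filename and payload are placeholders, not taken from the PR):

```python
# mkstemp returns a raw OS-level file descriptor plus a path. Wrapping the
# descriptor with os.fdopen hands ownership to a file object, so the `with`
# block closes it instead of leaking the descriptor.
import json
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".jsonl")
try:
    with os.fdopen(fd, "w") as f:  # takes ownership of fd and closes it on exit
        f.write(json.dumps({"problem": "What is 2 + 2?"}) + "\n")
    # ... the test would run generation against `path` here ...
finally:
    os.remove(path)  # mirrors the cleanup-in-finally pattern noted below
```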
50-107: Well-structured test helper with proper cleanup.

The test logic is comprehensive: it creates input, runs generation, validates output structure, and checks for tool usage. The `finally` block ensures temp file cleanup on both success and failure. Regarding the `shell=True` warning (S602): this is acceptable here since the command is constructed from trusted sources (env vars, hardcoded paths, and test configuration), not user input.
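To illustrate the trade-off behind S602 (the commands below are placeholders, not the PR's actual test commands):

```python
# shell=True hands the entire string to /bin/sh, which is only safe when every
# piece of the string is trusted -- hence Ruff's S602 warning.
import subprocess

# Flagged by S602, but acceptable for trusted, test-controlled strings.
result = subprocess.run(
    "echo tool-calling-test", shell=True, check=True, capture_output=True, text=True
)
print(result.stdout.strip())

# Equivalent list form that avoids the shell (and the warning) when no shell
# features (pipes, globs, variable expansion) are needed.
result = subprocess.run(
    ["echo", "tool-calling-test"], check=True, capture_output=True, text=True
)
print(result.stdout.strip())
```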
110-133: Good test coverage for both server types.

The tests correctly differentiate VLLM (using the `--enable-auto-tool-choice` server arg) from SGLang (relying on the `SGLangModel` class to inject `tool_choice="auto"` into the request body). The `/tmp` paths (S108 warning) are acceptable for test artifacts.

nemo_skills/inference/model/__init__.py (1)

17-17: Good reuse of existing utility. Importing `locate` from `nemo_skills.mcp.utils` avoids code duplication and provides consistent dynamic class loading behavior.
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>
Summary by CodeRabbit
Release Notes
New Features
- Kimi-K2 works out of the box with SGLang.
- The get_model() function now accepts an optional model_class parameter to specify custom model implementations via dotted-path or colon-delimited syntax.

Tests