
Fix sglang tool calling #1070

Merged
gwarmstrong merged 4 commits into main from georgea/fix-sglang-tool-calling
Dec 4, 2025

Conversation

@gwarmstrong
Collaborator

@gwarmstrong gwarmstrong commented Dec 3, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • SGLang is now available as a supported model server type that passes "tool_choice": "auto" in SGLang requests when tools are provided; this should support some models, such as Kimi-K2, out of the box with SGLang.
    • get_model() function now accepts an optional model_class parameter to specify custom model implementations via dotted paths or colon-delimited syntax.
  • Tests

    • Added GPU-based tool-calling tests for VLLM and SGLang server implementations.


Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai
Contributor

coderabbitai bot commented Dec 3, 2025

📝 Walkthrough

This pull request adds dynamic model class loading support to the inference API, introduces a new SGLangModel variant that customizes tool request handling, and adds GPU-based tests to validate tool-calling behavior across VLLM and SGLang server types.

Changes

Model Infrastructure (nemo_skills/inference/model/__init__.py, nemo_skills/inference/model/sglang.py):
Added a model_class parameter to get_model() to support dynamic class loading via the locate() utility; updated the registry mapping for "sglang" from VLLMModel to the new SGLangModel; created the SGLangModel class, extending VLLMModel, which injects "tool_choice": "auto" into chat request parameters when tools are provided.

Test Configuration (tests/gpu-tests/run_qwen.sh):
Added a pytest invocation for the tool-calling tests (pytest tests/gpu-tests/test_tool_calling.py -s -x) to the GPU test workflow.

Test Module (tests/gpu-tests/test_tool_calling.py):
New test module with helper functions and two GPU-enabled test cases (test_vllm_tool_calling, test_sglang_tool_calling) that validate tool-call generation, output file creation, and correct line counts across model server types.
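Based on the change summary above, the SGLangModel override might look roughly like the following sketch. The base class is stubbed here, and the exact signature is assumed; this is an illustration of the described behavior, not the merged implementation:

```python
class VLLMModel:
    # Minimal stand-in for the real VLLMModel (the actual class lives in
    # nemo_skills/inference/model/vllm.py); it just collects request params.
    def _build_chat_request_params(self, **kwargs):
        return dict(kwargs)


class SGLangModel(VLLMModel):
    """Sketch: SGLang needs tool_choice='auto' set explicitly when tools are supplied."""

    def _build_chat_request_params(self, **kwargs):
        # Delegate to the parent (VLLM) implementation first.
        params = super()._build_chat_request_params(**kwargs)
        # Inject tool_choice only when the request actually carries tools,
        # preserving parent behavior otherwise.
        if params.get("tools"):
            params["tool_choice"] = "auto"
        return params


model = SGLangModel()
with_tools = model._build_chat_request_params(tools=[{"type": "function"}])
without_tools = model._build_chat_request_params()
print(with_tools["tool_choice"])        # auto
print("tool_choice" in without_tools)   # False
```

The override stays a thin wrapper, so any future changes to the VLLM request-building logic are inherited automatically.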

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Dynamic class loading logic: Review the locate() utility integration and fallback behavior in get_model() to ensure registry and custom class paths are handled correctly.
  • Registry change semantics: Verify the "sglang": SGLangModel mapping and backward compatibility implications.
  • Tool-choice injection: Confirm the _build_chat_request_params() override correctly injects "tool_choice": "auto" only when tools are present and preserves parent behavior otherwise.
  • Test validation: Review the test file output parsing and tool-call detection logic to ensure test assertions are sound.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
Title Check: ✅ Passed. The pull request title 'Fix sglang tool calling' directly aligns with the main changes: introducing SGLangModel with tool-choice handling, updating the registry mapping, and adding corresponding GPU tests for tool-calling behavior in SGLang.
Docstring Coverage: ✅ Passed. Docstring coverage is 85.71%, above the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
nemo_skills/inference/model/sglang.py (1)

25-63: Clean implementation with correct tool_choice handling.

The override properly delegates to the parent class and conditionally injects tool_choice="auto" only when tools are provided. The docstring clearly documents the SGLang vs VLLM behavioral difference.

One minor fix per the static analysis hint:

-        extra_body: dict = None,
+        extra_body: dict | None = None,
nemo_skills/inference/model/__init__.py (1)

60-80: Clean implementation of dynamic model class loading.

The model_class parameter provides good flexibility for custom model implementations without modifying the registry. The docstring clearly documents both supported path syntaxes.

One edge case to consider: if model_class is None and server_type is invalid, models[server_type.lower()] will raise a KeyError. This was pre-existing behavior, but you might want to provide a more descriptive error message:

     if model_class is not None:
         loaded_class = locate(model_class)
     else:
+        server_key = server_type.lower()
+        if server_key not in models:
+            raise ValueError(f"Unknown server_type '{server_type}'. Choices: {list(models.keys())}")
-        loaded_class = models[server_type.lower()]
+        loaded_class = models[server_key]
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48cbf77 and 30f4bcd.

📒 Files selected for processing (4)
  • nemo_skills/inference/model/__init__.py (3 hunks)
  • nemo_skills/inference/model/sglang.py (1 hunks)
  • tests/gpu-tests/run_qwen.sh (1 hunks)
  • tests/gpu-tests/test_tool_calling.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/gpu-tests/test_tool_calling.py (2)
tests/gpu-tests/utils.py (1)
  • require_env_var (18-23)
tests/conftest.py (1)
  • docker_rm (64-66)
nemo_skills/inference/model/__init__.py (2)
nemo_skills/mcp/utils.py (1)
  • locate (32-53)
nemo_skills/inference/model/sglang.py (1)
  • SGLangModel (18-63)
nemo_skills/inference/model/sglang.py (1)
nemo_skills/inference/model/vllm.py (1)
  • VLLMModel (27-148)
🪛 Ruff (0.14.7)
tests/gpu-tests/test_tool_calling.py

77-77: subprocess call with shell=True identified, security issue

(S602)


114-114: Probable insecure usage of temporary file or directory: "/tmp/nemo-skills-tests/"

(S108)


127-127: Probable insecure usage of temporary file or directory: "/tmp/nemo-skills-tests/"

(S108)

nemo_skills/inference/model/__init__.py

79-79: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/inference/model/sglang.py

41-41: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: gpu-tests-qwen
  • GitHub Check: unit-tests
  • GitHub Check: pre-commit
🔇 Additional comments (5)
tests/gpu-tests/run_qwen.sh (1)

20-22: LGTM!

Good placement of the tool calling tests after the contamination test, reusing the same Qwen3-4B-Instruct model. The comment correctly documents the model dependency.

tests/gpu-tests/test_tool_calling.py (3)

41-47: LGTM!

Good use of tempfile.mkstemp with proper os.fdopen pattern to avoid file descriptor leaks.


50-107: Well-structured test helper with proper cleanup.

The test logic is comprehensive: creates input, runs generation, validates output structure, and checks for tool usage. The finally block ensures temp file cleanup on both success and failure.

Regarding the shell=True warning (S602): this is acceptable here since the command is constructed from trusted sources (env vars, hardcoded paths, and test configuration), not user input.


110-133: Good test coverage for both server types.

The tests correctly differentiate VLLM (using --enable-auto-tool-choice server arg) from SGLang (relying on the SGLangModel class to inject tool_choice="auto" in the request body). The /tmp paths (S108 warning) are acceptable for test artifacts.
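The configuration difference described here can be made concrete with a hypothetical sketch of the resulting OpenAI-compatible request bodies (the model name and tool schema are illustrative only):

```python
# Illustrative tool definition in the OpenAI-compatible schema both servers accept.
tools = [{
    "type": "function",
    "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {}}},
}]

# VLLM: automatic tool choice is enabled server-side via --enable-auto-tool-choice,
# so the client request only needs the tools list.
vllm_request = {"model": "Qwen3-4B-Instruct", "messages": [], "tools": tools}

# SGLang: the SGLangModel class injects tool_choice into the request body instead.
sglang_request = {**vllm_request, "tool_choice": "auto"}

print(sglang_request["tool_choice"])      # auto
print("tool_choice" in vllm_request)      # False
```

Either way, the server receives an unambiguous signal to select tools automatically; only the layer responsible for supplying it differs.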

nemo_skills/inference/model/__init__.py (1)

17-17: Good reuse of existing utility.

Importing locate from nemo_skills.mcp.utils avoids code duplication and provides consistent dynamic class loading behavior.

@gwarmstrong gwarmstrong merged commit 549323a into main Dec 4, 2025
6 checks passed
@gwarmstrong gwarmstrong deleted the georgea/fix-sglang-tool-calling branch December 4, 2025 05:02
melllinia pushed a commit that referenced this pull request Dec 5, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
