Conversation
Signed-off-by: George Armstrong <georgea@nvidia.com>
📝 Walkthrough

This pull request adds dynamic model class loading support to the inference API, introduces a new SGLangModel variant that customizes tool request handling, and adds GPU-based tests to validate tool-calling behavior across VLLM and SGLang server types.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
nemo_skills/inference/model/sglang.py (1)

25-63: Clean implementation with correct tool_choice handling.

The override properly delegates to the parent class and conditionally injects `tool_choice="auto"` only when tools are provided. The docstring clearly documents the SGLang vs VLLM behavioral difference. One minor fix per the static analysis hint:

```diff
- extra_body: dict = None,
+ extra_body: dict | None = None,
```

nemo_skills/inference/model/__init__.py (1)
60-80: Clean implementation of dynamic model class loading.

The `model_class` parameter provides good flexibility for custom model implementations without modifying the registry. The docstring clearly documents both supported path syntaxes. One edge case to consider: if `model_class` is `None` and `server_type` is invalid, `models[server_type.lower()]` will raise a `KeyError`. This was pre-existing behavior, but you might want to provide a more descriptive error message:

```diff
 if model_class is not None:
     loaded_class = locate(model_class)
 else:
+    server_key = server_type.lower()
+    if server_key not in models:
+        raise ValueError(f"Unknown server_type '{server_type}'. Choices: {list(models.keys())}")
-    loaded_class = models[server_type.lower()]
+    loaded_class = models[server_key]
```
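For context, the "both supported path syntaxes" mentioned above (dotted path vs colon-delimited) can be resolved roughly like the following toy sketch. This is an illustration only, not the actual `nemo_skills.mcp.utils.locate`:

```python
# Toy stand-in for a locate() helper that resolves either "pkg.mod.Class"
# (dotted) or "pkg.mod:Class" (colon-delimited) to the class object.
import importlib


def locate(path: str):
    """Resolve 'pkg.mod:Class' or 'pkg.mod.Class' to an attribute of a module."""
    if ":" in path:
        module_name, _, attr = path.partition(":")
    else:
        module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Both syntaxes resolve to the same object for a stdlib class.
assert locate("collections:OrderedDict") is locate("collections.OrderedDict")
```

The colon form is unambiguous when the target is nested (e.g. a class attribute), while the dotted form is what `pydoc.locate`-style helpers traditionally accept; supporting both keeps CLI overrides flexible.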
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- nemo_skills/inference/model/__init__.py (3 hunks)
- nemo_skills/inference/model/sglang.py (1 hunks)
- tests/gpu-tests/run_qwen.sh (1 hunks)
- tests/gpu-tests/test_tool_calling.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/gpu-tests/test_tool_calling.py (2)
tests/gpu-tests/utils.py (1)
require_env_var (18-23)
tests/conftest.py (1)
docker_rm(64-66)
nemo_skills/inference/model/__init__.py (2)
nemo_skills/mcp/utils.py (1)
locate (32-53)
nemo_skills/inference/model/sglang.py (1)
SGLangModel(18-63)
nemo_skills/inference/model/sglang.py (1)
nemo_skills/inference/model/vllm.py (1)
VLLMModel(27-148)
🪛 Ruff (0.14.7)
tests/gpu-tests/test_tool_calling.py
77-77: subprocess call with shell=True identified, security issue
(S602)
114-114: Probable insecure usage of temporary file or directory: "/tmp/nemo-skills-tests/"
(S108)
127-127: Probable insecure usage of temporary file or directory: "/tmp/nemo-skills-tests/"
(S108)
nemo_skills/inference/model/__init__.py
79-79: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/inference/model/sglang.py
41-41: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: gpu-tests-qwen
- GitHub Check: unit-tests
- GitHub Check: pre-commit
🔇 Additional comments (5)
tests/gpu-tests/run_qwen.sh (1)
20-22: LGTM! Good placement of the tool-calling tests after the contamination test, reusing the same Qwen3-4B-Instruct model. The comment correctly documents the model dependency.
tests/gpu-tests/test_tool_calling.py (3)
41-47: LGTM! Good use of `tempfile.mkstemp` with the proper `os.fdopen` pattern to avoid file descriptor leaks.
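The pattern being praised here, as a minimal sketch (the filename and payload are placeholders, not taken from the PR):

```python
# mkstemp returns a raw OS-level file descriptor plus a path. Wrapping the
# descriptor with os.fdopen hands ownership to a file object, so the `with`
# block closes it instead of leaking the descriptor.
import json
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".jsonl")
try:
    with os.fdopen(fd, "w") as f:  # takes ownership of fd and closes it on exit
        f.write(json.dumps({"problem": "What is 2 + 2?"}) + "\n")
    # ... the test would run generation against `path` here ...
finally:
    os.remove(path)  # mirrors the cleanup-in-finally pattern noted below
```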
50-107: Well-structured test helper with proper cleanup.

The test logic is comprehensive: it creates input, runs generation, validates output structure, and checks for tool usage. The `finally` block ensures temp file cleanup on both success and failure. Regarding the `shell=True` warning (S602): this is acceptable here since the command is constructed from trusted sources (env vars, hardcoded paths, and test configuration), not user input.
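To illustrate the trade-off behind S602 (the commands below are placeholders, not the PR's actual test commands):

```python
# shell=True hands the entire string to /bin/sh, which is only safe when every
# piece of the string is trusted -- hence Ruff's S602 warning.
import subprocess

# Flagged by S602, but acceptable for trusted, test-controlled strings.
result = subprocess.run(
    "echo tool-calling-test", shell=True, check=True, capture_output=True, text=True
)
print(result.stdout.strip())

# Equivalent list form that avoids the shell (and the warning) when no shell
# features (pipes, globs, variable expansion) are needed.
result = subprocess.run(
    ["echo", "tool-calling-test"], check=True, capture_output=True, text=True
)
print(result.stdout.strip())
```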
110-133: Good test coverage for both server types.

The tests correctly differentiate VLLM (using the `--enable-auto-tool-choice` server arg) from SGLang (relying on the `SGLangModel` class to inject `tool_choice="auto"` into the request body). The `/tmp` paths (S108 warning) are acceptable for test artifacts.

nemo_skills/inference/model/__init__.py (1)

17-17: Good reuse of existing utility. Importing `locate` from `nemo_skills.mcp.utils` avoids code duplication and provides consistent dynamic class loading behavior.
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>
Summary by CodeRabbit
Release Notes
New Features
- Kimi-K2 works out of the box with SGLang.
- The get_model() function now accepts an optional model_class parameter to specify custom model implementations via dotted-path or colon-delimited syntax.

Tests