Fix run.Script refactor by gwarmstrong · Pull Request #1133 · NVIDIA-NeMo/Skills

gwarmstrong · 2025-12-18T22:22:19Z

Re-applies #1052 with fixes for multi-node

Summary by CodeRabbit

Release Notes

New Features
- Added support for running generation tasks with multiple models simultaneously, enabling coordinated multi-model workflows through a single entry point.
Improvements
- Enhanced GPU test reliability with heartbeat monitoring to track long-running test progress.
Bug Fixes
- Updated dataset evaluation to use correct dataset identifier in test suite.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

This reverts commit 1c0722a. FIX multi-node pipeline creation Signed-off-by: George Armstrong <georgea@nvidia.com> remove hosntame ref change Signed-off-by: George Armstrong <georgea@nvidia.com> make param span_group_nodes Signed-off-by: George Armstrong <georgea@nvidia.com>

coderabbitai · 2025-12-18T22:26:04Z

📝 Walkthrough

Walkthrough

This PR introduces a comprehensive refactor of the NeMo Skills pipeline architecture, replacing string-based command handling with Script objects (BaseJobScript, ServerScript, SandboxScript, GenerationClientScript, EvaluatorClientScript) and adding multi-model generation support. The generate pipeline is restructured via _create_job_unified to handle per-model server configurations and cross-model references. The evaluator pipeline adopts the new Script-based serving and client orchestration. Declarative job planning is updated to work with Script objects and support heterogeneous multi-group execution. GPU test workflow is enhanced with a heartbeat process, and dataset exclusions are updated.

Changes

Cohort / File(s)	Summary
GPU Tests Infrastructure `.github/workflows/gpu_tests.yml`	GPU test runner now launches a background heartbeat process during test execution, captures its PID, and ensures cleanup and proper exit code propagation.
Script Base Classes `nemo_skills/pipeline/utils/scripts.py`	New module introducing Script wrapper hierarchy: BaseJobScript (heterogeneous SLURM support, hostname resolution, installation command wrapping), ServerScript (port allocation, server command building, address exposition), SandboxScript (sandbox command/environment wrapping), GenerationClientScript (lazy runtime command building with cross-component references).
Multi-Model Generation Refactor `nemo_skills/pipeline/generate.py`	Replaces `_create_commandgroup_from_config` with `_create_job_unified` supporting multi-model setups via list-type parameters (model, server_address, server_type, server_gpus, server_nodes, server_args, server_entrypoint, server_container). Adds per-model server/sandbox/client component composition, parameter normalization, and returns CommandGroup list.
Script-Based Evaluator Pipeline `nemo_skills/pipeline/nemo_evaluator.py`	Replaces vLLM server command logic with ServerScript-backed serving. Introduces EvaluatorClientScript for dynamic runtime URL resolution. Updates internal functions to accept plain dicts instead of OmegaConf DictConfig for serialization compatibility.
Script-Based Declarative Job Planning `nemo_skills/pipeline/utils/declarative.py`	Replaces Command.command (string/callable) with Command.script (Script object). Updates prepare_for_execution to return Script objects. Adds local path rewriting for executor="none". Introduces per-group job naming and HardwareConfig.num_tasks field. Refactors job planning to operate on Script objects throughout.
Generation Utilities `nemo_skills/pipeline/utils/generation.py`	Adds normalize_models_config and normalize_parameter functions for multi-model parameter handling. Extends get_generation_cmd with server_addresses, model_names, server_types parameters for multi-model Hydra overrides.
Pipeline Utils Exports `nemo_skills/pipeline/utils/__init__.py`	Exports new public functions: normalize_models_config and normalize_parameter.
Dataset Test Update `tests/gpu-tests/test_eval.py`	Replaces "aalcr" with "mrcr" in EXCLUDED_DATASETS.
Test Updates `tests/test_declarative_pipeline.py`, `tests/test_generation.py`, `tests/test_nemo_evaluator_pipeline.py`	Introduce DummyScript and update test expectations to verify Script instances and attributes instead of string commands. Replace assertions on metadata/commands with Script type and field checks (num_tasks, num_gpus, port, log_prefix). Update calls to _create_job_unified. Validate cross-component script references.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Script inheritance hierarchy and runtime resolution: BaseJobScript's hostname_ref and port allocation mechanism across heterogeneous jobs; tight interdependencies between ServerScript, SandboxScript, and GenerationClientScript for cross-component reference resolution
Multi-model parameter normalization and grouping: normalize_models_config and normalize_parameter logic; _create_job_unified's per-model server/client composition and CommandGroup aggregation
Declarative job planning refactor: Conversion from string commands to Script objects; path rewriting for executor="none"; two-pass planning loop and per-group job naming; HardwareConfig.num_tasks integration
Test assertion updates: Multiple test files heavily rewritten with new Script-based expectations; verify proper migration of all test scenarios
Cross-file integration: Changes span generate.py, nemo_evaluator.py, and declarative.py with cascading structural dependencies

Possibly related PRs

Use run.Script for generate pipeline #1052: Directly related—both PRs introduce the same Script-based architecture (ServerScript, SandboxScript, GenerationClientScript/EvaluatorClientScript), refactor generate/nemo_evaluator to multi-model support, and update declarative Command to use run.Script
Revert "Use run.Script for generate pipeline (#1052)" #1125: Directly related—reverts this PR's core Script-based refactor and multi-model changes, undoing the run.Script architecture and restoring string-based command handling
Evaluation on OJBench #848: Related at code level—modifies nemo_skills/pipeline/generate.py to add keep_mounts_for_sandbox parameter, which overlaps with this PR's generate.py refactoring

Suggested reviewers

activatedgeek
Kipok

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 72.60% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title check	❓ Inconclusive	The title 'Fix run.Script refactor' is vague and generic, using a non-descriptive term 'Fix' without clarifying what specific issue is being addressed or what the main change involves.	Provide a more specific title that describes the actual changes or problem being fixed, such as 'Refactor Command and Script handling for multi-model pipeline support' or 'Update Script-based command execution for declarative pipeline.'

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch georgea/fix-run-script-refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (5)

nemo_skills/pipeline/utils/scripts.py (1)
283-293: Consider validating env_overrides format.

The parsing at line 288-289 assumes each override contains =. If a malformed entry is passed (e.g., "INVALID"), split("=", 1) returns a single-element list, causing key, value = ... to raise ValueError.
🔎 Proposed defensive handling
             if self.env_overrides:
                 for override in self.env_overrides:
-                    key, value = override.split("=", 1)
-                    env[key] = value
+                    if "=" not in override:
+                        raise ValueError(f"Invalid env_override format '{override}': expected KEY=VALUE")
+                    key, value = override.split("=", 1)
+                    env[key] = value
tests/test_nemo_evaluator_pipeline.py (1)
20-27: Minor: Consolidate imports from the same module.

The imports from nemo_skills.pipeline.nemo_evaluator are split across two separate import statements unnecessarily.
🔎 Suggested consolidation
-from nemo_skills.pipeline.nemo_evaluator import (
-    EvaluatorClientScript,
-)
-from nemo_skills.pipeline.nemo_evaluator import (
-    nemo_evaluator as nemo_evaluator_fn,
-)
+from nemo_skills.pipeline.nemo_evaluator import (
+    EvaluatorClientScript,
+    nemo_evaluator as nemo_evaluator_fn,
+)
nemo_skills/pipeline/generate.py (2)
95-95: Unused loop variable model_path.

The variable model_path is defined in the loop but not used. Since the model path is obtained from server_config["model_path"] on line 111, this outer variable is redundant.
🔎 Suggested fix
-    for model_idx, (model_path, server_config) in enumerate(zip(models, server_configs)):
+    for model_idx, (_, server_config) in enumerate(zip(models, server_configs, strict=True)):
91-94: Consider adding strict=True to zip() for safety.

If models and server_configs have mismatched lengths, zip() will silently truncate to the shorter list, potentially causing confusing behavior. Adding strict=True will raise a ValueError for length mismatches.
🔎 Suggested fix
-    for model_idx, (model_path, server_config) in enumerate(zip(models, server_configs)):
+    for model_idx, (_, server_config) in enumerate(zip(models, server_configs, strict=True)):
nemo_skills/pipeline/utils/declarative.py (1)
223-256: Unused cluster_config parameter.

The cluster_config parameter is declared but never used in the method body. This appears to be a remnant from the refactoring. Consider removing it if not needed, or adding a comment if reserved for future use.
🔎 Suggested fix
-    def prepare_for_execution(self, cluster_config: Dict) -> Tuple[run.Script, Dict]:
-        """Prepare script for execution.
-
-        This method:
-        1. Evaluates lazy commands (if script.inline is callable)
-        2. Builds execution config from Script fields
-
-        Returns:
-            Tuple of (Script_object, execution_config)
-        """
+    def prepare_for_execution(self) -> Tuple[run.Script, Dict]:
+        """Prepare script for execution.
+
+        This method:
+        1. Evaluates lazy commands (if script.inline is callable)
+        2. Builds execution config from Script fields
+
+        Returns:
+            Tuple of (Script_object, execution_config)
+        """
If removing the parameter, also update the call site at line 501:
-                script, exec_config = self._prepare_command(command, cluster_config)
+                script, exec_config = self._prepare_command(command)
And _prepare_command at line 495:
-    def _prepare_command(self, command, cluster_config: Dict) -> Tuple[run.Script, Dict]:
+    def _prepare_command(self, command) -> Tuple[run.Script, Dict]:

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c01c0a7 and 4459850.

📒 Files selected for processing (11)

.github/workflows/gpu_tests.yml (1 hunks)
nemo_skills/pipeline/generate.py (9 hunks)
nemo_skills/pipeline/nemo_evaluator.py (8 hunks)
nemo_skills/pipeline/utils/__init__.py (1 hunks)
nemo_skills/pipeline/utils/declarative.py (11 hunks)
nemo_skills/pipeline/utils/generation.py (5 hunks)
nemo_skills/pipeline/utils/scripts.py (1 hunks)
tests/gpu-tests/test_eval.py (1 hunks)
tests/test_declarative_pipeline.py (25 hunks)
tests/test_generation.py (2 hunks)
tests/test_nemo_evaluator_pipeline.py (5 hunks)

🧰 Additional context used

🧬 Code graph analysis (6)

tests/test_declarative_pipeline.py (2)

nemo_skills/pipeline/utils/scripts.py (2)

set_inline (104-106)

hostname_ref (108-122)

nemo_skills/pipeline/utils/declarative.py (4)

Command (211-259)

prepare_for_execution (223-256)

get_name (258-259)

run (356-493)

nemo_skills/pipeline/utils/__init__.py (1)

nemo_skills/pipeline/utils/generation.py (2)

normalize_models_config (30-59)

normalize_parameter (62-102)

nemo_skills/pipeline/utils/scripts.py (6)

nemo_skills/pipeline/utils/commands.py (1)

sandbox_command (77-111)

nemo_skills/pipeline/utils/exp.py (1)

install_packages_wrap (368-408)

nemo_skills/pipeline/utils/generation.py (1)

get_generation_cmd (360-494)

nemo_skills/pipeline/utils/server.py (2)

get_free_port (43-59)

get_server_command (114-227)

nemo_skills/utils.py (1)

get_logger_name (39-43)

tests/test_declarative_pipeline.py (2)

set_inline (39-40)

hostname_ref (42-45)

nemo_skills/pipeline/generate.py (4)

nemo_skills/pipeline/utils/scripts.py (2)

GenerationClientScript (297-428)

ServerScript (126-223)

nemo_skills/pipeline/utils/declarative.py (3)

CommandGroup (273-286)

Command (211-259)

HardwareConfig (263-270)

nemo_skills/pipeline/utils/server.py (2)

SupportedServers (32-40)

should_get_random_port (62-63)

nemo_skills/pipeline/utils/generation.py (3)

normalize_models_config (30-59)

normalize_parameter (62-102)

configure_client (519-579)

tests/test_generation.py (2)

nemo_skills/pipeline/generate.py (2)

generate (208-645)

_create_job_unified (50-203)

nemo_skills/pipeline/utils/scripts.py (1)

ServerScript (126-223)

tests/test_nemo_evaluator_pipeline.py (3)

nemo_skills/pipeline/nemo_evaluator.py (2)

nemo_evaluator (113-421)

EvaluatorClientScript (732-781)

nemo_skills/pipeline/utils/declarative.py (2)

Command (211-259)

CommandGroup (273-286)

nemo_skills/pipeline/utils/scripts.py (1)

ServerScript (126-223)

🪛 Ruff (0.14.8)

tests/test_declarative_pipeline.py

587-587: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)

590-590: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)

676-676: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)

nemo_skills/pipeline/utils/generation.py

50-50: Avoid specifying long messages outside the exception class

(TRY003)

58-58: Avoid specifying long messages outside the exception class

(TRY003)

99-102: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/pipeline/utils/declarative.py

223-223: Unused method argument: cluster_config

(ARG002)

nemo_skills/pipeline/generate.py

95-95: Loop control variable model_path not used within loop body

Rename unused model_path to _model_path

(B007)

95-95: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

234-238: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

239-243: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

244-248: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

249-253: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

254-258: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

259-263: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

264-268: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

422-424: Avoid specifying long messages outside the exception class

(TRY003)

tests/test_generation.py

174-174: Probable insecure usage of temporary file or directory: "/tmp/out"

(S108)

186-186: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: gpu-tests-qwen
GitHub Check: unit-tests

🔇 Additional comments (31)

tests/gpu-tests/test_eval.py (1)

46-46: LGTM!

Dataset exclusion updated from "aalcr" to "mrcr". The change aligns with the stated update to dataset exclusions.

.github/workflows/gpu_tests.yml (1)

55-63: LGTM!

The heartbeat mechanism is a good addition for long-running GPU tests to prevent CI timeouts due to lack of output. The implementation correctly:

Captures the heartbeat PID for cleanup

Preserves the test script's exit code

Gracefully handles heartbeat cleanup with || true

Propagates the correct exit status

nemo_skills/pipeline/utils/scripts.py (2)

62-123: LGTM!

BaseJobScript provides a clean abstraction for heterogeneous job support with:

Proper handling of both callable and string inline commands in __post_init__

Correct use of object.__setattr__ for frozen dataclass mutation

Well-documented hostname_ref() with appropriate localhost fallback

125-223: LGTM!

ServerScript correctly encapsulates server configuration with automatic port allocation and proper command building. The get_address() method provides clean cross-component communication support.

nemo_skills/pipeline/utils/__init__.py (1)

52-53: LGTM!

New normalization utilities normalize_models_config and normalize_parameter are correctly exported, providing convenient access to multi-model configuration helpers through the utils package namespace.

tests/test_generation.py (2)

23-24: LGTM!

Imports correctly updated to use the new _create_job_unified API and ServerScript class.

156-193: LGTM!

Test correctly updated to use the new _create_job_unified API with proper validation of:

ServerScript instance type

num_tasks and num_gpus propagation

Hardware configuration alignment

The static analysis hints about /tmp/out and /tmp/logs are false positives for test code where these are placeholder paths for the test configuration, not actual security-sensitive operations.

tests/test_nemo_evaluator_pipeline.py (2)

96-147: Test coverage for external URLs path looks good.

The test properly verifies the Pipeline structure when using external URLs without hosted servers, including checking that the client command uses EvaluatorClientScript.

149-204: Main server hosted test correctly validates ServerScript properties.

The test properly verifies:

ServerScript instance type

num_gpus, log_prefix, and port attributes

Client uses EvaluatorClientScript with callable inline for cross-component refs

tests/test_declarative_pipeline.py (3)

30-46: DummyScript implementation is appropriate for testing.

The DummyScript class properly mirrors the BaseJobScript interface with:

set_inline method for command updates

hostname_ref method for cross-component communication

het_group_index attribute for heterogeneous job support

This provides good test isolation without requiring the full script infrastructure.

48-51: Helper function make_command improves test readability.

Good addition that reduces boilerplate and ensures consistent Command construction with DummyScript instances across tests.

891-922: Environment variable capture test validates sandbox/client integration.

The test properly verifies that:

Sandbox receives LISTEN_PORT and NGINX_PORT

Client receives NEMO_SKILLS_SANDBOX_PORT

Ports match between sandbox and client

This ensures the cross-component environment propagation works correctly with the new Script-based architecture.

nemo_skills/pipeline/utils/generation.py (2)

30-59: normalize_models_config correctly handles model input normalization.

The function properly:

Raises ValueError for None or empty inputs

Converts scalar strings to single-element lists

Passes through existing lists

62-102: normalize_parameter broadcast logic is correct.

The function implements standard broadcast semantics:

Scalar → broadcast to all models

Single-element list → broadcast to all models

Multi-element list → must match num_models exactly

The error message clearly explains the expected values.

nemo_skills/pipeline/nemo_evaluator.py (4)

292-293: Good: Dict conversion for serialization compatibility.

Converting launcher_run_cfg and task_cfg to plain dicts via OmegaConf.to_container() ensures compatibility with nemo-run/fiddle serialization, which cannot handle OmegaConf objects directly.

471-495: ServerScript integration in _create_serving_command_obj looks correct.

The function properly:

Creates ServerScript with all required parameters

Sets log_prefix to "judge-server" for judge servers

Returns a Command wrapping the script

731-781: EvaluatorClientScript correctly implements lazy command building.

The build_command closure:

Resolves server URLs using hostname_ref() (returns shell variable for runtime resolution)

Adds wait commands for server health endpoints

Builds the task command with URL overrides

Returns (command, {"environment": env_vars}) tuple for prepare_for_execution

This pattern enables cross-component communication in heterogeneous jobs where hostnames aren't known until runtime.

692-694: Dict-to-DictConfig conversion in _build_task_cmd is correct.

Since launcher_run_cfg and task_cfg are stored as plain dicts for serialization, converting back to DictConfig here enables OmegaConf operations (OmegaConf.update) as needed.

nemo_skills/pipeline/generate.py (4)

136-183: Client and sandbox correctly placed in group 0 only.

The logic ensures:

Only model_idx == 0 group contains the client and optional sandbox

Client's GenerationClientScript receives references to all server scripts via servers=server_scripts

This enables cross-component hostname resolution for multi-model generation

419-424: Good validation for multi-model requirements.

Requiring generation_type or generation_module for multi-model generation ensures the correct inference module is used to handle multiple model inputs.

611-619: Correct job spec structure for single vs multi-group.

The logic properly uses:

"groups" key for multi-group jobs (heterogeneous execution)

"group" key for single-group jobs (standard execution)

This aligns with the Pipeline API that expects either key based on job type.

368-383: Updated docstring correctly documents multi-model behavior.

The docstring clearly explains:

Parameter type hints use List[T] for Typer CLI compatibility

Both scalars and lists work when calling from Python

Single values broadcast to all models

This helps users understand the flexible input handling.

nemo_skills/pipeline/utils/declarative.py (9)

15-41: LGTM!

Imports are well-organized and appropriate for the Script-based declarative pipeline system.

43-205: LGTM!

Excellent module documentation with comprehensive examples demonstrating the Script-based API, including single jobs, heterogeneous SLURM jobs, and custom Script classes.

262-286: LGTM!

Clean dataclass definitions for HardwareConfig and CommandGroup with sensible defaults.

289-493: LGTM!

The Pipeline class is well-structured with:

Early validation in _validate() for fail-fast behavior

Comprehensive dependency resolution separating internal and external dependencies

Proper code reuse caching for experiment efficiency

568-576: LGTM!

Good addition of span_group_nodes support. This correctly allows server scripts to span all nodes while keeping client/sandbox on the master node in multi-node setups with --overlap.

627-632: LGTM!

Good design decision to pre-assign het_group_index values before evaluating any commands. This ensures cross-component references (e.g., hostname_ref) see the correct indices regardless of processing order.

633-666: LGTM!

The two-pass pattern (prepare first, then build executors) is a clean approach that ensures all scripts have resolved cross-references before executor instantiation.

712-721: LGTM!

Packager sharing logic is correct—for single-group jobs, the first component's packager is reused by subsequent components to avoid redundant packaging operations.

792-832: LGTM!

Clean separation with _add_single_group_job and _add_multi_group_job delegating to the unified _plan_and_add_job method with appropriate heterogeneous flag.

coderabbitai · 2025-12-18T22:33:17Z

nemo_skills/pipeline/utils/generation.py

+
+    # Add multi-model configuration if provided
+    if server_addresses is not None and model_names is not None:
+        num_models = len(model_names)
+        if num_models > 1:
+            # Multi-model: pass server configuration as lists
+            model_names_arg = ",".join(model_names)
+            cmd += f"++server.model=[{model_names_arg}] "
+
+            server_types_arg = ",".join(server_types)
+            cmd += f"++server.server_type=[{server_types_arg}] "
+
+            server_addresses_arg = ",".join(server_addresses)
+            cmd += f"++server.base_url=[{server_addresses_arg}] "
+


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cd /tmp && git clone https://github.com/NVIDIA/NeMo-Skills.git 2>/dev/null || cd /tmp/NeMo-Skills # Find the file in question find . -name "generation.py" -path "*/pipeline/utils/*" 2>/dev/null | head -1

Repository: NVIDIA-NeMo/Skills

Length of output: 116

🏁 Script executed:

# Read the function to understand the context file_path=$(find . -name "generation.py" -path "*/pipeline/utils/*" 2>/dev/null | head -1) if [ -n "$file_path" ]; then # Get the function signature and context around lines 419-433 sed -n '380,440p' "$file_path" fi

Repository: NVIDIA-NeMo/Skills

Length of output: 2863

🏁 Script executed:

# Also search for where server_types is defined in this function file_path=$(find . -name "generation.py" -path "*/pipeline/utils/*" 2>/dev/null | head -1) if [ -n "$file_path" ]; then # Look for the function definition starting from line 1 head -420 "$file_path" | tail -100 fi

Repository: NVIDIA-NeMo/Skills

Length of output: 3771

🏁 Script executed:

# Search for function signature to see parameters file_path=$(find . -name "generation.py" -path "*/pipeline/utils/*" 2>/dev/null | head -1) if [ -n "$file_path" ]; then # Find get_generation_cmd function definition rg -n "def get_generation_cmd" "$file_path" -A 30 fi

Repository: NVIDIA-NeMo/Skills

Length of output: 1369

🏁 Script executed:

# Search for all calls to get_generation_cmd with server_addresses parameter file_path=$(find . -name "generation.py" -path "*/pipeline/utils/*" 2>/dev/null | head -1) base_dir=$(dirname "$file_path") rg -r -n "get_generation_cmd" --type=py "$base_dir/.." | head -20

Repository: NVIDIA-NeMo/Skills

Length of output: 575

🏁 Script executed:

# Search for calls passing server_addresses to understand usage pattern rg -n "server_addresses\s*=" --type=py -B 2 -A 2 | head -50

Repository: NVIDIA-NeMo/Skills

Length of output: 1115

🏁 Script executed:

# Look for any calls with server_addresses argument specifically rg -n "server_addresses=" --type=py -B 5 -A 5

Repository: NVIDIA-NeMo/Skills

Length of output: 1011

🏁 Script executed:

# Get more context from scripts.py around the call rg -n "server_addresses = None" --type=py -A 50 nemo_skills/pipeline/utils/scripts.py | head -80

Repository: NVIDIA-NeMo/Skills

Length of output: 1953

🏁 Script executed:

# Check what self.model_names and self.server_types are rg -n "self.model_names|self.server_types" --type=py nemo_skills/pipeline/utils/scripts.py -B 2 -A 2

Repository: NVIDIA-NeMo/Skills

Length of output: 310

🏁 Script executed:

# Look at the class definition to understand when these are set rg -n "class.*Script|model_names|server_types" --type=py nemo_skills/pipeline/utils/scripts.py | head -40

Repository: NVIDIA-NeMo/Skills

Length of output: 859

🏁 Script executed:

# Check if there's any validation that ensures server_types is set when model_names is set rg -n "server_types" --type=py nemo_skills/pipeline/utils/scripts.py -B 3 -A 3 | head -60

Repository: NVIDIA-NeMo/Skills

Length of output: 1846

🏁 Script executed:

# Look at the full GenerationClientScript class definition sed -n '297,380p' nemo_skills/pipeline/utils/scripts.py

Repository: NVIDIA-NeMo/Skills

Length of output: 3856

🏁 Script executed:

# Check if there's any validation elsewhere that enforces these parameters together rg -n "__post_init__|validate|check" --type=py nemo_skills/pipeline/utils/scripts.py -A 5 | grep -A 5 "GenerationClientScript" | head -40

Repository: NVIDIA-NeMo/Skills

Length of output: 285

🏁 Script executed:

# Also check generation.py for any validation after the guard check sed -n '419,450p' nemo_skills/pipeline/utils/generation.py

Repository: NVIDIA-NeMo/Skills

Length of output: 1304

Add server_types to the guard condition to prevent TypeError.

Line 419 checks server_addresses is not None and model_names is not None, but line 428 unconditionally calls ",".join(server_types). If server_types is None (which is allowed by the function signature), this will raise TypeError: 'NoneType' object is not iterable.

# Add multi-model configuration if provided - if server_addresses is not None and model_names is not None: + if server_addresses is not None and model_names is not None and server_types is not None: num_models = len(model_names) if num_models > 1: # Multi-model: pass server configuration as lists

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Add multi-model configuration if provided

if server_addresses is not None and model_names is not None:

num_models = len(model_names)

if num_models > 1:

# Multi-model: pass server configuration as lists

model_names_arg = ",".join(model_names)

cmd += f"++server.model=[{model_names_arg}] "

server_types_arg = ",".join(server_types)

cmd += f"++server.server_type=[{server_types_arg}] "

server_addresses_arg = ",".join(server_addresses)

cmd += f"++server.base_url=[{server_addresses_arg}] "

# Add multi-model configuration if provided

if server_addresses is not None and model_names is not None and server_types is not None:

num_models = len(model_names)

if num_models > 1:

# Multi-model: pass server configuration as lists

model_names_arg = ",".join(model_names)

cmd += f"++server.model=[{model_names_arg}] "

server_types_arg = ",".join(server_types)

cmd += f"++server.server_type=[{server_types_arg}] "

server_addresses_arg = ",".join(server_addresses)

cmd += f"++server.base_url=[{server_addresses_arg}] "

🤖 Prompt for AI Agents

In nemo_skills/pipeline/utils/generation.py around lines 419 to 433, the multi-model branch joins server_types without checking it, which can raise TypeError if server_types is None; update the guard to require server_types is not None (e.g., if server_addresses is not None and model_names is not None and server_types is not None) and then validate that len(server_types) == len(model_names) (or coerce/supply defaults) before building server_types_arg to prevent mismatched lists and runtime errors.

coderabbitai · 2025-12-18T22:33:17Z

nemo_skills/pipeline/utils/scripts.py

+            if self.servers is not None:
+                server_addresses = []
+                for server_idx, server_script in enumerate(self.servers):
+                    if server_script is not None:
+                        # Self-hosted: construct address from hostname and port refs
+                        addr = f"{server_script.hostname_ref()}:{server_script.port}"
+                    else:
+                        # Pre-hosted: use the address from server_addresses_prehosted
+                        addr = self.server_addresses_prehosted[server_idx]
+                    server_addresses.append(addr)


⚠️ Potential issue | 🟡 Minor

Potential IndexError/TypeError if server_addresses_prehosted is misaligned.

When servers[server_idx] is None (pre-hosted), line 400 accesses self.server_addresses_prehosted[server_idx]. If server_addresses_prehosted is None or shorter than servers, this raises an error at runtime.

🔎 Proposed defensive check

if self.servers is not None: server_addresses = [] + prehosted = self.server_addresses_prehosted or [] for server_idx, server_script in enumerate(self.servers): if server_script is not None: # Self-hosted: construct address from hostname and port refs addr = f"{server_script.hostname_ref()}:{server_script.port}" else: # Pre-hosted: use the address from server_addresses_prehosted - addr = self.server_addresses_prehosted[server_idx] + if server_idx >= len(prehosted) or not prehosted[server_idx]: + raise ValueError( + f"Server {server_idx} is pre-hosted but no address provided in server_addresses_prehosted" + ) + addr = prehosted[server_idx] server_addresses.append(addr)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if self.servers is not None:

server_addresses = []

for server_idx, server_script in enumerate(self.servers):

if server_script is not None:

# Self-hosted: construct address from hostname and port refs

addr = f"{server_script.hostname_ref()}:{server_script.port}"

else:

# Pre-hosted: use the address from server_addresses_prehosted

addr = self.server_addresses_prehosted[server_idx]

server_addresses.append(addr)

if self.servers is not None:

server_addresses = []

prehosted = self.server_addresses_prehosted or []

for server_idx, server_script in enumerate(self.servers):

if server_script is not None:

# Self-hosted: construct address from hostname and port refs

addr = f"{server_script.hostname_ref()}:{server_script.port}"

else:

# Pre-hosted: use the address from server_addresses_prehosted

if server_idx >= len(prehosted) or not prehosted[server_idx]:

raise ValueError(

f"Server {server_idx} is pre-hosted but no address provided in server_addresses_prehosted"

)

addr = prehosted[server_idx]

server_addresses.append(addr)

🤖 Prompt for AI Agents

In nemo_skills/pipeline/utils/scripts.py around lines 392 to 401, the loop assumes self.server_addresses_prehosted exists and has an entry for every index where self.servers[i] is None; add a defensive check when resolving pre-hosted addresses to ensure self.server_addresses_prehosted is not None and len(self.server_addresses_prehosted) > server_idx (or use a safe lookup), and if the check fails raise a clear ValueError (or skip/continue based on desired behavior) with a descriptive message; alternatively validate and normalize self.server_addresses_prehosted length earlier (before the loop) so the loop can safely index it.

greptile-apps · 2026-01-06T17:15:16Z

Greptile Summary

This PR successfully refactors the generation pipeline to use run.Script dataclasses, fixing multi-node issues from #1052. The refactoring introduces three key Script classes (ServerScript, SandboxScript, GenerationClientScript) that encapsulate command building logic with explicit fields and cross-component references.

Key improvements:

Lazy command evaluation ensures het_group_index is assigned before hostname resolution
span_group_nodes flag properly controls node allocation (servers span all nodes, clients/sandbox run on master only)
Multi-model support with heterogeneous SLURM jobs works correctly
Cross-component references (server hostnames, ports) resolve at runtime via environment variables
Resource requirements (num_gpus, num_nodes, num_tasks) centralized in HardwareConfig and Script objects

Changes validated:

Comprehensive test coverage updated for Script-based architecture
GPU test reliability improved with heartbeat monitoring
OmegaConf serialization issue fixed in nemo_evaluator (converts to plain dicts)

The refactoring maintains backward compatibility while making the pipeline more maintainable and extensible.

Confidence Score: 5/5

Safe to merge - well-tested refactoring with comprehensive test coverage and improved architecture
This is a large but well-structured refactoring that addresses the multi-node issues from Use run.Script for generate pipeline #1052. The code introduces clear abstractions (Script dataclasses), maintains test coverage, and includes proper safeguards for multi-node execution. The lazy command evaluation pattern ensures correct hostname resolution, and the span_group_nodes flag prevents incorrect node allocation. All tests have been updated to reflect the new architecture.
No files require special attention

Important Files Changed

Filename	Overview
nemo_skills/pipeline/utils/scripts.py	New file introducing Script dataclasses (ServerScript, SandboxScript, GenerationClientScript) with lazy command building, cross-component references, and multi-node support via span_group_nodes
nemo_skills/pipeline/utils/declarative.py	Refactored to use Script objects instead of raw commands, with het_group_index assignment before command evaluation and span_group_nodes support for multi-node servers
nemo_skills/pipeline/generate.py	Refactored _create_job_unified to use Script objects, creates CommandGroups with cross-component references for multi-model generation
nemo_skills/pipeline/utils/generation.py	Added multi-model support with normalize_models_config and normalize_parameter functions, updated get_generation_cmd for multi-model server lists
nemo_skills/pipeline/nemo_evaluator.py	Refactored to use ServerScript and EvaluatorClientScript, converts OmegaConf to plain dicts for nemo_run serialization compatibility
tests/test_declarative_pipeline.py	Updated tests to use Script-based Command interface, added tests for het_group_index assignment and hostname_ref resolution

Sequence Diagram

sequenceDiagram
    participant User
    participant Pipeline
    participant Generate as _create_job_unified
    participant ServerScript
    participant SandboxScript
    participant ClientScript
    participant CommandGroup
    participant Executor

    User->>Pipeline: generate(model=[m1, m2], ...)
    Pipeline->>Generate: _create_job_unified(models, server_configs, ...)
    
    loop For each model
        Generate->>ServerScript: new ServerScript(server_type, model_path, num_gpus, num_nodes, ...)
        ServerScript->>ServerScript: Allocate port if needed
        ServerScript->>ServerScript: Build server command
        Note over ServerScript: span_group_nodes=True<br/>(server uses all nodes)
    end
    
    alt With Sandbox
        Generate->>SandboxScript: new SandboxScript(cluster_config, keep_mounts, ...)
        SandboxScript->>SandboxScript: Allocate port
        SandboxScript->>SandboxScript: Build sandbox command with env vars
        Note over SandboxScript: span_group_nodes=False<br/>(runs on master node only)
    end
    
    Generate->>ClientScript: new ClientScript(output_dir, servers=[server1, server2, ...], sandbox, ...)
    Note over ClientScript: Cross-component references:<br/>servers list, sandbox ref
    ClientScript->>ClientScript: Set inline to lazy builder (callable)
    Note over ClientScript: span_group_nodes=False<br/>(runs on master node only)
    
    loop For each model group
        Generate->>CommandGroup: Create CommandGroup(commands, hardware)
        Note over CommandGroup: Group 0: server1 + sandbox + client<br/>Group 1: server2 (if multi-model)
    end
    
    Generate-->>Pipeline: Return list of CommandGroups
    
    Pipeline->>Pipeline: Assign het_group_index to all scripts
    Note over Pipeline: BEFORE evaluating any commands
    
    Pipeline->>ClientScript: Evaluate inline callable
    ClientScript->>ServerScript: hostname_ref() → ${SLURM_MASTER_NODE_HET_GROUP_N}
    ClientScript->>ServerScript: Get port
    ClientScript->>SandboxScript: Get port
    ClientScript->>ClientScript: Build generation command with server addresses
    ClientScript-->>Pipeline: Command string + environment vars
    
    Pipeline->>Executor: Create executors with hardware config
    Note over Executor: num_nodes from span_group_nodes:<br/>ServerScript: group.num_nodes<br/>ClientScript/Sandbox: 1
    
    Pipeline->>Executor: exp.add(scripts, executors, dependencies)
    Executor->>Executor: Submit heterogeneous SLURM job
    Note over Executor: Environment exports:<br/>SLURM_MASTER_NODE_HET_GROUP_0,<br/>SLURM_MASTER_NODE_HET_GROUP_1, ...

greptile-apps · 2026-01-06T17:15:17Z

Greptile found no issues!

From now on, if a review finishes and we haven't found any issues, we will not post anything, but you can confirm that we reviewed your changes in the status check section.

_{This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR".}

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dlord <dlord@nvidia.com>

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

Signed-off-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>

gwarmstrong added 2 commits December 18, 2025 14:21

Merge branch 'main' into georgea/fix-run-script-refactor

4459850

gwarmstrong added the run GPU tests label Dec 18, 2025

coderabbitai bot reviewed Dec 18, 2025

View reviewed changes

Merge branch 'main' into georgea/fix-run-script-refactor

00b1c4d

gwarmstrong added run GPU tests and removed run GPU tests labels Jan 6, 2026

Merge branch 'main' into georgea/fix-run-script-refactor

ab7560c

gwarmstrong added run GPU tests and removed run GPU tests labels Jan 6, 2026

gwarmstrong merged commit 0f94f5e into main Jan 6, 2026
7 checks passed

gwarmstrong deleted the georgea/fix-run-script-refactor branch January 6, 2026 22:23

blahblahasdf pushed a commit to blahblahasdf/Skills that referenced this pull request Jan 8, 2026

Fix run.Script refactor (NVIDIA-NeMo#1133)

5df7599

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dlord <dlord@nvidia.com>

hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026

Fix run.Script refactor (#1133)

ba27f63

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

This was referenced Feb 13, 2026

Add nemo-skills-core subpackage for lightweight installs #1229

Merged

feat: add NeMo Gym rollouts pipeline #1227

Open

Add multi-instance pipeline support (gpus_per_node) #1247

Open

This was referenced Feb 23, 2026

Add Kubernetes multi-node SFT backend validation assets and docs #1271

Open

Add LibTrace recipe for generating domain-specific reasoning data #1224

Merged

This was referenced Mar 5, 2026

Eval kit support #1239

Merged

Revert "Eval kit support (#1239)" #1294

Merged

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Fix run.Script refactor (#1133)

4ad13e6

Signed-off-by: George Armstrong <georgea@nvidia.com>

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Fix run.Script refactor (#1133)

e6831ed

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix run.Script refactor#1133

Fix run.Script refactor#1133
gwarmstrong merged 4 commits intomainfrom
georgea/fix-run-script-refactor

gwarmstrong commented Dec 18, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 18, 2025 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 18, 2025

Uh oh!

coderabbitai bot Dec 18, 2025

Uh oh!

greptile-apps bot commented Jan 6, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Jan 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gwarmstrong commented Dec 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot commented Jan 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot commented Jan 6, 2026

Greptile found no issues!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

gwarmstrong commented Dec 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 18, 2025 •

edited

Loading

greptile-apps bot commented Jan 6, 2026 •

edited

Loading