
Use run.Script for generate pipeline #1052

Merged
gwarmstrong merged 37 commits into main from georgea/refactor-generate-run-script
Dec 17, 2025
Conversation


@gwarmstrong gwarmstrong commented Nov 21, 2025

Refactor Generate Declarative Pipeline to Scripts

Summary

  • Introduce reusable script dataclasses (ServerScript, SandboxScript, GenerationClientScript) with explicit fields, lazy command generation, and cross-component references.
  • Add multi-model support so a single pipeline can launch heterogeneous server/client groups for multiple models.
  • Centralize resource selection (num_gpus, num_nodes, num_tasks) inside HardwareConfig and script objects; remove redundant command-level overrides.
  • Preserve existing generation functionality while making the pipeline easier to extend and reason about.

Key Changes

Declarative Core (nemo_skills/pipeline/utils/declarative.py)

  • Command now wraps a run.Script instance; redundant gpus, nodes, and installation_command fields were removed.
  • num_tasks, num_gpus, and num_nodes come from HardwareConfig when creating executors.

Generation Pipeline (nemo_skills/pipeline/generate.py)

  • _create_job_unified() now instantiates script objects, wires their cross-references (client ↔ server/sandbox), and adds them to CommandGroups.
  • Hardware provisioning is derived from the group-level values without redundant max() scans of individual commands.

Summary by CodeRabbit

Release Notes

  • New Features

    • Multi-model generation support with per-model server configurations and settings.
    • New normalization utilities for configuration parameter handling across multiple models.
  • API Changes

    • Generation function parameters (model, server_address, server_type, server_gpus, server_nodes, server_args, server_entrypoint, server_container) now support list inputs for per-model configuration.


Signed-off-by: George Armstrong <georgea@nvidia.com>

coderabbitai bot commented Dec 4, 2025

📝 Walkthrough

Walkthrough

This PR introduces multi-model generation support and refactors command execution from string-based to typed Script objects. The generate() function and related utilities now handle per-model configurations (servers, GPUs, nodes, args), while a new unified _create_job_unified() flow constructs CommandGroups. Script classes replace inline command strings, enabling runtime cross-component reference resolution.

Changes

Cohort / File(s) Change Summary
Multi-model generation refactoring
nemo_skills/pipeline/generate.py
Replaced _create_commandgroup_from_config() with _create_job_unified() supporting multi-model flows; converted model, server_address, server_type, server_gpus, server_nodes, server_args, server_entrypoint, server_container from scalar to list types; added parameter normalization and multi-model validation.
Normalization utilities
nemo_skills/pipeline/utils/generation.py
Added normalize_models_config() and normalize_parameter() helper functions; extended get_generation_cmd() with optional server_addresses, model_names, server_types parameters for multi-model command generation.
Script-based command abstraction
nemo_skills/pipeline/utils/scripts.py
New module introducing BaseJobScript, ServerScript, SandboxScript, GenerationClientScript dataclasses; encapsulates command construction, port allocation, hostname resolution, and cross-component references.
Script-based declarative pipeline
nemo_skills/pipeline/utils/declarative.py
Replaced string-based command execution with Script-based approach; Command now stores run.Script instead of string; prepare_for_execution() returns (Script, Dict) instead of (str, Dict); added HardwareConfig.num_tasks field and script-level het group indexing.
Evaluator client script abstraction
nemo_skills/pipeline/nemo_evaluator.py
Introduced EvaluatorClientScript dataclass for runtime server resolution; replaced inline command strings in _create_serving_command_obj() and _build_client_command() with script-based construction.
Public API exports
nemo_skills/pipeline/utils/__init__.py
Re-exported normalize_models_config and normalize_parameter from generation module.
Test updates
tests/test_declarative_pipeline.py, tests/test_generation.py, tests/test_nemo_evaluator_pipeline.py
Refactored tests to use _create_job_unified() instead of deprecated function; updated assertions to validate ServerScript and EvaluatorClientScript instances; replaced direct command string expectations with script-based checks; added DummyScript test helper.

Sequence Diagram

sequenceDiagram
    actor User
    participant Gen as generate()
    participant Unify as _create_job_unified()
    participant Scripts as Script Factory
    participant Exec as Executor
    
    User->>Gen: Call with models[], server_types[], GPUs[]
    Gen->>Gen: Normalize per-model parameters
    Gen->>Unify: Pass normalized models + configs
    
    Unify->>Scripts: Create ServerScript per model
    activate Scripts
        Scripts->>Scripts: Allocate ports
        Scripts->>Scripts: Build server commands
    deactivate Scripts
    
    Unify->>Scripts: Create SandboxScript (optional)
    activate Scripts
        Scripts->>Scripts: Allocate sandbox port
    deactivate Scripts
    
    Unify->>Scripts: Create GenerationClientScript
    activate Scripts
        Scripts->>Scripts: Wire server references
        Scripts->>Scripts: Build lazy client command
    deactivate Scripts
    
    Unify->>Unify: Aggregate into CommandGroups[]
    Unify-->>Gen: Return groups list
    
    Gen->>Exec: Execute CommandGroups
    activate Exec
        Exec->>Scripts: Resolve script.inline (runtime)
        Scripts-->>Scripts: Substitute hostnames for het jobs
        Scripts-->>Exec: Final command
    deactivate Exec

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Script class hierarchy and initialization logic: Multiple new dataclasses with complex post-init flows (BaseJobScript, ServerScript, SandboxScript, GenerationClientScript); careful review of port allocation, hostname resolution, and lazy command building needed
  • Multi-model parameter normalization and validation: New branching logic in generate() to broadcast/validate per-model configurations; ensure length matching and error messages are correct
  • Command execution abstraction change: Fundamental shift from string commands to Script objects throughout declarative.py; verify all command construction paths properly handle the new Script interface
  • Cross-component reference resolution: Script-based hostname/port references for het jobs; validate that runtime substitution works correctly for multi-model multi-server scenarios
  • Test compatibility and coverage: Substantial test updates across three test files; ensure all critical paths are properly validated with the new script-based approach

Possibly related PRs

  • Skills#888: Adds _create_commandgroup_from_config() which this PR replaces with _create_job_unified() — direct predecessor that this PR refactors and extends for multi-model support.
  • Skills#790: Modifies container-default logic in generate.py — overlapping changes to generation task construction that may require merge conflict resolution or coordination.
  • Skills#848: Propagates keep_mounts_for_sandbox parameter in generate() — parallel parameter handling that this PR integrates into the unified multi-model flow.

Suggested reviewers

  • activatedgeek
  • Kipok

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 73.24% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Use run.Script for generate pipeline' accurately reflects the main refactoring goal: migrating the generate/declarative pipeline to use run.Script-based command objects instead of string-based commands.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🧹 Nitpick comments (8)
tests/test_generation.py (1)

22-25: New test_server_metadata_from_num_tasks correctly validates Script-based server wiring

The test’s expectations on _create_job_unified (first command being a ServerScript, num_tasks >= 1, and group hardware mirroring server_config["num_gpus"]) match the new Script-based orchestration and will catch regressions in server-side task accounting. Use of the fixed /tmp/out and /tmp/logs paths is acceptable here since the test doesn’t actually execute jobs, but they could be switched to tmp_path-derived locations if you ever see interference on shared CI hosts.

Also applies to: 156-194

nemo_skills/pipeline/utils/generation.py (1)

360-435: Multi-model argument handling in get_generation_cmd could validate list lengths

The multi-model branch assumes server_types and server_addresses are present and aligned with model_names:

if server_addresses is not None and model_names is not None:
    num_models = len(model_names)
    if num_models > 1:
        model_names_arg = ",".join(model_names)
        server_types_arg = ",".join(server_types)
        server_addresses_arg = ",".join(server_addresses)

Given callers already normalize lists, this is likely fine, but a defensive assertion (e.g., checking len(server_types) == len(model_names) == len(server_addresses)) would make this helper safer when used outside the current generate path.

nemo_skills/pipeline/generate.py (3)

232-271: Typer parameterization for multi-model arguments is reasonable, but docstring slightly overpromises Python types

The CLI-facing List[...] typer.Option parameters for model, server_address, server_type, etc., align with Typer’s multi-value pattern. Internally you normalize via normalize_models_config/normalize_parameter, which supports both scalars and lists.

Just note that the function annotations still show List[...], while the docstring promises Python callers can pass scalars. That’s true at runtime, but slightly mismatched type-wise; if you add type checking later you may want model: Any plus explicit validation, or overloads.


551-586: generation_params dict is a good abstraction point, but contains some implicit invariants

Packing all generation-related knobs plus multi-model fields into generation_params and passing it to _create_job_unified keeps the job builder decoupled from the CLI layer, which is nice.

Two small considerations:

  • generation_params["output_dir"] and ["script"] are required for GenerationClientScript but not enforced here; if _create_job_unified is ever reused elsewhere, a light assertion docstring or validation would help avoid KeyErrors.
  • The comment about multi-model groups (“one per model + one for client”) is slightly misleading: currently _create_job_unified returns at most one group per model, with the client in group 0; there is no dedicated “client only” group.

50-201: The main servers-list issue has been fixed; verify test coverage for mixed hosting remains the secondary concern

Good news: The handling of None entries in the servers list has been corrected. GenerationClientScript.__post_init__() now properly iterates through servers with enumerate, using the same index to access both self.servers and self.server_addresses_prehosted. This preserves the intended server-to-model alignment in mixed hosting scenarios (e.g., [host_server_1, None, host_server_3] with corresponding pre-hosted addresses).

Remaining observations:

  1. Test coverage gap
    tests/test_generation.py only tests single-model scenarios. Multi-model runs with mixed hosting (some self-hosted, some pre-hosted) lack explicit test cases. Consider adding a test for this scenario to prevent regressions.

  2. Minor improvements

    • zip(models, server_configs) (line 94) could add strict=True for Python 3.10+ to catch length mismatches early
    • num_tasks fallback to 1 for client-only groups is reasonable but document if GenerationClientScript may later override this
nemo_skills/pipeline/utils/declarative.py (3)

210-257: Command now correctly encapsulates lazy Script evaluation, with one minor unused-arg nit

Command.prepare_for_execution:

  • Evaluates script.inline when it is callable, allowing (cmd, metadata) tuples for env injection.
  • Uses set_inline to update the Script, which matches the new BaseJobScript pattern.
  • Builds a minimal execution_config (log_prefix, environment, mounts placeholder, container) and returns (script, execution_config).

Two minor notes:

  • The cluster_config parameter is currently unused in this method; if you don’t plan to use it, you can drop it or add a brief comment indicating it’s reserved for future use (to quiet linters).
  • This method assumes every script has a set_inline method; that’s true for the new Script types and DummyScript, but third-party run.Script subclasses must expose it as well. If your public API allows arbitrary run.Script instances, consider a hasattr(self.script, "set_inline") guard with a fallback to plain script.inline = ....
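The suggested hasattr guard could look like this sketch (update_inline and the demo classes are hypothetical, for illustration only):

```python
def update_inline(script, new_inline):
    """Update a script's inline command, tolerating run.Script subclasses
    that do not implement the set_inline helper."""
    if hasattr(script, "set_inline"):
        script.set_inline(new_inline)  # new BaseJobScript-style pattern
    else:
        script.inline = new_inline     # plain attribute fallback

class WithSetter:
    """Stand-in for a script exposing set_inline."""
    def __init__(self):
        self.inline = ""
    def set_inline(self, value):
        self.inline = value

class Plain:
    """Stand-in for a third-party script with only a plain attribute."""
    def __init__(self):
        self.inline = ""
```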

495-535: _prepare_command and _rewrite_local_paths correctly adapt Scripts for local execution

Key aspects:

  • _prepare_command uses Command.prepare_for_execution, and for executor in ("none", "local") rewrites /nemo_run/code/... paths to local repo paths when get_registered_external_repo("nemo_skills") is configured.
  • _rewrite_local_paths supports both string and callable inline values, transparently rewriting commands while preserving optional (cmd, metadata) tuple semantics.

This is a sensible way to keep local runs working with packaged-code assumptions. Just be aware that _rewrite_local_paths implicitly assumes the Nemo repo is registered under the "nemo_skills" key; if that mapping isn’t present, it silently no-ops, which seems fine but might deserve a one-line doc comment.


586-778: _plan_and_add_job is the core of the refactor and looks mostly solid

Strengths:

  • Assigns script.het_group_index before any evaluation so hostname-based cross-references see the right indices.
  • Prepares all commands first (collecting script + exec_config) and only then constructs executors, letting you:
    • Share environment vars across heterogeneous groups (shared_env_vars)
    • Share packager across components for single-group jobs
  • Always uses group.name for the underlying SLURM job name, while per-component command.name feeds log prefixes, simplifying mental mapping.
  • Keeps code-reuse logic (via REUSE_CODE_EXP and get_packaging_job_key) constrained to non-heterogeneous, non-none executors.

Two behavioural subtleties to watch:

  1. het_group_indices content
    You append het_idx per executor, so executors[0].het_group_indices will contain one entry per script, not per group. If nemo_run’s heterogeneous job implementation expects unique group indices only once each, this might differ from previous behaviour. It’s likely benign, but worth verifying against nemo_run expectations.

  2. Environment merging for heterogeneous jobs
    For heterogeneous jobs, you merge shared_env_vars back into each exec_config["environment"]. That’s good for consistency but means later commands can overwrite earlier env keys. If multiple groups intentionally set conflicting env vars, the last-wins semantics could be surprising; if that’s not a concern in current use cases, you can leave as-is.

Overall, this function is well-structured and matches the new Script abstraction.
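The last-wins environment merge noted above reduces to ordinary dict-update semantics (values here are hypothetical):

```python
shared_env_vars = {"HF_HOME": "/cache/hf", "NEMO_SKILLS_SANDBOX_PORT": "6000"}
group_env = {"NEMO_SKILLS_SANDBOX_PORT": "7000"}  # a group's conflicting value

# Merging shared vars first means the group's value silently overwrites
# the shared one; reversing the merge order would flip which side wins.
merged = {**shared_env_vars, **group_env}
```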

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9115aef and 8162f17.

📒 Files selected for processing (9)
  • nemo_skills/pipeline/generate.py (9 hunks)
  • nemo_skills/pipeline/nemo_evaluator.py (6 hunks)
  • nemo_skills/pipeline/utils/__init__.py (1 hunks)
  • nemo_skills/pipeline/utils/declarative.py (11 hunks)
  • nemo_skills/pipeline/utils/generation.py (5 hunks)
  • nemo_skills/pipeline/utils/scripts.py (1 hunks)
  • tests/test_declarative_pipeline.py (25 hunks)
  • tests/test_generation.py (2 hunks)
  • tests/test_nemo_evaluator_pipeline.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
nemo_skills/pipeline/utils/__init__.py (1)
nemo_skills/pipeline/utils/generation.py (2)
  • normalize_models_config (30-59)
  • normalize_parameter (62-102)
tests/test_generation.py (2)
nemo_skills/pipeline/generate.py (2)
  • generate (206-637)
  • _create_job_unified (50-201)
nemo_skills/pipeline/utils/scripts.py (1)
  • ServerScript (120-214)
tests/test_nemo_evaluator_pipeline.py (3)
nemo_skills/pipeline/nemo_evaluator.py (2)
  • nemo_evaluator (113-421)
  • EvaluatorClientScript (726-775)
nemo_skills/pipeline/utils/declarative.py (2)
  • Command (211-259)
  • CommandGroup (273-286)
nemo_skills/pipeline/utils/scripts.py (1)
  • ServerScript (120-214)
nemo_skills/pipeline/utils/scripts.py (4)
nemo_skills/pipeline/utils/commands.py (1)
  • sandbox_command (77-111)
nemo_skills/pipeline/utils/exp.py (1)
  • install_packages_wrap (368-408)
nemo_skills/pipeline/utils/generation.py (1)
  • get_generation_cmd (360-495)
nemo_skills/pipeline/utils/server.py (2)
  • get_free_port (43-59)
  • get_server_command (114-227)
tests/test_declarative_pipeline.py (1)
nemo_skills/pipeline/utils/declarative.py (6)
  • Command (211-259)
  • prepare_for_execution (223-256)
  • get_name (258-259)
  • CommandGroup (273-286)
  • Pipeline (289-820)
  • run (356-493)
nemo_skills/pipeline/utils/declarative.py (6)
nemo_skills/pipeline/utils/cluster.py (1)
  • get_env_variables (163-276)
nemo_skills/pipeline/utils/packager.py (1)
  • get_registered_external_repo (64-76)
nemo_skills/pipeline/utils/server.py (1)
  • wrap_python_path (66-67)
nemo_skills/utils.py (1)
  • get_logger_name (39-43)
tests/test_declarative_pipeline.py (1)
  • set_inline (39-40)
nemo_skills/pipeline/utils/scripts.py (2)
  • set_inline (98-100)
  • wrapped_inline (87-92)
🪛 Ruff (0.14.7)
tests/test_generation.py

174-174: Probable insecure usage of temporary file or directory: "/tmp/out"

(S108)


186-186: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)

nemo_skills/pipeline/generate.py

94-94: Loop control variable model_path not used within loop body

Rename unused model_path to _model_path

(B007)


94-94: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


232-236: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


237-241: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


242-246: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


247-251: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


252-256: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


257-261: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


262-266: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


415-417: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/pipeline/utils/generation.py

50-50: Avoid specifying long messages outside the exception class

(TRY003)


58-58: Avoid specifying long messages outside the exception class

(TRY003)


99-102: Avoid specifying long messages outside the exception class

(TRY003)

tests/test_declarative_pipeline.py

587-587: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)


590-590: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)


676-676: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)

nemo_skills/pipeline/utils/declarative.py

223-223: Unused method argument: cluster_config

(ARG002)

🔇 Additional comments (28)
nemo_skills/pipeline/utils/__init__.py (1)

47-55: Re-exporting normalization helpers looks consistent with API surface

Exposing normalize_models_config and normalize_parameter via pipeline.utils keeps the public API coherent with how other generation utilities are surfaced. No issues from this change alone.

nemo_skills/pipeline/utils/generation.py (1)

20-102: Normalization helpers are correct and robust for CLI + Python usage

Both normalize_models_config and normalize_parameter implement the intended broadcast semantics cleanly and should behave well for:

  • CLI (Typer) lists
  • Python scalars or lists
    The error messaging on mismatched lengths is also clear. No functional issues spotted.
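The broadcast contract described here can be sketched as follows; this is an illustrative re-implementation of the assumed behavior, not the actual code in nemo_skills/pipeline/utils/generation.py:

```python
def normalize_parameter(value, num_models, param_name="parameter"):
    """Broadcast a scalar (or single-element list) to one entry per model,
    or validate that an explicit list matches the model count."""
    if not isinstance(value, (list, tuple)):
        return [value] * num_models          # Python scalar: broadcast
    if len(value) == 1 and num_models > 1:
        return list(value) * num_models      # CLI single value: broadcast
    if len(value) != num_models:
        raise ValueError(
            f"{param_name}: got {len(value)} entries for {num_models} models"
        )
    return list(value)
```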
tests/test_nemo_evaluator_pipeline.py (2)

20-28: Evaluator pipeline tests align with Script-based design

The updated tests:

  • Import and assert on EvaluatorClientScript/ServerScript
  • Check client command naming (evaluator-test-client-0...), server num_gpus, log_prefix, and non-None ports
  • Verify hardware allocation is driven by hosted server GPU counts

This gives good coverage of the new Nemo-evaluator orchestration without over-specifying internal details. Looks solid.

Also applies to: 95-204


206-327: Judge-server hosted & dual-server tests correctly exercise grouping and Script wiring

The test_judge_server_hosted and test_both_servers_hosted_separate_groups cases validate:

  • Judge server uses ServerScript with log_prefix == "judge-server" and correct num_gpus
  • Client always uses EvaluatorClientScript whose inline is callable (lazy build)
  • Hardware per-group matches server GPU/node counts

This aligns well with the documented grouping strategy (main+client vs judge-only groups).

nemo_skills/pipeline/nemo_evaluator.py (4)

90-188: High-level nemo_evaluator flow is clear and consistent with the new Script model

The command docstring and option set cleanly describe the four hosting scenarios, and the function now:

  • Builds a _TaskCreationContext per task
  • Routes through _build_main_server_if_needed, _build_judge_server_if_needed, and _build_client_command
  • Groups commands based on hosting strategy

No obvious logic errors in the main orchestration; the separation into helper functions improves readability.


573-647: _build_client_command and EvaluatorClientScript implement clean runtime URL resolution

_build_client_command now creates a Command powered by EvaluatorClientScript, passing in optional main/judge ServerScript references. EvaluatorClientScript.__post_init__:

  • Computes main/judge URLs from hosted servers (hostname_ref() + port) or external base URLs
  • Adds health-check waits via get_server_wait_cmd
  • Delegates to _build_task_cmd for the actual Nemo-Evaluator command, injecting URL/model overrides
  • Returns (final_cmd, {"environment": env_vars}) for Command.prepare_for_execution

This matches the new Script-based pipeline semantics and should work for all four hosting modes.


546-570: Hardware allocation helper for evaluator groups matches hosting semantics

_hardware_for_group cleanly encapsulates SLURM-related fields, including partition, num_gpus, num_nodes, and sbatch_kwargs with QoS. Using it consistently for all evaluator groups keeps hardware decisions centralized.


430-497: _create_serving_command_obj correctly wraps servers in ServerScript and Command

The helper:

  • Normalizes server_type (warns on unsupported types)
  • Instantiates ServerScript with num_gpus, num_nodes, args, entrypoint, and port/allocate_port
  • Applies a judge-specific log_prefix and chooses container from cluster_config["containers"][stype] when not overridden
  • Returns a Command with clear role-specific names
tests/test_declarative_pipeline.py (8)

17-52: DummyScript and make_command are well-designed test scaffolding

The DummyScript stand-in (with inline, set_inline, log_prefix, het_group_index, and hostname_ref) plus make_command() cleanly simulate real run.Script instances for unit tests. This keeps tests decoupled from the actual Script implementations while matching the new Command(script=...) API.


54-107: TestCommand suite thoroughly covers new Command semantics

The tests validate:

  • Basic construction (name, default container, script.inline)
  • prepare_for_execution with inline strings and callable inlines (with and without metadata/environment)
  • hostname_ref behaviour for default and heterogeneous cases
  • get_name() access

This gives good confidence that Command.prepare_for_execution and DummyScript.hostname_ref behave as expected.


112-140: CommandGroup tests reflect the new Script-centric API correctly

The TestCommandGroup tests ensure:

  • Basic grouping and default HardwareConfig
  • Custom hardware propagation
  • log_dir handling

All use make_command and thus implicitly validate compatibility with the new Command signature.


145-237: Pipeline construction tests line up with validation changes and job specs

These tests cover:

  • Single-job and multi-job pipelines
  • run_after semantics as string or list
  • Direct cluster_config usage (no name-based resolution)
  • Required jobs and name fields

They match the updated Pipeline._validate logic and job schema and look correct.


242-344: Basic pipeline execution and HF_HOME validation remain correct with Script-based flow

The tests around Pipeline.run:

  • Confirm experiment creation and exp.add calls
  • Validate HF_HOME presence and mount checks occurring in __init__ rather than run()
  • Ensure executors are constructed for SLURM paths

The use of make_command + DummyScript keeps them aligned with the new API while retaining the prior behavioural guarantees.


374-477: Het-group index tests validate new Script-level indexing semantics

TestHetGroupIndices asserts:

  • Non-heterogeneous jobs leave het_group_index as None and hostname resolves to localhost
  • Heterogeneous jobs assign indices per group (0, 1, …), and hostname_ref() embeds the correct SLURM env vars
  • Indices are per-job, not global across the pipeline

These tests are an excellent fit for the new _plan_and_add_job het-indexing strategy.
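The het-group hostname behaviour these tests pin down can be illustrated with a toy stand-in; the env-var naming below is an assumption for illustration, not the project's actual reference format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServerScriptSketch:
    """Minimal stand-in for ServerScript, showing hostname resolution only."""
    port: int = 5000
    het_group_index: Optional[int] = None

    def hostname_ref(self) -> str:
        # Non-heterogeneous (or local) runs: client and server share a node.
        if self.het_group_index is None:
            return "localhost"
        # Heterogeneous SLURM jobs: defer to an env var resolved at launch,
        # keeping the command string lazy until runtime.
        return f"$SLURM_HET_GROUP_HOST_{self.het_group_index}"

    def address(self) -> str:
        return f"{self.hostname_ref()}:{self.port}"
```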


479-724: Dependency resolution tests still accurately describe internal vs external handling

The updated dependency tests cover:

  • Explicit None dependencies
  • Pipeline-level run_after propagating to jobs without their own deps
  • Multiple internal dependencies (job objects → handles)
  • Separation of external experiment deps (via get_exp_handles) from internal handles

They align well with the logic in Pipeline.run and _plan_and_add_job.


829-897: Sandbox environment propagation test is a good end-to-end check

test_generate_with_sandbox_passes_env_vars_correctly asserts that:

  • temporary_env_update is called with updates containing NEMO_SKILLS_SANDBOX_PORT on the client side
  • Env-patching paths in generate() + declarative Pipeline behave as expected with Script-based commands

This is a valuable regression test for the sandbox refactor.

nemo_skills/pipeline/generate.py (4)

32-36: New Script imports are appropriate and localized

Importing GenerationClientScript, SandboxScript, and ServerScript here is consistent with this module’s responsibility for building generation jobs. No issues.


382-412: Model and server parameter normalization logic is sound

The sequence:

  • models_list = normalize_models_config(model)
  • Convert server_type enums → strings
  • Broadcast server_type, server_gpus, server_nodes, server_args, server_entrypoint, server_container, and server_address with normalize_parameter
  • Enforce that multi-model usage requires generation_type or generation_module

gives a coherent multi-model configuration story. The broadcasting semantics are clear and should behave well for both CLI and Python code.


502-541: Per-model configure_client loop correctly separates single vs multi-model server overrides

The inner loop:

  • Calls configure_client per model to build server_configs and resolved addresses.
  • Uses extra_arguments_original only for the first model, then:
    • For single-model: captures srv_extra_args so that server config is expressed via extra_arguments.
    • For multi-model: ignores srv_extra_args and leaves per-model server configuration to GenerationClientScript + get_generation_cmd.

This avoids double-injecting per-model overrides and nicely preserves the single-model semantics.


603-637: Job spec construction for single vs multi-group is consistent with Pipeline expectations

Using:

  • "groups": job_groups when len(job_groups) > 1
  • "group": job_groups[0] otherwise

ensures Pipeline.run correctly chooses between single-group and heterogeneous multi-group jobs. Naming internal_job_name based on task_name and dep_idx is also clear and keeps dependency wiring straightforward.

nemo_skills/pipeline/utils/declarative.py (5)

15-42: Module-level refactor to Script-based execution is well-scoped

Switching this module to:

  • Import nemo_run as run
  • Pull in pipeline utilities (get_env_variables, get_executor, etc.)
  • Use get_registered_external_repo + wrap_python_path for local execution

sets the stage for Script-centric orchestration without leaking responsibilities into callers. No issues at the import/architecture level.


262-270: HardwareConfig extension with num_tasks aligns with executor needs

Adding num_tasks to HardwareConfig and defaulting it to 1 lets the executor distinguish between node count and tasks per node. This plays nicely with ServerScript.num_tasks feeding into _create_job_unified.


356-495: Pipeline validation and high-level run loop remain sound after the Script refactor

The Pipeline.__init__ and run methods still:

  • Validate job specs and cluster_config upfront
  • Enforce HF_HOME presence/mounting for non-none executors (unless explicitly skipped)
  • Cleanly separate internal (handles) vs external (experiment names → SLURM job IDs) dependencies
  • Distinguish single-group vs multi-group jobs and defer to _add_*_job

This matches prior behaviour while delegating planning details to _plan_and_add_job.


543-585: Executor creation now uses num_tasks and supports a group-wide job name

The updated _create_executor:

  • Maps HardwareConfig.num_nodes/num_tasks/num_gpus to executor args.
  • Uses job_name_override (currently the group name) as the SLURM job name, so all components in a group share a stable name without embedding role suffixes.
  • Wraps environment updates via temporary_env_update, keeping env-setting logic centralized.

This is a good match for the new Script-based design.
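The field-to-kwarg mapping can be sketched like this; the executor argument names here are assumptions for illustration, not the real `get_executor` signature:

```python
from types import SimpleNamespace


def executor_args(hw, job_name_override, group_name):
    """Map HardwareConfig-style fields to executor kwargs (names assumed)."""
    return {
        "nodes": hw.num_nodes,
        "ntasks_per_node": hw.num_tasks,
        "gpus_per_node": hw.num_gpus,
        # All components in a group share one SLURM job name, with no
        # per-role suffixes embedded in it.
        "job_name": job_name_override or group_name,
    }


hw = SimpleNamespace(num_nodes=2, num_tasks=8, num_gpus=8)
args = executor_args(hw, job_name_override="my-group", group_name="generation")
```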


780-820: Single-group vs multi-group helper methods correctly delegate to _plan_and_add_job

_add_single_group_job and _add_multi_group_job simply wrap _plan_and_add_job with heterogeneous=False/True, which keeps the public behaviour unchanged while centralizing the complex logic. No issues here.

nemo_skills/pipeline/utils/scripts.py (3)

62-117: LGTM - Solid base class design for heterogeneous job support.

The set_inline pattern using object.__setattr__ is an appropriate workaround for safely mutating frozen dataclass fields. The hostname_ref() method correctly handles both hetjob and non-hetjob scenarios with a sensible localhost fallback.


217-272: LGTM - Good documentation of responsibility boundaries.

The comment on line 265 clarifying that keep_mounts is handled at the executor level is helpful for maintainability. The same port validation concern from ServerScript applies here when allocate_port=False and port=None.


361-407: Well-designed lazy command building pattern.

The closure-based lazy evaluation for build_cmd is appropriate since hostname_ref() depends on het_group_index being set at runtime by the pipeline. The environment variable plumbing for sandbox port communication follows the pattern documented in sandbox_command.

One minor observation: the docstring example on lines 318-325 shows servers=[server1, server2, None] with parallel server_addresses_prehosted=["", "", "https://api.openai.com"]. The empty strings for self-hosted servers work but could be slightly misleading; consider documenting that these values are ignored when the corresponding servers entry is not None.

Comment on lines +180 to +183
# Allocate port if not provided
if self.port is None and self.allocate_port:
    self.port = get_free_port(strategy="random")
    LOG.debug(f"Allocated port {self.port} for {self.server_type} server")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain (repository: NVIDIA-NeMo/Skills)

🏁 Script executed:

# First, let's find the ServerScript class definition and see port field constraints
rg -n "class ServerScript" --type py

🏁 Script executed:

# Search for get_server_command to understand how it handles port
rg -n "def get_server_command" --type py -A 15

🏁 Script executed:

# Check the ServerScript class fields and any defaults
sed -n '119,180p' nemo_skills/pipeline/utils/scripts.py | cat -n

🏁 Script executed:

# View the complete __post_init__ method
sed -n '180,210p' nemo_skills/pipeline/utils/scripts.py | cat -n

🏁 Script executed:

# Check where get_server_command is called with the port parameter
rg -n "get_server_command" nemo_skills/pipeline/utils/scripts.py -A 5 -B 5

🏁 Script executed:

# Check for usages of ServerScript with allocate_port=False
rg -n "ServerScript|allocate_port" --type py -B 2 -A 2

🏁 Script executed:

# Also check if there are any type hints or validation that we missed
rg -n "server_port.*None" nemo_skills/pipeline/utils/server.py -B 5 -A 5

🏁 Script executed:

# Check the full signature and docstring of get_server_command to see if it handles None
sed -n '114,160p' nemo_skills/pipeline/utils/server.py | cat -n


Add validation to ensure port is set when allocate_port=False.

If allocate_port=False and port=None, the port remains None and is passed to get_server_command, which expects an int for the server_port parameter. This causes a runtime error.

Add validation after the port allocation block:

         # Allocate port if not provided
         if self.port is None and self.allocate_port:
             self.port = get_free_port(strategy="random")
             LOG.debug(f"Allocated port {self.port} for {self.server_type} server")
+        elif self.port is None:
+            raise ValueError("port must be provided when allocate_port=False")

Comment on lines +371 to +380
if self.servers is not None:
    server_addresses = []
    for server_idx, server_script in enumerate(self.servers):
        if server_script is not None:
            # Self-hosted: construct address from hostname and port refs
            addr = f"{server_script.hostname_ref()}:{server_script.port}"
        else:
            # Pre-hosted: use the address from server_addresses_prehosted
            addr = self.server_addresses_prehosted[server_idx]
        server_addresses.append(addr)

⚠️ Potential issue | 🟠 Major

Potential IndexError or TypeError when accessing server_addresses_prehosted.

If self.servers contains None entries (indicating pre-hosted servers), the code assumes self.server_addresses_prehosted is a list with matching indices. However, if server_addresses_prehosted is None or shorter than expected, this will raise an exception at runtime.

Consider adding validation:

             # Build server addresses if servers are provided
             server_addresses = None
             if self.servers is not None:
+                # Validate parallel lists
+                if any(s is None for s in self.servers):
+                    if self.server_addresses_prehosted is None:
+                        raise ValueError(
+                            "server_addresses_prehosted must be provided when servers contains None entries"
+                        )
+                    if len(self.server_addresses_prehosted) != len(self.servers):
+                        raise ValueError(
+                            "server_addresses_prehosted must have same length as servers"
+                        )
                 server_addresses = []
                 for server_idx, server_script in enumerate(self.servers):

Alternatively, move validation to __post_init__ outside the build_cmd closure to fail fast during construction.


@gwarmstrong gwarmstrong enabled auto-merge (squash) December 17, 2025 02:07
@gwarmstrong gwarmstrong merged commit 9c5b68c into main Dec 17, 2025
5 checks passed
@gwarmstrong gwarmstrong deleted the georgea/refactor-generate-run-script branch December 17, 2025 02:17
gwarmstrong added a commit that referenced this pull request Dec 17, 2025
gwarmstrong added a commit that referenced this pull request Dec 17, 2025
This reverts commit 9c5b68c.

Signed-off-by: George Armstrong <georgea@nvidia.com>
gwarmstrong added a commit that referenced this pull request Dec 18, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
gwarmstrong added a commit that referenced this pull request Dec 18, 2025
This reverts commit 1c0722a.

FIX multi-node pipeline creation

Signed-off-by: George Armstrong <georgea@nvidia.com>

remove hosntame ref change

Signed-off-by: George Armstrong <georgea@nvidia.com>

make param span_group_nodes

Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
stephencge added a commit that referenced this pull request Dec 19, 2025
- Add nemo_skills/inference/hilbert.py: unified generation module that
  orchestrates prover (vLLM) + reasoner (Gemini) in single job using
  multi-model support from PR #1052
- Add hilbert_unified stage to stages.py for pipeline orchestration
- Add tokens_to_generate param to hilbert_prover (default 5K for testing)
- Add unified-local-test.yaml config for testing unified pipeline

Pipeline flow: hilbert_d0 → split_d0 → hilbert_d1 → split_d1 → assemble

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
blahblahasdf pushed a commit to blahblahasdf/Skills that referenced this pull request Jan 8, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dlord <dlord@nvidia.com>
blahblahasdf pushed a commit to blahblahasdf/Skills that referenced this pull request Jan 8, 2026
…DIA-NeMo#1125)

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dlord <dlord@nvidia.com>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>