
Use run.Script for generate pipeline #1052

Merged
gwarmstrong merged 37 commits into main from georgea/refactor-generate-run-script
Dec 17, 2025
Conversation


@gwarmstrong gwarmstrong commented Nov 21, 2025

Refactor Generate Declarative Pipeline to Scripts

Summary

  • Introduce reusable script dataclasses (ServerScript, SandboxScript, GenerationClientScript) with explicit fields, lazy command generation, and cross-component references.
  • Add multi-model support so a single pipeline can launch heterogeneous server/client groups for multiple models.
  • Centralize resource selection (num_gpus, num_nodes, num_tasks) inside HardwareConfig and script objects; remove redundant command-level overrides.
  • Preserve existing generation functionality while making the pipeline easier to extend and reason about.

Key Changes

Declarative Core (nemo_skills/pipeline/utils/declarative.py)

  • Command now wraps a run.Script instance; redundant gpus, nodes, and installation_command fields were removed.
  • num_tasks, num_gpus, and num_nodes come from HardwareConfig when creating executors.

Generation Pipeline (nemo_skills/pipeline/generate.py)

  • _create_job_unified() now instantiates script objects, wires their cross-references (client ↔ server/sandbox), and adds them to CommandGroups.
  • Hardware provisioning is derived from the group-level values without redundant max() scans of individual commands.

Summary by CodeRabbit

Release Notes

  • New Features

    • Multi-model generation support with per-model server configurations and settings.
    • New normalization utilities for configuration parameter handling across multiple models.
  • API Changes

    • Generation function parameters (model, server_address, server_type, server_gpus, server_nodes, server_args, server_entrypoint, server_container) now support list inputs for per-model configuration.


Signed-off-by: George Armstrong <georgea@nvidia.com>

coderabbitai bot commented Dec 4, 2025

📝 Walkthrough

Walkthrough

This PR introduces multi-model generation support and refactors command execution from string-based to typed Script objects. The generate() function and related utilities now handle per-model configurations (servers, GPUs, nodes, args), while a new unified _create_job_unified() flow constructs CommandGroups. Script classes replace inline command strings, enabling runtime cross-component reference resolution.

Changes

Cohort / File(s) Change Summary
Multi-model generation refactoring
nemo_skills/pipeline/generate.py
Replaced _create_commandgroup_from_config() with _create_job_unified() supporting multi-model flows; converted model, server_address, server_type, server_gpus, server_nodes, server_args, server_entrypoint, server_container from scalar to list types; added parameter normalization and multi-model validation.
Normalization utilities
nemo_skills/pipeline/utils/generation.py
Added normalize_models_config() and normalize_parameter() helper functions; extended get_generation_cmd() with optional server_addresses, model_names, server_types parameters for multi-model command generation.
Script-based command abstraction
nemo_skills/pipeline/utils/scripts.py
New module introducing BaseJobScript, ServerScript, SandboxScript, GenerationClientScript dataclasses; encapsulates command construction, port allocation, hostname resolution, and cross-component references.
Script-based declarative pipeline
nemo_skills/pipeline/utils/declarative.py
Replaced string-based command execution with Script-based approach; Command now stores run.Script instead of string; prepare_for_execution() returns (Script, Dict) instead of (str, Dict); added HardwareConfig.num_tasks field and script-level het group indexing.
Evaluator client script abstraction
nemo_skills/pipeline/nemo_evaluator.py
Introduced EvaluatorClientScript dataclass for runtime server resolution; replaced inline command strings in _create_serving_command_obj() and _build_client_command() with script-based construction.
Public API exports
nemo_skills/pipeline/utils/__init__.py
Re-exported normalize_models_config and normalize_parameter from generation module.
Test updates
tests/test_declarative_pipeline.py, tests/test_generation.py, tests/test_nemo_evaluator_pipeline.py
Refactored tests to use _create_job_unified() instead of deprecated function; updated assertions to validate ServerScript and EvaluatorClientScript instances; replaced direct command string expectations with script-based checks; added DummyScript test helper.

Sequence Diagram

sequenceDiagram
    actor User
    participant Gen as generate()
    participant Unify as _create_job_unified()
    participant Scripts as Script Factory
    participant Exec as Executor
    
    User->>Gen: Call with models[], server_types[], GPUs[]
    Gen->>Gen: Normalize per-model parameters
    Gen->>Unify: Pass normalized models + configs
    
    Unify->>Scripts: Create ServerScript per model
    activate Scripts
        Scripts->>Scripts: Allocate ports
        Scripts->>Scripts: Build server commands
    deactivate Scripts
    
    Unify->>Scripts: Create SandboxScript (optional)
    activate Scripts
        Scripts->>Scripts: Allocate sandbox port
    deactivate Scripts
    
    Unify->>Scripts: Create GenerationClientScript
    activate Scripts
        Scripts->>Scripts: Wire server references
        Scripts->>Scripts: Build lazy client command
    deactivate Scripts
    
    Unify->>Unify: Aggregate into CommandGroups[]
    Unify-->>Gen: Return groups list
    
    Gen->>Exec: Execute CommandGroups
    activate Exec
        Exec->>Scripts: Resolve script.inline (runtime)
        Scripts-->>Scripts: Substitute hostnames for het jobs
        Scripts-->>Exec: Final command
    deactivate Exec

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Script class hierarchy and initialization logic: Multiple new dataclasses with complex post-init flows (BaseJobScript, ServerScript, SandboxScript, GenerationClientScript); careful review of port allocation, hostname resolution, and lazy command building needed
  • Multi-model parameter normalization and validation: New branching logic in generate() to broadcast/validate per-model configurations; ensure length matching and error messages are correct
  • Command execution abstraction change: Fundamental shift from string commands to Script objects throughout declarative.py; verify all command construction paths properly handle the new Script interface
  • Cross-component reference resolution: Script-based hostname/port references for het jobs; validate that runtime substitution works correctly for multi-model multi-server scenarios
  • Test compatibility and coverage: Substantial test updates across three test files; ensure all critical paths are properly validated with the new script-based approach

Possibly related PRs

  • Skills#888: Adds _create_commandgroup_from_config() which this PR replaces with _create_job_unified() — direct predecessor that this PR refactors and extends for multi-model support.
  • Skills#790: Modifies container-default logic in generate.py — overlapping changes to generation task construction that may require merge conflict resolution or coordination.
  • Skills#848: Propagates keep_mounts_for_sandbox parameter in generate() — parallel parameter handling that this PR integrates into the unified multi-model flow.

Suggested reviewers

  • activatedgeek
  • Kipok

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 73.24% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Use run.Script for generate pipeline' accurately reflects the main refactoring goal: migrating the generate/declarative pipeline to use run.Script-based command objects instead of string-based commands.

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🧹 Nitpick comments (8)
tests/test_generation.py (1)

22-25: New test_server_metadata_from_num_tasks correctly validates Script-based server wiring

The test’s expectations on _create_job_unified (first command being a ServerScript, num_tasks >= 1, and group hardware mirroring server_config["num_gpus"]) match the new Script-based orchestration and will catch regressions in server-side task accounting. Use of the fixed /tmp/out and /tmp/logs paths is acceptable here since the test doesn’t actually execute jobs, but they could be switched to tmp_path-derived locations if you ever see interference on shared CI hosts.

Also applies to: 156-194

nemo_skills/pipeline/utils/generation.py (1)

360-435: Multi-model argument handling in get_generation_cmd could validate list lengths

The multi-model branch assumes server_types and server_addresses are present and aligned with model_names:

if server_addresses is not None and model_names is not None:
    num_models = len(model_names)
    if num_models > 1:
        model_names_arg = ",".join(model_names)
        server_types_arg = ",".join(server_types)
        server_addresses_arg = ",".join(server_addresses)

Given callers already normalize lists, this is likely fine, but a defensive assertion (e.g., checking len(server_types) == len(model_names) == len(server_addresses)) would make this helper safer when used outside the current generate path.

nemo_skills/pipeline/generate.py (3)

232-271: Typer parameterization for multi-model arguments is reasonable, but docstring slightly overpromises Python types

The CLI-facing List[...] typer.Option parameters for model, server_address, server_type, etc., align with Typer’s multi-value pattern. Internally you normalize via normalize_models_config/normalize_parameter, which supports both scalars and lists.

Just note that the function annotations still show List[...], while the docstring promises Python callers can pass scalars. That’s true at runtime, but slightly mismatched type-wise; if you add type checking later you may want model: Any plus explicit validation, or overloads.


551-586: generation_params dict is a good abstraction point, but contains some implicit invariants

Packing all generation-related knobs plus multi-model fields into generation_params and passing it to _create_job_unified keeps the job builder decoupled from the CLI layer, which is nice.

Two small considerations:

  • generation_params["output_dir"] and ["script"] are required for GenerationClientScript but not enforced here; if _create_job_unified is ever reused elsewhere, a light assertion docstring or validation would help avoid KeyErrors.
  • The comment about multi-model groups (“one per model + one for client”) is slightly misleading: currently _create_job_unified returns at most one group per model, with the client in group 0; there is no dedicated “client only” group.

50-201: The main servers-list issue has been fixed; verify test coverage for mixed hosting remains the secondary concern

Good news: The handling of None entries in the servers list has been corrected. GenerationClientScript.__post_init__() now properly iterates through servers with enumerate, using the same index to access both self.servers and self.server_addresses_prehosted. This preserves the intended server-to-model alignment in mixed hosting scenarios (e.g., [host_server_1, None, host_server_3] with corresponding pre-hosted addresses).

Remaining observations:

  1. Test coverage gap
    tests/test_generation.py only tests single-model scenarios. Multi-model runs with mixed hosting (some self-hosted, some pre-hosted) lack explicit test cases. Consider adding a test for this scenario to prevent regressions.

  2. Minor improvements

    • zip(models, server_configs) (line 94) could add strict=True for Python 3.10+ to catch length mismatches early
    • num_tasks fallback to 1 for client-only groups is reasonable but document if GenerationClientScript may later override this
nemo_skills/pipeline/utils/declarative.py (3)

210-257: Command now correctly encapsulates lazy Script evaluation, with one minor unused-arg nit

Command.prepare_for_execution:

  • Evaluates script.inline when it is callable, allowing (cmd, metadata) tuples for env injection.
  • Uses set_inline to update the Script, which matches the new BaseJobScript pattern.
  • Builds a minimal execution_config (log_prefix, environment, mounts placeholder, container) and returns (script, execution_config).

Two minor notes:

  • The cluster_config parameter is currently unused in this method; if you don’t plan to use it, you can drop it or add a brief comment indicating it’s reserved for future use (to quiet linters).
  • This method assumes every script has a set_inline method; that’s true for the new Script types and DummyScript, but third-party run.Script subclasses must expose it as well. If your public API allows arbitrary run.Script instances, consider a hasattr(self.script, "set_inline") guard with a fallback to plain script.inline = ....
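The suggested hasattr guard could look like this sketch (update_inline and the demo classes are hypothetical, for illustration only):

```python
def update_inline(script, new_inline):
    """Update a script's inline command, tolerating run.Script subclasses
    that do not implement the set_inline helper."""
    if hasattr(script, "set_inline"):
        script.set_inline(new_inline)  # new BaseJobScript-style pattern
    else:
        script.inline = new_inline     # plain attribute fallback

class WithSetter:
    """Stand-in for a script exposing set_inline."""
    def __init__(self):
        self.inline = ""
    def set_inline(self, value):
        self.inline = value

class Plain:
    """Stand-in for a third-party script with only a plain attribute."""
    def __init__(self):
        self.inline = ""
```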

495-535: _prepare_command and _rewrite_local_paths correctly adapt Scripts for local execution

Key aspects:

  • _prepare_command uses Command.prepare_for_execution, and for executor in ("none", "local") rewrites /nemo_run/code/... paths to local repo paths when get_registered_external_repo("nemo_skills") is configured.
  • _rewrite_local_paths supports both string and callable inline values, transparently rewriting commands while preserving optional (cmd, metadata) tuple semantics.

This is a sensible way to keep local runs working with packaged-code assumptions. Just be aware that _rewrite_local_paths implicitly assumes the Nemo repo is registered under the "nemo_skills" key; if that mapping isn’t present, it silently no-ops, which seems fine but might deserve a one-line doc comment.


586-778: _plan_and_add_job is the core of the refactor and looks mostly solid

Strengths:

  • Assigns script.het_group_index before any evaluation so hostname-based cross-references see the right indices.
  • Prepares all commands first (collecting script + exec_config) and only then constructs executors, letting you:
    • Share environment vars across heterogeneous groups (shared_env_vars)
    • Share packager across components for single-group jobs
  • Always uses group.name for the underlying SLURM job name, while per-component command.name feeds log prefixes, simplifying mental mapping.
  • Keeps code-reuse logic (via REUSE_CODE_EXP and get_packaging_job_key) constrained to non-heterogeneous, non-none executors.

Two behavioural subtleties to watch:

  1. het_group_indices content
    You append het_idx per executor, so executors[0].het_group_indices will contain one entry per script, not per group. If nemo_run’s heterogeneous job implementation expects unique group indices only once each, this might differ from previous behaviour. It’s likely benign, but worth verifying against nemo_run expectations.

  2. Environment merging for heterogeneous jobs
    For heterogeneous jobs, you merge shared_env_vars back into each exec_config["environment"]. That’s good for consistency but means later commands can overwrite earlier env keys. If multiple groups intentionally set conflicting env vars, the last-wins semantics could be surprising; if that’s not a concern in current use cases, you can leave as-is.

Overall, this function is well-structured and matches the new Script abstraction.
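The last-wins environment merge noted above reduces to ordinary dict-update semantics (values here are hypothetical):

```python
shared_env_vars = {"HF_HOME": "/cache/hf", "NEMO_SKILLS_SANDBOX_PORT": "6000"}
group_env = {"NEMO_SKILLS_SANDBOX_PORT": "7000"}  # a group's conflicting value

# Merging shared vars first means the group's value silently overwrites
# the shared one; reversing the merge order would flip which side wins.
merged = {**shared_env_vars, **group_env}
```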

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9115aef and 8162f17.

📒 Files selected for processing (9)
  • nemo_skills/pipeline/generate.py (9 hunks)
  • nemo_skills/pipeline/nemo_evaluator.py (6 hunks)
  • nemo_skills/pipeline/utils/__init__.py (1 hunks)
  • nemo_skills/pipeline/utils/declarative.py (11 hunks)
  • nemo_skills/pipeline/utils/generation.py (5 hunks)
  • nemo_skills/pipeline/utils/scripts.py (1 hunks)
  • tests/test_declarative_pipeline.py (25 hunks)
  • tests/test_generation.py (2 hunks)
  • tests/test_nemo_evaluator_pipeline.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
nemo_skills/pipeline/utils/__init__.py (1)
nemo_skills/pipeline/utils/generation.py (2)
  • normalize_models_config (30-59)
  • normalize_parameter (62-102)
tests/test_generation.py (2)
nemo_skills/pipeline/generate.py (2)
  • generate (206-637)
  • _create_job_unified (50-201)
nemo_skills/pipeline/utils/scripts.py (1)
  • ServerScript (120-214)
tests/test_nemo_evaluator_pipeline.py (3)
nemo_skills/pipeline/nemo_evaluator.py (2)
  • nemo_evaluator (113-421)
  • EvaluatorClientScript (726-775)
nemo_skills/pipeline/utils/declarative.py (2)
  • Command (211-259)
  • CommandGroup (273-286)
nemo_skills/pipeline/utils/scripts.py (1)
  • ServerScript (120-214)
nemo_skills/pipeline/utils/scripts.py (4)
nemo_skills/pipeline/utils/commands.py (1)
  • sandbox_command (77-111)
nemo_skills/pipeline/utils/exp.py (1)
  • install_packages_wrap (368-408)
nemo_skills/pipeline/utils/generation.py (1)
  • get_generation_cmd (360-495)
nemo_skills/pipeline/utils/server.py (2)
  • get_free_port (43-59)
  • get_server_command (114-227)
tests/test_declarative_pipeline.py (1)
nemo_skills/pipeline/utils/declarative.py (6)
  • Command (211-259)
  • prepare_for_execution (223-256)
  • get_name (258-259)
  • CommandGroup (273-286)
  • Pipeline (289-820)
  • run (356-493)
nemo_skills/pipeline/utils/declarative.py (6)
nemo_skills/pipeline/utils/cluster.py (1)
  • get_env_variables (163-276)
nemo_skills/pipeline/utils/packager.py (1)
  • get_registered_external_repo (64-76)
nemo_skills/pipeline/utils/server.py (1)
  • wrap_python_path (66-67)
nemo_skills/utils.py (1)
  • get_logger_name (39-43)
tests/test_declarative_pipeline.py (1)
  • set_inline (39-40)
nemo_skills/pipeline/utils/scripts.py (2)
  • set_inline (98-100)
  • wrapped_inline (87-92)
🪛 Ruff (0.14.7)
tests/test_generation.py

174-174: Probable insecure usage of temporary file or directory: "/tmp/out"

(S108)


186-186: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)

nemo_skills/pipeline/generate.py

94-94: Loop control variable model_path not used within loop body

Rename unused model_path to _model_path

(B007)


94-94: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


232-236: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


237-241: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


242-246: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


247-251: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


252-256: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


257-261: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


262-266: Do not perform function call typer.Option in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


415-417: Avoid specifying long messages outside the exception class

(TRY003)

nemo_skills/pipeline/utils/generation.py

50-50: Avoid specifying long messages outside the exception class

(TRY003)


58-58: Avoid specifying long messages outside the exception class

(TRY003)


99-102: Avoid specifying long messages outside the exception class

(TRY003)

tests/test_declarative_pipeline.py

587-587: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)


590-590: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)


676-676: Probable insecure usage of temporary file or directory: "/tmp/logs"

(S108)

nemo_skills/pipeline/utils/declarative.py

223-223: Unused method argument: cluster_config

(ARG002)

🔇 Additional comments (28)
nemo_skills/pipeline/utils/__init__.py (1)

47-55: Re-exporting normalization helpers looks consistent with API surface

Exposing normalize_models_config and normalize_parameter via pipeline.utils keeps the public API coherent with how other generation utilities are surfaced. No issues from this change alone.

nemo_skills/pipeline/utils/generation.py (1)

20-102: Normalization helpers are correct and robust for CLI + Python usage

Both normalize_models_config and normalize_parameter implement the intended broadcast semantics cleanly and should behave well for:

  • CLI (Typer) lists
  • Python scalars or lists
    The error messaging on mismatched lengths is also clear. No functional issues spotted.
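The broadcast contract described here can be sketched as follows; this is an illustrative re-implementation of the assumed behavior, not the actual code in nemo_skills/pipeline/utils/generation.py:

```python
def normalize_parameter(value, num_models, param_name="parameter"):
    """Broadcast a scalar (or single-element list) to one entry per model,
    or validate that an explicit list matches the model count."""
    if not isinstance(value, (list, tuple)):
        return [value] * num_models          # Python scalar: broadcast
    if len(value) == 1 and num_models > 1:
        return list(value) * num_models      # CLI single value: broadcast
    if len(value) != num_models:
        raise ValueError(
            f"{param_name}: got {len(value)} entries for {num_models} models"
        )
    return list(value)
```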
tests/test_nemo_evaluator_pipeline.py (2)

20-28: Evaluator pipeline tests align with Script-based design

The updated tests:

  • Import and assert on EvaluatorClientScript/ServerScript
  • Check client command naming (evaluator-test-client-0...), server num_gpus, log_prefix, and non-None ports
  • Verify hardware allocation is driven by hosted server GPU counts

This gives good coverage of the new Nemo-evaluator orchestration without over-specifying internal details. Looks solid.

Also applies to: 95-204


206-327: Judge-server hosted & dual-server tests correctly exercise grouping and Script wiring

The test_judge_server_hosted and test_both_servers_hosted_separate_groups cases validate:

  • Judge server uses ServerScript with log_prefix == "judge-server" and correct num_gpus
  • Client always uses EvaluatorClientScript whose inline is callable (lazy build)
  • Hardware per-group matches server GPU/node counts

This aligns well with the documented grouping strategy (main+client vs judge-only groups).

nemo_skills/pipeline/nemo_evaluator.py (4)

90-188: High-level nemo_evaluator flow is clear and consistent with the new Script model

The command docstring and option set cleanly describe the four hosting scenarios, and the function now:

  • Builds a _TaskCreationContext per task
  • Routes through _build_main_server_if_needed, _build_judge_server_if_needed, and _build_client_command
  • Groups commands based on hosting strategy

No obvious logic errors in the main orchestration; the separation into helper functions improves readability.


573-647: _build_client_command and EvaluatorClientScript implement clean runtime URL resolution

_build_client_command now creates a Command powered by EvaluatorClientScript, passing in optional main/judge ServerScript references. EvaluatorClientScript.__post_init__:

  • Computes main/judge URLs from hosted servers (hostname_ref() + port) or external base URLs
  • Adds health-check waits via get_server_wait_cmd
  • Delegates to _build_task_cmd for the actual Nemo-Evaluator command, injecting URL/model overrides
  • Returns (final_cmd, {"environment": env_vars}) for Command.prepare_for_execution

This matches the new Script-based pipeline semantics and should work for all four hosting modes.


546-570: Hardware allocation helper for evaluator groups matches hosting semantics

_hardware_for_group cleanly encapsulates SLURM-related fields, including partition, num_gpus, num_nodes, and sbatch_kwargs with QoS. Using it consistently for all evaluator groups keeps hardware decisions centralized.


430-497: _create_serving_command_obj correctly wraps servers in ServerScript and Command

The helper:

  • Normalizes server_type (warns on unsupported types)
  • Instantiates ServerScript with num_gpus, num_nodes, args, entrypoint, and port/allocate_port
  • Applies a judge-specific log_prefix and chooses container from cluster_config["containers"][stype] when not overridden
  • Returns a Command with clear role-specific names
tests/test_declarative_pipeline.py (8)

17-52: DummyScript and make_command are well-designed test scaffolding

The DummyScript stand-in (with inline, set_inline, log_prefix, het_group_index, and hostname_ref) plus make_command() cleanly simulate real run.Script instances for unit tests. This keeps tests decoupled from the actual Script implementations while matching the new Command(script=...) API.


54-107: TestCommand suite thoroughly covers new Command semantics

The tests validate:

  • Basic construction (name, default container, script.inline)
  • prepare_for_execution with inline strings and callable inlines (with and without metadata/environment)
  • hostname_ref behaviour for default and heterogeneous cases
  • get_name() access

This gives good confidence that Command.prepare_for_execution and DummyScript.hostname_ref behave as expected.


112-140: CommandGroup tests reflect the new Script-centric API correctly

The TestCommandGroup tests ensure:

  • Basic grouping and default HardwareConfig
  • Custom hardware propagation
  • log_dir handling

All use make_command and thus implicitly validate compatibility with the new Command signature.


145-237: Pipeline construction tests line up with validation changes and job specs

These tests cover:

  • Single-job and multi-job pipelines
  • run_after semantics as string or list
  • Direct cluster_config usage (no name-based resolution)
  • Required jobs and name fields

They match the updated Pipeline._validate logic and job schema and look correct.


242-344: Basic pipeline execution and HF_HOME validation remain correct with Script-based flow

The tests around Pipeline.run:

  • Confirm experiment creation and exp.add calls
  • Validate HF_HOME presence and mount checks occurring in __init__ rather than run()
  • Ensure executors are constructed for SLURM paths

The use of make_command + DummyScript keeps them aligned with the new API while retaining the prior behavioural guarantees.


374-477: Het-group index tests validate new Script-level indexing semantics

TestHetGroupIndices asserts:

  • Non-heterogeneous jobs leave het_group_index as None and hostname resolves to localhost
  • Heterogeneous jobs assign indices per group (0, 1, …), and hostname_ref() embeds the correct SLURM env vars
  • Indices are per-job, not global across the pipeline

These tests are an excellent fit for the new _plan_and_add_job het-indexing strategy.
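The het-group hostname behaviour these tests pin down can be illustrated with a toy stand-in; the env-var naming below is an assumption for illustration, not the project's actual reference format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServerScriptSketch:
    """Minimal stand-in for ServerScript, showing hostname resolution only."""
    port: int = 5000
    het_group_index: Optional[int] = None

    def hostname_ref(self) -> str:
        # Non-heterogeneous (or local) runs: client and server share a node.
        if self.het_group_index is None:
            return "localhost"
        # Heterogeneous SLURM jobs: defer to an env var resolved at launch,
        # keeping the command string lazy until runtime.
        return f"$SLURM_HET_GROUP_HOST_{self.het_group_index}"

    def address(self) -> str:
        return f"{self.hostname_ref()}:{self.port}"
```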


479-724: Dependency resolution tests still accurately describe internal vs external handling

The updated dependency tests cover:

  • Explicit None dependencies
  • Pipeline-level run_after propagating to jobs without their own deps
  • Multiple internal dependencies (job objects → handles)
  • Separation of external experiment deps (via get_exp_handles) from internal handles

They align well with the logic in Pipeline.run and _plan_and_add_job.


829-897: Sandbox environment propagation test is a good end-to-end check

test_generate_with_sandbox_passes_env_vars_correctly asserts that:

  • temporary_env_update is called with updates containing NEMO_SKILLS_SANDBOX_PORT on the client side
  • Env-patching paths in generate() + declarative Pipeline behave as expected with Script-based commands

This is a valuable regression test for the sandbox refactor.

nemo_skills/pipeline/generate.py (4)

32-36: New Script imports are appropriate and localized

Importing GenerationClientScript, SandboxScript, and ServerScript here is consistent with this module’s responsibility for building generation jobs. No issues.


382-412: Model and server parameter normalization logic is sound

The sequence:

  • models_list = normalize_models_config(model)
  • Convert server_type enums → strings
  • Broadcast server_type, server_gpus, server_nodes, server_args, server_entrypoint, server_container, and server_address with normalize_parameter
  • Enforce that multi-model usage requires generation_type or generation_module

gives a coherent multi-model configuration story. The broadcasting semantics are clear and should behave well for both CLI and Python code.


502-541: Per-model configure_client loop correctly separates single vs multi-model server overrides

The inner loop:

  • Calls configure_client per model to build server_configs and resolved addresses.
  • Uses extra_arguments_original only for the first model, then:
    • For single-model: captures srv_extra_args so that server config is expressed via extra_arguments.
    • For multi-model: ignores srv_extra_args and leaves per-model server configuration to GenerationClientScript + get_generation_cmd.

This avoids double-injecting per-model overrides and nicely preserves the single-model semantics.


603-637: Job spec construction for single vs multi-group is consistent with Pipeline expectations

Using:

  • "groups": job_groups when len(job_groups) > 1
  • "group": job_groups[0] otherwise

ensures Pipeline.run correctly chooses between single-group and heterogeneous multi-group jobs. Naming internal_job_name based on task_name and dep_idx is also clear and keeps dependency wiring straightforward.

nemo_skills/pipeline/utils/declarative.py (5)

15-42: Module-level refactor to Script-based execution is well-scoped

Switching this module to:

  • Import nemo_run as run
  • Pull in pipeline utilities (get_env_variables, get_executor, etc.)
  • Use get_registered_external_repo + wrap_python_path for local execution

sets the stage for Script-centric orchestration without leaking responsibilities into callers. No issues at the import/architecture level.


262-270: HardwareConfig extension with num_tasks aligns with executor needs

Adding num_tasks to HardwareConfig and defaulting it to 1 lets the executor distinguish between node count and tasks per node. This plays nicely with ServerScript.num_tasks feeding into _create_job_unified.


356-495: Pipeline validation and high-level run loop remain sound after the Script refactor

The Pipeline.__init__ and run methods still:

  • Validate job specs and cluster_config upfront
  • Enforce HF_HOME presence/mounting for non-none executors (unless explicitly skipped)
  • Cleanly separate internal (handles) vs external (experiment names → SLURM job IDs) dependencies
  • Distinguish single-group vs multi-group jobs and defer to _add_*_job

This matches prior behaviour while delegating planning details to _plan_and_add_job.


543-585: Executor creation now uses num_tasks and supports a group-wide job name

The updated _create_executor:

  • Maps HardwareConfig.num_nodes/num_tasks/num_gpus to executor args.
  • Uses job_name_override (currently the group name) as the SLURM job name, so all components in a group share a stable name without embedding role suffixes.
  • Wraps environment updates via temporary_env_update, keeping env-setting logic centralized.

This is a good match for the new Script-based design.
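The field-to-kwarg mapping can be sketched like this; the executor argument names here are assumptions for illustration, not the real `get_executor` signature:

```python
from types import SimpleNamespace


def executor_args(hw, job_name_override, group_name):
    """Map HardwareConfig-style fields to executor kwargs (names assumed)."""
    return {
        "nodes": hw.num_nodes,
        "ntasks_per_node": hw.num_tasks,
        "gpus_per_node": hw.num_gpus,
        # All components in a group share one SLURM job name, with no
        # per-role suffixes embedded in it.
        "job_name": job_name_override or group_name,
    }


hw = SimpleNamespace(num_nodes=2, num_tasks=8, num_gpus=8)
args = executor_args(hw, job_name_override="my-group", group_name="generation")
```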


780-820: Single-group vs multi-group helper methods correctly delegate to _plan_and_add_job

_add_single_group_job and _add_multi_group_job simply wrap _plan_and_add_job with heterogeneous=False/True, which keeps the public behaviour unchanged while centralizing the complex logic. No issues here.

nemo_skills/pipeline/utils/scripts.py (3)

62-117: LGTM - Solid base class design for heterogeneous job support.

The set_inline pattern using object.__setattr__ is an appropriate workaround for safely mutating frozen dataclass fields. The hostname_ref() method correctly handles both hetjob and non-hetjob scenarios with a sensible localhost fallback.


217-272: LGTM - Good documentation of responsibility boundaries.

The comment on line 265 clarifying that keep_mounts is handled at the executor level is helpful for maintainability. The same port validation concern from ServerScript applies here when allocate_port=False and port=None.


361-407: Well-designed lazy command building pattern.

The closure-based lazy evaluation for build_cmd is appropriate since hostname_ref() depends on het_group_index being set at runtime by the pipeline. The environment variable plumbing for sandbox port communication follows the pattern documented in sandbox_command.

One minor observation: the docstring example on lines 318-325 shows servers=[server1, server2, None] with parallel server_addresses_prehosted=["", "", "https://api.openai.com"]. The empty strings for self-hosted servers work but could be slightly misleading; consider documenting that these values are ignored when the corresponding servers entry is not None.

Comment on lines +180 to +183
# Allocate port if not provided
if self.port is None and self.allocate_port:
    self.port = get_free_port(strategy="random")
    LOG.debug(f"Allocated port {self.port} for {self.server_type} server")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain (repository: NVIDIA-NeMo/Skills)

🏁 Script executed:

# First, let's find the ServerScript class definition and see port field constraints
rg -n "class ServerScript" --type py

🏁 Script executed:

# Search for get_server_command to understand how it handles port
rg -n "def get_server_command" --type py -A 15

🏁 Script executed:

# Check the ServerScript class fields and any defaults
sed -n '119,180p' nemo_skills/pipeline/utils/scripts.py | cat -n

🏁 Script executed:

# View the complete __post_init__ method
sed -n '180,210p' nemo_skills/pipeline/utils/scripts.py | cat -n

🏁 Script executed:

# Check where get_server_command is called with the port parameter
rg -n "get_server_command" nemo_skills/pipeline/utils/scripts.py -A 5 -B 5

🏁 Script executed:

# Check for usages of ServerScript with allocate_port=False
rg -n "ServerScript|allocate_port" --type py -B 2 -A 2

🏁 Script executed:

# Also check if there are any type hints or validation that we missed
rg -n "server_port.*None" nemo_skills/pipeline/utils/server.py -B 5 -A 5

🏁 Script executed:

# Check the full signature and docstring of get_server_command to see if it handles None
sed -n '114,160p' nemo_skills/pipeline/utils/server.py | cat -n


Add validation to ensure port is set when allocate_port=False.

If allocate_port=False and port=None, the port remains None and is passed to get_server_command, which expects an int for the server_port parameter. This causes a runtime error.

Add validation after the port allocation block:

         # Allocate port if not provided
         if self.port is None and self.allocate_port:
             self.port = get_free_port(strategy="random")
             LOG.debug(f"Allocated port {self.port} for {self.server_type} server")
+        elif self.port is None:
+            raise ValueError("port must be provided when allocate_port=False")

Comment on lines +371 to +380
if self.servers is not None:
    server_addresses = []
    for server_idx, server_script in enumerate(self.servers):
        if server_script is not None:
            # Self-hosted: construct address from hostname and port refs
            addr = f"{server_script.hostname_ref()}:{server_script.port}"
        else:
            # Pre-hosted: use the address from server_addresses_prehosted
            addr = self.server_addresses_prehosted[server_idx]
        server_addresses.append(addr)

⚠️ Potential issue | 🟠 Major

Potential IndexError or TypeError when accessing server_addresses_prehosted.

If self.servers contains None entries (indicating pre-hosted servers), the code assumes self.server_addresses_prehosted is a list with matching indices. However, if server_addresses_prehosted is None or shorter than expected, this will raise an exception at runtime.

Consider adding validation:

             # Build server addresses if servers are provided
             server_addresses = None
             if self.servers is not None:
+                # Validate parallel lists
+                if any(s is None for s in self.servers):
+                    if self.server_addresses_prehosted is None:
+                        raise ValueError(
+                            "server_addresses_prehosted must be provided when servers contains None entries"
+                        )
+                    if len(self.server_addresses_prehosted) != len(self.servers):
+                        raise ValueError(
+                            "server_addresses_prehosted must have same length as servers"
+                        )
                 server_addresses = []
                 for server_idx, server_script in enumerate(self.servers):

Alternatively, move validation to __post_init__ outside the build_cmd closure to fail fast during construction.


@gwarmstrong gwarmstrong enabled auto-merge (squash) December 17, 2025 02:07
@gwarmstrong gwarmstrong merged commit 9c5b68c into main Dec 17, 2025
5 checks passed
@gwarmstrong gwarmstrong deleted the georgea/refactor-generate-run-script branch December 17, 2025 02:17
gwarmstrong added a commit that referenced this pull request Dec 17, 2025
gwarmstrong added a commit that referenced this pull request Dec 17, 2025
This reverts commit 9c5b68c.

Signed-off-by: George Armstrong <georgea@nvidia.com>
gwarmstrong added a commit that referenced this pull request Dec 18, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
gwarmstrong added a commit that referenced this pull request Dec 18, 2025
This reverts commit 1c0722a.

FIX multi-node pipeline creation

Signed-off-by: George Armstrong <georgea@nvidia.com>

remove hosntame ref change

Signed-off-by: George Armstrong <georgea@nvidia.com>

make param span_group_nodes

Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
wasiahmad pushed a commit that referenced this pull request Dec 19, 2025
Signed-off-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
stephencge added a commit that referenced this pull request Dec 19, 2025
- Add nemo_skills/inference/hilbert.py: unified generation module that
  orchestrates prover (vLLM) + reasoner (Gemini) in single job using
  multi-model support from PR #1052
- Add hilbert_unified stage to stages.py for pipeline orchestration
- Add tokens_to_generate param to hilbert_prover (default 5K for testing)
- Add unified-local-test.yaml config for testing unified pipeline

Pipeline flow: hilbert_d0 → split_d0 → hilbert_d1 → split_d1 → assemble

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Stephen Ge <stepheng@nvidia.com>
blahblahasdf pushed a commit to blahblahasdf/Skills that referenced this pull request Jan 8, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dlord <dlord@nvidia.com>
blahblahasdf pushed a commit to blahblahasdf/Skills that referenced this pull request Jan 8, 2026
…DIA-NeMo#1125)

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dlord <dlord@nvidia.com>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
wasiahmad pushed a commit that referenced this pull request Feb 4, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>