chore: Add evaluation pipeline #1876

Merged
ko3n1g merged 44 commits into main from ko3n1g/chore/evaluation-pipeline
Jan 24, 2026

Conversation

@ko3n1g
Contributor

ko3n1g commented Jan 7, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

Release Notes

  • New Features
    • Added evaluation pipeline orchestration with support for SLURM and Kubernetes/DGXCloud cluster deployments
    • Added comprehensive CLI argument parsing with environment variable configuration support
    • Added automated deployment and evaluation execution scripts with server readiness checks
    • Added Weights & Biases integration for experiment tracking and results logging
    • Added Ray-based job execution and monitoring capabilities

Signed-off-by: oliver könig <okoenig@nvidia.com>
chtruong814 previously approved these changes Jan 9, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
ko3n1g added 19 commits January 19, 2026 18:46
Signed-off-by: oliver könig <okoenig@nvidia.com>
chtruong814 previously approved these changes Jan 24, 2026
Contributor

chtruong814 left a comment

Minor comment about copyright years. Please start using 2026 for new files.

@@ -0,0 +1,69 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
Contributor
Please start using 2026 for the copyright year on new files.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@coderabbitai
Contributor

coderabbitai bot commented Jan 24, 2026

📝 Walkthrough

This PR introduces an evaluation pipeline infrastructure for Megatron-Bridge. It adds CLI utilities, shell scripts, and Python modules to handle deployment configuration, evaluation job orchestration across SLURM and Kubernetes environments, and executor setup with appropriate cluster settings.

Changes

  • CLI Argument Utilities (examples/evaluation/argument_builder.py, examples/evaluation/argument_parser.py): argument_builder.py adds three functions for constructing CLI argument strings from environment variables: list_of_strings(), normalize_arg_name(), and build_cli_args_from_env_vars(). argument_parser.py defines a comprehensive argparse-based CLI via parse_cli_args(), with helper utilities, deployment settings (checkpoint, host, port, parallelism), evaluation parameters, SLURM/DGXCloud config, logging, and tokenizer arguments.
  • Evaluation Pipeline Orchestration (examples/evaluation/launch_evaluation_pipeline.py): Main orchestration script that selects the executor type (SLURM or KubeRay) based on input, configures signal handling for job termination, manages the RayJob lifecycle (create/start/wait/monitor), streams logs, and handles optional WANDB integration for result logging. Includes a CustomJobDetailsRay class with a specialized log path property.
  • Deployment & Evaluation Scripts (examples/evaluation/deploy.sh, examples/evaluation/eval.sh): deploy.sh unsets MPI environment variables and launches the Megatron Ray deployment with fixed parallel sizes. eval.sh unsets MPI variables, installs dependencies, checks server readiness, builds evaluation configuration objects, executes the evaluation workflow against a remote endpoint, and performs cleanup.
  • Cluster Executor Factories (examples/evaluation/utils/executors.py): Provides two factory functions: slurm_executor() configures a Slurm-based executor with cluster, container, and HF token settings; kuberay_executor() constructs a KubeRay cluster definition with worker groups, resource requests, extensive environment variables, and persistent volume configuration.
  • Script Maintenance (scripts/performance/argument_parser.py): Removes an unused comment-block label in the Testing arguments section (−2 lines).

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant Pipeline as launch_evaluation_pipeline.py
    participant Executor as Executor (SLURM/KubeRay)
    participant RayJob as RayJob Orchestrator
    participant Deploy as deploy.sh
    participant Eval as eval.sh
    participant Service as Ray/Megatron Service
    participant Evaluator as nemo_evaluator

    User->>Pipeline: Execute with config args
    Pipeline->>Pipeline: parse_cli_args()
    alt dgxc_cluster set
        Pipeline->>Executor: kuberay_executor(config)
    else
        Pipeline->>Executor: slurm_executor(config)
    end
    Executor-->>Pipeline: Configured Executor
    Pipeline->>RayJob: Create RayJob with composite bash command
    Pipeline->>RayJob: Start job (deploy.sh + eval.sh)
    RayJob->>Deploy: Execute deploy.sh
    Deploy->>Service: Launch Ray + Megatron model
    Service-->>Deploy: Service ready
    RayJob->>Eval: Execute eval.sh
    Eval->>Eval: check_endpoint() readiness
    Eval->>Evaluator: Build ApiEndpoint, EvaluationTarget, ConfigParams
    Evaluator->>Service: Send evaluation requests
    Service-->>Evaluator: Model responses
    Evaluator-->>Eval: Evaluation results
    Eval->>Eval: Write results.yml
    Pipeline->>RayJob: Monitor job status (RUNNING)
    RayJob-->>Pipeline: Job complete, results available
    Pipeline->>Pipeline: Read results.yml
    alt WANDB configured
        Pipeline->>Pipeline: Log results to WANDB
    end
    Pipeline->>RayJob: Stop job + cleanup
    Pipeline-->>User: Return results
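The `check_endpoint()` readiness step in the diagram amounts to polling the deployed endpoint until it responds. A minimal sketch of that idea in Python; the URL shape, timeout values, and function signature here are assumptions for illustration, not the actual eval.sh logic:

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    # Poll the health URL until it returns HTTP 200 or the deadline passes.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not up yet; retry after a short sleep.
        time.sleep(interval_s)
    return False
```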

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Test Results For Major Changes (⚠️ Warning): The PR introduces a major new evaluation pipeline with 588+ lines of new code across 6 files without corresponding test files, violating the explicit repository testing requirements in CONTRIBUTING.md and the PR template. Resolution: add unit tests for the argument builders and parsers, plus functional tests for the evaluation pipeline with the SLURM and KubeRay executors, to the tests/ directory, and document the test results in the PR description.
  • Title check (❓ Inconclusive): The title 'Add evaluation pipeline' is broad and does not convey the specific scope of the changes, which include multiple utility functions, shell scripts, and executor configurations. Resolution: consider a more specific title (e.g., 'Add CLI-based evaluation pipeline utilities and executors') to better communicate the scope to reviewers.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage (✅ Passed): Docstring coverage is 91.67%, which meets the required threshold of 80.00%.


✨ Finishing touches
  • 📝 Generate docstrings


Contributor

coderabbitai bot left a comment

Actionable comments posted: 12

🤖 Fix all issues with AI agents
In `@examples/evaluation/argument_parser.py`:
- Around line 168-173: The help text for the argument added via
logging_args.add_argument("--wandb_entity_name", ...) is incorrect (it currently
says "wandb project name"); update the help string to accurately describe the
entity (e.g., "wandb entity name" or "WandB entity/username for the project") so
the flag --wandb_entity_name clearly documents its purpose in the argument
parser.

In `@examples/evaluation/deploy.sh`:
- Around line 6-19: Wrap the positional variables in quotes to prevent word
splitting and glob expansion: update the python invocation in deploy.sh to use
"$MEGATRON_CHECKPOINT", "$NUM_REPLICAS", and "$NUM_GPUS" wherever they are
interpolated (the --megatron_checkpoint, --num_gpus, and --num_replicas flags)
so paths or values with spaces or glob characters are passed safely to
deploy_ray_inframework.py.
- Around line 1-4: The shell script is missing a shebang so its interpreter is
ambiguous; add a shebang as the very first line (for example "#!/usr/bin/env
bash") to guarantee POSIX/Bash behavior for the for-loop unsets (the lines using
env | grep ^SLURM_/PMI_/PMIX_ and unset -v), and optionally make the script
executable (chmod +x) after updating the file.

In `@examples/evaluation/eval.sh`:
- Around line 1-4: Add a POSIX bash shebang as the first line of the script
(e.g. #!/usr/bin/env bash) so the interpreter is explicit and portable; insert
it above the existing for loops that unset SLURM_/PMI_/PMIX_ (the lines
beginning with for i in $(env | grep ^SLURM_, for i in $(env | grep ^PMI_, and
for i in $(env | grep ^PMIX_)) and ensure the file remains executable (chmod +x)
after the change.
- Around line 30-35: The heredoc directly interpolates $PARALLELISM and
$OUTPUT_DIR into generated Python config, creating injection risk and path
issues; instead, stop injecting raw vars into the Python block—pass PARALLELISM
and OUTPUT_DIR as environment variables or CLI args and have the Python runtime
read them (e.g., os.environ.get("PARALLELISM") for numeric validation and
os.environ.get("OUTPUT_DIR") for path), validate/cast PARALLELISM to an int,
strip any leading slashes from OUTPUT_DIR or use os.path.join to construct
output_dir (avoid prepending a literal "/" in the heredoc), and ensure any
user-provided values are sanitized before use in functions that consume them.

In `@examples/evaluation/launch_evaluation_pipeline.py`:
- Around line 146-155: The code assumes run_id exists after checking runs; if
runs is empty this causes a NameError when calling wandb.init. Fix by explicitly
setting run_id = runs[0].id if runs else None (or log and exit) and pass the id
to wandb.init only when run_id is not None (e.g., build kwargs = {"project":
args.wandb_project_name, "entity": args.wandb_entity_name, "resume": "allow"};
if run_id: kwargs["id"] = run_id; wandb.init(**kwargs)). Update the block around
runs, run_id, and wandb.init accordingly and add a log message when no matching
run is found.
- Around line 1-14: Move the shebang '#!/usr/bin/env python3' to be the very
first line of the file (no preceding blank lines or comments) so it precedes the
copyright header; ensure the rest of the header and file contents remain
unchanged and that the shebang is exactly '#!/usr/bin/env python3'.
- Around line 133-134: The code opens results.yml directly which can raise
FileNotFoundError if the evaluation didn't produce it; update the block that
reads os.path.join(args.output_dir, "results", "results.yml") to first check
existence (os.path.exists) or wrap the open/yaml.safe_load call in a try/except
FileNotFoundError, then handle the missing file by logging a clear error via the
existing logger, exiting gracefully, or setting a sensible default for results;
ensure you reference args.output_dir, the "results.yml" path, and the variable
results/yaml.safe_load when implementing the check/exception handling.

In `@examples/evaluation/utils/executors.py`:
- Around line 78-79: The parameters hf_token and custom_env_vars in the function
signature are typed as str and Dict[...] but default to None; change their types
to Optional[str] and Optional[Dict[str, str]] and import typing.Optional (or use
from typing import Optional) so the annotations match the None default; update
the signature(s) that include hf_token and custom_env_vars (same pattern as the
earlier slurm_executor fix) and adjust any callers or type checks if needed to
handle Optional values.
- Line 100: Replace the hardcoded developer path for "HF_HOME":
"/nemo-workspace/pagaray/hf_cache" with a parameterized or environment-driven
value; update the code that builds the environment dict (where "HF_HOME" is set)
to read from an optional function/class parameter or os.environ.get("HF_HOME",
"/nemo-workspace/.cache/huggingface") so callers can override it while using
"/nemo-workspace/.cache/huggingface" as the sensible default.
- Around line 23-33: The slurm_executor function uses mutable default arguments
(custom_mounts: List[str] = [] and custom_env_vars: Dict[str, str] = {}) and an
implicit Optional for hf_token; change the defaults to None (e.g.,
custom_mounts: Optional[List[str]] = None, custom_env_vars:
Optional[Dict[str,str]] = None, hf_token: Optional[str] = None) and then inside
slurm_executor immediately initialize them with: custom_mounts = custom_mounts
or [] and custom_env_vars = custom_env_vars or {} so callers behave the same
while avoiding mutable default pitfalls and making the Optional intent explicit.
- Around line 132-135: The image_pull_secrets value is hardcoded; change the
function that builds spec_kwargs (referenced by spec_kwargs and
image_pull_secrets in examples/evaluation/utils/executors.py) to accept a new
parameter (e.g., image_pull_secret or image_pull_secrets) and use that parameter
inside spec_kwargs instead of the literal
"dockerregistry-dockerregistry-pagaray-ngc"; make the parameter optional with a
sensible default (None or empty list), convert a single string input into the
expected list format if necessary, and ensure callers of the function are
updated to pass the appropriate secret name where needed.
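The two hardcoded-value comments above reduce to the same pattern: accept an optional parameter, fall back to the environment or a neutral default, and normalize the shape. A hedged sketch; the helper names are invented for illustration, and the default cache path follows the comment rather than the actual code:

```python
import os
from typing import List, Optional, Union


def resolve_hf_home(hf_home: Optional[str] = None) -> str:
    # Prefer an explicit argument, then the caller's HF_HOME, then a
    # shared cache location instead of a developer-specific path.
    return hf_home or os.environ.get("HF_HOME", "/nemo-workspace/.cache/huggingface")


def normalize_image_pull_secrets(secrets: Optional[Union[str, List[str]]] = None) -> List[str]:
    # Accept a single secret name or a list; return the list form the
    # cluster spec expects, defaulting to no secrets.
    if secrets is None:
        return []
    if isinstance(secrets, str):
        return [secrets]
    return list(secrets)
```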
🧹 Nitpick comments (3)
examples/evaluation/argument_builder.py (1)

20-22: Duplicate function definition.

list_of_strings is defined identically in examples/evaluation/argument_parser.py (lines 17-19) and scripts/performance/argument_parser.py (lines 29-31). Consider extracting to a shared utility module to avoid DRY violations.

examples/evaluation/argument_parser.py (1)

17-24: Duplicate utility functions.

Both list_of_strings and to_dict are identical to implementations in scripts/performance/argument_parser.py (lines 29-31 and 46-48). Consider consolidating into a shared utility module.

examples/evaluation/launch_evaluation_pipeline.py (1)

113-116: Long command string reduces readability.

The command string spans multiple shell operations with pipes and background processes. Consider breaking it into a multi-line string or a separate script for maintainability.

Proposed refactor
    deploy_cmd = f"bash /opt/Megatron-Bridge/examples/evaluation/deploy.sh {args.megatron_checkpoint} {args.num_replicas} {args.num_gpus}"
    eval_cmd = f"bash /opt/Megatron-Bridge/examples/evaluation/eval.sh {args.output_dir} {args.parallelism}"
    
    command = (
        f"{deploy_cmd} | tee -a deploy.log & "
        f"sleep 120; "
        f"{eval_cmd} | tee -a eval.log"
    )
    
    job.start(command=command, workdir=None)

Comment on lines +168 to +173
logging_args.add_argument(
"--wandb_entity_name",
type=str,
help="wandb project name",
required=False,
)

⚠️ Potential issue | 🟡 Minor

Copy-paste error in help text.

The help text for --wandb_entity_name says "wandb project name" but should describe the entity name.

Proposed fix
     logging_args.add_argument(
         "--wandb_entity_name",
         type=str,
-        help="wandb project name",
+        help="wandb entity name",
         required=False,
     )

Comment on lines +1 to +4
# Unset SLURM/PMI/PMIX env vars to prevent MPI initialization issues
for i in $(env | grep ^SLURM_ | cut -d"=" -f 1); do unset -v $i; done
for i in $(env | grep ^PMI_ | cut -d"=" -f 1); do unset -v $i; done
for i in $(env | grep ^PMIX_ | cut -d"=" -f 1); do unset -v $i; done

🛠️ Refactor suggestion | 🟠 Major

Add a shebang line to specify the shell interpreter.

The script lacks a shebang, which can cause portability issues and unexpected behavior. Static analysis (SC2148) correctly flags this.

Proposed fix
+#!/bin/bash
+
 # Unset SLURM/PMI/PMIX env vars to prevent MPI initialization issues
 for i in $(env | grep ^SLURM_ | cut -d"=" -f 1); do unset -v $i; done
 for i in $(env | grep ^PMI_ | cut -d"=" -f 1); do unset -v $i; done
 for i in $(env | grep ^PMIX_ | cut -d"=" -f 1); do unset -v $i; done
🧰 Tools
🪛 Shellcheck (0.11.0)

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)


Comment on lines +6 to +19
MEGATRON_CHECKPOINT=$1
NUM_REPLICAS=$2
NUM_GPUS=$3
python \
/opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
--megatron_checkpoint $MEGATRON_CHECKPOINT \
--model_id megatron_model \
--host 0.0.0.0 \
--port 8000 \
--num_gpus $NUM_GPUS \
--num_replicas $NUM_REPLICAS \
--tensor_model_parallel_size 1 \
--pipeline_model_parallel_size 1 \
--context_parallel_size 1

⚠️ Potential issue | 🟡 Minor

Quote variables to prevent word splitting and glob expansion.

The variables $MEGATRON_CHECKPOINT, $NUM_REPLICAS, and $NUM_GPUS should be quoted to handle paths with spaces or special characters safely.

Proposed fix
 MEGATRON_CHECKPOINT=$1
 NUM_REPLICAS=$2
 NUM_GPUS=$3
 python \
   /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
-  --megatron_checkpoint $MEGATRON_CHECKPOINT \
+  --megatron_checkpoint "$MEGATRON_CHECKPOINT" \
   --model_id megatron_model \
   --host 0.0.0.0 \
   --port 8000 \
-  --num_gpus $NUM_GPUS \
-  --num_replicas $NUM_REPLICAS \
+  --num_gpus "$NUM_GPUS" \
+  --num_replicas "$NUM_REPLICAS" \
   --tensor_model_parallel_size 1 \
   --pipeline_model_parallel_size 1 \
-  --context_parallel_size 1 
+  --context_parallel_size 1

Comment on lines +1 to +4
# Unset SLURM/PMI/PMIX env vars to prevent MPI initialization issues
for i in $(env | grep ^SLURM_ | cut -d"=" -f 1); do unset -v $i; done
for i in $(env | grep ^PMI_ | cut -d"=" -f 1); do unset -v $i; done
for i in $(env | grep ^PMIX_ | cut -d"=" -f 1); do unset -v $i; done

🛠️ Refactor suggestion | 🟠 Major

Add a shebang line to specify the shell interpreter.

Same as deploy.sh, this script needs a shebang for portability and to ensure the correct shell is used.

Proposed fix
+#!/bin/bash
+
 # Unset SLURM/PMI/PMIX env vars to prevent MPI initialization issues
 for i in $(env | grep ^SLURM_ | cut -d"=" -f 1); do unset -v $i; done
 for i in $(env | grep ^PMI_ | cut -d"=" -f 1); do unset -v $i; done
 for i in $(env | grep ^PMIX_ | cut -d"=" -f 1); do unset -v $i; done
🧰 Tools
🪛 Shellcheck (0.11.0)

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)


Comment on lines +30 to +35
parallelism = $PARALLELISM
request_timeout = 1000
temperature = None
top_p = None
top_k = None
output_dir = "/$OUTPUT_DIR/results/"

⚠️ Potential issue | 🟡 Minor

Shell variable interpolation in heredoc could cause issues.

  1. $PARALLELISM and $OUTPUT_DIR are interpolated directly into Python code. If these come from untrusted sources, this could be a code injection vector.
  2. Line 35: The path "/$OUTPUT_DIR/results/" prepends / which will cause double slashes if OUTPUT_DIR is already an absolute path (e.g., /workspace becomes //workspace/results/).
Proposed fix for path construction
-output_dir = "/$OUTPUT_DIR/results/"
+output_dir = "${OUTPUT_DIR}/results/"

Or in Python, use os.path.join after passing the path as an environment variable instead of heredoc interpolation.


Comment on lines +146 to +155
if runs:
run_id = runs[0].id
print(f"Found run with ID: {run_id}")

wandb_run = wandb.init(
project=args.wandb_project_name,
entity=args.wandb_entity_name,
id=run_id,
resume="allow",
)

⚠️ Potential issue | 🔴 Critical

Potential NameError if no matching wandb run is found.

If runs is empty (no matching run found), run_id will be undefined when used on line 153, causing a NameError. The code should handle this case.

Proposed fix
         if runs:
             run_id = runs[0].id
             print(f"Found run with ID: {run_id}")
+        else:
+            run_id = None
+            print("No existing run found, creating new run")

         wandb_run = wandb.init(
             project=args.wandb_project_name,
             entity=args.wandb_entity_name,
-            id=run_id,
+            id=run_id,  # Will create new run if None
             resume="allow",
         )

Comment on lines +23 to +33
def slurm_executor(
account: str,
partition: str,
nodes: int,
num_gpus_per_node: int,
time_limit: str = "00:30:00",
container_image: str = "nvcr.io/nvidia/nemo:dev",
custom_mounts: List[str] = [],
custom_env_vars: Dict[str, str] = {},
hf_token: str = None,
) -> run.SlurmExecutor:

⚠️ Potential issue | 🟠 Major

Mutable default arguments and implicit Optional types.

Lines 30-31 use mutable defaults ([] and {}), which is a classic Python pitfall. Line 32 uses implicit Optional. Static analysis (B006, RUF013) correctly flags these.

Proposed fix
 def slurm_executor(
     account: str,
     partition: str,
     nodes: int,
     num_gpus_per_node: int,
     time_limit: str = "00:30:00",
     container_image: str = "nvcr.io/nvidia/nemo:dev",
-    custom_mounts: List[str] = [],
-    custom_env_vars: Dict[str, str] = {},
-    hf_token: str = None,
+    custom_mounts: List[str] | None = None,
+    custom_env_vars: Dict[str, str] | None = None,
+    hf_token: str | None = None,
 ) -> run.SlurmExecutor:

Then initialize inside the function:

    custom_mounts = custom_mounts or []
    custom_env_vars = custom_env_vars or {}
🧰 Tools
🪛 Ruff (0.14.13)

30-30: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
31-31: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
32-32: PEP 484 prohibits implicit Optional. Convert to T | None. (RUF013)

🤖 Prompt for AI Agents
In `@examples/evaluation/utils/executors.py` around lines 23 - 33, The
slurm_executor function uses mutable default arguments (custom_mounts: List[str]
= [] and custom_env_vars: Dict[str, str] = {}) and an implicit Optional for
hf_token; change the defaults to None (e.g., custom_mounts: Optional[List[str]]
= None, custom_env_vars: Optional[Dict[str,str]] = None, hf_token: Optional[str]
= None) and then inside slurm_executor immediately initialize them with:
custom_mounts = custom_mounts or [] and custom_env_vars = custom_env_vars or {}
so callers behave the same while avoiding mutable default pitfalls and making
the Optional intent explicit.
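The pitfall is easy to reproduce outside this repo. A minimal standalone sketch (the `add_mount_*` names are invented for illustration, not code from this PR):

```python
from typing import List, Optional

def add_mount_buggy(mount: str, mounts: List[str] = []) -> List[str]:
    # The default list is created once at function definition time,
    # so every call that omits `mounts` appends to the same object.
    mounts.append(mount)
    return mounts

def add_mount_fixed(mount: str, mounts: Optional[List[str]] = None) -> List[str]:
    # Fresh list per call, following the `x = x or []` pattern above.
    mounts = mounts or []
    mounts.append(mount)
    return mounts

first = add_mount_buggy("/data")
second = add_mount_buggy("/home")  # shares state with the first call

a = add_mount_fixed("/data")
b = add_mount_fixed("/home")       # independent lists
```

Since `first` and `second` are the same shared list, both contain `["/data", "/home"]` after the second call, which is rarely what the caller intended.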

Comment on lines +78 to +79
```python
    hf_token: str = None,
    custom_env_vars: Dict[str, str] = None,
```

⚠️ Potential issue | 🟠 Major

Same implicit Optional issue.

Lines 78-79 have the same implicit Optional problem as the `slurm_executor` function.

Proposed fix

```diff
-    hf_token: str = None,
-    custom_env_vars: Dict[str, str] = None,
+    hf_token: str | None = None,
+    custom_env_vars: Dict[str, str] | None = None,
```
🧰 Tools
🪛 Ruff (0.14.13)

78-78: PEP 484 prohibits implicit Optional. Convert to T | None. (RUF013)
79-79: PEP 484 prohibits implicit Optional. Convert to T | None. (RUF013)

🤖 Prompt for AI Agents
In `@examples/evaluation/utils/executors.py` around lines 78 - 79, The parameters
hf_token and custom_env_vars in the function signature are typed as str and
Dict[...] but default to None; change their types to Optional[str] and
Optional[Dict[str, str]] and import typing.Optional (or use from typing import
Optional) so the annotations match the None default; update the signature(s)
that include hf_token and custom_env_vars (same pattern as the earlier
slurm_executor fix) and adjust any callers or type checks if needed to handle
Optional values.
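Runtime behavior is identical either way; the explicit annotation only makes the "may be None" contract visible to type checkers. A sketch of how such defaults are typically consumed (the `build_env` helper is hypothetical, not code from this PR):

```python
from typing import Dict, Optional

def build_env(
    hf_token: Optional[str] = None,
    custom_env_vars: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    # Copy rather than mutate the caller's dict; fall back to empty.
    env = dict(custom_env_vars or {})
    if hf_token is not None:
        env["HF_TOKEN"] = hf_token
    return env
```

With explicit `Optional[...]` annotations, passing `None` (or nothing) type-checks cleanly while the function still normalizes its inputs internally.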

```python
        "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
        "TOKENIZERS_PARALLELISM": "False",
        "TRANSFORMERS_OFFLINE": "1",
        "HF_HOME": "/nemo-workspace/pagaray/hf_cache",
```

⚠️ Potential issue | 🟠 Major

Hardcoded developer-specific path.

`HF_HOME` is set to `/nemo-workspace/pagaray/hf_cache`, which appears to be a developer-specific path. This should be parameterized or given a more generic default such as `/nemo-workspace/.cache/huggingface`.

Proposed fix

```diff
-        "HF_HOME": "/nemo-workspace/pagaray/hf_cache",
+        "HF_HOME": "/nemo-workspace/.cache/huggingface",
```

Or make it a parameter with a sensible default.

🤖 Prompt for AI Agents
In `@examples/evaluation/utils/executors.py` at line 100, Replace the hardcoded
developer path for "HF_HOME": "/nemo-workspace/pagaray/hf_cache" with a
parameterized or environment-driven value; update the code that builds the
environment dict (where "HF_HOME" is set) to read from an optional
function/class parameter or os.environ.get("HF_HOME",
"/nemo-workspace/.cache/huggingface") so callers can override it while using
"/nemo-workspace/.cache/huggingface" as the sensible default.
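The environment-driven pattern described above can be sketched as a small helper. The function name `resolve_hf_home` and the injectable `env` parameter are assumptions for testability, not part of this PR:

```python
import os
from typing import Mapping, Optional

def resolve_hf_home(
    override: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    # Precedence: explicit argument, then the HF_HOME environment
    # variable, then the reviewer's proposed generic default.
    env = os.environ if env is None else env
    if override:
        return override
    return env.get("HF_HOME", "/nemo-workspace/.cache/huggingface")
```

Callers can then override the cache location per-cluster without editing the executor code.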

Comment on lines +132 to +135
```python
spec_kwargs={
    "schedulerName": "runai-scheduler",
    "image_pull_secrets": ["dockerregistry-dockerregistry-pagaray-ngc"],
},  # e.g. Run:ai
```

⚠️ Potential issue | 🟠 Major

Hardcoded developer-specific image pull secret.

`image_pull_secrets` contains `"dockerregistry-dockerregistry-pagaray-ngc"`, which embeds a developer name. This should be parameterized.

Proposed fix

Add a parameter to the function:

```diff
 def kuberay_executor(
     nodes: int,
     num_gpus_per_node: int,
     dgxc_pvc_mount_path: str,
     dgxc_pvc_claim_name: str,
     namespace: str = "default",
     ray_version: str = "2.43.0",
     container_image: str = "",
     head_cpu: str = "8",
     head_memory: str = "32Gi",
     hf_token: str | None = None,
     custom_env_vars: Dict[str, str] | None = None,
+    image_pull_secrets: List[str] | None = None,
 ):
```

Then use it in spec_kwargs:

```diff
         spec_kwargs={
             "schedulerName": "runai-scheduler",
-            "image_pull_secrets": ["dockerregistry-dockerregistry-pagaray-ngc"],
+            "image_pull_secrets": image_pull_secrets or [],
         },
```
🤖 Prompt for AI Agents
In `@examples/evaluation/utils/executors.py` around lines 132 - 135, The
image_pull_secrets value is hardcoded; change the function that builds
spec_kwargs (referenced by spec_kwargs and image_pull_secrets in
examples/evaluation/utils/executors.py) to accept a new parameter (e.g.,
image_pull_secret or image_pull_secrets) and use that parameter inside
spec_kwargs instead of the literal "dockerregistry-dockerregistry-pagaray-ngc";
make the parameter optional with a sensible default (None or empty list),
convert a single string input into the expected list format if necessary, and
ensure callers of the function are updated to pass the appropriate secret name
where needed.
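The prompt's note about converting a single string into the expected list format amounts to a small normalizer. A sketch, with the helper name `normalize_pull_secrets` invented for illustration:

```python
from typing import List, Optional, Union

def normalize_pull_secrets(
    secrets: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    # Accept a single secret name or a list of names; default to none.
    if secrets is None:
        return []
    if isinstance(secrets, str):
        return [secrets]
    return list(secrets)
```

This keeps the executor's public API forgiving while `spec_kwargs` always receives the list form Kubernetes expects.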

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>