chore: Add evaluation pipeline #1876

Merged
ko3n1g merged 44 commits into main from ko3n1g/chore/evaluation-pipeline
Jan 24, 2026

Conversation

@ko3n1g
Contributor

ko3n1g commented Jan 7, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

Release Notes

  • New Features
    • Added evaluation pipeline orchestration with support for SLURM and Kubernetes/DGXCloud cluster deployments
    • Added comprehensive CLI argument parsing with environment variable configuration support
    • Added automated deployment and evaluation execution scripts with server readiness checks
    • Added Weights & Biases integration for experiment tracking and results logging
    • Added Ray-based job execution and monitoring capabilities

Signed-off-by: oliver könig <okoenig@nvidia.com>
chtruong814 previously approved these changes Jan 9, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
ko3n1g added 19 commits January 19, 2026 18:46
Signed-off-by: oliver könig <okoenig@nvidia.com>
chtruong814 previously approved these changes Jan 24, 2026
Contributor

chtruong814 left a comment

Minor comment about copyright years. Please start using 2026 for new files.

@@ -0,0 +1,69 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
Contributor
Please start using 2026 for the copyright year on new files.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@coderabbitai
Contributor

coderabbitai bot commented Jan 24, 2026

📝 Walkthrough

This PR introduces an evaluation pipeline infrastructure for Megatron-Bridge. It adds CLI utilities, shell scripts, and Python modules to handle deployment configuration, evaluation job orchestration across SLURM and Kubernetes environments, and executor setup with appropriate cluster settings.

Changes

  • CLI Argument Utilities (examples/evaluation/argument_builder.py, examples/evaluation/argument_parser.py): argument_builder.py adds three functions for constructing CLI argument strings from environment variables: list_of_strings(), normalize_arg_name(), and build_cli_args_from_env_vars(). argument_parser.py defines a comprehensive argparse-based CLI via parse_cli_args(), with helper utilities, deployment settings (checkpoint, host, port, parallelism), evaluation parameters, SLURM/DGXCloud config, logging, and tokenizer arguments.
  • Evaluation Pipeline Orchestration (examples/evaluation/launch_evaluation_pipeline.py): Main orchestration script that selects the executor type (SLURM or KubeRay) based on input, configures signal handling for job termination, manages the RayJob lifecycle (create/start/wait/monitor), streams logs, and handles optional WANDB integration for result logging. Includes a CustomJobDetailsRay class with a specialized log path property.
  • Deployment & Evaluation Scripts (examples/evaluation/deploy.sh, examples/evaluation/eval.sh): deploy.sh unsets MPI environment variables and launches the Megatron Ray deployment with fixed parallel sizes. eval.sh unsets MPI variables, installs dependencies, checks server readiness, builds evaluation configuration objects, executes the evaluation workflow against a remote endpoint, and performs cleanup.
  • Cluster Executor Factories (examples/evaluation/utils/executors.py): Provides two factory functions: slurm_executor() configures a Slurm-based executor with cluster, container, and HF token settings; kuberay_executor() constructs a KubeRay cluster definition with worker groups, resource requests, extensive environment variables, and persistent volume configuration.
  • Script Maintenance (scripts/performance/argument_parser.py): Removes an unused comment-block label in the Testing arguments section (−2 lines).

Sequence Diagram(s)

sequenceDiagram
    participant User as User/CLI
    participant Pipeline as launch_evaluation_pipeline.py
    participant Executor as Executor (SLURM/KubeRay)
    participant RayJob as RayJob Orchestrator
    participant Deploy as deploy.sh
    participant Eval as eval.sh
    participant Service as Ray/Megatron Service
    participant Evaluator as nemo_evaluator

    User->>Pipeline: Execute with config args
    Pipeline->>Pipeline: parse_cli_args()
    alt dgxc_cluster set
        Pipeline->>Executor: kuberay_executor(config)
    else
        Pipeline->>Executor: slurm_executor(config)
    end
    Executor-->>Pipeline: Configured Executor
    Pipeline->>RayJob: Create RayJob with composite bash command
    Pipeline->>RayJob: Start job (deploy.sh + eval.sh)
    RayJob->>Deploy: Execute deploy.sh
    Deploy->>Service: Launch Ray + Megatron model
    Service-->>Deploy: Service ready
    RayJob->>Eval: Execute eval.sh
    Eval->>Eval: check_endpoint() readiness
    Eval->>Evaluator: Build ApiEndpoint, EvaluationTarget, ConfigParams
    Evaluator->>Service: Send evaluation requests
    Service-->>Evaluator: Model responses
    Evaluator-->>Eval: Evaluation results
    Eval->>Eval: Write results.yml
    Pipeline->>RayJob: Monitor job status (RUNNING)
    RayJob-->>Pipeline: Job complete, results available
    Pipeline->>Pipeline: Read results.yml
    alt WANDB configured
        Pipeline->>Pipeline: Log results to WANDB
    end
    Pipeline->>RayJob: Stop job + cleanup
    Pipeline-->>User: Return results
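The `check_endpoint()` readiness step in the diagram amounts to polling the deployed endpoint until it responds. A minimal sketch of that idea in Python; the URL shape, timeout values, and function signature here are assumptions for illustration, not the actual eval.sh logic:

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url: str, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    # Poll the health URL until it returns HTTP 200 or the deadline passes.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not up yet; retry after a short sleep.
        time.sleep(interval_s)
    return False
```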

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Test Results For Major Changes (⚠️ Warning): The PR introduces a major new evaluation pipeline with 588+ lines of new code across 6 files without corresponding test files, violating the explicit repository testing requirements in CONTRIBUTING.md and the PR template. Resolution: add unit tests for the argument builders and parsers, plus functional tests for the evaluation pipeline with the SLURM and KubeRay executors, to the tests/ directory, and document the test results in the PR description.
  • Title check (❓ Inconclusive): The title 'Add evaluation pipeline' is broad and does not convey the specific scope of the changes, which include multiple utility functions, shell scripts, and executor configurations. Resolution: consider a more specific title (e.g., 'Add CLI-based evaluation pipeline utilities and executors') to better communicate the scope to reviewers.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Docstring Coverage (✅ Passed): Docstring coverage is 91.67%, which meets the required threshold of 80.00%.


✨ Finishing touches
  • 📝 Generate docstrings


Contributor

coderabbitai bot left a comment

Actionable comments posted: 12

🤖 Fix all issues with AI agents
In `@examples/evaluation/argument_parser.py`:
- Around line 168-173: The help text for the argument added via
logging_args.add_argument("--wandb_entity_name", ...) is incorrect (it currently
says "wandb project name"); update the help string to accurately describe the
entity (e.g., "wandb entity name" or "WandB entity/username for the project") so
the flag --wandb_entity_name clearly documents its purpose in the argument
parser.

In `@examples/evaluation/deploy.sh`:
- Around line 6-19: Wrap the positional variables in quotes to prevent word
splitting and glob expansion: update the python invocation in deploy.sh to use
"$MEGATRON_CHECKPOINT", "$NUM_REPLICAS", and "$NUM_GPUS" wherever they are
interpolated (the --megatron_checkpoint, --num_gpus, and --num_replicas flags)
so paths or values with spaces or glob characters are passed safely to
deploy_ray_inframework.py.
- Around line 1-4: The shell script is missing a shebang so its interpreter is
ambiguous; add a shebang as the very first line (for example "#!/usr/bin/env
bash") to guarantee POSIX/Bash behavior for the for-loop unsets (the lines using
env | grep ^SLURM_/PMI_/PMIX_ and unset -v), and optionally make the script
executable (chmod +x) after updating the file.

In `@examples/evaluation/eval.sh`:
- Around line 1-4: Add a POSIX bash shebang as the first line of the script
(e.g. #!/usr/bin/env bash) so the interpreter is explicit and portable; insert
it above the existing for loops that unset SLURM_/PMI_/PMIX_ (the lines
beginning with for i in $(env | grep ^SLURM_, for i in $(env | grep ^PMI_, and
for i in $(env | grep ^PMIX_)) and ensure the file remains executable (chmod +x)
after the change.
- Around line 30-35: The heredoc directly interpolates $PARALLELISM and
$OUTPUT_DIR into generated Python config, creating injection risk and path
issues; instead, stop injecting raw vars into the Python block—pass PARALLELISM
and OUTPUT_DIR as environment variables or CLI args and have the Python runtime
read them (e.g., os.environ.get("PARALLELISM") for numeric validation and
os.environ.get("OUTPUT_DIR") for path), validate/cast PARALLELISM to an int,
strip any leading slashes from OUTPUT_DIR or use os.path.join to construct
output_dir (avoid prepending a literal "/" in the heredoc), and ensure any
user-provided values are sanitized before use in functions that consume them.

In `@examples/evaluation/launch_evaluation_pipeline.py`:
- Around line 146-155: The code assumes run_id exists after checking runs; if
runs is empty this causes a NameError when calling wandb.init. Fix by explicitly
setting run_id = runs[0].id if runs else None (or log and exit) and pass the id
to wandb.init only when run_id is not None (e.g., build kwargs = {"project":
args.wandb_project_name, "entity": args.wandb_entity_name, "resume": "allow"};
if run_id: kwargs["id"] = run_id; wandb.init(**kwargs)). Update the block around
runs, run_id, and wandb.init accordingly and add a log message when no matching
run is found.
- Around line 1-14: Move the shebang '#!/usr/bin/env python3' to be the very
first line of the file (no preceding blank lines or comments) so it precedes the
copyright header; ensure the rest of the header and file contents remain
unchanged and that the shebang is exactly '#!/usr/bin/env python3'.
- Around line 133-134: The code opens results.yml directly which can raise
FileNotFoundError if the evaluation didn't produce it; update the block that
reads os.path.join(args.output_dir, "results", "results.yml") to first check
existence (os.path.exists) or wrap the open/yaml.safe_load call in a try/except
FileNotFoundError, then handle the missing file by logging a clear error via the
existing logger, exiting gracefully, or setting a sensible default for results;
ensure you reference args.output_dir, the "results.yml" path, and the variable
results/yaml.safe_load when implementing the check/exception handling.

In `@examples/evaluation/utils/executors.py`:
- Around line 78-79: The parameters hf_token and custom_env_vars in the function
signature are typed as str and Dict[...] but default to None; change their types
to Optional[str] and Optional[Dict[str, str]] and import typing.Optional (or use
from typing import Optional) so the annotations match the None default; update
the signature(s) that include hf_token and custom_env_vars (same pattern as the
earlier slurm_executor fix) and adjust any callers or type checks if needed to
handle Optional values.
- Line 100: Replace the hardcoded developer path for "HF_HOME":
"/nemo-workspace/pagaray/hf_cache" with a parameterized or environment-driven
value; update the code that builds the environment dict (where "HF_HOME" is set)
to read from an optional function/class parameter or os.environ.get("HF_HOME",
"/nemo-workspace/.cache/huggingface") so callers can override it while using
"/nemo-workspace/.cache/huggingface" as the sensible default.
- Around line 23-33: The slurm_executor function uses mutable default arguments
(custom_mounts: List[str] = [] and custom_env_vars: Dict[str, str] = {}) and an
implicit Optional for hf_token; change the defaults to None (e.g.,
custom_mounts: Optional[List[str]] = None, custom_env_vars:
Optional[Dict[str,str]] = None, hf_token: Optional[str] = None) and then inside
slurm_executor immediately initialize them with: custom_mounts = custom_mounts
or [] and custom_env_vars = custom_env_vars or {} so callers behave the same
while avoiding mutable default pitfalls and making the Optional intent explicit.
- Around line 132-135: The image_pull_secrets value is hardcoded; change the
function that builds spec_kwargs (referenced by spec_kwargs and
image_pull_secrets in examples/evaluation/utils/executors.py) to accept a new
parameter (e.g., image_pull_secret or image_pull_secrets) and use that parameter
inside spec_kwargs instead of the literal
"dockerregistry-dockerregistry-pagaray-ngc"; make the parameter optional with a
sensible default (None or empty list), convert a single string input into the
expected list format if necessary, and ensure callers of the function are
updated to pass the appropriate secret name where needed.
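The two hardcoded-value comments above reduce to the same pattern: accept an optional parameter, fall back to the environment or a neutral default, and normalize the shape. A hedged sketch; the helper names are invented for illustration, and the default cache path follows the comment rather than the actual code:

```python
import os
from typing import List, Optional, Union


def resolve_hf_home(hf_home: Optional[str] = None) -> str:
    # Prefer an explicit argument, then the caller's HF_HOME, then a
    # shared cache location instead of a developer-specific path.
    return hf_home or os.environ.get("HF_HOME", "/nemo-workspace/.cache/huggingface")


def normalize_image_pull_secrets(secrets: Optional[Union[str, List[str]]] = None) -> List[str]:
    # Accept a single secret name or a list; return the list form the
    # cluster spec expects, defaulting to no secrets.
    if secrets is None:
        return []
    if isinstance(secrets, str):
        return [secrets]
    return list(secrets)
```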
🧹 Nitpick comments (3)
examples/evaluation/argument_builder.py (1)

20-22: Duplicate function definition.

list_of_strings is defined identically in examples/evaluation/argument_parser.py (lines 17-19) and scripts/performance/argument_parser.py (lines 29-31). Consider extracting to a shared utility module to avoid DRY violations.

examples/evaluation/argument_parser.py (1)

17-24: Duplicate utility functions.

Both list_of_strings and to_dict are identical to implementations in scripts/performance/argument_parser.py (lines 29-31 and 46-48). Consider consolidating into a shared utility module.

examples/evaluation/launch_evaluation_pipeline.py (1)

113-116: Long command string reduces readability.

The command string spans multiple shell operations with pipes and background processes. Consider breaking it into a multi-line string or a separate script for maintainability.

Proposed refactor
    deploy_cmd = f"bash /opt/Megatron-Bridge/examples/evaluation/deploy.sh {args.megatron_checkpoint} {args.num_replicas} {args.num_gpus}"
    eval_cmd = f"bash /opt/Megatron-Bridge/examples/evaluation/eval.sh {args.output_dir} {args.parallelism}"
    
    command = (
        f"{deploy_cmd} | tee -a deploy.log & "
        f"sleep 120; "
        f"{eval_cmd} | tee -a eval.log"
    )
    
    job.start(command=command, workdir=None)

Comment on lines +168 to +173
logging_args.add_argument(
"--wandb_entity_name",
type=str,
help="wandb project name",
required=False,
)

⚠️ Potential issue | 🟡 Minor

Copy-paste error in help text.

The help text for --wandb_entity_name says "wandb project name" but should describe the entity name.

Proposed fix
     logging_args.add_argument(
         "--wandb_entity_name",
         type=str,
-        help="wandb project name",
+        help="wandb entity name",
         required=False,
     )

Comment on lines +1 to +4
# Unset SLURM/PMI/PMIX env vars to prevent MPI initialization issues
for i in $(env | grep ^SLURM_ | cut -d"=" -f 1); do unset -v $i; done
for i in $(env | grep ^PMI_ | cut -d"=" -f 1); do unset -v $i; done
for i in $(env | grep ^PMIX_ | cut -d"=" -f 1); do unset -v $i; done

🛠️ Refactor suggestion | 🟠 Major

Add a shebang line to specify the shell interpreter.

The script lacks a shebang, which can cause portability issues and unexpected behavior. Static analysis (SC2148) correctly flags this.

Proposed fix
+#!/bin/bash
+
 # Unset SLURM/PMI/PMIX env vars to prevent MPI initialization issues
 for i in $(env | grep ^SLURM_ | cut -d"=" -f 1); do unset -v $i; done
 for i in $(env | grep ^PMI_ | cut -d"=" -f 1); do unset -v $i; done
 for i in $(env | grep ^PMIX_ | cut -d"=" -f 1); do unset -v $i; done
🧰 Tools
🪛 Shellcheck (0.11.0)

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)


Comment on lines +6 to +19
MEGATRON_CHECKPOINT=$1
NUM_REPLICAS=$2
NUM_GPUS=$3
python \
/opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
--megatron_checkpoint $MEGATRON_CHECKPOINT \
--model_id megatron_model \
--host 0.0.0.0 \
--port 8000 \
--num_gpus $NUM_GPUS \
--num_replicas $NUM_REPLICAS \
--tensor_model_parallel_size 1 \
--pipeline_model_parallel_size 1 \
--context_parallel_size 1

⚠️ Potential issue | 🟡 Minor

Quote variables to prevent word splitting and glob expansion.

The variables $MEGATRON_CHECKPOINT, $NUM_REPLICAS, and $NUM_GPUS should be quoted to handle paths with spaces or special characters safely.

Proposed fix
 MEGATRON_CHECKPOINT=$1
 NUM_REPLICAS=$2
 NUM_GPUS=$3
 python \
   /opt/Export-Deploy/scripts/deploy/nlp/deploy_ray_inframework.py \
-  --megatron_checkpoint $MEGATRON_CHECKPOINT \
+  --megatron_checkpoint "$MEGATRON_CHECKPOINT" \
   --model_id megatron_model \
   --host 0.0.0.0 \
   --port 8000 \
-  --num_gpus $NUM_GPUS \
-  --num_replicas $NUM_REPLICAS \
+  --num_gpus "$NUM_GPUS" \
+  --num_replicas "$NUM_REPLICAS" \
   --tensor_model_parallel_size 1 \
   --pipeline_model_parallel_size 1 \
-  --context_parallel_size 1 
+  --context_parallel_size 1

Comment on lines +1 to +4
# Unset SLURM/PMI/PMIX env vars to prevent MPI initialization issues
for i in $(env | grep ^SLURM_ | cut -d"=" -f 1); do unset -v $i; done
for i in $(env | grep ^PMI_ | cut -d"=" -f 1); do unset -v $i; done
for i in $(env | grep ^PMIX_ | cut -d"=" -f 1); do unset -v $i; done

🛠️ Refactor suggestion | 🟠 Major

Add a shebang line to specify the shell interpreter.

Same as deploy.sh, this script needs a shebang for portability and to ensure the correct shell is used.

Proposed fix
+#!/bin/bash
+
 # Unset SLURM/PMI/PMIX env vars to prevent MPI initialization issues
 for i in $(env | grep ^SLURM_ | cut -d"=" -f 1); do unset -v $i; done
 for i in $(env | grep ^PMI_ | cut -d"=" -f 1); do unset -v $i; done
 for i in $(env | grep ^PMIX_ | cut -d"=" -f 1); do unset -v $i; done
🧰 Tools
🪛 Shellcheck (0.11.0)

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)


Comment on lines +30 to +35
parallelism = $PARALLELISM
request_timeout = 1000
temperature = None
top_p = None
top_k = None
output_dir = "/$OUTPUT_DIR/results/"

⚠️ Potential issue | 🟡 Minor

Shell variable interpolation in heredoc could cause issues.

  1. $PARALLELISM and $OUTPUT_DIR are interpolated directly into Python code. If these come from untrusted sources, this could be a code injection vector.
  2. Line 35: The path "/$OUTPUT_DIR/results/" prepends / which will cause double slashes if OUTPUT_DIR is already an absolute path (e.g., /workspace becomes //workspace/results/).
Proposed fix for path construction
-output_dir = "/$OUTPUT_DIR/results/"
+output_dir = "${OUTPUT_DIR}/results/"

Or in Python, use os.path.join after passing the path as an environment variable instead of heredoc interpolation.


Comment on lines +146 to +155
if runs:
run_id = runs[0].id
print(f"Found run with ID: {run_id}")

wandb_run = wandb.init(
project=args.wandb_project_name,
entity=args.wandb_entity_name,
id=run_id,
resume="allow",
)

⚠️ Potential issue | 🔴 Critical

Potential NameError if no matching wandb run is found.

If runs is empty (no matching run found), run_id will be undefined when used on line 153, causing a NameError. The code should handle this case.

Proposed fix
         if runs:
             run_id = runs[0].id
             print(f"Found run with ID: {run_id}")
+        else:
+            run_id = None
+            print("No existing run found, creating new run")

         wandb_run = wandb.init(
             project=args.wandb_project_name,
             entity=args.wandb_entity_name,
-            id=run_id,
+            id=run_id,  # Will create new run if None
             resume="allow",
         )

Comment on lines +23 to +33
def slurm_executor(
account: str,
partition: str,
nodes: int,
num_gpus_per_node: int,
time_limit: str = "00:30:00",
container_image: str = "nvcr.io/nvidia/nemo:dev",
custom_mounts: List[str] = [],
custom_env_vars: Dict[str, str] = {},
hf_token: str = None,
) -> run.SlurmExecutor:

⚠️ Potential issue | 🟠 Major

Mutable default arguments and implicit Optional types.

Lines 30-31 use mutable defaults ([] and {}), which is a classic Python pitfall. Line 32 uses implicit Optional. Static analysis (B006, RUF013) correctly flags these.

Proposed fix
 def slurm_executor(
     account: str,
     partition: str,
     nodes: int,
     num_gpus_per_node: int,
     time_limit: str = "00:30:00",
     container_image: str = "nvcr.io/nvidia/nemo:dev",
-    custom_mounts: List[str] = [],
-    custom_env_vars: Dict[str, str] = {},
-    hf_token: str = None,
+    custom_mounts: List[str] | None = None,
+    custom_env_vars: Dict[str, str] | None = None,
+    hf_token: str | None = None,
 ) -> run.SlurmExecutor:

Then initialize inside the function:

    custom_mounts = custom_mounts or []
    custom_env_vars = custom_env_vars or {}
🧰 Tools
🪛 Ruff (0.14.13)

30-30: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
31-31: Do not use mutable data structures for argument defaults. Replace with None; initialize within function. (B006)
32-32: PEP 484 prohibits implicit Optional. Convert to T | None. (RUF013)

🤖 Prompt for AI Agents
In `@examples/evaluation/utils/executors.py` around lines 23 - 33, The
slurm_executor function uses mutable default arguments (custom_mounts: List[str]
= [] and custom_env_vars: Dict[str, str] = {}) and an implicit Optional for
hf_token; change the defaults to None (e.g., custom_mounts: Optional[List[str]]
= None, custom_env_vars: Optional[Dict[str,str]] = None, hf_token: Optional[str]
= None) and then inside slurm_executor immediately initialize them with:
custom_mounts = custom_mounts or [] and custom_env_vars = custom_env_vars or {}
so callers behave the same while avoiding mutable default pitfalls and making
the Optional intent explicit.
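The pitfall is easy to reproduce outside this repo. A minimal standalone sketch (the `add_mount_*` names are invented for illustration, not code from this PR):

```python
from typing import List, Optional

def add_mount_buggy(mount: str, mounts: List[str] = []) -> List[str]:
    # The default list is created once at function definition time,
    # so every call that omits `mounts` appends to the same object.
    mounts.append(mount)
    return mounts

def add_mount_fixed(mount: str, mounts: Optional[List[str]] = None) -> List[str]:
    # Fresh list per call, following the `x = x or []` pattern above.
    mounts = mounts or []
    mounts.append(mount)
    return mounts

first = add_mount_buggy("/data")
second = add_mount_buggy("/home")  # shares state with the first call

a = add_mount_fixed("/data")
b = add_mount_fixed("/home")       # independent lists
```

Since `first` and `second` are the same shared list, both contain `["/data", "/home"]` after the second call, which is rarely what the caller intended.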

Comment on lines +78 to +79
```python
    hf_token: str = None,
    custom_env_vars: Dict[str, str] = None,
```

⚠️ Potential issue | 🟠 Major

Same implicit Optional issue.

Lines 78-79 have the same implicit Optional problem as the `slurm_executor` function.

Proposed fix

```diff
-    hf_token: str = None,
-    custom_env_vars: Dict[str, str] = None,
+    hf_token: str | None = None,
+    custom_env_vars: Dict[str, str] | None = None,
```
🧰 Tools
🪛 Ruff (0.14.13)

78-78: PEP 484 prohibits implicit Optional. Convert to T | None. (RUF013)
79-79: PEP 484 prohibits implicit Optional. Convert to T | None. (RUF013)

🤖 Prompt for AI Agents
In `@examples/evaluation/utils/executors.py` around lines 78 - 79, The parameters
hf_token and custom_env_vars in the function signature are typed as str and
Dict[...] but default to None; change their types to Optional[str] and
Optional[Dict[str, str]] and import typing.Optional (or use from typing import
Optional) so the annotations match the None default; update the signature(s)
that include hf_token and custom_env_vars (same pattern as the earlier
slurm_executor fix) and adjust any callers or type checks if needed to handle
Optional values.
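Runtime behavior is identical either way; the explicit annotation only makes the "may be None" contract visible to type checkers. A sketch of how such defaults are typically consumed (the `build_env` helper is hypothetical, not code from this PR):

```python
from typing import Dict, Optional

def build_env(
    hf_token: Optional[str] = None,
    custom_env_vars: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    # Copy rather than mutate the caller's dict; fall back to empty.
    env = dict(custom_env_vars or {})
    if hf_token is not None:
        env["HF_TOKEN"] = hf_token
    return env
```

With explicit `Optional[...]` annotations, passing `None` (or nothing) type-checks cleanly while the function still normalizes its inputs internally.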

```python
        "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
        "TOKENIZERS_PARALLELISM": "False",
        "TRANSFORMERS_OFFLINE": "1",
        "HF_HOME": "/nemo-workspace/pagaray/hf_cache",
```

⚠️ Potential issue | 🟠 Major

Hardcoded developer-specific path.

`HF_HOME` is set to `/nemo-workspace/pagaray/hf_cache`, which appears to be a developer-specific path. This should be parameterized or given a more generic default such as `/nemo-workspace/.cache/huggingface`.

Proposed fix

```diff
-        "HF_HOME": "/nemo-workspace/pagaray/hf_cache",
+        "HF_HOME": "/nemo-workspace/.cache/huggingface",
```

Or make it a parameter with a sensible default.

🤖 Prompt for AI Agents
In `@examples/evaluation/utils/executors.py` at line 100, Replace the hardcoded
developer path for "HF_HOME": "/nemo-workspace/pagaray/hf_cache" with a
parameterized or environment-driven value; update the code that builds the
environment dict (where "HF_HOME" is set) to read from an optional
function/class parameter or os.environ.get("HF_HOME",
"/nemo-workspace/.cache/huggingface") so callers can override it while using
"/nemo-workspace/.cache/huggingface" as the sensible default.
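The environment-driven pattern described above can be sketched as a small helper. The function name `resolve_hf_home` and the injectable `env` parameter are assumptions for testability, not part of this PR:

```python
import os
from typing import Mapping, Optional

def resolve_hf_home(
    override: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    # Precedence: explicit argument, then the HF_HOME environment
    # variable, then the reviewer's proposed generic default.
    env = os.environ if env is None else env
    if override:
        return override
    return env.get("HF_HOME", "/nemo-workspace/.cache/huggingface")
```

Callers can then override the cache location per-cluster without editing the executor code.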

Comment on lines +132 to +135
```python
spec_kwargs={
    "schedulerName": "runai-scheduler",
    "image_pull_secrets": ["dockerregistry-dockerregistry-pagaray-ngc"],
},  # e.g. Run:ai
```

⚠️ Potential issue | 🟠 Major

Hardcoded developer-specific image pull secret.

`image_pull_secrets` contains `"dockerregistry-dockerregistry-pagaray-ngc"`, which embeds a developer name. This should be parameterized.

Proposed fix

Add a parameter to the function:

```diff
 def kuberay_executor(
     nodes: int,
     num_gpus_per_node: int,
     dgxc_pvc_mount_path: str,
     dgxc_pvc_claim_name: str,
     namespace: str = "default",
     ray_version: str = "2.43.0",
     container_image: str = "",
     head_cpu: str = "8",
     head_memory: str = "32Gi",
     hf_token: str | None = None,
     custom_env_vars: Dict[str, str] | None = None,
+    image_pull_secrets: List[str] | None = None,
 ):
```

Then use it in spec_kwargs:

```diff
         spec_kwargs={
             "schedulerName": "runai-scheduler",
-            "image_pull_secrets": ["dockerregistry-dockerregistry-pagaray-ngc"],
+            "image_pull_secrets": image_pull_secrets or [],
         },
```
🤖 Prompt for AI Agents
In `@examples/evaluation/utils/executors.py` around lines 132 - 135, The
image_pull_secrets value is hardcoded; change the function that builds
spec_kwargs (referenced by spec_kwargs and image_pull_secrets in
examples/evaluation/utils/executors.py) to accept a new parameter (e.g.,
image_pull_secret or image_pull_secrets) and use that parameter inside
spec_kwargs instead of the literal "dockerregistry-dockerregistry-pagaray-ngc";
make the parameter optional with a sensible default (None or empty list),
convert a single string input into the expected list format if necessary, and
ensure callers of the function are updated to pass the appropriate secret name
where needed.
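The prompt's note about converting a single string into the expected list format amounts to a small normalizer. A sketch, with the helper name `normalize_pull_secrets` invented for illustration:

```python
from typing import List, Optional, Union

def normalize_pull_secrets(
    secrets: Optional[Union[str, List[str]]] = None,
) -> List[str]:
    # Accept a single secret name or a list of names; default to none.
    if secrets is None:
        return []
    if isinstance(secrets, str):
        return [secrets]
    return list(secrets)
```

This keeps the executor's public API forgiving while `spec_kwargs` always receives the list form Kubernetes expects.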

Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>