New slurm customization parameters (account, containers) by Kipok · Pull Request #1209 · NVIDIA-NeMo/Skills

Kipok · 2026-02-03T22:04:19Z

Summary by CodeRabbit

New Features
- Added a global --account option to specify a Slurm account for job submissions.
- Added container override options (--main-container, --sandbox-container, --judge-container, --judge-server-container, --container) to select non-default images; these overrides propagate across all job/task creation flows.

Tests
- Updated generation tests to accept the new account parameter.

Signed-off-by: Igor Gitman <igitman@nvidia.com>

greptile-apps

No files reviewed, no comments.


coderabbitai · 2026-02-03T22:09:56Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.


Walkthrough

Adds optional Slurm account and per-component container-image override CLI options across multiple pipeline commands and threads them through task creation, HardwareConfig, executor creation, and sbatch submission so tasks run with specified account/container choices while falling back to existing defaults.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Convert CLI: `nemo_skills/pipeline/convert.py` | Added `account` and `container` CLI options; resolve container as `container or container_map[(convert_from, convert_to)]` and pass `account` into `add_task`. |
| Eval CLI & Judge Tasks: `nemo_skills/pipeline/eval.py` | Added `account`, `main_container`, `sandbox_container`, `judge_container`, `judge_server_container` options; threaded into `_create_llm_judge_tasks` and downstream task creation so main/judge/server receive container and account overrides with fallbacks. |
| Generate pipeline: `nemo_skills/pipeline/generate.py` | Extended `_create_job_unified` and `generate` CLI with `account`, `main_container`, `sandbox_container`; client and sandbox commands prefer overrides; `HardwareConfig` now populated with `account`. |
| Run command & Start server: `nemo_skills/pipeline/run_cmd.py`, `nemo_skills/pipeline/start_server.py` | Added `account` and sandbox/main container CLI options; `launch_server`/`run_cmd` resolve container via override or defaults and pass `account` and `sandbox_container` into `add_task`/server launch. |
| Evaluator context & hardware: `nemo_skills/pipeline/nemo_evaluator.py` | Added `account` to evaluator CLI and `_TaskCreationContext`; `_hardware_for_group` accepts `account` and includes it in `HardwareConfig` used for sbatch kwargs. |
| Declarative & Exec plumbing: `nemo_skills/pipeline/utils/declarative.py`, `nemo_skills/pipeline/utils/exp.py` | Added `account` field to `HardwareConfig`; extended `get_executor` and `add_task` signatures to accept `account` and `sandbox_container`, resolving account with fallback to cluster config and passing it into executor/sbatch kwargs. |
| Tests & Inference tweak: `tests/test_generation.py`, `nemo_skills/inference/model/tool_call.py` | Updated test to pass new `account=None` to `_create_job_unified`. Minor change in `generate_async` to decrement `tokens_to_generate` by produced tokens when integer. |
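The "override or default" resolution described in the table can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names `resolve_container` and `resolve_account`, the image names, and the `"account"` key in the cluster config are all assumptions made for the example.

```python
from typing import Optional


def resolve_container(override: Optional[str], cluster_config: dict, default_key: str) -> str:
    """Prefer the user-supplied image; otherwise use the cluster default for this key."""
    # `or` treats both None and "" as "no override given".
    return override or cluster_config["containers"][default_key]


def resolve_account(override: Optional[str], cluster_config: dict) -> Optional[str]:
    """Prefer the CLI account; otherwise fall back to the cluster config (may be None)."""
    # The "account" key name is an assumption made for this sketch.
    return override or cluster_config.get("account")


# Hypothetical cluster config, for illustration only.
cluster_config = {
    "containers": {
        "nemo-skills": "default/nemo-skills:latest",
        "sandbox": "default/sandbox:latest",
    },
    "account": "team-default",
}

main_image = resolve_container(None, cluster_config, "nemo-skills")
sandbox_image = resolve_container("my/sandbox:dev", cluster_config, "sandbox")
account = resolve_account("research-acct", cluster_config)
```

With no override the cluster default is used, so existing invocations would behave as before; only callers that pass a value see a different image or account.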

Sequence Diagram(s)

sequenceDiagram
    participant CLI as "User CLI"
    participant Pipeline as "Pipeline (convert/generate/eval/run_cmd/start_server)"
    participant AddTask as "add_task"
    participant Executor as "get_executor"
    participant Slurm as "Slurm/sbatch"

    CLI->>Pipeline: invoke command with account & container overrides
    Pipeline->>AddTask: build task params (container = override or default, account)
    AddTask->>Executor: request executor(container, account, hardware...)
    Executor->>Slurm: submit job (sbatch kwargs include account)
    Slurm-->>Executor: job id
    Executor-->>AddTask: executor handle
    AddTask-->>Pipeline: task registered
    Pipeline-->>CLI: return task info

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Evaluation on OJBench #848: Overlaps add_task and CLI wiring for sandbox/container/account parameters (same call paths modified).
ENH enable sandbox env overrides in generate #1107: Touches generate and sandbox-related CLI/options, overlapping _create_job_unified changes.
Switch to building containers on-the-fly for local runs #969: Modifies container resolution and executor creation logic that intersects with get_executor/add_task adjustments.

Suggested reviewers

activatedgeek
Kipok
i-vainn

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'New slurm customization parameters (account, containers)' accurately describes the main change: adding new CLI options for Slurm account and container overrides across multiple pipeline modules. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.33% which is sufficient. The required threshold is 80.00%. |


coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

nemo_skills/pipeline/nemo_evaluator.py (1)

560-590: ⚠️ Potential issue | 🟡 Minor

Don’t silently ignore exclusive.

The parameter is accepted and threaded through, but never applied. Either honor it via sbatch_kwargs or fail fast when it’s set so users don’t think they’re getting exclusive nodes.
Suggested fail-fast guard
 def _hardware_for_group(
     partition: Optional[str],
     account: Optional[str],
     num_gpus: Optional[int],
     num_nodes: int,
     qos: Optional[str],
     exclusive: bool,
 ) -> HardwareConfig:
+    if exclusive:
+        raise ValueError("exclusive is not supported for nemo_evaluator jobs yet; remove --exclusive.")
     return HardwareConfig(
         partition=partition,
         account=account,
         num_gpus=num_gpus,
         num_nodes=num_nodes,
As per coding guidelines, avoid silently ignoring unused user-passed parameters. The code should fail if a user specifies an unsupported argument or if a required argument is not provided.

nemo_skills/pipeline/eval.py (1)

816-866: ⚠️ Potential issue | 🟠 Major

Account override is missing for summarize/compute-score tasks.
When a user specifies --account, these Slurm tasks still run under the default account and can fail on clusters without a default. Please propagate account=account in both add_task calls.

🔧 Proposed fix

                 summarize_task = pipeline_utils.add_task(
                     exp,
                     cmd=command,
                     task_name=f"{expname}-{benchmark}-summarize-results",
                     log_dir=f"{output_dir}/{benchmark_args.eval_subfolder}/summarized-results",
                     container=cluster_config["containers"]["nemo-skills"],
                     cluster_config=cluster_config,
+                    account=account,
                     run_after=run_after,
                     reuse_code_exp=reuse_code_exp,
                     reuse_code=reuse_code,
                     task_dependencies=(
                         dependent_tasks if cluster_config["executor"] == "slurm" else all_tasks + _task_dependencies
                     ),
                     installation_command=installation_command,
                     skip_hf_home_check=skip_hf_home_check,
                     sbatch_kwargs=sbatch_kwargs,
                 )
@@
                 score_task = pipeline_utils.add_task(
                     exp,
                     cmd=command,
                     task_name=f"{expname}-{group}-compute-score",
                     log_dir=f"{output_dir}/eval-results/{group}/compute-score-logs",
                     container=cluster_config["containers"]["nemo-skills"],
                     cluster_config=cluster_config,
+                    account=account,
                     run_after=run_after,
                     reuse_code_exp=reuse_code_exp,
                     reuse_code=reuse_code,
                     task_dependencies=(
                         group_tasks[group] if cluster_config["executor"] == "slurm" else all_tasks + _task_dependencies
                     ),
                     installation_command=installation_command,
                     skip_hf_home_check=skip_hf_home_check,
                     sbatch_kwargs=sbatch_kwargs,
                 )

As per coding guidelines, avoid silently ignoring unused user-passed parameters. The code should fail if a user specifies an unsupported argument or if a required argument is not provided. Use dataclasses or **kwargs syntax to handle this automatically.
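As a rough illustration of the dataclass approach the guideline mentions, the sketch below shows how a dataclass constructor rejects an unknown keyword instead of ignoring it. `SlurmOptions` and `parse_slurm_options` are hypothetical names, not part of the repository. Note that this only catches unsupported arguments; it would not by itself detect a call site that forgets to forward `account`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlurmOptions:
    # Hypothetical option set, for illustration only.
    account: Optional[str] = None
    partition: Optional[str] = None
    qos: Optional[str] = None


def parse_slurm_options(**kwargs) -> SlurmOptions:
    # The generated __init__ raises TypeError for any keyword that is not a
    # declared field, so a misspelled or unsupported option fails loudly.
    return SlurmOptions(**kwargs)


opts = parse_slurm_options(account="research-acct")

try:
    parse_slurm_options(acount="typo")  # misspelled on purpose
    rejected = False
except TypeError:
    rejected = True
```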

gwarmstrong

In general looks good. Have a minor comment about goals for the future with this, but I don't think it requires action.

gwarmstrong · 2026-02-03T22:10:16Z

nemo_skills/pipeline/eval.py

+    main_container: str = typer.Option(None, help="Override container image for the main evaluation client"),
+    sandbox_container: str = typer.Option(None, help="Override container image for the sandbox"),
+    judge_container: str = typer.Option(None, help="Override container image for GPU-based judges (comet, nvembed)"),


I think it's a little bulky to have separate override arguments for each container everywhere. I'm not sure there is a better solution, though. We could have overrides like we do for tools, e.g.,

++container_overrides.sandbox = "..." ++container_overrides.judge = "..."

But then the choice of key is unclear, since our "job components" (judge, main, sandbox, ...) don't map cleanly to a container name (e.g., "judge" -> containers[judge_server_type], main -> containers["nemo-skills"], sandbox -> containers["sandbox"]).

So I think with the current structure, what you've done is the best choice, but maybe we can eventually work toward something a little more general here.
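To make the key-mapping problem concrete, here is one possible sketch of a component-keyed override lookup. It only illustrates the idea raised above and is not a proposal from the PR: `container_key_for`, `resolve_image`, the image names, and the `"vllm"` judge server type are all assumptions.

```python
from typing import Dict, Optional

COMPONENTS = ("main", "sandbox", "judge")


def container_key_for(component: str, judge_server_type: Optional[str] = None) -> str:
    """Map a job component to the key used in the cluster's `containers` table."""
    if component == "main":
        return "nemo-skills"
    if component == "sandbox":
        return "sandbox"
    if component == "judge":
        # The judge has no fixed key: it depends on which server type runs it.
        if judge_server_type is None:
            raise ValueError("judge_server_type is required to pick the judge container")
        return judge_server_type
    raise ValueError(f"unknown component: {component!r}")


def resolve_image(
    component: str,
    containers: Dict[str, str],
    overrides: Dict[str, str],
    judge_server_type: Optional[str] = None,
) -> str:
    """Return the override for a component if present, else the mapped default."""
    unknown = set(overrides) - set(COMPONENTS)
    if unknown:
        # Fail fast rather than silently ignoring an unsupported override key.
        raise ValueError(f"unsupported override keys: {sorted(unknown)}")
    return overrides.get(component) or containers[container_key_for(component, judge_server_type)]


# Hypothetical values, for illustration only.
containers = {"nemo-skills": "ns:latest", "sandbox": "sandbox:latest", "vllm": "vllm:latest"}
overrides = {"sandbox": "sandbox:dev"}
```

The extra `judge_server_type` argument is exactly the awkwardness described above: the "judge" component cannot be resolved from its name alone.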

Signed-off-by: Igor Gitman <igitman@nvidia.com>

Signed-off-by: George Armstrong <georgea@nvidia.com> # Conflicts: # nemo_skills/pipeline/start_server.py

coderabbitai

⚠️ Outside diff range comments (1)

nemo_skills/pipeline/eval.py (1)

807-858: ⚠️ Potential issue | 🟠 Major

account not forwarded to summarize/score tasks — can cause job rejection on enforced-accounting clusters.

The account parameter accepted by eval is propagated to main eval tasks and judge tasks, but the summarize_results task (line 807) and compute_group_score task (line 841) silently omit it. On Slurm clusters that require account specification for every job submission, these tasks will fail or be billed to the wrong account.

🐛 Proposed fix

             summarize_task = pipeline_utils.add_task(
                 exp,
                 cmd=command,
                 task_name=f"{expname}-{benchmark}-summarize-results",
                 log_dir=f"{output_dir}/{benchmark_args.eval_subfolder}/summarized-results",
                 container=cluster_config["containers"]["nemo-skills"],
                 cluster_config=cluster_config,
+                account=account,
                 run_after=run_after,

             score_task = pipeline_utils.add_task(
                 exp,
                 cmd=command,
                 task_name=f"{expname}-{group}-compute-score",
                 log_dir=f"{output_dir}/eval-results/{group}/compute-score-logs",
                 container=cluster_config["containers"]["nemo-skills"],
                 cluster_config=cluster_config,
+                account=account,
                 run_after=run_after,

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/eval.py` around lines 807 - 858, The summarize_results
and compute_group_score tasks omit the account setting; when creating
summarize_task and score_task via pipeline_utils.add_task (the calls that create
summarize_task and score_task), forward the account parameter (e.g.,
account=account) so the job runs under the correct Slurm account; update both
add_task invocations (the summarize_task and score_task calls) to include
account=account (or propagate account from the surrounding eval function/args)
and ensure any sbatch_kwargs/account merging logic remains consistent.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 5a89b68 and 3b64aa7.

📒 Files selected for processing (3)

nemo_skills/pipeline/eval.py
nemo_skills/pipeline/start_server.py
nemo_skills/pipeline/utils/exp.py

🚧 Files skipped from review as they are similar to previous changes (1)

nemo_skills/pipeline/utils/exp.py

Signed-off-by: George Armstrong <georgea@nvidia.com> # Conflicts: # nemo_skills/pipeline/eval.py

coderabbitai

⚠️ Outside diff range comments (1)

nemo_skills/pipeline/eval.py (1)

623-674: ⚠️ Potential issue | 🟠 Major

account is silently dropped for summarize_results and compute_group_score tasks.

Both add_task calls (lines 623 and 657) are missing account=account, so a user-specified Slurm account is ignored for these tasks while being respected everywhere else. As per coding guidelines, "avoid silently ignoring user-passed parameters."

🐛 Proposed fix

             summarize_task = pipeline_utils.add_task(
                 exp,
                 cmd=command,
                 task_name=f"{expname}-{benchmark}-summarize-results",
                 log_dir=f"{output_dir}/{benchmark_args.eval_subfolder}/summarized-results",
                 container=cluster_config["containers"]["nemo-skills"],
                 cluster_config=cluster_config,
+                account=account,
                 run_after=run_after,
                 ...
             )

             score_task = pipeline_utils.add_task(
                 exp,
                 cmd=command,
                 task_name=f"{expname}-{group}-compute-score",
                 log_dir=f"{output_dir}/eval-results/{group}/compute-score-logs",
                 container=cluster_config["containers"]["nemo-skills"],
                 cluster_config=cluster_config,
+                account=account,
                 run_after=run_after,
                 ...
             )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/eval.py` around lines 623 - 674, The summarize_results
and compute_group_score tasks created via pipeline_utils.add_task (referenced as
summarize_task and score_task) are missing the account parameter so a
user-specified Slurm account is ignored; fix by passing account=account into
both add_task calls that create summarize_task and score_task (the two
pipeline_utils.add_task invocations building the summarize-results and
compute-score tasks) so the Slurm account is honored like other tasks.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 3b64aa7 and bbf2ea7.

📒 Files selected for processing (1)

nemo_skills/pipeline/eval.py

The test calls _create_job_unified() which now requires account as a positional argument after the addition of the --account CLI option. Signed-off-by: George Armstrong <georgea@nvidia.com>

coderabbitai

🧹 Nitpick comments (1)

tests/test_generation.py (1)
176-188: Exercise the non-default account path in this test.

Line 184 currently passes account=None, so this only validates the default path and does not verify that user-provided account values are threaded into the generated job metadata/command. Consider using a sentinel account (e.g., "test-account") and asserting it propagates to the expected output object/args.

As per coding guidelines, "Avoid silently ignoring user-passed parameters; fail if a required parameter is not specified or an unsupported parameter is provided."
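The suggested sentinel-value check could look roughly like the sketch below. It does not call the real `_create_job_unified`, whose full signature is not shown here; `build_sbatch_kwargs` is a hypothetical stand-in for whatever code turns the `account` argument into job options.

```python
from typing import Optional


def build_sbatch_kwargs(account: Optional[str], partition: Optional[str] = None) -> dict:
    """Hypothetical stand-in for the code under test: keeps only the options that were set."""
    kwargs = {}
    if account is not None:
        kwargs["account"] = account
    if partition is not None:
        kwargs["partition"] = partition
    return kwargs


def test_account_is_propagated():
    # A sentinel value that no default could produce, so a dropped argument is detected.
    assert build_sbatch_kwargs(account="test-account")["account"] == "test-account"


def test_default_account_is_omitted():
    # With account=None nothing is emitted, leaving the scheduler default in effect.
    assert "account" not in build_sbatch_kwargs(account=None)


# Run directly so the sketch works without a test runner.
test_account_is_propagated()
test_default_account_is_omitted()
```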
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_generation.py` around lines 176 - 188, The test currently only
exercises the default account path because _create_job_unified is called with
account=None; change the test to pass a sentinel account string (e.g.,
"test-account") to the account parameter when calling _create_job_unified and
add an assertion that this value is propagated into the returned job
metadata/command (inspect the output object(s) in the test and assert the
account field or command arg equals "test-account"); update any related
assertions that assumed None/default to reflect the explicit account so the
non-default path is validated.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between bbf2ea7 and dd0d94e.

📒 Files selected for processing (1)

tests/test_generation.py

…Skills into igitman/account-arg

Signed-off-by: Igor Gitman <igitman@nvidia.com>

Per-request inference timeout (120s) and pytest-level test timeout (300s) for test_eval_gsm8k_api and test_eval_judge_api. Prevents external API hangs from blocking CI for 8+ minutes. Signed-off-by: George Armstrong <georgea@nvidia.com>

The judge step in test_eval_judge_api runs as a separate nemo-run job and doesn't inherit ++inference.timeout from the main generation step. Pass it via --extra_judge_args to prevent judge hangs too. Signed-off-by: George Armstrong <georgea@nvidia.com>

Root cause: litellm max_retries=3 (default) compounds with inference.timeout — a single hanging request can take up to timeout * (max_retries + 1) = 120s * 4 = 480s, exceeding the 300s pytest timeout. Setting max_retries=0 ensures a timeout fails immediately without silent retries. Signed-off-by: George Armstrong <georgea@nvidia.com>
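The arithmetic in the commit message above can be checked with a short sketch. The function name is an assumption, and the bound ignores any backoff delay between retries, which would only make the worst case longer.

```python
def worst_case_seconds(timeout_s: float, max_retries: int) -> float:
    """Upper bound for one request: the first attempt and every retry can each hit the timeout."""
    return timeout_s * (max_retries + 1)


PYTEST_TIMEOUT_S = 300

with_retries = worst_case_seconds(120, 3)
without_retries = worst_case_seconds(120, 0)

print(with_retries, with_retries > PYTEST_TIMEOUT_S)        # 480 True
print(without_retries, without_retries > PYTEST_TIMEOUT_S)  # 120 False
```

With three retries a single hanging request can outlast the 300s pytest timeout; with `max_retries=0` it fails after 120s, well inside it.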

…l name Signed-off-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

commit a5da597 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Mar 6 12:13:36 2026 -0800 Revert "Eval kit support (#1239)" (#1294) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit b237e33 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Fri Mar 6 20:25:37 2026 +0400 Eval kit support (#1239) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> commit dc28bbf Author: George Armstrong <georgea@nvidia.com> Date: Thu Mar 5 10:17:44 2026 -0800 Python direct tool calling without MCP (#1286) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 12454dd Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Wed Mar 4 13:06:21 2026 -0800 Allow het servers for nemo-rl jobs (#1223) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit 8884a68 Author: Prasoon Varshney <prasoon1995@gmail.com> Date: Wed Mar 4 10:24:02 2026 -0800 Support source_lang param for translation recipe (#1290) Signed-off-by: Prasoon Varshney <prasoonv@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit 4618b19 Author: Meriem B. 
<113170426+ka00ri@users.noreply.github.com> Date: Wed Mar 4 18:59:28 2026 +0100 Add MMLU-Pro 10% optimized subset for checkpoint selection (#1285) Signed-off-by: Meriem Boubdir <mboubdir@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit 5ac8609 Author: Talor Abramovich <talor19@gmail.com> Date: Wed Mar 4 02:30:06 2026 +0200 Add SPEED-Bench (within repo) (#1279) Signed-off-by: Talor Abramovich <talora@nvidia.com> Signed-off-by: talora <talora@nvidia.com> Signed-off-by: Talor Abramovich <talor19@gmail.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> commit c31eec5 Author: George Armstrong <georgea@nvidia.com> Date: Tue Mar 3 12:18:15 2026 -0800 Fix os.getlogin() crash in ns setup (#1289) Signed-off-by: George Armstrong <georgea@nvidia.com> commit c228e66 Author: George Armstrong <georgea@nvidia.com> Date: Tue Mar 3 11:04:54 2026 -0800 Fix streaming TypeError when delta.content is None (#1267) (#1288) Signed-off-by: George Armstrong <georgea@nvidia.com> commit aa47923 Author: Matvei Novikov <mnovikov@nvidia.com> Date: Mon Mar 2 16:28:41 2026 -0800 Add LibTrace recipe for generating domain-specific reasoning data (#1224) Signed-off-by: jubick1337 <mnovikov@nvidia.com> Signed-off-by: mnovikov <mnovikov@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit 313cad7 Author: Stephen Ge <stepheng@nvidia.com> Date: Mon Mar 2 18:28:49 2026 -0500 fix: clean parse-failure retries in prover (#1284) Signed-off-by: Stephen Ge <stepheng@nvidia.com> commit 813cfa3 Author: George Armstrong <georgea@nvidia.com> Date: Mon Mar 2 15:10:08 2026 -0800 tst: rollback inference-api to integrate (#1287) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 31735f9 Author: Valentin Mendelev <vmendelev@nvidia.com> Date: Mon Mar 2 23:11:25 2026 
+0100 Add backend-agnostic unified inference server with NeMo ASR and TTS backends (#1250) Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com> commit d4ef8c0 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Fri Feb 27 23:58:54 2026 +0400 Update promt_config to working with openai format + inline setup (#1210) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit e879cbc Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 27 10:41:23 2026 -0800 Update noc tutorial (#1282) Signed-off-by: George Armstrong <georgea@nvidia.com> commit f6e3505 Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 27 10:17:33 2026 -0800 Add noc reasoning tutorial (#1278) Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com> Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com> Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com> Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com> commit fc2072a Author: Jiacheng Xu <jcxu@utexas.edu> Date: Fri Feb 27 10:10:25 2026 -0800 CritPt generation add prompt_format=None (#1280) Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit c8abe5d Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 27 09:31:26 2026 -0800 New slurm customization parameters (account, containers) (#1209) Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit 2b38cce Author: George Armstrong <georgea@nvidia.com> Date: Wed Feb 25 17:59:52 2026 
-0800 Add nemo-skills-core subpackage for lightweight installs (#1229) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 9fa8e83 Author: Dheeraj Peri <peri.dheeraj@gmail.com> Date: Wed Feb 25 12:56:35 2026 -0800 feat: add custom judge type support for external repo integration (#1274) Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Dheeraj Peri <dperi@nvidia.com> Signed-off-by: suriya <sgunasekar@nvidia.com> Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Minho Ryu <ryumin93@gmail.com> Co-authored-by: Yongqiang Wang <yongqiang.seagull@gmail.com> Co-authored-by: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Jiacheng Xu <jcxu@utexas.edu> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> commit 8a32b13 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 24 15:24:42 2026 -0800 Exclude numb3rs form test_eval.py (#1275) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 6da2219 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Mon Feb 23 18:37:46 2026 +0400 Numb3rs ds addition (#1174) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> commit ad034b5 Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Sun Feb 22 11:55:24 2026 -0800 Add DSBench-DA evaluation (#1254) Squash merge of changes during code-review. 
Signed-off-by: suriya <sgunasekar@nvidia.com>
commit 7593ab3 Author: Jiacheng Xu <jcxu@utexas.edu> Date: Fri Feb 20 16:42:01 2026 -0800 Add CritPt benchmark (#1200) Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
commit 58c31b2 Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Fri Feb 20 16:19:22 2026 -0800 Fix no_answer metric overcounting in _compute_pass_at_k (#1245) Signed-off-by: suriya <sgunasekar@nvidia.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
commit 1f1a2e7 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 20 15:58:40 2026 -0800 Fix incorrect prompt tokens count due to HF api update (#1264) Signed-off-by: Igor Gitman <igitman@nvidia.com>
commit 8ebc6f5 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 20 09:05:33 2026 -0800 Remove deprecated dataset group (#1263) Signed-off-by: Igor Gitman <igitman@nvidia.com>
commit ea4177f Author: Yongqiang Wang <yongqiang.seagull@gmail.com> Date: Thu Feb 19 19:57:25 2026 -0500 fix deps (#1258)
commit 60905a7 Author: Minho Ryu <ryumin93@gmail.com> Date: Fri Feb 20 09:39:39 2026 +0900 Add aime26 (#1256) Signed-off-by: bzantium <ryumin93@gmail.com>
commit b28afc5 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 16:18:25 2026 -0800 Rename custom -> external benchmarks (#1262) Signed-off-by: Igor Gitman <igitman@nvidia.com>
commit 6cc9c45 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 16:10:33 2026 -0800 Add reference to internal benchmarks repo (#1261) Signed-off-by: Igor Gitman <igitman@nvidia.com>
commit 5202af6 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 16:08:05 2026 -0800 Remove incorrect presence-penalty setting (#1259) Signed-off-by: Igor Gitman <igitman@nvidia.com>
commit 144c70b Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 15:26:33 2026 -0800 Adding an option to store benchmarks in external repo (#1240) Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>
commit 10e6e39 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Thu Feb 19 19:57:21 2026 +0400 update vllm multimodal for api calls convenience (#1213) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com> Co-authored-by: mmkrtchyan <mmkrtchyan@nvidia.com>
commit 1ba4219 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Wed Feb 18 03:28:23 2026 +0400 Fix --server_container not being applied to dependent jobs (#1244) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
commit 9517614 Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Mon Feb 16 11:13:24 2026 -0800 Support mini-swe-agent as agent harness (#1212) Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: i-vainn <imoshkov@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com> Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: Mateusz Winiarek <mwiniarek@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com> Signed-off-by: Wei Du <wedu@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: Ivan <imoshkov@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Nick Ludwig <nliudvig@nvidia.com> Co-authored-by: Wojciech Prazuch <wojciechprazuch3@gmail.com> Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Co-authored-by: Minho Ryu <ryumin93@gmail.com> Co-authored-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: Jiacheng Xu <jcxu@utexas.edu> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com> Co-authored-by: Mateusz Winiarek <72758259+Froxyy-dev@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com> Co-authored-by: Wei Du <wedu@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Co-authored-by: anowaczynski-nvidia <anowaczynski@nvidia.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
commit a3d44dc Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Fri Feb 13 22:32:15 2026 -0800 Add --installation_command support to prepare_data (#1243) Signed-off-by: suriya <sgunasekar@nvidia.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
commit e80d524 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 12 17:26:00 2026 -0800 Fix CI disk space for Docker image builds (#1241) Signed-off-by: George Armstrong <georgea@nvidia.com>
commit d22236c Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Wed Feb 11 17:55:00 2026 -0800 Fix answerbench prompt parsing (#1235) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
commit 2401628 Author: George Armstrong <georgea@nvidia.com> Date: Wed Feb 11 14:56:43 2026 -0800 feat: add lockfiles for reproducible sandbox builds (#1233) Signed-off-by: George Armstrong <georgea@nvidia.com>
commit 5a0a84d Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Wed Feb 11 13:30:03 2026 -0800 removing datasets version restriction for LCB eval (#1230) Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
commit ef0a890 Author: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Date: Wed Feb 11 12:03:16 2026 +0400 Gnalbandyan/add physics (#1214) Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>
commit bd9d30c Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Tue Feb 10 15:13:27 2026 -0800 LCB generic prompting (#1215) Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
commit 7d6c49a Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Sat Feb 7 08:45:46 2026 -0800 Add support for different variations of nemo-rl (#1220) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
commit b19ba96 Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 6 21:40:56 2026 -0800 Add multi-node sandbox support for SLURM clusters (#1218) Signed-off-by: George Armstrong <georgea@nvidia.com>
commit 8950bb0 Author: anowaczynski-nvidia <anowaczynski@nvidia.com> Date: Sat Feb 7 01:38:00 2026 +0100 support structured outputs in hle judge for optional AA compatibility (#1186) Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
commit b84f7a2 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 6 14:51:02 2026 -0800 A small update on running tests docs (#1219) Signed-off-by: Igor Gitman <igitman@nvidia.com>
commit 8e838e1 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 5 18:01:35 2026 -0800 feat: add flag to disable sandbox replay (#1217) Signed-off-by: George Armstrong <georgea@nvidia.com>
commit 5fd9085 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 5 15:57:01 2026 -0800 Add an option to limit number of tool calls (#1216) Signed-off-by: Igor Gitman <igitman@nvidia.com>
commit d820200 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 3 10:43:55 2026 -0800 Add arena-hard v2 (#1205) Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: bzantium <ryumin93@gmail.com>
commit a30920e Author: Igor Gitman <igitman@nvidia.com> Date: Mon Feb 2 10:53:55 2026 -0800 Fix mkdocs warnings (#1204) Signed-off-by: Igor Gitman <igitman@nvidia.com>
commit 19d7788 Author: Ivan <imoshkov@nvidia.com> Date: Mon Feb 2 23:25:13 2026 +0500 Fix infinite wait in sandbox.wait_for_sandbox (#1206) Signed-off-by: i-vainn <imoshkov@nvidia.com>
commit 3e65fbf Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Fri Jan 30 19:38:38 2026 -0800 Improve tts (#1203) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
commit 250c862 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Fri Jan 30 22:12:29 2026 +0400 SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
commit 7ded756 Author: Ivan <imoshkov@nvidia.com> Date: Fri Jan 30 09:57:41 2026 +0500 Add proper token counting to code execution model (#1184) Signed-off-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
commit b986304 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Jan 29 17:57:07 2026 -0800 Upgrade containers (#1198) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com>
commit 3b44f02 Author: Dan Lord <blahblahasdf@gmail.com> Date: Thu Jan 29 16:40:47 2026 -0800 Fix incorrect string format (#1199) Signed-off-by: dlord <dlord@nvidia.com>
commit c4854b8 Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Thu Jan 29 13:43:36 2026 -0800 Update nemo-rl to latest (#1087) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>

Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>

Kipok added 2 commits February 3, 2026 11:52

Add an option to specify non-default slurm account

b1c2066

Signed-off-by: Igor Gitman <igitman@nvidia.com>

Add overrides for containers

5a89b68

Signed-off-by: Igor Gitman <igitman@nvidia.com>
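
The two commits above add an optional Slurm account and container overrides that fall back to existing defaults. A minimal sketch of that fallback pattern follows; `CONTAINER_MAP`, its image names, and both helper functions are hypothetical stand-ins for illustration, not the repository's actual code. Only the `container or container_map[(convert_from, convert_to)]` expression comes from the PR walkthrough, and `--account` is the standard sbatch flag.

```python
from typing import Optional

# Hypothetical default images keyed by (convert_from, convert_to);
# the real mapping lives in the repository and is not reproduced here.
CONTAINER_MAP = {
    ("hf", "trtllm"): "example/trtllm:latest",
    ("hf", "megatron"): "example/megatron:latest",
}


def resolve_container(container: Optional[str], convert_from: str, convert_to: str) -> str:
    """Return the explicit override if one was given, else the default image."""
    return container or CONTAINER_MAP[(convert_from, convert_to)]


def sbatch_account_args(account: Optional[str]) -> list[str]:
    """Build extra sbatch arguments; nothing is added when no account is set."""
    return [f"--account={account}"] if account else []
```

With no override, `resolve_container(None, "hf", "trtllm")` picks the mapped default, and `sbatch_account_args(None)` leaves the submission command unchanged, so existing invocations keep their behaviour.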

greptile-apps bot reviewed Feb 3, 2026

coderabbitai bot reviewed Feb 3, 2026

gwarmstrong approved these changes Feb 3, 2026

Kipok added the run GPU tests label Feb 3, 2026

Kipok and others added 5 commits February 3, 2026 19:34

Debugging

be1b700

Signed-off-by: Igor Gitman <igitman@nvidia.com>

Client side counting

3d7aebd

Signed-off-by: Igor Gitman <igitman@nvidia.com>

Fixed tokenizer

ef451dd

Signed-off-by: Igor Gitman <igitman@nvidia.com>

Debugging

9a9ba45

Signed-off-by: Igor Gitman <igitman@nvidia.com>

Merge remote-tracking branch 'origin/main' into igitman/account-arg

3b64aa7

Signed-off-by: George Armstrong <georgea@nvidia.com>
# Conflicts:
# nemo_skills/pipeline/start_server.py

coderabbitai bot reviewed Feb 25, 2026

Merge remote-tracking branch 'origin/main' into igitman/account-arg

bbf2ea7

Signed-off-by: George Armstrong <georgea@nvidia.com>
# Conflicts:
# nemo_skills/pipeline/eval.py

coderabbitai bot reviewed Feb 25, 2026

fix: add missing account= arg to test_server_metadata_from_num_tasks

dd0d94e

The test calls _create_job_unified() which now requires account as a positional argument after the addition of the --account CLI option. Signed-off-by: George Armstrong <georgea@nvidia.com>
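
To illustrate why the test needed updating, here is a minimal sketch of how adding a required parameter breaks an older call site. `create_job` is a hypothetical stand-in; the real signature of `_create_job_unified()` is not shown on this page.

```python
def create_job(exp_name, cluster_config, account):
    """Stand-in for a job builder that gained a required `account` parameter."""
    return {"exp_name": exp_name, "cluster": cluster_config, "account": account}


# A call written before the parameter existed now fails at call time.
try:
    create_job("demo", {"executor": "slurm"})
    old_call_error = None
except TypeError as err:
    old_call_error = str(err)

# The updated call passes account explicitly; None stands for "no override".
job = create_job("demo", {"executor": "slurm"}, account=None)
```

Passing `account=` by keyword, as the commit title indicates, keeps the call readable and robust to later reordering of parameters.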

coderabbitai bot reviewed Feb 25, 2026

gwarmstrong added run GPU tests and removed run GPU tests labels Feb 25, 2026

Kipok added 4 commits February 25, 2026 18:03

Merge branch 'igitman/account-arg' of https://github.com/NVIDIA/NeMo-Skills into igitman/account-arg

51f059e

Roll-back tmp changes

316352b

Signed-off-by: Igor Gitman <igitman@nvidia.com>

.get -> []

dbea2ca

Signed-off-by: Igor Gitman <igitman@nvidia.com>
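
The commit title `.get -> []` suggests a switch from `dict.get` to direct indexing; the diff itself is not shown here, so that reading is an assumption. A small sketch of the behavioural difference, using a hypothetical config dictionary:

```python
# Hypothetical config that lacks a "containers" key.
cluster_config = {"executor": "slurm"}

# .get hides the missing key and returns None, so a failure would surface later.
silent = cluster_config.get("containers")

# [] raises KeyError immediately at the lookup.
try:
    cluster_config["containers"]
    lookup_error = None
except KeyError as err:
    lookup_error = err
```

Direct indexing is usually preferred when the key is expected to exist, because a misconfiguration fails at the lookup rather than propagating a `None`.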

Merge branch 'main' into igitman/account-arg

a6fe5ab

Kipok enabled auto-merge (squash) February 26, 2026 02:05

gwarmstrong added 3 commits February 26, 2026 09:36

gwarmstrong added the reviewed label Feb 27, 2026

tst: update test parameters to limit retries and use a different model name

92600fb

Signed-off-by: George Armstrong <georgea@nvidia.com>

Kipok merged commit c8abe5d into main Feb 27, 2026
5 checks passed

Kipok deleted the igitman/account-arg branch February 27, 2026 17:31

coderabbitai bot mentioned this pull request Mar 5, 2026

Eval kit support #1239

Merged

New slurm customization parameters (account, containers) #1209

Kipok merged 17 commits into main from igitman/account-arg

2 participants
