Conversation
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Greptile Summary

This PR upgrades nemo-rl to the latest version with vLLM 0.11.0, adds full async RL support, and enables GB200 cluster support. The changes align with upstream nemo-rl improvements and fix several configuration issues.

Key Changes:
Critical Issue:
Minor Issues:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Main as start_grpo.py
    participant Config as Config Validation
    participant AsyncGRPO as async_grpo_train
    participant SyncGRPO as grpo_train
    participant Policy as Policy Model
    participant Gen as Generation Engine
    participant Env as Environment
    Main->>Config: Check async_grpo.enabled
    alt Async Mode Enabled
        Config->>Config: Validate unsupported features
        Config-->>Config: Check use_dynamic_sampling
        Config-->>Config: Check reward_scaling
        Config-->>Config: Check reward_shaping
        alt Unsupported Feature Found
            Config-->>Main: Raise NotImplementedError
        else All Features Supported
            Main->>AsyncGRPO: Initialize async training
            AsyncGRPO->>Gen: Generate trajectories (async)
            Gen->>Policy: Sample responses
            Gen->>Env: Evaluate rewards
            Note over AsyncGRPO,Gen: Trajectories can be aged<br/>(max_trajectory_age_steps)
            AsyncGRPO->>Policy: Update weights (async)
            Note over AsyncGRPO,Policy: Optional in-flight updates<br/>and KV cache recompute
            AsyncGRPO->>AsyncGRPO: Train with aged trajectories
        end
    else Sync Mode (Default)
        Main->>SyncGRPO: Initialize standard training
        SyncGRPO->>Gen: Generate trajectories (blocking)
        Gen->>Policy: Sample responses
        Gen->>Env: Evaluate rewards
        SyncGRPO->>Policy: Update weights (blocking)
        SyncGRPO->>SyncGRPO: Train on fresh trajectories
    end
```
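The async/sync branch the diagram describes can be sketched in a few lines of plain Python. The config keys mirror the diagram; the function itself and its return values are illustrative stand-ins, not the actual `start_grpo.py` code:

```python
def run_training(config: dict) -> str:
    """Dispatch between async and sync GRPO, mirroring the sequence diagram."""
    if config.get("async_grpo", {}).get("enabled", False):
        # Async path: trajectories used for training may be up to
        # max_trajectory_age_steps policy updates old.
        max_age = config["async_grpo"].get("max_trajectory_age_steps", 1)
        return f"async (max trajectory age: {max_age})"
    # Sync path: generation blocks, so training always sees fresh trajectories.
    return "sync"

print(run_training({"async_grpo": {"enabled": True, "max_trajectory_age_steps": 4}}))
```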
📝 Walkthrough

This pull request introduces multi-stage Docker build infrastructure for the NeMo RL container with conditional vLLM compilation, upgrades UV tooling, and adds comprehensive training configurations for GRPO and SFT workflows, including async GRPO support, LoRA configuration, and SwanLab logging integration.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml`:
- Line 279: Remove the duplicate root-level key `checkpoint_must_save_by` and
rely on the existing `checkpointing.checkpoint_must_save_by` configuration; edit
the YAML to delete the top-level `checkpoint_must_save_by: null` entry so there
is only one authoritative `checkpointing.checkpoint_must_save_by` setting, and
scan for any other duplicate top-level keys to avoid OmegaConf resolution
conflicts.
In `@nemo_skills/training/nemo_rl/configs/sft.yaml`:
- Around line 96-108: The YAML key lora_dtype in the peft config is currently
set to the literal None which may be parsed as a string; update the value to an
explicit null if you mean "unset" (set lora_dtype: null) or to a quoted string
if you mean the string "None" (set lora_dtype: "None"); locate the peft block
(peft.enabled, peft.dim, etc.) and make the change to lora_dtype accordingly.
🧹 Nitpick comments (2)
dockerfiles/Dockerfile.nemo-rl (1)
149-149: Consider pinning NeMo-Skills to a specific commit for reproducibility. Unlike the NeMo-RL repository, which uses `NEMO_RL_COMMIT` for version pinning, NeMo-Skills is cloned from the default branch without a specific commit reference. This means builds at different times may include different NeMo-Skills code. If reproducibility is important, consider adding a similar `NEMO_SKILLS_COMMIT` build argument:

♻️ Suggested improvement

```diff
+ARG NEMO_SKILLS_COMMIT=main
-RUN git clone https://github.com/NVIDIA-NeMo/Skills.git /opt/NeMo-Skills && cd /opt/NeMo-Skills && uv pip install .
+RUN git clone https://github.com/NVIDIA-NeMo/Skills.git /opt/NeMo-Skills && cd /opt/NeMo-Skills && git checkout ${NEMO_SKILLS_COMMIT} && uv pip install .
```

nemo_skills/training/nemo_rl/start_grpo.py (1)
331-350: Async GRPO feature validation looks good, but verify the importance sampling requirement. The feature compatibility checks correctly prevent unsupported features from being used with async GRPO. However, the config comment in `grpo-legacy-85eeb8d.yaml` (lines 45-47) states: "Async GRPO requires importance sampling correction enabled. Set to true when async_grpo.enabled is true". Consider adding a validation or warning if `async_grpo.enabled` is true but `loss_fn.use_importance_sampling_correction` is false, as this configuration dependency isn't enforced here.

Suggested validation

```diff
 for feature in unsupported_features:
     if feature not in config["grpo"]:
         continue
     if feature == "use_dynamic_sampling":
         if config["grpo"][feature]:
             raise NotImplementedError(f"{feature} is not supported with async GRPO")
     else:
         if config["grpo"][feature]["enabled"]:
             raise NotImplementedError(f"{feature} is not supported with async GRPO")
+# Warn if importance sampling correction is not enabled for async GRPO
+if not config["loss_fn"].get("use_importance_sampling_correction", False):
+    print("⚠️ Warning: Async GRPO typically requires use_importance_sampling_correction=true in loss_fn")
 from nemo_rl.algorithms.grpo import async_grpo_train
```
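The check plus the suggested warning can be exercised standalone. The config dict below is a hypothetical minimal stand-in for the real GRPO config, and the warning text is illustrative:

```python
def validate_async_config(config: dict) -> list:
    """Reject unsupported features and collect soft warnings for async GRPO."""
    unsupported_features = ["use_dynamic_sampling", "reward_scaling", "reward_shaping"]
    for feature in unsupported_features:
        if feature not in config["grpo"]:
            continue
        if feature == "use_dynamic_sampling":
            enabled = config["grpo"][feature]          # plain boolean flag
        else:
            enabled = config["grpo"][feature]["enabled"]  # nested {enabled: ...} block
        if enabled:
            raise NotImplementedError(f"{feature} is not supported with async GRPO")
    warnings = []
    if not config["loss_fn"].get("use_importance_sampling_correction", False):
        warnings.append("async GRPO typically requires use_importance_sampling_correction=true")
    return warnings

cfg = {
    "grpo": {"use_dynamic_sampling": False, "reward_scaling": {"enabled": False}},
    "loss_fn": {"use_importance_sampling_correction": False},
}
print(validate_async_config(cfg))  # the importance-sampling warning fires
```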
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- dockerfiles/Dockerfile.nemo-rl
- nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml
- nemo_skills/training/nemo_rl/configs/grpo.yaml
- nemo_skills/training/nemo_rl/configs/sft-legacy-85eeb8d.yaml
- nemo_skills/training/nemo_rl/configs/sft.yaml
- nemo_skills/training/nemo_rl/start_grpo.py
- nemo_skills/utils.py
🧰 Additional context used
🪛 Hadolint (2.14.0)
dockerfiles/Dockerfile.nemo-rl
[error] 10-10: invalid flag: --keep-git-dir
(DL1000)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Greptile Review
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (26)
dockerfiles/Dockerfile.nemo-rl (7)
7-11: BuildKit-specific ADD flag requires recent Docker versions. The `--keep-git-dir=true` flag on line 10 is a BuildKit feature for `ADD` with Git URLs. The `# syntax=docker/dockerfile:1` directive on line 1 properly enables this, but be aware that:

- Older Docker versions or builds without BuildKit may fail
- Hadolint's warning is a false positive; this flag is valid with BuildKit

The stage looks good for fetching the NeMo RL repository with git metadata preserved.

41-42: Good: CVE remediation for gnupg. Proactive security fix for CVE-2025-68973. The `--only-upgrade` flag ensures it only upgrades the existing package without pulling in new dependencies.

49-49: UV version upgrade from 0.7.2 to 0.9.7. This is a significant version bump. If any compatibility issues arise with dependency resolution or lock file handling, this would be a likely suspect.
79-80: CUDA architecture targeting for H100 and B200. Setting `TORCH_CUDA_ARCH_LIST="9.0 10.0"` targets:

- 9.0: Hopper (H100)
- 10.0: Blackwell (B200/GB200)

This aligns with the PR objective of adding GB200 cluster support.

90-112: Conditional VLLM build and CVE mitigation look good. The conditional custom vLLM build is properly guarded, and the CVE mitigation for aiohttp (GHSA-mqqc-3gqh-h2x8) using `find -exec rm -rf` is an effective approach. One minor note: if `BUILD_CUSTOM_VLLM` is set but the `nemo-rl.env` file doesn't exist (e.g., build script failure), line 94 will fail the build. This is likely the desired behavior, but worth being aware of.

134-138: COPY --exclude and git unshallow logic. The `--exclude` flag is another BuildKit feature (properly enabled by the syntax directive). The conditional unshallow logic on line 138 is well designed: it checks whether the repo is shallow before attempting to fetch full history. Note: `git fetch --unshallow` requires network access. If the build environment is network-isolated after the initial clone, this could fail. The `|| true` fallback handles this gracefully, though the fingerprint generation may produce different results for shallow vs. full repos.

146-147: OSS attribution notice generation. Good compliance practice for NVIDIA open-source distribution requirements.
nemo_skills/utils.py (1)
508-512: LGTM! Lazy import pattern for optional dependency. Moving the `fire` and `fire.decorators` imports inside the function is a good approach for environments where fire is not installed. The comment clearly documents the rationale.

nemo_skills/training/nemo_rl/start_grpo.py (2)

351-371: LGTM! Async GRPO training integration. The lazy import of `async_grpo_train` and the parameter passing look correct. The additional `max_trajectory_age_steps` parameter properly sources from the async config section.

372-389: LGTM! The synchronous GRPO training path is preserved correctly as the fallback when async mode is not enabled.
nemo_skills/training/nemo_rl/configs/sft-legacy-85eeb8d.yaml (2)
1-14: LGTM! Legacy SFT configuration. The configuration provides sensible defaults. The large `max_num_epochs` and `max_num_steps` values (100000000) ensure that one of them can be set to control training duration without the other being a limiting factor.

56-115: Configuration aligns with PR objectives. The Megatron configuration with `empty_unused_memory_level: 0` is documented with the appropriate OOM warning. The commented-out `clip_grad` aligns with the PR description noting that "gradient clipping is not unified yet."

nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml (2)

33-50: Async GRPO configuration is properly documented. The async GRPO section correctly defaults to disabled and includes helpful comments about importance sampling correction requirements. This aligns well with the validation in `start_grpo.py`.

82-89: The configuration does not present a conflict. Although both `dtensor_cfg.enabled` and `megatron_cfg.enabled` are set to `true` in the YAML file, the nemo-rl pipeline code (`nemo_skills/pipeline/nemo_rl/sft.py` and `nemo_skills/pipeline/nemo_rl/grpo.py`) explicitly enforces mutual exclusivity through CLI overrides: when `backend == "megatron"`, dtensor is disabled; otherwise, megatron is disabled. Only one backend is active per training run by design.

nemo_skills/training/nemo_rl/configs/sft.yaml (3)
56-68: LGTM! Well-documented LoRA configuration. The LoRA configuration section is well structured, with helpful comments explaining each parameter, including the important note about disabling Triton when `tensor_parallel_size > 1`.

216-218: LGTM! SwanLab logger integration. The SwanLab configuration follows the established pattern used by wandb and other loggers, maintaining consistency across the logging options.

19-19: LGTM! The metric name format change to `"val:val_loss"` with the explanatory comment about prefix options is clear and helpful.

nemo_skills/training/nemo_rl/configs/grpo.yaml (9)
1-31: Good traceability with upstream commit reference. The source commit hash comment provides clear lineage for tracking configuration changes against upstream nemo-rl. The GRPO algorithm settings appear reasonable for math training workloads.

32-37: LGTM! Conservative defaults for async GRPO with clear inline documentation. The disabled state is appropriate for synchronous training mode.

39-57: LGTM! The KL divergence configuration with k3 approximation and clamping values is well documented. The reference to the Joschu blog provides helpful context for understanding the KL type choices.

59-68: LGTM! The `safetensors` format is a good choice for model serialization due to its safety and performance benefits. The metric naming convention with colon separator is consistent with the documented format.

70-100: LGTM! The policy configuration provides good flexibility with `hf_config_overrides` and clear documentation for the optimizer offloading behavior. The `dtensor_v2: true` flag aligns with the enhanced distributed tensor tooling.
154-161: Verify the warmup configuration values. `lr_warmup_iters: 13` is an unusual number. Typically warmup iterations are round numbers or percentages of total training. If this is intentional (e.g., matching a specific upstream configuration), the value is fine. Otherwise, consider whether this should be a rounder number like 10 or 100.

175-218: LGTM! The optimizer configuration correctly disables `foreach` and `fused` for DTensor compatibility, with clear documentation. The sequential scheduler setup (LinearLR warmup → ConstantLR) with a milestone at iteration 10 provides a clean warmup phase.

281-302: LGTM! Good observability defaults with GPU monitoring enabled and consistent naming across logging backends. The SwanLab integration provides additional logging flexibility.

227-254: vLLM 0.11 compatibility confirmed for these configuration settings. All cited vllm_cfg parameters (`kv_cache_dtype: "auto"`, `enforce_eager: False`, `use_deep_gemm: False`) are supported in vLLM 0.11.0 and correctly configured. The mcore_generation_config defaults for KV cache management are reasonable, and the inline comment referencing convergence considerations with torch.compile provides good context for understanding the `enforce_eager` setting.
```yaml
  gpus_per_node: 1
  num_nodes: 1

checkpoint_must_save_by: null
```
Duplicate checkpoint_must_save_by key at root level.
This key already exists under checkpointing.checkpoint_must_save_by (line 59). The root-level duplicate may be unintentional or could cause configuration loading issues depending on how OmegaConf resolves it.
Suggested fix

```diff
 cluster:
   gpus_per_node: 1
   num_nodes: 1
-
-checkpoint_must_save_by: null
```

🤖 Prompt for AI Agents
In `@nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml` at line 279,
Remove the duplicate root-level key `checkpoint_must_save_by` and rely on the
existing `checkpointing.checkpoint_must_save_by` configuration; edit the YAML to
delete the top-level `checkpoint_must_save_by: null` entry so there is only one
authoritative `checkpointing.checkpoint_must_save_by` setting, and scan for any
other duplicate top-level keys to avoid OmegaConf resolution conflicts.
```yaml
peft:
  enabled: false
  target_modules: []
  exclude_modules: []
  dim: 8
  alpha: 32
  dropout: 0.0
  dropout_position: "post"
  lora_A_init_method: "xavier"
  lora_B_init_method: "zero"
  a2a_experimental: false
  lora_dtype: None
```
`lora_dtype: None` may not parse as expected.
In YAML, `None` is typically a string literal, not a null value. If the intention is to represent a null/unset value, use `null` instead. If it's meant to be the string "None", wrap it in quotes.
Suggested fix (if null is intended)

```diff
   a2a_experimental: false
-  lora_dtype: None
+  lora_dtype: null
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
peft:
  enabled: false
  target_modules: []
  exclude_modules: []
  dim: 8
  alpha: 32
  dropout: 0.0
  dropout_position: "post"
  lora_A_init_method: "xavier"
  lora_B_init_method: "zero"
  a2a_experimental: false
  lora_dtype: null
```
🤖 Prompt for AI Agents
In `@nemo_skills/training/nemo_rl/configs/sft.yaml` around lines 96 - 108, The
YAML key lora_dtype in the peft config is currently set to the literal None
which may be parsed as a string; update the value to an explicit null if you
mean "unset" (set lora_dtype: null) or to a quoted string if you mean the string
"None" (set lora_dtype: "None"); locate the peft block (peft.enabled, peft.dim,
etc.) and make the change to lora_dtype accordingly.
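To see why the bot flags this, note that the YAML core schema resolves only a few plain scalars to null, and `None` is not among them. A hand-rolled sketch of that resolution rule (not a full YAML parser):

```python
def resolve_plain_scalar(token: str):
    """Resolve a plain YAML scalar per the core schema's null rule (sketch)."""
    if token in ("null", "Null", "NULL", "~", ""):
        return None  # these spellings parse as a null value
    return token     # anything else, including "None", stays a string

print(repr(resolve_plain_scalar("null")))  # → None
print(repr(resolve_plain_scalar("None")))  # → 'None'  (a string, not a null)
```

So `lora_dtype: None` reaches the training code as the string `"None"`, which is why an explicit `null` (or a quoted `"None"`, if a string is truly intended) is the safer spelling.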
```yaml
lr_decay_iters: ${grpo.max_num_steps}
lr_warmup_iters: 0
lr_warmup_init: 1.0e-6
lr_decay_style: "constant"
```
Let's maybe keep the original defaults? Otherwise it will be harder to set LR decay, while cosine can work as both a constant schedule and a decay schedule just by adjusting the min LR.
```yaml
  foreach: False
  fused: False

scheduler:
```
and same here, ideally keep original
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
```yaml
# makes the training sequence length divisible by the tensor parallel size
# this is useful for sequence parallel training
make_sequence_length_divisible_by: ${policy.dtensor_cfg.tensor_parallel_size}
max_grad_norm: 0.0 # megatron: Zero means no clipping, FSDP: null means no clipping
```
`max_grad_norm: 0.0` will set `clip_grad: 0.0` in the Megatron config (line 130), causing all gradients to be clipped to 0 and breaking training. The comment says "Zero means no clipping", but Megatron's `clip_grad` parameter interprets 0 as a clipping threshold. It should be `null` to disable clipping, or a positive value like 1.0.
```diff
-max_grad_norm: 0.0 # megatron: Zero means no clipping, FSDP: null means no clipping
+max_grad_norm: null # megatron: null means no clipping, FSDP: null means no clipping
```
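To illustrate the failure mode, here is a minimal clip-by-global-norm sketch (not Megatron's implementation): when `max_norm=0.0` is treated as a threshold, any nonzero gradient exceeds it and gets scaled to zero, while `None` disables clipping entirely.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale grads so their global L2 norm is at most max_norm; None disables clipping."""
    if max_norm is None:
        return grads
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)
        return [g * scale for g in grads]
    return grads

# With max_norm=0.0, any nonzero gradient exceeds the threshold,
# scale becomes 0.0, and every gradient is zeroed out:
print(clip_by_global_norm([0.5, -1.0], 0.0))
```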
```sh
# To fix CVE-2025-68973
apt install -y --only-upgrade gnupg
```
CVE-2025-68973 doesn't exist yet. CVE IDs follow the format CVE-YYYY-NNNNN where YYYY is the year of assignment. 2025 IDs would only be assigned in 2025. Verify this is the correct CVE number.
```yaml
scheduler:
  - name: "torch.optim.lr_scheduler.LinearLR"
    kwargs:
      start_factor: 0.1
      end_factor: 1.0
      total_iters: 10
  - name: "torch.optim.lr_scheduler.CosineAnnealingLR"
```
Changed from constant LR (start_factor=1.0, end_factor=1.0, total_iters=1) to warmup schedule (start_factor=0.1 for 10 steps). This significantly changes training behavior - LR now starts at 10% and warms up over 10 steps.
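The behavior change is easy to quantify without torch by reproducing LinearLR's factor formula (this mirrors, but is not, torch's implementation):

```python
def linear_warmup_factor(step, start_factor=0.1, end_factor=1.0, total_iters=10):
    """LR multiplier for a linear warmup matching the config above (sketch)."""
    progress = min(step, total_iters) / total_iters
    return start_factor + (end_factor - start_factor) * progress

# LR starts at 10% of the base value and reaches 100% after 10 steps:
print([round(linear_warmup_factor(t), 2) for t in (0, 5, 10, 20)])  # → [0.1, 0.55, 1.0, 1.0]
```

With the old config (`start_factor=1.0, end_factor=1.0, total_iters=1`) the factor is 1.0 at every step, i.e. a constant schedule.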
Signed-off-by: Igor Gitman <igitman@nvidia.com>
tests/gpu-tests/test_train.py (outdated)
```python
ctx=wrap_arguments(
    "++data.prompt.prompt_config=qwen/math-cot "
    "++grpo.max_num_steps=5 "
    "++grpo.lr_warmup_steps=2 "
```
`grpo.lr_warmup_steps` doesn't exist in `nemo_skills/training/nemo_rl/configs/grpo.yaml`. The test will fail with this parameter.
```diff
-    "++grpo.lr_warmup_steps=2 "
+    "++grpo.num_prompts_per_step=2 "
```
Signed-off-by: Igor Gitman <igitman@nvidia.com>
…eMo-Skills into smahdavi/nemo-rl-update
```python
hf_metadata_dir = os.path.join(args.input_dir, ".hf_metadata")

if not os.path.exists(hf_metadata_dir) and not os.path.isdir(hf_metadata_dir):
```
Logic error: this should be `or`, not `and`. If `.hf_metadata` exists as a file (not a directory), this check won't catch it.
```diff
-if not os.path.exists(hf_metadata_dir) and not os.path.isdir(hf_metadata_dir):
+if not os.path.exists(hf_metadata_dir) or not os.path.isdir(hf_metadata_dir):
```
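The failure mode is easy to demonstrate: when the path exists but is a regular file, the `and` version evaluates to False and the guard never triggers. A standalone sketch using a temp directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    hf_metadata_dir = os.path.join(root, ".hf_metadata")
    open(hf_metadata_dir, "w").close()  # exists as a FILE, not a directory

    # Buggy check: both conditions must hold, so a plain file slips through.
    buggy = not os.path.exists(hf_metadata_dir) and not os.path.isdir(hf_metadata_dir)
    # Fixed check: either condition is enough to reject the path.
    fixed = not os.path.exists(hf_metadata_dir) or not os.path.isdir(hf_metadata_dir)

    print(buggy, fixed)  # → False True
```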
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
```python
    """
    tokenizer_files = [
        "tokenizer.json",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "vocab.json",
        "merges.txt",
        "added_tokens.json",
        "chat_template.jinja",
    ]
    for fname in tokenizer_files:
        src = os.path.join(tokenizer_path, fname)
```
Missing tokenizer files when `tokenizer_path` is not a local directory.
When `tokenizer_path` is a HuggingFace model ID (e.g., `meta-llama/Llama-3.2-1B`), `os.path.exists(src)` will fail for all files since it's not a local path. This will silently skip copying tokenizer files, potentially breaking the converted checkpoint.
Consider downloading the tokenizer files from HF first, or handling the case where `tokenizer_path` is a model ID.
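A defensive variant of that loop could refuse hub IDs up front instead of silently copying nothing. The function name and shortened file list here are illustrative; an actual fix for the model-ID case would fetch the files via huggingface_hub, which is omitted from this sketch:

```python
import os
import shutil

TOKENIZER_FILES = ["tokenizer.json", "tokenizer_config.json", "special_tokens_map.json"]

def copy_tokenizer_files(tokenizer_path, dest_dir):
    """Copy known tokenizer files from a LOCAL directory; refuse hub model IDs."""
    if not os.path.isdir(tokenizer_path):
        # e.g. "meta-llama/Llama-3.2-1B" -- would need a hub download here
        raise ValueError(f"{tokenizer_path!r} is not a local directory")
    copied = []
    for fname in TOKENIZER_FILES:
        src = os.path.join(tokenizer_path, fname)
        if os.path.exists(src):
            shutil.copy(src, os.path.join(dest_dir, fname))
            copied.append(fname)
    return copied
```

Raising loudly (or downloading) in the non-directory case turns a silent checkpoint corruption into an immediate, debuggable error.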
```python
hf_metadata_path = os.path.join(weights_path, "model", ".hf_metadata")
return os.path.isdir(hf_metadata_path)
```
`os.path.isdir()` check is insufficient.
If `.hf_metadata` exists as a file (not a directory), this will incorrectly return False and use the wrong conversion path. Use:

```diff
 hf_metadata_path = os.path.join(weights_path, "model", ".hf_metadata")
-return os.path.isdir(hf_metadata_path)
+return os.path.exists(hf_metadata_path) and os.path.isdir(hf_metadata_path)
```
Signed-off-by: Igor Gitman <igitman@nvidia.com>
15:57:01 2026 -0800 Add an option to limit number of tool calls (#1216) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit d820200 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 3 10:43:55 2026 -0800 Add arena-hard v2 (#1205) Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: bzantium <ryumin93@gmail.com> commit a30920e Author: Igor Gitman <igitman@nvidia.com> Date: Mon Feb 2 10:53:55 2026 -0800 Fix mkdocs warnings (#1204) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 19d7788 Author: Ivan <imoshkov@nvidia.com> Date: Mon Feb 2 23:25:13 2026 +0500 Fix infinite wait in sandbox.wait_for_sandbox (#1206) Signed-off-by: i-vainn <imoshkov@nvidia.com> commit 3e65fbf Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Fri Jan 30 19:38:38 2026 -0800 Improve tts (#1203) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 250c862 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Fri Jan 30 22:12:29 2026 +0400 SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> commit 7ded756 Author: Ivan <imoshkov@nvidia.com> Date: Fri Jan 30 09:57:41 2026 +0500 Add proper token counting to code execution model (#1184) Signed-off-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b986304 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Jan 29 17:57:07 2026 -0800 Upgrade containers (#1198) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 3b44f02 Author: Dan Lord <blahblahasdf@gmail.com> Date: Thu Jan 29 16:40:47 2026 -0800 Fix incorrect string format (#1199) Signed-off-by: dlord <dlord@nvidia.com> commit c4854b8 Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Thu Jan 29 13:43:36 2026 -0800 Update nemo-rl to latest (#1087) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor 
Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
This PR upgrades nemo-rl to the latest main.
Limitations:
Gradient clipping is not unified yet; we can merge after this is resolved.
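One change in this upgrade is that async GRPO rejects config options its code path does not yet support (e.g. dynamic sampling, reward scaling, reward shaping) instead of silently ignoring them. A minimal sketch of such a guard; the function name, config keys, and dict-based config are illustrative, not the actual nemo-rl API:

```python
# Hypothetical feature guard: when async GRPO is enabled, fail fast on
# options the async training path does not support yet. Key names below
# are assumptions for illustration only.

UNSUPPORTED_IN_ASYNC = (
    "use_dynamic_sampling",
    "reward_scaling",
    "reward_shaping",
)

def validate_async_grpo_config(cfg: dict) -> None:
    """Raise NotImplementedError if an async-incompatible option is set."""
    if not cfg.get("async_grpo", {}).get("enabled", False):
        return  # sync mode: nothing to check
    for key in UNSUPPORTED_IN_ASYNC:
        if cfg.get(key):
            raise NotImplementedError(
                f"{key} is not supported with async GRPO yet"
            )
```

Failing at config-validation time keeps an unsupported combination from surfacing as a silent training-quality bug many steps later.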