Conversation
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Greptile Summary

This PR upgrades nemo-rl to the latest version with vLLM 0.11.0, adds full async RL support, and enables GB200 cluster support. The changes align with upstream nemo-rl improvements and fix several configuration issues.

Key Changes:
Critical Issue:
Minor Issues:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Main as start_grpo.py
    participant Config as Config Validation
    participant AsyncGRPO as async_grpo_train
    participant SyncGRPO as grpo_train
    participant Policy as Policy Model
    participant Gen as Generation Engine
    participant Env as Environment
    Main->>Config: Check async_grpo.enabled
    alt Async Mode Enabled
        Config->>Config: Validate unsupported features
        Config-->>Config: Check use_dynamic_sampling
        Config-->>Config: Check reward_scaling
        Config-->>Config: Check reward_shaping
        alt Unsupported Feature Found
            Config-->>Main: Raise NotImplementedError
        else All Features Supported
            Main->>AsyncGRPO: Initialize async training
            AsyncGRPO->>Gen: Generate trajectories (async)
            Gen->>Policy: Sample responses
            Gen->>Env: Evaluate rewards
            Note over AsyncGRPO,Gen: Trajectories can be aged<br/>(max_trajectory_age_steps)
            AsyncGRPO->>Policy: Update weights (async)
            Note over AsyncGRPO,Policy: Optional in-flight updates<br/>and KV cache recompute
            AsyncGRPO->>AsyncGRPO: Train with aged trajectories
        end
    else Sync Mode (Default)
        Main->>SyncGRPO: Initialize standard training
        SyncGRPO->>Gen: Generate trajectories (blocking)
        Gen->>Policy: Sample responses
        Gen->>Env: Evaluate rewards
        SyncGRPO->>Policy: Update weights (blocking)
        SyncGRPO->>SyncGRPO: Train on fresh trajectories
    end
```
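The async/sync branch the diagram describes can be sketched in a few lines of plain Python. The config keys mirror the diagram; the function itself and its return values are illustrative stand-ins, not the actual `start_grpo.py` code:

```python
def run_training(config: dict) -> str:
    """Dispatch between async and sync GRPO, mirroring the sequence diagram."""
    if config.get("async_grpo", {}).get("enabled", False):
        # Async path: trajectories used for training may be up to
        # max_trajectory_age_steps policy updates old.
        max_age = config["async_grpo"].get("max_trajectory_age_steps", 1)
        return f"async (max trajectory age: {max_age})"
    # Sync path: generation blocks, so training always sees fresh trajectories.
    return "sync"

print(run_training({"async_grpo": {"enabled": True, "max_trajectory_age_steps": 4}}))
```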
📝 Walkthrough

This pull request introduces multi-stage Docker build infrastructure for the NeMo RL container with conditional vLLM compilation, upgrades UV tooling, and adds comprehensive training configurations for GRPO and SFT workflows, including async GRPO support, LoRA configuration, and SwanLab logging integration.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml`:
- Line 279: Remove the duplicate root-level key `checkpoint_must_save_by` and
rely on the existing `checkpointing.checkpoint_must_save_by` configuration; edit
the YAML to delete the top-level `checkpoint_must_save_by: null` entry so there
is only one authoritative `checkpointing.checkpoint_must_save_by` setting, and
scan for any other duplicate top-level keys to avoid OmegaConf resolution
conflicts.
In `@nemo_skills/training/nemo_rl/configs/sft.yaml`:
- Around line 96-108: The YAML key lora_dtype in the peft config is currently
set to the literal None which may be parsed as a string; update the value to an
explicit null if you mean "unset" (set lora_dtype: null) or to a quoted string
if you mean the string "None" (set lora_dtype: "None"); locate the peft block
(peft.enabled, peft.dim, etc.) and make the change to lora_dtype accordingly.
🧹 Nitpick comments (2)
dockerfiles/Dockerfile.nemo-rl (1)
149-149: Consider pinning NeMo-Skills to a specific commit for reproducibility. Unlike the NeMo-RL repository, which uses `NEMO_RL_COMMIT` for version pinning, NeMo-Skills is cloned from the default branch without a specific commit reference. This means builds at different times may include different NeMo-Skills code. If reproducibility is important, consider adding a similar `NEMO_SKILLS_COMMIT` build argument:

♻️ Suggested improvement

```diff
+ARG NEMO_SKILLS_COMMIT=main
-RUN git clone https://github.com/NVIDIA-NeMo/Skills.git /opt/NeMo-Skills && cd /opt/NeMo-Skills && uv pip install .
+RUN git clone https://github.com/NVIDIA-NeMo/Skills.git /opt/NeMo-Skills && cd /opt/NeMo-Skills && git checkout ${NEMO_SKILLS_COMMIT} && uv pip install .
```

nemo_skills/training/nemo_rl/start_grpo.py (1)
331-350: Async GRPO feature validation looks good, but verify the importance sampling requirement. The feature compatibility checks correctly prevent unsupported features from being used with async GRPO. However, the config comment in `grpo-legacy-85eeb8d.yaml` (lines 45-47) states: "Async GRPO requires importance sampling correction enabled. Set to true when async_grpo.enabled is true". Consider adding a validation or warning if `async_grpo.enabled` is true but `loss_fn.use_importance_sampling_correction` is false, as this configuration dependency isn't enforced here.

Suggested validation

```diff
 for feature in unsupported_features:
     if feature not in config["grpo"]:
         continue
     if feature == "use_dynamic_sampling":
         if config["grpo"][feature]:
             raise NotImplementedError(f"{feature} is not supported with async GRPO")
     else:
         if config["grpo"][feature]["enabled"]:
             raise NotImplementedError(f"{feature} is not supported with async GRPO")
+# Warn if importance sampling correction is not enabled for async GRPO
+if not config["loss_fn"].get("use_importance_sampling_correction", False):
+    print("⚠️ Warning: Async GRPO typically requires use_importance_sampling_correction=true in loss_fn")
 from nemo_rl.algorithms.grpo import async_grpo_train
```
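The check plus the suggested warning can be exercised standalone. The config dict below is a hypothetical minimal stand-in for the real GRPO config, and the warning text is illustrative:

```python
def validate_async_config(config: dict) -> list:
    """Reject unsupported features and collect soft warnings for async GRPO."""
    unsupported_features = ["use_dynamic_sampling", "reward_scaling", "reward_shaping"]
    for feature in unsupported_features:
        if feature not in config["grpo"]:
            continue
        if feature == "use_dynamic_sampling":
            enabled = config["grpo"][feature]          # plain boolean flag
        else:
            enabled = config["grpo"][feature]["enabled"]  # nested {enabled: ...} block
        if enabled:
            raise NotImplementedError(f"{feature} is not supported with async GRPO")
    warnings = []
    if not config["loss_fn"].get("use_importance_sampling_correction", False):
        warnings.append("async GRPO typically requires use_importance_sampling_correction=true")
    return warnings

cfg = {
    "grpo": {"use_dynamic_sampling": False, "reward_scaling": {"enabled": False}},
    "loss_fn": {"use_importance_sampling_correction": False},
}
print(validate_async_config(cfg))  # the importance-sampling warning fires
```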
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- dockerfiles/Dockerfile.nemo-rl
- nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml
- nemo_skills/training/nemo_rl/configs/grpo.yaml
- nemo_skills/training/nemo_rl/configs/sft-legacy-85eeb8d.yaml
- nemo_skills/training/nemo_rl/configs/sft.yaml
- nemo_skills/training/nemo_rl/start_grpo.py
- nemo_skills/utils.py
🧰 Additional context used
🪛 Hadolint (2.14.0)
dockerfiles/Dockerfile.nemo-rl
[error] 10-10: invalid flag: --keep-git-dir
(DL1000)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Greptile Review
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (26)
dockerfiles/Dockerfile.nemo-rl (7)
7-11: BuildKit-specific ADD flag requires recent Docker versions. The `--keep-git-dir=true` flag on line 10 is a BuildKit feature for `ADD` with Git URLs. The `# syntax=docker/dockerfile:1` directive on line 1 properly enables this, but be aware that:

- Older Docker versions or builds without BuildKit may fail
- Hadolint's warning is a false positive; this flag is valid with BuildKit

The stage looks good for fetching the NeMo RL repository with git metadata preserved.

41-42: Good: CVE remediation for gnupg. Proactive security fix for CVE-2025-68973. The `--only-upgrade` flag ensures it only upgrades the existing package without pulling in new dependencies.

49-49: UV version upgrade from 0.7.2 to 0.9.7. This is a significant version bump. If any compatibility issues arise with dependency resolution or lock file handling, this would be a likely suspect.
79-80: CUDA architecture targeting for H100 and B200. Setting `TORCH_CUDA_ARCH_LIST="9.0 10.0"` targets:

- 9.0: Hopper (H100)
- 10.0: Blackwell (B200/GB200)

This aligns with the PR objective of adding GB200 cluster support.

90-112: Conditional VLLM build and CVE mitigation look good. The conditional custom vLLM build is properly guarded, and the CVE mitigation for aiohttp (GHSA-mqqc-3gqh-h2x8) using `find -exec rm -rf` is an effective approach. One minor note: if `BUILD_CUSTOM_VLLM` is set but the `nemo-rl.env` file doesn't exist (e.g., build script failure), line 94 will fail the build. This is likely the desired behavior, but worth being aware of.

134-138: COPY --exclude and git unshallow logic. The `--exclude` flag is another BuildKit feature (properly enabled by the syntax directive). The conditional unshallow logic on line 138 is well designed: it checks whether the repo is shallow before attempting to fetch full history. Note: `git fetch --unshallow` requires network access. If the build environment is network-isolated after the initial clone, this could fail. The `|| true` fallback handles this gracefully, though the fingerprint generation may produce different results for shallow vs. full repos.

146-147: OSS attribution notice generation. Good compliance practice for NVIDIA open-source distribution requirements.
nemo_skills/utils.py (1)
508-512: LGTM! Lazy import pattern for optional dependency. Moving the `fire` and `fire.decorators` imports inside the function is a good approach for environments where fire is not installed. The comment clearly documents the rationale.

nemo_skills/training/nemo_rl/start_grpo.py (2)

351-371: LGTM! Async GRPO training integration. The lazy import of `async_grpo_train` and the parameter passing look correct. The additional `max_trajectory_age_steps` parameter properly sources from the async config section.

372-389: LGTM! The synchronous GRPO training path is preserved correctly as the fallback when async mode is not enabled.
nemo_skills/training/nemo_rl/configs/sft-legacy-85eeb8d.yaml (2)
1-14: LGTM! Legacy SFT configuration. The configuration provides sensible defaults. The large `max_num_epochs` and `max_num_steps` values (100000000) ensure that one of them can be set to control training duration without the other being a limiting factor.

56-115: Configuration aligns with PR objectives. The Megatron configuration with `empty_unused_memory_level: 0` is documented with the appropriate OOM warning. The commented-out `clip_grad` aligns with the PR description noting that "gradient clipping is not unified yet."

nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml (2)

33-50: Async GRPO configuration is properly documented. The async GRPO section correctly defaults to disabled and includes helpful comments about importance sampling correction requirements. This aligns well with the validation in `start_grpo.py`.

82-89: The configuration does not present a conflict. Although both `dtensor_cfg.enabled` and `megatron_cfg.enabled` are set to `true` in the YAML file, the nemo-rl pipeline code (`nemo_skills/pipeline/nemo_rl/sft.py` and `nemo_skills/pipeline/nemo_rl/grpo.py`) explicitly enforces mutual exclusivity through CLI overrides: when `backend == "megatron"`, dtensor is disabled; otherwise, megatron is disabled. Only one backend is active per training run by design.

nemo_skills/training/nemo_rl/configs/sft.yaml (3)
56-68: LGTM! Well-documented LoRA configuration. The LoRA configuration section is well structured, with helpful comments explaining each parameter, including the important note about disabling Triton when `tensor_parallel_size > 1`.

216-218: LGTM! SwanLab logger integration. The SwanLab configuration follows the established pattern used by wandb and other loggers, maintaining consistency across the logging options.

19-19: LGTM! The metric name format change to `"val:val_loss"` with the explanatory comment about prefix options is clear and helpful.

nemo_skills/training/nemo_rl/configs/grpo.yaml (9)
1-31: Good traceability with upstream commit reference. The source commit hash comment provides clear lineage for tracking configuration changes against upstream nemo-rl. The GRPO algorithm settings appear reasonable for math training workloads.

32-37: LGTM! Conservative defaults for async GRPO with clear inline documentation. The disabled state is appropriate for synchronous training mode.

39-57: LGTM! The KL divergence configuration with k3 approximation and clamping values is well documented. The reference to the Joschu blog provides helpful context for understanding the KL type choices.

59-68: LGTM! The `safetensors` format is a good choice for model serialization due to its safety and performance benefits. The metric naming convention with colon separator is consistent with the documented format.

70-100: LGTM! The policy configuration provides good flexibility with `hf_config_overrides` and clear documentation for the optimizer offloading behavior. The `dtensor_v2: true` flag aligns with the enhanced distributed tensor tooling.
154-161: Verify the warmup configuration values. `lr_warmup_iters: 13` is an unusual number. Typically warmup iterations are round numbers or percentages of total training. If this is intentional (e.g., matching a specific upstream configuration), the value is fine. Otherwise, consider whether this should be a rounder number like 10 or 100.

175-218: LGTM! The optimizer configuration correctly disables `foreach` and `fused` for DTensor compatibility, with clear documentation. The sequential scheduler setup (LinearLR warmup → ConstantLR) with a milestone at iteration 10 provides a clean warmup phase.

281-302: LGTM! Good observability defaults with GPU monitoring enabled and consistent naming across logging backends. The SwanLab integration provides additional logging flexibility.

227-254: vLLM 0.11 compatibility confirmed for these configuration settings. All cited vllm_cfg parameters (`kv_cache_dtype: "auto"`, `enforce_eager: False`, `use_deep_gemm: False`) are supported in vLLM 0.11.0 and correctly configured. The mcore_generation_config defaults for KV cache management are reasonable, and the inline comment referencing convergence considerations with torch.compile provides good context for understanding the `enforce_eager` setting.
```yaml
  gpus_per_node: 1
  num_nodes: 1

checkpoint_must_save_by: null
```
Duplicate checkpoint_must_save_by key at root level.
This key already exists under checkpointing.checkpoint_must_save_by (line 59). The root-level duplicate may be unintentional or could cause configuration loading issues depending on how OmegaConf resolves it.
Suggested fix

```diff
 cluster:
   gpus_per_node: 1
   num_nodes: 1
-
-checkpoint_must_save_by: null
```

🤖 Prompt for AI Agents
In `@nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml` at line 279,
Remove the duplicate root-level key `checkpoint_must_save_by` and rely on the
existing `checkpointing.checkpoint_must_save_by` configuration; edit the YAML to
delete the top-level `checkpoint_must_save_by: null` entry so there is only one
authoritative `checkpointing.checkpoint_must_save_by` setting, and scan for any
other duplicate top-level keys to avoid OmegaConf resolution conflicts.
```yaml
peft:
  enabled: false
  target_modules: []
  exclude_modules: []
  dim: 8
  alpha: 32
  dropout: 0.0
  dropout_position: "post"
  lora_A_init_method: "xavier"
  lora_B_init_method: "zero"
  a2a_experimental: false
  lora_dtype: None
```
`lora_dtype: None` may not parse as expected.
In YAML, `None` is typically a string literal, not a null value. If the intention is to represent a null/unset value, use `null` instead. If it's meant to be the string "None", wrap it in quotes.
Suggested fix (if null is intended)

```diff
   a2a_experimental: false
-  lora_dtype: None
+  lora_dtype: null
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
peft:
  enabled: false
  target_modules: []
  exclude_modules: []
  dim: 8
  alpha: 32
  dropout: 0.0
  dropout_position: "post"
  lora_A_init_method: "xavier"
  lora_B_init_method: "zero"
  a2a_experimental: false
  lora_dtype: null
```
🤖 Prompt for AI Agents
In `@nemo_skills/training/nemo_rl/configs/sft.yaml` around lines 96 - 108, The
YAML key lora_dtype in the peft config is currently set to the literal None
which may be parsed as a string; update the value to an explicit null if you
mean "unset" (set lora_dtype: null) or to a quoted string if you mean the string
"None" (set lora_dtype: "None"); locate the peft block (peft.enabled, peft.dim,
etc.) and make the change to lora_dtype accordingly.
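To see why the bot flags this, note that the YAML core schema resolves only a few plain scalars to null, and `None` is not among them. A hand-rolled sketch of that resolution rule (not a full YAML parser):

```python
def resolve_plain_scalar(token: str):
    """Resolve a plain YAML scalar per the core schema's null rule (sketch)."""
    if token in ("null", "Null", "NULL", "~", ""):
        return None  # these spellings parse as a null value
    return token     # anything else, including "None", stays a string

print(repr(resolve_plain_scalar("null")))  # → None
print(repr(resolve_plain_scalar("None")))  # → 'None'  (a string, not a null)
```

So `lora_dtype: None` reaches the training code as the string `"None"`, which is why an explicit `null` (or a quoted `"None"`, if a string is truly intended) is the safer spelling.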
```yaml
lr_decay_iters: ${grpo.max_num_steps}
lr_warmup_iters: 0
lr_warmup_init: 1.0e-6
lr_decay_style: "constant"
```
Let's maybe keep the original defaults? Otherwise it will be harder to set LR decay, while cosine can work as both a constant schedule and a decay schedule just by adjusting the min LR.
```yaml
  foreach: False
  fused: False

scheduler:
```
and same here, ideally keep original
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
```yaml
# makes the training sequence length divisible by the tensor parallel size
# this is useful for sequence parallel training
make_sequence_length_divisible_by: ${policy.dtensor_cfg.tensor_parallel_size}
max_grad_norm: 0.0 # megatron: Zero means no clipping, FSDP: null means no clipping
```
`max_grad_norm: 0.0` will set `clip_grad: 0.0` in the Megatron config (line 130), causing all gradients to be clipped to 0 and breaking training. The comment says "Zero means no clipping", but Megatron's `clip_grad` parameter interprets 0 as a clipping threshold. It should be `null` to disable clipping, or a positive value like 1.0.
```diff
-max_grad_norm: 0.0 # megatron: Zero means no clipping, FSDP: null means no clipping
+max_grad_norm: null # megatron: null means no clipping, FSDP: null means no clipping
```
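To illustrate the failure mode, here is a minimal clip-by-global-norm sketch (not Megatron's implementation): when `max_norm=0.0` is treated as a threshold, any nonzero gradient exceeds it and gets scaled to zero, while `None` disables clipping entirely.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale grads so their global L2 norm is at most max_norm; None disables clipping."""
    if max_norm is None:
        return grads
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-6)
        return [g * scale for g in grads]
    return grads

# With max_norm=0.0, any nonzero gradient exceeds the threshold,
# scale becomes 0.0, and every gradient is zeroed out:
print(clip_by_global_norm([0.5, -1.0], 0.0))
```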
```sh
# To fix CVE-2025-68973
apt install -y --only-upgrade gnupg
```
CVE-2025-68973 doesn't exist yet. CVE IDs follow the format CVE-YYYY-NNNNN where YYYY is the year of assignment. 2025 IDs would only be assigned in 2025. Verify this is the correct CVE number.
```yaml
scheduler:
  - name: "torch.optim.lr_scheduler.LinearLR"
    kwargs:
      start_factor: 0.1
      end_factor: 1.0
      total_iters: 10
  - name: "torch.optim.lr_scheduler.CosineAnnealingLR"
```
Changed from constant LR (start_factor=1.0, end_factor=1.0, total_iters=1) to warmup schedule (start_factor=0.1 for 10 steps). This significantly changes training behavior - LR now starts at 10% and warms up over 10 steps.
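The behavior change is easy to quantify without torch by reproducing LinearLR's factor formula (this mirrors, but is not, torch's implementation):

```python
def linear_warmup_factor(step, start_factor=0.1, end_factor=1.0, total_iters=10):
    """LR multiplier for a linear warmup matching the config above (sketch)."""
    progress = min(step, total_iters) / total_iters
    return start_factor + (end_factor - start_factor) * progress

# LR starts at 10% of the base value and reaches 100% after 10 steps:
print([round(linear_warmup_factor(t), 2) for t in (0, 5, 10, 20)])  # → [0.1, 0.55, 1.0, 1.0]
```

With the old config (`start_factor=1.0, end_factor=1.0, total_iters=1`) the factor is 1.0 at every step, i.e. a constant schedule.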
Signed-off-by: Igor Gitman <igitman@nvidia.com>
tests/gpu-tests/test_train.py (outdated)
```python
ctx=wrap_arguments(
    "++data.prompt.prompt_config=qwen/math-cot "
    "++grpo.max_num_steps=5 "
    "++grpo.lr_warmup_steps=2 "
```
`grpo.lr_warmup_steps` doesn't exist in `nemo_skills/training/nemo_rl/configs/grpo.yaml`. The test will fail with this parameter.
```diff
-    "++grpo.lr_warmup_steps=2 "
+    "++grpo.num_prompts_per_step=2 "
```
Signed-off-by: Igor Gitman <igitman@nvidia.com>
…eMo-Skills into smahdavi/nemo-rl-update
```python
hf_metadata_dir = os.path.join(args.input_dir, ".hf_metadata")

if not os.path.exists(hf_metadata_dir) and not os.path.isdir(hf_metadata_dir):
```
Logic error: this should be `or`, not `and`. If `.hf_metadata` exists as a file (not a directory), this check won't catch it.
```diff
-if not os.path.exists(hf_metadata_dir) and not os.path.isdir(hf_metadata_dir):
+if not os.path.exists(hf_metadata_dir) or not os.path.isdir(hf_metadata_dir):
```
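The failure mode is easy to demonstrate: when the path exists but is a regular file, the `and` version evaluates to False and the guard never triggers. A standalone sketch using a temp directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    hf_metadata_dir = os.path.join(root, ".hf_metadata")
    open(hf_metadata_dir, "w").close()  # exists as a FILE, not a directory

    # Buggy check: both conditions must hold, so a plain file slips through.
    buggy = not os.path.exists(hf_metadata_dir) and not os.path.isdir(hf_metadata_dir)
    # Fixed check: either condition is enough to reject the path.
    fixed = not os.path.exists(hf_metadata_dir) or not os.path.isdir(hf_metadata_dir)

    print(buggy, fixed)  # → False True
```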
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
```python
    """
    tokenizer_files = [
        "tokenizer.json",
        "tokenizer_config.json",
        "special_tokens_map.json",
        "vocab.json",
        "merges.txt",
        "added_tokens.json",
        "chat_template.jinja",
    ]
    for fname in tokenizer_files:
        src = os.path.join(tokenizer_path, fname)
```
Missing tokenizer files when `tokenizer_path` is not a local directory.
When `tokenizer_path` is a HuggingFace model ID (e.g., `meta-llama/Llama-3.2-1B`), `os.path.exists(src)` will fail for all files since it's not a local path. This will silently skip copying tokenizer files, potentially breaking the converted checkpoint.
Consider downloading the tokenizer files from HF first, or handling the case where `tokenizer_path` is a model ID.
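A defensive variant of that loop could refuse hub IDs up front instead of silently copying nothing. The function name and shortened file list here are illustrative; an actual fix for the model-ID case would fetch the files via huggingface_hub, which is omitted from this sketch:

```python
import os
import shutil

TOKENIZER_FILES = ["tokenizer.json", "tokenizer_config.json", "special_tokens_map.json"]

def copy_tokenizer_files(tokenizer_path, dest_dir):
    """Copy known tokenizer files from a LOCAL directory; refuse hub model IDs."""
    if not os.path.isdir(tokenizer_path):
        # e.g. "meta-llama/Llama-3.2-1B" -- would need a hub download here
        raise ValueError(f"{tokenizer_path!r} is not a local directory")
    copied = []
    for fname in TOKENIZER_FILES:
        src = os.path.join(tokenizer_path, fname)
        if os.path.exists(src):
            shutil.copy(src, os.path.join(dest_dir, fname))
            copied.append(fname)
    return copied
```

Raising loudly (or downloading) in the non-directory case turns a silent checkpoint corruption into an immediate, debuggable error.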
```python
hf_metadata_path = os.path.join(weights_path, "model", ".hf_metadata")
return os.path.isdir(hf_metadata_path)
```
`os.path.isdir()` check is insufficient.
If `.hf_metadata` exists as a file (not a directory), this will incorrectly return False and use the wrong conversion path. Use:

```diff
 hf_metadata_path = os.path.join(weights_path, "model", ".hf_metadata")
-return os.path.isdir(hf_metadata_path)
+return os.path.exists(hf_metadata_path) and os.path.isdir(hf_metadata_path)
```
Signed-off-by: Igor Gitman <igitman@nvidia.com>
15:57:01 2026 -0800 Add an option to limit number of tool calls (#1216) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit d820200 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 3 10:43:55 2026 -0800 Add arena-hard v2 (#1205) Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: bzantium <ryumin93@gmail.com> commit a30920e Author: Igor Gitman <igitman@nvidia.com> Date: Mon Feb 2 10:53:55 2026 -0800 Fix mkdocs warnings (#1204) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 19d7788 Author: Ivan <imoshkov@nvidia.com> Date: Mon Feb 2 23:25:13 2026 +0500 Fix infinite wait in sandbox.wait_for_sandbox (#1206) Signed-off-by: i-vainn <imoshkov@nvidia.com> commit 3e65fbf Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Fri Jan 30 19:38:38 2026 -0800 Improve tts (#1203) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 250c862 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Fri Jan 30 22:12:29 2026 +0400 SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> commit 7ded756 Author: Ivan <imoshkov@nvidia.com> Date: Fri Jan 30 09:57:41 2026 +0500 Add proper token counting to code execution model (#1184) Signed-off-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b986304 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Jan 29 17:57:07 2026 -0800 Upgrade containers (#1198) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 3b44f02 Author: Dan Lord <blahblahasdf@gmail.com> Date: Thu Jan 29 16:40:47 2026 -0800 Fix incorrect string format (#1199) Signed-off-by: dlord <dlord@nvidia.com> commit c4854b8 Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Thu Jan 29 13:43:36 2026 -0800 Update nemo-rl to latest (#1087) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor 
Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
This PR upgrades nemo-rl to the latest main.
Limitations:
Gradient clipping is not unified yet; we can merge after this is resolved.
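One change in this upgrade is that async GRPO rejects config options its code path does not yet support (e.g. dynamic sampling, reward scaling, reward shaping) instead of silently ignoring them. A minimal sketch of such a guard; the function name, config keys, and dict-based config are illustrative, not the actual nemo-rl API:

```python
# Hypothetical feature guard: when async GRPO is enabled, fail fast on
# options the async training path does not support yet. Key names below
# are assumptions for illustration only.

UNSUPPORTED_IN_ASYNC = (
    "use_dynamic_sampling",
    "reward_scaling",
    "reward_shaping",
)

def validate_async_grpo_config(cfg: dict) -> None:
    """Raise NotImplementedError if an async-incompatible option is set."""
    if not cfg.get("async_grpo", {}).get("enabled", False):
        return  # sync mode: nothing to check
    for key in UNSUPPORTED_IN_ASYNC:
        if cfg.get(key):
            raise NotImplementedError(
                f"{key} is not supported with async GRPO yet"
            )
```

Failing at config-validation time keeps an unsupported combination from surfacing as a silent training-quality bug many steps later.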