
Update nemo-rl to latest#1087

Merged
Kipok merged 22 commits into main from smahdavi/nemo-rl-update
Jan 29, 2026
Conversation

@smahdavi4
Collaborator

@smahdavi4 smahdavi4 commented Dec 10, 2025

Upgrading nemo-rl to the latest main:

  • vLLM version 0.11
  • Full support for async RL
  • RL support on GB200 clusters
  • Config changes to align with nemo-rl and fix some bugs on our side (such as clipping)

Limitations:
Gradient clipping is not yet unified; we can merge once this is resolved.

Summary by CodeRabbit

  • New Features

    • Asynchronous GRPO training mode with configurable weight update controls
    • LoRA and PEFT support for parameter-efficient fine-tuning
  • Improvements

    • Updated build infrastructure with enhanced CUDA architecture targeting
    • Extended GRPO and SFT training configurations with expanded hyperparameter options
    • Enhanced generation and optimization parameter controls for improved performance


Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
@smahdavi4 smahdavi4 requested a review from Kipok January 15, 2026 20:05
@smahdavi4 smahdavi4 marked this pull request as ready for review January 15, 2026 20:05
@greptile-apps
Contributor

greptile-apps bot commented Jan 15, 2026

Greptile Overview

Greptile Summary

This PR upgrades nemo-rl to the latest version with vLLM 0.11, adds full async RL support, and enables GB200 cluster support. The changes align with upstream nemo-rl improvements and fix several configuration issues.

Key Changes:

  • Upgraded to vLLM 0.11 with an improved Docker build process using multi-stage builds
  • Added CUDA architecture 10.0 support for GB200 clusters (TORCH_CUDA_ARCH_LIST="9.0 10.0")
  • Implemented async GRPO training mode with configurable trajectory age and optional in-flight weight updates
  • Added LoRA/PEFT support for parameter-efficient fine-tuning in SFT config
  • Enhanced KL divergence controls with configurable types (k1, k2, k3) and clamping values
  • Updated learning rate scheduler to include warmup period (10 steps) instead of constant LR
  • Fixed import issue in utils.py by moving fire imports into function scope

Critical Issue:

  • SFT config sets max_grad_norm: 0.0 which causes Megatron's clip_grad to be 0.0, potentially breaking training by clipping all gradients to zero

Minor Issues:

  • CVE-2025-68973 in Dockerfile appears to be a non-existent or future CVE ID that should be verified

Confidence Score: 2/5

  • This PR has a critical gradient clipping issue in SFT config that will break training
  • The SFT config sets max_grad_norm to 0.0 which will cause clip_grad=0.0 in Megatron, clipping all gradients to zero and preventing the model from learning. This is a blocking issue that must be fixed before merge.
  • Pay critical attention to nemo_skills/training/nemo_rl/configs/sft.yaml - the gradient clipping configuration will break training

Important Files Changed

Filename Overview
dockerfiles/Dockerfile.nemo-rl Updated to vLLM 0.11, added GB200 support (CUDA arch 10.0), improved build process with multi-stage build, added CVE fixes
nemo_skills/training/nemo_rl/configs/grpo.yaml Added async GRPO support, expanded generation configs, added KL divergence controls, updated scheduler warmup settings
nemo_skills/training/nemo_rl/configs/sft.yaml Added LoRA/PEFT support, enabled gradient clipping with max_grad_norm=0.0, which may break training for Megatron backend

Sequence Diagram

sequenceDiagram
    participant Main as start_grpo.py
    participant Config as Config Validation
    participant AsyncGRPO as async_grpo_train
    participant SyncGRPO as grpo_train
    participant Policy as Policy Model
    participant Gen as Generation Engine
    participant Env as Environment

    Main->>Config: Check async_grpo.enabled
    
    alt Async Mode Enabled
        Config->>Config: Validate unsupported features
        Config-->>Config: Check use_dynamic_sampling
        Config-->>Config: Check reward_scaling
        Config-->>Config: Check reward_shaping
        
        alt Unsupported Feature Found
            Config-->>Main: Raise NotImplementedError
        else All Features Supported
            Main->>AsyncGRPO: Initialize async training
            AsyncGRPO->>Gen: Generate trajectories (async)
            Gen->>Policy: Sample responses
            Gen->>Env: Evaluate rewards
            Note over AsyncGRPO,Gen: Trajectories can be aged<br/>(max_trajectory_age_steps)
            AsyncGRPO->>Policy: Update weights (async)
            Note over AsyncGRPO,Policy: Optional in-flight updates<br/>and KV cache recompute
            AsyncGRPO->>AsyncGRPO: Train with aged trajectories
        end
    else Sync Mode (Default)
        Main->>SyncGRPO: Initialize standard training
        SyncGRPO->>Gen: Generate trajectories (blocking)
        Gen->>Policy: Sample responses
        Gen->>Env: Evaluate rewards
        SyncGRPO->>Policy: Update weights (blocking)
        SyncGRPO->>SyncGRPO: Train on fresh trajectories
    end
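The async/sync dispatch in the diagram can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the key names (`async_grpo.enabled`, the per-feature `enabled` flags) follow the validation snippet shown later in this review, but treat them as assumptions about the config schema.

```python
def select_grpo_mode(config: dict) -> str:
    """Sketch of the async/sync branch in start_grpo.py (key names assumed)."""
    grpo_cfg = config.get("grpo", {})
    if config.get("async_grpo", {}).get("enabled", False):
        # async GRPO rejects features that are not yet supported
        for feature in ("use_dynamic_sampling", "reward_scaling", "reward_shaping"):
            if feature not in grpo_cfg:
                continue
            if feature == "use_dynamic_sampling":
                enabled = bool(grpo_cfg[feature])
            else:
                enabled = bool(grpo_cfg[feature].get("enabled", False))
            if enabled:
                raise NotImplementedError(f"{feature} is not supported with async GRPO")
        return "async"  # would dispatch to async_grpo_train
    return "sync"       # default: blocking grpo_train

print(select_grpo_mode({"async_grpo": {"enabled": False}}))              # sync
print(select_grpo_mode({"async_grpo": {"enabled": True}, "grpo": {}}))   # async
```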

@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

📝 Walkthrough

Walkthrough

This pull request introduces multi-stage Docker build infrastructure for the NeMo RL container with conditional VLLM compilation, upgrades UV tooling, and adds comprehensive training configurations for GRPO and SFT workflows including async GRPO support, LoRA configuration, and SwanLab logging integration.

Changes

Cohort / File(s) Summary
Docker multi-stage build refactoring
dockerfiles/Dockerfile.nemo-rl
Introduces nemo-rl, hermetic, and release build stages with commit-driven NeMo RL fetching, UV 0.7.2→0.9.7 upgrade, conditional custom VLLM builds, CUDA architecture targeting (9.0, 10.0), and metadata/fingerprinting support.
GRPO training configuration
nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml, nemo_skills/training/nemo_rl/configs/grpo.yaml
Adds legacy GRPO config template and updates primary config with async GRPO flags, expanded loss/checkpointing settings, KL clamping, refined generation configs (mcore + vllm), Megatron memory optimization, optimizer/scheduler refinement, and SwanLab logger support.
SFT training configuration
nemo_skills/training/nemo_rl/configs/sft-legacy-85eeb8d.yaml, nemo_skills/training/nemo_rl/configs/sft.yaml
Introduces legacy SFT config template and updates primary config with metric tracking changes, dtensor v2 enhancements, LoRA/PEFT configuration blocks, dynamic batching refinement, sequence packing adjustments, optimizer/scheduler updates, and SwanLab logger support.
Async GRPO training support
nemo_skills/training/nemo_rl/start_grpo.py
Adds conditional branching to invoke async GRPO training path when enabled, with validation for unsupported feature combinations and extended parameter passing.
Deferred fire import
nemo_skills/utils.py
Moves fire and fire_decorators imports from module scope to function scope in check_no_extra_args_fire to support environments without fire installed.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

run GPU tests

Suggested reviewers

  • Kipok
  • wedu-nvidia
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 inconclusive)
  • Title check — ❓ Inconclusive: The title 'Update nemo-rl to latest' is generic and vague, failing to convey what was actually updated or why. Resolution: consider a more descriptive title, e.g. 'Update nemo-rl with async RL support and vLLM 0.11', that highlights the main changes and value of the update.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which meets the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml`:
- Line 279: Remove the duplicate root-level key `checkpoint_must_save_by` and
rely on the existing `checkpointing.checkpoint_must_save_by` configuration; edit
the YAML to delete the top-level `checkpoint_must_save_by: null` entry so there
is only one authoritative `checkpointing.checkpoint_must_save_by` setting, and
scan for any other duplicate top-level keys to avoid OmegaConf resolution
conflicts.

In `@nemo_skills/training/nemo_rl/configs/sft.yaml`:
- Around line 96-108: The YAML key lora_dtype in the peft config is currently
set to the literal None which may be parsed as a string; update the value to an
explicit null if you mean "unset" (set lora_dtype: null) or to a quoted string
if you mean the string "None" (set lora_dtype: "None"); locate the peft block
(peft.enabled, peft.dim, etc.) and make the change to lora_dtype accordingly.
🧹 Nitpick comments (2)
dockerfiles/Dockerfile.nemo-rl (1)

149-149: Consider pinning NeMo-Skills to a specific commit for reproducibility.

Unlike the NeMo-RL repository which uses NEMO_RL_COMMIT for version pinning, NeMo-Skills is cloned from the default branch without a specific commit reference. This means builds at different times may include different NeMo-Skills code.

If reproducibility is important, consider adding a similar NEMO_SKILLS_COMMIT build argument:

♻️ Suggested improvement
+ARG NEMO_SKILLS_COMMIT=main
-RUN git clone https://github.com/NVIDIA-NeMo/Skills.git /opt/NeMo-Skills && cd /opt/NeMo-Skills && uv pip install .
+RUN git clone https://github.com/NVIDIA-NeMo/Skills.git /opt/NeMo-Skills && cd /opt/NeMo-Skills && git checkout ${NEMO_SKILLS_COMMIT} && uv pip install .
nemo_skills/training/nemo_rl/start_grpo.py (1)

331-350: Async GRPO feature validation looks good, but verify importance sampling requirement.

The feature compatibility checks correctly prevent unsupported features from being used with async GRPO. However, the config comment in grpo-legacy-85eeb8d.yaml (lines 45-47) states:

"Async GRPO requires importance sampling correction enabled. Set to true when async_grpo.enabled is true"

Consider adding a validation or warning if async_grpo.enabled is true but loss_fn.use_importance_sampling_correction is false, as this configuration dependency isn't enforced here.

Suggested validation
         for feature in unsupported_features:
             if feature not in config["grpo"]:
                 continue

             if feature == "use_dynamic_sampling":
                 if config["grpo"][feature]:
                     raise NotImplementedError(f"{feature} is not supported with async GRPO")
             else:
                 if config["grpo"][feature]["enabled"]:
                     raise NotImplementedError(f"{feature} is not supported with async GRPO")

+        # Warn if importance sampling correction is not enabled for async GRPO
+        if not config["loss_fn"].get("use_importance_sampling_correction", False):
+            print("⚠️  Warning: Async GRPO typically requires use_importance_sampling_correction=true in loss_fn")
+
         from nemo_rl.algorithms.grpo import async_grpo_train
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2571d0 and b8c0aaf.

📒 Files selected for processing (7)
  • dockerfiles/Dockerfile.nemo-rl
  • nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml
  • nemo_skills/training/nemo_rl/configs/grpo.yaml
  • nemo_skills/training/nemo_rl/configs/sft-legacy-85eeb8d.yaml
  • nemo_skills/training/nemo_rl/configs/sft.yaml
  • nemo_skills/training/nemo_rl/start_grpo.py
  • nemo_skills/utils.py
🧰 Additional context used
🪛 Hadolint (2.14.0)
dockerfiles/Dockerfile.nemo-rl

[error] 10-10: invalid flag: --keep-git-dir

(DL1000)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Greptile Review
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (26)
dockerfiles/Dockerfile.nemo-rl (7)

7-11: BuildKit-specific ADD flag requires recent Docker versions.

The --keep-git-dir=true flag on line 10 is a BuildKit feature for ADD with Git URLs. The # syntax=docker/dockerfile:1 directive on line 1 properly enables this, but be aware that:

  1. Older Docker versions or builds without BuildKit may fail
  2. Hadolint's warning is a false positive—this flag is valid with BuildKit

The stage looks good for fetching the NeMo RL repository with git metadata preserved.


41-42: Good: CVE remediation for gnupg.

Proactive security fix for CVE-2025-68973. The --only-upgrade flag ensures it only upgrades the existing package without pulling new dependencies.


49-49: UV version upgrade from 0.7.2 to 0.9.7.

This is a significant version bump. If any compatibility issues arise with dependency resolution or lock file handling, this would be a likely suspect.


79-80: CUDA architecture targeting for H100 and B200.

Setting TORCH_CUDA_ARCH_LIST="9.0 10.0" targets:

  • 9.0: Hopper (H100)
  • 10.0: Blackwell (B200/GB200)

This aligns with the PR objective of adding GB200 cluster support.


90-112: Conditional VLLM build and CVE mitigation look good.

The conditional custom VLLM build is properly guarded, and the CVE mitigation for aiohttp (GHSA-mqqc-3gqh-h2x8) using find -exec rm -rf is an effective approach.

One minor note: if BUILD_CUSTOM_VLLM is set but the nemo-rl.env file doesn't exist (e.g., build script failure), line 94 will fail the build. This is likely the desired behavior, but worth being aware of.


134-138: COPY --exclude and git unshallow logic.

The --exclude flag is another BuildKit feature (properly enabled by the syntax directive). The conditional unshallow logic on line 138 is well-designed—it checks if the repo is shallow before attempting to fetch full history.

Note: git fetch --unshallow requires network access. If the build environment is network-isolated after the initial clone, this could fail. The || true fallback handles this gracefully, though the fingerprint generation may produce different results for shallow vs. full repos.


146-147: OSS attribution notice generation.

Good compliance practice for NVIDIA open-source distribution requirements.

nemo_skills/utils.py (1)

508-512: LGTM! Lazy import pattern for optional dependency.

Moving fire and fire.decorators imports inside the function is a good approach for environments where fire is not installed. The comment clearly documents the rationale.
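A minimal illustration of the deferred-import pattern, using `json` as a stand-in for the optional `fire` dependency (the helper name here is illustrative, not the PR's code):

```python
def check_extra_args(payload: str):
    # the import runs only when the function is called, so merely importing
    # the enclosing module never requires the optional dependency
    # (json stands in for `fire` here)
    import json
    return json.loads(payload)

# the dependency is resolved on first call, not at module import time
print(check_extra_args('{"lr": 0.001}'))  # {'lr': 0.001}
```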

nemo_skills/training/nemo_rl/start_grpo.py (2)

351-371: LGTM! Async GRPO training integration.

The lazy import of async_grpo_train and the parameter passing looks correct. The additional max_trajectory_age_steps parameter properly sources from the async config section.


372-389: LGTM!

The synchronous GRPO training path is preserved correctly as the fallback when async mode is not enabled.

nemo_skills/training/nemo_rl/configs/sft-legacy-85eeb8d.yaml (2)

1-14: LGTM! Legacy SFT configuration.

The configuration provides sensible defaults. The large max_num_epochs and max_num_steps values (100000000) ensure that one of them can be set to control training duration without the other being a limiting factor.


56-115: Configuration aligns with PR objectives.

The Megatron configuration with empty_unused_memory_level: 0 is documented with the appropriate OOM warning. The commented-out clip_grad aligns with the PR description noting that "gradient clipping is not unified yet."

nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml (2)

33-50: Async GRPO configuration is properly documented.

The async GRPO section correctly defaults to disabled and includes helpful comments about importance sampling correction requirements. This aligns well with the validation in start_grpo.py.


82-89: The configuration does not present a conflict. Although both dtensor_cfg.enabled and megatron_cfg.enabled are set to true in the YAML file, the nemo-rl pipeline code (nemo_skills/pipeline/nemo_rl/sft.py and nemo_skills/pipeline/nemo_rl/grpo.py) explicitly enforces mutual exclusivity through CLI overrides: when backend=="megatron", dtensor is disabled; otherwise, megatron is disabled. Only one backend is active per training run by design.

nemo_skills/training/nemo_rl/configs/sft.yaml (3)

56-68: LGTM! Well-documented LoRA configuration.

The LoRA configuration section is well-structured with helpful comments explaining each parameter, including the important note about disabling Triton when tensor_parallel_size > 1.


216-218: LGTM! SwanLab logger integration.

The SwanLab configuration follows the established pattern used by wandb and other loggers, maintaining consistency across the logging options.


19-19: LGTM!

The metric name format change to "val:val_loss" with the explanatory comment about prefix options is clear and helpful.

nemo_skills/training/nemo_rl/configs/grpo.yaml (9)

1-31: Good traceability with upstream commit reference.

The source commit hash comment provides clear lineage for tracking configuration changes against upstream nemo-rl. The GRPO algorithm settings appear reasonable for math training workloads.


32-37: LGTM!

Conservative defaults for async GRPO with clear inline documentation. The disabled state is appropriate for synchronous training mode.


39-57: LGTM!

The KL divergence configuration with k3 approximation and clamping values is well-documented. The reference to the Joschu blog provides helpful context for understanding the KL type choices.
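For reference, the three estimators from that blog post can be written down directly. This is a sketch of the general technique (per-sample KL(p‖q) estimators for x ~ p, with r = q(x)/p(x)), not the nemo-rl implementation:

```python
import math

def kl_estimators(logp: float, logq: float):
    # per-sample estimators of KL(p || q) for x ~ p, following
    # J. Schulman's "Approximating KL Divergence" note
    logr = logq - logp           # log r, with r = q(x)/p(x)
    r = math.exp(logr)
    k1 = -logr                   # unbiased, high variance, can be negative
    k2 = 0.5 * logr ** 2         # biased, lower variance, always >= 0
    k3 = (r - 1.0) - logr        # unbiased, low variance, always >= 0
    return k1, k2, k3

print(kl_estimators(0.0, 0.0))    # identical distributions -> (0.0, 0.0, 0.0)
print(kl_estimators(-1.0, -2.0))  # p more likely than q: k1 = 1.0, k3 > 0
```

The clamping values in the config would bound r (or log r) before these formulas are applied, to keep outlier samples from dominating the estimate.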


59-68: LGTM!

The safetensors format is a good choice for model serialization due to its safety and performance benefits. The metric naming convention with colon separator is consistent with the documented format.


70-100: LGTM!

The policy configuration provides good flexibility with hf_config_overrides and clear documentation for the optimizer offloading behavior. The dtensor _v2: true flag aligns with the enhanced distributed tensor tooling.


154-161: Verify the warmup configuration values.

lr_warmup_iters: 13 is an unusual number. Typically warmup iterations are round numbers or percentages of total training. If this is intentional (e.g., matching a specific upstream configuration), the value is fine. Otherwise, consider whether this should be a rounder number like 10 or 100.


175-218: LGTM!

The optimizer configuration correctly disables foreach and fused for DTensor compatibility with clear documentation. The sequential scheduler setup (LinearLR warmup → ConstantLR) with milestone at iteration 10 provides a clean warmup phase.


281-302: LGTM!

Good observability defaults with GPU monitoring enabled and consistent naming across logging backends. The SwanLab integration provides additional logging flexibility.


227-254: vLLM 0.11 compatibility confirmed for these configuration settings.

All cited vllm_cfg parameters (kv_cache_dtype: "auto", enforce_eager: False, use_deep_gemm: False) are supported in vLLM 0.11.0 and correctly configured. The mcore_generation_config defaults for KV cache management are reasonable, and the inline comment referencing convergence considerations with torch.compile provides good context for understanding the enforce_eager setting.


gpus_per_node: 1
num_nodes: 1

checkpoint_must_save_by: null
Contributor

⚠️ Potential issue | 🟡 Minor

Duplicate checkpoint_must_save_by key at root level.

This key already exists under checkpointing.checkpoint_must_save_by (line 59). The root-level duplicate may be unintentional or could cause configuration loading issues depending on how OmegaConf resolves it.

Suggested fix
 cluster:
   gpus_per_node: 1
   num_nodes: 1
-
-checkpoint_must_save_by: null
🤖 Prompt for AI Agents
In `@nemo_skills/training/nemo_rl/configs/grpo-legacy-85eeb8d.yaml` at line 279,
Remove the duplicate root-level key `checkpoint_must_save_by` and rely on the
existing `checkpointing.checkpoint_must_save_by` configuration; edit the YAML to
delete the top-level `checkpoint_must_save_by: null` entry so there is only one
authoritative `checkpointing.checkpoint_must_save_by` setting, and scan for any
other duplicate top-level keys to avoid OmegaConf resolution conflicts.

Comment on lines +96 to 108
peft:
  enabled: false
  target_modules: []
  exclude_modules: []
  dim: 8
  alpha: 32
  dropout: 0.0
  dropout_position: "post"
  lora_A_init_method: "xavier"
  lora_B_init_method: "zero"
  a2a_experimental: false
  lora_dtype: None

Contributor

⚠️ Potential issue | 🟡 Minor

lora_dtype: None may not parse as expected.

In YAML, None is typically a string literal, not a null value. If the intention is to represent a null/unset value, use null instead. If it's meant to be the string "None", wrap it in quotes.

Suggested fix (if null is intended)
       a2a_experimental: false
-      lora_dtype: None
+      lora_dtype: null
📝 Committable suggestion

Suggested change
 peft:
   enabled: false
   target_modules: []
   exclude_modules: []
   dim: 8
   alpha: 32
   dropout: 0.0
   dropout_position: "post"
   lora_A_init_method: "xavier"
   lora_B_init_method: "zero"
   a2a_experimental: false
-  lora_dtype: None
+  lora_dtype: null
🤖 Prompt for AI Agents
In `@nemo_skills/training/nemo_rl/configs/sft.yaml` around lines 96 - 108, The
YAML key lora_dtype in the peft config is currently set to the literal None
which may be parsed as a string; update the value to an explicit null if you
mean "unset" (set lora_dtype: null) or to a quoted string if you mean the string
"None" (set lora_dtype: "None"); locate the peft block (peft.enabled, peft.dim,
etc.) and make the change to lora_dtype accordingly.
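The distinction is easy to verify with PyYAML (assuming the `yaml` package is available; in YAML 1.1 only `null`, `~`, `Null`, and the empty value resolve to null — bare `None` is a plain string):

```python
import yaml

# bare None is parsed as the string 'None'
val = yaml.safe_load("lora_dtype: None")["lora_dtype"]
print(repr(val))  # 'None'

# null is parsed as an actual null
val_fixed = yaml.safe_load("lora_dtype: null")["lora_dtype"]
print(repr(val_fixed))  # None
```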

lr_decay_iters: ${grpo.max_num_steps}
lr_warmup_iters: 0
lr_warmup_init: 1.0e-6
lr_decay_style: "constant"
Collaborator

let's maybe keep original defaults? Otherwise it will be harder to set LR decay. While cosine can work as both constant schedule as well as decay by just adjusting min lr

foreach: False
fused: False

scheduler:
Collaborator

and same here, ideally keep original

Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
@greptile-apps greptile-apps bot left a comment

3 files reviewed, 3 comments


# makes the training sequence length divisible by the tensor parallel size
# this is useful for sequence parallel training
make_sequence_length_divisible_by: ${policy.dtensor_cfg.tensor_parallel_size}
max_grad_norm: 0.0 # megatron: Zero means no clipping, FSDP: null means no clipping
Contributor

max_grad_norm: 0.0 will set clip_grad: 0.0 in Megatron config (line 130), causing all gradients to be clipped to 0 and breaking training. The comment says "Zero means no clipping" but Megatron's clip_grad parameter interprets 0 as a clipping threshold. Should be null to disable clipping, or a positive value like 1.0.

Suggested change
-max_grad_norm: 0.0 # megatron: Zero means no clipping, FSDP: null means no clipping
+max_grad_norm: null # megatron: null means no clipping, FSDP: null means no clipping
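Why 0.0 is dangerous here follows directly from the clipping math. This simplified pure-Python version mirrors what norm-based gradient clipping (as in Megatron or PyTorch's `clip_grad_norm_`) does; it is a sketch for illustration, not the framework code:

```python
def clip_by_global_norm(grads, max_norm):
    # norm-based gradient clipping: scale gradients so their global
    # L2 norm does not exceed max_norm
    total_norm = sum(g * g for g in grads) ** 0.5
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1.0:
        grads = [g * clip_coef for g in grads]
    return grads

print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))  # scaled down to norm ~1
print(clip_by_global_norm([3.0, 4.0], max_norm=0.0))  # [0.0, 0.0] -- training stalls
```

With `max_norm=0.0` the clip coefficient is exactly zero, so every gradient is multiplied by 0 and the model never updates; a null/None threshold must be handled as "skip clipping entirely", which is why the two cannot be conflated.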

Comment on lines +41 to +42
# To fix CVE-2025-68973
apt install -y --only-upgrade gnupg
Contributor

CVE-2025-68973 could not be verified against public CVE databases. Confirm this is the correct CVE ID before referencing it in the Dockerfile.

Comment on lines +207 to +213
scheduler:
- name: "torch.optim.lr_scheduler.LinearLR"
kwargs:
start_factor: 0.1
end_factor: 1.0
total_iters: 10
- name: "torch.optim.lr_scheduler.CosineAnnealingLR"
Contributor

Changed from constant LR (start_factor=1.0, end_factor=1.0, total_iters=1) to warmup schedule (start_factor=0.1 for 10 steps). This significantly changes training behavior - LR now starts at 10% and warms up over 10 steps.

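The resulting warmup can be traced with the LinearLR factor formula; a sketch matching `torch.optim.lr_scheduler.LinearLR` semantics (multiplier ramps linearly over `total_iters`, then holds):

```python
def linear_warmup_factor(step, start_factor=0.1, end_factor=1.0, total_iters=10):
    # LinearLR: LR multiplier ramps linearly from start_factor to
    # end_factor over total_iters steps, then stays at end_factor
    if step >= total_iters:
        return end_factor
    return start_factor + (end_factor - start_factor) * step / total_iters

base_lr = 1e-5
for step in (0, 5, 10):
    print(step, base_lr * linear_warmup_factor(step))
# step 0 -> ~1e-6 (10% of base), step 5 -> ~5.5e-6, step 10 -> 1e-5
```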

Kipok added 4 commits January 27, 2026 17:35
Signed-off-by: Igor Gitman <igitman@nvidia.com>
@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment


ctx=wrap_arguments(
    "++data.prompt.prompt_config=qwen/math-cot "
    "++grpo.max_num_steps=5 "
    "++grpo.lr_warmup_steps=2 "
Contributor

grpo.lr_warmup_steps doesn't exist in nemo_skills/training/nemo_rl/configs/grpo.yaml. The test will fail with this parameter.

Suggested change
-"++grpo.lr_warmup_steps=2 "
+"++grpo.num_prompts_per_step=2 "

Kipok
Kipok previously approved these changes Jan 28, 2026
@greptile-apps greptile-apps bot left a comment

No files reviewed, no comments


@Kipok Kipok dismissed their stale review January 28, 2026 05:23

tests fail

Signed-off-by: Igor Gitman <igitman@nvidia.com>
@greptile-apps greptile-apps bot left a comment

1 file reviewed, 1 comment



hf_metadata_dir = os.path.join(args.input_dir, ".hf_metadata")

if not os.path.exists(hf_metadata_dir) and not os.path.isdir(hf_metadata_dir):
Contributor

logic error: should be or not and. If .hf_metadata exists as a file (not directory), this check won't catch it.

Suggested change
-if not os.path.exists(hf_metadata_dir) and not os.path.isdir(hf_metadata_dir):
+if not os.path.exists(hf_metadata_dir) or not os.path.isdir(hf_metadata_dir):
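The difference is easy to reproduce: with a stray `.hf_metadata` file (not directory), the `and` version never fires. A self-contained demonstration, with names mirroring the snippet above:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    hf_metadata_dir = os.path.join(tmp, ".hf_metadata")
    open(hf_metadata_dir, "w").close()  # exists, but as a FILE, not a directory

    # buggy: exists() is True, so the whole condition is False and the
    # missing-metadata branch is silently skipped
    buggy = not os.path.exists(hf_metadata_dir) and not os.path.isdir(hf_metadata_dir)
    # fixed: isdir() is False, so the `or` form correctly flags the stray file
    fixed = not os.path.exists(hf_metadata_dir) or not os.path.isdir(hf_metadata_dir)
    print(buggy, fixed)  # False True
```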

Kipok added 2 commits January 27, 2026 22:06
Signed-off-by: Igor Gitman <igitman@nvidia.com>

@greptile-apps greptile-apps bot left a comment

1 file reviewed, 2 comments


Comment on lines +97 to +108
"""
tokenizer_files = [
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.json",
    "merges.txt",
    "added_tokens.json",
    "chat_template.jinja",
]
for fname in tokenizer_files:
    src = os.path.join(tokenizer_path, fname)
Contributor

missing tokenizer files when tokenizer_path is not a local directory

When tokenizer_path is a HuggingFace model ID (e.g., meta-llama/Llama-3.2-1B), os.path.exists(src) will fail for all files since it's not a local path. This will silently skip copying tokenizer files, potentially breaking the converted checkpoint.

Consider downloading tokenizer files from HF first or handling the case where tokenizer_path is a model ID.
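One possible shape for that fix, assuming `huggingface_hub.snapshot_download` as the resolver for model IDs — the helper name and surrounding code are illustrative, not the PR's actual implementation:

```python
import os
import shutil

def copy_tokenizer_files(tokenizer_path, output_dir, tokenizer_files):
    # if tokenizer_path is not a local directory, treat it as a HF model ID
    # and fetch just the tokenizer files first (assumption: the optional
    # huggingface_hub dependency is available in that case)
    if not os.path.isdir(tokenizer_path):
        from huggingface_hub import snapshot_download
        tokenizer_path = snapshot_download(tokenizer_path, allow_patterns=tokenizer_files)
    for fname in tokenizer_files:
        src = os.path.join(tokenizer_path, fname)
        if os.path.exists(src):  # some files are optional per tokenizer
            shutil.copy(src, os.path.join(output_dir, fname))
```

For a local directory this behaves like the original loop; for a model ID it downloads the tokenizer files into the HF cache before copying.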

Comment on lines +87 to +88
hf_metadata_path = os.path.join(weights_path, "model", ".hf_metadata")
return os.path.isdir(hf_metadata_path)
Contributor

os.path.isdir() check is insufficient

If .hf_metadata exists as a file (not directory), this will incorrectly return False and use the wrong conversion path. Use:

Suggested change
 hf_metadata_path = os.path.join(weights_path, "model", ".hf_metadata")
-return os.path.isdir(hf_metadata_path)
+return os.path.exists(hf_metadata_path) and os.path.isdir(hf_metadata_path)

@greptile-apps greptile-apps bot left a comment

No files reviewed, no comments


@Kipok Kipok merged commit c4854b8 into main Jan 29, 2026
6 checks passed
@Kipok Kipok deleted the smahdavi/nemo-rl-update branch January 29, 2026 21:43
sgunasekar added a commit that referenced this pull request Mar 11, 2026
commit a5da597
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Mar 6 12:13:36 2026 -0800

    Revert "Eval kit support  (#1239)" (#1294)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit b237e33
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Fri Mar 6 20:25:37 2026 +0400

    Eval kit support  (#1239)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

commit dc28bbf
Author: George Armstrong <georgea@nvidia.com>
Date:   Thu Mar 5 10:17:44 2026 -0800

    Python direct tool calling without MCP (#1286)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 12454dd
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Wed Mar 4 13:06:21 2026 -0800

    Allow het servers for nemo-rl jobs (#1223)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 8884a68
Author: Prasoon Varshney <prasoon1995@gmail.com>
Date:   Wed Mar 4 10:24:02 2026 -0800

    Support source_lang param for translation recipe (#1290)

    Signed-off-by: Prasoon Varshney <prasoonv@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 4618b19
Author: Meriem B. <113170426+ka00ri@users.noreply.github.com>
Date:   Wed Mar 4 18:59:28 2026 +0100

    Add MMLU-Pro 10% optimized subset for checkpoint selection (#1285)

    Signed-off-by: Meriem Boubdir <mboubdir@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 5ac8609
Author: Talor Abramovich <talor19@gmail.com>
Date:   Wed Mar 4 02:30:06 2026 +0200

    Add SPEED-Bench (within repo) (#1279)

    Signed-off-by: Talor Abramovich <talora@nvidia.com>
    Signed-off-by: talora <talora@nvidia.com>
    Signed-off-by: Talor Abramovich <talor19@gmail.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>

commit c31eec5
Author: George Armstrong <georgea@nvidia.com>
Date:   Tue Mar 3 12:18:15 2026 -0800

    Fix os.getlogin() crash in ns setup (#1289)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit c228e66
Author: George Armstrong <georgea@nvidia.com>
Date:   Tue Mar 3 11:04:54 2026 -0800

    Fix streaming TypeError when delta.content is None (#1267) (#1288)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit aa47923
Author: Matvei Novikov <mnovikov@nvidia.com>
Date:   Mon Mar 2 16:28:41 2026 -0800

    Add LibTrace recipe for generating domain-specific reasoning data (#1224)

    Signed-off-by: jubick1337 <mnovikov@nvidia.com>
    Signed-off-by: mnovikov <mnovikov@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 313cad7
Author: Stephen Ge <stepheng@nvidia.com>
Date:   Mon Mar 2 18:28:49 2026 -0500

    fix: clean parse-failure retries in prover (#1284)

    Signed-off-by: Stephen Ge <stepheng@nvidia.com>

commit 813cfa3
Author: George Armstrong <georgea@nvidia.com>
Date:   Mon Mar 2 15:10:08 2026 -0800

    tst: rollback inference-api to integrate (#1287)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 31735f9
Author: Valentin Mendelev <vmendelev@nvidia.com>
Date:   Mon Mar 2 23:11:25 2026 +0100

    Add backend-agnostic unified inference server with NeMo ASR and TTS backends (#1250)

    Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>

commit d4ef8c0
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Fri Feb 27 23:58:54 2026 +0400

    Update promt_config to working with openai format + inline setup (#1210)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit e879cbc
Author: George Armstrong <georgea@nvidia.com>
Date:   Fri Feb 27 10:41:23 2026 -0800

    Update noc tutorial (#1282)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit f6e3505
Author: George Armstrong <georgea@nvidia.com>
Date:   Fri Feb 27 10:17:33 2026 -0800

    Add noc reasoning tutorial (#1278)

    Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com>
    Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com>
    Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com>
    Co-authored-by: Cursor <cursoragent@cursor.com>
    Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com>
    Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com>

commit fc2072a
Author: Jiacheng Xu <jcxu@utexas.edu>
Date:   Fri Feb 27 10:10:25 2026 -0800

    CritPt generation add prompt_format=None (#1280)

    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit c8abe5d
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 27 09:31:26 2026 -0800

    New slurm customization parameters (account, containers) (#1209)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 2b38cce
Author: George Armstrong <georgea@nvidia.com>
Date:   Wed Feb 25 17:59:52 2026 -0800

    Add nemo-skills-core subpackage for lightweight installs (#1229)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 9fa8e83
Author: Dheeraj Peri <peri.dheeraj@gmail.com>
Date:   Wed Feb 25 12:56:35 2026 -0800

    feat: add custom judge type support for external repo integration (#1274)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Signed-off-by: bzantium <ryumin93@gmail.com>
    Signed-off-by: Dheeraj Peri <dperi@nvidia.com>
    Signed-off-by: suriya <sgunasekar@nvidia.com>
    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Minho Ryu <ryumin93@gmail.com>
    Co-authored-by: Yongqiang Wang <yongqiang.seagull@gmail.com>
    Co-authored-by: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
    Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
    Co-authored-by: Jiacheng Xu <jcxu@utexas.edu>
    Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>

commit 8a32b13
Author: Igor Gitman <igitman@nvidia.com>
Date:   Tue Feb 24 15:24:42 2026 -0800

    Exclude numb3rs form test_eval.py (#1275)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 6da2219
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Mon Feb 23 18:37:46 2026 +0400

    Numb3rs ds addition (#1174)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

commit ad034b5
Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
Date:   Sun Feb 22 11:55:24 2026 -0800

    Add DSBench-DA evaluation (#1254)

    Squash merge of changes during code-review.
    Signed-off-by: suriya <sgunasekar@nvidia.com>

commit 7593ab3
Author: Jiacheng Xu <jcxu@utexas.edu>
Date:   Fri Feb 20 16:42:01 2026 -0800

    Add CritPt benchmark (#1200)

    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 58c31b2
Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
Date:   Fri Feb 20 16:19:22 2026 -0800

    Fix no_answer metric overcounting in _compute_pass_at_k (#1245)

    Signed-off-by: suriya <sgunasekar@nvidia.com>
    Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 1f1a2e7
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 20 15:58:40 2026 -0800

    Fix incorrect prompt tokens count due to HF api update (#1264)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 8ebc6f5
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 20 09:05:33 2026 -0800

    Remove deprecated dataset group (#1263)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit ea4177f
Author: Yongqiang Wang <yongqiang.seagull@gmail.com>
Date:   Thu Feb 19 19:57:25 2026 -0500

    fix deps (#1258)

commit 60905a7
Author: Minho Ryu <ryumin93@gmail.com>
Date:   Fri Feb 20 09:39:39 2026 +0900

    Add aime26 (#1256)

    Signed-off-by: bzantium <ryumin93@gmail.com>

commit b28afc5
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 16:18:25 2026 -0800

    Rename custom -> external benchmarks (#1262)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 6cc9c45
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 16:10:33 2026 -0800

    Add reference to internal benchmarks repo (#1261)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 5202af6
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 16:08:05 2026 -0800

    Remove incorrect presence-penalty setting (#1259)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 144c70b
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 19 15:26:33 2026 -0800

    Adding an option to store benchmarks in external repo (#1240)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 10e6e39
Author: George <37293288+Jorjeous@users.noreply.github.com>
Date:   Thu Feb 19 19:57:21 2026 +0400

    update vllm miltimodal for api calls convenience (#1213)

    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
    Co-authored-by: mmkrtchyan <mmkrtchyan@nvidia.com>

commit 1ba4219
Author: Nick Ludwig <nliudvig@nvidia.com>
Date:   Wed Feb 18 03:28:23 2026 +0400

    Fix --server_container not being applied to dependent jobs (#1244)

    Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 9517614
Author: Wasi Ahmad <wasiahmad@ucla.edu>
Date:   Mon Feb 16 11:13:24 2026 -0800

    Support mini-swe-agent as agent harness (#1212)

    Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
    Signed-off-by: i-vainn <imoshkov@nvidia.com>
    Signed-off-by: George Armstrong <georgea@nvidia.com>
    Signed-off-by: Charlie Truong <chtruong@nvidia.com>
    Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
    Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com>
    Signed-off-by: bzantium <ryumin93@gmail.com>
    Signed-off-by: Stephen Ge <stepheng@nvidia.com>
    Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com>
    Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
    Signed-off-by: Mateusz Winiarek <mwiniarek@nvidia.com>
    Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com>
    Signed-off-by: Wei Du <wedu@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com>
    Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
    Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com>
    Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>
    Co-authored-by: Ivan <imoshkov@nvidia.com>
    Co-authored-by: George Armstrong <georgea@nvidia.com>
    Co-authored-by: Charlie Truong <chtruong@nvidia.com>
    Co-authored-by: Nick Ludwig <nliudvig@nvidia.com>
    Co-authored-by: Wojciech Prazuch <wojciechprazuch3@gmail.com>
    Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>
    Co-authored-by: Minho Ryu <ryumin93@gmail.com>
    Co-authored-by: Stephen Ge <stepheng@nvidia.com>
    Co-authored-by: Jiacheng Xu <jcxu@utexas.edu>
    Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com>
    Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>
    Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com>
    Co-authored-by: Mateusz Winiarek <72758259+Froxyy-dev@users.noreply.github.com>
    Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
    Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com>
    Co-authored-by: Wei Du <wedu@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Sean Naren <snarenthiran@nvidia.com>
    Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com>
    Co-authored-by: anowaczynski-nvidia <anowaczynski@nvidia.com>
    Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>

commit a3d44dc
Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com>
Date:   Fri Feb 13 22:32:15 2026 -0800

    Add --installation_command support to prepare_data (#1243)

    Signed-off-by: suriya <sgunasekar@nvidia.com>
    Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

commit e80d524
Author: George Armstrong <georgea@nvidia.com>
Date:   Thu Feb 12 17:26:00 2026 -0800

    Fix CI disk space for Docker image builds (#1241)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit d22236c
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Wed Feb 11 17:55:00 2026 -0800

    Fix answerbench prompt parsing (#1235)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 2401628
Author: George Armstrong <georgea@nvidia.com>
Date:   Wed Feb 11 14:56:43 2026 -0800

    feat: add lockfiles for reproducible sandbox builds (#1233)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 5a0a84d
Author: Wasi Ahmad <wasiahmad@ucla.edu>
Date:   Wed Feb 11 13:30:03 2026 -0800

    removing datasets version restriction for LCB eval (#1230)

    Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

commit ef0a890
Author: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>
Date:   Wed Feb 11 12:03:16 2026 +0400

    Gnalbandyan/add physics (#1214)

    Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com>
    Signed-off-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com>

commit bd9d30c
Author: Wasi Ahmad <wasiahmad@ucla.edu>
Date:   Tue Feb 10 15:13:27 2026 -0800

    LCB generic prompting (#1215)

    Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

commit 7d6c49a
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Sat Feb 7 08:45:46 2026 -0800

    Add support for different variations of nemo-rl (#1220)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit b19ba96
Author: George Armstrong <georgea@nvidia.com>
Date:   Fri Feb 6 21:40:56 2026 -0800

    Add multi-node sandbox support for SLURM clusters (#1218)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 8950bb0
Author: anowaczynski-nvidia <anowaczynski@nvidia.com>
Date:   Sat Feb 7 01:38:00 2026 +0100

    support structured outputs in hle judge for optional AA compatibility (#1186)

    Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit b84f7a2
Author: Igor Gitman <igitman@nvidia.com>
Date:   Fri Feb 6 14:51:02 2026 -0800

    A small update on running tests docs (#1219)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 8e838e1
Author: George Armstrong <georgea@nvidia.com>
Date:   Thu Feb 5 18:01:35 2026 -0800

    feat: add flag to disable sandbox replay (#1217)

    Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 5fd9085
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Feb 5 15:57:01 2026 -0800

    Add an option to limit number of tool calls (#1216)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit d820200
Author: Igor Gitman <igitman@nvidia.com>
Date:   Tue Feb 3 10:43:55 2026 -0800

    Add arena-hard v2 (#1205)

    Signed-off-by: bzantium <ryumin93@gmail.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: bzantium <ryumin93@gmail.com>

commit a30920e
Author: Igor Gitman <igitman@nvidia.com>
Date:   Mon Feb 2 10:53:55 2026 -0800

    Fix mkdocs warnings (#1204)

    Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 19d7788
Author: Ivan <imoshkov@nvidia.com>
Date:   Mon Feb 2 23:25:13 2026 +0500

    Fix infinite wait in sandbox.wait_for_sandbox (#1206)

    Signed-off-by: i-vainn <imoshkov@nvidia.com>

commit 3e65fbf
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Fri Jan 30 19:38:38 2026 -0800

    Improve tts (#1203)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 250c862
Author: Nick Ludwig <nliudvig@nvidia.com>
Date:   Fri Jan 30 22:12:29 2026 +0400

    SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202)

    Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>

commit 7ded756
Author: Ivan <imoshkov@nvidia.com>
Date:   Fri Jan 30 09:57:41 2026 +0500

     Add proper token counting to code execution model (#1184)

    Signed-off-by: i-vainn <imoshkov@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit b986304
Author: Igor Gitman <igitman@nvidia.com>
Date:   Thu Jan 29 17:57:07 2026 -0800

    Upgrade containers (#1198)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 3b44f02
Author: Dan Lord <blahblahasdf@gmail.com>
Date:   Thu Jan 29 16:40:47 2026 -0800

    Fix incorrect string format (#1199)

    Signed-off-by: dlord <dlord@nvidia.com>

commit c4854b8
Author: Sadegh Mahdavi <smahdavi4@gmail.com>
Date:   Thu Jan 29 13:43:36 2026 -0800

    Update nemo-rl to latest (#1087)

    Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
    Signed-off-by: Igor Gitman <igitman@nvidia.com>
    Co-authored-by: Igor Gitman <igitman@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: Igor Gitman <igitman@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>