Merged
8 changes: 4 additions & 4 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -10,15 +10,15 @@ List issues that this PR closes ([syntax](https://docs.github.com/en/issues/trac
* **You can potentially add a usage example below**

```python
# Add a code snippet demonstrating how to use this
# Add a code snippet demonstrating how to use this
```

# Before your PR is "Ready for review"
**Pre checks**:
- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA/NeMo-RL/blob/main/CONTRIBUTING.md)
- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md)
- [ ] Did you write any new necessary tests?
- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA/NeMo-RL/blob/main/docs/testing.md) for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA/NeMo-RL/blob/main/docs/documentation.md) for how to write, build and test the docs.
- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA-NeMo/RL/blob/main/docs/testing.md) for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA-NeMo/RL/blob/main/docs/documentation.md) for how to write, build and test the docs.

# Additional Information
* ...
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -31,7 +31,7 @@ We follow a direct clone and branch workflow for now:

1. Clone the repository directly:
```bash
git clone https://github.com/NVIDIA/NeMo-RL
git clone https://github.com/NVIDIA-NeMo/RL
cd nemo-rl
```

10 changes: 5 additions & 5 deletions README.md
@@ -73,7 +73,7 @@ cd nemo-rl
# by running (This is not necessary if you are using the pure Pytorch/DTensor path):
git submodule update --init --recursive

# Different branches of the repo can have different pinned versions of these third-party submodules. Ensure
# Different branches of the repo can have different pinned versions of these third-party submodules. Ensure
# submodules are automatically updated after switching branches or pulling updates by configuring git with:
# git config submodule.recurse true

@@ -226,7 +226,7 @@ sbatch \
We also support multi-turn generation and training (tool use, games, etc.).
Reference example for training to play a Sliding Puzzle Game:
```sh
uv run python examples/run_grpo_sliding_puzzle.py
uv run python examples/run_grpo_sliding_puzzle.py
```

## Supervised Fine-Tuning (SFT)
@@ -409,16 +409,16 @@ If you use NeMo RL in your research, please cite it using the following BibTeX e
```bibtex
@misc{nemo-rl,
title = {NeMo RL: A Scalable and Efficient Post-Training Library},
howpublished = {\url{https://github.com/NVIDIA/NeMo-RL}},
howpublished = {\url{https://github.com/NVIDIA-NeMo/RL}},
year = {2025},
note = {GitHub repository},
}
```

## Contributing

We welcome contributions to NeMo RL\! Please see our [Contributing Guidelines](https://github.com/NVIDIA/NeMo-RL/blob/main/CONTRIBUTING.md) for more information on how to get involved.
We welcome contributions to NeMo RL\! Please see our [Contributing Guidelines](https://github.com/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md) for more information on how to get involved.

## Licenses

NVIDIA NeMo RL is licensed under the [Apache License 2.0](https://github.com/NVIDIA/NeMo-RL/blob/main/LICENSE).
NVIDIA NeMo RL is licensed under the [Apache License 2.0](https://github.com/NVIDIA-NeMo/RL/blob/main/LICENSE).
32 changes: 16 additions & 16 deletions docs/adding-new-models.md
@@ -12,7 +12,7 @@ $$\text{KL} = E_{x \sim \pi}[\pi(x) - \pi_{\text{ref}}(x)]$$

When summed/integrated, replacing the $x \sim \pi$ with $x \sim \pi_{\text{wrong}}$ leads to an error of:

$$\sum_{x} \left( \pi(x) - \pi_{\text{ref}}(x) \right) \left( \pi_{\text{wrong}}(x) - \pi(x) \right)$$
$$\sum_{x} \left( \pi(x) - \pi_{\text{ref}}(x) \right) \left( \pi_{\text{wrong}}(x) - \pi(x) \right)$$

So, to verify correctness, we calculate:

@@ -65,28 +65,28 @@ When investigating discrepancies beyond the acceptable threshold, focus on these

When validating Hugging Face-based models, perform the following checks:

- **Compare log probabilities**
- **Compare log probabilities**
Ensure the generation log probabilities from inference backends like **vLLM** match those computed by Hugging Face. This comparison helps diagnose potential mismatches.

- **Test parallelism**
- **Test parallelism**
Verify consistency with other parallelism settings.

- **Variance**
- **Variance**
Repeat tests multiple times (e.g., 10 runs) to confirm that behavior is deterministic or within acceptable variance.

- **Check sequence lengths**
Perform inference on sequence lengths of 100, 1,000, and 10,000 tokens.
- **Check sequence lengths**
Perform inference on sequence lengths of 100, 1,000, and 10,000 tokens.
Ensure the model behaves consistently at each length.

- **Use real and dummy data**
- **Real data:** Tokenize and generate from actual text samples.
- **Use real and dummy data**
- **Real data:** Tokenize and generate from actual text samples.
- **Dummy data:** Simple numeric sequences to test basic generation.

- **Vary sampling parameters**
Test both greedy and sampling generation modes.
- **Vary sampling parameters**
Test both greedy and sampling generation modes.
Adjust temperature and top-p to confirm output consistency across backends.

- **Test different batch sizes**
- **Test different batch sizes**
Try with batch sizes of 1, 8, and 32 to ensure consistent behavior across different batch configurations.
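The log-probability comparison described in this checklist can be sketched as a simple per-token tolerance check. This is an illustrative sketch only: the function name, the example values, and the `1e-3` tolerance are assumptions for demonstration, not values prescribed by the repository.

```python
import math

def logprobs_close(ref_logprobs, test_logprobs, atol=1e-3):
    """Return True if two per-token log-probability sequences agree within atol.

    ref_logprobs / test_logprobs: lists of floats, e.g. one from the Hugging Face
    forward pass and one from an inference backend such as vLLM.
    atol: illustrative absolute tolerance; choose it to match your error threshold.
    """
    if len(ref_logprobs) != len(test_logprobs):
        # A length mismatch usually means tokenization differs between backends.
        return False
    return all(
        math.isclose(a, b, abs_tol=atol)
        for a, b in zip(ref_logprobs, test_logprobs)
    )

# Example: the two backends agree up to noise below the tolerance.
hf_lp = [-0.105, -2.31, -0.007]
vllm_lp = [-0.1051, -2.3102, -0.0069]
print(logprobs_close(hf_lp, vllm_lp))  # True
```

Repeating such a check across the sequence lengths, batch sizes, and sampling settings listed above helps separate deterministic mismatches from acceptable numerical variance.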

---
@@ -95,11 +95,11 @@ When validating Hugging Face-based models, perform the following checks:

### Additional Validation

- **Compare Megatron outputs**
- **Compare Megatron outputs**
Ensure the Megatron forward pass aligns with Hugging Face and the generation log probabilities from inference backends like **vLLM**.

- **Parallel settings**
Match the same parallelism configurations used for the HuggingFace-based tests.
- **Parallel settings**
Match the same parallelism configurations used for the HuggingFace-based tests.
Confirm outputs remain consistent across repeated runs.

---
@@ -128,7 +128,7 @@ By following these validation steps and ensuring your model's outputs remain con
We also maintain a set of standalone scripts that can be used to diagnose issues related to correctness that
we have encountered before.

## [1.max_model_len_respected.py](https://github.com/NVIDIA/NeMo-RL/blob/main/tools/model_diagnostics/1.max_model_len_respected.py)
## [1.max_model_len_respected.py](https://github.com/NVIDIA-NeMo/RL/blob/main/tools/model_diagnostics/1.max_model_len_respected.py)

Test if a new model respects the `max_model_len` passed to vllm:

@@ -142,7 +142,7 @@ uv run --extra vllm tools/model_diagnostics/1.max_model_len_respected.py Qwen/Qw
# [Qwen/Qwen2.5-1.5B] ALL GOOD!
```

## [2.long_generation_decode_vs_prefill](https://github.com/NVIDIA/NeMo-RL/blob/main/tools/model_diagnostics/2.long_generation_decode_vs_prefill.py)
## [2.long_generation_decode_vs_prefill](https://github.com/NVIDIA-NeMo/RL/blob/main/tools/model_diagnostics/2.long_generation_decode_vs_prefill.py)

Test that vLLM yields near-identical token log-probabilities when comparing decoding with a single prefill pass across multiple prompts.

2 changes: 1 addition & 1 deletion docs/model-quirks.md
@@ -6,7 +6,7 @@ This document outlines special cases and model-specific behaviors that require c

### Tied Weights

Weight tying between the embedding layer (`model.embed_tokens`) and output layer (`lm_head`) is currently not respected when using the FSDP1 policy or the DTensor policy when TP > 1 (See [this issue](https://github.com/NVIDIA/NeMo-RL/issues/227)). To avoid errors when training these models, we only allow training models with tied weights using the DTensor policy with TP=1. For Llama-3 and Qwen2.5 models, weight-tying is only enabled for the smaller models (< 2B), which can typically be trained without tensor parallelism. For Gemma-3, all model sizes have weight-tying enabled, including the larger models which require tensor parallelism. To support training of these models, we specially handle the Gemma-3 models by allowing training using the DTensor policy with TP > 1.
Weight tying between the embedding layer (`model.embed_tokens`) and output layer (`lm_head`) is currently not respected when using the FSDP1 policy or the DTensor policy when TP > 1 (See [this issue](https://github.com/NVIDIA-NeMo/RL/issues/227)). To avoid errors when training these models, we only allow training models with tied weights using the DTensor policy with TP=1. For Llama-3 and Qwen2.5 models, weight-tying is only enabled for the smaller models (< 2B), which can typically be trained without tensor parallelism. For Gemma-3, all model sizes have weight-tying enabled, including the larger models which require tensor parallelism. To support training of these models, we specially handle the Gemma-3 models by allowing training using the DTensor policy with TP > 1.

**Special Handling:**
- We skip the tied weights check for all Gemma-3 models when using the DTensor policy, allowing training using TP > 1.
2 changes: 1 addition & 1 deletion examples/configs/grpo-deepscaler-1.5b-8K.yaml
@@ -30,7 +30,7 @@ checkpointing:
save_period: 10

policy:
# Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA/NeMo-RL/issues/227)
# Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA-NeMo/RL/issues/227)
model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer:
name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
2 changes: 1 addition & 1 deletion examples/configs/grpo_math_1B.yaml
@@ -30,7 +30,7 @@ checkpointing:
save_period: 10

policy:
# Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA/NeMo-RL/issues/227)
# Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA-NeMo/RL/issues/227)
model_name: "Qwen/Qwen2.5-1.5B"
tokenizer:
name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
4 changes: 2 additions & 2 deletions examples/configs/grpo_sliding_puzzle.yaml
@@ -24,7 +24,7 @@ policy:
max_new_tokens: ${policy.max_total_sequence_length}
temperature: 1.0
# Setting top_p/top_k to 0.999/10000 to strip out Qwen's special/illegal tokens
# https://github.com/NVIDIA/NeMo-RL/issues/237
# https://github.com/NVIDIA-NeMo/RL/issues/237
top_p: 0.999
top_k: 10000
stop_token_ids: null
@@ -38,7 +38,7 @@ policy:

data:
add_system_prompt: false

env:
sliding_puzzle_game:
cfg:
@@ -45,7 +45,7 @@ policy:
context_parallel_size: 1
custom_parallel_plan: null
dynamic_batching:
# TODO: OOMs if enabled https://github.com/NVIDIA/NeMo-RL/issues/383
# TODO: OOMs if enabled https://github.com/NVIDIA-NeMo/RL/issues/383
enabled: False
train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
2 changes: 1 addition & 1 deletion examples/converters/convert_dcp_to_hf.py
@@ -51,7 +51,7 @@ def main():

model_name_or_path = config["policy"]["model_name"]
# TODO: After the following PR gets merged:
# https://github.com/NVIDIA/NeMo-RL/pull/148/files
# https://github.com/NVIDIA-NeMo/RL/pull/148/files
# tokenizer should be copied from policy/tokenizer/* instead of relying on the model name
# We can expose a arg at the top level --tokenizer_path to plumb that through.
# This is more stable than relying on the current NeMo-RL get_tokenizer() which can
2 changes: 1 addition & 1 deletion nemo_rl/algorithms/dpo.py
@@ -71,7 +71,7 @@ class DPOConfig(TypedDict):
preference_average_log_probs: bool
sft_average_log_probs: bool
## TODO(@ashors) support other loss functions
## https://github.com/NVIDIA/NeMo-RL/issues/193
## https://github.com/NVIDIA-NeMo/RL/issues/193
# preference_loss: str
# gt_reward_scale: float
preference_loss_weight: float
2 changes: 1 addition & 1 deletion nemo_rl/distributed/ray_actor_environment_registry.py
@@ -17,7 +17,7 @@
ACTOR_ENVIRONMENT_REGISTRY: dict[str, str] = {
"nemo_rl.models.generation.vllm.VllmGenerationWorker": PY_EXECUTABLES.VLLM,
# Temporary workaround for the coupled implementation of DTensorPolicyWorker and vLLM.
# This will be reverted to PY_EXECUTABLES.BASE once https://github.com/NVIDIA/NeMo-RL/issues/501 is resolved.
# This will be reverted to PY_EXECUTABLES.BASE once https://github.com/NVIDIA-NeMo/RL/issues/501 is resolved.
"nemo_rl.models.policy.dtensor_policy_worker.DTensorPolicyWorker": PY_EXECUTABLES.VLLM,
"nemo_rl.models.policy.fsdp1_policy_worker.FSDP1PolicyWorker": PY_EXECUTABLES.BASE,
"nemo_rl.models.policy.megatron_policy_worker.MegatronPolicyWorker": PY_EXECUTABLES.MCORE,
2 changes: 1 addition & 1 deletion nemo_rl/models/generation/vllm.py
@@ -330,7 +330,7 @@ def _patch_vllm_init_workers_ray():
enable_prefix_caching=torch.cuda.get_device_capability()[0] >= 8,
dtype=self.cfg["vllm_cfg"]["precision"],
seed=seed,
# Don't use cuda-graph by default as it leads to convergence issues (see https://github.com/NVIDIA/NeMo-RL/issues/186)
# Don't use cuda-graph by default as it leads to convergence issues (see https://github.com/NVIDIA-NeMo/RL/issues/186)
enforce_eager=True,
max_model_len=self.cfg["vllm_cfg"]["max_model_len"],
trust_remote_code=True,
4 changes: 2 additions & 2 deletions nemo_rl/models/policy/dtensor_policy_worker.py
@@ -162,7 +162,7 @@ def __init__(
device_map="cpu", # load weights onto CPU initially
# Always load the model in float32 to keep master weights in float32.
# Keeping the master weights in lower precision has shown to cause issues with convergence.
# https://github.com/NVIDIA/NeMo-RL/issues/279 will fix the issue of CPU OOM for larger models.
# https://github.com/NVIDIA-NeMo/RL/issues/279 will fix the issue of CPU OOM for larger models.
torch_dtype=torch.float32,
trust_remote_code=True,
**sliding_window_overwrite(
@@ -381,7 +381,7 @@ def train(
and not self.skip_tie_check
):
raise ValueError(
f"Using dtensor policy with tp size {self.cfg['dtensor_cfg']['tensor_parallel_size']} for model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA/NeMo-RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead."
f"Using dtensor policy with tp size {self.cfg['dtensor_cfg']['tensor_parallel_size']} for model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA-NeMo/RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead."
)
if gbs is None:
gbs = self.cfg["train_global_batch_size"]
6 changes: 3 additions & 3 deletions nemo_rl/models/policy/fsdp1_policy_worker.py
@@ -96,7 +96,7 @@ def __init__(
device_map="cpu", # load weights onto CPU initially
# Always load the model in float32 to keep master weights in float32.
# Keeping the master weights in lower precision has shown to cause issues with convergence.
# https://github.com/NVIDIA/NeMo-RL/issues/279 will fix the issue of CPU OOM for larger models.
# https://github.com/NVIDIA-NeMo/RL/issues/279 will fix the issue of CPU OOM for larger models.
torch_dtype=torch.float32,
trust_remote_code=True,
**sliding_window_overwrite(
@@ -110,7 +110,7 @@ def __init__(
self.reference_model = AutoModelForCausalLM.from_pretrained(
model_name,
device_map="cpu", # load weights onto CPU initially
torch_dtype=torch.float32,  # use full precision in sft until https://github.com/NVIDIA/nemo-rl/issues/13 is fixed
torch_dtype=torch.float32,  # use full precision in sft until https://github.com/NVIDIA-NeMo/RL/issues/13 is fixed
trust_remote_code=True,
**sliding_window_overwrite(
model_name
@@ -249,7 +249,7 @@ def train(
skip_tie_check = os.environ.get("NRL_SKIP_TIED_WEIGHT_CHECK")
if self.num_tied_weights != 0 and not skip_tie_check:
raise ValueError(
f"Using FSP1 with a model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA/NeMo-RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead."
f"Using FSP1 with a model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA-NeMo/RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead."
)

if gbs is None:
4 changes: 2 additions & 2 deletions nemo_rl/package_info.py
@@ -28,8 +28,8 @@
__contact_names__ = "NVIDIA"
__contact_emails__ = "nemo-tookit@nvidia.com"
__homepage__ = "https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/"
__repository_url__ = "https://github.com/NVIDIA/NeMo-RL"
__download_url__ = "https://github.com/NVIDIA/NeMo-RL/releases"
__repository_url__ = "https://github.com/NVIDIA-NeMo/RL"
__download_url__ = "https://github.com/NVIDIA-NeMo/RL/releases"
__description__ = "NeMo-RL - a toolkit for model alignment"
__license__ = "Apache2"
__keywords__ = "deep learning, machine learning, gpu, NLP, NeMo, nvidia, pytorch, torch, language, reinforcement learning, RLHF, preference modeling, SteerLM, DPO"
2 changes: 1 addition & 1 deletion nemo_rl/utils/native_checkpoint.py
@@ -248,7 +248,7 @@ def convert_dcp_to_hf(
config.save_pretrained(hf_ckpt_path)

# TODO: After the following PR gets merged:
# https://github.com/NVIDIA/NeMo-RL/pull/148/files
# https://github.com/NVIDIA-NeMo/RL/pull/148/files
# tokenizer should be copied from policy/tokenizer/* instead of relying on the model name
# We can expose a arg at the top level --tokenizer_path to plumb that through.
# This is more stable than relying on the current NeMo-RL get_tokenizer() which can
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -101,7 +101,7 @@ test = [
megatron-core = { workspace = true }
nemo-tron = { workspace = true }
# The NeMo Run source to be used by nemo-tron
nemo_run = { git = "https://github.com/NVIDIA/NeMo-Run", rev = "414f0077c648fde2c71bb1186e97ccbf96d6844c" }
nemo_run = { git = "https://github.com/NVIDIA-NeMo/Run", rev = "414f0077c648fde2c71bb1186e97ccbf96d6844c" }
# torch/torchvision/triton all come from the torch index in order to pick up aarch64 wheels
torch = [
{ index = "pytorch-cu128" },
2 changes: 1 addition & 1 deletion tests/functional/dpo.sh
@@ -36,7 +36,7 @@ uv run $PROJECT_ROOT/examples/run_dpo.py \
uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS

# TODO: threshold set higher since test is flaky
# https://github.com/NVIDIA/NeMo-RL/issues/370
# https://github.com/NVIDIA-NeMo/RL/issues/370
uv run tests/check_metrics.py $JSON_METRICS \
'data["train/loss"]["3"] < 0.8'

13 changes: 13 additions & 0 deletions tests/functional/test_converter_roundtrip.py
@@ -1,3 +1,16 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#!/usr/bin/env python3
"""
Functional test for converter roundtrip functionality.
@@ -32,7 +32,7 @@ uv run examples/run_sft.py \
# Convert tensorboard logs to json
uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS

# TODO: the memory check is known to OOM. see https://github.com/NVIDIA/NeMo-RL/issues/263
# TODO: the memory check is known to OOM. see https://github.com/NVIDIA-NeMo/RL/issues/263
# Only run metrics if the target step is reached
if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
# TODO: FIGURE OUT CORRECT METRICS