Merged
Changes from 4 commits
8 changes: 4 additions & 4 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -10,15 +10,15 @@ List issues that this PR closes ([syntax](https://docs.github.com/en/issues/trac
* **You can potentially add a usage example below**

```python
# Add a code snippet demonstrating how to use this
# Add a code snippet demonstrating how to use this
```

# Before your PR is "Ready for review"
**Pre checks**:
- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA/NeMo-RL/blob/main/CONTRIBUTING.md)
- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md)
- [ ] Did you write any new necessary tests?
- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA/NeMo-RL/blob/main/docs/testing.md) for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA/NeMo-RL/blob/main/docs/documentation.md) for how to write, build and test the docs.
- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA-NeMo/RL/blob/main/docs/testing.md) for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA-NeMo/RL/blob/main/docs/documentation.md) for how to write, build and test the docs.

# Additional Information
* ...
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -31,7 +31,7 @@ We follow a direct clone and branch workflow for now:

1. Clone the repository directly:
```bash
git clone https://github.com/NVIDIA/NeMo-RL
git clone https://github.com/NVIDIA-NeMo/RL
cd nemo-rl
```

10 changes: 5 additions & 5 deletions README.md
@@ -71,7 +71,7 @@ cd nemo-rl
# by running (This is not necessary if you are using the pure Pytorch/DTensor path):
git submodule update --init --recursive

# Different branches of the repo can have different pinned versions of these third-party submodules. Ensure
# Different branches of the repo can have different pinned versions of these third-party submodules. Ensure
# submodules are automatically updated after switching branches or pulling updates by configuring git with:
# git config submodule.recurse true

@@ -203,7 +203,7 @@ sbatch \
We also support multi-turn generation and training (tool use, games, etc.).
Reference example for training to play a Sliding Puzzle Game:
```sh
uv run python examples/run_grpo_sliding_puzzle.py
uv run python examples/run_grpo_sliding_puzzle.py
```

## Supervised Fine-Tuning (SFT)
@@ -367,16 +367,16 @@ If you use NeMo RL in your research, please cite it using the following BibTeX e
```bibtex
@misc{nemo-rl,
title = {NeMo RL: A Scalable and Efficient Post-Training Library},
howpublished = {\url{https://github.com/NVIDIA/NeMo-RL}},
howpublished = {\url{https://github.com/NVIDIA-NeMo/RL}},
year = {2025},
note = {GitHub repository},
}
```

## Contributing

We welcome contributions to NeMo RL\! Please see our [Contributing Guidelines](https://github.com/NVIDIA/NeMo-RL/blob/main/CONTRIBUTING.md) for more information on how to get involved.
We welcome contributions to NeMo RL\! Please see our [Contributing Guidelines](https://github.com/NVIDIA-NeMo/RL/blob/main/CONTRIBUTING.md) for more information on how to get involved.

## Licenses

NVIDIA NeMo RL is licensed under the [Apache License 2.0](https://github.com/NVIDIA/NeMo-RL/blob/main/LICENSE).
NVIDIA NeMo RL is licensed under the [Apache License 2.0](https://github.com/NVIDIA-NeMo/RL/blob/main/LICENSE).
32 changes: 16 additions & 16 deletions docs/adding-new-models.md
@@ -12,7 +12,7 @@ $$\text{KL} = E_{x \sim \pi}[\pi(x) - \pi_{\text{ref}}(x)]$$

When summed/integrated, replacing the $x \sim \pi$ with $x \sim \pi_{\text{wrong}}$ leads to an error of:

$$\sum_{x} \left( \pi(x) - \pi_{\text{ref}}(x) \right) \left( \pi_{\text{wrong}}(x) - \pi(x) \right)$$
$$\sum_{x} \left( \pi(x) - \pi_{\text{ref}}(x) \right) \left( \pi_{\text{wrong}}(x) - \pi(x) \right)$$
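
As a quick numerical sanity check of the expression above, the following sketch uses three made-up toy distributions (the values of `pi`, `pi_ref`, and `pi_wrong` are illustrative assumptions, not taken from any model). It confirms that the gap between the expectation under $\pi_{\text{wrong}}$ and the expectation under $\pi$ equals the summed error term.

```python
# Toy distributions over a 3-token vocabulary (hypothetical values for illustration).
pi = [0.5, 0.3, 0.2]
pi_ref = [0.4, 0.4, 0.2]
pi_wrong = [0.6, 0.2, 0.2]


def expectation(weights, values):
    """Return sum_x weights[x] * values[x]."""
    return sum(w * v for w, v in zip(weights, values))


# Integrand from the estimator above: f(x) = pi(x) - pi_ref(x).
f = [p - r for p, r in zip(pi, pi_ref)]

correct = expectation(pi, f)         # expectation under x ~ pi
mistaken = expectation(pi_wrong, f)  # expectation under x ~ pi_wrong

# Error term: sum_x (pi(x) - pi_ref(x)) * (pi_wrong(x) - pi(x)).
error = sum(fx * (w - p) for fx, w, p in zip(f, pi_wrong, pi))

# The difference between the two expectations matches the error term.
assert abs((mistaken - correct) - error) < 1e-12
```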

So, to verify correctness, we calculate:

@@ -65,28 +65,28 @@ When investigating discrepancies beyond the acceptable threshold, focus on these

When validating Hugging Face-based models, perform the following checks:

- **Compare log probabilities**
- **Compare log probabilities**
Ensure the generation log probabilities from inference backends like **vLLM** match those computed by Hugging Face. This comparison helps diagnose potential mismatches.

- **Test parallelism**
- **Test parallelism**
Verify consistency with other parallelism settings.

- **Variance**
- **Variance**
Repeat tests multiple times (e.g., 10 runs) to confirm that behavior is deterministic or within acceptable variance.

- **Check sequence lengths**
Perform inference on sequence lengths of 100, 1,000, and 10,000 tokens.
- **Check sequence lengths**
Perform inference on sequence lengths of 100, 1,000, and 10,000 tokens.
Ensure the model behaves consistently at each length.

- **Use real and dummy data**
- **Real data:** Tokenize and generate from actual text samples.
- **Use real and dummy data**
- **Real data:** Tokenize and generate from actual text samples.
- **Dummy data:** Simple numeric sequences to test basic generation.

- **Vary sampling parameters**
Test both greedy and sampling generation modes.
- **Vary sampling parameters**
Test both greedy and sampling generation modes.
Adjust temperature and top-p to confirm output consistency across backends.

- **Test different batch sizes**
- **Test different batch sizes**
Try with batch sizes of 1, 8, and 32 to ensure consistent behavior across different batch configurations.
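
The log-probability comparison in the checklist above could be sketched as follows. This is a minimal illustration in plain Python: `compare_logprobs` and the `1e-2` tolerance are hypothetical and not part of NeMo RL, and the two input lists stand in for per-token log probabilities obtained from Hugging Face and from an inference backend such as vLLM.

```python
import math


def compare_logprobs(reference, candidate, tolerance=1e-2):
    """Compare two equal-length lists of per-token log probabilities.

    Returns (max_abs_diff, mean_ratio, within_tolerance).
    The tolerance is an assumed placeholder, not a recommended threshold.
    """
    if len(reference) != len(candidate):
        raise ValueError("log-probability sequences must have the same length")
    if not reference:
        raise ValueError("log-probability sequences must be non-empty")

    abs_diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    max_abs_diff = max(abs_diffs)
    # exp(|delta|) is the multiplicative gap between the two token probabilities.
    mean_ratio = sum(math.exp(d) for d in abs_diffs) / len(abs_diffs)
    return max_abs_diff, mean_ratio, max_abs_diff <= tolerance


# Made-up values standing in for real model outputs.
hf_logprobs = [-0.10, -2.30, -0.05]
backend_logprobs = [-0.10, -2.305, -0.05]
max_diff, mean_ratio, ok = compare_logprobs(hf_logprobs, backend_logprobs)
print(f"max |delta|={max_diff:.4f}, mean ratio={mean_ratio:.4f}, ok={ok}")
```

Running the same comparison across the sequence lengths, batch sizes, and sampling settings listed above would show whether a mismatch is systematic or confined to one configuration.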

---
@@ -95,11 +95,11 @@ When validating Hugging Face-based models, perform the following checks:

### Additional Validation

- **Compare Megatron outputs**
- **Compare Megatron outputs**
Ensure the Megatron forward pass aligns with Hugging Face and the generation log probabilities from inference backends like **vLLM**.

- **Parallel settings**
Match the same parallelism configurations used for the HuggingFace-based tests.
- **Parallel settings**
Match the same parallelism configurations used for the HuggingFace-based tests.
Confirm outputs remain consistent across repeated runs.

---
@@ -128,7 +128,7 @@ By following these validation steps and ensuring your model's outputs remain con
We also maintain a set of standalone scripts that can be used to diagnose issues related to correctness that
we have encountered before.

## [1.max_model_len_respected.py](https://github.com/NVIDIA/NeMo-RL/blob/main/tools/model_diagnostics/1.max_model_len_respected.py)
## [1.max_model_len_respected.py](https://github.com/NVIDIA-NeMo/RL/blob/main/tools/model_diagnostics/1.max_model_len_respected.py)

Test if a new model respects the `max_model_len` passed to vllm:

@@ -142,7 +142,7 @@ uv run --extra vllm tools/model_diagnostics/1.max_model_len_respected.py Qwen/Qw
# [Qwen/Qwen2.5-1.5B] ALL GOOD!
```

## [2.long_generation_decode_vs_prefill](https://github.com/NVIDIA/NeMo-RL/blob/main/tools/model_diagnostics/2.long_generation_decode_vs_prefill.py)
## [2.long_generation_decode_vs_prefill](https://github.com/NVIDIA-NeMo/RL/blob/main/tools/model_diagnostics/2.long_generation_decode_vs_prefill.py)

Test that vLLM yields near-identical token log-probabilities when comparing decoding with a single prefill pass across multiple prompts.

2 changes: 1 addition & 1 deletion docs/model-quirks.md
@@ -6,7 +6,7 @@ This document outlines special cases and model-specific behaviors that require c

### Tied Weights

Weight tying between the embedding layer (`model.embed_tokens`) and output layer (`lm_head`) is currently not respected when using the FSDP1 policy or the DTensor policy when TP > 1 (See [this issue](https://github.com/NVIDIA/NeMo-RL/issues/227)). To avoid errors when training these models, we only allow training models with tied weights using the DTensor policy with TP=1. For Llama-3 and Qwen2.5 models, weight-tying is only enabled for the smaller models (< 2B), which can typically be trained without tensor parallelism. For Gemma-3, all model sizes have weight-tying enabled, including the larger models which require tensor parallelism. To support training of these models, we specially handle the Gemma-3 models by allowing training using the DTensor policy with TP > 1.
Weight tying between the embedding layer (`model.embed_tokens`) and output layer (`lm_head`) is currently not respected when using the FSDP1 policy or the DTensor policy when TP > 1 (See [this issue](https://github.com/NVIDIA-NeMo/RL/issues/227)). To avoid errors when training these models, we only allow training models with tied weights using the DTensor policy with TP=1. For Llama-3 and Qwen2.5 models, weight-tying is only enabled for the smaller models (< 2B), which can typically be trained without tensor parallelism. For Gemma-3, all model sizes have weight-tying enabled, including the larger models which require tensor parallelism. To support training of these models, we specially handle the Gemma-3 models by allowing training using the DTensor policy with TP > 1.

**Special Handling:**
- We skip the tied weights check for all Gemma-3 models when using the DTensor policy, allowing training using TP > 1.
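
The rule described above can be summarized in a small standalone sketch. `check_tied_weights_supported` and its arguments are hypothetical names chosen for illustration, and the function simplifies the actual checks in the policy workers; it mirrors only the decision logic stated in this section.

```python
def check_tied_weights_supported(
    policy, num_tied_weights, tensor_parallel_size=1, skip_tie_check=False
):
    """Raise ValueError for tied-weight setups this section describes as unsupported.

    policy: "fsdp1" or "dtensor" (hypothetical labels for the two policies).
    skip_tie_check: models the special handling that bypasses the check
        (for example, Gemma-3 with the DTensor policy).
    """
    if num_tied_weights == 0 or skip_tie_check:
        return
    if policy == "fsdp1":
        raise ValueError(
            "tied weights are not supported with FSDP1; use DTensor with TP=1"
        )
    if policy == "dtensor" and tensor_parallel_size > 1:
        raise ValueError(
            "tied weights are not supported with DTensor when TP > 1; use TP=1"
        )


# DTensor with TP=1 passes; the same model with TP=2 would raise ValueError.
check_tied_weights_supported("dtensor", num_tied_weights=1, tensor_parallel_size=1)
```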
2 changes: 1 addition & 1 deletion examples/configs/grpo-deepscaler-1.5b-8K.yaml
@@ -30,7 +30,7 @@ checkpointing:
save_period: 10

policy:
# Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA/NeMo-RL/issues/227)
# Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA-NeMo/RL/issues/227)
model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer:
name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
2 changes: 1 addition & 1 deletion examples/configs/grpo_math_1B.yaml
@@ -30,7 +30,7 @@ checkpointing:
save_period: 10

policy:
# Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA/NeMo-RL/issues/227)
# Qwen/Qwen2.5-1.5B has tied weights which are only supported with dtensor policy with tp size 1 (https://github.com/NVIDIA-NeMo/RL/issues/227)
model_name: "Qwen/Qwen2.5-1.5B"
tokenizer:
name: ${policy.model_name} ## specify if you'd like to use a tokenizer different from the model's default
4 changes: 2 additions & 2 deletions examples/configs/grpo_sliding_puzzle.yaml
@@ -24,7 +24,7 @@ policy:
max_new_tokens: ${policy.max_total_sequence_length}
temperature: 1.0
# Setting top_p/top_k to 0.999/10000 to strip out Qwen's special/illegal tokens
# https://github.com/NVIDIA/NeMo-RL/issues/237
# https://github.com/NVIDIA-NeMo/RL/issues/237
top_p: 0.999
top_k: 10000
stop_token_ids: null
@@ -38,7 +38,7 @@ policy:

data:
add_system_prompt: false

env:
sliding_puzzle_game:
cfg:
Original file line number Diff line number Diff line change
@@ -45,7 +45,7 @@ policy:
context_parallel_size: 1
custom_parallel_plan: null
dynamic_batching:
# TODO: OOMs if enabled https://github.com/NVIDIA/NeMo-RL/issues/383
# TODO: OOMs if enabled https://github.com/NVIDIA-NeMo/RL/issues/383
enabled: False
train_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.train_micro_batch_size}}
logprob_mb_tokens: ${mul:${policy.max_total_sequence_length}, ${policy.logprob_batch_size}}
2 changes: 1 addition & 1 deletion examples/convert_dcp_to_hf.py
@@ -51,7 +51,7 @@ def main():

model_name_or_path = config["policy"]["model_name"]
# TODO: After the following PR gets merged:
# https://github.com/NVIDIA/NeMo-RL/pull/148/files
# https://github.com/NVIDIA-NeMo/RL/pull/148/files
# tokenizer should be copied from policy/tokenizer/* instead of relying on the model name
# We can expose a arg at the top level --tokenizer_path to plumb that through.
# This is more stable than relying on the current NeMo-RL get_tokenizer() which can
2 changes: 1 addition & 1 deletion nemo_rl/algorithms/dpo.py
@@ -71,7 +71,7 @@ class DPOConfig(TypedDict):
preference_average_log_probs: bool
sft_average_log_probs: bool
## TODO(@ashors) support other loss functions
## https://github.com/NVIDIA/NeMo-RL/issues/193
## https://github.com/NVIDIA-NeMo/RL/issues/193
# preference_loss: str
# gt_reward_scale: float
preference_loss_weight: float
2 changes: 1 addition & 1 deletion nemo_rl/distributed/ray_actor_environment_registry.py
@@ -17,7 +17,7 @@
ACTOR_ENVIRONMENT_REGISTRY: dict[str, str] = {
"nemo_rl.models.generation.vllm.VllmGenerationWorker": PY_EXECUTABLES.VLLM,
# Temporary workaround for the coupled implementation of DTensorPolicyWorker and vLLM.
# This will be reverted to PY_EXECUTABLES.BASE once https://github.com/NVIDIA/NeMo-RL/issues/501 is resolved.
# This will be reverted to PY_EXECUTABLES.BASE once https://github.com/NVIDIA-NeMo/RL/issues/501 is resolved.
"nemo_rl.models.policy.dtensor_policy_worker.DTensorPolicyWorker": PY_EXECUTABLES.VLLM,
"nemo_rl.models.policy.fsdp1_policy_worker.FSDP1PolicyWorker": PY_EXECUTABLES.BASE,
"nemo_rl.models.policy.megatron_policy_worker.MegatronPolicyWorker": PY_EXECUTABLES.MCORE,
2 changes: 1 addition & 1 deletion nemo_rl/models/generation/vllm.py
@@ -330,7 +330,7 @@ def _patch_vllm_init_workers_ray():
enable_prefix_caching=torch.cuda.get_device_capability()[0] >= 8,
dtype=self.cfg["vllm_cfg"]["precision"],
seed=seed,
# Don't use cuda-graph by default as it leads to convergence issues (see https://github.com/NVIDIA/NeMo-RL/issues/186)
# Don't use cuda-graph by default as it leads to convergence issues (see https://github.com/NVIDIA-NeMo/RL/issues/186)
enforce_eager=True,
max_model_len=self.cfg["vllm_cfg"]["max_model_len"],
trust_remote_code=True,
4 changes: 2 additions & 2 deletions nemo_rl/models/policy/dtensor_policy_worker.py
@@ -162,7 +162,7 @@ def __init__(
device_map="cpu", # load weights onto CPU initially
# Always load the model in float32 to keep master weights in float32.
# Keeping the master weights in lower precision has shown to cause issues with convergence.
# https://github.com/NVIDIA/NeMo-RL/issues/279 will fix the issue of CPU OOM for larger models.
# https://github.com/NVIDIA-NeMo/RL/issues/279 will fix the issue of CPU OOM for larger models.
torch_dtype=torch.float32,
trust_remote_code=True,
**sliding_window_overwrite(
@@ -381,7 +381,7 @@ def train(
and not self.skip_tie_check
):
raise ValueError(
f"Using dtensor policy with tp size {self.cfg['dtensor_cfg']['tensor_parallel_size']} for model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA/NeMo-RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead."
f"Using dtensor policy with tp size {self.cfg['dtensor_cfg']['tensor_parallel_size']} for model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA-NeMo/RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead."
)
if gbs is None:
gbs = self.cfg["train_global_batch_size"]
6 changes: 3 additions & 3 deletions nemo_rl/models/policy/fsdp1_policy_worker.py
@@ -96,7 +96,7 @@ def __init__(
device_map="cpu", # load weights onto CPU initially
# Always load the model in float32 to keep master weights in float32.
# Keeping the master weights in lower precision has shown to cause issues with convergence.
# https://github.com/NVIDIA/NeMo-RL/issues/279 will fix the issue of CPU OOM for larger models.
# https://github.com/NVIDIA-NeMo/RL/issues/279 will fix the issue of CPU OOM for larger models.
torch_dtype=torch.float32,
trust_remote_code=True,
**sliding_window_overwrite(
@@ -110,7 +110,7 @@ def __init__(
self.reference_model = AutoModelForCausalLM.from_pretrained(
model_name,
device_map="cpu", # load weights onto CPU initially
torch_dtype=torch.float32, # use full precision in sft until https://github.com/NVIDIA/nemo-rl/issues/13 is fixed
torch_dtype=torch.float32, # use full precision in sft until https://github.com/NVIDIA-NeMo/RL/issues/13 is fixed
trust_remote_code=True,
**sliding_window_overwrite(
model_name
@@ -249,7 +249,7 @@ def train(
skip_tie_check = os.environ.get("NRL_SKIP_TIED_WEIGHT_CHECK")
if self.num_tied_weights != 0 and not skip_tie_check:
raise ValueError(
f"Using FSP1 with a model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA/NeMo-RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead."
f"Using FSP1 with a model ({self.cfg['model_name']}) that has tied weights (num_tied_weights={self.num_tied_weights}) is not supported (https://github.com/NVIDIA-NeMo/RL/issues/227). Please use dtensor policy with tensor parallel == 1 instead."
)

if gbs is None:
4 changes: 2 additions & 2 deletions nemo_rl/package_info.py
@@ -28,8 +28,8 @@
__contact_names__ = "NVIDIA"
__contact_emails__ = "nemo-tookit@nvidia.com"
__homepage__ = "https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/"
__repository_url__ = "https://github.com/NVIDIA/NeMo-RL"
__download_url__ = "https://github.com/NVIDIA/NeMo-RL/releases"
__repository_url__ = "https://github.com/NVIDIA-NeMo/RL"
__download_url__ = "https://github.com/NVIDIA-NeMo/RL/releases"
__description__ = "NeMo-RL - a toolkit for model alignment"
__license__ = "Apache2"
__keywords__ = "deep learning, machine learning, gpu, NLP, NeMo, nvidia, pytorch, torch, language, reinforcement learning, RLHF, preference modeling, SteerLM, DPO"
2 changes: 1 addition & 1 deletion nemo_rl/utils/native_checkpoint.py
@@ -248,7 +248,7 @@ def convert_dcp_to_hf(
config.save_pretrained(hf_ckpt_path)

# TODO: After the following PR gets merged:
# https://github.com/NVIDIA/NeMo-RL/pull/148/files
# https://github.com/NVIDIA-NeMo/RL/pull/148/files
# tokenizer should be copied from policy/tokenizer/* instead of relying on the model name
# We can expose a arg at the top level --tokenizer_path to plumb that through.
# This is more stable than relying on the current NeMo-RL get_tokenizer() which can
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -101,7 +101,7 @@ test = [
megatron-core = { workspace = true }
nemo-tron = { workspace = true }
# The NeMo Run source to be used by nemo-tron
nemo_run = { git = "https://github.com/NVIDIA/NeMo-Run", rev = "414f0077c648fde2c71bb1186e97ccbf96d6844c" }
nemo_run = { git = "https://github.com/NVIDIA-NeMo/Run", rev = "414f0077c648fde2c71bb1186e97ccbf96d6844c" }
# torch/torchvision/triton all come from the torch index in order to pick up aarch64 wheels
torch = [
{ index = "pytorch-cu128" },
2 changes: 1 addition & 1 deletion tests/functional/dpo.sh
@@ -36,7 +36,7 @@ uv run $PROJECT_ROOT/examples/run_dpo.py \
uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS

# TODO: threshold set higher since test is flaky
# https://github.com/NVIDIA/NeMo-RL/issues/370
# https://github.com/NVIDIA-NeMo/RL/issues/370
uv run tests/check_metrics.py $JSON_METRICS \
'data["train/loss"]["3"] < 0.8'

Original file line number Diff line number Diff line change
@@ -32,7 +32,7 @@ uv run examples/run_sft.py \
# Convert tensorboard logs to json
uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS

# TODO: the memory check is known to OOM. see https://github.com/NVIDIA/NeMo-RL/issues/263
# TODO: the memory check is known to OOM. see https://github.com/NVIDIA-NeMo/RL/issues/263
# Only run metrics if the target step is reached
if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
# TODO: FIGURE OUT CORRECT METRICS
Original file line number Diff line number Diff line change
@@ -31,7 +31,7 @@ uv run examples/run_sft.py \
# Convert tensorboard logs to json
uv run tests/json_dump_tb_logs.py $LOG_DIR --output_path $JSON_METRICS

# TODO: memory check will fail due to OOM tracked here https://github.com/NVIDIA/NeMo-RL/issues/263
# TODO: memory check will fail due to OOM tracked here https://github.com/NVIDIA-NeMo/RL/issues/263

# Only run metrics if the target step is reached
if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@ SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd)
source $SCRIPT_DIR/common.env

# TODO: this config can crash on OOM
# https://github.com/NVIDIA/NeMo-RL/issues/263
# https://github.com/NVIDIA-NeMo/RL/issues/263

# ===== BEGIN CONFIG =====
NUM_NODES=4