Merged
6 changes: 3 additions & 3 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -15,10 +15,10 @@ List issues that this PR closes ([syntax](https://docs.github.com/en/issues/trac

# Before your PR is "Ready for review"
**Pre checks**:
- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA/reinforcer/blob/main/CONTRIBUTING.md)
- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA/nemo-rl/blob/main/CONTRIBUTING.md)
- [ ] Did you write any new necessary tests?
- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA/reinforcer/blob/main/docs/testing.md) for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA/reinforcer/blob/main/docs/documentation.md) for how to write, build and test the docs.
- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA/nemo-rl/blob/main/docs/testing.md) for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA/nemo-rl/blob/main/docs/documentation.md) for how to write, build and test the docs.

# Additional Information
* ...
24 changes: 12 additions & 12 deletions .github/workflows/_run_test.yml
@@ -68,7 +68,7 @@ jobs:

- name: Docker pull image
run: |
docker pull nemoci.azurecr.io/nemo_reinforcer_container:${{ github.run_id }}
docker pull nemoci.azurecr.io/nemo_rl_container:${{ github.run_id }}

- name: Checkout repository
uses: actions/checkout@v4
@@ -80,22 +80,22 @@ jobs:
docker run --rm -u root -d --name nemo_container_${{ github.run_id }} --runtime=nvidia --gpus all --shm-size=64g \
--env TRANSFORMERS_OFFLINE=0 \
--env HYDRA_FULL_ERROR=1 \
--env HF_HOME=/home/TestData/reinforcer/hf_home \
--env HF_DATASETS_CACHE=/home/TestData/reinforcer/hf_datasets_cache \
--env REINFORCER_REPO_DIR=/opt/reinforcer \
--env HF_HOME=/home/TestData/nemo-rl/hf_home \
--env HF_DATASETS_CACHE=/home/TestData/nemo-rl/hf_datasets_cache \
--env NEMO_RL_REPO_DIR=/opt/nemo-rl \
--env HF_TOKEN \
--volume $GITHUB_WORKSPACE:/opt/reinforcer \
--volume $GITHUB_WORKSPACE:/opt/nemo-rl \
--volume $GITHUB_ACTION_DIR:$GITHUB_ACTION_DIR \
--volume /mnt/datadrive/TestData/reinforcer/datasets:/opt/reinforcer/datasets:ro \
--volume /mnt/datadrive/TestData/reinforcer/checkpoints:/home/TestData/reinforcer/checkpoints:ro \
--volume /mnt/datadrive/TestData/reinforcer/hf_home/hub:/home/TestData/reinforcer/hf_home/hub \
--volume /mnt/datadrive/TestData/reinforcer/hf_datasets_cache:/home/TestData/reinforcer/hf_datasets_cache \
nemoci.azurecr.io/nemo_reinforcer_container:${{ github.run_id }} \
--volume /mnt/datadrive/TestData/nemo-rl/datasets:/opt/nemo-rl/datasets:ro \
--volume /mnt/datadrive/TestData/nemo-rl/checkpoints:/home/TestData/nemo-rl/checkpoints:ro \
--volume /mnt/datadrive/TestData/nemo-rl/hf_home/hub:/home/TestData/nemo-rl/hf_home/hub \
--volume /mnt/datadrive/TestData/nemo-rl/hf_datasets_cache:/home/TestData/nemo-rl/hf_datasets_cache \
nemoci.azurecr.io/nemo_rl_container:${{ github.run_id }} \
bash -c "sleep $(( ${{ inputs.TIMEOUT }} * 60 + 60 ))"

- name: Run unit tests
run: |
docker exec nemo_container_${{ github.run_id }} git config --global --add safe.directory /opt/reinforcer
docker exec nemo_container_${{ github.run_id }} git config --global --add safe.directory /opt/nemo-rl
docker exec nemo_container_${{ github.run_id }} bash -eux -o pipefail -c "
# This is needed since we create virtualenvs in the workspace, so this allows it to be cleaned up if necessary
umask 000
@@ -141,6 +141,6 @@ jobs:
if: always()
run: |
# Ensure any added files in the mounted directory are owned by the runner user to allow it to clean up
docker exec nemo_container_${{ github.run_id }} bash -c "find /opt/reinforcer -path '/opt/reinforcer/datasets' -prune -o -exec chown $(id -u):$(id -g) {} +"
docker exec nemo_container_${{ github.run_id }} bash -c "find /opt/nemo-rl -path '/opt/nemo-rl/datasets' -prune -o -exec chown $(id -u):$(id -g) {} +"
docker container stop nemo_container_${{ github.run_id }} || true
docker container rm nemo_container_${{ github.run_id }} || true
18 changes: 9 additions & 9 deletions .github/workflows/cicd-main.yml
@@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: "CICD Reinforcer"
name: "CICD NeMo RL"

on:
pull_request:
@@ -136,12 +136,12 @@ jobs:
uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_build_container.yml@v0.22.7
with:
build-ref: ${{ github.sha }}
image-name: nemo_reinforcer_container
image-name: nemo_rl_container
dockerfile: docker/Dockerfile
image-label: nemo-reinforcer
image-label: nemo-rl
build-args: |
MAX_JOBS=32
REINFORCER_COMMIT=${{ github.sha }}
NEMO_RL_COMMIT=${{ github.sha }}

tests:
name: Tests
@@ -152,21 +152,21 @@
RUNNER: self-hosted-azure
TIMEOUT: 60
UNIT_TEST_SCRIPT: |
cd /opt/reinforcer
cd /opt/nemo-rl
if [[ "${{ needs.pre-flight.outputs.test_level }}" =~ ^(L0|L1|L2)$ ]]; then
uv run --no-sync bash -x ./tests/run_unit.sh
else
echo Skipping unit tests for docs-only level
fi
DOC_TEST_SCRIPT: |
cd /opt/reinforcer/docs
cd /opt/nemo-rl/docs
if [[ "${{ needs.pre-flight.outputs.test_level }}" =~ ^(docs|L0|L1|L2)$ ]]; then
uv run --no-sync sphinx-build -b doctest . _build/doctest
else
echo Skipping doc tests for level ${{ needs.pre-flight.outputs.test_level }}
fi
FUNCTIONAL_TEST_SCRIPT: |
cd /opt/reinforcer
cd /opt/nemo-rl
if [[ "${{ needs.pre-flight.outputs.test_level }}" =~ ^(L1|L2)$ ]]; then
uv run --no-sync bash ./tests/functional/sft.sh
uv run --no-sync bash ./tests/functional/grpo.sh
@@ -177,7 +177,7 @@
fi
# TODO: enable once we have convergence tests in CI
#CONVERGENCE_TEST_SCRIPT: |
# cd /opt/reinforcer
# cd /opt/nemo-rl
# if [[ "${{ needs.pre-flight.outputs.test_level }}" =~ ^(L2)$ ]]; then
# echo "Running convergence tests"
# # Add your convergence test commands here
@@ -186,7 +186,7 @@
# echo "Skipping convergence tests for level ${{ needs.pre-flight.outputs.test_level }}"
# fi
AFTER_SCRIPT: |
cd /opt/reinforcer
cd /opt/nemo-rl
cat <<EOF | tee -a $GITHUB_STEP_SUMMARY
# Test Summary for level: ${{ needs.pre-flight.outputs.test_level }}

4 changes: 2 additions & 2 deletions .github/workflows/release-freeze.yml
@@ -36,8 +36,8 @@ jobs:
code-freeze:
uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_code_freeze.yml@v0.22.5
with:
library-name: NeMo-reinforcer
python-package: nemo_reinforcer
library-name: NeMo-RL
python-package: nemo_rl
release-type: ${{ inputs.release-type }}
freeze-commit: ${{ inputs.freeze-commit }}
dry-run: ${{ inputs.dry-run }}
6 changes: 3 additions & 3 deletions .github/workflows/release.yaml
@@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: "Release Reinforcer"
name: "Release NeMo-RL"

on:
workflow_dispatch:
@@ -35,9 +35,9 @@ jobs:
uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_release_library.yml@v0.22.6
with:
release-ref: ${{ inputs.release-ref }}
python-package: nemo_reinforcer
python-package: nemo_rl
python-version: "3.11"
library-name: NeMo-Reinforcer
library-name: NeMo-RL
dry-run: ${{ inputs.dry-run }}
version-bump-branch: ${{ inputs.version-bump-branch }}
secrets:
18 changes: 9 additions & 9 deletions CONTRIBUTING.md
@@ -1,16 +1,16 @@
# Contributing To Nemo-Reinforcer
# Contributing To Nemo-RL

Thanks for your interest in contributing to Nemo-Reinforcer!
Thanks for your interest in contributing to Nemo-RL!

## Setting Up

### Development Environment

1. **Build and run the Docker container**:
```bash
docker buildx build -t nemo-reinforcer -f Dockerfile .
# Run the container with your local nemo-reinforcer directory mounted
docker run -it --gpus all -v /path/to/nemo-reinforcer:/workspace/nemo-reinforcer nemo-reinforcer
docker buildx build -t nemo-rl -f Dockerfile .
# Run the container with your local nemo-rl directory mounted
docker run -it --gpus all -v /path/to/nemo-rl:/workspace/nemo-rl nemo-rl
```

## Making Changes
@@ -19,7 +19,7 @@ docker run -it --gpus all -v /path/to/nemo-reinforcer:/workspace/nemo-reinforcer

#### Before You Start: Install pre-commit

From the [`nemo-reinforcer` root directory](.), run:
From the [`nemo-rl` root directory](.), run:
```bash
python3 -m pip install pre-commit
pre-commit install
```

@@ -31,8 +31,8 @@ We follow a direct clone and branch workflow for now:

1. Clone the repository directly:
```bash
git clone https://github.com/NVIDIA/reinforcer
cd reinforcer
git clone https://github.com/NVIDIA/nemo-rl
cd nemo-rl
```

2. Create a new branch for your changes:
@@ -69,7 +69,7 @@ This ensures that all significant changes are well-thought-out and properly docu
1. **User Adoption**: Helps users understand how to effectively use the library's features in their projects
2. **Developer Extensibility**: Enables developers to understand the internal architecture and implementation details, making it easier to modify, extend, or adapt the code for their specific use cases

Quality documentation is essential for both the usability of Nemo-Reinforcer and its ability to be customized by the community.
Quality documentation is essential for both the usability of Nemo-RL and its ability to be customized by the community.

## Code Quality

18 changes: 9 additions & 9 deletions README.md
@@ -1,7 +1,7 @@
# Nemo-Reinforcer: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to >100B Parameters, scaling from 1 GPU to 100s
# Nemo-RL: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to >100B Parameters, scaling from 1 GPU to 100s

<!-- markdown all in one -->
- [Nemo-Reinforcer: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to \>100B Parameters, scaling from 1 GPU to 100s](#nemo-reinforcer-a-scalable-and-efficient-post-training-library-for-models-ranging-from-tiny-to-100b-parameters-scaling-from-1-gpu-to-100s)
- [Nemo-RL: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to \>100B Parameters, scaling from 1 GPU to 100s](#nemo-rl-a-scalable-and-efficient-post-training-library-for-models-ranging-from-tiny-to-100b-parameters-scaling-from-1-gpu-to-100s)
- [Features](#features)
- [Prerequisuites](#prerequisuites)
- [Quick start](#quick-start)
@@ -17,7 +17,7 @@
- [Multi-node](#multi-node-2)
- [Cluster Start](#cluster-start)

**Nemo-Reinforcer** is a scalable and efficient post-training library designed for models ranging from 1 GPU to thousands, and from tiny to over 100 billion parameters.
**Nemo-RL** is a scalable and efficient post-training library designed for models ranging from 1 GPU to thousands, and from tiny to over 100 billion parameters.

What you can expect:

@@ -52,8 +52,8 @@ What you can expect:

Clone **NeMo RL**
```sh
git clone git@github.com:NVIDIA/reinforcer.git
cd reinforcer
git clone git@github.com:NVIDIA/nemo-rl.git
cd nemo-rl
```

Install `uv`
@@ -111,7 +111,7 @@ uv run python examples/run_grpo_math.py \
#### Multi-node

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=2

# grpo_math_8b uses Llama-3.1-8B-Instruct model
```

@@ -131,7 +131,7 @@
##### GRPO Qwen2.5-32B

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=16

# Download Qwen before the job starts to avoid spending time downloading during the training loop
```

@@ -187,7 +187,7 @@ Refer to `examples/configs/sft.yaml` for a full list of parameters that can be o
#### Multi-node

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=2

COMMAND="uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
```

@@ -244,7 +244,7 @@ Refer to [dpo.yaml](examples/configs/dpo.yaml) for a full list of parameters tha
For distributed DPO training across multiple nodes, modify the following script for your use case:

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
## number of nodes to use for your job
NUM_ACTOR_NODES=2

```
8 changes: 4 additions & 4 deletions docker/Dockerfile
@@ -22,12 +22,12 @@ WORKDIR /opt/reinforcer
# First copy only the dependency files
COPY --chown=ray --chmod=755 pyproject.toml uv.lock ./

ENV UV_PROJECT_ENVIRONMENT=/opt/reinforcer_venv
ENV VIRTUAL_ENV=/opt/reinforcer_venv
ENV UV_PROJECT_ENVIRONMENT=/opt/nemo_rl_venv
ENV VIRTUAL_ENV=/opt/nemo_rl_venv

# Create and activate virtual environment
RUN <<"EOF"
uv venv /opt/reinforcer_venv
uv venv /opt/nemo_rl_venv
# uv sync has a more reliable resolver than simple uv pip install which can fail

# Sync each training + inference backend one at a time (since they may conflict)
@@ -38,7 +38,7 @@ uv sync --locked --extra vllm --no-install-project
uv sync --locked --all-groups --no-install-project
EOF

ENV PATH="/opt/reinforcer_venv/bin:$PATH"
ENV PATH="/opt/nemo_rl_venv/bin:$PATH"

# The ray images automatically activate the anaconda venv. We will
# comment this out of the .bashrc to give the same UX between docker
4 changes: 2 additions & 2 deletions docs/adding-new-models.md
@@ -1,6 +1,6 @@
# Adding New Models

This guide outlines how to integrate and validate a new model within **NeMo-Reinforcer**. Each new model must pass a standard set of compatibility tests before being considered ready to be used in RL pipelines.
This guide outlines how to integrate and validate a new model within **NeMo-RL**. Each new model must pass a standard set of compatibility tests before being considered ready to be used in RL pipelines.

## Importance of Log Probability Consistency in Training and Inference

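The kind of consistency check this section motivates can be sketched as follows. This is an illustrative snippet only, not a NeMo-RL API: the function name, the sample log-prob values, and the tolerance are all assumptions.

```python
# Illustrative only: compare per-token log-probs produced by the training
# backend and the inference backend for the same prompt and sampled tokens.
# A large gap means the two backends disagree on the model's distribution,
# which can destabilize RL training.
def max_logprob_gap(train_logprobs, infer_logprobs):
    """Return the largest absolute per-token disagreement."""
    return max(abs(t - i) for t, i in zip(train_logprobs, infer_logprobs))

# Hypothetical per-token log-probs for a 3-token completion.
train = [-0.11, -2.30, -0.52]
infer = [-0.10, -2.31, -0.50]

gap = max_logprob_gap(train, infer)  # ~0.02 here; a small threshold (e.g. 0.05)
                                     # might serve as a pass/fail criterion
```

In practice such a check would run over many prompts and report the worst-case gap per backend configuration.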
@@ -120,4 +120,4 @@ When validating your model, you should analyze the results across different conf

---

By following these validation steps and ensuring your model's outputs remain consistent across backends, you can confirm that your new model meets **NeMo-Reinforcer**'s requirements.
By following these validation steps and ensuring your model's outputs remain consistent across backends, you can confirm that your new model meets **NeMo-RL**'s requirements.
6 changes: 3 additions & 3 deletions docs/cluster.md
@@ -12,7 +12,7 @@
### Batched Job Submission

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=1 # Total nodes requested (head is colocated on ray-worker-0)

COMMAND="uv run ./examples/run_grpo_math.py" \
```

@@ -43,12 +43,12 @@ tail -f 1980204-logs/ray-driver.log
### Interactive Launching

:::{tip}
A key advantage of running interactively on the head node is the ability to execute multiple multi-node jobs without needing to requeue in the SLURM job queue. This means during debugging sessions, you can avoid submitting a new `sbatch` command each time and instead debug and re-submit your Reinforcer job directly from the interactive session.
A key advantage of running interactively on the head node is the ability to execute multiple multi-node jobs without needing to requeue in the SLURM job queue. This means during debugging sessions, you can avoid submitting a new `sbatch` command each time and instead debug and re-submit your NeMo-RL job directly from the interactive session.
:::

To run interactively, launch the same command as the [Batched Job Submission](#batched-job-submission) except omit the `COMMAND` line:
```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=1 # Total nodes requested (head is colocated on ray-worker-0)

CONTAINER=YOUR_CONTAINER \
```
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -23,7 +23,7 @@
import os
import sys

project = "NeMo-Reinforcer"
project = "NeMo-RL"
copyright = "2025, NVIDIA Corporation"
author = "NVIDIA Corporation"
release = "0.0.1"
@@ -59,7 +59,7 @@
sys.path.insert(0, os.path.abspath(".."))

autodoc2_packages = [
"../nemo_reinforcer", # Path to your package relative to conf.py
"../nemo_rl", # Path to your package relative to conf.py
]
autodoc2_render_plugin = "myst" # Use MyST for rendering docstrings
autodoc2_output_dir = "apidocs" # Output directory for autodoc2 (relative to docs/)
4 changes: 2 additions & 2 deletions docs/design-docs/checkpointing.md
@@ -1,9 +1,9 @@
# Checkpointing with HuggingFace Models

## Checkpoint Format
Reinforcer provides two checkpoint formats for HuggingFace models: Torch distributed and HuggingFace format. Torch distributed is used by default for efficiency, and HuggingFace format is provided for compatibility with HuggingFace's `AutoModel.from_pretrained` API. Note that HuggingFace format checkpoints save only the model weights, ignoring the optimizer states. It is recommended to use Torch distributed format to save intermediate checkpoints and to save a HuggingFace checkpoint only at the end of training.
NeMo-RL provides two checkpoint formats for HuggingFace models: Torch distributed and HuggingFace format. Torch distributed is used by default for efficiency, and HuggingFace format is provided for compatibility with HuggingFace's `AutoModel.from_pretrained` API. Note that HuggingFace format checkpoints save only the model weights, ignoring the optimizer states. It is recommended to use Torch distributed format to save intermediate checkpoints and to save a HuggingFace checkpoint only at the end of training.

There are two ways to get a Reinforcer checkpoint in HuggingFace format.
There are two ways to get a NeMo-RL checkpoint in HuggingFace format.

1. (Recommended) Save the HuggingFace checkpoint directly by passing `save_hf=True` to `HFPolicy`'s `save_checkpoint`:

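As a rough sketch of the recommended pattern above — Torch distributed format for intermediate checkpoints, a single HuggingFace export at the end — using a stand-in class, since `HFPolicy`'s exact signature is not shown in this diff:

```python
# Stand-in policy: records what a real `save_checkpoint(path, save_hf=...)`
# call would be asked to write. Only the call pattern mirrors the text above;
# the class itself is a stub for illustration.
class StubPolicy:
    def __init__(self):
        self.saved = []

    def save_checkpoint(self, path, save_hf=False):
        self.saved.append((path, "hf" if save_hf else "torch_dist"))

policy = StubPolicy()
for step in range(1, 4):
    # Intermediate checkpoints: Torch distributed (keeps optimizer state).
    policy.save_checkpoint(f"results/step_{step}")

# Final checkpoint: HuggingFace format (weights only), so it can later be
# loaded with HuggingFace's `AutoModel.from_pretrained` API.
policy.save_checkpoint("results/final", save_hf=True)
```

The stub simply records one torch-dist entry per training step and one HF entry at the end, mirroring the recommendation in the paragraph above.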