Merged
6 changes: 3 additions & 3 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -15,10 +15,10 @@ List issues that this PR closes ([syntax](https://docs.github.com/en/issues/trac

# Before your PR is "Ready for review"
**Pre checks**:
- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA/reinforcer/blob/main/CONTRIBUTING.md)
- [ ] Make sure you read and followed [Contributor guidelines](/NVIDIA/nemo-rl/blob/main/CONTRIBUTING.md)
- [ ] Did you write any new necessary tests?
- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA/reinforcer/blob/main/docs/testing.md) for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA/reinforcer/blob/main/docs/documentation.md) for how to write, build and test the docs.
- [ ] Did you run the unit tests and functional tests locally? Visit our [Testing Guide](/NVIDIA/nemo-rl/blob/main/docs/testing.md) for how to run tests
- [ ] Did you add or update any necessary documentation? Visit our [Document Development Guide](/NVIDIA/nemo-rl/blob/main/docs/documentation.md) for how to write, build and test the docs.

# Additional Information
* ...
24 changes: 12 additions & 12 deletions .github/workflows/_run_test.yml
@@ -68,7 +68,7 @@ jobs:

- name: Docker pull image
run: |
docker pull nemoci.azurecr.io/nemo_reinforcer_container:${{ github.run_id }}
docker pull nemoci.azurecr.io/nemo_rl_container:${{ github.run_id }}

- name: Checkout repository
uses: actions/checkout@v4
@@ -80,22 +80,22 @@ jobs:
docker run --rm -u root -d --name nemo_container_${{ github.run_id }} --runtime=nvidia --gpus all --shm-size=64g \
--env TRANSFORMERS_OFFLINE=0 \
--env HYDRA_FULL_ERROR=1 \
--env HF_HOME=/home/TestData/reinforcer/hf_home \
--env HF_DATASETS_CACHE=/home/TestData/reinforcer/hf_datasets_cache \
--env REINFORCER_REPO_DIR=/opt/reinforcer \
--env HF_HOME=/home/TestData/nemo-rl/hf_home \
--env HF_DATASETS_CACHE=/home/TestData/nemo-rl/hf_datasets_cache \
--env NEMO_RL_REPO_DIR=/opt/nemo-rl \
--env HF_TOKEN \
--volume $GITHUB_WORKSPACE:/opt/reinforcer \
--volume $GITHUB_WORKSPACE:/opt/nemo-rl \
--volume $GITHUB_ACTION_DIR:$GITHUB_ACTION_DIR \
--volume /mnt/datadrive/TestData/reinforcer/datasets:/opt/reinforcer/datasets:ro \
--volume /mnt/datadrive/TestData/reinforcer/checkpoints:/home/TestData/reinforcer/checkpoints:ro \
--volume /mnt/datadrive/TestData/reinforcer/hf_home/hub:/home/TestData/reinforcer/hf_home/hub \
--volume /mnt/datadrive/TestData/reinforcer/hf_datasets_cache:/home/TestData/reinforcer/hf_datasets_cache \
nemoci.azurecr.io/nemo_reinforcer_container:${{ github.run_id }} \
--volume /mnt/datadrive/TestData/nemo-rl/datasets:/opt/nemo-rl/datasets:ro \
--volume /mnt/datadrive/TestData/nemo-rl/checkpoints:/home/TestData/nemo-rl/checkpoints:ro \
--volume /mnt/datadrive/TestData/nemo-rl/hf_home/hub:/home/TestData/nemo-rl/hf_home/hub \
--volume /mnt/datadrive/TestData/nemo-rl/hf_datasets_cache:/home/TestData/nemo-rl/hf_datasets_cache \
nemoci.azurecr.io/nemo_rl_container:${{ github.run_id }} \
bash -c "sleep $(( ${{ inputs.TIMEOUT }} * 60 + 60 ))"

- name: Run unit tests
run: |
docker exec nemo_container_${{ github.run_id }} git config --global --add safe.directory /opt/reinforcer
docker exec nemo_container_${{ github.run_id }} git config --global --add safe.directory /opt/nemo-rl
docker exec nemo_container_${{ github.run_id }} bash -eux -o pipefail -c "
# This is needed since we create virtualenvs in the workspace, so this allows it to be cleaned up if necessary
umask 000
@@ -141,6 +141,6 @@ jobs:
if: always()
run: |
# Ensure any added files in the mounted directory are owned by the runner user to allow it to clean up
docker exec nemo_container_${{ github.run_id }} bash -c "find /opt/reinforcer -path '/opt/reinforcer/datasets' -prune -o -exec chown $(id -u):$(id -g) {} +"
docker exec nemo_container_${{ github.run_id }} bash -c "find /opt/nemo-rl -path '/opt/nemo-rl/datasets' -prune -o -exec chown $(id -u):$(id -g) {} +"
docker container stop nemo_container_${{ github.run_id }} || true
docker container rm nemo_container_${{ github.run_id }} || true
18 changes: 9 additions & 9 deletions .github/workflows/cicd-main.yml
@@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: "CICD Reinforcer"
name: "CICD NeMo RL"

on:
pull_request:
@@ -136,12 +136,12 @@ jobs:
uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_build_container.yml@v0.22.7
with:
build-ref: ${{ github.sha }}
image-name: nemo_reinforcer_container
image-name: nemo_rl_container
dockerfile: docker/Dockerfile
image-label: nemo-reinforcer
image-label: nemo-rl
build-args: |
MAX_JOBS=32
REINFORCER_COMMIT=${{ github.sha }}
NEMO_RL_COMMIT=${{ github.sha }}

tests:
name: Tests
@@ -152,21 +152,21 @@
RUNNER: self-hosted-azure
TIMEOUT: 60
UNIT_TEST_SCRIPT: |
cd /opt/reinforcer
cd /opt/nemo-rl
if [[ "${{ needs.pre-flight.outputs.test_level }}" =~ ^(L0|L1|L2)$ ]]; then
uv run --no-sync bash -x ./tests/run_unit.sh
else
echo Skipping unit tests for docs-only level
fi
DOC_TEST_SCRIPT: |
cd /opt/reinforcer/docs
cd /opt/nemo-rl/docs
if [[ "${{ needs.pre-flight.outputs.test_level }}" =~ ^(docs|L0|L1|L2)$ ]]; then
uv run --no-sync sphinx-build -b doctest . _build/doctest
else
echo Skipping doc tests for level ${{ needs.pre-flight.outputs.test_level }}
fi
FUNCTIONAL_TEST_SCRIPT: |
cd /opt/reinforcer
cd /opt/nemo-rl
if [[ "${{ needs.pre-flight.outputs.test_level }}" =~ ^(L1|L2)$ ]]; then
uv run --no-sync bash ./tests/functional/sft.sh
uv run --no-sync bash ./tests/functional/grpo.sh
@@ -177,7 +177,7 @@
fi
# TODO: enable once we have convergence tests in CI
#CONVERGENCE_TEST_SCRIPT: |
# cd /opt/reinforcer
# cd /opt/nemo-rl
# if [[ "${{ needs.pre-flight.outputs.test_level }}" =~ ^(L2)$ ]]; then
# echo "Running convergence tests"
# # Add your convergence test commands here
@@ -186,7 +186,7 @@
# echo "Skipping convergence tests for level ${{ needs.pre-flight.outputs.test_level }}"
# fi
AFTER_SCRIPT: |
cd /opt/reinforcer
cd /opt/nemo-rl
cat <<EOF | tee -a $GITHUB_STEP_SUMMARY
# Test Summary for level: ${{ needs.pre-flight.outputs.test_level }}

4 changes: 2 additions & 2 deletions .github/workflows/release-freeze.yml
@@ -36,8 +36,8 @@ jobs:
code-freeze:
uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_code_freeze.yml@v0.22.5
with:
library-name: NeMo-reinforcer
python-package: nemo_reinforcer
library-name: NeMo-RL
python-package: nemo_rl
release-type: ${{ inputs.release-type }}
freeze-commit: ${{ inputs.freeze-commit }}
dry-run: ${{ inputs.dry-run }}
6 changes: 3 additions & 3 deletions .github/workflows/release.yaml
@@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: "Release Reinforcer"
name: "Release NeMo-RL"

on:
workflow_dispatch:
@@ -35,9 +35,9 @@ jobs:
uses: NVIDIA/NeMo-FW-CI-templates/.github/workflows/_release_library.yml@v0.22.6
with:
release-ref: ${{ inputs.release-ref }}
python-package: nemo_reinforcer
python-package: nemo_rl
python-version: "3.11"
library-name: NeMo-Reinforcer
library-name: NeMo-RL
dry-run: ${{ inputs.dry-run }}
version-bump-branch: ${{ inputs.version-bump-branch }}
secrets:
18 changes: 9 additions & 9 deletions CONTRIBUTING.md
@@ -1,16 +1,16 @@
# Contributing To Nemo-Reinforcer
# Contributing To Nemo-RL

Thanks for your interest in contributing to Nemo-Reinforcer!
Thanks for your interest in contributing to Nemo-RL!

## Setting Up

### Development Environment

1. **Build and run the Docker container**:
```bash
docker buildx build -t nemo-reinforcer -f Dockerfile .
# Run the container with your local nemo-reinforcer directory mounted
docker run -it --gpus all -v /path/to/nemo-reinforcer:/workspace/nemo-reinforcer nemo-reinforcer
docker buildx build -t nemo-rl -f Dockerfile .
# Run the container with your local nemo-rl directory mounted
docker run -it --gpus all -v /path/to/nemo-rl:/workspace/nemo-rl nemo-rl
```

## Making Changes
@@ -19,7 +19,7 @@ docker run -it --gpus all -v /path/to/nemo-reinforcer:/workspace/nemo-reinforcer

#### Before You Start: Install pre-commit

From the [`nemo-reinforcer` root directory](.), run:
From the [`nemo-rl` root directory](.), run:
```bash
python3 -m pip install pre-commit
pre-commit install
```

@@ -31,8 +31,8 @@ We follow a direct clone and branch workflow for now:

1. Clone the repository directly:
```bash
git clone https://github.com/NVIDIA/reinforcer
cd reinforcer
git clone https://github.com/NVIDIA/nemo-rl
cd nemo-rl
```

2. Create a new branch for your changes:
@@ -69,7 +69,7 @@ This ensures that all significant changes are well-thought-out and properly docu
1. **User Adoption**: Helps users understand how to effectively use the library's features in their projects
2. **Developer Extensibility**: Enables developers to understand the internal architecture and implementation details, making it easier to modify, extend, or adapt the code for their specific use cases

Quality documentation is essential for both the usability of Nemo-Reinforcer and its ability to be customized by the community.
Quality documentation is essential for both the usability of Nemo-RL and its ability to be customized by the community.

## Code Quality

18 changes: 9 additions & 9 deletions README.md
@@ -1,7 +1,7 @@
# Nemo-Reinforcer: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to >100B Parameters, scaling from 1 GPU to 100s
# Nemo-RL: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to >100B Parameters, scaling from 1 GPU to 100s

<!-- markdown all in one -->
- [Nemo-Reinforcer: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to \>100B Parameters, scaling from 1 GPU to 100s](#nemo-reinforcer-a-scalable-and-efficient-post-training-library-for-models-ranging-from-tiny-to-100b-parameters-scaling-from-1-gpu-to-100s)
- [Nemo-RL: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to \>100B Parameters, scaling from 1 GPU to 100s](#nemo-rl-a-scalable-and-efficient-post-training-library-for-models-ranging-from-tiny-to-100b-parameters-scaling-from-1-gpu-to-100s)
- [Features](#features)
- [Prerequisuites](#prerequisuites)
- [Quick start](#quick-start)
@@ -17,7 +17,7 @@
- [Multi-node](#multi-node-2)
- [Cluster Start](#cluster-start)

**Nemo-Reinforcer** is a scalable and efficient post-training library designed for models ranging from 1 GPU to thousands, and from tiny to over 100 billion parameters.
**Nemo-RL** is a scalable and efficient post-training library designed for models ranging from 1 GPU to thousands, and from tiny to over 100 billion parameters.

What you can expect:

@@ -52,8 +52,8 @@ What you can expect:

Clone **NeMo RL**
```sh
git clone git@github.com:NVIDIA/reinforcer.git
cd reinforcer
git clone git@github.com:NVIDIA/nemo-rl.git
cd nemo-rl
```

Install `uv`
@@ -111,7 +111,7 @@ uv run python examples/run_grpo_math.py \
#### Multi-node

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=2

# grpo_math_8b uses Llama-3.1-8B-Instruct model
```

@@ -131,7 +131,7 @@
##### GRPO Qwen2.5-32B

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=16

# Download Qwen before the job starts to avoid spending time downloading during the training loop
```

@@ -187,7 +187,7 @@ Refer to `examples/configs/sft.yaml` for a full list of parameters that can be o
#### Multi-node

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=2

COMMAND="uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
```

@@ -244,7 +244,7 @@ Refer to [dpo.yaml](examples/configs/dpo.yaml) for a full list of parameters tha
For distributed DPO training across multiple nodes, modify the following script for your use case:

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
## number of nodes to use for your job
NUM_ACTOR_NODES=2

```
8 changes: 4 additions & 4 deletions docker/Dockerfile
@@ -22,12 +22,12 @@ WORKDIR /opt/reinforcer
# First copy only the dependency files
COPY --chown=ray --chmod=755 pyproject.toml uv.lock ./

ENV UV_PROJECT_ENVIRONMENT=/opt/reinforcer_venv
ENV VIRTUAL_ENV=/opt/reinforcer_venv
ENV UV_PROJECT_ENVIRONMENT=/opt/nemo_rl_venv
ENV VIRTUAL_ENV=/opt/nemo_rl_venv

# Create and activate virtual environment
RUN <<"EOF"
uv venv /opt/reinforcer_venv
uv venv /opt/nemo_rl_venv
# uv sync has a more reliable resolver than simple uv pip install which can fail

# Sync each training + inference backend one at a time (since they may conflict)
@@ -38,7 +38,7 @@ uv sync --locked --extra vllm --no-install-project
uv sync --locked --all-groups --no-install-project
EOF

ENV PATH="/opt/reinforcer_venv/bin:$PATH"
ENV PATH="/opt/nemo_rl_venv/bin:$PATH"

# The ray images automatically activate the anaconda venv. We will
# comment this out of the .bashrc to give the same UX between docker
4 changes: 2 additions & 2 deletions docs/adding-new-models.md
@@ -1,6 +1,6 @@
# Adding New Models

This guide outlines how to integrate and validate a new model within **NeMo-Reinforcer**. Each new model must pass a standard set of compatibility tests before being considered ready to be used in RL pipelines.
This guide outlines how to integrate and validate a new model within **NeMo-RL**. Each new model must pass a standard set of compatibility tests before being considered ready to be used in RL pipelines.

## Importance of Log Probability Consistency in Training and Inference

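The kind of consistency check this section motivates can be sketched as follows. This is an illustrative snippet only, not a NeMo-RL API: the function name, the sample log-prob values, and the tolerance are all assumptions.

```python
# Illustrative only: compare per-token log-probs produced by the training
# backend and the inference backend for the same prompt and sampled tokens.
# A large gap means the two backends disagree on the model's distribution,
# which can destabilize RL training.
def max_logprob_gap(train_logprobs, infer_logprobs):
    """Return the largest absolute per-token disagreement."""
    return max(abs(t - i) for t, i in zip(train_logprobs, infer_logprobs))

# Hypothetical per-token log-probs for a 3-token completion.
train = [-0.11, -2.30, -0.52]
infer = [-0.10, -2.31, -0.50]

gap = max_logprob_gap(train, infer)  # ~0.02 here; a small threshold (e.g. 0.05)
                                     # might serve as a pass/fail criterion
```

In practice such a check would run over many prompts and report the worst-case gap per backend configuration.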
@@ -120,4 +120,4 @@ When validating your model, you should analyze the results across different conf

---

By following these validation steps and ensuring your model's outputs remain consistent across backends, you can confirm that your new model meets **NeMo-Reinforcer**'s requirements.
By following these validation steps and ensuring your model's outputs remain consistent across backends, you can confirm that your new model meets **NeMo-RL**'s requirements.
6 changes: 3 additions & 3 deletions docs/cluster.md
@@ -12,7 +12,7 @@
### Batched Job Submission

```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=1 # Total nodes requested (head is colocated on ray-worker-0)

COMMAND="uv run ./examples/run_grpo_math.py" \
```

@@ -43,12 +43,12 @@ tail -f 1980204-logs/ray-driver.log
### Interactive Launching

:::{tip}
A key advantage of running interactively on the head node is the ability to execute multiple multi-node jobs without needing to requeue in the SLURM job queue. This means during debugging sessions, you can avoid submitting a new `sbatch` command each time and instead debug and re-submit your Reinforcer job directly from the interactive session.
A key advantage of running interactively on the head node is the ability to execute multiple multi-node jobs without needing to requeue in the SLURM job queue. This means during debugging sessions, you can avoid submitting a new `sbatch` command each time and instead debug and re-submit your NeMo-RL job directly from the interactive session.
:::

To run interactively, launch the same command as the [Batched Job Submission](#batched-job-submission) except omit the `COMMAND` line:
```sh
# Run from the root of NeMo-Reinforcer repo
# Run from the root of NeMo-RL repo
NUM_ACTOR_NODES=1 # Total nodes requested (head is colocated on ray-worker-0)

CONTAINER=YOUR_CONTAINER \
```
4 changes: 2 additions & 2 deletions docs/conf.py
@@ -23,7 +23,7 @@
import os
import sys

project = "NeMo-Reinforcer"
project = "NeMo-RL"
copyright = "2025, NVIDIA Corporation"
author = "NVIDIA Corporation"
release = "0.0.1"
@@ -59,7 +59,7 @@
sys.path.insert(0, os.path.abspath(".."))

autodoc2_packages = [
"../nemo_reinforcer", # Path to your package relative to conf.py
"../nemo_rl", # Path to your package relative to conf.py
]
autodoc2_render_plugin = "myst" # Use MyST for rendering docstrings
autodoc2_output_dir = "apidocs" # Output directory for autodoc2 (relative to docs/)
4 changes: 2 additions & 2 deletions docs/design-docs/checkpointing.md
@@ -1,9 +1,9 @@
# Checkpointing with HuggingFace Models

## Checkpoint Format
Reinforcer provides two checkpoint formats for HuggingFace models: Torch distributed and HuggingFace format. Torch distributed is used by default for efficiency, and HuggingFace format is provided for compatibility with HuggingFace's `AutoModel.from_pretrained` API. Note that HuggingFace format checkpoints save only the model weights, ignoring the optimizer states. It is recommended to use Torch distributed format to save intermediate checkpoints and to save a HuggingFace checkpoint only at the end of training.
NeMo-RL provides two checkpoint formats for HuggingFace models: Torch distributed and HuggingFace format. Torch distributed is used by default for efficiency, and HuggingFace format is provided for compatibility with HuggingFace's `AutoModel.from_pretrained` API. Note that HuggingFace format checkpoints save only the model weights, ignoring the optimizer states. It is recommended to use Torch distributed format to save intermediate checkpoints and to save a HuggingFace checkpoint only at the end of training.

There are two ways to get a Reinforcer checkpoint in HuggingFace format.
There are two ways to get a NeMo-RL checkpoint in HuggingFace format.

1. (Recommended) Save the HuggingFace checkpoint directly by passing `save_hf=True` to `HFPolicy`'s `save_checkpoint`:

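As a rough sketch of the recommended pattern above — Torch distributed format for intermediate checkpoints, a single HuggingFace export at the end — using a stand-in class, since `HFPolicy`'s exact signature is not shown in this diff:

```python
# Stand-in policy: records what a real `save_checkpoint(path, save_hf=...)`
# call would be asked to write. Only the call pattern mirrors the text above;
# the class itself is a stub for illustration.
class StubPolicy:
    def __init__(self):
        self.saved = []

    def save_checkpoint(self, path, save_hf=False):
        self.saved.append((path, "hf" if save_hf else "torch_dist"))

policy = StubPolicy()
for step in range(1, 4):
    # Intermediate checkpoints: Torch distributed (keeps optimizer state).
    policy.save_checkpoint(f"results/step_{step}")

# Final checkpoint: HuggingFace format (weights only), so it can later be
# loaded with HuggingFace's `AutoModel.from_pretrained` API.
policy.save_checkpoint("results/final", save_hf=True)
```

The stub simply records one torch-dist entry per training step and one HF entry at the end, mirroring the recommendation in the paragraph above.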