Merged
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -59,3 +59,5 @@

# Codeowners
/.github/CODEOWNERS @nvidia-nemo/rl_maintainers

/research/template_project @terrykong
2 changes: 2 additions & 0 deletions docker/Dockerfile
@@ -76,7 +76,9 @@ ENV TORCH_CUDA_ARCH_LIST="9.0 10.0"

# First copy only the dependency files
COPY --from=nemo-rl pyproject.toml uv.lock ./
COPY --from=nemo-rl nemo_rl/__init__.py nemo_rl/package_info.py ./nemo_rl/
COPY --from=nemo-rl tools/build-custom-vllm.sh ./tools/build-custom-vllm.sh
COPY --from=nemo-rl --link research/ ./research/
COPY --from=nemo-rl --link 3rdparty/ ./3rdparty/

RUN <<"EOF" bash -exu
10 changes: 5 additions & 5 deletions nemo_rl/distributed/virtual_cluster.py
@@ -44,19 +44,19 @@ class PY_EXECUTABLES:
SYSTEM = sys.executable

# Use NeMo-RL direct dependencies.
- BASE = "uv run --locked"
+ BASE = f"uv run --locked --directory {git_root}"

# Use NeMo-RL direct dependencies and vllm.
- VLLM = "uv run --locked --extra vllm"
+ VLLM = f"uv run --locked --extra vllm --directory {git_root}"

# Use NeMo-RL direct dependencies and nemo-automodel.
- AUTOMODEL = "uv run --locked --extra automodel"
+ AUTOMODEL = f"uv run --locked --extra automodel --directory {git_root}"

# Use NeMo-RL direct dependencies and Megatron.
- MCORE = "uv run --locked --extra mcore"
+ MCORE = f"uv run --locked --extra mcore --directory {git_root}"

# Use Penguin dependencies
- PENGUIN = "uv run --locked --extra penguin"
+ PENGUIN = f"uv run --locked --extra penguin --directory {git_root}"


@ray.remote # pragma: no cover
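
The hunk above appends `--directory {git_root}` to each `uv run` prefix. `uv` changes to that directory before running the command, so the prefix presumably keeps working when a script is launched from a subdirectory such as `research/template_project`. The following is a minimal sketch of that idea, not the repository's implementation: the diff does not show how `git_root` is defined, so resolving it with `git rev-parse --show-toplevel` is an assumption, and `find_git_root` and `uv_run_prefix` are hypothetical helper names.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_git_root(start: Optional[Path] = None) -> str:
    """Return the repository root (assumption: resolved via `git rev-parse`)."""
    result = subprocess.run(
        ["git", "rev-parse", "--show-toplevel"],
        cwd=start,  # None means "use the current working directory"
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def uv_run_prefix(git_root: str, extra: Optional[str] = None) -> str:
    """Build a `uv run` command prefix that always runs from the repository root."""
    parts = ["uv", "run", "--locked"]
    if extra is not None:
        parts += ["--extra", extra]
    # `--directory` makes uv change to this directory before running the command.
    parts += ["--directory", git_root]
    return " ".join(parts)
```

For example, `uv_run_prefix(find_git_root(), "vllm")` would yield a string of the same shape as the new `VLLM` value. A root path containing spaces would need quoting, which this sketch does not handle.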
2 changes: 1 addition & 1 deletion nemo_rl/utils/venvs.py
@@ -95,7 +95,7 @@ def create_local_venv(
exec_cmd.extend(["echo", f"Finished creating venv {venv_path}"])

# Always run uv sync first to ensure the build requirements are set (for --no-build-isolation packages)
- subprocess.run(["uv", "sync"], env=env, check=True)
+ subprocess.run(["uv", "sync", "--directory", git_root], env=env, check=True)
subprocess.run(exec_cmd, env=env, check=True)

# Return the path to the python executable in the virtual environment
6 changes: 6 additions & 0 deletions pyproject.toml
@@ -173,6 +173,12 @@ members = [
"3rdparty/Automodel-workspace/Automodel",
"3rdparty/Megatron-Bridge-workspace",
"3rdparty/Penguin-workspace",
# Research projects are also listed here so that they share the root-level uv.lock.
# Otherwise a research project does not see the global uv.lock and may mistakenly install
# numpy>=2.0, because nemo-rl's core [dependencies] do not pin numpy. The root uv.lock resolves
# numpy to 1.X because megatron mandates 1.X in the optional dependencies, so we must choose 1.X
# globally; otherwise we run into pickle issues from ray.
"research/template_project",
]

[[tool.uv.index]]
50 changes: 50 additions & 0 deletions research/README.md
@@ -0,0 +1,50 @@
# Research and Community Projects

This directory contains research experiments and community-contributed projects built on NeMo RL. Each project is self-contained and demonstrates different techniques and applications.

## Getting Started

To create a new research project, start with the template:

```bash
cp -r research/template_project research/my_new_project
```

The template includes:
- A minimal train-and-generate loop example
- A complete test structure (unit tests, functional tests, and test suites)
- Configuration examples
- Documentation template

## Expectations for Research Project Authors

> [!NOTE]
> This section is for research and community project authors contributing to the repository.

### Acceptance Criteria

To be merged into the main repository, a research project must include reproduction steps for the results outlined in its README. We want to make sure others can reproduce your great work! Please include enough documentation in the project's README.md for users to reproduce your results step by step.

> [!NOTE]
> We strongly encourage you to consider contributing universally applicable features directly to the core `nemo_rl` package. Your work can help improve NeMo RL for everyone! However, if your innovation introduces complexity that doesn't align with the core library's design principles, the research folder is exactly the right place for it. This directory exists specifically to showcase novel ideas and experimental approaches that may not fit neatly into "core".

### Code Reviews and Ownership

Code reviews for research projects will always involve the original authors. Please add your name to the `.github/CODEOWNERS` file to be alerted when any changes touch your project. The NeMo RL core team reserves the right to merge PRs that touch your project if the original author does not respond in a timely manner. This allows the core team to move quickly to resolve issues.

### Testing

Authors are encouraged to write tests for their research projects. The template project demonstrates three types of tests:
1. **Unit tests** - Fast, isolated component tests
2. **Functional tests** - End-to-end tests with minimal configurations
3. **Test suites** (nightlies) - Longer-running comprehensive validation tests

All of these will be included in our automation. Changes to nemo-rl "core" are expected not to break these tests.

If we cannot resolve a test breakage and the authors are unresponsive, we reserve the right to disable the tests to ensure a high-fidelity test signal. An example would be when we deprecate a backend and the research project has not migrated to its replacement.

Because we use `uv`, even if we must disable tests because a project no longer works at top-of-tree, a user can always check out the last working commit and run the research project with nemo-rl, since the `uv.lock` at that commit represents the last known working state. Users can also build the Dockerfile at that commit to ensure a fully reproducible environment.

## Projects

- **[template_project](template_project/)** - A starting point for new research projects with example code and test structure
1 change: 1 addition & 0 deletions research/template_project/.python-version
@@ -0,0 +1 @@
3.12
126 changes: 126 additions & 0 deletions research/template_project/README.md
@@ -0,0 +1,126 @@
# Template Project: A Starting Point

This is a template project for research experiments with NeMo RL.

> [!IMPORTANT]
> This is a template! To start a new research project, copy this directory to a new location:
> ```bash
> cp -r research/template_project research/my_new_project
> ```
> Then add your code and tests! Note that this project includes `nemo-rl` as a core dependency.

## What This Shows

The `single_update.py` script demonstrates a minimal train-and-generate loop:
1. Sets up a Ray compute cluster
2. Initializes vLLM generation and an LM policy
3. Trains the policy on a small batch using NLL loss
4. Refits the generation engine with the updated policy weights
5. Generates outputs with the new policy
6. Repeats the loop (10 iterations by default)

This shows the basic cycle of training a language model and using it for generation.

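The train, refit, generate cycle in steps 3 to 6 can be illustrated with a toy stand-in. This is only a sketch of the control flow and not the contents of `single_update.py`: `ToyPolicy`, `ToyGenerator`, and `run_loop` are hypothetical names, the Ray cluster and vLLM setup are omitted, and a squared-error update on a single scalar replaces the NLL loss so that the example runs without a GPU.

```python
from dataclasses import dataclass


@dataclass
class ToyPolicy:
    """Stand-in for the trainable policy; its only parameter is one scalar."""

    weight: float = 0.0

    def train(self, batch: list[float], lr: float = 0.1) -> float:
        # One gradient step on squared error towards the batch mean
        # (a substitute for the NLL loss used by the real script).
        target = sum(batch) / len(batch)
        loss = (self.weight - target) ** 2
        self.weight -= lr * 2 * (self.weight - target)
        return loss


@dataclass
class ToyGenerator:
    """Stand-in for the generation engine; it keeps its own copy of the weights."""

    weight: float = 0.0

    def refit(self, policy: ToyPolicy) -> None:
        # Copy the updated policy weights into the generation engine.
        self.weight = policy.weight

    def generate(self, prompt: str) -> str:
        return f"{prompt} -> {self.weight:.3f}"


def run_loop(num_steps: int = 10) -> list[str]:
    """Train, refit, and generate for `num_steps` iterations."""
    policy, generator = ToyPolicy(), ToyGenerator()
    batch = [1.0, 2.0, 3.0]
    outputs = []
    for _ in range(num_steps):
        policy.train(batch)                           # step 3: train on a small batch
        generator.refit(policy)                       # step 4: push new weights
        outputs.append(generator.generate("prompt"))  # step 5: generate
    return outputs
```

The point of the separate `refit` call is that the generator holds its own copy of the weights, so generation only reflects training once the copy has been refreshed.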
## Running the Example

To run the `single_update.py` script:

```bash
uv run single_update.py
```

## Testing

This project includes a comprehensive test suite following NeMo RL's testing patterns.

### Unit Tests

Unit tests validate individual components and functions.

```bash
# Run all unit tests
uv run --group test pytest tests/unit/
```

### Functional Tests

Functional tests run end-to-end scenarios with minimal configurations.

> [!IMPORTANT]
> Functional tests require at least 1 GPU to run.

```bash
# Run the single_update functional test (runs for 1 step)
uv run bash tests/functional/single_update.sh
```

### Test Suites

Test suites are longer-running, comprehensive tests that validate behavior over multiple steps.

> [!IMPORTANT]
> Test suites require 8 GPUs and may take several minutes to complete.

```bash
# Run the single_update test suite locally (runs for 10 steps on 1 node with 8 GPUs)
bash tests/test_suites/llm/single_update_1n8g.sh

# Launch on SLURM with code snapshots
# For full documentation on tools/launch, see:
# https://github.com/NVIDIA-NeMo/RL/blob/main/tests/test_suites/README.md#launching-with-code-snapshots
bash ../../tools/launch tests/test_suites/llm/single_update_1n8g.sh

# Dry run to estimate GPU hours needed
DRYRUN=1 bash ../../tools/launch tests/test_suites/llm/single_update_1n8g.sh
```

> [!TIP]
> The `tools/launch` script creates code snapshots and launches SLURM jobs for reproducible experiments. It automatically extracts the configuration from your test suite script and submits the appropriate number of jobs.

The test suite structure mirrors nemo-rl's test organization:
- `tests/unit/` - Fast, isolated unit tests
- `tests/functional/` - End-to-end tests with minimal configurations
- `tests/test_suites/llm/` - Comprehensive multi-step validation tests
- `configs/recipes/llm/` - Configuration files for test suites (using defaults to inherit from base configs)

## Updating Dependencies

If you update the dependencies of this research project, run the following command to update the global `uv.lock` file and freeze the working set of dependencies:

```bash
uv lock
```

This command will:
- Resolve all dependencies
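- Update `uv.lock` to reflect the changed dependencies (already-locked versions are kept where they remain compatible; pass `--upgrade` to move to newer versions)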
- Ensure dependency consistency across environments

## Python Version

> [!NOTE]
> This project uses Python 3.12 as specified in `.python-version`.
> This Python version should always be kept in sync with the `.python-version` file at the root of the `nemo-rl` repository to ensure compatibility.

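The sync requirement in the note above could be checked automatically. The following is a minimal sketch, assuming both files contain only a version string; `python_versions_in_sync` is a hypothetical helper that does not exist in the repository.

```python
from pathlib import Path


def python_versions_in_sync(project_dir: Path, repo_root: Path) -> bool:
    """Return True when both `.python-version` files hold the same version string."""
    project = (project_dir / ".python-version").read_text().strip()
    root = (repo_root / ".python-version").read_text().strip()
    return project == root
```

Run from the repository root, `python_versions_in_sync(Path("research/template_project"), Path("."))` would compare the template's version against the root one.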

## Citation

If you use this research project or have questions, please contact:

```
Author: AUTHOR NAMES HERE
Email: AUTHOR EMAILS HERE
Organization: ORGANIZATION HERE (optional)
```

If you use this research project, please cite it using the following BibTeX entry:

```bibtex
@misc{template-project,
title = {Template Project: A Starting Point},
author = {AUTHOR NAMES HERE},
howpublished = {\url{https://github.com/NVIDIA-NeMo/RL/tree/main/research/template_project}},
year = {2025},
note = {Research project based on NeMo RL},
}
```