Merged
6 changes: 0 additions & 6 deletions CONTRIBUTING.md
@@ -13,12 +13,6 @@ docker buildx build -t nemo-reinforcer -f Dockerfile .
docker run -it --gpus all -v /path/to/nemo-reinforcer:/workspace/nemo-reinforcer nemo-reinforcer
```

2. **Install the package in development mode**:
```bash
cd /workspace/nemo-reinforcer
pip install -e .
```

## Making Changes

### Workflow: Clone and Branch (No Fork Required)
34 changes: 12 additions & 22 deletions README.md
@@ -3,7 +3,7 @@
<!-- markdown all in one -->
- [Nemo-Reinforcer: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to \>100B Parameters, scaling from 1 GPU to 100s](#nemo-reinforcer-a-scalable-and-efficient-post-training-library-for-models-ranging-from-tiny-to-100b-parameters-scaling-from-1-gpu-to-100s)
- [Features](#features)
- [Installation](#installation)
- [Prerequisites](#prerequisites)
- [Quick start](#quick-start)
- [SFT](#sft)
- [Single Node](#single-node)
@@ -38,28 +38,26 @@ What you can expect:
- 🔜 **Environment Isolation** - Dependency isolation between components
- 🔜 **DPO Algorithm** - Direct Preference Optimization for alignment

## Installation
## Prerequisites

```sh
# For faster setup we use `uv`
# For faster setup and environment isolation, we use `uv`
pip install uv

# Specify a virtual env that uses Python 3.12
uv venv -p python3.12.9 .venv
# Install NeMo-Reinforcer with vllm
uv pip install -e .[vllm]
# Install NeMo-Reinforcer with dev/test dependencies
uv pip install -e '.[dev,test]'
# If you cannot install at the system level, you can install for your user with
# pip install --user uv

# Use uv run to launch any runs.
# Note that it is recommended to not activate the venv and instead use `uv run` since
# Use `uv run` to launch all commands. It handles pip installing implicitly and
# ensures your environment is up to date with our lock file.

# Note that it is not recommended to activate the venv and instead use `uv run` since
# it ensures consistent environment usage across different shells and sessions.
# Example: uv run python examples/run_grpo_math.py
```

## Quick start

**Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
**Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
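A minimal sketch of that setup, where every path and key below is a placeholder rather than a project default:

```shell
# Placeholder paths and keys; substitute your own values.
export HF_HOME=/path/to/hf_cache               # Hugging Face cache root
export HF_DATASETS_CACHE="$HF_HOME/datasets"   # dataset cache (optional)
export WANDB_API_KEY=your_wandb_key            # only if W&B logging is enabled
# For gated models such as Llama, log in interactively:
# huggingface-cli login
```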

### SFT

@@ -91,21 +89,14 @@ Refer to `examples/configs/sft.yaml` for a full list of parameters that can be o

For distributed training across multiple nodes:

Set `UV_CACHE_DIR` to a directory that can be read from all workers before running any uv run command.

```sh
export UV_CACHE_DIR=/path/that/all/workers/can/access/uv_cache
```

```sh
# Run from the root of NeMo-Reinforcer repo
NUM_ACTOR_NODES=2
# Add a timestamp to make each job name unique
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# SFT experiment uses Llama-3.1-8B model
COMMAND="uv pip install -e .; uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
COMMAND="uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
CONTAINER=YOUR_CONTAINER \
MOUNTS="$PWD:$PWD" \
sbatch \
@@ -159,8 +150,7 @@ NUM_ACTOR_NODES=2
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# grpo_math_8b uses Llama-3.1-8B-Instruct model
COMMAND="uv pip install -e .; uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml cluster.num_nodes=2 checkpointing.checkpoint_dir='results/llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='grpo-llama8b_math'" \
UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
COMMAND="uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml cluster.num_nodes=2 checkpointing.checkpoint_dir='results/llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='grpo-llama8b_math'" \
CONTAINER=YOUR_CONTAINER \
MOUNTS="$PWD:$PWD" \
sbatch \
38 changes: 20 additions & 18 deletions docs/cluster.md
@@ -4,6 +4,7 @@
- [Slurm](#slurm)
- [Batched Job Submission](#batched-job-submission)
- [Interactive Launching](#interactive-launching)
- [Slurm UV\_CACHE\_DIR](#slurm-uv_cache_dir)
- [Kubernetes](#kubernetes)

## Slurm
@@ -14,7 +15,7 @@
# Run from the root of NeMo-Reinforcer repo
NUM_ACTOR_NODES=1 # Total nodes requested (head is colocated on ray-worker-0)

COMMAND="uv pip install -e .; uv run ./examples/run_grpo_math.py" \
COMMAND="uv run ./examples/run_grpo_math.py" \
CONTAINER=YOUR_CONTAINER \
MOUNTS="$PWD:$PWD" \
sbatch \
@@ -39,21 +40,6 @@ Make note of the job submission number. Once the job begins you can track it
tail -f 1980204-logs/ray-driver.log
```

:::{note}
`UV_CACHE_DIR` defaults to `$SLURM_SUBMIT_DIR/uv_cache` and is mounted to head and worker nodes
to ensure fast `venv` creation.

If you would like to override it to somewhere else all head/worker nodes can access, you may set it
via:

```sh
...
UV_CACHE_DIR=/path/that/all/workers/and/head/can/access \
sbatch ... \
ray.sub
```
:::

### Interactive Launching

:::{tip}
@@ -87,11 +73,27 @@ bash 1980204-attach.sh
```
Now that you are on the head node, you can launch the command like so:
```sh
uv venv .venv
uv pip install -e .
uv run ./examples/run_grpo_math.py
```

### Slurm UV_CACHE_DIR

There are several choices for `UV_CACHE_DIR` when using `ray.sub`:

1. (default) `UV_CACHE_DIR` defaults to `$SLURM_SUBMIT_DIR/uv_cache` when not specified in the shell environment, and is mounted to head and worker nodes to serve as a persistent cache between runs.
2. Use the warm uv cache from our Docker images:
```sh
...
UV_CACHE_DIR=/home/ray/.cache/uv \
sbatch ... \
ray.sub
```

In general, (1) is more efficient since the cache is not ephemeral and persists from run to run. For users who prefer not to persist the cache, (2) is just as performant as (1) as long as the dependencies in `uv.lock` are covered by the warmed cache.
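The default fallback in (1) amounts to a standard shell parameter expansion. The sketch below is illustrative only; variable names are assumed, and `ray.sub` implements the real resolution:

```shell
# Illustrative sketch of the default UV_CACHE_DIR resolution.
unset UV_CACHE_DIR                    # clear any inherited value for this demo
SLURM_SUBMIT_DIR=/scratch/my_job      # set by Slurm at submission time
UV_CACHE_DIR="${UV_CACHE_DIR:-$SLURM_SUBMIT_DIR/uv_cache}"
echo "UV_CACHE_DIR=$UV_CACHE_DIR"     # prints UV_CACHE_DIR=/scratch/my_job/uv_cache
```

Because `${VAR:-default}` only fills in when the variable is unset or empty, exporting `UV_CACHE_DIR` before `sbatch` (as in option 2) takes precedence.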


## Kubernetes

TBD
2 changes: 1 addition & 1 deletion docs/guides/grpo.md
@@ -12,7 +12,7 @@ uv run examples/run_grpo_math.py --config <PATH TO YAML CONFIG> {overrides}

If not specified, `config` will default to [examples/configs/grpo_math_1B.yaml](../../examples/configs/grpo_math_1B.yaml).

**Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
**Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.

## Now, for the details:

2 changes: 1 addition & 1 deletion docs/guides/sft.md
@@ -21,7 +21,7 @@ uv run examples/run_sft.py \
cluster.gpus_per_node=1 \
logger.wandb.name="sft-dev-1-gpu"
```
**Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
**Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.

## Datasets

2 changes: 0 additions & 2 deletions docs/testing.md
@@ -103,8 +103,6 @@ Functional tests may require multiple GPUs to run. See each script to understand
Functional tests are located under `tests/functional/`.

```sh
# Install the project and the test dependencies
uv pip install -e '.[test]'
# Run the functional test for sft
uv run bash tests/functional/sft.sh
```
4 changes: 2 additions & 2 deletions nemo_reinforcer/models/generation/vllm.py
@@ -153,8 +153,8 @@ def __init__(
self.SamplingParams = vllm.SamplingParams
except ImportError:
raise ImportError(
"vLLM is not installed. Please install it with `pip install nemo-reinforcer[vllm]` "
"or `pip install vllm --no-build-isolation` separately."
"vLLM is not installed. Please check that VllmGenerationWorker.DEFAULT_PY_EXECUTABLE covers the vllm dependency. "
"If you are working interactively, you can install by running `uv sync --extra vllm` anywhere in the repo."
)
vllm_kwargs = self.cfg.get("vllm_kwargs", {}).copy()

5 changes: 2 additions & 3 deletions nemo_reinforcer/models/generation/vllm_backend.py
@@ -17,9 +17,8 @@
import vllm
except ImportError:
raise ImportError(
"vLLM is not installed. Please install it with `pip install nemo-reinforcer[vllm]` "
"or `pip install vllm` separately. This issue may also occur if worker is using incorrect "
"py_executable."
"vLLM is not installed. Please check that VllmGenerationWorker.DEFAULT_PY_EXECUTABLE covers the vllm dependency. "
"If you are working interactively, you can install by running `uv sync --extra vllm` anywhere in the repo."
)


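The guarded import used in both modules follows a common lazy-import pattern, sketched below in isolation. The function name and message text here are illustrative, not the project's exact code:

```python
import importlib


def load_vllm():
    """Import vllm on first use, failing with an actionable message."""
    try:
        # Deferred import: vllm is a heavy optional dependency.
        return importlib.import_module("vllm")
    except ImportError as err:
        raise ImportError(
            "vLLM is not installed. If working interactively, run "
            "`uv sync --extra vllm` in the repo."
        ) from err
```

Chaining with `from err` preserves the original traceback, which helps distinguish a genuinely missing package from a worker launched with the wrong `py_executable`.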