diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 2b02a7fb63..85559eabc6 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -13,12 +13,6 @@
 docker buildx build -t nemo-reinforcer -f Dockerfile .
 docker run -it --gpus all -v /path/to/nemo-reinforcer:/workspace/nemo-reinforcer nemo-reinforcer
 ```
-2. **Install the package in development mode**:
-```bash
-cd /workspace/nemo-reinforcer
-pip install -e .
-```
-
 ## Making Changes
 
 ### Workflow: Clone and Branch (No Fork Required)
diff --git a/README.md b/README.md
index a0aec8cde4..14391c5883 100644
--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 - [Nemo-Reinforcer: A Scalable and Efficient Post-Training Library for Models Ranging from tiny to \>100B Parameters, scaling from 1 GPU to 100s](#nemo-reinforcer-a-scalable-and-efficient-post-training-library-for-models-ranging-from-tiny-to-100b-parameters-scaling-from-1-gpu-to-100s)
   - [Features](#features)
-  - [Installation](#installation)
+  - [Prerequisites](#prerequisites)
   - [Quick start](#quick-start)
     - [SFT](#sft)
       - [Single Node](#single-node)
@@ -38,28 +38,26 @@ What you can expect:
 - 🔜 **Environment Isolation** - Dependency isolation between components
 - 🔜 **DPO Algorithm** - Direct Preference Optimization for alignment
 
-## Installation
+## Prerequisites
 
 ```sh
-# For faster setup we use `uv`
+# For faster setup and environment isolation, we use `uv`
 pip install uv
 
-# Specify a virtual env that uses Python 3.12
-uv venv -p python3.12.9 .venv
-# Install NeMo-Reinforcer with vllm
-uv pip install -e .[vllm]
-# Install NeMo-Reinforcer with dev/test dependencies
-uv pip install -e '.[dev,test]'
+# If you cannot install at the system level, you can install for your user with
+# pip install --user uv
 
-# Use uv run to launch any runs.
-# Note that it is recommended to not activate the venv and instead use `uv run` since
+# Use `uv run` to launch all commands. It handles dependency installation implicitly and
+# ensures your environment is up to date with our lock file.
+
+# Note that it is recommended not to activate the venv, and to use `uv run` instead, since
 # it ensures consistent environment usage across different shells and sessions.
 # Example: uv run python examples/run_grpo_math.py
 ```
 
 ## Quick start
 
-**Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
+**Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
 
 ### SFT
 
@@ -91,12 +89,6 @@ Refer to `examples/configs/sft.yaml` for a full list of parameters that can be o
 
 For distributed training across multiple nodes:
 
-Set `UV_CACHE_DIR` to a directory that can be read from all workers before running any uv run command.
-
-```sh
-export UV_CACHE_DIR=/path/that/all/workers/can/access/uv_cache
-```
-
 ```sh
 # Run from the root of NeMo-Reinforcer repo
 NUM_ACTOR_NODES=2
@@ -104,8 +96,7 @@ NUM_ACTOR_NODES=2
 
 TIMESTAMP=$(date +%Y%m%d_%H%M%S)
 # SFT experiment uses Llama-3.1-8B model
-COMMAND="uv pip install -e .; uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
-UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
+COMMAND="uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
 CONTAINER=YOUR_CONTAINER \
 MOUNTS="$PWD:$PWD" \
 sbatch \
@@ -159,8 +150,7 @@ NUM_ACTOR_NODES=2
 
 TIMESTAMP=$(date +%Y%m%d_%H%M%S)
 # grpo_math_8b uses Llama-3.1-8B-Instruct model
-COMMAND="uv pip install -e .; uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml cluster.num_nodes=2 checkpointing.checkpoint_dir='results/llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='grpo-llama8b_math'" \
-UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
+COMMAND="uv run ./examples/run_grpo_math.py --config examples/configs/grpo_math_8B.yaml cluster.num_nodes=2 checkpointing.checkpoint_dir='results/llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='grpo-llama8b_math'" \
 CONTAINER=YOUR_CONTAINER \
 MOUNTS="$PWD:$PWD" \
 sbatch \
diff --git a/docs/cluster.md b/docs/cluster.md
index c949b5eb77..8a73288e09 100644
--- a/docs/cluster.md
+++ b/docs/cluster.md
@@ -4,6 +4,7 @@
 - [Slurm](#slurm)
   - [Batched Job Submission](#batched-job-submission)
   - [Interactive Launching](#interactive-launching)
+  - [Slurm UV\_CACHE\_DIR](#slurm-uv_cache_dir)
 - [Kubernetes](#kubernetes)
 
 ## Slurm
@@ -14,7 +15,7 @@
 
 # Run from the root of NeMo-Reinforcer repo
 NUM_ACTOR_NODES=1 # Total nodes requested (head is colocated on ray-worker-0)
-COMMAND="uv pip install -e .; uv run ./examples/run_grpo_math.py" \
+COMMAND="uv run ./examples/run_grpo_math.py" \
 CONTAINER=YOUR_CONTAINER \
 MOUNTS="$PWD:$PWD" \
 sbatch \
@@ -39,21 +40,6 @@ Make note of the the job submission number. Once the job begins you can track it
 tail -f 1980204-logs/ray-driver.log
 ```
 
-:::{note}
-`UV_CACHE_DIR` defaults to `$SLURM_SUBMIT_DIR/uv_cache` and is mounted to head and worker nodes
-to ensure fast `venv` creation.
-
-If you would like to override it to somewhere else all head/worker nodes can access, you may set it
-via:
-
-```sh
-...
-UV_CACHE_DIR=/path/that/all/workers/and/head/can/access \
-sbatch ... \
-    ray.sub
-```
-:::
-
 ### Interactive Launching
 
 :::{tip}
@@ -87,11 +73,27 @@
 bash 1980204-attach.sh
 ```
 
 Now that you are on the head node, you can launch the command like so:
 ```sh
-uv venv .venv
-uv pip install -e .
 uv run ./examples/run_grpo_math.py
 ```
 
+### Slurm UV_CACHE_DIR
+
+There are several choices for `UV_CACHE_DIR` when using `ray.sub`:
+
+1. (default) `UV_CACHE_DIR` defaults to `$SLURM_SUBMIT_DIR/uv_cache` when not specified in the shell environment, and is mounted to the head and worker nodes to serve as a persistent cache between runs.
+2. Use the warm uv cache from our docker images:
+   ```sh
+   ...
+   UV_CACHE_DIR=/home/ray/.cache/uv \
+   sbatch ... \
+       ray.sub
+   ```
+
+Option (1) is generally more efficient since the cache is persisted from run to run rather than being
+ephemeral; users who don't want to persist the cache can use option (2), which is just as performant as
+(1) when the dependencies in `uv.lock` are covered by the warmed cache.
+
 ## Kubernetes
 
 TBD
diff --git a/docs/guides/grpo.md b/docs/guides/grpo.md
index b84cbf9f0c..3010295846 100644
--- a/docs/guides/grpo.md
+++ b/docs/guides/grpo.md
@@ -12,7 +12,7 @@ uv run examples/run_grpo_math.py --config {overrides}
 
 If not specified, `config` will default to [examples/configs/grpo.yaml](../../examples/configs/grpo_math_1B.yaml)
 
-**Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
+**Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
 
 ## Now, for the details:
 
diff --git a/docs/guides/sft.md b/docs/guides/sft.md
index 54ebe4d341..4d22b71427 100644
--- a/docs/guides/sft.md
+++ b/docs/guides/sft.md
@@ -21,7 +21,7 @@ uv run examples/run_sft.py \
     cluster.gpus_per_node=1 \
     logger.wandb.name="sft-dev-1-gpu"
 ```
-**Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
+**Reminder**: Don't forget to set your `HF_HOME`, `WANDB_API_KEY`, and `HF_DATASETS_CACHE` (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
 
 ## Datasets
 
diff --git a/docs/testing.md b/docs/testing.md
index 474b6fa0b1..466789dfd6 100644
--- a/docs/testing.md
+++ b/docs/testing.md
@@ -103,8 +103,6 @@ Functional tests may require multiple GPUs to run. See each script to understand
 Functional tests are located under `tests/functional/`.
 
 ```sh
-# Install the project and the test dependencies
-uv pip install -e '.[test]'
 # Run the functional test for sft
 uv run bash tests/functional/sft.sh
 ```
diff --git a/nemo_reinforcer/models/generation/vllm.py b/nemo_reinforcer/models/generation/vllm.py
index 041da525e3..1483245259 100644
--- a/nemo_reinforcer/models/generation/vllm.py
+++ b/nemo_reinforcer/models/generation/vllm.py
@@ -153,8 +153,8 @@ def __init__(
             self.SamplingParams = vllm.SamplingParams
         except ImportError:
             raise ImportError(
-                "vLLM is not installed. Please install it with `pip install nemo-reinforcer[vllm]` "
-                "or `pip install vllm --no-build-isolation` separately."
+                "vLLM is not installed. Please check that VllmGenerationWorker.DEFAULT_PY_EXECUTABLE covers the vllm dependency. "
+                "If you are working interactively, you can install by running `uv sync --extra vllm` anywhere in the repo."
             )
 
         vllm_kwargs = self.cfg.get("vllm_kwargs", {}).copy()
diff --git a/nemo_reinforcer/models/generation/vllm_backend.py b/nemo_reinforcer/models/generation/vllm_backend.py
index a7fd12aa26..1e5fa21a33 100644
--- a/nemo_reinforcer/models/generation/vllm_backend.py
+++ b/nemo_reinforcer/models/generation/vllm_backend.py
@@ -17,9 +17,8 @@
     import vllm
 except ImportError:
     raise ImportError(
-        "vLLM is not installed. Please install it with `pip install nemo-reinforcer[vllm]` "
-        "or `pip install vllm` separately. This issue may also occur if worker is using incorrect "
-        "py_executable."
+        "vLLM is not installed. Please check that VllmGenerationWorker.DEFAULT_PY_EXECUTABLE covers the vllm dependency. "
+        "If you are working interactively, you can install by running `uv sync --extra vllm` anywhere in the repo."
    )