
**Reminder**: Don't forget to set your HF_HOME and WANDB_API_KEY (if needed). You'll need to do a `huggingface-cli login` as well for Llama models.
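
As a sketch, the environment setup might look like the following. The paths and key below are placeholders, not values from this repo; replace them with your own:

```sh
# Placeholder values -- substitute your own paths and keys.
export HF_HOME=/path/to/hf_cache          # where Hugging Face models and datasets are cached
export WANDB_API_KEY=your_wandb_api_key   # only needed if logging to Weights & Biases

# Llama checkpoints are gated, so authenticate once per machine:
# huggingface-cli login
```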

### SFT

We provide a sample SFT experiment that uses the [SQuAD dataset](https://rajpurkar.github.io/SQuAD-explorer/).

#### Single Node

The experiment is set up to run on 8 GPUs. If using a machine that has access to 8 GPUs, you can launch the experiment as follows:

```sh
uv run python examples/run_sft.py
```

This trains `Llama-3.1-8B` on 8 GPUs. To run on a single GPU, we need to override a few of the experiment settings: replace the 8B model with a smaller 1B model, decrease the batch size, and update the cluster configuration to use a single GPU:

```sh
uv run python examples/run_sft.py \
policy.model_name="meta-llama/Llama-3.2-1B" \
policy.train_global_batch_size=16 \
sft.val_global_batch_size=16 \
cluster.gpus_per_node=1
```

Refer to [sft.yaml](examples/configs/sft.yaml) for a full list of parameters that can be overridden.

#### Multi-node

For distributed training across multiple nodes:

Before running any `uv run` command, set `UV_CACHE_DIR` to a directory that all workers can read:
```sh
export UV_CACHE_DIR=/path/that/all/workers/can/access/uv_cache
```

```sh
# Run from the root of NeMo-Reinforcer repo
NUM_ACTOR_NODES=2
# Add a timestamp to make each job name unique
TIMESTAMP=$(date +%Y%m%d_%H%M%S)

# SFT experiment uses Llama-3.1-8B model
COMMAND="uv pip install -e .; uv run ./examples/run_sft.py --config examples/configs/sft.yaml cluster.num_nodes=2 cluster.gpus_per_node=8 checkpointing.checkpoint_dir='results/sft_llama8b_2nodes' logger.wandb_enabled=True logger.wandb.name='sft-llama8b'" \
RAY_DEDUP_LOGS=0 \
UV_CACHE_DIR=YOUR_UV_CACHE_DIR \
CONTAINER=YOUR_CONTAINER \
MOUNTS="$PWD:$PWD" \
sbatch \
--nodes=${NUM_ACTOR_NODES} \
--account=YOUR_ACCOUNT \
--job-name=YOUR_JOBNAME \
--partition=YOUR_PARTITION \
--time=4:0:0 \
--gres=gpu:8 \
ray.sub
```
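
Note that `TIMESTAMP` is defined above but not referenced in the sample `sbatch` invocation; presumably it is meant to keep each job name (or results directory) unique across submissions, along these lines (a sketch, not part of `ray.sub`):

```sh
TIMESTAMP=$(date +%Y%m%d_%H%M%S)      # e.g. 20240101_120000
JOB_NAME="sft-llama8b-${TIMESTAMP}"   # unique per submission
echo "${JOB_NAME}"
# then pass it along: sbatch --job-name="${JOB_NAME}" ... ray.sub
```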

### GRPO

We provide a reference GRPO experiment config for math benchmarks that trains on the [OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2) dataset.

#### Single Node

To run GRPO on a single GPU for `Llama-3.2-1B-Instruct`:

```sh
uv run python examples/run_grpo_math.py
```

By default, this uses the configuration in `examples/configs/grpo_math_1B.yaml`. You can customize parameters with command-line overrides. For example, to run on 8 GPUs:

```sh
# Run the GRPO math example with a 1B parameter model on 8 GPUs
uv run python examples/run_grpo_math.py \
cluster.gpus_per_node=8
```

You can override any of the parameters listed in the YAML configuration file. For example:

```sh
uv run python examples/run_grpo_math.py \
policy.model_name="Qwen/Qwen2-1.5B" \
checkpointing.checkpoint_dir="results/qwen1_5b_math" \
logger.wandb_enabled=True \
logger.wandb.name="grpo-qwen1_5b_math" \
  logger.num_val_samples_to_print=10
```

#### Multi-node

For distributed training across multiple nodes:

For the general multi-node setup, refer to the [SFT multi-node](#multi-node) documentation. The only thing that differs from SFT is the `COMMAND`:

```sh
# Run from the root of NeMo-Reinforcer repo
```