Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 11 additions & 8 deletions docs/guides/sft.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,9 @@ The script [examples/run_sft.py](../../examples/run_sft.py) can be used to launc

Be sure to launch the job using `uv`. The command to launch an SFT job is as follows:
```bash
uv run examples/run_sft.py --config <PATH TO YAML CONFIG> --output-dir <PATH TO OUTPUT DIRECTORY>
uv run examples/run_sft.py --config <PATH TO YAML CONFIG>
```
If not specified, `config` will default to [examples/configs/sft.yaml](../../examples/configs/sft.yaml) and `output-dir` will default to `./outputs`.
If not specified, `config` will default to [examples/configs/sft.yaml](../../examples/configs/sft.yaml).

## Configuration

Expand All @@ -17,16 +17,16 @@ Reinforcer allows users to configure experiments using `yaml` config files. An e
To override a value in the config, either update the value in the `yaml` file directly, or pass the override via the command line. For example:

```bash
python examples/run_sft.py \
data.max_input_seq_length=8192 \
logger.wandb.name="sft-dev-sl-8192"
uv run examples/run_sft.py \
cluster.gpus_per_node=1 \
logger.wandb.name="sft-dev-1-gpu"
```

## Datasets

SFT datasets in Reinforcer are encapsulated using classes. Each SFT data class is expected to have the following attributes:
- `formatted_ds`: The dictionary of formatted datasets. This dictionary should contain `train` and `validation` splits, and each split should conform to the format described below.
- `task_spec`: The `TaskDataSpec` for this dataset. This should specify the name you choose for this dataset as well as the `custom_template` for this dataset. More on custom templates below.
1. `formatted_ds`: The dictionary of formatted datasets. This dictionary should contain `train` and `validation` splits, and each split should conform to the format described below.
2. `task_spec`: The `TaskDataSpec` for this dataset. This should specify the name you choose for this dataset as well as the `custom_template` for this dataset. More on custom templates below.

SFT datasets are expected to follow the HuggingFace chat format. Refer to the [chat dataset document](../design_docs/chat_datasets.md) for details. If your data is not in the correct format, simply write a preprocessing script to convert the data into this format. [data/hf_datasets/squad.py](../../nemo_reinforcer/data/hf_datasets/squad.py) has an example:

Expand Down Expand Up @@ -62,4 +62,7 @@ task_spec = TaskDataSpec(
)
```

By default, NeMo-Reinforcer has support for `Squad` and `OpenAssistant` datasets. If you would like to use a custom dataset, create a new dataset class with the expected attributes.
By default, NeMo-Reinforcer has support for `Squad` and `OpenAssistant` datasets. Both of these datasets are downloaded from HuggingFace and preprocessed on-the-fly, so there's no need to provide a path to any datasets on disk.

Adding a new dataset is a straightforward process.
As long as your custom dataset has the `formatted_ds` and `task_spec` attributes described above, it can serve as a drop-in replacement for Squad and OpenAssistant.