Merged
2 changes: 1 addition & 1 deletion docs/adding_new_models.md → docs/adding-new-models.md
@@ -22,7 +22,7 @@ $$

where samples are drawn as $x \sim \pi_{\text{sampling-framework}}$

-as a measure of multiplicative probability error for sampled tokens. Note that this is not exhaustive (the sampling framework could lack distribution support and we wouldn't catch it here, as $x \sim \pi_{\text{sampling-framework}}$). To get a much stricter guarantee on correctness, you should run this metric twice and average the results, where in the second run, you sample $x \sim \pi_{\text{training-framework}}$. In practice, we use just the former in our tests and find it sufficient.
+As a measure of multiplicative probability error for sampled tokens. Note that this is not exhaustive (the sampling framework could lack distribution support and we wouldn't catch it here, as $x \sim \pi_{\text{sampling-framework}}$). To get a much stricter guarantee on correctness, you should run this metric twice and average the results, where in the second run, you sample $x \sim \pi_{\text{training-framework}}$. In practice, we use just the former in our tests and find it sufficient.

## Understanding Discrepancies Between Backends

File renamed without changes.
File renamed without changes.
File renamed without changes.
Empty file.
12 changes: 12 additions & 0 deletions docs/design-docs/index.md
@@ -0,0 +1,12 @@
```{toctree}
:caption: 📐 Design Docs
:hidden:

design-and-philosophy.md
padding.md
logger.md
uv.md
chat-datasets.md
generation.md
checkpointing.md
```
File renamed without changes.
File renamed without changes.
File renamed without changes.
9 changes: 9 additions & 0 deletions docs/guides/index.md
@@ -0,0 +1,9 @@
```{toctree}
:caption: 📚 Guides
:hidden:

adding-new-models.md
sft.md
grpo.md
eval.md
```
2 changes: 1 addition & 1 deletion docs/guides/sft.md
@@ -29,7 +29,7 @@ SFT datasets in Reinforcer are encapsulated using classes. Each SFT data class i
1. `formatted_ds`: The dictionary of formatted datasets. This dictionary should contain `train` and `validation` splits, and each split should conform to the format described below.
2. `task_spec`: The `TaskDataSpec` for this dataset. This should specify the name you choose for this dataset as well as the `custom_template` for this dataset. More on custom templates below.

-SFT datasets are expected to follow the HuggingFace chat format. Refer to the [chat dataset document](../design_docs/chat_datasets.md) for details. If your data is not in the correct format, simply write a preprocessing script to convert the data into this format. [data/hf_datasets/squad.py](../../nemo_reinforcer/data/hf_datasets/squad.py) has an example:
+SFT datasets are expected to follow the HuggingFace chat format. Refer to the [chat dataset document](../design-docs/chat-datasets.md) for details. If your data is not in the correct format, simply write a preprocessing script to convert the data into this format. [data/hf_datasets/squad.py](../../nemo_reinforcer/data/hf_datasets/squad.py) has an example:

```python
def format_squad(data):
18 changes: 9 additions & 9 deletions docs/index.md
@@ -6,7 +6,7 @@
:caption: 🖥️ Environment Start
:hidden:

-local_workstation.md
+local-workstation.md
cluster.md

```
@@ -15,7 +15,7 @@ cluster.md
:caption: 📚 Guides
:hidden:

-adding_new_models.md
+adding-new-models.md
guides/sft.md
guides/grpo.md
guides/eval.md
@@ -41,11 +41,11 @@ apidocs/index.rst
:caption: 📐 Design Docs
:hidden:

-design_docs/design_and_philosophy.md
-design_docs/padding.md
-design_docs/logger.md
-design_docs/uv.md
-design_docs/chat_datasets.md
-design_docs/generation.md
-design_docs/checkpointing.md
+design-docs/design-and-philosophy.md
+design-docs/padding.md
+design-docs/logger.md
+design-docs/uv.md
+design-docs/chat-datasets.md
+design-docs/generation.md
+design-docs/checkpointing.md
```
File renamed without changes.