NVIDIA-NeMo · terrykong · Apr 15, 2025 · Apr 10, 2025
@@ -22,7 +22,7 @@ $$
 
 where samples are drawn as $x \sim \pi_{\text{sampling-framework}}$
 
-as a measure of multiplicative probability error for sampled tokens. Note that this is not exhaustive (the sampling framework could lack distribution support and we wouldn't catch it here, as $x \sim \pi_{\text{sampling-framework}}$). To get a much stricter guarantee on correctness, you should run this metric twice and average the results, where in the second run, you sample $x \sim \pi_{\text{training-framework}}$. In practice, we use just the former in our tests and find it sufficient.
+This serves as a measure of multiplicative probability error for sampled tokens. Note that the check is not exhaustive: because $x \sim \pi_{\text{sampling-framework}}$, tokens that the sampling framework never produces (missing distribution support) go undetected. For a stricter guarantee on correctness, run the metric twice and average the results, sampling $x \sim \pi_{\text{training-framework}}$ in the second run. In practice, we run only the first variant in our tests and find it sufficient.
 
 ## Understanding Discrepancies Between Backends
 

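The two-run averaging described in the hunk above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the metric's formula sits above the hunk and is not visible here, so the per-token form `exp(|log p_train - log p_sampling|)` averaged over tokens is an assumption, and the function names `mult_prob_error` and `symmetric_mult_prob_error` are hypothetical.

```python
import math


def mult_prob_error(logprobs_train, logprobs_sampling):
    """Mean multiplicative probability error over tokens.

    ASSUMPTION: the per-token error is exp(|log p_train - log p_sampling|),
    so a value of 1.0 means both frameworks assign identical probabilities.
    """
    if len(logprobs_train) != len(logprobs_sampling):
        raise ValueError("log-prob sequences must have the same length")
    if not logprobs_train:
        raise ValueError("need at least one token")
    ratios = [
        math.exp(abs(t - s)) for t, s in zip(logprobs_train, logprobs_sampling)
    ]
    return sum(ratios) / len(ratios)


def symmetric_mult_prob_error(run_sampling, run_training):
    """Average the metric over two runs.

    run_sampling: (logprobs_train, logprobs_sampling) for tokens drawn
                  from the sampling framework.
    run_training: the same pair for tokens drawn from the training framework.
    """
    return 0.5 * (mult_prob_error(*run_sampling) + mult_prob_error(*run_training))
```

Each run is passed as a pair of per-token log-probability lists for the same tokens, one list per framework. A result close to 1.0 suggests the two backends agree on the tokens that were scored.
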
@@ -0,0 +1,12 @@
+```{toctree}
+:caption: 📐 Design Docs
+:hidden:
+
+design-and-philosophy.md
+padding.md
+logger.md
+uv.md
+chat-datasets.md
+generation.md
+checkpointing.md
+```
@@ -0,0 +1,9 @@
+```{toctree}
+:caption: 📚 Guides
+:hidden:
+
+adding-new-models.md
+sft.md
+grpo.md
+eval.md
+```
@@ -29,7 +29,7 @@ SFT datasets in Reinforcer are encapsulated using classes. Each SFT data class i
   1. `formatted_ds`: The dictionary of formatted datasets. This dictionary should contain `train` and `validation` splits, and each split should conform to the format described below.
   2. `task_spec`: The `TaskDataSpec` for this dataset. This should specify the name you choose for this dataset as well as the `custom_template` for this dataset. More on custom templates below.
 
-SFT datasets are expected to follow the HuggingFace chat format. Refer to the [chat dataset document](../design_docs/chat_datasets.md) for details. If your data is not in the correct format, simply write a preprocessing script to convert the data into this format. [data/hf_datasets/squad.py](../../nemo_reinforcer/data/hf_datasets/squad.py) has an example:
+SFT datasets are expected to follow the HuggingFace chat format. Refer to the [chat dataset document](../design-docs/chat-datasets.md) for details. If your data is not in the correct format, simply write a preprocessing script to convert the data into this format. [data/hf_datasets/squad.py](../../nemo_reinforcer/data/hf_datasets/squad.py) has an example:
 
 ```python
 def format_squad(data):

@@ -6,7 +6,7 @@
 :caption: 🖥️  Environment Start
 :hidden:
 
-local_workstation.md
+local-workstation.md
 cluster.md
 
 ```
@@ -15,7 +15,7 @@ cluster.md
 :caption: 📚 Guides
 :hidden:
 
-adding_new_models.md
+adding-new-models.md
 guides/sft.md
 guides/grpo.md
 guides/eval.md
@@ -41,11 +41,11 @@ apidocs/index.rst
 :caption: 📐 Design Docs
 :hidden:
 
-design_docs/design_and_philosophy.md
-design_docs/padding.md
-design_docs/logger.md
-design_docs/uv.md
-design_docs/chat_datasets.md
-design_docs/generation.md
-design_docs/checkpointing.md
+design-docs/design-and-philosophy.md
+design-docs/padding.md
+design-docs/logger.md
+design-docs/uv.md
+design-docs/chat-datasets.md
+design-docs/generation.md
+design-docs/checkpointing.md
 ```