diff --git a/docs/adding_new_models.md b/docs/adding-new-models.md
similarity index 99%
rename from docs/adding_new_models.md
rename to docs/adding-new-models.md
index c39642ea69..34aaaaf3b0 100644
--- a/docs/adding_new_models.md
+++ b/docs/adding-new-models.md
@@ -22,7 +22,7 @@ $$
 
 where samples are drawn as $x \sim \pi_{\text{sampling-framework}}$
 
-as a measure of multiplicative probability error for sampled tokens. Note that this is not exhaustive (the sampling framework could lack distribution support and we wouldn't catch it here, as $x \sim \pi_{\text{sampling-framework}}$). To get a much stricter guarantee on correctness, you should run this metric twice and average the results, where in the second run, you sample $x \sim \pi_{\text{training-framework}}$. In practice, we use just the former in our tests and find it sufficient.
+As a measure of multiplicative probability error for sampled tokens. Note that this is not exhaustive (the sampling framework could lack distribution support and we wouldn't catch it here, as $x \sim \pi_{\text{sampling-framework}}$). To get a much stricter guarantee on correctness, you should run this metric twice and average the results, where in the second run, you sample $x \sim \pi_{\text{training-framework}}$. In practice, we use just the former in our tests and find it sufficient.
 
 ## Understanding Discrepancies Between Backends
 
diff --git a/docs/design_docs/chat_datasets.md b/docs/design-docs/chat-datasets.md
similarity index 100%
rename from docs/design_docs/chat_datasets.md
rename to docs/design-docs/chat-datasets.md
diff --git a/docs/design_docs/checkpointing.md b/docs/design-docs/checkpointing.md
similarity index 100%
rename from docs/design_docs/checkpointing.md
rename to docs/design-docs/checkpointing.md
diff --git a/docs/design_docs/design_and_philosophy.md b/docs/design-docs/design-and-philosophy.md
similarity index 100%
rename from docs/design_docs/design_and_philosophy.md
rename to docs/design-docs/design-and-philosophy.md
diff --git a/docs/design_docs/generation.md b/docs/design-docs/generation.md
similarity index 100%
rename from docs/design_docs/generation.md
rename to docs/design-docs/generation.md
diff --git a/docs/design-docs/gpu-logger.md b/docs/design-docs/gpu-logger.md
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/docs/design-docs/index.md b/docs/design-docs/index.md
new file mode 100644
index 0000000000..e178a61002
--- /dev/null
+++ b/docs/design-docs/index.md
@@ -0,0 +1,12 @@
+```{toctree}
+:caption: 📐 Design Docs
+:hidden:
+
+design-and-philosophy.md
+padding.md
+logger.md
+uv.md
+chat-datasets.md
+generation.md
+checkpointing.md
+```
\ No newline at end of file
diff --git a/docs/design_docs/logger.md b/docs/design-docs/logger.md
similarity index 100%
rename from docs/design_docs/logger.md
rename to docs/design-docs/logger.md
diff --git a/docs/design_docs/padding.md b/docs/design-docs/padding.md
similarity index 100%
rename from docs/design_docs/padding.md
rename to docs/design-docs/padding.md
diff --git a/docs/design_docs/uv.md b/docs/design-docs/uv.md
similarity index 100%
rename from docs/design_docs/uv.md
rename to docs/design-docs/uv.md
diff --git a/docs/guides/index.md b/docs/guides/index.md
new file mode 100644
index 0000000000..4276cc8d22
--- /dev/null
+++ b/docs/guides/index.md
@@ -0,0 +1,9 @@
+```{toctree}
+:caption: 📚 Guides
+:hidden:
+
+adding-new-models.md
+sft.md
+grpo.md
+eval.md
+```
\ No newline at end of file
diff --git a/docs/guides/sft.md b/docs/guides/sft.md
index 4d452b109d..8a67da85e8 100644
--- a/docs/guides/sft.md
+++ b/docs/guides/sft.md
@@ -29,7 +29,7 @@ SFT datasets in Reinforcer are encapsulated using classes. Each SFT data class i
 1. `formatted_ds`: The dictionary of formatted datasets. This dictionary should contain `train` and `validation` splits, and each split should conform to the format described below.
 2. `task_spec`: The `TaskDataSpec` for this dataset. This should specify the name you choose for this dataset as well as the `custom_template` for this dataset. More on custom templates below.
 
-SFT datasets are expected to follow the HuggingFace chat format. Refer to the [chat dataset document](../design_docs/chat_datasets.md) for details. If your data is not in the correct format, simply write a preprocessing script to convert the data into this format. [data/hf_datasets/squad.py](../../nemo_reinforcer/data/hf_datasets/squad.py) has an example:
+SFT datasets are expected to follow the HuggingFace chat format. Refer to the [chat dataset document](../design-docs/chat-datasets.md) for details. If your data is not in the correct format, simply write a preprocessing script to convert the data into this format. [data/hf_datasets/squad.py](../../nemo_reinforcer/data/hf_datasets/squad.py) has an example:
 
 ```python
 def format_squad(data):
diff --git a/docs/index.md b/docs/index.md
index 553778ff98..0b802b0ce2 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -6,7 +6,7 @@
 :caption: 🖥️ Environment Start
 :hidden:
 
-local_workstation.md
+local-workstation.md
 cluster.md
 ```
 
@@ -15,7 +15,7 @@ cluster.md
 :caption: 📚 Guides
 :hidden:
 
-adding_new_models.md
+adding-new-models.md
 guides/sft.md
 guides/grpo.md
 guides/eval.md
@@ -41,11 +41,11 @@ apidocs/index.rst
 :caption: 📐 Design Docs
 :hidden:
 
-design_docs/design_and_philosophy.md
-design_docs/padding.md
-design_docs/logger.md
-design_docs/uv.md
-design_docs/chat_datasets.md
-design_docs/generation.md
-design_docs/checkpointing.md
+design-docs/design-and-philosophy.md
+design-docs/padding.md
+design-docs/logger.md
+design-docs/uv.md
+design-docs/chat-datasets.md
+design-docs/generation.md
+design-docs/checkpointing.md
 ```
diff --git a/docs/local_workstation.md b/docs/local-workstation.md
similarity index 100%
rename from docs/local_workstation.md
rename to docs/local-workstation.md