diff --git a/docs/index.md b/docs/index.md
index 190f021b9..da62bdd3c 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -140,12 +140,12 @@ Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepa
 {bdg-primary}`training` {bdg-secondary}`rl` {bdg-secondary}`grpo` {bdg-secondary}`multi-step`
 :::
 
-:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth and TRL
-:link: training-unsloth-trl
+:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth
+:link: training-unsloth
 :link-type: ref
 Fast, memory-efficient fine-tuning for single-step tasks: math, structured outputs, instruction following, reasoning gym and more.
 +++
-{bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`trl` {bdg-secondary}`single-step`
+{bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`single-step`
 :::
 
 ::::
@@ -211,7 +211,7 @@ tutorials/index.md
 tutorials/creating-resource-server
 tutorials/offline-training-w-rollouts
 tutorials/nemo-rl-grpo/index.md
-tutorials/unsloth-trl-training
+tutorials/unsloth-training
 ```
 
 ```{toctree}
diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
index ff5c4561a..3a065ee87 100644
--- a/docs/tutorials/index.md
+++ b/docs/tutorials/index.md
@@ -60,12 +60,12 @@ Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepa
 {bdg-primary}`training` {bdg-secondary}`rl` {bdg-secondary}`grpo` {bdg-secondary}`multi-step`
 :::
 
-:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth and TRL
-:link: training-unsloth-trl
+:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth
+:link: training-unsloth
 :link-type: ref
 Fast, memory-efficient fine-tuning for single-step tasks: math, structured outputs, instruction following, reasoning gym and more.
 +++
-{bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`trl` {bdg-secondary}`single-step`
+{bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`single-step`
 :::
 
 ::::
diff --git a/docs/tutorials/unsloth-trl-training.md b/docs/tutorials/unsloth-training.md
similarity index 66%
rename from docs/tutorials/unsloth-trl-training.md
rename to docs/tutorials/unsloth-training.md
index 166afda01..f611fd908 100644
--- a/docs/tutorials/unsloth-trl-training.md
+++ b/docs/tutorials/unsloth-training.md
@@ -1,19 +1,16 @@
-(training-unsloth-trl)=
+(training-unsloth)=
 
-# RL Training with Unsloth and TRL
+# RL Training with Unsloth
 
-This tutorial demonstrates how to use [Unsloth](https://github.com/unslothai/unsloth) and [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl) to fine-tune models for single-step tasks with NeMo Gym verifiers and datasets.
+This tutorial demonstrates how to use [Unsloth](https://github.com/unslothai/unsloth) to fine-tune models for single-step tasks with NeMo Gym verifiers and datasets.
 
 **Unsloth** is a fast, memory-efficient library for fine-tuning large language models. It provides optimized implementations that significantly reduce memory usage and training time, making it possible to fine-tune larger models on consumer hardware.
 
-**TRL** is a library from HuggingFace for post-training models using techniques like SFT, GRPO, and DPO. It is built on top of [Transformers](https://github.com/huggingface/transformers) and supports a variety of model architectures and modalities.
-
-
-Both Unsloth and TRL can be used with NeMo Gym single-step verifiers including math tasks, structured outputs, instruction following, reasoning gym, and more.
+Unsloth can be used with NeMo Gym single-step verifiers including math tasks, structured outputs, instruction following, reasoning gym, and more.
 
 :::{card}
 
-**Goal**: Fine-tune a model for single-step tasks using Unsloth and TRL with NeMo Gym verifiers.
+**Goal**: Fine-tune a model for single-step tasks using Unsloth with NeMo Gym verifiers.
 
 ^^^
 
@@ -28,7 +25,7 @@ Both Unsloth and TRL can be used with NeMo Gym single-step verifiers including m
 
 ## Getting Started
 
-Follow this interactive notebook to train your first model with Unsloth or TRL and NeMo Gym:
+Follow this interactive notebook to train your first model with Unsloth and NeMo Gym:
 
 :::{button-link} https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/nemo_gym_sudoku.ipynb
 :color: primary
@@ -37,7 +34,9 @@ Follow this interactive notebook to train your first model with Unsloth or TRL a
 Unsloth GRPO notebook
 :::
 
-> **Note:** This notebook supports **single-step tasks** including math, structured outputs, instruction following, reasoning gym, and more. For multi-step tool calling scenarios, see the {doc}`GRPO with NeMo RL ` tutorial. A complete multi-step and multi-turn integration with Unsloth and TRL is under construction!
+Check out [Unsloth's documentation](https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym) for more details.
+
+> **Note:** This notebook supports **single-step tasks** including math, structured outputs, instruction following, reasoning gym, and more. For multi-step tool calling scenarios, see the {doc}`GRPO with NeMo RL ` tutorial. A multi-step integration with Unsloth is under construction!
 
 ---
 