Merged
8 changes: 4 additions & 4 deletions docs/index.md
@@ -140,12 +140,12 @@ Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepa
{bdg-primary}`training` {bdg-secondary}`rl` {bdg-secondary}`grpo` {bdg-secondary}`multi-step`
:::

-:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth and TRL
-:link: training-unsloth-trl
+:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth
+:link: training-unsloth
:link-type: ref
Fast, memory-efficient fine-tuning for single-step tasks: math, structured outputs, instruction following, reasoning gym and more.
+++
-{bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`trl` {bdg-secondary}`single-step`
+{bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`single-step`
:::

::::
@@ -211,7 +211,7 @@ tutorials/index.md
tutorials/creating-resource-server
tutorials/offline-training-w-rollouts
tutorials/nemo-rl-grpo/index.md
-tutorials/unsloth-trl-training
+tutorials/unsloth-training
```

```{toctree}
6 changes: 3 additions & 3 deletions docs/tutorials/index.md
@@ -60,12 +60,12 @@ Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepa
{bdg-primary}`training` {bdg-secondary}`rl` {bdg-secondary}`grpo` {bdg-secondary}`multi-step`
:::

-:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth and TRL
-:link: training-unsloth-trl
+:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` Unsloth
+:link: training-unsloth
:link-type: ref
Fast, memory-efficient fine-tuning for single-step tasks: math, structured outputs, instruction following, reasoning gym and more.
+++
-{bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`trl` {bdg-secondary}`single-step`
+{bdg-primary}`training` {bdg-secondary}`unsloth` {bdg-secondary}`single-step`
:::

::::
@@ -1,19 +1,16 @@
-(training-unsloth-trl)=
+(training-unsloth)=

-# RL Training with Unsloth and TRL
+# RL Training with Unsloth

-This tutorial demonstrates how to use [Unsloth](https://github.com/unslothai/unsloth) and [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl) to fine-tune models for single-step tasks with NeMo Gym verifiers and datasets.
+This tutorial demonstrates how to use [Unsloth](https://github.com/unslothai/unsloth) to fine-tune models for single-step tasks with NeMo Gym verifiers and datasets.

**Unsloth** is a fast, memory-efficient library for fine-tuning large language models. It provides optimized implementations that significantly reduce memory usage and training time, making it possible to fine-tune larger models on consumer hardware.

-**TRL** is a library from HuggingFace for post-training models using techniques like SFT, GRPO, and DPO. It is built on top of [Transformers](https://github.com/huggingface/transformers) and supports a variety of model architectures and modalities.
-
-
-Both Unsloth and TRL can be used with NeMo Gym single-step verifiers including math tasks, structured outputs, instruction following, reasoning gym, and more.
+Unsloth can be used with NeMo Gym single-step verifiers including math tasks, structured outputs, instruction following, reasoning gym, and more.

:::{card}

-**Goal**: Fine-tune a model for single-step tasks using Unsloth and TRL with NeMo Gym verifiers.
+**Goal**: Fine-tune a model for single-step tasks using Unsloth with NeMo Gym verifiers.

^^^

@@ -28,7 +25,7 @@ Both Unsloth and TRL can be used with NeMo Gym single-step verifiers including m

## Getting Started

-Follow this interactive notebook to train your first model with Unsloth or TRL and NeMo Gym:
+Follow this interactive notebook to train your first model with Unsloth and NeMo Gym:

:::{button-link} https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/nemo_gym_sudoku.ipynb
:color: primary
@@ -37,7 +34,9 @@ Follow this interactive notebook to train your first model with Unsloth or TRL a
Unsloth GRPO notebook
:::

-> **Note:** This notebook supports **single-step tasks** including math, structured outputs, instruction following, reasoning gym, and more. For multi-step tool calling scenarios, see the {doc}`GRPO with NeMo RL <nemo-rl-grpo/index>` tutorial. A complete multi-step and multi-turn integration with Unsloth and TRL is under construction!
+Check out [Unsloth's documentation](https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym) for more details.
+
+> **Note:** This notebook supports **single-step tasks** including math, structured outputs, instruction following, reasoning gym, and more. For multi-step tool calling scenarios, see the {doc}`GRPO with NeMo RL <nemo-rl-grpo/index>` tutorial. A multi-step integration with Unsloth is under construction!

---
