diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 5db0f53fcaa..16036f7d83f 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -53,8 +53,6 @@
       title: Community Tutorials
     - local: lora_without_regret
       title: LoRA Without Regret
-    - local: sentiment_tuning
-      title: Sentiment Tuning
     - local: multi_adapter_rl
       title: Multi Adapter RLHF
   title: Examples
diff --git a/docs/source/example_overview.md b/docs/source/example_overview.md
index 488a99d2924..898459ba783 100644
--- a/docs/source/example_overview.md
+++ b/docs/source/example_overview.md
@@ -36,8 +36,6 @@ These notebooks are easier to run and are designed for quick experimentation wit
 Legacy / Older Notebooks
 
 - [`best_of_n.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/best_of_n.ipynb): This notebook demonstrates how to use the "Best of N" sampling strategy using TRL when fine-tuning your model with PPO.
-- [`gpt2-sentiment.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/gpt2-sentiment.ipynb): This notebook demonstrates how to reproduce the GPT2 imdb sentiment tuning example on a jupyter notebook.
-- [`gpt2-sentiment-control.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/gpt2-sentiment-control.ipynb): This notebook demonstrates how to reproduce the GPT2 sentiment control example on a jupyter notebook.
 
 ## Scripts
diff --git a/docs/source/sentiment_tuning.md b/docs/source/sentiment_tuning.md
deleted file mode 100644
index 025ffba0da5..00000000000
--- a/docs/source/sentiment_tuning.md
+++ /dev/null
@@ -1,31 +0,0 @@
-# Sentiment Tuning Examples
-
-The notebooks and scripts in these examples show how to fine-tune a model with a sentiment classifier (such as `lvwerra/distilbert-imdb`).
-
-Here's an overview of the notebooks and scripts in the [trl repository](https://github.com/huggingface/trl/tree/main/examples):
-
-| File | Description |
-| --- | --- |
-| [`examples/scripts/ppo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/ppo.py) [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/sentiment/notebooks/gpt2-sentiment.ipynb) | This script shows how to use the `PPOTrainer` to fine-tune a sentiment analysis model using the IMDB dataset. |
-| [`examples/notebooks/gpt2-sentiment.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/gpt2-sentiment.ipynb) | This notebook demonstrates how to reproduce the GPT2 IMDB sentiment tuning example in a Jupyter notebook. |
-| [`examples/notebooks/gpt2-control.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/gpt2-control.ipynb) [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/sentiment/notebooks/gpt2-sentiment-control.ipynb) | This notebook demonstrates how to reproduce the GPT2 sentiment control example in a Jupyter notebook. |
-
-## Usage
-
-```bash
-# 1. run directly
-python examples/scripts/ppo.py
-# 2. run via `accelerate` (recommended), enabling more features (e.g., multiple GPUs, DeepSpeed)
-accelerate config  # will prompt you to define the training configuration
-accelerate launch examples/scripts/ppo.py  # launches training
-# 3. get help text and documentation
-python examples/scripts/ppo.py --help
-# 4. configure logging with wandb and, say, mini_batch_size=1 and gradient_accumulation_steps=16
-python examples/scripts/ppo.py --log_with wandb --mini_batch_size 1 --gradient_accumulation_steps 16
-```
-
-Note: if you don't want to log with `wandb`, remove `log_with="wandb"` in the scripts/notebooks. You can also replace it with your favourite experiment tracker that's [supported by `accelerate`](https://huggingface.co/docs/accelerate/usage_guides/tracking).
-
-## Few notes on multi-GPU
-
-To run in a multi-GPU setup with DDP (Distributed Data Parallel), change the `device_map` value to `device_map={"": Accelerator().process_index}` and make sure to run your script with `accelerate launch yourscript.py`. If you want to apply naive pipeline parallelism you can use `device_map="auto"`.
diff --git a/examples/notebooks/README.md b/examples/notebooks/README.md
index 0c55f02e5ae..9aee78682c0 100644
--- a/examples/notebooks/README.md
+++ b/examples/notebooks/README.md
@@ -13,5 +13,3 @@ This directory contains a collection of Jupyter notebooks that demonstrate how t
 Legacy / Older Notebooks
 
 - [`best_of_n.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/best_of_n.ipynb): This notebook demonstrates how to use the "Best of N" sampling strategy using TRL when fine-tuning your model with PPO.
-- [`gpt2-sentiment.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/gpt2-sentiment.ipynb): This notebook demonstrates how to reproduce the GPT2 IMDB sentiment tuning example in a Jupyter notebook.
-- [`gpt2-sentiment-control.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/gpt2-sentiment-control.ipynb): This notebook demonstrates how to reproduce the GPT2 sentiment control example in a Jupyter notebook.
diff --git a/examples/notebooks/gpt2-sentiment-control.ipynb b/examples/notebooks/gpt2-sentiment-control.ipynb
deleted file mode 100644
index fa19ed3f1c1..00000000000
--- a/examples/notebooks/gpt2-sentiment-control.ipynb
+++ /dev/null
@@ -1,897 +0,0 @@
-[... 897 lines of notebook JSON deleted; the recoverable markdown-cell text from this excerpt is reproduced below ...]
-# Tune GPT2 to generate controlled sentiment reviews
-> Optimise GPT2 to produce IMDB movie reviews with controlled sentiment using a BERT sentiment classifier for rewards.
-**WARNING:** We often experienced loss spikes in this example, which caused model training to fail or slow down. There is a [GitHub issue](https://github.com/lvwerra/trl/issues/101) to track the issue.
-Figure: Experiment setup to tune GPT2. The yellow arrows are outside the scope of this notebook, but the trained models are available through Hugging Face.
-Figure: Reward mean and distribution evolution during training.
-|   | query | response (before) | response (after) | rewards (before) | rewards (after) |
-|---|---|---|---|---|---|
-| 0 | I rented Zero Day | 4 for my sister. To my surprise, the Wii caug... | . It is a pleasure. It is a huge leap 68 years... | 1.736068 | 2.423731 |
-| 1 | The only | distro of her | special compliments is the | 0.150852 | 0.190159 |
-| 2 | I've read a few | news reports about Mr. Mueller's activities b... | novels and I never watch this. It has a reall... | -1.417962 | 2.831814 |
-| 3 | This is the second British Rank film | , and I wouldn't be surprised anymore if it | that I have enjoyed, achieving it in both the | 0.835876 | 2.205628 |
-| 4 | A classic | classic.<br /><br />And only this one will ha... | . It's a movie with a fine cast. As the beginn... | 2.113075 | 2.739168 |
-| 5 | This has to be one of the | worst with the differences being that for the | best thriller films I've seen in recent | -2.705339 | 2.730615 |
-| 6 | Happy Go Lovely is a waste | . Not only are extremely | of time, giving a | -2.429504 | -2.934672 |
-| 7 | Wow, I just | can't make fun of it | feek it! This show | -2.201666 | -0.106085 |
-| 8 | This movie makes several mistakes. | Despite being a great comedic diversion it es... | It's cool, wonderful - it held me into a very ... | -1.232380 | 2.707638 |
-| 9 | Branagh and Fish | burne, Drake is played | is a great show. Beautiful | 0.776819 | 2.808996 |
-| 10 | I might have given this movie a | rating of *11 when I heard that!), but it was... | great performance. It was truly a great movie... | 0.276380 | 2.743328 |
-| 11 | Really, really bad | with feel like there is no end to the | . This movie is incredibly good, with the | -2.639503 | -1.568827 |
-| 12 | What another reviewer called lack of | judgment, connecting into her own harsh obser... | suspense. Rogers and Rooney rate this as exce... | -1.079707 | 2.696888 |
-| 13 | This is simply one | more problem of Steve | of the best choice | -1.445436 | 2.662699 |
-| 14 | "Perhaps we can arrange a meet | -and-greet.<br /><br />Teleg | with spent, classic music and dance, and come... | 0.258479 | 1.876662 |
-| 15 | Richard Willaims is | nice enough; the little black guy plays quite | beautifully hands on in his own spin, and | 0.796508 | 2.820259 |
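The multi-GPU note in the removed sentiment_tuning.md can be sketched as follows. This is a minimal illustration outside the diff itself, assuming nothing beyond what the removed docs stated: `ddp_device_map` is a hypothetical helper name, and a plain integer stands in for `Accelerator().process_index` so the sketch runs without `accelerate` installed.

```python
# Sketch of the DDP device-placement pattern from the removed docs: under
# `accelerate launch`, each process loads the FULL model onto its own GPU,
# so the device_map pins every module to that process's device index.

def ddp_device_map(process_index: int) -> dict:
    """Return a device_map placing the entire model on a single GPU."""
    # The empty-string key means "all modules of the model".
    return {"": process_index}

# With accelerate installed, this would typically be:
#   from accelerate import Accelerator
#   device_map = ddp_device_map(Accelerator().process_index)
print(ddp_device_map(0))  # {'': 0}
```

With N processes launched via `accelerate launch`, each replica gets a distinct `process_index`, giving pure data parallelism; `device_map="auto"` instead shards modules across the available GPUs, which is the naive pipeline parallelism the removed docs mention.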