10 changes: 6 additions & 4 deletions examples/research_projects/pplm/README.md
Blog link: https://eng.uber.com/pplm

Please check out the repo under uber-research for more information: https://github.com/uber-research/PPLM

# Note

⚠️ This project should be run with pytorch-lightning==1.0.4, which has a potential security vulnerability.
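
If you need to reproduce that environment anyway, a minimal install sketch:

```bash
pip install pytorch-lightning==1.0.4
```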

## Setup

```bash
pip install nltk torchtext # additional requirements.
cd examples/research_projects/pplm
```

## PPLM-BoW

### Example command for bag-of-words control

```bash
python run_pplm.py -B military --cond_text "The potato" --length 50 --gamma 1.5
```

### Tuning hyperparameters for bag-of-words control

1. Increase `--stepsize` to intensify topic control, and decrease its value to soften the control. `--stepsize 0` recovers the original uncontrolled GPT-2 model.

2. If the language being generated is repetitive (e.g. "science science experiment experiment"), there are several options to consider: <br/>
a) Reduce the `--stepsize` <br/>
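
For example, to soften the topic control in the bag-of-words command above (a sketch; remaining flags as shown there):

```bash
python run_pplm.py -B military --cond_text "The potato" --length 50 --gamma 1.5 --stepsize 0.01
```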

## PPLM-Discrim

### Example command for discriminator control

```bash
python run_pplm.py -D sentiment --class_label 2 --cond_text "My dog died" --length 50
```

### Tuning hyperparameters for discriminator control

1. Increase `--stepsize` to intensify topic control, and decrease its value to soften the control. `--stepsize 0` recovers the original uncontrolled GPT-2 model.

2. Use `--class_label 3` for negative, and `--class_label 2` for positive.
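
For example, to steer the same prompt toward negative sentiment (a sketch; remaining flags as in the discriminator command above):

```bash
python run_pplm.py -D sentiment --class_label 3 --cond_text "My dog died" --length 50
```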

2 changes: 1 addition & 1 deletion examples/research_projects/pplm/requirements.txt
psutil
sacrebleu
rouge-score
tensorflow_datasets
-pytorch-lightning==1.0.4
+pytorch-lightning
matplotlib
git-python==1.0.3
faiss-cpu
31 changes: 17 additions & 14 deletions examples/research_projects/rag-end2end-retriever/README.md

This finetuning script is actively maintained by [Shamane Siri](https://github.com/shamanez). Feel free to ask questions on the [Forum](https://discuss.huggingface.co/) or post an issue on [GitHub](https://github.com/huggingface/transformers/issues/new/choose) and tag @shamanez.

Others that helped out: Patrick von Platen (@patrickvonplaten), Quentin Lhoest (@lhoestq), and Rivindu Weerasekera (@rivinduw)

The original RAG implementation is able to train the question encoder and generator end-to-end.
This extension enables complete end-to-end training of RAG including the context encoder in the retriever component.
Please read the [accompanying blog post](https://shamanesiri.medium.com/how-to-finetune-the-entire-rag-architecture-including-dpr-retriever-4b4385322552) for details on this implementation.

The original RAG code has also been modified to work with the latest versions of pytorch-lightning (version 1.2.10) and Ray (version 1.3.0). All other implementation details remain the same as the [original RAG code](https://github.com/huggingface/transformers/tree/master/examples/research_projects/rag).
Read more about RAG at https://arxiv.org/abs/2005.11401.

This code can be modified to experiment with other research on retrieval-augmented models that involves training the retriever (e.g. [REALM](https://arxiv.org/abs/2002.08909) and [MARGE](https://arxiv.org/abs/2006.15020)).

To start training, use the bash script (finetune_rag_ray_end2end.sh) in this folder. This script also includes descriptions of each command-line argument used.
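
For example, from this directory (edit the data and output paths inside the script first):

```bash
sh ./finetune_rag_ray_end2end.sh
```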

# Note

⚠️ This project should be run with pytorch-lightning==1.3.1, which has a potential security vulnerability.

# Testing

The following two bash scripts can be used to quickly test the implementation.
1. sh ./test_run/test_rag_new_features.sh
- Tests the newly added functions (set_context_encoder and set_context_encoder_tokenizer) related to modeling RAG.
- This is sufficient to check the model's ability to use the set functions correctly.
2. sh ./test_run/test_finetune.sh
- Tests the full end-to-end fine-tuning ability with a dummy knowledge-base and dummy training dataset (check the test_dir directory).
- Users can replace the dummy dataset and knowledge-base with their own to do their own finetuning.


# Comparison of end2end RAG (including DPR finetuning) vs. original RAG
We conducted a simple experiment to investigate the effectiveness of this end2end training extension:
- Create a knowledge-base using all the context passages in the SQuAD dataset with their respective titles.
- Use the question-answer pairs as training data.
- Train the system for 10 epochs.
- Test the Exact Match (EM) score with the SQuAD dataset's validation set.
- The training dataset, knowledge-base, and hyperparameters used in the experiments can be accessed from [here](https://drive.google.com/drive/folders/1qyzV-PaEARWvaU_jjpnU_NUS3U_dSjtG?usp=sharing).

# Results

- We train both models for 10 epochs.

| Model Type           | EM-Score |
| -------------------- | -------- |
| RAG-original         | 28.12    |
| RAG-end2end with DPR | 40.02    |
examples/research_projects/rag-end2end-retriever/requirements.txt
faiss-cpu >= 1.7.0
datasets >= 1.6.2
psutil >= 5.7.0
torch >= 1.4.0
-pytorch-lightning == 1.3.1
+pytorch-lightning
nvidia-ml-py3 == 7.352.0
ray >= 1.3.0
20 changes: 12 additions & 8 deletions examples/research_projects/rag/README.md
Such contextualized inputs are passed to the generator.

Read more about RAG at https://arxiv.org/abs/2005.11401.

# Note

⚠️ This project should be run with pytorch-lightning==1.3.1, which has a potential security vulnerability.

# Finetuning

Our finetuning logic is based on scripts from [`examples/seq2seq`](https://github.com/huggingface/transformers/tree/master/examples/seq2seq). We accept training data in the same format as specified there - we expect a directory consisting of 6 text files:
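
Following the `examples/seq2seq` convention referenced above, these are:

```
train.source
train.target
val.source
val.target
test.source
test.target
```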
You will then be able to pass `path/to/checkpoint` as `model_name_or_path` to the fine-tuning script.

## Document Retrieval
When running distributed fine-tuning, each training worker needs to retrieve contextual documents for its input by querying an index loaded into memory. RAG provides two implementations for document retrieval, one based on the [`torch.distributed`](https://pytorch.org/docs/stable/distributed.html) communication package and the other on [`Ray`](https://docs.ray.io/en/master/).

This option can be configured with the `--distributed_retriever` flag which can either be set to `pytorch` or `ray`.
By default this flag is set to `pytorch`.
For the PyTorch implementation, only training worker 0 loads the index into CPU memory, and a gather/scatter pattern is used
to collect the inputs from the other training workers and send back the corresponding document embeddings.

For the Ray implementation, the index is loaded in *separate* process(es). The training workers randomly select which
retriever worker to query. To use Ray for distributed retrieval, you have to set the `--distributed_retriever` arg to `ray`.
To configure the number of retrieval workers (the number of processes that load the index), you can set the `num_retrieval_workers` flag.
Also make sure to start the Ray cluster before running fine-tuning.
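
For example, a minimal local run might look like this (a sketch; paths are placeholders and the remaining arguments follow the fine-tuning script's usage):

```bash
ray start --head
python examples/research_projects/rag/finetune_rag.py \
    --data_dir path/to/data \
    --output_dir path/to/output \
    --model_name_or_path facebook/rag-sequence-nq \
    --distributed_retriever ray \
    --num_retrieval_workers 4
ray stop
```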
We demonstrate how to evaluate retrieval against DPR evaluation data. You can do so as follows:

```bash
--gold_data_path output/biencoder-nq-dev.pages
```
3. Run evaluation:
```bash
python examples/research_projects/rag/eval_rag.py \
--model_name_or_path facebook/rag-sequence-nq \
--model_type rag_sequence \
--evaluation_set output/biencoder-nq-dev.questions \
--gold_data_path output/biencoder-nq-dev.pages \
--predictions_path output/retrieval_preds.tsv \
--eval_mode retrieval \
--k 1
# --predictions_path: name of the file where predictions will be stored
# --eval_mode: whether we're performing retrieval evaluation or e2e evaluation
# --k: parameter k for the precision@k metric

```
## End-to-end evaluation

The gold data file can list a question with its answers, or a single expected answer per line, e.g.:

```
who is the owner of reading football club ['Xiu Li Dai', 'Dai Yongge', 'Dai Xiuli']
Xiu Li Dai
```

Predictions of the model for the samples from the `evaluation_set` will be saved under the path specified by the `predictions_path` parameter.
If this path already exists, the script will use saved predictions to calculate metrics.
Add `--recalculate` parameter to force the script to perform inference from scratch.

An example e2e evaluation run could look as follows:
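
A sketch, assuming the flags documented above (paths and file names are illustrative):

```bash
python examples/research_projects/rag/eval_rag.py \
    --model_name_or_path facebook/rag-sequence-nq \
    --model_type rag_sequence \
    --evaluation_set path/to/questions.txt \
    --gold_data_path path/to/gold_answers.txt \
    --predictions_path output/e2e_preds.tsv \
    --eval_mode e2e \
    --recalculate
```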

To fine-tune with your own knowledge source, the same script is pointed at the custom passages and index (a fragment; the remaining arguments follow the fine-tuning examples above):

```bash
python examples/research_projects/rag/finetune_rag.py \
    --index_name custom \
    --passages_path path/to/data/my_knowledge_dataset \
    --index_path path/to/my_knowledge_dataset_hnsw_index.faiss
```
4 changes: 2 additions & 2 deletions examples/research_projects/rag/requirements.txt
datasets >= 1.0.1
psutil >= 5.7.0
torch >= 1.4.0
transformers
-pytorch-lightning==1.3.1
+pytorch-lightning
GitPython
26 changes: 15 additions & 11 deletions examples/research_projects/seq2seq-distillation/README.md
Author: Sam Shleifer (https://github.com/sshleifer)
- `FSMTForConditionalGeneration`
- `T5ForConditionalGeneration`

# Note

⚠️ This project should be run with pytorch-lightning==1.0.4, which has a potential security vulnerability.

## Datasets

#### XSUM
https://github.com/huggingface/transformers/tree/master/scripts/fsmt

#### Pegasus (multiple datasets)

Multiple eval datasets are available for download from:
https://github.com/stas00/porting/tree/master/datasets/pegasus


```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(f'{output_dir}/best_tfmr')
```
### Converting pytorch-lightning checkpoints
pytorch-lightning `--do_predict` often fails; after you are done training, the best way to evaluate your model is to convert it.

This should be done for you, with a file called `{save_dir}/best_tfmr`.

If that file doesn't exist but you have a lightning `.ckpt` file, you can run
```bash
python convert_pl_checkpoint_to_hf.py PATH_TO_CKPT randomly_initialized_hf_model
```
Then either `run_eval` or `run_distributed_eval` with `save_dir/best_tfmr` (see previous sections)


# Experimental Features
These features are harder to use and not always useful.

### Dynamic Batch Size for MT
This feature can only be used:
- without sortish sampler
- after calling `./save_len_file.py $tok $data_dir`

For example,
```bash
./save_len_file.py Helsinki-NLP/opus-mt-en-ro wmt_en_ro
./dynamic_bs_example.sh --max_tokens_per_batch=2000 --output_dir benchmark_dynamic_bs
```

This section describes all code and artifacts from our [Paper](http://arxiv.org/abs/2010.13002).
![DBART](https://huggingface.co/front/thumbnails/distilbart_large.png)

+ For the CNN/DailyMail dataset (relatively longer, more extractive summaries), we found a simple technique that works, which we call "Shrink and Fine-tune", or SFT: you just copy alternating layers from `facebook/bart-large-cnn` and fine-tune more on the cnn/dm data. `sshleifer/distill-pegasus-cnn-16-4`, `sshleifer/distilbart-cnn-12-6` and all other checkpoints under `sshleifer` that start with `distilbart-cnn` were trained this way.
+ For the XSUM dataset, training on pseudo-labels worked best for Pegasus (`sshleifer/distill-pegasus-16-4`), while training with KD worked best for `distilbart-xsum-12-6`
+ For `sshleifer/dbart-xsum-12-3`
+ We ran hundreds of experiments and didn't want to document hundreds of commands. If you want a command to replicate a figure from the paper that is not documented below, feel free to ask on the [forums](https://discuss.huggingface.co/t/seq2seq-distillation-methodology-questions/1270) and tag `@sshleifer`.
+ You can see the performance tradeoffs of model sizes [here](https://docs.google.com/spreadsheets/d/1EkhDMwVO02m8jCD1cG3RoFPLicpcL1GQHTQjfvDYgIM/edit#gid=0).
and more granular timing results [here](https://docs.google.com/spreadsheets/d/1EkhDMwVO02m8jCD1cG3RoFPLicpcL1GQHTQjfvDYgIM/edit#gid=1753259047&range=B2:I23).

```bash
deval 1 sshleifer/distill-pegasus-xsum-16-4 xsum dpx_xsum_eval
```
+ Find a teacher model: [Pegasus](https://huggingface.co/models?search=pegasus) (slower, better ROUGE) or `facebook/bart-large-xsum`/`facebook/bart-large-cnn` (faster, slightly lower ROUGE).
Choose the checkpoint whose corresponding dataset is most similar (or identical) to your dataset.
+ Follow the sections in order below. You can stop after SFT if you are satisfied, or move on to pseudo-labeling if you want more performance.
+ Student size: if you want a close-to-free 50% speedup, cut the decoder in half. If you want a larger speedup, cut it in 4.
+ If your SFT run starts at a validation ROUGE-2 that is more than 10 pts below the teacher's validation ROUGE-2, you have a bug. Switching to a more expensive technique will not help. Try setting a breakpoint and looking at generation and truncation defaults/hyper-parameters, and share your experience on the forums!


#### Initialization
We use [make_student.py](./make_student.py) to copy alternating layers from the teacher, and save the resulting model to disk
```bash
python make_student.py facebook/bart-large-xsum --save_path dbart_xsum_12_3 --e 12 --d 3
python make_student.py google/pegasus-xsum --save_path dpx_xsum_16_4 --e 16 --d 4
```
We now have an initialized student saved to `dbart_xsum_12_3`, which we will use for the following commands.
+ Extension: to replicate the more complicated initialization experiments in Section 6.1, or to try your own, use the `create_student_by_copying_alternating_layers` function.

#### Pegasus
+ The following commands are written for BART and will require, at minimum, the following modifications
+ reduce batch size, and increase gradient accumulation steps so that the product `gpus * batch size * gradient_accumulation_steps = 256`. We used `--learning-rate` = 1e-4 * gradient accumulation steps (see the worked example below).
+ don't use fp16
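
For instance, a worked example of that arithmetic (illustrative numbers):

```bash
# With 1 GPU and a per-GPU batch size of 16:
#   gradient_accumulation_steps = 256 / (1 * 16) = 16
#   learning rate = 1e-4 * 16 = 1.6e-3
```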
```bash
python finetune.py \
--output_dir dbart_xsum_12_3_PL --gpus 1 --logger_name wandb
```



To combine datasets, as in Section 6.2, try something like:
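
A minimal sketch, assuming the six-file data layout these scripts use (directory names are illustrative):

```bash
# Concatenate two datasets, split by split, into a combined data dir.
mkdir -p cnn_xsum
for split in train val test; do
  cat cnn_dm/$split.source xsum/$split.source > cnn_xsum/$split.source
  cat cnn_dm/$split.target xsum/$split.target > cnn_xsum/$split.target
done
```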
The command that produced `sshleifer/distilbart-xsum-12-6` is at [./train_distilbart_xsum.sh](./train_distilbart_xsum.sh).

```bibtex
@misc{shleifer2020pretrained,
title={Pre-trained Summarization Distillation},
author={Sam Shleifer and Alexander M. Rush},
year={2020},
eprint={2010.13002},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
examples/research_projects/seq2seq-distillation/requirements.txt
psutil
sacrebleu
rouge-score
tensorflow_datasets
-pytorch-lightning==1.0.4
+pytorch-lightning
matplotlib
git-python==1.0.3
faiss-cpu