This repository contains the source code of our work on building efficient sequence models: DeFINE (ICLR'20) and DeLighT (preprint).
Table of contents
- Overview
- Requirements and installation
- Training, evaluation, and results
- Multiplication-addition operations
- Citation
- Acknowledgement
- Issues
In this repository, we share the source code of our paper DeLighT, which delivers similar or better performance than transformer-based models with significantly fewer parameters. DeLighT allocates parameters more efficiently both (1) within each Transformer block, using DExTra, a deep and light-weight transformation, and (2) across blocks, using block-wise scaling, which allows for shallower and narrower DeLighT blocks near the input and wider and deeper DeLighT blocks near the output. Overall, DeLighT networks are 2.5 to 4 times deeper than standard transformers, yet have fewer parameters and operations. For details, see our papers: DeFINE and DeLighT.
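The gist of block-wise scaling can be sketched as follows (this is our paraphrase with assumed symbol names, not the paper's exact notation): for a network with $B$ DeLighT blocks, the depth of the transformation in block $b$ grows roughly linearly from a minimum depth near the input to a maximum depth near the output,

$$
N^{b} = N_{min} + \frac{(N_{max} - N_{min})\, b}{B-1}, \qquad b = 0, 1, \ldots, B-1 .
$$

See the DeLighT paper for the exact scaling rules for both depth and width.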
- PyTorch version >= 1.4.0
- Python version >= 3.6
- For training new models, you'll also need an NVIDIA GPU and NCCL
- To use DeLighT, clone this repository and install it locally in editable mode:
```
git clone https://github.com/sacmehta/delight
cd delight
pip install --editable ./
```
- For faster training, install NVIDIA's apex library:
```
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
    --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
    --global-option="--fast_multihead_attn" ./
```
For training, evaluation, and results, see the links below. To ease reproduction of our results, we also provide links to training logs.
We have added module profiling for both Transformer and DeLighT networks. Profiling can be enabled with the `--print-stats` argument; a model summary is then printed (by default for 20 tokens), similar to the screenshot below. To compute profiling statistics for longer source and target sequence lengths, use the `--src-len-ps` and `--tgt-len-ps` flags.
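For example, the profiling flags can be appended to a training command like the one sketched below. The entry point, data path, architecture name, and sequence-length values here are placeholders rather than the exact commands from our experiments; reuse your own command and simply add the flags.

```
# Hypothetical profiling invocation: the data path, architecture name, and
# sequence lengths are placeholders -- append the profiling flags to the
# training command you already use.
fairseq-train data-bin/wmt14_en_de \
    --arch delight_transformer \
    --print-stats \
    --src-len-ps 64 \
    --tgt-len-ps 64
```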
If you find our work useful, please consider citing the following works:
```
@misc{mehta2020delight,
    title={DeLighT: Very Deep and Light-weight Transformer},
    author={Sachin Mehta and Marjan Ghazvininejad and Srinivasan Iyer and Luke Zettlemoyer and Hannaneh Hajishirzi},
    year={2020},
    eprint={2008.00623},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

@inproceedings{mehta2019define,
    title={DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling},
    author={Mehta, Sachin and Koncel-Kedziorski, Rik and Rastegari, Mohammad and Hajishirzi, Hannaneh},
    booktitle={International Conference on Learning Representations},
    year={2020}
}
```
We would like to thank the Fairseq team for building an easy-to-use sequence-modeling library.
Thanks for your interest in our work. If you run into any problems, please open an issue in this repository.