Multi-Teacher Distillation with Single Model for Neural Machine Translation

In this paper, we propose a simple yet effective knowledge distillation method that mimics multi-teacher distillation using the sub-network space and permuted variants of a single teacher model. We train the teacher with multiple sub-network extraction paradigms: sub-layer Reordering, Layer-drop, and Dropout variants (RLD). In this way, one teacher model can provide multiple output variants while introducing neither additional parameters nor much extra training cost.

How to use

  1. First, download and preprocess the data (IWSLT'14 German to English, https://github.com/pytorch/fairseq/tree/main/examples/translation); a command sketch is given after this list.
  2. Next, we train an RLD teacher model on this data:
        # set flag to train
        TRAIN_FLAG=true
        TEST_FLAG=false
        bash script/train_tec/run_sld_tec.sh
    
  3. Generate distilled data with the trained teacher model:

    The generated data needs to be preprocessed into binary files as in step 1 (see the command sketch after this list).

        # set flag to inference
        TRAIN_FLAG=false
        TEST_FLAG=true
        bash script/train_tec/run_sld_tec.sh
    
  4. Finally, we can train our student model on the generated data:
        bash script/train_stu/run_datakd_stu.sh
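
The commands below sketch the preprocessing referred to in steps 1 and 3. The step 1 commands follow the linked fairseq translation example; the distilled-data paths ($DISTILL, data-bin/iwslt14.distilled.de-en) are placeholders and should be adapted to wherever the generation step in step 3 writes its output.

    # Step 1: download and tokenize IWSLT'14 De-En, then binarize it
    # (following the fairseq examples/translation instructions)
    cd examples/translation
    bash prepare-iwslt14.sh
    cd ../..
    TEXT=examples/translation/iwslt14.tokenized.de-en
    fairseq-preprocess --source-lang de --target-lang en \
        --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
        --destdir data-bin/iwslt14.tokenized.de-en --workers 20

    # Step 3: binarize the teacher-generated (distilled) training data the same way,
    # reusing the dictionaries from step 1 so teacher and student share one vocabulary
    DISTILL=path/to/distilled   # placeholder: where the generation step wrote its output
    fairseq-preprocess --source-lang de --target-lang en \
        --trainpref $DISTILL/train --validpref $TEXT/valid --testpref $TEXT/test \
        --destdir data-bin/iwslt14.distilled.de-en \
        --srcdict data-bin/iwslt14.tokenized.de-en/dict.de.txt \
        --tgtdict data-bin/iwslt14.tokenized.de-en/dict.en.txt \
        --workers 20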
    

Ablation study

You can reproduce our ablation experiments by changing the following hyper-parameters:

    # sub-layer reordering (R)
    --sublayer-reorder
    # layer-drop (L)
    --sublayer-drop
    --encoder-sub-layerdrop 0.2 --decoder-sub-layerdrop 0.2
    # dropout (D)
    --dropout 0.3 
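
For context, these flags are appended to the teacher training command. The invocation below is an illustrative sketch only, assuming a standard fairseq-train recipe for IWSLT'14 De-En and that this repository registers the RLD flags with fairseq-train; the exact settings used in our experiments are those in script/train_tec/run_sld_tec.sh.

    # Illustrative sketch: a teacher trained with all three RLD paradigms enabled.
    # Architecture/optimizer settings follow the common fairseq IWSLT'14 recipe and
    # may differ from those in script/train_tec/run_sld_tec.sh.
    fairseq-train data-bin/iwslt14.tokenized.de-en \
        --arch transformer_iwslt_de_en \
        --optimizer adam --adam-betas '(0.9, 0.98)' --lr 5e-4 \
        --lr-scheduler inverse_sqrt --warmup-updates 4000 \
        --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
        --max-tokens 4096 \
        --sublayer-reorder \
        --sublayer-drop --encoder-sub-layerdrop 0.2 --decoder-sub-layerdrop 0.2 \
        --dropout 0.3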
