Neural Machine Translation

This README contains instructions for preparing parallel data and training neural translation models.

We take WMT14 EN->DE as an example.

Requirements

pip

TensorFlow >=2.3.0
subword-nmt
pyyaml
sacrebleu
sacremoses

others

$ git clone https://github.com/moses-smt/mosesdecoder.git

Training a transformer model

Download and pre-process data

By running

$ ./examples/translation/prepare-wmt14en2de-bpe.sh /path_to_mosesdecoder

we will get the preprocessed training data and raw testsets under the wmt14_en_de/ directory, i.e.

/wmt14_en_de/
├── codes.bpe  # BPE codes
├── newstest2013.de.txt   # newstest2013 as devset
├── newstest2013.en.txt
├── newstest2014.de.txt  # newstest2014 as testset
├── newstest2014.en.txt
├── prediction_args.yml   # the arguments for prediction
├── train.de.tok.bpe.txt  # the pre-processed training data
├── train.en.tok.bpe.txt
├── train.de.txt  # the raw training data
├── train.en.txt
├── training_args.yml  # the arguments for training
├── translation_bpe.yml  # the arguments for training data and data pre-processing logic
├── validation_args.yml  # the arguments for validation
├── vocab.de  # the vocabulary
└── vocab.en

Here we apply the Moses tokenizer to the sentences and jointly learn subword units (BPE) with 40K merge operations.
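To illustrate what "jointly learning" BPE means, the following is a minimal sketch of the merge-learning loop on a toy corpus that mixes English and German words. It is not the subword-nmt implementation: the function name learn_bpe, the toy word lists, and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn BPE merges from a list of words (toy sketch, not subword-nmt)."""
    # Each word is a tuple of symbols ending with an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken lexicographically (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


# Hypothetical toy data: "joint" learning pools both languages into one corpus.
en_words = ["winter", "winter"]
de_words = ["winter", "wind"]
merges = learn_bpe(en_words + de_words, num_merges=3)
print(merges)
```

Because the pair counts are pooled over both languages, subwords that occur on both sides end up shared between the source and target vocabularies.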

Training and validating a transformer-base model

We can directly use the yaml-style configuration files generated above to train and evaluate a transformer model.

python3 -m neurst.cli.run_exp \
    --config_paths wmt14_en_de/training_args.yml,wmt14_en_de/translation_bpe.yml,wmt14_en_de/validation_args.yml \
    --hparams_set transformer_base \
    --model_dir /wmt14_en_de/benchmark_base

where /wmt14_en_de/benchmark_base is the root path for checkpoints. Here we use --hparams_set transformer_base to train a transformer model with 6 encoder layers and 6 decoder layers and d_model=512.

Alternatively,

we can set --hparams_set transformer_big to use the d_model=1024 version, which usually achieves better performance.
transformer_base/big define the "pre-norm" transformer structure by default, and we can add the --encoder.post_normalize and --decoder.post_normalize options to switch to the "post-norm" version.
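The difference between the two variants is only where layer normalization sits relative to the residual connection. The sketch below shows this with plain Python lists. It is not NeurST code: the function names and the toy sublayer are hypothetical, and the learned gain and bias of layer normalization are omitted.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def pre_norm_block(x, sublayer):
    # "pre-norm": normalize the input of the sublayer, then add the residual.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_norm_block(x, sublayer):
    # "post-norm": add the residual first, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def toy_sublayer(x):
    # Hypothetical stand-in for self-attention or the feed-forward network.
    return [2.0 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
pre = pre_norm_block(x, toy_sublayer)
post = post_norm_block(x, toy_sublayer)
print(pre)
print(post)
```

In the pre-norm block the residual path carries the input through unnormalized, whereas the post-norm block renormalizes the output of every sublayer.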

Training runs on multiple GPUs, as long as no GPU out-of-memory exception occurs. Moreover, we can set --update_cycle n --batch_size 32768//n to simulate n GPUs on a single GPU.
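The arithmetic behind these two options can be sketched as follows, assuming that --update_cycle n accumulates gradients over n steps before each parameter update. The helper names are hypothetical and are not part of NeurST.

```python
def single_gpu_flags(n, total_batch_size=32768):
    """Build the two flags that emulate n GPUs on one GPU."""
    # Integer division mirrors the 32768//n expression in the text above.
    per_step = total_batch_size // n
    return ["--update_cycle", str(n), "--batch_size", str(per_step)]


def effective_batch_size(n, total_batch_size=32768):
    """Batch size seen by one parameter update after n accumulation steps."""
    return n * (total_batch_size // n)


print(" ".join(single_gpu_flags(8)))
print(effective_batch_size(8))
print(effective_batch_size(3))
```

When n does not divide 32768 evenly, the effective batch size is slightly smaller than 32768 because of the integer division.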

Accelerating training with TensorFlow XLA

To speed up training, we can simply enable TensorFlow XLA via the --enable_xla option and separate the validation procedure from training, that is

python3 -m neurst.cli.run_exp \
    --config_paths wmt14_en_de/training_args.yml,wmt14_en_de/translation_bpe.yml \
    --hparams_set transformer_base \
    --model_dir /wmt14_en_de/benchmark_base \
    --enable_xla

Then, we start another process with one GPU for validation by

python3 -m neurst.cli.run_exp \
    --entry validation \
    --config_paths wmt14_en_de/validation_args.yml \
    --model_dir /wmt14_en_de/benchmark_base

This process continuously scans the model_dir, evaluates each checkpoint, and stores the checkpoints with the best metrics (i.e. BLEU scores) in the {model_dir}/best directory. It also stores the corresponding averaged version (by default, over the 10 latest checkpoints) in {model_dir}/best_avg.
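Checkpoint averaging takes the element-wise mean of each parameter over several checkpoints. The following is a minimal sketch of that idea, not the NeurST implementation: checkpoints are represented as plain dictionaries of float lists, and the function names and toy values are assumptions made for this example.

```python
def average_checkpoints(checkpoints):
    """Element-wise mean of every parameter across checkpoints.

    All checkpoints are assumed to share the same parameter names and sizes.
    """
    n = len(checkpoints)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in checkpoints))]
        for name in checkpoints[0]
    }


def average_latest(checkpoints_by_step, k=10):
    """Average the k checkpoints with the highest training step."""
    latest_steps = sorted(checkpoints_by_step)[-k:]
    return average_checkpoints([checkpoints_by_step[s] for s in latest_steps])


# Hypothetical toy checkpoints keyed by training step.
toy = {
    1000: {"w": [1.0, 2.0]},
    2000: {"w": [3.0, 4.0]},
    3000: {"w": [5.0, 6.0]},
}
averaged = average_latest(toy, k=2)
print(averaged)
```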

Evaluation on testset

By running with

python3 -m neurst.cli.run_exp \
    --config_paths wmt14_en_de/prediction_args.yml \
    --model_dir wmt14_en_de/benchmark_base/best_avg

BLEU scores will be reported on both the dev (newstest2013) and test (newstest2014) sets.

Others

Word piece

We additionally provide prepare-wmt14en2de-wp.sh to pre-process the data with word piece, which sometimes achieves better performance.

Compound Split BLEU

Some research works report their TransformerBig baseline on en->de newstest2014 as 29+. We found that they may split compound words more than once during evaluation.

That is, when evaluating with the Compound Split BLEU (like compound_split_bleu.sh), one may have already applied the Moses tokenizer to both hypotheses and references with the -a option, which enables splitting of compound words.

After that, "12-year-old" becomes "12 @-@ year @-@ old".

Then, the compound_split_bleu.sh would split the compound words again.

"12 @-@ year @-@ old" becomes "12 @ ##AT##-##AT## @ year @ ##AT##-##AT## @ old".

It increases the matched n-grams and results in a much higher BLEU score.
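The double split can be reproduced with two regular expressions. This is a sketch under assumptions: the two patterns only approximate the hyphen handling of the Moses tokenizer's -a option and of the compound-splitting script, and the function names are hypothetical.

```python
import re


def moses_aggressive_hyphen_split(text):
    # Approximation of the Moses -a option: a hyphen between two
    # alphanumeric characters becomes the standalone token "@-@".
    return re.sub(r"([^\W_])-(?=[^\W_])", r"\1 @-@ ", text)


def compound_split(text):
    # Approximation of the compound-split step: any hyphen with a
    # non-space character on both sides is spaced out and marked.
    return re.sub(r"(\S)-(\S)", r"\1 ##AT##-##AT## \2", text)


once = moses_aggressive_hyphen_split("12-year-old")
twice = compound_split(once)
print(once)
print(twice)
print(len(once.split()), "tokens ->", len(twice.split()), "tokens")
```

The second pass treats the hyphen inside each "@-@" token as another compound boundary, so the same extra tokens appear in both hypothesis and reference and match each other trivially.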

Here, we also provide this operation by overriding the metric option during evaluation (--metric compound_split_bleu). However, we still recommend using tokenized BLEU or sacreBLEU.

WMT14 EN2DE Benchmark

Benchmark Models

	hparams	norm type	link
BPE	base	pre-norm	[LINK]
BPE	base	post-norm	[LINK]
BPE	big	pre-norm	[LINK]
BPE	big	post-norm	[LINK]
word piece	base	pre-norm	[LINK]
word piece	base	post-norm	[LINK]
word piece	big	pre-norm	[LINK]
word piece	big	post-norm	[LINK]

(Base) Tokenized BLEU

	hparams	norm type	dev(newstest2013)	test(newstest2014)
(Vaswani et al., 2017)	base	post-norm	-	27.3
BPE	base	pre-norm	26.2	26.8
BPE	base	post-norm	26.9	27.9
word piece	base	pre-norm	26.2	27.0
word piece	base	post-norm	26.6	28.0

(Base) Detokenized BLEU (sacreBLEU)

	hparams	norm type	dev(newstest2013)	test(newstest2014)
(Vaswani et al., 2017)	base	post-norm	-	-
BPE	base	pre-norm	25.9	26.2
BPE	base	post-norm	26.6	27.3
word piece	base	pre-norm	26.0	26.4
word piece	base	post-norm	26.4	27.4

(Big) Tokenized BLEU

	hparams	norm type	dev(newstest2013)	test(newstest2014)
(Vaswani et al., 2017)	big	post-norm	-	28.4
BPE	big	pre-norm	26.7	27.7
BPE	big	post-norm	27.0	28.0
word piece	big	pre-norm	26.6	28.2
word piece	big	post-norm	27.0	28.3

(Big) Detokenized BLEU (sacreBLEU)

	hparams	norm type	dev(newstest2013)	test(newstest2014)
(Vaswani et al., 2017)	big	post-norm	-	-
BPE	big	pre-norm	26.4	27.1
BPE	big	post-norm	26.8	27.4
word piece	big	pre-norm	26.4	27.5
word piece	big	post-norm	26.8	27.7
