Merge pull request #1 from bytedance/rc
Upgrade and release v0.1
zhaocq-nlp authored Dec 25, 2020
2 parents 634a71e + c22ce9e commit 671e88f
Showing 91 changed files with 4,913 additions and 1,417 deletions.
19 changes: 18 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -6,6 +6,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]
### Added
- Init repo

### Changed



## [0.1.0] - 2020-12-25
### Added
- Basic code structure for Encoder, Decoder, Model, DataPipeline, Tokenizer, Experiment, Metric, and Dataset.
- (Model) Adds implementation of pre-norm/post-norm Transformer, Speech Transformer, BERT, GPT-2, and Wav2Vec2.0.
- (Task) Adds implementation of sequence to sequence task and speech to text task (ASR, ST).
- (DataPipeline, Tokenizer) Adds wrappers for commonly used tokenizers: moses, bpe, jieba, character, sentencepiece, etc.
- (Dataset) Adds support for reading parallel corpus, speech corpora (libri-trans, MuST-C, and LibriSpeech), and TFRecords.
- (Experiment) Adds implementation of common training procedure with mixed precision training and various distributed strategies (`MirroredStrategy`, `Horovod`, `Byteps`).
- (Metric) Adds implementation of BLEU and WER metrics.
- (Converter) Adds implementation of converting checkpoints from Google BERT, OpenAI GPT-2, fairseq Transformer, and fairseq Wav2Vec2.0.
- Adds beam search decoding and top-k/top-p sampling.
- Adds support for averaging checkpoints, TFRecord generation, and model restoring (see [cli/README.md](/neurst/cli/README.md)).
- Step-by-step recipes for training an end-to-end speech translation model (see [examples/speech_to_text](/examples/speech_to_text)).
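The top-k/top-p sampling listed above can be illustrated by a minimal NumPy sketch of logit filtering (our own helper for illustration, not NeurST's actual implementation):

```python
import numpy as np

def top_k_top_p_filter(logits, k=0, p=1.0):
    """Mask logits outside the top-k, and/or outside the smallest set of
    tokens whose cumulative probability exceeds p (nucleus sampling)."""
    logits = np.asarray(logits, dtype=np.float64).copy()
    if k > 0:
        kth_largest = np.sort(logits)[-k]
        logits[logits < kth_largest] = -np.inf   # drop everything below the k-th best
    if p < 1.0:
        order = np.argsort(logits)[::-1]         # indices, best first
        probs = np.exp(logits[order] - np.max(logits))
        probs /= probs.sum()
        cutoff = np.searchsorted(np.cumsum(probs), p) + 1  # keep >= p probability mass
        logits[order[cutoff:]] = -np.inf
    return logits
```

Sampling then draws a token from the softmax of the filtered logits.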

2 changes: 2 additions & 0 deletions README.md
@@ -1,6 +1,7 @@
# NeurST: Neural Speech Translation Toolkit
NeurST aims to make it easy to build and train end-to-end speech translation models, with a careful design for extensibility and scalability. We believe this design makes it easier for NLP researchers to get started. In addition, NeurST allows researchers to train custom models for translation, summarization, and so on.

> NeurST is based on TensorFlow 2, and we are working on the PyTorch version.
## Features

@@ -21,6 +22,7 @@ NeurST provides several **strong and reproducible benchmarks** for various tasks

- Speech-to-Text
- [Augmented Librispeech](/examples/speech_to_text/augmented_librispeech)
- [MuST-C](/examples/speech_to_text/must-c)


### Additionally
45 changes: 26 additions & 19 deletions examples/speech_to_text/augmented_librispeech/README.md
@@ -4,32 +4,34 @@

The final performance of speech translation on Augmented LibriSpeech is:

> See [RESULTS](/examples/speech_to_text/augmented_librispeech/RESULTS.md) for a comparison with counterparts.
- **ASR (dmodel=256, WER)**

|Framework|Model|Dev|Test| |
|---|---|---|---|---|
|NeurST|Transformer ASR |8.3|8.9| pure end-to-end, beam=4, no length penalty |
|Espnet (Inaguma et al., 2020)| Transformer ASR + ctc | 6.5 | 6.4 | multi-task training with ctc loss |
|Model|Dev|Test|
|---|---|---|
|Transformer ASR |8.3|8.9|


- **MT/ST (dmodel=256, case-sensitive, tokenized BLEU/detokenized BLEU)**

|Framework|Model|Dev|Test|
|---|---|---|---|
|NeurST|Transformer MT |20.8 / 19.3 | 19.3 / 17.6 |
|NeurST|cascade ST (Transformer ASR -> Transformer MT) | 18.3 / 17.0| 17.4 / 16.0 |
|NeurST|end2end Transformer ST + ASR pretrain | 18.3 / 16.9 | 16.9 / 15.5 |
|Model|Dev|Test|
|---|---|---|
|Transformer MT |20.8 / 19.3 | 19.3 / 17.6 |
|cascade ST (Transformer ASR -> Transformer MT) | 18.3 / 17.0| 17.4 / 16.0 |
|Transformer ST + ASR pretrain | 18.3 / 16.9 | 16.9 / 15.5 |
|Transformer ST + ASR pretrain + SpecAug | 19.3 / 17.8 | 17.8 / 16.3 |
|Transformer ST ensemble above 2 models | **19.3** / **18.0** | **18.3 / 16.8** |

- **MT/ST (dmodel=256, case-insensitive, tokenized BLEU/detokenized BLEU)**

|Framework|Model|Dev|Test|
|---|---|---|---|
|NeurST|Transformer MT | 21.7 / 20.2 | 20.2 / 18.5 |
|Espnet (Inaguma et al., 2020)| Transformer MT| ---- / 19.6 | ---- / 18.1 |
|NeurST|cascade ST (Transformer ASR -> Transformer MT) | 19.2 / 17.8 | 18.2 / 16.8 |
|Espnet (Inaguma et al., 2020)| cascade ST (Transformer ASR + ctc -> Transformer MT) | ---- / ---- | ---- / 17.0 |
|NeurST|end2end Transformer ST + ASR pretrain | 19.2 / 17.8 | 17.9 / 16.5 |
|Espnet (Inaguma et al., 2020)|end2end Transformer ST + ASR pretrain | ---- / ---- | ---- / 15.5 |
|Espnet (Inaguma et al., 2020)|end2end Transformer ST + ASR/MT pretrain + SpecAug | ---- / ---- | ---- / 16.7 |
|Model|Dev|Test|
|---|---|---|
|Transformer MT | 21.7 / 20.2 | 20.2 / 18.5 |
|cascade ST (Transformer ASR -> Transformer MT) | 19.2 / 17.8 | 18.2 / 16.8 |
|Transformer ST + ASR pretrain | 19.2 / 17.8 | 17.9 / 16.5 |
|Transformer ST + ASR pretrain + SpecAug | 20.2 / 18.7 | 18.7 / 17.2 |
|Transformer ST ensemble above 2 models | **20.3** / **18.9** | **19.2** / **17.7** |

In this recipe, we will introduce how to pre-process the Augmented LibriSpeech corpus and train/evaluate a speech translation model using NeurST.

@@ -44,6 +46,7 @@ In this recipe, we will introduce how to pre-process the Augmented LibriSpeech c
* [Accelerating Training with TensorFlow XLA](#accelerating-training-with-tensorflow-xla)
* [Evaluation on Testset](#evaluation-on-testset)
* [Training ST with ASR pretraining](#training-st-with-asr-pretraining)
* [SpecAugment](#specaugment)
* [Cascade ST](#cascade-st)


@@ -224,7 +227,7 @@ python3 -m neurst.cli.run_exp \
--model_dir /path_to_data/asr_st/asr_benchmark
```

This process will constantly scan the `model_dir`, evaluate each checkpoint and store the checkpoints with best metrics (e.g. WER for ASR) into `{model_dir}/best` directory along with the corresponding averaged version into `{model_dir}/best_avg`.
This process constantly scans the `model_dir`, evaluates each checkpoint, and stores the checkpoints with the best metrics (e.g. WER for ASR) in the `{model_dir}/best` directory, along with the corresponding averaged version (by default, over the 10 latest checkpoints) in `{model_dir}/best_avg`.
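Conceptually, the averaging step is an element-wise mean over the saved weight arrays; a minimal sketch (illustrative only — NeurST's checkpoint tools operate on TensorFlow checkpoints directly):

```python
import numpy as np

def average_checkpoints(checkpoints):
    """Element-wise average of model weights across several checkpoints.
    `checkpoints` is a list of {variable_name: np.ndarray} dicts."""
    avg = {}
    for name in checkpoints[0]:
        avg[name] = np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
    return avg
```

Averaging the last few checkpoints typically smooths out noise from the final optimization steps and gives a small, cheap quality gain.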

### Evaluation on Testset
By running with
@@ -255,6 +258,10 @@ On this basis, we can further initialize the ST decoder with MT decoder by follo
> To inspect the names of model variables, use `inspect_checkpoint` tool (see [neurst/cli/README.md](/neurst/cli/README.md)).

### SpecAugment
To further improve the performance of ASR or ST, we can apply SpecAugment (Park et al., 2019) via the option `--specaug VALUE`. The VALUE can be one of LB, LD, SM, and SS (the policies described in the original paper), or a JSON-like string defining detailed arguments (see [neurst/utils/audio_lib.py](/neurst/utils/audio_lib.py)).
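The masking that SpecAugment applies can be sketched as follows (parameter names and defaults here are illustrative, not NeurST's actual `--specaug` arguments):

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, freq_mask_width=27,
                 num_time_masks=2, time_mask_width=40, rng=None):
    """Randomly zero out frequency and time bands of a (time, freq)
    log-mel spectrogram, following SpecAugment (Park et al., 2019)."""
    if rng is None:
        rng = np.random.default_rng()
    spec = spec.copy()
    num_frames, num_bins = spec.shape
    for _ in range(num_freq_masks):          # frequency masking
        f = int(rng.integers(0, min(freq_mask_width, num_bins) + 1))
        f0 = int(rng.integers(0, num_bins - f + 1))
        spec[:, f0:f0 + f] = 0.0
    for _ in range(num_time_masks):          # time masking
        t = int(rng.integers(0, min(time_mask_width, num_frames) + 1))
        t0 = int(rng.integers(0, num_frames - t + 1))
        spec[t0:t0 + t, :] = 0.0
    return spec
```

The masks act as a cheap data augmentation on the input features, making the encoder robust to partial loss of frequency and time information.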


### Cascade ST
NeurST provides a `cascade_st` tool for easily combining ASR and MT models, e.g.

60 changes: 60 additions & 0 deletions examples/speech_to_text/augmented_librispeech/RESULTS.md
@@ -0,0 +1,60 @@
# Results on Augmented LibriSpeech


### Comparison with counterparts (speech_transformer_s)
Test set, case-insensitive BLEU

|Model|tok|detok|
|---|---|---|
|Transformer ST + ASR PT (1)| - |15.5|
|Transformer ST + ASR/MT PT (1)| - |16.2|
|Transformer ST + ASR/MT PT + SpecAug (1) | - |16.7|
|Transformer ST ensemble 3 models (1) | - | 17.4|
|Transformer ST + ASR/MT PT (2)| 14.3 | - |
|Transformer ST + ASR/MT PT + KD (2) | 17.0 | - |
|Transformer ST + ASR PT + SpecAug (3) | 16.9 | - |
|Transformer ST + ASR PT + curriculum pre-training + SpecAug (3) | 18.0 | - |
|Transformer ST + ASR PT (4) | 15.3 | - |
|Transformer ST + triple supervision (TED) (4) | 18.3 | - |
|**NeurST** Transformer ST + ASR PT | 17.9 | 16.5 |
|**NeurST** Transformer ST + ASR PT + SpecAug | 18.7 | 17.2 |
|**NeurST** Transformer ST ensemble 2 models | **19.2** | **17.7**|

(1) Espnet-ST (Inaguma et al., 2020) with additional techniques: speed perturbation, pre-trained MT decoder and CTC loss for ASR pretrain;

(2) Liu et al. (2019) with the proposed knowledge distillation;

(3) Wang et al. (2020) with additional ASR corpora and curriculum pre-training;

(4) Dong et al. (2020) with CTC loss and a pre-trained BERT encoder as supervision with external ASR data;


### ASR (dmodel=256, WER)

|Framework|Model|Dev|Test| |
|---|---|---|---|---|
|NeurST|Transformer ASR |8.3|8.9| pure end-to-end, beam=4, no length penalty |
|Espnet (Inaguma et al., 2020)| Transformer ASR + ctc | 6.5 | 6.4 | multi-task training with ctc loss |
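The WER figures above are the standard word-level edit distance divided by the reference length; a minimal sketch of the metric:

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```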


### MT/ST (dmodel=256, case-sensitive, tokenized BLEU/detokenized BLEU)

|Framework|Model|Dev|Test|
|---|---|---|---|
|NeurST|Transformer MT |20.8 / 19.3 | 19.3 / 17.6 |
|NeurST|cascade ST (Transformer ASR -> Transformer MT) | 18.3 / 17.0| 17.4 / 16.0 |
|NeurST|end2end Transformer ST + ASR pretrain | 18.3 / 16.9 | 16.9 / 15.5 |
|NeurST|end2end Transformer ST + ASR pretrain + SpecAug | 19.3 / 17.8 | 17.8 / 16.3 |
|NeurST|end2end Transformer ST ensemble above 2 models | 19.3 / 18.0 | 18.3 / 16.8 |

### MT/Cascaded (dmodel=256, case-insensitive, tokenized BLEU/detokenized BLEU)

|Framework|Model|Dev|Test|
|---|---|---|---|
|NeurST|Transformer MT | 21.7 / 20.2 | 20.2 / 18.5 |
|Espnet (Inaguma et al., 2020)| Transformer MT| ---- / 19.6 | ---- / 18.1 |
|NeurST|cascade ST (Transformer ASR -> Transformer MT) | 19.2 / 17.8 | 18.2 / 16.8 |
|Espnet (Inaguma et al., 2020)| cascade ST (Transformer ASR + ctc -> Transformer MT) | ---- / ---- | ---- / 17.0 |



Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,9 @@ entry.params:
beta_2: 0.98
lr_schedule.class: noam
lr_schedule.params:
initial_factor: 5.0
initial_factor: 3.5
end_factor: 2.0
dmodel: 512
dmodel: 256
warmup_steps: 25000
start_decay_at: 50000
decay_steps: 50000
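The `noam` schedule configured above follows the formula from "Attention Is All You Need": linear warmup followed by inverse-square-root decay, scaled by a factor. A minimal sketch (this omits the additional `end_factor`/`start_decay_at`/`decay_steps` factor decay that the config suggests NeurST applies):

```python
def noam_lr(step, dmodel=256, warmup_steps=25000, factor=3.5):
    """Noam learning-rate schedule for step >= 1: warm up linearly to the
    peak at `warmup_steps`, then decay as 1/sqrt(step)."""
    return factor * dmodel ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

Note that a smaller `dmodel` raises the resulting learning rate, which is why the factor is often retuned when the model size changes.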
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ validator.params:
eval_search_method: beam_search
eval_search_method.params:
beam_size: 4
length_penalty: 0.6
length_penalty: -1
maximum_decode_length: 180
extra_decode_length: 50
eval_metric: tok_bleu
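We read `length_penalty: -1` above as disabling length normalization during beam search. When enabled, the commonly used GNMT length penalty (Wu et al., 2016), by which each hypothesis's log-probability is divided, looks like:

```python
def gnmt_length_penalty(length, alpha):
    """GNMT length penalty: ((5 + length) / 6) ** alpha.
    alpha <= 0 is commonly treated as 'disabled' (no normalization)."""
    if alpha <= 0:
        return 1.0
    return ((5.0 + length) / 6.0) ** alpha
```

Larger `alpha` favors longer hypotheses, countering beam search's bias toward short outputs.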
Original file line number Diff line number Diff line change
Expand Up @@ -13,9 +13,9 @@ entry.params:
beta_2: 0.98
lr_schedule.class: noam
lr_schedule.params:
initial_factor: 4.0
end_factor: 2.0
dmodel: 512
initial_factor: 3.5
end_factor: 1.5
dmodel: 256
warmup_steps: 25000
start_decay_at: 50000
decay_steps: 50000
41 changes: 41 additions & 0 deletions examples/speech_to_text/must-c/01-download.sh
@@ -0,0 +1,41 @@
#!/usr/bin/env bash
# Copyright 2020 ByteDance Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -e

if [[ -z "$1" ]]; then
  echo "Usage: ./01-download.sh SAVE_PATH"
  exit 1
else
  DATA_PATH="$1"
fi

DATA_PATH="$DATA_PATH/raw"

mkdir -p "$DATA_PATH"

# Download from
# https://ict.fbk.eu/must-c/
# and get following tgz files:
# - MUSTC_v1.0_en-de.tar.gz
# - MUSTC_v1.0_en-es.tar.gz
# - MUSTC_v1.0_en-fr.tar.gz
# - MUSTC_v1.0_en-it.tar.gz
# - MUSTC_v1.0_en-nl.tar.gz
# - MUSTC_v1.0_en-pt.tar.gz
# - MUSTC_v1.0_en-ro.tar.gz
# - MUSTC_v1.0_en-ru.tar.gz

echo "Downloading MuST-C dataset..."
