BiGRU encoder + Attention decoder, based on "Listen, Attend and Spell" [1].
The acoustic features are 80-dimensional filter banks. Every 3 consecutive frames are stacked into a single 240-dimensional frame, which reduces the time resolution by a factor of 3.
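For illustration, a minimal sketch of this frame stacking (the repo's actual implementation, e.g. how leftover frames are handled, may differ):

    import numpy as np

    def stack_frames(fbank, n=3):
        """Stack every n consecutive frames, reducing the frame rate by n.
        fbank: (T, 80) array of filter-bank features -> (T // n, 80 * n)."""
        T, d = fbank.shape
        T -= T % n                              # drop leftover frames at the end
        return fbank[:T].reshape(T // n, d * n)

    feats = stack_frames(np.random.randn(300, 80))
    print(feats.shape)                          # (100, 240)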
Following the standard recipe, we use the 462-speaker training set with all SA records removed. Outputs are mapped to 39 phonemes for evaluation.
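The 61-to-39 folding typically follows the Lee & Hon convention; below is a partial, illustrative subset (check the data preparation and evaluation code for the table actually used):

    # Partial sketch of the conventional 61 -> 39 phoneme folding (Lee & Hon);
    # the full table used by this repo may differ in details.
    PHONE_MAP = {
        "h#": "sil", "pau": "sil", "epi": "sil",        # silences
        "pcl": "sil", "tcl": "sil", "kcl": "sil",       # stop closures fold
        "bcl": "sil", "dcl": "sil", "gcl": "sil",       # into silence
        "ix": "ih", "ax": "ah", "axr": "er",            # reduced vowels
        "el": "l", "en": "n", "zh": "sh", "ao": "aa",
    }

    def map_to_39(phones):
        # the glottal stop "q" is conventionally deleted before scoring
        return [PHONE_MAP.get(p, p) for p in phones if p != "q"]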
With this code you can achieve a phone error rate (PER) of about 22% on the core test set.
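For orientation, here is a minimal conceptual sketch of such a model in PyTorch. All sizes are made up, and the attention is plain dot-product (Luong-style [3]); the actual model in this repo may use a different attention variant (cf. [2]) and different hyperparameters:

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """BiGRU over stacked filter-bank frames (240 = 3 x 80)."""
        def __init__(self, input_dim=240, hidden=256):
            super().__init__()
            self.rnn = nn.GRU(input_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)

        def forward(self, x):                    # x: (B, T, 240)
            h, _ = self.rnn(x)                   # h: (B, T, 2 * hidden)
            return h

    class AttentionDecoder(nn.Module):
        """One decoding step: attend over encoder states, emit phoneme logits.
        Assumes encoder output dim == decoder hidden dim (512 here)."""
        def __init__(self, n_phones=40, emb=64, hidden=512):
            super().__init__()
            self.embed = nn.Embedding(n_phones, emb)
            self.cell = nn.GRUCell(emb + hidden, hidden)
            self.out = nn.Linear(2 * hidden, n_phones)

        def forward(self, y_prev, state, enc):   # enc: (B, T, hidden)
            scores = torch.bmm(enc, state.unsqueeze(2))    # (B, T, 1)
            alpha = torch.softmax(scores, dim=1)           # attention weights
            context = (alpha * enc).sum(dim=1)             # (B, hidden)
            state = self.cell(torch.cat([self.embed(y_prev), context], -1), state)
            logits = self.out(torch.cat([state, context], -1))
            return logits, state, alpha.squeeze(2)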
$ pip install -r requirements.txt
Prepare the data. This will create lists (*.csv) of audio file paths along with their transcripts:
$ python prepare_data.py --root ${DIRECTORY_OF_TIMIT}
Check available options:
$ python train.py -h
Use the default configuration for training:
$ python train.py exp/default.yaml
You can also write your own configuration file based on exp/default.yaml.
$ python train.py ${PATH_TO_YOUR_CONFIG}
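To see which fields are available before writing your own, you can inspect the default configuration (requires PyYAML):

    import yaml

    # print the default configuration to see which fields you can override
    with open("exp/default.yaml") as f:
        print(yaml.safe_load(f))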
With the default configuration, the training logs are stored in exp/default/history.csv.
To visualize training, pass the path of your log file (adjust it if you used a different configuration):
$ python show_history.py exp/default/history.csv
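show_history.py handles the plotting; if you prefer to inspect the log yourself, something like the following works (the column names here are assumptions, so check the CSV header first):

    import pandas as pd
    import matplotlib.pyplot as plt

    hist = pd.read_csv("exp/default/history.csv")
    print(hist.columns.tolist())                         # check the real header
    hist.plot(x="epoch", y=["train_loss", "dev_error"])  # assumed column names
    plt.show()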
During training, the program keeps monitoring the error rate on the development set.
The checkpoint with the lowest error rate will be saved in the logging directory (by default exp/default/best.pth).
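The selection logic amounts to something like this sketch (the helper names are hypothetical, not this repo's actual functions):

    import torch

    best_err = float("inf")
    for epoch in range(num_epochs):
        train_one_epoch(model, train_loader)     # hypothetical helpers, not
        err = evaluate_per(model, dev_loader)    # this repo's actual API
        if err < best_err:                       # new best on the dev set
            best_err = err
            torch.save(model.state_dict(), "exp/default/best.pth")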
To evaluate the checkpoint on the test set, run:
$ python eval.py exp/default/best.pth
Or you can run inference on random audio from the test set and visualize the attention weights:
$ python inference.py exp/default/best.pth
Prediction:
h# hh ih l pcl p gcl g r ey tcl d ix pcl p ih kcl k ix pcl p eh kcl k ix v dcl d ix tcl t ey dx ah v z h#
Ground-truth:
h# hh eh l pcl p gcl g r ey gcl t ix pcl p ih kcl k ix pcl p eh kcl k ix v pcl p ix tcl t ey dx ow z h#
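For reference, PER is the edit distance between prediction and ground truth divided by the reference length. A self-contained sketch, scored here on the raw 61-phoneme strings above (eval.py presumably folds to 39 phonemes before scoring):

    def edit_distance(ref, hyp):
        """Levenshtein distance via a single rolling row of the DP table."""
        d = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, 1):
            prev, d[0] = d[0], i
            for j, h in enumerate(hyp, 1):
                prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
        return d[-1]

    hyp = ("h# hh ih l pcl p gcl g r ey tcl d ix pcl p ih kcl k ix "
           "pcl p eh kcl k ix v dcl d ix tcl t ey dx ah v z h#").split()
    ref = ("h# hh eh l pcl p gcl g r ey gcl t ix pcl p ih kcl k ix "
           "pcl p eh kcl k ix v pcl p ix tcl t ey dx ow z h#").split()
    print(f"PER = {edit_distance(ref, hyp) / len(ref):.1%}")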
[1] W. Chan et al., "Listen, Attend and Spell", https://arxiv.org/pdf/1508.01211.pdf
[2] J. Chorowski et al., "Attention-Based Models for Speech Recognition", https://arxiv.org/pdf/1506.07503.pdf
[3] M. Luong et al., "Effective Approaches to Attention-based Neural Machine Translation", https://arxiv.org/pdf/1508.04025.pdf