PyTorch-End-to-End-ASR-on-TIMIT

BiGRU encoder + attention decoder, based on "Listen, Attend and Spell" [1].
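At each decoding step, the decoder attends over the encoder states. A minimal NumPy sketch of dot-product (Luong-style [3]) attention is below; this illustrates the mechanism only and is not the repository's actual implementation, which may use a different scoring function.

```python
import numpy as np

def dot_attention(dec_state, enc_states):
    # dec_state: (H,) current decoder state
    # enc_states: (T, H) encoder outputs over time
    scores = enc_states @ dec_state              # (T,) dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over time
    context = weights @ enc_states               # (H,) weighted sum
    return context, weights
```

The returned weights are what `inference.py` would visualize as an attention map.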

The acoustic features are 80-dimensional filter banks. Every 3 consecutive frames are stacked into one, reducing the time resolution by a factor of 3.
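The stacking step can be sketched as a simple reshape; this is an illustrative version (leftover frames at the end are dropped), not necessarily the exact code used in this repository.

```python
import numpy as np

def stack_frames(feats, n=3):
    # feats: (T, 80) filter-bank features.
    # Drop trailing frames so T is divisible by n, then concatenate
    # every n consecutive frames -> (T // n, 80 * n).
    T = feats.shape[0] - feats.shape[0] % n
    return feats[:T].reshape(T // n, -1)

fbank = np.random.randn(100, 80)
stacked = stack_frames(fbank)   # shape (33, 240)
```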

Following the standard recipe, we use the 462-speaker training set with all SA records removed. Outputs are mapped down to 39 phonemes for evaluation.
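The 61-to-39 mapping is the standard folding from Lee & Hon (1989). A sketch with a small fragment of that table (the full mapping has many more entries; consult a complete folding table before use):

```python
# Partial 61 -> 39 phone folding (illustrative fragment only).
FOLD = {
    "ix": "ih", "ax": "ah", "axr": "er",   # reduced vowels
    "el": "l", "en": "n", "zh": "sh",      # syllabic / merged consonants
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "h#": "sil",  # closures, silence
}

def map_to_39(phones):
    # Phones not in the table map to themselves.
    return [FOLD.get(p, p) for p in phones]
```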

With this code you can achieve ~22% phone error rate (PER) on the core test set.
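PER is the Levenshtein edit distance between the predicted and reference phone sequences, divided by the reference length. A self-contained sketch (not the repository's scoring code):

```python
def edit_distance(ref, hyp):
    # Classic one-row dynamic-programming Levenshtein distance.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]

def per(ref, hyp):
    # Phone error rate: edits per reference phone.
    return edit_distance(ref, hyp) / len(ref)
```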

Usage

Install requirements

$ pip install -r requirements.txt

Prepare data

This will create lists (*.csv) of audio file paths along with their transcripts:

$ python prepare_data.py --root ${DIRECTORY_OF_TIMIT}
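A list like this can be consumed with the standard `csv` module. The reader below is a hypothetical sketch; the actual column layout produced by `prepare_data.py` may differ, so check one of the generated files first.

```python
import csv

def load_list(path):
    # Assumes each row is (audio_path, transcript); adjust the
    # indices if the generated CSVs use a different layout.
    with open(path, newline="") as f:
        return [(row[0], row[1]) for row in csv.reader(f)]
```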

Train

Check available options:

$ python train.py -h

Use the default configuration for training:

$ python train.py exp/default.yaml

You can also write your own configuration file based on exp/default.yaml.

$ python train.py ${PATH_TO_YOUR_CONFIG}

Show loss curve

With the default configuration, the training logs are stored in exp/default/history.csv. If you used a custom configuration, point the script at your own log file instead.

$ python show_history.py exp/default/history.csv
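If you only want the numbers rather than a plot, a log like this can be summarized directly. The column names below (`epoch`, `dev_error`) are assumptions; check the header of the actual history.csv before relying on them.

```python
import csv

def best_epoch(path, metric="dev_error"):
    # Return the row with the lowest value of `metric`.
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return min(rows, key=lambda r: float(r[metric]))
```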

Test

During training, the program keeps monitoring the error rate on the development set. The checkpoint with the lowest error rate is saved in the logging directory (by default exp/default/best.pth).
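The best-checkpoint logic amounts to tracking the minimum dev error seen so far. A framework-agnostic sketch, where `save_fn` is a hypothetical callback (in practice it would wrap something like `torch.save`):

```python
class BestCheckpoint:
    """Save the model state whenever the dev error rate improves."""

    def __init__(self, save_fn):
        self.best = float("inf")
        self.save_fn = save_fn   # hypothetical callback, e.g. torch.save wrapper

    def update(self, dev_error, state):
        # Returns True if this checkpoint was the new best and was saved.
        if dev_error < self.best:
            self.best = dev_error
            self.save_fn(state)
            return True
        return False
```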

To evaluate the checkpoint on the test set, run:

$ python eval.py exp/default/best.pth

Or you can run a random utterance from the test set and visualize the attention weights:

$ python inference.py exp/default/best.pth

Predict:
h# hh ih l pcl p gcl g r ey tcl d ix pcl p ih kcl k ix pcl p eh kcl k ix v dcl d ix tcl t ey dx ah v z h#
Ground-truth:
h# hh eh l pcl p gcl g r ey gcl t ix pcl p ih kcl k ix pcl p eh kcl k ix v pcl p ix tcl t ey dx ow z h#

References

[1] W. Chan et al., "Listen, Attend and Spell", https://arxiv.org/pdf/1508.01211.pdf

[2] J. Chorowski et al., "Attention-Based Models for Speech Recognition", https://arxiv.org/pdf/1506.07503.pdf

[3] M. Luong et al., "Effective Approaches to Attention-based Neural Machine Translation", https://arxiv.org/pdf/1508.04025.pdf
