Tensorflow Automatic Speech Recognition (ASR) starter model to learn about end to end ASR and CTC decoding.

scaperot/ctc-decode-asr

ctc-decode-asr

Tensorflow Automatic Speech Recognition (ASR) starter model to learn about end to end ASR and CTC decoding.

Most of this work is based on the following GitHub page: https://github.com/apoorvnandan/speech-recognition-primer

Additionally, as I was learning, I wanted to see how the TensorFlow APIs are used for decoding, so I borrowed most of the following OCR code: https://keras.io/examples/vision/captcha_ocr/

The code currently requires TensorFlow and Keras.

To run it:

python asr.py

The input transcript (for training) is the following:
MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES AND WE ARE GLAD TO WELCOME HIS GOSPEL
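
As an illustration of how a transcript like this can become CTC training targets, here is a minimal sketch of character-level label encoding. The vocabulary, the names `VOCAB`, `encode_transcript`, and `decode_labels`, and the choice of index 0 for the CTC blank are all assumptions made for this example; the mapping used in asr.py may differ.

```python
# Hypothetical vocabulary: space, apostrophe, and a-z.
# Index 0 is reserved here for the CTC blank (an assumption, not taken from asr.py).
VOCAB = [" ", "'"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}
ID_TO_CHAR = {i: ch for ch, i in CHAR_TO_ID.items()}


def encode_transcript(text):
    """Lowercase the transcript and map each character to an integer label."""
    return [CHAR_TO_ID[ch] for ch in text.lower()]


def decode_labels(ids):
    """Map integer labels back to a string (inverse of encode_transcript)."""
    return "".join(ID_TO_CHAR[i] for i in ids)


transcript = (
    "MISTER QUILTER IS THE APOSTLE OF THE MIDDLE CLASSES "
    "AND WE ARE GLAD TO WELCOME HIS GOSPEL"
)
labels = encode_transcript(transcript)
```

The round trip `decode_labels(labels)` gives back the lowercased transcript, which matches the casing of the model output shown further down.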

The result is a model that is overfit to a single audio file, sample.wav, from the LibriSpeech corpus (about 5 s long). The output will look something like this:
Epoch 100/100
1/1 [==============================] - 1s 516ms/sample - loss: 3.4581

['mister quilter is the apostle of the middle classes and we are glad to welcome his gospel>']


You can experiment with beam search decoding through the decode_batch_predictions function.
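
To show what greedy and beam search CTC decoding do, here is a small pure-Python sketch. It is not the code in asr.py. In the Keras OCR example, decode_batch_predictions wraps `keras.backend.ctc_decode`, which accepts `greedy` and `beam_width` arguments, and it is an assumption that this repository does the same. The names `greedy_decode`, `beam_search_decode`, and `BLANK` are made up for illustration, and blank index 0 is also an assumption.

```python
from collections import defaultdict

BLANK = 0  # assumed index of the CTC blank label


def greedy_decode(probs):
    """Best-path decoding: argmax per frame, collapse repeats, drop blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    out, prev = [], None
    for idx in best:
        if idx != prev and idx != BLANK:
            out.append(idx)
        prev = idx
    return out


def beam_search_decode(probs, beam_width=4):
    """CTC prefix beam search over per-frame label probabilities.

    Each beam maps a label prefix to (p_blank, p_non_blank): the total
    probability of alignments that yield the prefix and end in a blank or
    in a non-blank label. Linear probabilities are used for readability;
    long inputs would need log-space arithmetic to avoid underflow.
    """
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        next_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(frame):
                if idx == BLANK:
                    # A blank keeps the prefix unchanged.
                    next_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and idx == prefix[-1]:
                    # Repeated label: merges into the same prefix...
                    next_beams[prefix][1] += p_nb * p
                    # ...or starts a new label only if a blank came before.
                    next_beams[prefix + (idx,)][1] += p_b * p
                else:
                    next_beams[prefix + (idx,)][1] += (p_b + p_nb) * p
        ranked = sorted(next_beams.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(ps) for prefix, ps in ranked[:beam_width]}
    best_prefix, (p_b, p_nb) = max(beams.items(), key=lambda kv: sum(kv[1]))
    return list(best_prefix), p_b + p_nb


# Two frames over the labels {blank, 1}. The single best path is
# blank-blank (0.36), but the alignments that collapse to [1] sum to 0.64.
probs = [[0.6, 0.4], [0.6, 0.4]]
greedy_result = greedy_decode(probs)
beam_result, beam_prob = beam_search_decode(probs, beam_width=2)
```

On this toy input the greedy decoder returns an empty sequence, while beam search returns `[1]`. Beam search sums the probability over every alignment that collapses to the same label sequence, which is why the two can disagree.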
