amitchone/ASR

MFCC Automatic Speech Recognition Algorithm Implementation

A Python 2.7 implementation of Mel Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) algorithms for Automated Speech Recognition (ASR).

Method

  1. Read the audio data and sampling frequency from a .wav file
  2. Split the signal into frames
  3. Apply a window function to each frame (default=hamming)
  4. Calculate the DFT of each frame
  5. Calculate the periodogram power spectral density estimate for each DFT bin
  6. Apply the Mel-frequency filterbank to the power spectrum
  7. Sum the energies within each filter and take the base-10 logarithm
  8. Take the DCT of the log filterbank energies
  9. Keep coefficients [1:13]
  10. Compute the DTW best path and Euclidean distance between the reference vector and the input vector
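
Steps 2 to 9 can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this repository: every function name, the frame length (200 samples), the hop (80 samples), the 26 filters and the 1e-10 log floor are assumptions chosen for the example. Step 1 (reading the .wav file) is skipped in favour of a synthetic tone, and `[1:13]` is read as a Python slice, which is also an assumption.

```python
from __future__ import division, print_function
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frame_signal(signal, frame_len, hop):
    """Step 2: split the signal into overlapping frames (a short tail is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n_points):
    """Step 3: Hamming window coefficients."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def periodogram(frame):
    """Steps 4-5: naive O(N^2) DFT, then |X[k]|^2 / N for bins 0..N/2."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        power.append(abs(x_k) ** 2 / n)
    return power


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Step 6: triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising edge
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling edge, peak of 1.0 at k == mid
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank


def dct_ii(values):
    """Step 8: unnormalised DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2.0 * n))
                for i, v in enumerate(values))
            for k in range(n)]


def mfcc(signal, sample_rate, frame_len=200, hop=80, n_filters=26):
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_signal(signal, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        power = periodogram(windowed)
        # Step 7: sum the energy under each filter, then take log10.
        # The 1e-10 floor (an assumption) avoids log10(0) on silent bands.
        log_energies = [
            math.log10(max(sum(p * f for p, f in zip(power, filt)), 1e-10))
            for filt in bank]
        # Step 9: keep coefficients [1:13] (drops c0, keeps the next 12).
        features.append(dct_ii(log_energies)[1:13])
    return features


sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(800)]
feats = mfcc(tone, sample_rate)
print(len(feats), len(feats[0]))  # 8 frames of 12 coefficients each
```

The naive DFT keeps the sketch free of dependencies but is slow; for real recordings an FFT routine (for example from NumPy) would be the more practical choice.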

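Step 10 can be illustrated with a textbook DTW over two sequences of feature vectors. This is a sketch under assumptions rather than the repository's implementation: the names `euclidean` and `dtw` are hypothetical, the local cost is the Euclidean distance between frames, and the allowed moves are the usual diagonal, vertical and horizontal steps.

```python
from __future__ import division, print_function
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(reference, query):
    """Return (accumulated distance, best path) for two non-empty sequences.

    The path is a list of (reference_index, query_index) pairs.
    """
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of reference[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(reference[i - 1], query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # diagonal
                                 cost[i - 1][j],      # vertical
                                 cost[i][j - 1])      # horizontal
    # Backtrack from the end to the start along the cheapest predecessors.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


reference = [[0.0], [1.0], [2.0], [3.0]]
query = [[0.0], [1.0], [1.0], [2.0], [3.0]]  # same shape, one frame repeated
distance, path = dtw(reference, query)
print(distance)  # 0.0
print(path)      # [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)]
```

In a recogniser built this way, the input utterance would be compared against each stored reference and the reference with the smallest accumulated distance taken as the match.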
To-do

  1. Noise gate
  2. Pre-emphasis / Lifter
  3. Feature vector database
  4. Audio record / playback (audio.py)
  5. Multithread MFCC extraction
  6. Create MFCC extractor as class?
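
As a starting point for item 2, the sketch below shows one common form of pre-emphasis and of a sinusoidal lifter. Nothing here comes from the repository: the function names, the pre-emphasis coefficient of 0.97, the lifter length of 22 and the `first_index` parameter are all assumptions based on widely used conventions.

```python
from __future__ import division, print_function
import math


def pre_emphasis(signal, alpha=0.97):
    """High-pass the signal: y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def lifter(cepstra, lift=22, first_index=0):
    """Scale cepstral coefficient c_n by 1 + (lift / 2) * sin(pi * n / lift).

    first_index is the cepstral index of cepstra[0]; pass 1 if c0 has
    already been dropped from the vector.
    """
    return [c * (1.0 + (lift / 2.0) * math.sin(math.pi * n / lift))
            for n, c in enumerate(cepstra, first_index)]


print(pre_emphasis([1.0, 1.0, 1.0], alpha=0.5))  # [1.0, 0.5, 0.5]
```

Pre-emphasis would run on the raw samples before framing, while the lifter would be applied to each frame's coefficients after the DCT.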
