This script is for the ECE6255 final project: Spoken Utterance Recognition (SUR).
- To run our code, please first follow the instructions in Stage 0.
- If you only want to test our SUR system fine-tuned on five UIDs (i.e., "Hello", "Good Morning", "Maybe", "Hey Siri", and "Oh"), please skip Stages 1 & 2 and run Stage 3
python inference.py -o output_dir
directly.
- If you would like to go through the whole process, please delete the directories
output_dir
and
recordings
and start from Stage 1.
- Install Python requirements. We recommend using conda for environment setup:
- Create a conda environment.
conda create --name ECE6255 python=3.8
- Install PyTorch 1.10.1. If your OS is macOS, you can run
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 -c pytorch
Otherwise, please follow the installation instructions on the PyTorch website.
- Install PyAudio
conda install -c anaconda pyaudio
- Install soundfile and librosa
conda install -c conda-forge pysoundfile librosa
- Alternatively, run
conda env create -f environment.yml
if your OS is macOS.
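Before moving on, it can help to confirm that every package the later stages depend on actually imports. The sketch below is a hypothetical sanity check (not part of the project scripts); note that the import names can differ from the conda package names (e.g. pysoundfile installs as soundfile).

```python
import importlib.util

# Import names of the packages the SUR pipeline relies on.
REQUIRED = ["torch", "torchaudio", "pyaudio", "soundfile", "librosa"]

def check_environment(packages):
    """Return the subset of packages that cannot be imported."""
    missing = []
    for name in packages:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

if __name__ == "__main__":
    missing = check_environment(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```

If anything is reported missing, re-run the corresponding conda install command above.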
- Run
python data_recording.py
- After the script starts running, the system will first ask you to enter the UID. After you enter it, the system will record for 10 seconds.
- You can repeat the same keyword several times during the recording process (with a pause between each utterance). Note that your voice should be loud and clear.
- You can also re-enter the same UID to add more utterances to the training data if you would like to.
- The system will keep asking you to enter a UID until you enter FINISH!! to end the process.
- The system will create a directory named
recordings
. Check that it exists before moving to the next stage.
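To make the output of this stage concrete, here is a hedged sketch of how recorded utterances could be written to disk. The recordings/&lt;UID&gt;/&lt;index&gt;.wav layout and the save_utterance helper are assumptions for illustration (data_recording.py may organize files differently), and the stdlib wave module stands in for a real PyAudio capture.

```python
import os
import wave

def save_utterance(root, uid, index, frames, rate=16000):
    """Write raw 16-bit mono PCM frames to <root>/<uid>/<index>.wav."""
    uid_dir = os.path.join(root, uid)
    os.makedirs(uid_dir, exist_ok=True)      # one subdirectory per UID
    path = os.path.join(uid_dir, f"{index}.wav")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)                   # mono
        wf.setsampwidth(2)                   # 16-bit samples
        wf.setframerate(rate)                # 16 kHz sample rate
        wf.writeframes(frames)
    return path

if __name__ == "__main__":
    # 10 seconds of silence stands in for a real recording.
    silence = b"\x00\x00" * 16000 * 10
    print(save_utterance("recordings", "Hello", 0, silence))
```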
Since the recording process usually cannot generate many training utterances, we first pretrained our x-vector model on the Google Speech Commands dataset. The pretrained weights are stored as GSC_pretrained_model.pt
, so you don't need to worry about model pretraining.
- Run
python finetune_on_recording.py -o <name of output directory>
- By default, the system will create a directory named after the input argument (default: output_dir), which stores model checkpoints. This process will keep running for a while, but you can stop it with Ctrl+C once the validation accuracy is high enough.
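Reporting a validation accuracy implies that some recorded utterances are held out from fine-tuning. The sketch below shows one plausible way to do that split; the train_val_split helper and its parameters are hypothetical, not taken from finetune_on_recording.py.

```python
import random

def train_val_split(files, val_ratio=0.2, seed=0):
    """Deterministically shuffle file paths and split off a validation set."""
    files = sorted(files)            # fixed order before shuffling
    rng = random.Random(seed)        # seeded RNG -> reproducible split
    rng.shuffle(files)
    n_val = max(1, int(len(files) * val_ratio))
    return files[n_val:], files[:n_val]
```

A deterministic split matters here: it keeps the reported validation accuracy comparable across checkpoints during the long fine-tuning run.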
- Run
python inference.py -o <name of output directory>
(default: output_dir).
- The system will ask you to start recording by pressing Enter. You only need to say the utterance once this time.
- After the recording process ends, the system will output the UID of this utterance and ask you for another utterance.
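Conceptually, the last step of inference maps the model's per-class scores back to one of the five UIDs. The helper below is a hypothetical illustration of that mapping (inference.py may use different names, and may not apply a rejection threshold at all).

```python
# The five fine-tuned UIDs, in the order of the model's output classes
# (this ordering is an assumption for illustration).
UIDS = ["Hello", "Good Morning", "Maybe", "Hey Siri", "Oh"]

def predict_uid(scores, uids=UIDS, threshold=0.5):
    """Return the best-scoring UID, or None if no score clears the threshold."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return None                  # reject a low-confidence utterance
    return uids[best]
```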