This script is for the ECE6255 final project: Spoken Utterance Recognition (SUR).
- To run our code, please first follow the instructions in Stage 0.
- If you only want to test our SUR system fine-tuned on five UIDs (i.e., "Hello", "Good Morning", "Maybe", "Hey Siri", and "Oh"), please skip Stages 1 & 2 and run Stage 3
python inference.py -o output_dir
directly.
- If you would like to go through the whole process, please delete the directories
output_dir
and
recordings
and start from Stage 1.
- Install Python requirements. We recommend using conda for environment setup:
- Create a conda environment.
conda create --name ECE6255 python=3.8
- Install PyTorch 1.10.1. If your OS is macOS, you can run
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 -c pytorch
Otherwise, please follow the installation instructions on the PyTorch website.
- Install PyAudio
conda install -c anaconda pyaudio
- Install soundfile and librosa
conda install -c conda-forge pysoundfile librosa
- Alternatively, run
conda env create -f environment.yml
if your OS is macOS.
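Before moving on, it can help to confirm that every package the later stages depend on actually imports. The sketch below is a hypothetical sanity check (not part of the project scripts); note that the import names can differ from the conda package names (e.g. pysoundfile installs as soundfile).

```python
import importlib.util

# Import names of the packages the SUR pipeline relies on.
REQUIRED = ["torch", "torchaudio", "pyaudio", "soundfile", "librosa"]

def check_environment(packages):
    """Return the subset of packages that cannot be imported."""
    missing = []
    for name in packages:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

if __name__ == "__main__":
    missing = check_environment(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```

If anything is reported missing, re-run the corresponding conda install command above.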
- Run
python data_recording.py
- After the script starts running, the system will first ask you to enter the UID. After you enter it, the system will record for 10 seconds.
- You can repeat the same keyword several times during the recording process (with a pause between each utterance). Note that your voice should be loud and clear.
- You can also re-enter the same UID to add more utterances to the training data if you would like to.
- The system will keep asking you to enter a UID until you enter FINISH!! to end the process.
- The system will create a directory named
recordings
. Check that it exists before moving to the next stage.
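To make the output of this stage concrete, here is a hedged sketch of how recorded utterances could be written to disk. The recordings/&lt;UID&gt;/&lt;index&gt;.wav layout and the save_utterance helper are assumptions for illustration (data_recording.py may organize files differently), and the stdlib wave module stands in for a real PyAudio capture.

```python
import os
import wave

def save_utterance(root, uid, index, frames, rate=16000):
    """Write raw 16-bit mono PCM frames to <root>/<uid>/<index>.wav."""
    uid_dir = os.path.join(root, uid)
    os.makedirs(uid_dir, exist_ok=True)      # one subdirectory per UID
    path = os.path.join(uid_dir, f"{index}.wav")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)                   # mono
        wf.setsampwidth(2)                   # 16-bit samples
        wf.setframerate(rate)                # 16 kHz sample rate
        wf.writeframes(frames)
    return path

if __name__ == "__main__":
    # 10 seconds of silence stands in for a real recording.
    silence = b"\x00\x00" * 16000 * 10
    print(save_utterance("recordings", "Hello", 0, silence))
```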
Since the recording process usually cannot generate many training utterances, we first pretrained our x-vector model on the Google Speech Commands dataset. The pretrained weights are stored as GSC_pretrained_model.pt
, so you don't need to worry about model pretraining.
- Run
python finetune_on_recording.py -o <name of output directory>
- By default, the system will create a directory named after the input argument (default: output_dir), which stores model checkpoints. This process will keep running for a while, but you can stop it with Ctrl+C once the validation accuracy is high enough.
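Reporting a validation accuracy implies that some recorded utterances are held out from fine-tuning. The sketch below shows one plausible way to do that split; the train_val_split helper and its parameters are hypothetical, not taken from finetune_on_recording.py.

```python
import random

def train_val_split(files, val_ratio=0.2, seed=0):
    """Deterministically shuffle file paths and split off a validation set."""
    files = sorted(files)            # fixed order before shuffling
    rng = random.Random(seed)        # seeded RNG -> reproducible split
    rng.shuffle(files)
    n_val = max(1, int(len(files) * val_ratio))
    return files[n_val:], files[:n_val]
```

A deterministic split matters here: it keeps the reported validation accuracy comparable across checkpoints during the long fine-tuning run.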
- Run
python inference.py -o <name of output directory>
(default: output_dir).
- The system will ask you to start recording by pressing Enter. You only need to say the utterance once this time.
- After the recording process ends, the system will output the UID of this utterance and ask you for another utterance.
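Conceptually, the last step of inference maps the model's per-class scores back to one of the five UIDs. The helper below is a hypothetical illustration of that mapping (inference.py may use different names, and may not apply a rejection threshold at all).

```python
# The five fine-tuned UIDs, in the order of the model's output classes
# (this ordering is an assumption for illustration).
UIDS = ["Hello", "Good Morning", "Maybe", "Hey Siri", "Oh"]

def predict_uid(scores, uids=UIDS, threshold=0.5):
    """Return the best-scoring UID, or None if no score clears the threshold."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return None                  # reject a low-confidence utterance
    return uids[best]
```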