This project fine-tunes the descript-audio-codec for speech enhancement and noise removal.
- Data Preparation: Custom dataloaders and datasets manage voice and noise data and create noisy speech by mixing the two.
- Model Architecture: Utilizes `dac` (Descript Audio Codec), a high-fidelity, general-purpose neural audio codec, with both generator and discriminator networks for adversarial training.
- HifiPlusPlus: The discriminators from HiFi++ can be used to improve the quality of the generated audio.
- Loss Functions: A diverse set of losses (L1, mel-spectrogram, SI-SDR, and GAN losses) captures various aspects of audio reconstruction quality.
- Training: A training loop with loss weighting, noise addition, and gradient clipping for stable learning.
- Evaluation: The model's output is periodically saved for quality inspection, and performance metrics are logged to Weights & Biases during training.
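As a sketch of the data-preparation step, noisy speech can be created by scaling a noise clip to a target signal-to-noise ratio and adding it to the clean speech. The function below is illustrative and not taken from this repository:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested SNR, then add it to `clean`."""
    # Loop/trim the noise so it matches the speech length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]

    clean_power = np.mean(clean**2)
    noise_power = np.mean(noise**2) + 1e-12
    # Gain that brings the noise to the target SNR relative to the speech.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise

# Example: mix a sine "voice" with white noise at 10 dB SNR.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
speech = 0.5 * np.sin(2 * np.pi * 220 * t)
noise = np.random.randn(sr) * 0.1
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

The same mixture can be generated on the fly inside a dataset's `__getitem__`, with the SNR drawn randomly per sample for augmentation.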
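To illustrate the loss weighting and gradient clipping mentioned above, here is a minimal training step combining an L1 term with an SI-SDR term. The model, the loss weights, and the tensor shapes are placeholders, not the actual codec or configuration used in this project:

```python
import torch

def si_sdr_loss(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative scale-invariant SDR (lower is better), averaged over the batch."""
    ref = ref - ref.mean(dim=-1, keepdim=True)
    est = est - est.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to get the scaled target.
    proj = (torch.sum(est * ref, dim=-1, keepdim=True) /
            (torch.sum(ref**2, dim=-1, keepdim=True) + eps)) * ref
    noise = est - proj
    ratio = torch.sum(proj**2, dim=-1) / (torch.sum(noise**2, dim=-1) + eps)
    return -10 * torch.log10(ratio + eps).mean()

# One weighted training step with gradient clipping (stand-in 1-layer "generator").
model = torch.nn.Conv1d(1, 1, kernel_size=9, padding=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

noisy = torch.randn(4, 1, 16000)  # batch of noisy waveforms
clean = torch.randn(4, 1, 16000)  # matching clean targets

est = model(noisy)
# Hypothetical loss weights; the real weights live in the config file.
loss = 1.0 * torch.nn.functional.l1_loss(est, clean) + 0.1 * si_sdr_loss(est, clean)
opt.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

The mel-spectrogram and GAN terms would be added to the same weighted sum; clipping the gradient norm keeps the adversarial updates from destabilizing the generator.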
Training is done using the train script; folders, hyperparameters, etc. are set in the config file. To run training, both a clean and a noisy dataset path must be provided in the config.
- train.py:
To start training with the default config, run:
python3 train.py
To train on the DTU HPC cluster with the appropriate dataset, run:
python3 train.py -c ./configs/config_HPC.json
To run inference with the trained model weights, use the notebook "test_model_weights.ipynb". Download the weights from Drive and place them in the models folder.
Adjust hyperparameters such as learning rates, batch sizes, folder setup, and number of epochs in the config file.
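For orientation, a config might look like the fragment below. The field names here are illustrative only; check the actual files in ./configs for the real keys:

```json
{
  "clean_dataset_path": "/data/clean_speech",
  "noisy_dataset_path": "/data/noise",
  "batch_size": 16,
  "learning_rate": 1e-4,
  "epochs": 100,
  "output_dir": "./output/"
}
```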
Models, logs, and audio samples are saved in the ./output/ directory, allowing for incremental evaluation of audio processing quality.
See Wandb for training logs: 1 and 2
Please refer to the individual scripts for detailed parameter settings and training customizations.