VoiceSplit

Unofficial PyTorch implementation of VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking.

Final project for SCC5830 - Image Processing @ ICMC/USP.

Dataset

For this task we initially use the LibriSpeech dataset. However, to use it here, we need to generate audio with overlapping voices; a sketch of one way to do this follows below.
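As a rough illustration of how such overlapped audio could be generated (a minimal sketch, not the repository's actual preprocessing; the soundfile dependency, the function name mix_utterances, and the SNR handling are assumptions):

```python
import numpy as np
import soundfile as sf  # assumed audio I/O library, not necessarily what the repo uses

def mix_utterances(target_path, interferer_path, out_path, snr_db=0.0):
    """Overlay two LibriSpeech utterances at a given target-to-interferer SNR."""
    target, sr = sf.read(target_path)
    interferer, sr2 = sf.read(interferer_path)
    assert sr == sr2, "expected matching sample rates"

    # Loop the interferer if it is shorter than the target, then trim to length
    if len(interferer) < len(target):
        reps = int(np.ceil(len(target) / len(interferer)))
        interferer = np.tile(interferer, reps)
    interferer = interferer[: len(target)]

    # Scale the interferer so the mixture has the requested SNR
    target_power = np.mean(target ** 2)
    interferer_power = np.mean(interferer ** 2) + 1e-10
    scale = np.sqrt(target_power / (interferer_power * 10 ** (snr_db / 10)))
    mixture = target + scale * interferer

    # Peak-normalize only if the mixture would clip
    peak = np.max(np.abs(mixture))
    if peak > 1.0:
        mixture = mixture / peak

    sf.write(out_path, mixture, sr)
```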

Improvements

We use SI-SNR with PIT (permutation invariant training) instead of the power-law compressed loss, because it achieves better results (a comparison is available at https://github.com/Edresson/VoiceSplit).
We used the Mish activation function instead of ReLU, which also improved the results. A sketch of both changes follows below.
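A hedged sketch of both changes (the function names and the two-source PIT formulation below are illustrative assumptions, not the repository's exact code):

```python
import torch
import torch.nn.functional as F

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB for (batch, samples) tensors."""
    # Zero-mean both signals so the measure is scale-invariant
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target to split it into signal and noise
    dot = torch.sum(estimate * target, dim=-1, keepdim=True)
    s_target = dot * target / (torch.sum(target ** 2, dim=-1, keepdim=True) + eps)
    e_noise = estimate - s_target
    ratio = torch.sum(s_target ** 2, dim=-1) / (torch.sum(e_noise ** 2, dim=-1) + eps)
    return 10 * torch.log10(ratio + eps)

def pit_si_snr_loss(est1, est2, ref1, ref2):
    """Two-source PIT: score both assignments, keep the better, minimize its negative."""
    perm_a = si_snr(est1, ref1) + si_snr(est2, ref2)
    perm_b = si_snr(est1, ref2) + si_snr(est2, ref1)
    return -(torch.max(perm_a, perm_b) / 2).mean()

class Mish(torch.nn.Module):
    """Mish(x) = x * tanh(softplus(x)), used as a drop-in replacement for ReLU."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))
```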

Report

You can see a report of what was done in this repository here.

Demos

Colab notebook demos:

Exp 1: link

Exp 2: link

Exp 3: link

Exp 4: link

Exp 5 (best): link

Demo site for the experiment with the best results (Exp 5): https://edresson.github.io/VoiceSplit/

ToDos:

Create documentation for the repository and remove unused code

Future Works

  • Train the VoiceSplit model with GE2E3k and the Mean Squared Error loss function (see the sketch below)
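For reference, a spectrogram-masking MSE objective of that kind could look like the following (a hedged sketch; the soft-mask formulation and the function name are assumptions, not the planned implementation):

```python
import torch.nn.functional as F

def masked_spectrogram_mse(mask, mixture_mag, target_mag):
    """MSE between the masked mixture magnitude and the clean target magnitude."""
    estimated_mag = mask * mixture_mag  # apply the predicted soft mask
    return F.mse_loss(estimated_mag, target_mag)
```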

Acknowledgments:

This repository contains code from other collaborators; due credit is given in the functions used:

Preprocessing: Eren Gölge @erogol

VoiceFilter Model: Seungwon Park @seungwonpark
