- Aakanksha Desai
- Varsha Kini
- Vrunda Mange
This research addresses targeted voice separation, a form of the cocktail party problem: isolating one specific speaker's voice from a recording in which several speakers overlap. Using the LibriSpeech dataset, the study trains a U-Net architecture for speaker separation and achieves a signal-to-distortion ratio (SDR) of 7.09 dB. The target speaker is identified by comparing voices with the Resemblyzer library. The results indicate that the approach separates mixed audio sources with limited distortion, and they suggest room for further improvement by expanding the dataset and by studying how training-set size affects output audio quality.
Publication Link: https://ijisrt.com/targeted-voice-separation
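
The two quantities named in the summary, SDR and voice comparison, can be illustrated with a minimal sketch. It assumes the plain energy-ratio definition of SDR, 10·log10(‖s‖² / ‖s − ŝ‖²), and cosine similarity between speaker embeddings; the paper may use a different SDR variant (for example the BSS Eval decomposition) and a different matching rule. The function names `sdr_db`, `cosine_similarity`, and `best_match` are hypothetical, and the embeddings are toy vectors standing in for whatever a speaker encoder such as Resemblyzer would produce.

```python
import math


def sdr_db(reference, estimate):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: this is the simple energy-ratio form, not necessarily
    the exact metric reported in the paper.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference signal is silent; SDR is undefined")
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def best_match(target_embedding, candidate_embeddings):
    """Index of the candidate most similar to the target speaker."""
    scores = [cosine_similarity(target_embedding, c) for c in candidate_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]       # toy reference signal
    separated = [0.9, -0.9, 0.9, -0.9]   # toy model output
    print(round(sdr_db(clean, separated), 2))  # 20.0

    target = [1.0, 0.0]                  # hypothetical target-speaker embedding
    candidates = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(best_match(target, candidates))      # 1
```

A higher SDR means the separated signal is closer to the clean reference, so the reported 7.09 dB corresponds to the reference carrying roughly five times the energy of the residual error under this definition.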