This repository contains implementations of two deep learning models for Audio-Visual Source Separation: TASnet and RTFS-Net.
- (Optional) Create and activate a new environment using `conda` or `venv` (+ `pyenv`).

  a. `conda` version:

  ```bash
  # create env
  conda create -n project_env python=PYTHON_VERSION

  # activate env
  conda activate project_env
  ```

  b. `venv` (+ `pyenv`) version:

  ```bash
  # create env
  ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

  # alternatively, using default python version
  python3 -m venv project_env

  # activate env
  source project_env
  ```

- Install all required packages:

  ```bash
  pip install -r requirements.txt
  ```
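A quick way to confirm the environment is ready is an import check. This is a minimal sketch, assuming `torch` and `torchaudio` are among the pinned requirements:

```python
import torch
import torchaudio

# print the installed versions and whether a GPU is visible
print("torch", torch.__version__, "| torchaudio", torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())
```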
- First of all, you need to prepare your dataset with the following structure:
```
NameOfTheDirectoryWithUtterances
├── audio
│   ├── mix
│   │   ├── FirstSpeakerID1_SecondSpeakerID1.wav # also may be flac or mp3
│   │   ├── FirstSpeakerID2_SecondSpeakerID2.wav
│   │   .
│   │   .
│   │   .
│   │   └── FirstSpeakerIDn_SecondSpeakerIDn.wav
│   ├── s1 # ground truth for the speaker s1, may not be given
│   │   ├── FirstSpeakerID1_SecondSpeakerID1.wav # also may be flac or mp3
│   │   ├── FirstSpeakerID2_SecondSpeakerID2.wav
│   │   .
│   │   .
│   │   .
│   │   └── FirstSpeakerIDn_SecondSpeakerIDn.wav
│   └── s2 # ground truth for the speaker s2, may not be given
│       ├── FirstSpeakerID1_SecondSpeakerID1.wav # also may be flac or mp3
│       ├── FirstSpeakerID2_SecondSpeakerID2.wav
│       .
│       .
│       .
│       └── FirstSpeakerIDn_SecondSpeakerIDn.wav
└── mouths # contains video information for all speakers
    ├── FirstOrSecondSpeakerID1.npz # npz mouth-crop
    ├── FirstOrSecondSpeakerID2.npz
    .
    .
    .
    └── FirstOrSecondSpeakerIDn.npz
```
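Before training or inference, it can help to verify that every mixture has matching mouth crops and (optionally) ground-truth files. The helper below is a hypothetical sketch based only on the layout above; `check_dataset` is not part of this repo:

```python
from pathlib import Path


def check_dataset(root: str) -> None:
    """Sanity-check the layout above: audio/mix, optional audio/s1 and audio/s2, and mouths."""
    root = Path(root)
    audio_exts = {".wav", ".flac", ".mp3"}
    for mix_path in sorted((root / "audio" / "mix").iterdir()):
        if mix_path.suffix.lower() not in audio_exts:
            continue
        # file names are FirstSpeakerID_SecondSpeakerID.<ext>; assumes IDs contain no extra underscores
        for speaker_id in mix_path.stem.split("_"):
            if not (root / "mouths" / f"{speaker_id}.npz").exists():
                print(f"missing mouth crop {speaker_id}.npz for {mix_path.name}")
        # ground-truth folders may be absent (e.g. for inference-only data)
        for gt in ("s1", "s2"):
            gt_dir = root / "audio" / gt
            if gt_dir.exists() and not (gt_dir / mix_path.name).exists():
                print(f"missing ground truth {gt}/{mix_path.name}")


check_dataset("data/NameOfTheDirectoryWithUtterances")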
- Video embedding generation

  We used an open-source project to generate the video embeddings: repo. The original repo contained some problems, so we forked it and fixed them: forked repo.
Preparation:

```bash
git clone https://github.com/dikirillov/Lipreading_using_Temporal_Convolutional_Networks/
pip install -r Lipreading_using_Temporal_Convolutional_Networks/requirements.txt
```
Extraction:

Download the embedding extraction model (model url), then run:
```bash
python3 Lipreading_using_Temporal_Convolutional_Networks/main.py --modality video \
    --extract-feats \
    --config-path 'Lipreading_using_Temporal_Convolutional_Networks/configs/lrw_resnet18_dctcn_boundary.json' \
    --model-path <PATH-TO-DOWNLOADED-MODEL> \
    --mouth-patch-path <MOUTH-PATCH-PATH>
```
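Once extraction finishes, you can inspect the resulting `.npz` archives with a short sketch like the one below. The file name is illustrative, and the stored array keys depend on the extractor, so they are listed rather than assumed:

```python
import numpy as np

# hypothetical file name; point this at one of your mouths/*.npz files
archive = np.load("mouths/FirstOrSecondSpeakerID1.npz")
print(archive.files)  # names of the stored arrays (the exact keys depend on the extractor)
for key in archive.files:
    print(key, archive[key].shape, archive[key].dtype)
```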
If you want to retrain a model, you can use the train script with your own config:

```bash
python3 train.py -cn=CONFIG_NAME HYDRA_CONFIG_ARGUMENTS
```
For example, if you want to retrain the RTFS model on your dataset, change `src/configs/rtfs.yaml` (set the correct dataset dir) and then run:

```bash
python3 train.py dataset
```
You can use `inference.py` to generate separated audio from your dataset:

```bash
HYDRA_FULL_ERROR=1 python inference.py \
    datasets.inference.part=null +datasets.inference.dataset_dir='PATH_TO_YOUR_DATASET' \
    inferencer.save_path='PATH_TO_SAVE'
```
- `PATH_TO_YOUR_DATASET` - path to your dataset folder; it should be located in the `data` folder and contain the `audio`, `mouth`, and `mouth_embeds` folders.
- `PATH_TO_SAVE` - path where you want to save the separated files (they will be located in `data/saved/PATH_TO_SAVE`).
If you also want to calculate metrics, add `s1` and `s2` folders to your `audio` folder and then run:

```bash
HYDRA_FULL_ERROR=1 python inference.py \
    datasets.inference.part=null +datasets.inference.dataset_dir='PATH_TO_YOUR_DATASET' \
    inferencer.save_path='PATH_TO_SAVE' \
    metrics=inference_metrics
```
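For an offline spot check of the saved separations, you can also compute SI-SNR directly with `torchmetrics` (assuming `torchaudio` and `torchmetrics` are available). This is a rough sketch under assumed file locations, not the project's metric pipeline:

```python
import torchaudio
from torchmetrics.audio import ScaleInvariantSignalNoiseRatio

si_snr = ScaleInvariantSignalNoiseRatio()

# hypothetical file locations; adjust to wherever the estimates and references actually live
est, sr_est = torchaudio.load("data/saved/PATH_TO_SAVE/s1/FirstSpeakerID1_SecondSpeakerID1.wav")
ref, sr_ref = torchaudio.load("data/PATH_TO_YOUR_DATASET/audio/s1/FirstSpeakerID1_SecondSpeakerID1.wav")
assert sr_est == sr_ref, "sample rates must match"

# trim to a common length in case the estimate and reference differ by a few samples
n = min(est.shape[-1], ref.shape[-1])
print("SI-SNR:", si_snr(est[..., :n], ref[..., :n]).item(), "dB")
```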