---
title: Compressed Wav2Lip
emoji: 🌟
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.13.0
app_file: app.py
pinned: true
license: apache-2.0
---

28× Compressed Wav2Lip by Nota AI

Official codebase for Accelerating Speech-Driven Talking Face Generation with 28× Compressed Wav2Lip [ICCV'23 Demo] [MLSys'23 Workshop] [NVIDIA GTC'23].

Installation

Docker (recommended)

git clone https://github.com/Nota-NetsPresso/nota-wav2lip.git
cd nota-wav2lip
docker compose run --service-ports --name nota-compressed-wav2lip compressed-wav2lip bash

Conda

git clone https://github.com/Nota-NetsPresso/nota-wav2lip.git
cd nota-wav2lip
apt-get update
apt-get install ffmpeg libsm6 libxext6 tmux git -y
conda create -n nota-wav2lip python=3.9
conda activate nota-wav2lip
pip install -r requirements.txt
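
After installing (via Docker or Conda), a quick sanity check like the one below can confirm the environment before running the demo. This is a sketch of my own, not a repo script; the package list is an assumption based on a typical Wav2Lip-style setup rather than read from requirements.txt.

```python
# Minimal environment check (illustrative sketch, not part of the repository).
import importlib
import shutil

# ffmpeg is required for audio/video preprocessing.
assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"

# Assumed core dependencies of a Wav2Lip-style pipeline (not taken from requirements.txt).
for pkg in ("torch", "cv2", "numpy", "librosa"):
    importlib.import_module(pkg)

import torch
print("CUDA available:", torch.cuda.is_available())
```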

Gradio Demo

Use the script below to run the nota-ai/compressed-wav2lip demo. The models and sample data are downloaded automatically.

bash app.sh
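
For orientation, the demo wraps lip-sync inference behind a Gradio UI. The sketch below shows roughly what such a wrapper looks like; it is not the repository's app.py, and run_lip_sync is a hypothetical placeholder for the actual model call.

```python
# Sketch of a Gradio wrapper for lip-sync inference (NOT the repository's app.py).
import gradio as gr

def run_lip_sync(face_video, driving_audio):
    # Placeholder: here the compressed Wav2Lip model would synthesize the talking face.
    return face_video  # echoes the input video path for illustration only

demo = gr.Interface(
    fn=run_lip_sync,
    inputs=[
        gr.Video(label="Input face video"),
        gr.Audio(label="Driving speech", type="filepath"),
    ],
    outputs=gr.Video(label="Synthesized talking face"),
    title="28x Compressed Wav2Lip",
)

if __name__ == "__main__":
    demo.launch()
```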

Inference

(1) Download the YouTube videos listed in the LRS3-TED label text files and preprocess them.

  • Download lrs3_v0.4_txt.zip from this link.
  • Unzip the file into the folder structure ./data/lrs3_v0.4_txt/lrs3_v0.4/test (see the layout check sketch after this list).
  • Run bash download.sh
  • Run bash preprocess.sh
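
Before launching the download, a quick check that the label files landed in the expected location can save a failed run. This is an illustrative sketch, not part of the repo; the path is taken from the step above.

```python
# Sanity check for the LRS3 label folder layout expected by download.sh / preprocess.sh.
from pathlib import Path

label_root = Path("./data/lrs3_v0.4_txt/lrs3_v0.4/test")
assert label_root.is_dir(), f"Expected LRS3 label folder at {label_root}"

txt_files = list(label_root.rglob("*.txt"))
print(f"Found {len(txt_files)} label files under {label_root}")
```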

(2) Run the script to compare the original Wav2Lip with Nota's compressed version.

bash inference.sh
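
To get a rough sense of the speed-up outside the provided script, a generic timing harness like the one below can be used. It is a sketch under assumptions: run_original and run_compressed are placeholders for however you invoke each model on one audio/video pair, not functions from this codebase.

```python
# Generic latency-comparison sketch (not inference.sh).
import time

def avg_latency(fn, n_iters=20, warmup=3):
    """Average wall-clock time per call of `fn`, after a short warm-up."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    return (time.perf_counter() - start) / n_iters

# Example usage with hypothetical wrappers around each model:
# print("original  :", avg_latency(run_original), "s / clip")
# print("compressed:", avg_latency(run_compressed), "s / clip")
```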

License

  • All rights related to this repository and the compressed models are reserved by Nota Inc.
  • The intended use is strictly limited to research and non-commercial projects.

Contact

Acknowledgment

Citation

@article{kim2023unified,
      title={A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation}, 
      author={Kim, Bo-Kyeong and Kang, Jaemin and Seo, Daeun and Park, Hancheol and Choi, Shinkook and Song, Hyoung-Kyu and Kim, Hyungshin and Lim, Sungsu},
      journal={MLSys Workshop on On-Device Intelligence (ODIW)},
      year={2023},
      url={https://arxiv.org/abs/2304.00471}
}
