https://ai.calhacks.io/ https://devpost.com/software/youcanspeakalllangs/
Content creators, lecturers, and educators in non-English-speaking countries make great videos. What if they want to reach a larger audience? Some add English closed captions (CC), but not everyone likes reading captions. Some even remake their videos in English, which costs too much.
With this service, they can generate a localized video with their own voice and matching lip movements, giving the audience a smooth, enjoyable viewing experience.
- Accurate translation
- Dubbed in the YouTuber's own voice, not a random AI voice
- Lip sync to make it look natural
For the model:
- Speech to text and translation: OpenAI Whisper + OpenAI ChatGPT
- Voice clone: CoquiAI
- Lip syncing: Wav2Lip
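The model stages listed above chain together naturally. Below is a minimal sketch of that chaining, assuming each stage can be wrapped as a plain function. The `DubbingPipeline` class, its field names, and the `fake_*` stand-in stages are hypothetical and only illustrate the data flow; they are not the project's actual code and do not call Whisper, ChatGPT, Coqui, or Wav2Lip.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubbingPipeline:
    """Hypothetical wrapper that chains the four stages in order."""
    transcribe: Callable[[str], str]        # video path -> source-language transcript
    translate: Callable[[str, str], str]    # (text, target language) -> translated text
    synthesize: Callable[[str, str], str]   # (translated text, reference video) -> dubbed audio path
    lip_sync: Callable[[str, str], str]     # (video path, audio path) -> output video path

    def run(self, video_path: str, target_language: str = "en") -> str:
        transcript = self.transcribe(video_path)
        translated = self.translate(transcript, target_language)
        # The original video doubles as the voice reference for cloning.
        dubbed_audio = self.synthesize(translated, video_path)
        return self.lip_sync(video_path, dubbed_audio)


# Stand-in stages so the sketch runs without any model installed.
def fake_transcribe(video_path: str) -> str:
    return "hola a todos"


def fake_translate(text: str, target_language: str) -> str:
    return f"[{target_language}] {text}"


def fake_synthesize(text: str, reference_video: str) -> str:
    return reference_video.rsplit(".", 1)[0] + ".dubbed.wav"


def fake_lip_sync(video_path: str, audio_path: str) -> str:
    return video_path.rsplit(".", 1)[0] + ".synced.mp4"


pipeline = DubbingPipeline(
    transcribe=fake_transcribe,
    translate=fake_translate,
    synthesize=fake_synthesize,
    lip_sync=fake_lip_sync,
)
output_path = pipeline.run("talk.mp4", target_language="en")
```

Keeping each stage behind a plain callable means a real model wrapper can replace a stand-in without touching the orchestration.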
For the API service:
- API provided with FastAPI
- API and model services are containerized with Docker
- Deployed on Lambda GPU servers
- Customizing the models and open-source projects for this specific use.
- Containerizing the service with Docker, which surfaced many environment issues.
The output video already looks good without any fine-tuning.
- It's just an MVP and still needs fine-tuning
- Build a UI, then market it to a broad user base
- Explore possible uses in education and entertainment: movies, conference recordings, lectures...