Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying. To support the research community, we are providing access to pretrained model checkpoints ready for inference.
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.
A bit old; there may be newer GUIs for Whisper, but I used this one.
whisper-ui is a simple Streamlit UI for OpenAI's Whisper speech-to-text model. It lets you download and transcribe media from YouTube videos, playlists, or local files. You can then browse, filter, and search through your saved audio files.
I also have an old fork of this project, with some differences, that lets you choose GPU or CPU, but it is older than this one. I might add it later if requested.
Minor fixes and changes to the code.
+ Vladiffusion to SD.Next
### Spread the word; don't only keep it to yourself.