Hate voice messages? Let whisper2me handle them! Just forward the audio and get smooth transcriptions. Fast, simple, and ready for action!
The easiest way to get whisper2me up and running is via Docker. Check out the official guide to install Docker Compose here.
Here's what you'll need:
- The bot token from BotFather on Telegram (find out how here)
- Your user_id from Telegram
- An Nvidia GPU if you're planning to run the CUDA version with the NVIDIA Container Toolkit (see installation steps here)
Note
Heads-up! Tested on Ubuntu and WSL; no guarantees for other OSes. CUDA tests were done on an Nvidia Orin AGX and an RTX 3070 Ti via WSL.
- Clone the repository on your machine with:

  ```shell
  git clone https://github.com/Armaggheddon/whisper2me.git
  ```

- Enter the folder:

  ```shell
  cd whisper2me
  ```

- Rename `bot_config.env.example` to `bot_config.env` and replace the fields with your own:

  - Replace `YOUR_BOT_TOKEN` and `ADMIN_USER_ID`:

    ```
    BOT_TOKEN=0000000000:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    ADMIN_USER_ID=000000000
    ```
- By default, the bot uses the TINY model, but you can pick a larger one if your system can handle it. Here are your options:
- TINY
- TINY_EN
- BASE
- BASE_EN
- SMALL
- SMALL_EN
- MEDIUM
- MEDIUM_EN
- LARGE_V1
- LARGE_V2
- LARGE_V3
- LARGE
- LARGE_V3_TURBO
- TURBO
  To try different models, replace `TINY` with one of the options above in `bot_config.env`:

  ```
  # Available values (defaults to TINY if misspelled):
  # >TINY >TINY_EN
  # >BASE >BASE_EN
  # >SMALL >SMALL_EN
  # >MEDIUM >MEDIUM_EN
  # >LARGE_V1 >LARGE_V2
  # >LARGE_V3 >LARGE
  # >LARGE_V3_TURBO >TURBO
  MODEL_NAME=TINY
  ```
  Note

  Refer to OpenAI Whisper's official paper for a performance comparison between the different models, available here.
- Build the image:

  ```shell
  docker compose build
  ```

  The created image is named `whisper2me_bot:latest`.

- Run the container with:

  ```shell
  docker compose up -d
  ```

  `-d` runs the container in detached mode.
  Tip

  By default, the container automatically restarts on failure and when the device restarts. This can be changed via the `deploy.restart_policy.condition` setting in the `docker-compose.yml` file.
- When the container starts, the model is downloaded. Depending on your internet connection and the selected model, this might take a while. The model's weights and the list of allowed users (other than the administrator) are stored in a volume named `whisper2me_bot_data`.
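The misspelling fallback mentioned in the config comment can be sketched as a small parser plus a lookup. This is an illustrative sketch only: `parse_env` and `pick_model` are hypothetical names, not whisper2me's actual internals.

```python
# Illustrative sketch of the documented behavior: MODEL_NAME is read
# from bot_config.env and falls back to TINY when misspelled.

VALID_MODELS = {
    "TINY", "TINY_EN", "BASE", "BASE_EN", "SMALL", "SMALL_EN",
    "MEDIUM", "MEDIUM_EN", "LARGE_V1", "LARGE_V2", "LARGE_V3",
    "LARGE", "LARGE_V3_TURBO", "TURBO",
}

def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping comments and blanks."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values

def pick_model(env):
    """Return the configured model name, defaulting to TINY."""
    name = env.get("MODEL_NAME", "TINY").strip().upper()
    return name if name in VALID_MODELS else "TINY"
```

For example, `pick_model(parse_env("MODEL_NAME=LARGEE"))` returns `"TINY"` because of the typo, while a valid `MODEL_NAME=LARGE_V3` is kept as-is.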
To run whisper2me with CUDA acceleration, follow the regular setup, but use these commands for building and running the container:
- Build:

  ```shell
  docker compose -f cuda-docker-compose.yml build
  ```

- Run:

  ```shell
  docker compose -f cuda-docker-compose.yml up -d
  ```
Note
Tested on Nvidia Orin AGX running Jetpack 5.1.2 with the NVIDIA L4T PyTorch r35.2.1-pth2.0-py3 image and on an RTX 3070 Ti running in WSL.
Once everything's running, open your bot's chat and hit `/start`. Ready to roll!

To transcribe, just forward any voice message and, voilà, you'll receive the transcription.
When a non-admin user tries a restricted command, the admin is notified with a message containing the `user_id` and the `command` that the user sent.
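That gatekeeping boils down to an identity check plus a formatted alert. A minimal sketch, assuming a placeholder admin id and an invented message format (whisper2me's actual wording will differ):

```python
# Hypothetical sketch of the admin-notification behavior described above.

ADMIN_USER_ID = 123456789  # placeholder, set from bot_config.env in reality

def admin_alert(user_id, command):
    """Build the notification the admin receives when a non-admin user
    sends a restricted command; return None when the caller is the admin."""
    if user_id == ADMIN_USER_ID:
        return None
    return f"User {user_id} sent restricted command: {command}"
```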
For all users:

- `/start` begins the conversation with the bot
- `/info` shows the current bot settings
- `/help` shows a list of available commands
For the admin only:

- `/language` changes the model target language; currently only the following are listed:

  - 🇺🇸 English
  - 🇫🇷 French
  - 🇩🇪 German
  - 🇮🇹 Italian
  - 🇪🇸 Spanish

- `/task` changes the model task to either:

  - Transcribe: the input voice message is transcribed using the automatically detected language
  - Translate: the input voice message is translated into the language selected with the `/language` command

- `/users` lists the users that are currently allowed to use the bot

- `/add_user` starts the interaction to allow a new user. You can either:

  - Send the `user_id` of the user you want to add
  - Forward a text message from the desired user so that the `user_id` is retrieved automatically, much simpler!

- `/remove_user` starts the interaction to remove a user. A list of currently allowed users is displayed; simply click the one you want to remove.

- `/purge` removes all users from the allowed list. Requires a confirmation message that spells exactly `YES`.
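Under the hood, these admin commands amount to set operations on a persisted allow-list. A rough sketch, not whisper2me's actual implementation (the class and method names are hypothetical; only the exact `YES` confirmation comes from the commands above):

```python
# Hypothetical allow-list backing /add_user, /remove_user and /purge.

class AllowList:
    def __init__(self, admin_id):
        self.admin_id = admin_id
        self.users = set()

    def add_user(self, user_id):
        """/add_user: allow a new user."""
        self.users.add(user_id)

    def remove_user(self, user_id):
        """/remove_user: revoke a user's access (no error if absent)."""
        self.users.discard(user_id)

    def purge(self, confirmation):
        """/purge: clear the list only when the reply is exactly YES."""
        if confirmation != "YES":
            return False
        self.users.clear()
        return True

    def is_allowed(self, user_id):
        """The admin is always allowed, alongside added users."""
        return user_id == self.admin_id or user_id in self.users
```

Note how `purge("yes")` is rejected: the confirmation must match `YES` exactly.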
whisper2me combines the magic of OpenAI's whisper and pyTelegramBotAPI.
Note

Translation works only with non-`_EN` models.
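That restriction can be expressed as a simple guard on the model name, since the `_EN` checkpoints are English-only. The helper below is hypothetical, written just to make the rule concrete:

```python
# Hypothetical helper reflecting the note above: English-only ("_EN")
# Whisper checkpoints cannot serve the translate task.

def supports_translation(model_name):
    return not model_name.upper().endswith("_EN")
```

So `supports_translation("TINY")` is `True`, while `supports_translation("TINY_EN")` is `False`.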
The code can run on both ARM64 and x64 architectures. It has been tested on:

- Raspberry Pi 3B with 1 GB of RAM (using Raspberry Pi OS (64-bit) Lite): the only runnable model is `TINY`. Almost all of the Pi's resources are used, and it runs approximately 6x slower than real time.

- Nvidia Orin AGX with 64 GB of RAM (using Jetpack 5.1.2): all models run without issue. The `LARGE_V3` model requires around 25-30 GB of combined RAM (both CPU and GPU). Execution is faster than real time.

- WSL on a desktop, in both the standard and CUDA versions, with an RTX 3070 Ti. Execution is faster than real time.