whisper2me

💬 Hate voice messages? 🎙️ Let whisper2me handle them! Just forward the audio and get smooth transcriptions. Fast, simple, and ready for action! ⚡✨

Prerequisites 🚀

The easiest way to get whisper2me up and running is via Docker. See the official Docker documentation for instructions on installing Docker Compose.

Here's what you'll need:

- The bot token from BotFather on Telegram (see the Telegram documentation for how to create one)
- Your user_id from Telegram
- An Nvidia GPU and the NVIDIA Container Toolkit, if you're planning to run the CUDA version (see the toolkit's official installation steps)

Note

Heads-up! Tested on Ubuntu and WSL only; other operating systems are not guaranteed to work. CUDA tests were done on an Nvidia Orin AGX and on an RTX 3070 Ti via WSL.

Setup 🔧

Clone the repository on your machine with:

```
git clone https://github.com/Armaggheddon/whisper2me.git
```

Enter the folder:
```
cd whisper2me
```
Rename bot_config.env.example to bot_config.env and replace the fields with your own:
- Set BOT_TOKEN and ADMIN_USER_ID to your bot token and your Telegram user_id:
```
BOT_TOKEN=0000000000:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ADMIN_USER_ID=000000000
```
- By default, the bot uses the TINY model, but you can pick a larger one if your system can handle it. Here are your options:
  - TINY
  - TINY_EN
  - BASE
  - BASE_EN
  - SMALL
  - SMALL_EN
  - MEDIUM
  - MEDIUM_EN
  - LARGE_V1
  - LARGE_V2
  - LARGE_V3
  - LARGE
  - LARGE_V3_TURBO
  - TURBO
  To try different models, replace TINY with one of the above options in bot_config.env:
```
# Available values (defaults to TINY if misspelled):
# >TINY             >TINY_EN
# >BASE             >BASE_EN
# >SMALL            >SMALL_EN
# >MEDIUM           >MEDIUM_EN
# >LARGE_V1         >LARGE_V2
# >LARGE_V3         >LARGE
# >LARGE_V3_TURBO   >TURBO
MODEL_NAME=TINY
```
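The fallback mentioned in the config comment (an unrecognized MODEL_NAME resolves to TINY) can be sketched in Python as follows. This is only an illustration, not the bot's actual code: `BotConfig`, `load_config`, the case-insensitive matching, and the assumption that Docker Compose hands the values from bot_config.env to the container as environment variables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping

# Model names listed in bot_config.env.
VALID_MODELS = {
    "TINY", "TINY_EN", "BASE", "BASE_EN", "SMALL", "SMALL_EN",
    "MEDIUM", "MEDIUM_EN", "LARGE_V1", "LARGE_V2", "LARGE_V3",
    "LARGE", "LARGE_V3_TURBO", "TURBO",
}
DEFAULT_MODEL = "TINY"


@dataclass(frozen=True)
class BotConfig:  # hypothetical container, not part of whisper2me
    bot_token: str
    admin_user_id: int
    model_name: str


def load_config(env: Mapping[str, str]) -> BotConfig:
    """Build a config from environment-style key/value pairs."""
    token = env.get("BOT_TOKEN", "").strip()
    if not token:
        raise ValueError("BOT_TOKEN is missing")

    # int() raises ValueError if ADMIN_USER_ID is missing or not numeric.
    admin_id = int(env.get("ADMIN_USER_ID", "").strip())

    # Fall back to TINY when the name is missing or misspelled.
    # Upper-casing the value is an assumption made for this sketch.
    requested = env.get("MODEL_NAME", DEFAULT_MODEL).strip().upper()
    model = requested if requested in VALID_MODELS else DEFAULT_MODEL

    return BotConfig(token, admin_id, model)
```

Inside the container this would be called as `load_config(os.environ)`; passing a plain dict instead makes the fallback easy to check without touching the real environment.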

Note

Refer to OpenAI's official Whisper paper for a performance comparison of the different models.

Build the image:
```
docker compose build
```
The resulting image is named whisper2me_bot:latest.
Run the container with:
```
docker compose up -d
```
-d runs the container in detached mode.

Tip

By default, the container is set to restart automatically on failure and when the device restarts. This can be changed with the deploy.restart_policy.condition setting in the docker-compose.yml file.

When the container starts, the model is downloaded. Depending on your internet connection and the selected model, this might take a while. The model's weights and the list of allowed users (other than the administrator) are stored in a volume named whisper2me_bot_data.
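Because the allowed-user list lives in the volume, it survives container restarts. As a rough sketch of how such a list could be persisted, assuming a JSON file inside the mounted volume (the file name `allowed_users.json`, the JSON layout, and both helper functions are hypothetical; the bot's real storage format may differ):

```python
import json
from pathlib import Path


def load_allowed_users(path: Path) -> set[int]:
    """Return the stored user ids, or an empty set if no file exists yet."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_allowed_users(path: Path, users: set[int]) -> None:
    """Write the user ids to disk as a sorted JSON list."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(users)), encoding="utf-8")
```

Keeping the administrator out of this file matches the description above: the admin comes from ADMIN_USER_ID in the config, so only the additional users need to be stored.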

CUDA Setup ⚡

To run whisper2me with CUDA acceleration, follow the regular setup, but use these commands for building and running the container:

Build:

```
docker compose -f cuda-docker-compose.yml build
```

Run:

```
docker compose -f cuda-docker-compose.yml up -d
```

Note

Tested on Nvidia Orin AGX running Jetpack 5.1.2 with the NVIDIA L4T PyTorch r35.2.1-pth2.0-py3 image and on an RTX 3070 Ti running in WSL.

Usage 🎉

Once everything’s running, open your bot’s chat and hit /start. Ready to roll! 🏁

To transcribe, just forward any voice message, and voilà, you’ll receive the transcription. 🚀

When a non-admin user tries a restricted command, the admin will be notified with a message containing the user_id and the command that the user sent. 🔔

Available commands 📝

For all users:

/start begins the conversation with the bot
/info shows the current bot settings
/help shows a list of available commands

For the admin only:

/language changes the model's target language. Currently only the following are listed:
- 🇺🇸 English
- 🇫🇷 French
- 🇩🇪 German
- 🇮🇹 Italian
- 🇪🇸 Spanish
/task changes the model task to:
- ✍ Transcribe: the input voice message is transcribed using the automatically detected language
- 🗣 Translate: the input voice message is translated using the language selected with the /language command
/users lists the users that are currently allowed to use the bot
/add_user starts the interaction to allow a new user. You can either:
- Send the user_id of the user you want to add
- Forward a text message from that user so that the user_id is retrieved automatically (much simpler!)
/remove_user starts the interaction to remove a user. A list of currently allowed users is displayed; simply click the one you want to remove
/purge removes all users from the allowed list. Requires a confirmation message that spells exactly YES
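The permission rules above (public commands for everyone, admin-only commands that trigger a notification when someone else tries them, and a /purge that needs an exact YES) could be modeled roughly like this. The `AccessControl` class and its methods are hypothetical and exist only for this sketch; the notification is recorded in a list rather than sent through Telegram.

```python
from dataclasses import dataclass, field

PUBLIC_COMMANDS = {"/start", "/info", "/help"}
ADMIN_COMMANDS = {
    "/language", "/task", "/users", "/add_user", "/remove_user", "/purge",
}


@dataclass
class AccessControl:  # hypothetical helper, not the bot's real class
    admin_id: int
    allowed: set[int] = field(default_factory=set)
    notifications: list[str] = field(default_factory=list)

    def can_run(self, user_id: int, command: str) -> bool:
        """Return True if the user may run the command."""
        if command in ADMIN_COMMANDS:
            if user_id == self.admin_id:
                return True
            # Record what the admin would be told: who tried which command.
            self.notifications.append(f"user {user_id} tried {command}")
            return False
        return command in PUBLIC_COMMANDS

    def purge(self, user_id: int, confirmation: str) -> bool:
        """Clear the allowed list only for the admin and an exact 'YES'."""
        if user_id != self.admin_id or confirmation != "YES":
            return False
        self.allowed.clear()
        return True
```

Requiring the literal string YES (rather than accepting "yes" or "y") makes an accidental purge less likely, which is presumably why the bot asks for it.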

How it works ⚙️

whisper2me combines the magic of OpenAI's whisper and pyTelegramBotAPI.

Note

Translation works only with the multilingual models, that is, those without the _EN suffix.
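As an illustration of that restriction, the sketch below converts a MODEL_NAME value into the lowercase form used by OpenAI's whisper package (for example `tiny.en` or `large-v3-turbo`) and checks whether a model can translate. Both function names are hypothetical, and the conversion rule is an assumption about how the config names relate to whisper's model names, not something taken from the bot's source.

```python
def to_whisper_name(config_name: str) -> str:
    """Convert e.g. 'TINY_EN' to 'tiny.en' and 'LARGE_V3' to 'large-v3'."""
    name = config_name.strip().lower()
    if name.endswith("_en"):
        # English-only models use a '.en' suffix in whisper.
        name = name[:-3] + ".en"
    return name.replace("_", "-")


def supports_translation(config_name: str) -> bool:
    """English-only (_EN) models cannot be used for the Translate task."""
    return not config_name.strip().upper().endswith("_EN")
```

A bot could call `supports_translation` before switching the task with /task and refuse the change when an _EN model is loaded.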

The code runs on both ARM64 and x64 architectures. It has been tested on:

- Raspberry Pi 3B with 1GB of RAM (using Raspberry Pi OS (64-bit) Lite): the only model that runs is TINY. It uses almost all of the Pi's available resources and runs approximately 6x slower than real-time.
- Nvidia Orin AGX with 64GB of RAM (using Jetpack 5.1.2): all models run without any issue. Using the LARGE_V3 model requires around 25-30 GB of combined RAM (both CPU and GPU). Execution time is faster than real-time.
- WSL on a desktop, in both the standard and CUDA versions, with an RTX 3070 Ti. Execution time is faster than real-time.
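To make "slower than real-time" and "faster than real-time" concrete, the ratio of processing time to audio length (often called the real-time factor) can be computed as below. The function names are made up for this example, and the 30-second message is an illustrative figure, not a measurement from the tests above.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Ratio of processing time to audio length; above 1 is slower than real-time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Expected wait for a message of the given length at a given factor."""
    return audio_seconds * rtf
```

At a factor of about 6, as reported for the Raspberry Pi 3B, a 30-second voice message would take roughly `estimated_processing_seconds(30, 6)`, that is, about 180 seconds to transcribe.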
