whisper2me is a Telegram bot written with pyTelegramBotAPI that uses OpenAI's whisper to perform speech-to-text, so you no longer have to listen to voice messages 🤫🔇

Armaggheddon/whisper2me

💬 Hate voice messages? 🎙️ Let whisper2me handle them! Just forward the audio and get smooth transcriptions. Fast, simple, and ready for action! ⚡✨

Prerequisites 🚀

The easiest way to get whisper2me up and running is via Docker. Check out the official guide to install Docker Compose here.

Here's what you'll need:

  • The bot token from BotFather on Telegram (find out how here)
  • Your user_id from Telegram
  • An Nvidia GPU if you're planning to run the CUDA version with the NVIDIA Container Toolkit (see installation steps here)

Note

Heads-up! Tested on Ubuntu and WSL only; no guarantees for other operating systems. CUDA tests were done on an Nvidia Orin AGX and on an RTX 3070 Ti via WSL.

Setup 🔧

  1. Clone the repository on your machine with:
    git clone https://github.com/Armaggheddon/whisper2me.git
  2. Enter the folder:
    cd whisper2me
  3. Rename bot_config.env.example to bot_config.env and replace the fields with your own:
    • Replace YOUR_BOT_TOKEN and ADMIN_USER_ID:

      BOT_TOKEN=0000000000:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
      ADMIN_USER_ID=000000000
    • By default, the bot uses the TINY model, but you can pick a larger one if your system can handle it. Here are your options:

      • TINY
      • TINY_EN
      • BASE
      • BASE_EN
      • SMALL
      • SMALL_EN
      • MEDIUM
      • MEDIUM_EN
      • LARGE_V1
      • LARGE_V2
      • LARGE_V3
      • LARGE
      • LARGE_V3_TURBO
      • TURBO

      To try different models, replace TINY with one of the above options in bot_config.env:

      # Available values (defaults to TINY if misspelled):
      # >TINY             >TINY_EN
      # >BASE             >BASE_EN
      # >SMALL            >SMALL_EN
      # >MEDIUM           >MEDIUM_EN
      # >LARGE_V1         >LARGE_V2
      # >LARGE_V3         >LARGE
      # >LARGE_V3_TURBO   >TURBO
      MODEL_NAME=TINY

Note

Refer to OpenAI's official whisper paper, available here, for a performance comparison of the different models.
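
The fallback described in the config comment (an unrecognized MODEL_NAME falls back to TINY) can be sketched in a few lines of Python. This is only an illustration, not the bot's actual code: the function names resolve_model_name and to_whisper_name are hypothetical, and ignoring surrounding whitespace is an assumption. The lowercase names it produces (tiny.en, large-v3-turbo, and so on) follow the model naming used by the openai-whisper package.

```python
import os
from typing import Optional

# Model names accepted in bot_config.env, as listed in this README.
VALID_MODELS = {
    "TINY", "TINY_EN", "BASE", "BASE_EN", "SMALL", "SMALL_EN",
    "MEDIUM", "MEDIUM_EN", "LARGE_V1", "LARGE_V2", "LARGE_V3",
    "LARGE", "LARGE_V3_TURBO", "TURBO",
}


def resolve_model_name(raw: Optional[str]) -> str:
    """Return raw if it is a known model name, otherwise fall back to TINY."""
    name = (raw or "").strip()  # assumption: surrounding whitespace is ignored
    return name if name in VALID_MODELS else "TINY"


def to_whisper_name(model_name: str) -> str:
    """Convert e.g. TINY_EN to tiny.en and LARGE_V3 to large-v3."""
    lowered = model_name.lower()
    if lowered.endswith("_en"):
        lowered = lowered[:-3] + ".en"
    return lowered.replace("_", "-")


if __name__ == "__main__":
    # Reads the same variable that bot_config.env sets.
    chosen = resolve_model_name(os.environ.get("MODEL_NAME"))
    print(f"{chosen} -> {to_whisper_name(chosen)}")
```

With this sketch, a typo such as MODEL_NAME=TINNY resolves to TINY instead of stopping the bot at startup.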

  4. Build the image:

    docker compose build

    The resulting image is named whisper2me_bot:latest.

  5. Run the container with:

    docker compose up -d

    -d runs the container in detached mode.

Tip

By default, the container is set to restart automatically on failure and when the device restarts. This can be changed with the deploy.restart_policy.condition setting in the docker-compose.yml file.

  6. When the container starts, the model is downloaded. Depending on your internet connection and the selected model, this might take a while. The model's weights and the list of allowed users (other than the administrator) are stored in a volume named whisper2me_bot_data.

CUDA Setup ⚡

To run whisper2me with CUDA acceleration, follow the regular setup, but use these commands for building and running the container:

  • Build:

    docker compose -f cuda-docker-compose.yml build
  • Run:

    docker compose -f cuda-docker-compose.yml up -d

Note

Tested on Nvidia Orin AGX running Jetpack 5.1.2 with the NVIDIA L4T PyTorch r35.2.1-pth2.0-py3 image and on an RTX 3070 Ti running in WSL.

Usage 🎉

Once everything's running, open your bot's chat and hit /start. Ready to roll! 🏁

To transcribe, just forward any voice message, and voilà, you'll receive the transcription. 🚀

When a non-admin user tries a restricted command, the admin will be notified with a message containing the user_id and the command that the user sent. 🔔
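
A minimal sketch of that rule in plain Python, independent of pyTelegramBotAPI. The names is_allowed and notify are hypothetical, the set of restricted commands mirrors the admin-only commands listed in this README, and the real bot may structure this differently.

```python
from typing import Callable

# Admin-only commands, as listed in this README.
ADMIN_COMMANDS = {"/language", "/task", "/users", "/add_user", "/remove_user", "/purge"}


def is_allowed(user_id: int, command: str, admin_id: int,
               notify: Callable[[int, str], None]) -> bool:
    """Return True if user_id may run command; otherwise alert the admin.

    notify(chat_id, text) is a hypothetical callback that delivers a message,
    for example a thin wrapper around the bot's send-message call.
    """
    if command not in ADMIN_COMMANDS or user_id == admin_id:
        return True
    notify(admin_id, f"User {user_id} tried to use restricted command {command}")
    return False


if __name__ == "__main__":
    sent = []
    # Collect notifications in a list instead of sending them to Telegram.
    allowed = is_allowed(42, "/purge", admin_id=1,
                         notify=lambda chat_id, text: sent.append((chat_id, text)))
    print(allowed, sent)
```

Passing the notifier in as a callback keeps the check testable without a live bot token.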

Available commands 📝

For all users:

  • /start begins the conversation with the bot
  • /info shows the current bot settings
  • /help shows a list of available commands

For the admin only:

  • /language changes the model's target language. Currently only the following are listed:

    • 🇺🇸 English
    • 🇫🇷 French
    • 🇩🇪 German
    • 🇮🇹 Italian
    • 🇪🇸 Spanish
  • /task changes the model task to:

    • ✍ Transcribe: the input voice message is transcribed using the automatically detected language
    • 🗣 Translate: the input voice message is translated using the language selected with the /language command
  • /users lists the users that are currently allowed to use the bot

  • /add_user starts the interaction to allow a new user. You can either send:

    • The user_id of the user you want to add
    • A forwarded text message from the desired user, so that the user_id is retrieved automatically (much simpler!)
  • /remove_user starts the interaction to remove a user. A list of currently allowed users is displayed; simply click the one you want to remove

  • /purge removes all users from the allowed list. Requires a confirmation message that spells exactly YES
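
The user-management commands above boil down to a small allow-list. The sketch below is an assumption about the logic, not the bot's implementation: the AllowList class and its methods are hypothetical, storage is an in-memory set rather than the whisper2me_bot_data volume, and resolving a user_id from a forwarded message is left out.

```python
from typing import List, Set


class AllowList:
    """In-memory list of user_ids allowed to use the bot (hypothetical helper)."""

    def __init__(self) -> None:
        self._users: Set[int] = set()

    def add(self, text: str) -> bool:
        """Add the user_id typed by the admin; reject non-numeric input."""
        text = text.strip()
        if not (text.isascii() and text.isdigit()):
            return False
        self._users.add(int(text))
        return True

    def remove(self, user_id: int) -> None:
        """Remove a user; removing an unknown user is a no-op."""
        self._users.discard(user_id)

    def purge(self, confirmation: str) -> bool:
        """Clear the list only if the confirmation is exactly YES."""
        if confirmation != "YES":
            return False
        self._users.clear()
        return True

    def users(self) -> List[int]:
        """Return the allowed user_ids in ascending order."""
        return sorted(self._users)


if __name__ == "__main__":
    allowed = AllowList()
    allowed.add("123456789")
    # Lowercase confirmation does not match, so nothing is removed.
    print(allowed.purge("yes"), allowed.users())
```

Requiring an exact, case-sensitive YES makes it hard to wipe the list by accident.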

How it works ⚙️

whisper2me combines the magic of OpenAI's whisper and pyTelegramBotAPI.

Note

Translation works only with non-_EN models
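
One way to guard against this is to check the configured model name before switching tasks. A minimal sketch, assuming the task names transcribe and translate and the hypothetical helpers supports_translation and validate_task; the bot itself may handle this differently.

```python
def supports_translation(model_name: str) -> bool:
    """English-only (_EN) models cannot translate."""
    return not model_name.upper().endswith("_EN")


def validate_task(model_name: str, task: str) -> str:
    """Return task if the model can perform it, otherwise raise ValueError."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unknown task: {task}")
    if task == "translate" and not supports_translation(model_name):
        raise ValueError(f"{model_name} is English-only and cannot translate")
    return task


if __name__ == "__main__":
    print(validate_task("SMALL", "translate"))
    try:
        validate_task("SMALL_EN", "translate")
    except ValueError as err:
        print(err)
```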

The code can run on both ARM-64 and X64 architectures. It has been tested on:

  • Raspberry Pi 3B with 1GB of RAM (using Raspberry Pi OS Lite, 64-bit): the only runnable model is TINY. It uses almost all of the Pi's available resources and runs approximately 6x slower than real-time.

  • Nvidia Orin AGX with 64GB of RAM (using Jetpack 5.1.2), all models run without any issue. Using the LARGE_V3 model requires around 25-30 GB of combined RAM (both CPU and GPU). Execution time is faster than real-time.

  • WSL on a desktop in both standard and CUDA version with an RTX 3070 Ti. Execution time is faster than real-time.
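
To put "6x slower than real-time" into perspective, the real-time factor is processing time divided by audio duration. The helper names below are hypothetical and the 30-second message is only an example; the factor of 6 is the approximate Raspberry Pi 3B figure quoted above.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio length; values above 1 are slower than real-time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimated_processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Expected time to process a clip at a given real-time factor."""
    return audio_seconds * rtf


if __name__ == "__main__":
    # A 30-second voice message at roughly 6x slower than real-time.
    print(estimated_processing_seconds(30, 6))  # 180, i.e. about three minutes
```

By the same measure, "faster than real-time" on the Orin AGX and the RTX 3070 Ti means a factor below 1.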
