Skip to content

JarikDem-Bot/ai-waifu

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Waifu (VTuber)

GitHub GitHub top language Static Badge

Anime AI Waifu is an AI powered voice assistant with VTuber's model, that combines the charm of anime characters with cutting-edge technologies. This project is meant to create an engaging experience where you can interact with desired character in real-time without powerful hardware.

Features

  • 🎤 Voice Interaction: Speak to your AI waifu and get instant (almost) responses.

    • Whisper - openai's paid speech recognition.
    • Google sr - free speech recognition alternative.
    • Console - if you don't want use microphone just type prompts with your keyboard.
  • 🤖 AI Chatbot Integration: Conversations are powered by an AI chatbot, ensuring engaging and dynamic interactions.

    • Openai's 'gpt-3.5-turbo' or any other available model.
    • File with personality and behaviour description.
    • Remembers previous messages.
  • 📢 Text-to-Speech: Hear your AI waifu's responses as she speaks back to you, creating an immersive experience.

    • Google tts - free and simple solution.
    • ElevenLabs - amazing results, tons of voices.
    • Console - get text responses in your console (but VTube model will be just idle).
  • 🌐 Integration with VTube Studio: Seamlessly connect your AI waifu to VTube Studio for an even more lifelike and visually engaging interaction.

    • Lipsync while talking.

Showcase

Video demonstration

*Demonstration in real time without cutouts or speed up. This is real delay in answers.

Installation

To run this project, you need:

  1. Install Python 3.10.5 if you don't already have it installed.

  2. Clone the repository by running git clone https://github.com/JarikDem-Bot/ai-waifu.git

  3. Install the required Python packages by running pip install -r requirements.txt in the project directory.

  4. Create .env file inside the project directory and enter your API keys

    .env template
    OPENAI_API_KEY='YOUR_OPEN_AI_KEY'
    ELEVENLABS_API_KEY='YOUR_ELEVENLABS_KEY'
  5. Install VB-Cable

  6. Install and set VTube Studio

    Settings:
    • Select CABLE Output as microphone. Select Preview microphone audio to hear waifu's answers

    • Select input and output for Mouth Open. Optionally you can set "breathing" to get idle movents.

  7. Select your required settings in main.py in waifu.initialize

    Arguments:
    • user_input_service (str) - the way to interact with Waifu

      • "whisper" - OpenAI's whisper speech to text service; paid, requires OpanAi API key.
      • "google" - free google speech to text service.
      • "console" - type your promt in console with text (absoulutely free).
      • None or unspecified - default value is "whisper".
    • stt_duration (float) - the maximum number of seconds that it will dynamically adjust the threshold for before returning. This value should be at least 0.5 in order to get a representative sample of the ambient noise. Default value is 0.5.

    • mic_index (int) - index of the device to use for audio input. If None or unspecified will use default microphone.

    • chatbot_service (str) - service that will generate responses

      • "openai" - OpenAI text generation servise; paid, requires OpanAi API key.
      • "test" - returns prewritten message; used as dummy text for developement to reduce time and cost of testings.
      • None or unspecified - default value is "openai".
    • chatbot_model (str) - model used for text generation. List of available models you can find here. Default value is "gpt-3.5-turbo".

    • chatbot_temperature (float) - determines creativity of the generated text. A higher value leads to more creative result. A lower value leads to less creative and more similar results. Default value is 0.5.

    • personality_file (str) - relative path to txt file with waifu's description. Default value is "personality.txt".

    • tts_service (str) - service that "reads" Waifu's responses

      • "google" - free Google's tts, voice feels very "robotic".
      • "elevenlabs" - ElevenLabs tts with good quality; paid, requires ElevenLabs API key.
      • "console" - output will be printed in console (free).
      • None or unspecified - default value is "google".
    • output_device - (int) output device ID or (str) output device name substring. If VB-Cable is used, you need to find device, that will start with CABLE Input (VB-Audio Virtual using sd.query_devices() command.

    • tts_voice (str) - ElevenLabs voice name. Default value is "Elli".

    • tts_model (str) - ElevenLabs model. Recommended values are "eleven_monolingual_v1" and "eleven_multilingual_v1". Default value is "eleven_monolingual_v1".

  8. Run the project by executing python main.py in the project directory.


Warning

Depending on the selected input mode, program may send all recorded sounds or other data to the 3-rd parties such as: Google (stt, tts), OpenAI (stt, text generation), ElevenLabs (tts).

License

MIT