A powerful voice assistant that uses Mistral AI for natural language processing, Whisper for speech recognition, and a text-to-speech engine for spoken replies. The assistant can understand voice commands, search the web, and provide natural conversational responses.
- 🎙️ Voice activation with wake word detection ("Hey Mistral")
- 🗣️ Natural speech recognition using Whisper
- 🤖 AI-powered responses using Mistral
- 🌐 Web search integration for real-time information
- 🔊 High-quality text-to-speech synthesis
- ⚡ Interrupt capability during responses
- 🎯 Context-aware conversations
- Python 3.8 or higher
- Mistral AI running locally (via Ollama)
- Audio input device (microphone)
- Audio output device (speakers)
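If you want to confirm the audio prerequisites before going further, a minimal check with sounddevice (one of the project's dependencies) might look like the sketch below; check_audio_devices is just an illustrative helper, not part of this repository:

```python
import sounddevice as sd

def check_audio_devices():
    """Report whether at least one microphone and one speaker are visible."""
    devices = sd.query_devices()
    has_input = any(d["max_input_channels"] > 0 for d in devices)
    has_output = any(d["max_output_channels"] > 0 for d in devices)
    print(f"Microphone detected: {has_input}, speakers detected: {has_output}")

if __name__ == "__main__":
    check_audio_devices()
```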
- Clone this repository:

  ```bash
  git clone [your-repository-url]
  cd voice_assistant
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```

- Make sure you have Ollama installed and the Mistral model running locally (a quick way to verify this is shown after these steps):

  ```bash
  ollama run mistral
  ```
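To double-check that Ollama is actually serving the Mistral model before launching the assistant, you can query its local HTTP API (default port 11434). This is a quick, optional sketch rather than part of the project code:

```python
import requests

# List the models the local Ollama server currently has available.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Mistral available:", any(name.startswith("mistral") for name in models))
```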
- Activate your virtual environment if not already activated:

  ```bash
  source venv/bin/activate  # On Windows, use: venv\Scripts\activate
  ```

- Run the voice assistant:

  ```bash
  python main.py
  ```
- Wake up the assistant by saying "Hey Mistral" or any of the following wake words:
  - "Hi Mistral"
  - "Hello Mistral"
  - "Mistral"
  - "Arise"
- Ask your question or give a command after the assistant acknowledges you.
- To exit, simply say "exit" or press Ctrl+C.
- main.py: Core application logic and voice assistant implementation
- stt.py: Speech-to-text functionality
- requirements.txt: Project dependencies
The assistant listens for wake words and becomes active only when called upon, preserving system resources and privacy.
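Conceptually, the wake-word check comes down to normalizing whatever Whisper transcribed and comparing it against a small set of phrases. The standalone sketch below illustrates that idea; it is not the project's exact implementation:

```python
import string

# Wake words accepted by the assistant (see the usage notes above).
WAKE_WORDS = {"hey mistral", "hi mistral", "hello mistral", "mistral", "arise"}

def is_wake_word(transcript: str) -> bool:
    """Return True if a transcribed utterance matches one of the wake words."""
    cleaned = transcript.lower().strip().strip(string.punctuation + " ")
    return cleaned in WAKE_WORDS

print(is_wake_word("Hey Mistral!"))      # True
print(is_wake_word("What time is it?"))  # False
```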
- Uses Mistral AI for generating contextual and intelligent responses
- Integrates web search capabilities for real-time information
- Streams responses sentence by sentence for natural conversation flow
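As an illustration of how such responses can be produced, the sketch below sends a prompt to the locally running Mistral model through Ollama's HTTP API (default port 11434) and then splits the reply into sentences so each one can be handed to the TTS engine as soon as it is ready. It is a simplified stand-in for the project's actual logic:

```python
import re
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def ask_mistral(prompt: str) -> str:
    """Send a prompt to the local Mistral model and return its reply."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "mistral", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def sentences(text: str):
    """Yield the reply sentence by sentence so speech can start early."""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield sentence

for line in sentences(ask_mistral("In two sentences, what is a wake word?")):
    print(line)  # in the assistant, each sentence would be synthesized and played
```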
- Users can interrupt the assistant's response by speaking
- Smart detection of user input during response playback
- Graceful handling of interruptions and conversation flow
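A common way to implement this kind of barge-in detection is to keep a lightweight input stream open while the reply is playing and flag any audio frame whose energy crosses a threshold. The sketch below uses sounddevice and numpy from the dependency list, but it only illustrates the idea and is not the project's exact code:

```python
import numpy as np
import sounddevice as sd

ENERGY_THRESHOLD = 0.02  # purely illustrative; tune for your microphone
interrupted = False

def on_audio(indata, frames, time, status):
    """Flag an interruption when the microphone picks up speech-level energy."""
    global interrupted
    rms = float(np.sqrt(np.mean(indata ** 2)))
    if rms > ENERGY_THRESHOLD:
        interrupted = True

# Keep listening while the assistant's reply would be playing back.
with sd.InputStream(channels=1, samplerate=16000, callback=on_audio):
    sd.sleep(3000)  # stand-in for "until TTS playback finishes"

print("User interrupted playback:", interrupted)
```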
- Automatic detection of queries requiring current information
- Integration with DuckDuckGo for web searches
- Contextual responses incorporating web search results
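One simple way to fetch such information is DuckDuckGo's Instant Answer API; the sketch below shows the general shape of the request using requests, though the project's actual search call may differ. The returned snippet would then be folded into the prompt sent to Mistral:

```python
import requests

def duckduckgo_snippet(query: str) -> str:
    """Fetch a short answer snippet from DuckDuckGo's Instant Answer API."""
    resp = requests.get(
        "https://api.duckduckgo.com/",
        params={"q": query, "format": "json", "no_html": 1},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # Fall back through the fields the API commonly populates.
    return data.get("AbstractText") or data.get("Answer") or ""

snippet = duckduckgo_snippet("Python programming language")
print(snippet or "No instant answer; a full web search would be needed.")
```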
- sounddevice==0.4.6: Audio input/output handling
- numpy==1.22.0: Numerical operations and audio processing
- requests==2.31.0: HTTP requests for web integration
- faster-whisper==0.10.0: Speech recognition
- TTS==0.22.0: Text-to-speech synthesis
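Taken together, these pins correspond to a requirements.txt along these lines (the repository's actual file may include additional packages):

```text
sounddevice==0.4.6
numpy==1.22.0
requests==2.31.0
faster-whisper==0.10.0
TTS==0.22.0
```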
Contributions are welcome! Please feel free to submit a Pull Request.
[Your chosen license]
- Mistral AI for the language model
- Whisper for speech recognition
- Mozilla TTS for speech synthesis
- DuckDuckGo for web search capabilities