# AgentX

AgentX is an advanced AI-powered assistant that integrates voice processing, document retrieval, conversational memory, and generative AI capabilities. This agent is designed to handle a wide range of tasks, from answering queries to processing voice commands.
## Features

- **Voice Recognition**
  - Converts speech to text using Vosk models.
  - Processes voice commands and generates audio responses via Google Text-to-Speech (gTTS).
- **Document Retrieval**
  - Utilizes FAISS for efficient document embedding and retrieval.
  - Supports multiple document types for context-aware responses.
- **Generative AI**
  - Powered by OpenAI GPT models for conversational capabilities.
  - Generates meaningful responses based on retrieved documents and context.
- **Conversational Memory**
  - Maintains context across interactions using LangChain's `ConversationBufferMemory` (see the sketch after this list).
- **Integration Ready**
  - Modular design for integrating additional tools such as the Whisper API, advanced LangChain tools, or scheduled tasks.
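To make the Conversational Memory and Generative AI features above concrete, here is a minimal sketch of wiring LangChain's `ConversationBufferMemory` to an OpenAI chat model. It is an illustration only, not the code in `core/agent.py` or `core/memory.py`; the model name and import paths are assumptions that vary between LangChain versions.

```python
# Minimal sketch -- not the actual AgentX implementation.
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI  # older versions: from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # model name is an assumption
memory = ConversationBufferMemory()                   # keeps the full chat history in a buffer

chain = ConversationChain(llm=llm, memory=memory)

print(chain.predict(input="My name is Sam. What can AgentX do?"))
print(chain.predict(input="What did I say my name was?"))  # answered using the buffered history
```

Because the buffer keeps every turn verbatim, long sessions grow the prompt; swapping in a summarizing memory class is a common variation.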
## Tech Stack

| Tool | Purpose |
|---|---|
| LangChain | Conversational chains and memory handling |
| OpenAI GPT | Language generation and query answering |
| Vosk | Speech-to-text offline transcription |
| gTTS | Text-to-speech conversion |
| FAISS | Document embedding and vector retrieval |
| playsound | Audio playback |
| dotenv | Environment variable management |
| pytest | Unit testing framework |
| Logging | Structured logging for debugging |
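The sketch below shows how the Vosk, gTTS, and playsound entries above typically fit together for speech input and spoken output. The helper names and file paths are illustrative assumptions, not the contents of `core/voice_processor.py`.

```python
# Illustrative voice pipeline sketch; helper names and paths are assumptions.
import json
import wave

from gtts import gTTS
from playsound import playsound
from vosk import KaldiRecognizer, Model

VOSK_MODEL = Model("model/vosk-model-small-en-us-0.15")  # path taken from the project layout below


def transcribe(wav_path: str) -> str:
    """Offline speech-to-text with Vosk (expects a mono PCM WAV file)."""
    wf = wave.open(wav_path, "rb")
    recognizer = KaldiRecognizer(VOSK_MODEL, wf.getframerate())
    while True:
        chunk = wf.readframes(4000)
        if not chunk:
            break
        recognizer.AcceptWaveform(chunk)
    return json.loads(recognizer.FinalResult()).get("text", "")


def speak(text: str, out_path: str = "data/audio/response.mp3") -> None:
    """Convert text to speech with gTTS and play it back with playsound."""
    gTTS(text=text, lang="en").save(out_path)
    playsound(out_path)
```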
## Project Structure

```
.
├── config
│   ├── constants.py              # Reusable constants
│   └── settings.py               # Application settings and environment variables
├── core
│   ├── agent.py                  # Main agent logic
│   ├── document_handler.py       # Document loading and splitting
│   ├── memory.py                 # Conversational memory handler
│   └── voice_processor.py        # Voice processing logic
├── data
│   ├── documents                 # Sample documents for testing
│   └── audio                     # Audio files (input/output)
├── model
│   └── vosk-model-small-en-us-0.15  # Speech recognition model
├── scripts
│   ├── run_agent.sh              # Script to run the agent
│   └── setup_env.sh              # Script to set up the environment
├── tests
│   ├── test_agent.py             # Tests for agent.py
│   ├── test_memory.py            # Tests for memory handler
│   └── test_voice_processor.py   # Tests for voice processing
├── Dockerfile                    # Docker setup
├── README.md                     # Project documentation
├── requirements.txt              # Python dependencies
└── main.py                       # Entry point for the application
```
## Prerequisites

- Python 3.11 or later
- **Vosk Model**: Download and place the model in the `model` directory (a quick load check is sketched below).
- **API Keys**:
  - OpenAI API Key
  - Whisper API Key (optional)
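As a quick sanity check that the Vosk model is in the right place, the snippet below simply tries to load it. This is a hypothetical helper for verification, not part of the repository.

```python
# Hypothetical check that the Vosk model directory is in place.
from vosk import Model

try:
    Model("model/vosk-model-small-en-us-0.15")
    print("Vosk model loaded successfully.")
except Exception as exc:  # Vosk raises if the model directory is missing or incomplete
    print(f"Could not load the Vosk model: {exc}")
```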
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/sattyamjjain/AgentX.git
  cd AgentX
  ```

- Set up the environment:

  ```bash
  ./scripts/setup_env.sh
  ```

- Create a `.env` file in the project root (a sketch of how these variables are read follows these steps):

  ```env
  OPENAI_API_KEY=<your_openai_api_key>
  WHISPER_API_KEY=<your_whisper_api_key>
  DEBUG=True
  ```

- Run the application:

  ```bash
  python3 main.py
  ```
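For reference, here is a minimal sketch of how the `.env` values can be loaded with `python-dotenv`; the real `config/settings.py` may do this differently.

```python
# Minimal sketch of reading the .env values; config/settings.py is assumed to do something similar.
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the project root

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
WHISPER_API_KEY = os.getenv("WHISPER_API_KEY")  # optional
DEBUG = os.getenv("DEBUG", "False").lower() == "true"

if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file.")
```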
## Running Tests

Run the test suite using `pytest`:

```bash
pytest tests/
```
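To illustrate the kind of test pytest will collect (this example is not taken from the repository's test suite), a check of LangChain's conversation buffer might look like this:

```python
# Illustrative test only -- the real tests in tests/ may look different.
from langchain.memory import ConversationBufferMemory


def test_buffer_memory_keeps_context():
    """Earlier turns should be retrievable from the conversation buffer."""
    memory = ConversationBufferMemory()
    memory.save_context({"input": "What is AgentX?"}, {"output": "A voice-enabled assistant."})

    history = memory.load_memory_variables({})["history"]
    assert "What is AgentX?" in history
```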
## Usage

- **Running the Agent**

  ```bash
  python3 main.py
  ```

- **Interactive Voice Commands**
  - Speak into the microphone, and AgentX will respond.
- **Document Retrieval**
  - Place `.txt` files in `data/documents` to make them available for querying (see the indexing sketch after this list).
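The sketch below shows one way the `.txt` files in `data/documents` could be embedded and queried with FAISS through LangChain. It follows the common LangChain pattern rather than the project's `core/document_handler.py`; the chunking parameters are arbitrary, import paths differ between LangChain versions, and it assumes `OPENAI_API_KEY` is set because `OpenAIEmbeddings` calls the OpenAI API.

```python
# Indexing and querying sketch; not the project's document_handler.py.
from pathlib import Path

from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Load every .txt file from the documents directory.
docs = []
for path in Path("data/documents").glob("*.txt"):
    docs.extend(TextLoader(str(path)).load())

# Split into overlapping chunks and build an in-memory FAISS index.
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
index = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Retrieve the chunks most similar to a query.
for doc in index.similarity_search("What does AgentX do?", k=2):
    print(doc.page_content[:200])
```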
## Future Enhancements

- Integration with Whisper API for high-quality transcription.
- Multi-modal capabilities for image and video processing.
- Improved conversational flows with dynamic memory management.
## Author

- Sattyam Jain

## Contributing

Open to contributions! For issues or feature requests, please open an issue on the GitHub repository.