This project implements a Single Agent RAG (Retrieval-Augmented Generation) Router system, demonstrating an efficient approach to combining different AI capabilities through a unified routing mechanism. The system routes each query to the appropriate tool based on its type, improving response accuracy and broadening the range of questions it can handle.
The system implements a single-agent RAG architecture where a central agent routes queries to different tools based on query analysis:
- Query Router: Central intelligence that determines query type and appropriate tool
- Vector Store: Manages knowledge base using Qdrant
- LLM Integration: Uses Ollama with Llama2 for generation and reasoning
- Tool Integration: Web search, calculator, and vector search capabilities
- Knowledge Retrieval: Access to stored knowledge base
- Web Search: Current information retrieval
- Calculations: Mathematical operations
- Direct Responses: Immediate LLM-based answers
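The routing decision described above can be sketched as a small dispatcher. This is a minimal, illustrative sketch only: the function and tool names are hypothetical, and a real router in this project would use the LLM to classify the query rather than keyword heuristics.

```python
import re

# Hypothetical tool names mirroring the capabilities listed above.
TOOLS = ("vector_search", "web_search", "calculator", "direct_response")

def route_query(query: str) -> str:
    """Pick a tool for a query using simple keyword heuristics.

    A production router would ask the LLM to classify the query;
    this sketch only illustrates the control flow.
    """
    q = query.lower()
    # Arithmetic expressions go to the calculator tool.
    if re.search(r"\d+\s*(\+|-|\*|/|plus|minus|times|multiplied|divided)\s*\d*", q):
        return "calculator"
    # Recency cues suggest a web search.
    if any(word in q for word in ("latest", "current", "today", "news")):
        return "web_search"
    # Definitional questions go to the knowledge base.
    if any(phrase in q for phrase in ("what is", "explain", "how does")):
        return "vector_search"
    return "direct_response"

print(route_query("What is 15 multiplied by 25?"))            # calculator
print(route_query("What are the latest developments in AI?"))  # web_search
```

The order of the checks matters: the arithmetic test runs first so that "What is 15 multiplied by 25?" is not misrouted to the knowledge base by its "what is" prefix.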
- FastAPI: Main application framework
- Qdrant: Vector database for knowledge storage
- Ollama: Local LLM deployment
- Docker: Containerization and deployment
- Python 3.10+: Core development language
- Docker Desktop (Windows/macOS)
- Windows requires WSL 2 (Windows Subsystem for Linux)
- macOS requires Docker Desktop for Mac
- Git
- Minimum 8GB RAM recommended for running the services
- 4 CPU cores recommended for optimal performance
- Docker Desktop Resource Configuration:
- Memory: Minimum 10.8 GB
- CPU: Minimum 10 cores
- Swap: At least 1 GB
- Virtual disk limit: At least 64 GB
Important Note: Docker Desktop resource configuration differs slightly between platforms:
For Windows:
- Open Docker Desktop
- Go to Settings (⚙️)
- Navigate to "Resources" > "WSL 2"
- Adjust the memory, CPU, and swap limits to at least the minimum values listed above
- Click "Apply & Restart"
For macOS:
- Open Docker Desktop
- Go to Settings (⚙️)
- Navigate to "Resources" > "Advanced"
- Adjust the memory, CPU, and swap limits to at least the minimum values listed above
- Click "Apply & Restart"
- Clone the repository:
# Windows (PowerShell or Command Prompt)
git clone [repository-url]
cd single-agent-rag-project
# macOS/Linux (Terminal)
git clone [repository-url]
cd single-agent-rag-project
- Start the services:
# Works the same on all platforms
docker compose up -d
- Verify installation:
# Windows PowerShell
Invoke-WebRequest http://localhost:8000/health
# Windows Command Prompt
curl http://localhost:8000/health
# macOS/Linux
curl http://localhost:8000/health
- Ensure WSL 2 is installed and configured
- Use PowerShell or Command Prompt for running commands
- If `curl` commands don't work in PowerShell, use `Invoke-WebRequest` instead
- Line endings in text files should be handled automatically by Git
- Ensure Docker Desktop has permission to access your file system
- Terminal commands should work as shown in examples
- If permission issues occur, check Docker Desktop's file sharing settings
- If services fail to start, check Docker Desktop resource allocation
- Ensure all ports (8000, 6333, 11434) are available
- If using antivirus software, you may need to add Docker to the allowed applications
- Clone the repository:
git clone [repository-url]
cd single-agent-rag-project
- Start the services:
docker compose up -d
- Verify installation:
curl http://localhost:8000/health
Environment variables can be configured in the `docker-compose.yml` file.
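For orientation, a compose file for this stack typically wires the three services together along the lines of the sketch below. The service names, image tags, and environment variable names here are illustrative assumptions; check the project's actual `docker-compose.yml`. The ports match those listed in the troubleshooting notes (8000, 6333, 11434).

```yaml
# Illustrative sketch only -- not the project's actual file
services:
  app:
    build: .
    ports:
      - "8000:8000"
    environment:
      - QDRANT_URL=http://qdrant:6333    # assumed variable name
      - OLLAMA_URL=http://ollama:11434   # assumed variable name
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_models:/root/.ollama   # models persist in a Docker volume
volumes:
  ollama_models:
```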
The project runs Ollama in a Docker container, so you don't need to install the Ollama desktop application. Everything is handled through Docker containers.
After starting the services, you'll need to ensure the required LLM model is available:
- Pull the Llama2 model:
# For Windows PowerShell
docker exec -it single-agent-rag-project-ollama-1 ollama pull llama2
# For macOS/Linux Terminal
docker exec -it single-agent-rag-project-ollama-1 ollama pull llama2
- Verify model availability:
docker exec -it single-agent-rag-project-ollama-1 ollama list
Note:
- The initial model download might take several minutes depending on your internet connection
- The Llama2 model is approximately 4GB in size
- All models are stored in a Docker volume, not on your local system
- No local Ollama installation is required - everything runs in containers
After starting the services, you'll need to populate the vector stores with documents before you can perform knowledge retrieval queries.
The project includes sample documents in `data/sample_data/`:
- Technical documentation:
data/sample_data/technical_docs/rag_systems_overview.txt
data/sample_data/technical_docs/vector-databases-explained.md
- Business documentation:
data/sample_data/business_docs/generic-company-profile.md
data/sample_data/business_docs/rag_market_analysis.txt
- Upload Technical Documents

  Using Postman, create a new POST request:
  - URL: `http://localhost:8000/upload/files`
  - Request Type: POST
  - Body: form-data
  - Form Fields:
    - Key: `files` (Type: File), Value: select `rag_systems_overview.txt`
    - Key: `store_name` (Type: Text), Value: `technical_docs`
    - Key: `metadata` (Type: Text), Value: `{"category": "technical", "subject": "rag"}`

  Repeat the same process for `vector-databases-explained.md`, updating the file selection.
- Upload Business Documents

  Create another POST request with the same URL but different form data:
  - URL: `http://localhost:8000/upload/files`
  - Request Type: POST
  - Body: form-data
  - Form Fields:
    - Key: `files` (Type: File), Value: select `generic-company-profile.md`
    - Key: `store_name` (Type: Text), Value: `business_docs`
    - Key: `metadata` (Type: Text), Value: `{"category": "business", "type": "company_profile"}`

  Repeat for `rag_market_analysis.txt`, updating the file selection.
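The same uploads can be scripted instead of clicking through Postman. The sketch below assembles the form fields the upload steps above describe; the helper function itself is hypothetical, and with a library such as `requests` you would pass these as `requests.post(url, files={"files": open(path, "rb")}, data=form)`.

```python
import json

def build_upload_form(store_name: str, metadata: dict) -> dict:
    """Assemble the text form fields for POST /upload/files.

    The metadata field is sent as a JSON string, matching the
    Postman form fields shown above. The file itself is attached
    separately as the `files` field (Type: File).
    """
    return {
        "store_name": store_name,
        "metadata": json.dumps(metadata),
    }

form = build_upload_form(
    "technical_docs",
    {"category": "technical", "subject": "rag"},
)
print(form["store_name"])  # technical_docs
```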
In Postman:
- Create a GET request to `http://localhost:8000/documents/technical_docs`
- Create a GET request to `http://localhost:8000/documents/business_docs`
These requests will show you all documents stored in each collection.
In Postman:
- Create a POST request to `http://localhost:8000/query`
- Set the Content-Type header to `application/json`
- In the request body (raw, JSON), enter:
{
"query": "What is RAG and how does it work?"
}
For business queries, use:
{
"query": "What are the current market trends for RAG systems?"
}
- URL: `http://localhost:8000/query`
- Method: POST
- Headers:
- Content-Type: application/json
- Body (raw, JSON):
{
"query": "Your question here"
}
Example queries:
{
"query": "What is RAG and how does it work?"
}
{
"query": "What is 15 multiplied by 25?"
}
{
"query": "What are the latest developments in AI?"
}
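The example bodies above can also be sent outside Postman. A minimal sketch using only Python's standard library; the endpoint and body shape come from the steps above, while the helper function name is an illustrative assumption:

```python
import json
import urllib.request

def build_query_request(query: str,
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /query request; send it with urllib.request.urlopen(req)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("What is 15 multiplied by 25?")
print(req.full_url)  # http://localhost:8000/query
```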
- URL: `http://localhost:8000/upload/files`
- Method: POST
- Body: form-data
- Form Fields:
  - `files`: (select file)
  - `store_name`: (`technical_docs` or `business_docs`)
  - `metadata`: (JSON object with document metadata)
- URL: `http://localhost:8000/search/{store_name}`
- Method: GET
- Query Params: `query` (your search term)
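Since the search term travels as a query parameter, it needs URL encoding. A small standard-library sketch (the helper name is illustrative):

```python
from urllib.parse import quote, urlencode

def search_url(store_name: str, query: str,
               base: str = "http://localhost:8000") -> str:
    """Build GET /search/{store_name}?query=... with proper encoding."""
    return f"{base}/search/{quote(store_name)}?{urlencode({'query': query})}"

print(search_url("technical_docs", "vector databases"))
# http://localhost:8000/search/technical_docs?query=vector+databases
```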
- URL: `http://localhost:8000/health`
- Method: GET
- URL: `http://localhost:8000/stores`
- Method: GET
Comprehensive testing documentation is available in TESTING.md, including:
- Functional testing results
- System component validation
- Performance observations
- API endpoint testing
- Example queries and responses
SINGLE-AGENT-RAG-PROJECT/
├── config/ # Configuration management
├── src/ # Core source code
│ ├── data_pipeline/ # Document processing
│ ├── integration/ # Service integration
│ ├── query_processing/# Query handling
│ ├── router/ # Query routing logic
│ ├── tools/ # External tools
│ ├── vector_db/ # Vector store management
│ └── main.py # Application entry point
├── docker-compose.yml # Service orchestration
└── Dockerfile # Container configuration
This project was developed with a focus on:
- Clean, maintainable code structure
- Robust error handling
- Clear separation of concerns
- Extensible architecture
- Comprehensive documentation
Potential areas for expansion:
- Advanced query planning
- Advanced routing capabilities
- Additional tool integration
- Performance optimization
- Enhanced caching mechanisms