- Conversational Interface: Engage with the system using natural language queries to receive responses directly sourced from the PDFs.
- Direct Citation: Every response from the system includes a direct link to the source PDF page, ensuring traceability and verification.
- PDF Directory: A predefined set of key PDF documents, currently including WHO recommendations on major health topics such as schistosomiasis and malaria.
- 🦙 llama-3.1-8b-instant: Meta's Llama 3.1 8B, the smallest and fastest option, suited to quick interactions.
- 🦙 llama-3.1-70b-versatile: Meta's Llama 3.1 70B, a larger model for more complex language tasks.
- 📘 gpt-3.5-turbo: OpenAI's general-purpose chat model for varied conversational tasks.
- 🦙 llama3-70b-8192: Llama 3 70B with an 8,192-token context window.
- 🦙 llama3-8b-8192: Llama 3 8B with an 8,192-token context window, a lighter alternative to the 70B model.
- 🌟 mixtral-8x7b-32768: Mistral AI's Mixtral 8x7B, a sparse mixture-of-experts model with a 32,768-token context window.
- 💎 gemma-7b-it: Google's Gemma 7B instruction-tuned model (`it` stands for instruction-tuned).
The application combines OpenAI embeddings, Pinecone vector search, and a conversational interface. When a query is made, the system:
- Converts the query into embeddings.
- Searches for the most relevant document sections using Pinecone's vector search.
- Returns the answer along with citations and links to the source documents.
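The three steps above can be sketched with a toy, in-memory stand-in for the vector store. This is only an illustration of cosine-similarity retrieval with page-level citations, not the application's actual code: the three-dimensional vectors, the file names, the page numbers, and the helpers `cosine_similarity`, `top_k`, and `format_citation` are all hypothetical. In the real system the vectors would come from OpenAI embeddings and the search would be handled by Pinecone.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return metadata for the k entries most similar to the query."""
    scored = [(cosine_similarity(query_vec, item["vector"]), item) for item in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"score": score, **item["metadata"]} for score, item in scored[:k]]


def format_citation(match):
    """Build a link to a specific PDF page using the #page=N fragment."""
    return f"{match['source']}#page={match['page']}"


# Hypothetical index: made-up vectors, file names, and page numbers.
TOY_INDEX = [
    {"vector": [1.0, 0.0, 0.0], "metadata": {"source": "malaria.pdf", "page": 12}},
    {"vector": [0.0, 1.0, 0.0], "metadata": {"source": "schistosomiasis.pdf", "page": 4}},
    {"vector": [0.9, 0.1, 0.0], "metadata": {"source": "malaria.pdf", "page": 30}},
]

# Stand-in for the embedded user query.
query_vector = [0.2, 1.0, 0.0]

matches = top_k(query_vector, TOY_INDEX, k=2)
citations = [format_citation(m) for m in matches]
print(citations)
```

Returning the metadata alongside the score is what makes the citation possible: each stored chunk carries its source file and page, so the answer can link back to the exact page it came from.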
- Clone the repository: `git clone https://github.com/yourusername/RAG-nificent.git`
- Install dependencies: `pip install -r requirements.txt`
- Set the following environment variables in a `.env` file (see also `.env.example`): `PINECONE_INDEX_NAME`, `PINECONE_NAME_SPACE`, `OPENAI_API_KEY`, `PINECONE_API_KEY`, `GROQ_API_KEY`
- Create a Pinecone index with the same name as `PINECONE_INDEX_NAME`. Set it up with `dimensions=1536` and `metric=cosine`.
- Place your PDFs in the `pdf_data` directory and run `data_ingestion.py`.
- Run the application: `chainlit run src/app.py`
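Before running the ingestion script or the app, it can help to confirm that the five variables from the setup steps are actually set. The following is a minimal sketch, not part of the repository: the helper name `missing_env_vars` is hypothetical, and the check reads the process environment only, so it assumes the values from `.env` have already been loaded or exported by whatever mechanism the project uses.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_ENV_VARS = (
    "PINECONE_INDEX_NAME",
    "PINECONE_NAME_SPACE",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "GROQ_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise without touching the real environment.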
The system currently includes guidelines from the following PDFs with direct links to the documents: