This project implements Retrieval-Augmented Generation (RAG) using OpenAI's embedding models and LangChain's Python library. The aim is a user-friendly RAG application that can ingest data from multiple sources (Word, PDF, TXT, YouTube, Wikipedia).
Domain areas include:
- Document splitting
- Embeddings (OpenAI)
- Vector database (Chroma / FAISS)
- Semantic search types
- Retrieval chain
- Conversational retrieval and memory state
- Embedding cost logging
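The document splitting step can be sketched without LangChain. Below is a minimal, dependency-free fixed-size chunker with overlap; the app itself relies on LangChain's splitters, which split on separators rather than raw character offsets, and the `chunk_size`/`chunk_overlap` values here are purely illustrative:

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that overlap, so content
    spanning a chunk boundary appears in both neighboring chunks."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each chunk's start advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Toy example: 26 characters, 10-char chunks, 2-char overlap.
chunks = split_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
```

Overlap matters for retrieval quality: a sentence cut at a chunk boundary would otherwise be lost to both chunks.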
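Semantic search over a vector database reduces to nearest-neighbor lookup on embedding vectors. A dependency-free sketch of cosine-similarity retrieval follows; in the real app this is delegated to Chroma/FAISS and the vectors come from OpenAI embeddings, so the 3-dimensional toy vectors and document ids below are illustrative only:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k vectors most similar to the query."""
    scored = sorted(index.items(),
                    key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.05, 0.0], index, k=2)
```

Libraries like FAISS implement the same idea with approximate-nearest-neighbor indexes so it scales past brute-force scanning.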
Changelog:
- Added a logger
- Added a callback to record the cost of queries
- Rewrote the main operations in an object-oriented style
- Added resource caching
- Archived old code
- Refactored file handling to use tempfile so LangChain's document loaders can read uploaded files
- Added support for SRT subtitle files
- Added WebBaseLoader and the YouTube loader
- Added an option to use Wikipedia as the retriever instead
- Added brief documentation
- Added a debug mode (exceptions will be raised)
- Refactored the codebase into modules
- Enabled Wikipedia queries with RAG
- Refactored to use a YAML config file
- Added support for TXT and DOCX files
- Added status spinners
- Updated tooltips
- Incorporated different query chain types: restricted query and creative query
- Incorporated temperature settings
- Restructured functions into getter functions
- Included explanations of the frontend and backend workings
- Included examples
- Pinned Chroma to 0.3.29 for Streamlit; this did not work and was reverted
- Switched from Chroma to FAISS as the vector DB due to compatibility issues with Streamlit (SQLite versioning)
- Removed pywin32 from the requirements, since Streamlit cannot install this dependency
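One hypothetical shape for the YAML config file mentioned above, tying together the temperature settings, query chain types, and vector-store choice; every key and value here is illustrative, not the project's actual file:

```yaml
# config.yml (illustrative sketch, not the project's real config)
llm:
  model: gpt-3.5-turbo
  temperature:
    restricted: 0.0   # deterministic, fact-focused answers
    creative: 0.9     # looser, more exploratory answers
vector_store: faiss   # Chroma was dropped (SQLite versioning on Streamlit)
chunking:
  chunk_size: 1000
  chunk_overlap: 100
sources: [docx, pdf, txt, srt, youtube, wikipedia]
debug: false          # when true, exceptions are raised
```

Keeping these knobs in one file means temperature or chunking changes need no code edits.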
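The embedding-cost logging mentioned above can be approximated offline: cost is token count times a per-1K-token rate. A minimal sketch follows; the rate, the tokens-per-word ratio, and the function name are illustrative assumptions (for real usage figures, LangChain's `get_openai_callback` context manager captures token counts and cost from actual API calls):

```python
def estimate_embedding_cost(texts: list[str],
                            price_per_1k_tokens: float = 0.0001,  # assumed rate, not a quoted price
                            tokens_per_word: float = 1.3) -> tuple[int, float]:
    """Rough embedding cost estimate: word count scaled by an assumed
    tokens-per-word ratio, priced per 1,000 tokens."""
    total_words = sum(len(t.split()) for t in texts)
    total_tokens = int(total_words * tokens_per_word)
    cost = total_tokens / 1000 * price_per_1k_tokens
    return total_tokens, cost

# 500 two-word documents -> 1,000 words -> 1,300 estimated tokens
tokens, cost = estimate_embedding_cost(["hello world"] * 500)
```

Logging this estimate before ingestion lets the user decide whether a large document is worth embedding.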