Enhancing Educational Question Answering Systems Using Retrieval-Augmented Generation (RAG) of Subject-Related Materials
This project aims to enhance educational question answering systems by integrating Retrieval-Augmented Generation (RAG) techniques to provide more accurate and contextually relevant responses. The system uses a Python backend to process questions and generate answers, and a frontend for user interaction.
Traditional question answering systems in education often struggle with providing contextually relevant answers. This project seeks to improve these systems by using Retrieval-Augmented Generation (RAG), which combines the retrieval of relevant documents with generative models to produce better answers.
- RAG Integration: Combines document retrieval with generation for improved answer quality.
- Python Backend: Built with a focus on scalability and efficiency.
- Dependency Management: Uses Poetry for managing dependencies.
- Frontend: A user-friendly interface for students and educators (see ed_rag_fe; design still in progress).
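The retrieve-then-generate idea behind RAG can be illustrated with a minimal, self-contained sketch. This is not the code in rag.py: the bag-of-words scoring, the function names (tokenize, cosine, retrieve, build_prompt), and the sample chunks are all assumptions chosen for illustration. A real pipeline would typically use embeddings and a vector index, and would send the assembled prompt to a generative model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Retrieval step: return the k chunks most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Augmentation step: assemble the prompt a generative model would receive."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical course-material chunks, for illustration only.
chunks = [
    "Dynamic programming solves problems by combining solutions to overlapping subproblems.",
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Memoization stores the results of subproblems so they are computed only once.",
]
question = "How does dynamic programming reuse subproblems?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

The generation step is deliberately left out: the point of the sketch is that the model only sees the chunks the retriever selected, which is what keeps answers tied to the subject material.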
.
├── README.md
├── ed_rag
│   ├── README.md
│   ├── app.py
│   ├── celery_worker.py
│   ├── data
│   │   └── pdf files
│   ├── ed_rag
│   │   ├── __init__.py
│   │   ├── rag.py
│   │   ├── test.csv
│   │   ├── test.py
│   │   └── trainer.py
│   ├── poetry.lock
│   ├── pyproject.toml
│   ├── tests
│   │   ├── __init__.py
│   │   └── test.py
│   └── web
│       ├── __init__.py
│       ├── apis
│       │   ├── __init__.py
│       │   ├── chats.py
│       │   ├── db
│       │   │   ├── __init__.py
│       │   │   └── session.py
│       │   ├── healthcheck.py
│       │   ├── history.py
│       │   ├── tasks.py
│       │   └── upload.py
│       ├── core.py
│       └── models
│           ├── Chat.py
│           └── __init__.py
├── ed_rag_fe
│   ├── README.md
│   ├── package-lock.json
│   ├── package.json
│   ├── public
│   │   ├── favicon.ico
│   │   ├── index.html
│   │   ├── logo192.png
│   │   ├── logo512.png
│   │   ├── manifest.json
│   │   └── robots.txt
│   └── src
│       ├── App.css
│       ├── App.js
│       ├── App.test.js
│       ├── Components
│       │   ├── ChatBoxComponent
│       │   │   ├── ChatBoxComponent.css
│       │   │   └── ChatBoxComponent.js
│       │   ├── ChatInterfaceComponent
│       │   │   ├── ChatInterfaceComponent.css
│       │   │   └── ChatInterfaceComponent.js
│       │   ├── Dock
│       │   │   └── DockComponent.js
│       │   ├── MainComponent
│       │   │   ├── MainComponent.css
│       │   │   └── MainComponent.js
│       │   └── SidebarComponent
│       │       ├── SidebarComponent.css
│       │       └── SidebarComponent.js
│       ├── app
│       │   ├── api.js
│       │   ├── features
│       │   │   ├── chatSlice.js
│       │   │   └── historySlice.js
│       │   └── store.js
│       ├── index.css
│       ├── index.js
│       ├── logo.svg
│       ├── reportWebVitals.js
│       ├── services
│       │   └── MessageService.js
│       └── setupTests.js
To set up the project, ensure you have Poetry, Redis, and npm installed. Then, follow these steps:
- Clone the repository:
git clone https://github.com/Durgnan/educational_rag_msc_project.git
cd educational_rag_msc_project
- Navigate to the Backend Directory:
cd ed_rag
- Install dependencies:
poetry install
- Start Redis server (keep it running in its own terminal):
redis-server
- Start Celery Worker (in its own terminal, inside the Poetry environment so the project dependencies are available):
celery -A web.apis.tasks worker --loglevel=info --pool=solo
- Activate the virtual environment:
poetry shell
- Start the Flask server:
flask run --port 8080
- RAG Initial Training:
Initially, the RAG needs to be trained on the resources you want to add. You can add them either from the Frontend or by copying files directly into the ed_rag/data/ folder. Files must currently be in PDF format. Once the files are in the data folder, run the trainer pipeline by passing the source path to the Trainer object in the main function. The Trainer object accepts either a folder or specific files.
# Please disregard the db param
rag = Trainer(r"../data/*", db="faiss_db_900")  # For all files inside the folder.
rag = Trainer(r"../data/dp-II-annotated.pdf", db="faiss_db_900")  # For a specific file inside the folder.
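To make the "folder or specific files" behaviour concrete, the sketch below shows one way a source string could be expanded into a list of PDF paths before indexing. The helper name resolve_pdf_sources is hypothetical and is not part of the Trainer class; how Trainer actually resolves its src argument is not documented here, so treat this only as an illustration of the idea.

```python
import glob
import os


def resolve_pdf_sources(src):
    """Expand a folder, glob pattern, or single file path into a sorted list of PDF paths.

    Hypothetical helper for illustration; not taken from trainer.py.
    """
    # A bare folder is treated like "folder/*".
    if os.path.isdir(src):
        src = os.path.join(src, "*")
    # glob.glob returns the path itself when src is a plain existing file.
    candidates = glob.glob(src)
    # Only PDF files are supported for now, so everything else is filtered out.
    return sorted(p for p in candidates if os.path.isfile(p) and p.lower().endswith(".pdf"))


if __name__ == "__main__":
    # Mirrors the two calls above: a whole folder, then one specific file.
    print(resolve_pdf_sources("../data/*"))
    print(resolve_pdf_sources("../data/dp-II-annotated.pdf"))
```

Filtering by extension up front means non-PDF files left in the data folder are skipped rather than passed to the pipeline.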
- Navigate to the Frontend Directory (from the repository root):
cd ed_rag_fe
- Set up the .env file for the Backend Server: Ensure the .env file in the frontend directory contains the correct backend server URL. Example .env:
REACT_APP_BASE_URL=http://127.0.0.1:8080
- Install dependencies:
npm install
- Start the Frontend:
npm start
The Frontend will be live on the first available port starting from 3000, for example http://localhost:3000
To use the system, first start both the backend and frontend services following the steps above. Then, interact with the system via the frontend interface to ask educational questions and receive contextually enhanced answers.
Future work on this project will focus on additional RAG-related features:
- Hybrid search
- Reciprocal Rank Fusion
- Chunking best practices
- Support for different file types
- Agentic RAG and evaluation
- Speech-to-text (STT) and text-to-speech (TTS) implementation
- Multimodal RAG
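As an illustration of how hybrid search and Reciprocal Rank Fusion from the list above could fit together, here is a minimal sketch that merges two ranked result lists. Nothing here exists in the project yet: the function name, the document ids, and the two input rankings are assumptions, and k=60 is only a commonly used default for the smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids.

    Each document scores the sum of 1 / (k + rank) over every list it appears in,
    with ranks starting at 1. Ties are broken by document id for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword retriever and a vector retriever.
keyword_hits = ["doc_b", "doc_a", "doc_c"]
vector_hits = ["doc_a", "doc_d", "doc_b"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because the fusion uses only rank positions, the two retrievers do not need comparable score scales, which is the main reason it suits combining keyword and vector search.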
For any inquiries, please contact the project maintainer at [email protected].