6 changes: 6 additions & 0 deletions .cursor/rules/branch-rule.mdc
@@ -0,0 +1,6 @@
---
alwaysApply: true
---
You always prefer branch-based development. Before writing any code, create a feature branch to hold those changes.

After you are done, provide instructions in a "MERGE.md" file that explain how to merge the changes back to main, covering both a GitHub PR route and a GitHub CLI route.
187 changes: 187 additions & 0 deletions MERGE.md
@@ -0,0 +1,187 @@
# RAG PDF System Feature - Merge Instructions

This document provides instructions for merging the RAG (Retrieval-Augmented Generation) PDF system from the `feature/rag-pdf-system` branch back to the `main` branch.

## Changes Made

### API Changes (`api/`)
- Added `PyPDF2==3.0.1` and `python-dotenv==1.0.0` dependencies to `requirements.txt`
- Enhanced `app.py` with comprehensive RAG functionality (a sketch of the endpoint wiring follows this list):
  - **New imports**: Added aimakerspace library components and file upload handling
  - **PDF Upload Endpoint** (`/api/upload-pdf-rag`):
    - Accepts PDF file uploads
    - Uses `aimakerspace.text_utils.PDFLoader` to extract text
    - Uses `aimakerspace.text_utils.CharacterTextSplitter` to chunk text (1000-character chunks, 200-character overlap)
    - Uses `aimakerspace.vectordatabase.VectorDatabase` to create embeddings
    - Returns the PDF ID and processing status
  - **RAG Chat Endpoint** (`/api/chat-rag`):
    - Accepts a user question and a PDF ID
    - Uses vector similarity search to find relevant chunks
    - Uses `aimakerspace.openai_utils.chatmodel.ChatOpenAI` for AI responses
    - Uses `aimakerspace.openai_utils.prompts` for structured prompting
    - Returns streaming responses with context-aware answers
  - **PDF List Endpoint** (`/api/pdfs`):
    - Lists all uploaded PDFs with metadata
  - **Global storage**: In-memory storage for PDF documents and vector databases
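
The sketch below shows roughly how these endpoints can be wired up. It is illustrative only: it assumes a FastAPI app (the framework used by `app.py` does not appear in this diff), and the handler bodies, store shapes, and status fields are simplified stand-ins.

```python
# Illustrative sketch, not the actual app.py; FastAPI is assumed.
import uuid

from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()

# Global in-memory stores, one entry per uploaded PDF.
pdf_metadata: dict[str, dict] = {}
vector_dbs: dict[str, object] = {}


@app.post("/api/upload-pdf-rag")
async def upload_pdf_rag(file: UploadFile = File(...)):
    if not file.filename or not file.filename.lower().endswith(".pdf"):
        raise HTTPException(status_code=400, detail="Only PDF files are accepted")
    pdf_id = str(uuid.uuid4())
    # Here the real endpoint extracts text, chunks it, and builds the
    # vector database (see the aimakerspace pipeline sketch further below).
    pdf_metadata[pdf_id] = {"filename": file.filename}
    return {"pdf_id": pdf_id, "status": "indexed"}


@app.get("/api/pdfs")
async def list_pdfs():
    # Lists all uploaded PDFs with their metadata.
    return [{"pdf_id": pid, **meta} for pid, meta in pdf_metadata.items()]
```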

### Frontend Changes (`frontend/`)
- **PdfUploadRAG Component** (`frontend/src/app/components/PdfUploadRAG.tsx`):
  - Drag-and-drop PDF upload interface
  - File validation (PDF only, max 10MB)
  - Real-time upload status and error handling
  - PDF selection interface with visual feedback
  - Integration with the parent component for state management
- **RAGChat Component** (`frontend/src/app/components/RAGChat.tsx`):
  - Chat interface specifically for PDF Q&A
  - Streaming responses from the RAG system
  - Clear-chat functionality
  - Empty state when no PDF is selected
- **RAGSystem Component** (`frontend/src/app/components/RAGSystem.tsx`):
  - Main orchestrator component combining upload and chat
  - State management for the selected PDF
  - Clean separation of concerns
- **Updated Main Page** (`frontend/src/app/page.tsx`):
  - Added the RAG system above the regular chat
  - Updated the description to mention PDF Q&A functionality

### Aimakerspace Library Integration
- **Text Processing**: Uses `PDFLoader` and `CharacterTextSplitter` for document processing
- **Vector Database**: Uses `VectorDatabase` with cosine similarity for semantic search
- **Embeddings**: Uses OpenAI's text-embedding-3-small model via `EmbeddingModel`
- **Chat Model**: Uses GPT-4o-mini via `ChatOpenAI` with streaming support
- **Prompting**: Uses structured prompts with `SystemRolePrompt` and `UserRolePrompt` (the sketch below shows how these pieces fit together)
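
Taken together, these components form a standard RAG pipeline: extract, chunk, embed, retrieve, then generate. The sketch below shows one way they plug together. `ChatOpenAI` and the prompt classes appear in this diff, but the `PDFLoader`, `CharacterTextSplitter`, and `VectorDatabase` method names (`load_documents`, `split_texts`, `abuild_from_list`, `search_by_text`) are assumptions, since `text_utils.py` and `vectordatabase.py` are not shown here.

```python
# Illustrative end-to-end sketch; method names on the text and vector
# classes are assumed, as noted above.
import asyncio

from aimakerspace.openai_utils.chatmodel import ChatOpenAI
from aimakerspace.openai_utils.prompts import SystemRolePrompt, UserRolePrompt
from aimakerspace.text_utils import CharacterTextSplitter, PDFLoader
from aimakerspace.vectordatabase import VectorDatabase

system_prompt = SystemRolePrompt("Answer the question using only the provided context.")
user_prompt = UserRolePrompt("Context:\n{context}\n\nQuestion: {question}")


async def answer(pdf_path: str, question: str) -> str:
    # 1. Extract text from the PDF.
    documents = PDFLoader(pdf_path).load_documents()
    # 2. Chunk into 1000-character pieces with a 200-character overlap.
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_texts(documents)
    # 3. Embed every chunk into an in-memory vector database.
    vector_db = await VectorDatabase().abuild_from_list(chunks)
    # 4. Retrieve the top 3 chunks by cosine similarity.
    context = "\n".join(vector_db.search_by_text(question, k=3, return_as_text=True))
    # 5. Generate a grounded answer with GPT-4o-mini.
    messages = [
        system_prompt.create_message(),
        user_prompt.create_message(context=context, question=question),
    ]
    return ChatOpenAI().run(messages)


if __name__ == "__main__":
    # "example.pdf" is a placeholder path.
    print(asyncio.run(answer("example.pdf", "What is the main finding?")))
```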

## Key Features Implemented

### RAG System Capabilities
- ✅ **PDF Text Extraction**: Robust text extraction from PDF files
- ✅ **Intelligent Chunking**: 1000-character chunks with 200-character overlap
- ✅ **Semantic Search**: Vector similarity search using OpenAI embeddings
- ✅ **Context-Aware Responses**: AI responses based on relevant document chunks
- ✅ **Streaming Responses**: Real-time streaming of AI responses
- ✅ **Multiple PDF Support**: Upload and chat with multiple PDFs
- ✅ **Error Handling**: Comprehensive error handling throughout the system

### User Experience
- ✅ **Drag-and-Drop Upload**: Intuitive file upload interface
- ✅ **Real-time Feedback**: Loading states and progress indicators
- ✅ **PDF Selection**: Easy switching between uploaded PDFs
- ✅ **Streaming Chat**: Real-time chat responses
- ✅ **Responsive Design**: Works on all screen sizes
- ✅ **Error Recovery**: Clear error messages and recovery options

## Merge Instructions

### Option 1: GitHub Pull Request (Recommended)

1. **Push the feature branch to GitHub:**
```bash
git push origin feature/rag-pdf-system
```

2. **Create a Pull Request:**
- Go to your GitHub repository
- Click "Compare & pull request" for the `feature/rag-pdf-system` branch
- Set the base branch to `main`
- Add a descriptive title: "Add RAG system with PDF upload and intelligent document Q&A"
- Add description of the changes made
- Review the changes and create the PR

3. **Merge the Pull Request:**
- Review any CI/CD checks
- Click "Merge pull request"
- Delete the feature branch when prompted

4. **Update local main branch:**
```bash
git checkout main
git pull origin main
```

### Option 2: GitHub CLI

1. **Push the feature branch:**
```bash
git push origin feature/rag-pdf-system
```

2. **Create and merge PR using GitHub CLI:**
```bash
# Create the pull request
gh pr create --title "Add RAG system with PDF upload and intelligent document Q&A" --body "This PR adds a comprehensive RAG system that allows users to upload PDFs and chat with them using AI-powered document understanding."

# Merge the pull request
gh pr merge --merge

# Switch back to main and pull changes
git checkout main
git pull origin main
```

## Testing the Feature

After merging, you can test the RAG system:

1. **Start the API server:**
```bash
cd api
pip install -r requirements.txt
python app.py
```

2. **Start the frontend:**
```bash
cd frontend
npm install
npm run dev
```

3. **Test the RAG system:**
- Upload a PDF file using the drag-and-drop interface
- Wait for indexing to complete
- Select the uploaded PDF
- Ask questions about the PDF content
- Verify that responses are contextually relevant
- Test with multiple PDFs

## Environment Variables Required

Ensure these environment variables are set (a quick startup check is sketched after this list):
- `OPENAI_API_KEY`: Your OpenAI API key for embeddings and chat
- `NEXT_PUBLIC_OPENAI_API_KEY`: Frontend access to OpenAI API key
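
A small startup check along these lines fails fast when a key is missing (an illustrative snippet, not code from this PR):

```python
import os

# Illustrative: verify the required keys before starting the servers.
required = ["OPENAI_API_KEY", "NEXT_PUBLIC_OPENAI_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```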

## Deployment Notes

- The API endpoints will be automatically deployed to Vercel when merged to main
- The frontend changes will also be deployed automatically
- Ensure the `PyPDF2` and `python-dotenv` dependencies are properly installed
- The aimakerspace library is included in the repository and will be deployed

## Performance Considerations

- **Memory Usage**: PDF documents and vector databases are stored in memory, so uploads do not persist across server restarts
- **File Size Limits**: 10MB maximum PDF size
- **Chunking Strategy**: 1000-character chunks with 200-character overlap (see the sketch after this list)
- **Search Results**: Top 3 most relevant chunks used for context
- **Streaming**: Real-time response streaming for better UX
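
On the chunking arithmetic: with a 1000-character chunk size and a 200-character overlap, each chunk starts 800 characters after the previous one, so a 10,000-character document yields 13 chunks rather than 10. A minimal illustrative splitter (not the library's actual implementation):

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    # Each chunk starts (chunk_size - chunk_overlap) characters after the
    # previous one; the final chunk may be shorter than chunk_size.
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]
```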

## Cleanup

After successful merge:
```bash
# Delete the local feature branch
git branch -d feature/rag-pdf-system

# Delete the remote feature branch (if using GitHub PR method)
git push origin --delete feature/rag-pdf-system
```

## Future Enhancements

Potential improvements for future iterations:
- Persistent storage for PDF documents and vector databases
- Support for more document types (DOCX, TXT, etc.)
- Advanced chunking strategies (semantic chunking)
- User authentication and document ownership
- Conversation history persistence
- Advanced search filters and metadata
Empty file added aimakerspace/__init__.py
Empty file.
Empty file added aimakerspace/openai_utils/__init__.py
Empty file.
45 changes: 45 additions & 0 deletions aimakerspace/openai_utils/chatmodel.py
@@ -0,0 +1,45 @@
from openai import OpenAI, AsyncOpenAI
from dotenv import load_dotenv
import os

load_dotenv()


class ChatOpenAI:
    """Thin wrapper around the OpenAI chat completions API."""

    def __init__(self, model_name: str = "gpt-4o-mini"):
        self.model_name = model_name
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        if self.openai_api_key is None:
            raise ValueError("OPENAI_API_KEY is not set")

    def run(self, messages, text_only: bool = True, **kwargs):
        """Send a list of chat messages; return the reply text, or the
        full response object when text_only is False."""
        if not isinstance(messages, list):
            raise ValueError("messages must be a list")

        client = OpenAI()
        response = client.chat.completions.create(
            model=self.model_name, messages=messages, **kwargs
        )

        if text_only:
            return response.choices[0].message.content

        return response

    async def astream(self, messages, **kwargs):
        """Yield reply text incrementally as chunks stream from the API."""
        if not isinstance(messages, list):
            raise ValueError("messages must be a list")

        client = AsyncOpenAI()

        stream = await client.chat.completions.create(
            model=self.model_name,
            messages=messages,
            stream=True,
            **kwargs
        )

        async for chunk in stream:
            content = chunk.choices[0].delta.content
            if content is not None:
                yield content
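

if __name__ == "__main__":
    # Minimal usage demo, in the spirit of the __main__ blocks in
    # embedding.py and prompts.py; assumes OPENAI_API_KEY is set.
    chat_model = ChatOpenAI()
    print(chat_model.run([{"role": "user", "content": "Say hello in one word."}]))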
59 changes: 59 additions & 0 deletions aimakerspace/openai_utils/embedding.py
@@ -0,0 +1,59 @@
from dotenv import load_dotenv
from openai import AsyncOpenAI, OpenAI
import openai
from typing import List
import os
import asyncio


class EmbeddingModel:
    """Wrapper around the OpenAI embeddings API (sync and async)."""

    def __init__(self, embeddings_model_name: str = "text-embedding-3-small"):
        load_dotenv()
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        # Check the key before constructing the clients, which would
        # otherwise raise a less descriptive error of their own.
        if self.openai_api_key is None:
            raise ValueError(
                "OPENAI_API_KEY environment variable is not set. Please set it to your OpenAI API key."
            )
        openai.api_key = self.openai_api_key
        self.async_client = AsyncOpenAI()
        self.client = OpenAI()
        self.embeddings_model_name = embeddings_model_name

    async def async_get_embeddings(self, list_of_text: List[str]) -> List[List[float]]:
        """Embed a batch of texts asynchronously."""
        embedding_response = await self.async_client.embeddings.create(
            input=list_of_text, model=self.embeddings_model_name
        )

        return [embeddings.embedding for embeddings in embedding_response.data]

    async def async_get_embedding(self, text: str) -> List[float]:
        """Embed a single text asynchronously."""
        embedding = await self.async_client.embeddings.create(
            input=text, model=self.embeddings_model_name
        )

        return embedding.data[0].embedding

    def get_embeddings(self, list_of_text: List[str]) -> List[List[float]]:
        """Embed a batch of texts synchronously."""
        embedding_response = self.client.embeddings.create(
            input=list_of_text, model=self.embeddings_model_name
        )

        return [embeddings.embedding for embeddings in embedding_response.data]

    def get_embedding(self, text: str) -> List[float]:
        """Embed a single text synchronously."""
        embedding = self.client.embeddings.create(
            input=text, model=self.embeddings_model_name
        )

        return embedding.data[0].embedding


if __name__ == "__main__":
    embedding_model = EmbeddingModel()
    print(asyncio.run(embedding_model.async_get_embedding("Hello, world!")))
    print(
        asyncio.run(
            embedding_model.async_get_embeddings(["Hello, world!", "Goodbye, world!"])
        )
    )
78 changes: 78 additions & 0 deletions aimakerspace/openai_utils/prompts.py
@@ -0,0 +1,78 @@
import re


class BasePrompt:
    def __init__(self, prompt):
        """
        Initializes the BasePrompt object with a prompt template.

        :param prompt: A string that can contain placeholders within curly braces
        """
        self.prompt = prompt
        self._pattern = re.compile(r"\{([^}]+)\}")

    def format_prompt(self, **kwargs):
        """
        Formats the prompt string using the keyword arguments provided.

        Placeholders with no matching keyword argument are replaced with an
        empty string.

        :param kwargs: The values to substitute into the prompt string
        :return: The formatted prompt string
        """
        matches = self._pattern.findall(self.prompt)
        return self.prompt.format(**{match: kwargs.get(match, "") for match in matches})

    def get_input_variables(self):
        """
        Gets the list of input variable names from the prompt string.

        :return: List of input variable names
        """
        return self._pattern.findall(self.prompt)


class RolePrompt(BasePrompt):
    def __init__(self, prompt, role: str):
        """
        Initializes the RolePrompt object with a prompt template and a role.

        :param prompt: A string that can contain placeholders within curly braces
        :param role: The role for the message ('system', 'user', or 'assistant')
        """
        super().__init__(prompt)
        self.role = role

    def create_message(self, format=True, **kwargs):
        """
        Creates a message dictionary with a role and a formatted message.

        :param format: If True, substitute kwargs into the prompt; if False,
            return the raw template unchanged
        :param kwargs: The values to substitute into the prompt string
        :return: Dictionary containing the role and the formatted message
        """
        if format:
            return {"role": self.role, "content": self.format_prompt(**kwargs)}

        return {"role": self.role, "content": self.prompt}


class SystemRolePrompt(RolePrompt):
    def __init__(self, prompt: str):
        super().__init__(prompt, "system")


class UserRolePrompt(RolePrompt):
    def __init__(self, prompt: str):
        super().__init__(prompt, "user")


class AssistantRolePrompt(RolePrompt):
    def __init__(self, prompt: str):
        super().__init__(prompt, "assistant")


if __name__ == "__main__":
    prompt = BasePrompt("Hello {name}, you are {age} years old")
    print(prompt.format_prompt(name="John", age=30))

    prompt = SystemRolePrompt("Hello {name}, you are {age} years old")
    print(prompt.create_message(name="John", age=30))
    print(prompt.get_input_variables())