6 changes: 6 additions & 0 deletions .cursor/rules/branch-rule.mdc
@@ -0,0 +1,6 @@
---
alwaysApply: true
---
You always prefer branch-based development. Before writing any code, create a feature branch to hold those changes.

After you are done, provide instructions in a "MERGE.md" file that explain how to merge the changes back to main, covering both a GitHub PR route and a GitHub CLI route.
187 changes: 187 additions & 0 deletions MERGE.md
@@ -0,0 +1,187 @@
# RAG PDF System Feature - Merge Instructions

This document provides instructions for merging the RAG (Retrieval-Augmented Generation) PDF system from the `feature/rag-pdf-system` branch back to the `main` branch.

## Changes Made

### API Changes (`api/`)
- Added `PyPDF2==3.0.1` and `python-dotenv==1.0.0` dependencies to `requirements.txt`
- Enhanced `app.py` with comprehensive RAG functionality (a sketch of the endpoint wiring follows this list):
  - **New imports**: Added aimakerspace library components and file upload handling
  - **PDF Upload Endpoint** (`/api/upload-pdf-rag`):
    - Accepts PDF file uploads
    - Uses `aimakerspace.text_utils.PDFLoader` to extract text
    - Uses `aimakerspace.text_utils.CharacterTextSplitter` to chunk text (1000-character chunks, 200-character overlap)
    - Uses `aimakerspace.vectordatabase.VectorDatabase` to create embeddings
    - Returns the PDF ID and processing status
  - **RAG Chat Endpoint** (`/api/chat-rag`):
    - Accepts a user question and a PDF ID
    - Uses vector similarity search to find relevant chunks
    - Uses `aimakerspace.openai_utils.chatmodel.ChatOpenAI` for AI responses
    - Uses `aimakerspace.openai_utils.prompts` for structured prompting
    - Returns streaming responses with context-aware answers
  - **PDF List Endpoint** (`/api/pdfs`):
    - Lists all uploaded PDFs with metadata
  - **Global storage**: In-memory storage for PDF documents and vector databases
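
The sketch below shows roughly how these endpoints can be wired up. It is illustrative only: it assumes a FastAPI app (the framework used by `app.py` does not appear in this diff), and the handler bodies, store shapes, and status fields are simplified stand-ins.

```python
# Illustrative sketch, not the actual app.py; FastAPI is assumed.
import uuid

from fastapi import FastAPI, File, HTTPException, UploadFile

app = FastAPI()

# Global in-memory stores, one entry per uploaded PDF.
pdf_metadata: dict[str, dict] = {}
vector_dbs: dict[str, object] = {}


@app.post("/api/upload-pdf-rag")
async def upload_pdf_rag(file: UploadFile = File(...)):
    if not file.filename or not file.filename.lower().endswith(".pdf"):
        raise HTTPException(status_code=400, detail="Only PDF files are accepted")
    pdf_id = str(uuid.uuid4())
    # Here the real endpoint extracts text, chunks it, and builds the
    # vector database (see the aimakerspace pipeline sketch further below).
    pdf_metadata[pdf_id] = {"filename": file.filename}
    return {"pdf_id": pdf_id, "status": "indexed"}


@app.get("/api/pdfs")
async def list_pdfs():
    # Lists all uploaded PDFs with their metadata.
    return [{"pdf_id": pid, **meta} for pid, meta in pdf_metadata.items()]
```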

### Frontend Changes (`frontend/`)
- **PdfUploadRAG Component** (`frontend/src/app/components/PdfUploadRAG.tsx`):
  - Drag-and-drop PDF upload interface
  - File validation (PDF only, max 10MB)
  - Real-time upload status and error handling
  - PDF selection interface with visual feedback
  - Integration with the parent component for state management
- **RAGChat Component** (`frontend/src/app/components/RAGChat.tsx`):
  - Chat interface specifically for PDF Q&A
  - Streaming responses from the RAG system
  - Clear-chat functionality
  - Empty state when no PDF is selected
- **RAGSystem Component** (`frontend/src/app/components/RAGSystem.tsx`):
  - Main orchestrator component combining upload and chat
  - State management for the selected PDF
  - Clean separation of concerns
- **Updated Main Page** (`frontend/src/app/page.tsx`):
  - Added the RAG system above the regular chat
  - Updated the description to mention PDF Q&A functionality

### Aimakerspace Library Integration
- **Text Processing**: Uses `PDFLoader` and `CharacterTextSplitter` for document processing
- **Vector Database**: Uses `VectorDatabase` with cosine similarity for semantic search
- **Embeddings**: Uses OpenAI's text-embedding-3-small model via `EmbeddingModel`
- **Chat Model**: Uses GPT-4o-mini via `ChatOpenAI` with streaming support
- **Prompting**: Uses structured prompts with `SystemRolePrompt` and `UserRolePrompt` (the sketch below shows how these pieces fit together)
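
Taken together, these components form a standard RAG pipeline: extract, chunk, embed, retrieve, then generate. The sketch below shows one way they plug together. `ChatOpenAI` and the prompt classes appear in this diff, but the `PDFLoader`, `CharacterTextSplitter`, and `VectorDatabase` method names (`load_documents`, `split_texts`, `abuild_from_list`, `search_by_text`) are assumptions, since `text_utils.py` and `vectordatabase.py` are not shown here.

```python
# Illustrative end-to-end sketch; method names on the text and vector
# classes are assumed, as noted above.
import asyncio

from aimakerspace.openai_utils.chatmodel import ChatOpenAI
from aimakerspace.openai_utils.prompts import SystemRolePrompt, UserRolePrompt
from aimakerspace.text_utils import CharacterTextSplitter, PDFLoader
from aimakerspace.vectordatabase import VectorDatabase

system_prompt = SystemRolePrompt("Answer the question using only the provided context.")
user_prompt = UserRolePrompt("Context:\n{context}\n\nQuestion: {question}")


async def answer(pdf_path: str, question: str) -> str:
    # 1. Extract text from the PDF.
    documents = PDFLoader(pdf_path).load_documents()
    # 2. Chunk into 1000-character pieces with a 200-character overlap.
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_texts(documents)
    # 3. Embed every chunk into an in-memory vector database.
    vector_db = await VectorDatabase().abuild_from_list(chunks)
    # 4. Retrieve the top 3 chunks by cosine similarity.
    context = "\n".join(vector_db.search_by_text(question, k=3, return_as_text=True))
    # 5. Generate a grounded answer with GPT-4o-mini.
    messages = [
        system_prompt.create_message(),
        user_prompt.create_message(context=context, question=question),
    ]
    return ChatOpenAI().run(messages)


if __name__ == "__main__":
    # "example.pdf" is a placeholder path.
    print(asyncio.run(answer("example.pdf", "What is the main finding?")))
```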

## Key Features Implemented

### RAG System Capabilities
- ✅ **PDF Text Extraction**: Robust text extraction from PDF files
- ✅ **Intelligent Chunking**: 1000-character chunks with 200-character overlap
- ✅ **Semantic Search**: Vector similarity search using OpenAI embeddings
- ✅ **Context-Aware Responses**: AI responses based on relevant document chunks
- ✅ **Streaming Responses**: Real-time streaming of AI responses
- ✅ **Multiple PDF Support**: Upload and chat with multiple PDFs
- ✅ **Error Handling**: Comprehensive error handling throughout the system

### User Experience
- ✅ **Drag-and-Drop Upload**: Intuitive file upload interface
- ✅ **Real-time Feedback**: Loading states and progress indicators
- ✅ **PDF Selection**: Easy switching between uploaded PDFs
- ✅ **Streaming Chat**: Real-time chat responses
- ✅ **Responsive Design**: Works on all screen sizes
- ✅ **Error Recovery**: Clear error messages and recovery options

## Merge Instructions

### Option 1: GitHub Pull Request (Recommended)

1. **Push the feature branch to GitHub:**
```bash
git push origin feature/rag-pdf-system
```

2. **Create a Pull Request:**
- Go to your GitHub repository
- Click "Compare & pull request" for the `feature/rag-pdf-system` branch
- Set the base branch to `main`
- Add a descriptive title: "Add RAG system with PDF upload and intelligent document Q&A"
- Add description of the changes made
- Review the changes and create the PR

3. **Merge the Pull Request:**
- Review any CI/CD checks
- Click "Merge pull request"
- Delete the feature branch when prompted

4. **Update local main branch:**
```bash
git checkout main
git pull origin main
```

### Option 2: GitHub CLI

1. **Push the feature branch:**
```bash
git push origin feature/rag-pdf-system
```

2. **Create and merge PR using GitHub CLI:**
```bash
# Create the pull request
gh pr create --title "Add RAG system with PDF upload and intelligent document Q&A" --body "This PR adds a comprehensive RAG system that allows users to upload PDFs and chat with them using AI-powered document understanding."

# Merge the pull request
gh pr merge --merge

# Switch back to main and pull changes
git checkout main
git pull origin main
```

## Testing the Feature

After merging, you can test the RAG system:

1. **Start the API server:**
```bash
cd api
pip install -r requirements.txt
python app.py
```

2. **Start the frontend:**
```bash
cd frontend
npm install
npm run dev
```

3. **Test the RAG system:**
- Upload a PDF file using the drag-and-drop interface
- Wait for indexing to complete
- Select the uploaded PDF
- Ask questions about the PDF content
- Verify that responses are contextually relevant
- Test with multiple PDFs

## Environment Variables Required

Ensure these environment variables are set (a quick startup check is sketched after this list):
- `OPENAI_API_KEY`: Your OpenAI API key for embeddings and chat
- `NEXT_PUBLIC_OPENAI_API_KEY`: Frontend access to OpenAI API key
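
A small startup check along these lines fails fast when a key is missing (an illustrative snippet, not code from this PR):

```python
import os

# Illustrative: verify the required keys before starting the servers.
required = ["OPENAI_API_KEY", "NEXT_PUBLIC_OPENAI_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```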

## Deployment Notes

- The API endpoints will be automatically deployed to Vercel when merged to main
- The frontend changes will also be deployed automatically
- Ensure the `PyPDF2` and `python-dotenv` dependencies are properly installed
- The aimakerspace library is included in the repository and will be deployed

## Performance Considerations

- **Memory Usage**: PDF documents and vector databases are stored in memory, so uploads do not persist across server restarts
- **File Size Limits**: 10MB maximum PDF size
- **Chunking Strategy**: 1000-character chunks with 200-character overlap (see the sketch after this list)
- **Search Results**: Top 3 most relevant chunks used for context
- **Streaming**: Real-time response streaming for better UX
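
On the chunking arithmetic: with a 1000-character chunk size and a 200-character overlap, each chunk starts 800 characters after the previous one, so a 10,000-character document yields 13 chunks rather than 10. A minimal illustrative splitter (not the library's actual implementation):

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    # Each chunk starts (chunk_size - chunk_overlap) characters after the
    # previous one; the final chunk may be shorter than chunk_size.
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]
```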

## Cleanup

After successful merge:
```bash
# Delete the local feature branch
git branch -d feature/rag-pdf-system

# Delete the remote feature branch (if using GitHub PR method)
git push origin --delete feature/rag-pdf-system
```

## Future Enhancements

Potential improvements for future iterations:
- Persistent storage for PDF documents and vector databases
- Support for more document types (DOCX, TXT, etc.)
- Advanced chunking strategies (semantic chunking)
- User authentication and document ownership
- Conversation history persistence
- Advanced search filters and metadata
Empty file added aimakerspace/__init__.py
Empty file.
Empty file added aimakerspace/openai_utils/__init__.py
Empty file.
45 changes: 45 additions & 0 deletions aimakerspace/openai_utils/chatmodel.py
@@ -0,0 +1,45 @@
from openai import OpenAI, AsyncOpenAI
from dotenv import load_dotenv
import os

load_dotenv()


class ChatOpenAI:
    """Thin wrapper around the OpenAI chat completions API."""

    def __init__(self, model_name: str = "gpt-4o-mini"):
        self.model_name = model_name
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        if self.openai_api_key is None:
            raise ValueError("OPENAI_API_KEY is not set")

    def run(self, messages, text_only: bool = True, **kwargs):
        """Send a list of chat messages; return the reply text, or the
        full response object when text_only is False."""
        if not isinstance(messages, list):
            raise ValueError("messages must be a list")

        client = OpenAI()
        response = client.chat.completions.create(
            model=self.model_name, messages=messages, **kwargs
        )

        if text_only:
            return response.choices[0].message.content

        return response

    async def astream(self, messages, **kwargs):
        """Yield reply text incrementally as chunks stream from the API."""
        if not isinstance(messages, list):
            raise ValueError("messages must be a list")

        client = AsyncOpenAI()

        stream = await client.chat.completions.create(
            model=self.model_name,
            messages=messages,
            stream=True,
            **kwargs
        )

        async for chunk in stream:
            content = chunk.choices[0].delta.content
            if content is not None:
                yield content
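

if __name__ == "__main__":
    # Minimal usage demo, in the spirit of the __main__ blocks in
    # embedding.py and prompts.py; assumes OPENAI_API_KEY is set.
    chat_model = ChatOpenAI()
    print(chat_model.run([{"role": "user", "content": "Say hello in one word."}]))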
59 changes: 59 additions & 0 deletions aimakerspace/openai_utils/embedding.py
@@ -0,0 +1,59 @@
from dotenv import load_dotenv
from openai import AsyncOpenAI, OpenAI
import openai
from typing import List
import os
import asyncio


class EmbeddingModel:
    """Wrapper around the OpenAI embeddings API (sync and async)."""

    def __init__(self, embeddings_model_name: str = "text-embedding-3-small"):
        load_dotenv()
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        # Check the key before constructing the clients, which would
        # otherwise raise a less descriptive error of their own.
        if self.openai_api_key is None:
            raise ValueError(
                "OPENAI_API_KEY environment variable is not set. Please set it to your OpenAI API key."
            )
        openai.api_key = self.openai_api_key
        self.async_client = AsyncOpenAI()
        self.client = OpenAI()
        self.embeddings_model_name = embeddings_model_name

    async def async_get_embeddings(self, list_of_text: List[str]) -> List[List[float]]:
        """Embed a batch of texts asynchronously."""
        embedding_response = await self.async_client.embeddings.create(
            input=list_of_text, model=self.embeddings_model_name
        )

        return [embeddings.embedding for embeddings in embedding_response.data]

    async def async_get_embedding(self, text: str) -> List[float]:
        """Embed a single text asynchronously."""
        embedding = await self.async_client.embeddings.create(
            input=text, model=self.embeddings_model_name
        )

        return embedding.data[0].embedding

    def get_embeddings(self, list_of_text: List[str]) -> List[List[float]]:
        """Embed a batch of texts synchronously."""
        embedding_response = self.client.embeddings.create(
            input=list_of_text, model=self.embeddings_model_name
        )

        return [embeddings.embedding for embeddings in embedding_response.data]

    def get_embedding(self, text: str) -> List[float]:
        """Embed a single text synchronously."""
        embedding = self.client.embeddings.create(
            input=text, model=self.embeddings_model_name
        )

        return embedding.data[0].embedding


if __name__ == "__main__":
    embedding_model = EmbeddingModel()
    print(asyncio.run(embedding_model.async_get_embedding("Hello, world!")))
    print(
        asyncio.run(
            embedding_model.async_get_embeddings(["Hello, world!", "Goodbye, world!"])
        )
    )
78 changes: 78 additions & 0 deletions aimakerspace/openai_utils/prompts.py
@@ -0,0 +1,78 @@
import re


class BasePrompt:
    def __init__(self, prompt):
        """
        Initializes the BasePrompt object with a prompt template.

        :param prompt: A string that can contain placeholders within curly braces
        """
        self.prompt = prompt
        self._pattern = re.compile(r"\{([^}]+)\}")

    def format_prompt(self, **kwargs):
        """
        Formats the prompt string using the keyword arguments provided.

        Placeholders with no matching keyword argument are replaced with an
        empty string.

        :param kwargs: The values to substitute into the prompt string
        :return: The formatted prompt string
        """
        matches = self._pattern.findall(self.prompt)
        return self.prompt.format(**{match: kwargs.get(match, "") for match in matches})

    def get_input_variables(self):
        """
        Gets the list of input variable names from the prompt string.

        :return: List of input variable names
        """
        return self._pattern.findall(self.prompt)


class RolePrompt(BasePrompt):
    def __init__(self, prompt, role: str):
        """
        Initializes the RolePrompt object with a prompt template and a role.

        :param prompt: A string that can contain placeholders within curly braces
        :param role: The role for the message ('system', 'user', or 'assistant')
        """
        super().__init__(prompt)
        self.role = role

    def create_message(self, format=True, **kwargs):
        """
        Creates a message dictionary with a role and a formatted message.

        :param format: If True, substitute kwargs into the prompt; if False,
            return the raw template unchanged
        :param kwargs: The values to substitute into the prompt string
        :return: Dictionary containing the role and the formatted message
        """
        if format:
            return {"role": self.role, "content": self.format_prompt(**kwargs)}

        return {"role": self.role, "content": self.prompt}


class SystemRolePrompt(RolePrompt):
    def __init__(self, prompt: str):
        super().__init__(prompt, "system")


class UserRolePrompt(RolePrompt):
    def __init__(self, prompt: str):
        super().__init__(prompt, "user")


class AssistantRolePrompt(RolePrompt):
    def __init__(self, prompt: str):
        super().__init__(prompt, "assistant")


if __name__ == "__main__":
    prompt = BasePrompt("Hello {name}, you are {age} years old")
    print(prompt.format_prompt(name="John", age=30))

    prompt = SystemRolePrompt("Hello {name}, you are {age} years old")
    print(prompt.create_message(name="John", age=30))
    print(prompt.get_input_variables())