🌟 Multimodal Document Processing RAG with LangChain 🌟

This project is a Streamlit application that extracts text from multimodal documents, stores the resulting embeddings in a Milvus database, and answers natural language queries over them. It uses LangChain, Hugging Face Transformers, EasyOCR, and related libraries to process, store, and query text extracted from various file types. 🚀

✨ Features

🗂️ Upload File Processing:

Supports multiple file types: audio, video, image, text, csv, yaml, json, docx, and pdf.
Extracts text content using:
- 🔊 Audio: speech_recognition and pydub.
- 🎥 Video: Custom extraction logic.
- 🖼️ Image: EasyOCR.
- 📄 Text/Logs/Documents: LangChain loaders.
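
The routing described above, where each file type goes to its own extractor, could be sketched as a lookup from file extension to processing category. This is a minimal illustration, not the project's actual code: `EXTENSION_CATEGORIES`, `detect_category`, and the specific extensions listed are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical mapping from processing category to file extensions.
# The categories mirror the list above; the extension sets are assumptions.
EXTENSION_CATEGORIES = {
    "audio": {".wav", ".mp3"},
    "video": {".mp4"},
    "image": {".png", ".jpg", ".jpeg"},
    "document": {".txt", ".csv", ".yaml", ".yml", ".json", ".docx", ".pdf"},
}


def detect_category(filename: str) -> str:
    """Return the processing category for a file name, based on its extension."""
    suffix = Path(filename).suffix.lower()
    for category, extensions in EXTENSION_CATEGORIES.items():
        if suffix in extensions:
            return category
    raise ValueError(f"Unsupported file type: {suffix or filename!r}")
```

An upload handler could call `detect_category(uploaded_name)` first and then hand the file to the matching extractor (audio, video, image, or document loader).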

🛠️ Milvus Integration:

🗃️ Stores processed document embeddings for similarity-based querying.
🧠 Utilizes HuggingFaceEmbeddings for generating vector representations.
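
To make similarity-based querying concrete, here is a small pure-Python stand-in for a vector store. It is not Milvus and does not use HuggingFaceEmbeddings; `InMemoryVectorStore` and the hand-written vectors are hypothetical, and cosine similarity is used only as one common choice of metric (the metric this project configures in Milvus is not stated here).

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Toy stand-in for a vector database: keeps (text, vector) pairs in a list."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: Sequence[float]) -> None:
        self._items.append((text, list(vector)))

    def search(self, query_vector: Sequence[float], k: int = 3) -> List[Tuple[float, str]]:
        """Return up to k (score, text) pairs, most similar first."""
        scored = [(cosine_similarity(query_vector, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

In the real application the vectors would come from an embedding model and the search would be delegated to Milvus; the sketch only shows the ranking idea behind "similarity-based querying".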

🔍 Query Interface:

Natural language query interface.
Implements a Retrieval-Augmented Generation (RAG) pipeline for AI-driven responses.
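
A RAG pipeline generally has two steps: retrieve relevant passages, then ask a language model to answer using them. The sketch below shows that shape with plain callables. `build_prompt`, `answer`, and the `retrieve` and `generate` parameters are hypothetical names for illustration; the project's actual pipeline is built with LangChain and may differ in prompt wording and structure.

```python
from typing import Callable, List


def build_prompt(question: str, contexts: List[str]) -> str:
    """Combine retrieved passages and the user question into a single prompt."""
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],  # e.g. a vector-store search
    generate: Callable[[str], str],             # e.g. a call to a language model
    k: int = 3,
) -> str:
    """Retrieve up to k passages, then generate an answer grounded in them."""
    contexts = retrieve(question, k)
    if not contexts:
        return "No relevant documents were found."
    return generate(build_prompt(question, contexts))
```

Passing the retriever and generator in as functions keeps the sketch independent of any particular vector database or model.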

🛠️ Installation

🔧 Prerequisites

Python 3.8+
pip or conda package manager
CUDA-compatible GPU (optional, for faster processing)

📥 Fork and Clone the Repository

Fork the repository: Navigate to the RAG-Architecture repository on GitHub (pacificrm/RAG-Architecture) and click Fork.

Clone the forked repository:

git clone https://github.com/<your-username>/RAG-Architecture.git
cd RAG-Architecture

📦 Install Dependencies

pip install -r requirements.txt

🚀 Usage

🖥️ Start the Application

Run the Streamlit app:

streamlit run app.py

🔄 Application Modes

📤 Upload Files:

Upload a file to process and store its content in Milvus.
Displays extracted content and stores embeddings in the database.

❓ Query:

Enter a question to search and retrieve relevant information from the Milvus database.
Returns AI-generated responses using LangChain's RAG pipeline.
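
The two modes could be wired up in Streamlit roughly as follows. This is a sketch under assumptions, not the contents of `app.py`: the sidebar selectbox, the `dispatch` helper, and the placeholder handlers are hypothetical. Only the mode names "Upload Files" and "Query" come from this README.

```python
from typing import Callable

MODES = ("Upload Files", "Query")


def dispatch(
    mode: str,
    payload: str,
    on_upload: Callable[[str], str],
    on_query: Callable[[str], str],
) -> str:
    """Route the payload to the handler that matches the selected mode."""
    if mode == "Upload Files":
        return on_upload(payload)
    if mode == "Query":
        return on_query(payload)
    raise ValueError(f"Unknown mode: {mode!r}")


def main() -> None:
    # Imported lazily so dispatch() can be used without Streamlit installed.
    import streamlit as st

    mode = st.sidebar.selectbox("Mode", MODES)
    if mode == "Upload Files":
        uploaded = st.file_uploader("Upload a file")
        if uploaded is not None:
            # Placeholder handler: a real one would extract text and store embeddings.
            st.write(dispatch(mode, uploaded.name, lambda name: f"Processed {name}", str))
    else:
        question = st.text_input("Ask a question")
        if question:
            # Placeholder handler: a real one would run the RAG pipeline.
            st.write(dispatch(mode, question, str, lambda q: f"Answer for: {q}"))
```

To try the sketch, call `main()` at the bottom of the script and launch it with `streamlit run`.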

📁 File Structure


```bash
project/
│
├── app.py                      # 🎯 Main Streamlit application
├── requirements.txt            # 📦 Python dependencies
├── utils/                      # 🛠️ Utility modules
│   ├── audio_utils.py          # 🎵 Audio file processing
│   ├── video_utils.py          # 📹 Video file processing
│   ├── image_utils.py          # 🖼️ Image file processing
│   ├── document_loaders.py     # 📜 Document processing loaders
│   └── milvus_client.py        # 🗄️ Initializes Milvus database
│
├── milvus_database.db          # 🗃️ Milvus database file (auto-created)
├── Dataset/                    # 📂 Folder to store datasets
└── Images/                     # 📁 Folder for storing images
```

🔑 Key Modules

`app.py`

🧩 Main application logic

Handles file uploads, document processing, and querying.

`utils/`

🎵 Audio: Splits audio into chunks and transcribes text.
📹 Video: Processes video files to extract and analyze content.
🖼️ Image: Uses EasyOCR for extracting text.
📜 Logs/Documents: Processes CSV, YAML, JSON, and PDF files into structured LangChain documents.
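
As an illustration of the audio step, splitting a recording into chunks before transcription comes down to computing time windows. The helper below is a hypothetical sketch: `chunk_bounds` and the 30-second default are assumptions, not values taken from `audio_utils.py`.

```python
from typing import List, Tuple


def chunk_bounds(duration_ms: int, chunk_ms: int = 30_000) -> List[Tuple[int, int]]:
    """Return (start, end) windows in milliseconds that cover the whole recording.

    With pydub, an AudioSegment can be sliced by milliseconds, so each window
    could be applied as audio[start:end] and passed to the transcriber.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]
```

For a 70-second file with the default chunk size, this yields two full 30-second windows and one final 10-second window.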

🛠️ Example Workflow

📤 Uploading a File

Select "Upload Files" mode.
Upload a file (e.g., example.pdf).
Process and store the file in the database.

❓ Querying the Database

Select "Query" mode.
Enter a natural language question.
Receive a concise, fact-based response.

🌟 Future Improvements

🔍 Add more advanced query capabilities.
📂 Enhance support for additional file types and embeddings.
⚡ Improve scalability for larger datasets.

📜 License

This project is licensed under the MIT License.

🙌 Acknowledgments

🌐 Streamlit for the interactive UI.
📚 LangChain for document processing and retrieval, and Milvus as the vector database.
🤖 Transformers for embedding generation.
🖼️ EasyOCR for image text extraction.
📹 MoviePy for video processing.
