No OCR product (#2)

kyryl-opens-ml · Jan 10, 2025 · 249cc8e
1 parent d56ab2d

Showing 77 changed files with 7,455 additions and 914 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,31 @@
+---
+name: Bug Report
+about: Create a report to help us improve
+title: '[BUG] '
+labels: bug
+assignees: ''
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+1. Go to '...'
+2. Click on '....'
+3. Scroll down to '....'
+4. See error
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Screenshots**
+If applicable, add screenshots to help explain your problem.
+
+**Environment (please complete the following information):**
+- OS: [e.g. iOS]
+- Browser: [e.g. chrome, safari]
+- Version: [e.g. 22]
+
+**Additional context**
+Add any other context about the problem here.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,19 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: '[FEATURE]'
+labels: enhancement
+assignees: ''
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -0,0 +1,74 @@
+name: CI
+
+on:
+  push:
+    branches:
+      - main
+      - no-ocr-dev
+  pull_request:
+    branches:
+      - main
+
+jobs:
+  docker-build:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      packages: write
+    steps:
+
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v2
+
+      - name: Log in to the Container registry
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+
+      - name: Build and push docker image UI
+        uses: docker/build-push-action@v6
+        with:
+          context: no-ocr-ui
+          push: true
+          tags: ghcr.io/kyryl-opens-ml/no-ocr-ui:latest
+          build-args: |
+            VITE_SUPABASE_URL=${{ secrets.VITE_SUPABASE_URL }}
+            VITE_SUPABASE_ANON_KEY=${{ secrets.VITE_SUPABASE_ANON_KEY }}
+            VITE_REACT_APP_API_URI=${{ secrets.VITE_REACT_APP_API_URI }}
+          cache-from: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-ui:buildcache
+          cache-to: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-ui:buildcache,mode=max
+
+      - name: Build and push docker image API
+        uses: docker/build-push-action@v6
+        with:
+          context: no-ocr-api
+          push: true
+          tags: ghcr.io/kyryl-opens-ml/no-ocr-api:latest
+          cache-from: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-api:buildcache
+          cache-to: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-api:buildcache,mode=max
+
+  deploy:
+    runs-on: ubuntu-latest
+    needs: [docker-build]
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Install Railway
+        run: rm -rf package-lock.json && npm i -g @railway/cli
+
+      - name: Deploy UI
+        run: railway redeploy --service no-ocr-ui --yes
+        env:
+          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}
+
+      - name: Deploy API
+        run: railway redeploy --service no-ocr-api --yes
+        env:
+          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}
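Review note: the UI build step in this workflow passes three `VITE_*` values as build args, and the README asks for the same three in `no-ocr-ui/.env`. A small pre-flight check could catch an empty or missing value before a build. The sketch below is not part of the commit; `parse_env`, `missing_vars`, and `REQUIRED_UI_VARS` are hypothetical names, and the parser only handles simple `KEY="value"` lines.

```python
# Hypothetical pre-flight check for the UI .env file (not part of this commit).
REQUIRED_UI_VARS = (
    "VITE_SUPABASE_URL",
    "VITE_SUPABASE_ANON_KEY",
    "VITE_REACT_APP_API_URI",
)


def parse_env(text):
    """Parse simple KEY=value lines, ignoring blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(text, required=REQUIRED_UI_VARS):
    """Return the required names that are absent or set to an empty string."""
    values = parse_env(text)
    return [name for name in required if not values.get(name)]
```

Calling `missing_vars(open("no-ocr-ui/.env").read())` locally would list whichever of the three variables still need a value.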
diff --git a/.gitignore b/.gitignore
@@ -164,4 +164,7 @@ README.p.md
 colpali/
 data/
 .DS_Store
-
+no-ocr-api/storage
+example/
+no-ocr-api/vllm_cache/
+RDEV.md
diff --git a/README.md b/README.md
@@ -1,55 +1,172 @@
-# Vision Retrieval
+# No OCR
 
-## Presentation
+A simple tool for exploring documents with AI, no fancy text extraction required. Just upload your files, then quickly search or ask questions about content across multiple collections.
 
-https://docs.google.com/presentation/d/1LY3KxUjuLAoCvKh9UyXtQupqiSTi_BFNUQ_Cdl6-o_g/edit#slide=id.g2f6ce2a35cc_0_103
+## Release blog with details 
 
+The release blog post covers this project in detail: [No-OCR Product](https://kyrylai.com/2025/01/10/no-ocr-product/)
 
-## TLDR
+## Demo
 
-![alt text](./docs/tldr.png)
+Here's a quick GIF demonstrating the basic flow of using No OCR:
 
-## Setup
+![No OCR Flow](./docs/flow.gif)
 
-```
-pip install modal
-modal setup
-```
+> **Table of Contents**
+> 1. [Overview](#overview)  
+> 2. [Key Features](#key-features)  
+> 3. [Architecture](#architecture)  
+> 4. [Flow](#flow)  
+> 5. [Roadmap](#roadmap)  
+> 6. [Prerequisites](#prerequisites)  
+> 7. [Dev Installation](#dev-installation)  
 
-## Run 
+## Overview
 
-```
-modal run vision_retrieval/infra.py
-```
+The core purpose of "No OCR" is to simplify AI-based PDF processing:
+- Process and store PDF pages without relying on OCR.  
+- Perform text and/or visual queries using modern embeddings.  
+- Use open source models for advanced question-answering on document-based diagrams, text, and more.
 
-## Deploy 
+Key technologies:
+- React-based front end (no-ocr-ui) for uploading, managing, and searching documents.  
+- Python-based API (no-ocr-api) that coordinates ingestion, indexing, and searching.  
+- Qdrant for efficient vector search and retrieval.  
+- ColPali and Qwen2-VL for inference, covering both text- and vision-based tasks.  
 
+## Key Features
 
-```
-modal deploy vision_retrieval/infra.py
-```
+- Create and manage PDF/document collections, also referred to as "cases".  
+- Automated ingestion to build Hugging Face-style datasets (HF_Dataset).  
+- Vector-based search over PDF pages (and relevant images) in Qdrant.  
+- Visual question-answering on images and diagrams via Qwen2-VL.  
+- Deployable via Docker for both the backend (Python) and UI (React).
 
-## Develop
+## Architecture
 
-```
-modal shell vision_retrieval/infra.py
-```
+Below is a high-level workflow overview:
 
-## Orchestrate
+![Architecture](./docs/architecture.png)
 
+## Flow
 
-```
-pip install dagster dagster-webserver -U
-dagster dev -f vision_retrieval/pipeline.py -p 3000 -h 0.0.0.0
-```
+Create case:
 
+```mermaid
+sequenceDiagram
+    participant User
+    participant no-ocr-ui (CreateCase)
+    participant no-ocr-api
+    participant HF_Dataset
+    participant IngestClient
+    participant Qdrant
 
+    User->>no-ocr-ui (CreateCase): Upload PDFs & specify case name
+    no-ocr-ui (CreateCase)->>no-ocr-api: POST /create_case with PDFs
+    no-ocr-api->>no-ocr-api: Save PDFs to local storage
+    no-ocr-api->>no-ocr-api: Spawn background task (process_case)
+    no-ocr-api->>HF_Dataset: Convert PDFs to HF dataset
+    HF_Dataset-->>no-ocr-api: Return dataset
+    no-ocr-api->>IngestClient: Ingest dataset
+    IngestClient->>Qdrant: Create collection & upload points
+    Qdrant-->>IngestClient: Acknowledge ingestion
+    IngestClient-->>no-ocr-api: Done ingestion
+    no-ocr-api->>no-ocr-api: Mark case status as 'done'
+    no-ocr-api-->>no-ocr-ui (CreateCase): Return creation response
+    no-ocr-ui (CreateCase)-->>User: Display success message
+```
 
-## References
+Search:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant no-ocr-ui
+    participant SearchClient
+    participant Qdrant
+    participant HF_Dataset
+    participant VLLM
+
+    User->>no-ocr-ui: Enter search query and select case
+    no-ocr-ui->>SearchClient: Search images by text
+    SearchClient->>Qdrant: Query collection with text embedding
+    Qdrant-->>SearchClient: Return search results
+    SearchClient-->>no-ocr-ui: Provide search results
+    no-ocr-ui->>HF_Dataset: Load dataset for collection
+    HF_Dataset-->>no-ocr-ui: Return dataset
+    no-ocr-ui->>VLLM: Process images with VLLM
+    VLLM-->>no-ocr-ui: Return VLLM output
+    no-ocr-ui-->>User: Display search results and VLLM output
+```
 
-- [ColPali](https://arxiv.org/abs/2407.01449)
-- [LanceDB](https://lancedb.com/)
-- [ModalLab](https://modal.com/)
-- [Dagster](https://dagster.io/)
-- [Beyond Text: The Rise of Vision-Driven Document Retrieval for RAG](https://blog.vespa.ai/the-rise-of-vision-driven-document-retrieval-for-rag/)
-- [PDF Retrieval with Vision Language Models](https://blog.vespa.ai/retrieval-with-vision-language-models-colpali/)
+## Roadmap 
+
+- Better models for reasoning and retrieval: 72B and QVQ.
+- Agentic workflows: go beyond search and toward complete pieces of work.
+- Training models per case: turn your workflow into a data moat and train unique models.
+- UI/UX improvements: simplify, simplify, simplify.
+
+
+## Prerequisites
+- Python 3.x
+- Node.js 18.x
+- Docker (optional for containerized deployments)
+- Supabase
+  - Create an account at https://app.supabase.io/
+  - Create a `.env` file in the `no-ocr-ui` directory
+  - Add the following variables to the `.env` file:
+    ```
+    VITE_SUPABASE_URL=""
+    VITE_SUPABASE_ANON_KEY=""
+    VITE_REACT_APP_API_URI=""
+    ```
+- Modal 
+  - Create an account at https://modal.com/
+  - Deploy models:
+    ```bash
+    pip install modal
+    modal setup
+
+    modal run no-ocr-llms/llm_serving_load_models.py --model-name Qwen/Qwen2-VL-7B-Instruct --model-revision 51c47430f97dd7c74aa1fa6825e68a813478097f
+    modal run no-ocr-llms/llm_serving_load_models.py --model-name vidore/colqwen2-v1.0-merged --model-revision 364a4f5df97231e233e15cbbaf0b9dbe352ba92c
+
+
+    modal deploy no-ocr-llms/llm_serving.py
+    modal deploy no-ocr-llms/llm_serving_colpali.py
+    ```
+  - Create a `.env` file in the `no-ocr-api` directory
+  - Update the environment variables.
+
+## Dev Installation
+
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/kyryl-opens-ml/no-ocr
+   ```
+
+2. (API) Install dependencies:
+   ```bash
+   cd no-ocr-api
+   pip install -r requirements.txt
+   ```
+
+3. (API) Run server:
+   ```bash
+   cd no-ocr-api
+   fastapi dev api.py
+   ```
+
+4. (UI) Install dependencies:
+   ```bash
+   cd no-ocr-ui
+   npm install
+   ```
+5. (UI) Run UI:
+   ```bash
+   cd no-ocr-ui
+   npm run dev
+   ```
+6. (Qdrant) Run Qdrant:
+   ```bash
+   docker run -p 6333:6333 qdrant/qdrant:v1.12.5
+   ```
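Review note: the two sequence diagrams in the README (create case, then search) can be walked through with a minimal in-memory sketch. Nothing here is the project's actual code. `Case`, `FakeVectorStore`, `create_case`, and `search` are hypothetical names, the store is a stand-in for Qdrant, and the "embedding" is plain word overlap rather than ColPali vectors. Only the `'done'` status value is taken from the diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    name: str
    pages: list[str]
    status: str = "processing"  # assumed initial status; the diagram only names 'done'


@dataclass
class FakeVectorStore:
    """Hypothetical in-memory stand-in for Qdrant: one list of points per collection."""

    collections: dict[str, list[tuple[set[str], int]]] = field(default_factory=dict)

    def upload(self, collection, pages):
        # Stand-in "embedding": the set of lowercase words on each page.
        self.collections[collection] = [
            (set(page.lower().split()), idx) for idx, page in enumerate(pages)
        ]

    def query(self, collection, text, limit=1):
        # Rank pages by how many query words they share (not real vector search).
        words = set(text.lower().split())
        ranked = sorted(
            self.collections[collection],
            key=lambda point: len(point[0] & words),
            reverse=True,
        )
        return [idx for _, idx in ranked[:limit]]


def create_case(store, name, pages):
    """Mirror the create-case diagram: ingest pages, then mark the case 'done'."""
    case = Case(name=name, pages=pages)
    store.upload(name, pages)
    case.status = "done"
    return case


def search(store, case, query):
    """Mirror the search diagram up to the point where pages are returned."""
    return [case.pages[i] for i in store.query(case.name, query)]
```

In the real flow, the pages returned by `search` would then be passed to the vision-language model for question answering. That step is left out because its interface is not shown in this part of the diff.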
diff --git a/docker-compose.yml b/docker-compose.yml
@@ -0,0 +1,43 @@
+version: '3.8'
+
+services:
+  ui:
+    build:
+      context: ./no-ocr-ui
+      dockerfile: Dockerfile
+      args:
+        VITE_SUPABASE_URL: "https://cdazhclrvpqparjhcihs.supabase.co"
+        VITE_SUPABASE_ANON_KEY: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6ImNkYXpoY2xydnBxcGFyamhjaWhzIiwicm9sZSI6ImFub24iLCJpYXQiOjE3MzQxMzE3MjMsImV4cCI6MjA0OTcwNzcyM30.Hl1CGJVLG0awBGtZXpNNYZfZ8VWWG31diffcQqbZozk"
+        VITE_REACT_APP_API_URI: "http://localhost:8000"
+    env_file:
+      - ./no-ocr-ui/.env
+    ports:
+      - "5173:5173"
+    depends_on:
+      - api
+
+  api:
+    build:
+      context: ./no-ocr-api
+      dockerfile: Dockerfile
+    env_file:
+      - ./no-ocr-api/.env
+    volumes:
+      - api-storage:/app/storage
+    ports:
+      - "8000:8000"
+    depends_on:
+      - qdrant
+    environment:
+      QDRANT_HOST: "qdrant"
+
+  qdrant:
+    image: qdrant/qdrant:v1.12.5
+    volumes:
+      - qdrant-storage:/qdrant/storage
+    ports:
+      - "6333:6333"
+
+volumes:
+  api-storage:
+  qdrant-storage:
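Review note: the compose file sets `QDRANT_HOST: "qdrant"` on the `api` service so the API can reach Qdrant by service name, while the README's dev setup runs Qdrant on localhost port 6333. A minimal sketch of how a client URL could be resolved in both cases follows. Whether `no-ocr-api` reads the variable exactly this way is an assumption, and `qdrant_url` is a hypothetical helper.

```python
import os


def qdrant_url(env=None, default_host="localhost", port=6333):
    """Build the Qdrant base URL from QDRANT_HOST, falling back to localhost."""
    env = os.environ if env is None else env
    host = env.get("QDRANT_HOST", default_host)
    return f"http://{host}:{port}"
```

Inside the compose network this resolves to `http://qdrant:6333`. With the variable unset, as in the dev installation steps, it falls back to `http://localhost:6333`.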
diff --git a/docs/architecture.png b/docs/architecture.png
diff --git a/docs/create-case.png b/docs/create-case.png
diff --git a/docs/flow.gif b/docs/flow.gif
diff --git a/docs/search-case.png b/docs/search-case.png
diff --git a/docs/tldr.png b/docs/tldr.png