No OCR product (#2)
truskovskiyk authored Jan 10, 2025
1 parent d56ab2d commit 249cc8e
Showing 77 changed files with 7,455 additions and 914 deletions.
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,31 @@
---
name: Bug Report
about: Create a report to help us improve
title: '[BUG] '
labels: bug
assignees: ''
---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Environment (please complete the following information):**
- OS: [e.g. iOS]
- Browser: [e.g. chrome, safari]
- Version: [e.g. 22]

**Additional context**
Add any other context about the problem here.
19 changes: 19 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,19 @@
---
name: Feature request
about: Suggest an idea for this project
title: '[FEATURE]'
labels: enhancement
assignees: ''
---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
74 changes: 74 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,74 @@
name: CI

on:
  push:
    branches:
      - main
      - no-ocr-dev
  pull_request:
    branches:
      - main

jobs:
  docker-build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push docker image UI
        uses: docker/build-push-action@v6
        with:
          context: no-ocr-ui
          push: true
          tags: ghcr.io/kyryl-opens-ml/no-ocr-ui:latest
          build-args: |
            VITE_SUPABASE_URL=${{ secrets.VITE_SUPABASE_URL }}
            VITE_SUPABASE_ANON_KEY=${{ secrets.VITE_SUPABASE_ANON_KEY }}
            VITE_REACT_APP_API_URI=${{ secrets.VITE_REACT_APP_API_URI }}
          cache-from: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-ui:buildcache
          cache-to: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-ui:buildcache,mode=max

      - name: Build and push docker image API
        uses: docker/build-push-action@v6
        with:
          context: no-ocr-api
          push: true
          tags: ghcr.io/kyryl-opens-ml/no-ocr-api:latest
          cache-from: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-api:buildcache
          cache-to: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-api:buildcache,mode=max

  deploy:
    runs-on: ubuntu-latest
    needs: [docker-build]
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install Railway
        run: rm -rf package-lock.json && npm i -g @railway/cli

      - name: Deploy UI
        run: railway redeploy --service no-ocr-ui --yes
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}

      - name: Deploy API
        run: railway redeploy --service no-ocr-api --yes
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}
5 changes: 4 additions & 1 deletion .gitignore
@@ -164,4 +164,7 @@ README.p.md
colpali/
data/
.DS_Store

no-ocr-api/storage
example/
no-ocr-api/vllm_cache/
RDEV.md
185 changes: 151 additions & 34 deletions README.md
@@ -1,55 +1,172 @@
# Vision Retrieval
# No OCR

## Presentation
A simple tool for exploring documents with AI, no fancy text extraction required. Just upload your files, then quickly search or ask questions about content across multiple collections.

https://docs.google.com/presentation/d/1LY3KxUjuLAoCvKh9UyXtQupqiSTi_BFNUQ_Cdl6-o_g/edit#slide=id.g2f6ce2a35cc_0_103
## Release blog with details

The release blog post covers this project in detail: [No-OCR Product](https://kyrylai.com/2025/01/10/no-ocr-product/)

## TLDR
## Demo

![alt text](./docs/tldr.png)
Here's a quick GIF demonstrating the basic flow of using No OCR:

## Setup
![No OCR Flow](./docs/flow.gif)

```
pip install modal
modal setup
```
> **Table of Contents**
> 1. [Overview](#overview)
> 2. [Key Features](#key-features)
> 3. [Architecture](#architecture)
> 4. [Flow](#flow)
> 5. [Roadmap](#roadmap)
> 6. [Prerequisites](#prerequisites)
> 7. [Dev Installation](#dev-installation)
## Run
## Overview

```
modal run vision_retrieval/infra.py
```
The core purpose of "No OCR" is to simplify AI-based PDF processing:
- Process and store PDF pages without relying on OCR.
- Perform text and/or visual queries using modern embeddings.
- Use open-source models for question answering over diagrams, text, and other document content.

## Deploy
Key technologies:
- React-based front end (no-ocr-ui) for uploading, managing, and searching documents.
- Python-based API (no-ocr-api) that coordinates ingestion, indexing, and searching.
- Qdrant for efficient vector search and retrieval.
- ColPali-family models (ColQwen2) produce the retrieval embeddings, and Qwen2-VL handles visual question answering.

## Key Features

```
modal deploy vision_retrieval/infra.py
```
- Create and manage PDF/document collections, also referred to as "cases".
- Automated ingestion to build Hugging Face-style datasets (HF_Dataset).
- Vector-based search over PDF pages (and relevant images) in Qdrant.
- Visual question-answering on images and diagrams via Qwen2-VL.
- Deployable via Docker for both the backend (Python) and UI (React).

## Develop
## Architecture

```
modal shell vision_retrieval/infra.py
```
Below is a high-level overview of the architecture:

## Orchestrate
![Architecture](./docs/architecture.png)

## Flow

```
pip install dagster dagster-webserver -U
dagster dev -f vision_retrieval/pipeline.py -p 3000 -h 0.0.0.0
```
Create case:

```mermaid
sequenceDiagram
participant User
participant no-ocr-ui (CreateCase)
participant no-ocr-api
participant HF_Dataset
participant IngestClient
participant Qdrant
User->>no-ocr-ui (CreateCase): Upload PDFs & specify case name
no-ocr-ui (CreateCase)->>no-ocr-api: POST /create_case with PDFs
no-ocr-api->>no-ocr-api: Save PDFs to local storage
no-ocr-api->>no-ocr-api: Spawn background task (process_case)
no-ocr-api->>HF_Dataset: Convert PDFs to HF dataset
HF_Dataset-->>no-ocr-api: Return dataset
no-ocr-api->>IngestClient: Ingest dataset
IngestClient->>Qdrant: Create collection & upload points
Qdrant-->>IngestClient: Acknowledge ingestion
IngestClient-->>no-ocr-api: Done ingestion
no-ocr-api->>no-ocr-api: Mark case status as 'done'
no-ocr-api-->>no-ocr-ui (CreateCase): Return creation response
no-ocr-ui (CreateCase)-->>User: Display success message
```
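The create-case flow hands ingestion to a background task (`process_case`) and marks the case as 'done' only when ingestion finishes, so a client needs a status it can check in the meantime. Below is a minimal, self-contained sketch of that status lifecycle, assuming an in-memory registry and a thread standing in for the API's background task. `CaseRegistry`, `create_case`, the `ingest` callback, and the 'processing'/'error' statuses are hypothetical; only `process_case` and 'done' come from the diagram.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Case:
    # Hypothetical record for one case (a named collection of PDFs).
    name: str
    pdf_paths: List[str]
    status: str = "processing"  # assumed initial status; 'done' matches the diagram


@dataclass
class CaseRegistry:
    # Hypothetical in-memory stand-in for wherever the API keeps case metadata.
    cases: Dict[str, Case] = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def set_status(self, name: str, status: str) -> None:
        with self.lock:
            self.cases[name].status = status

    def status(self, name: str) -> str:
        with self.lock:
            return self.cases[name].status


def create_case(
    registry: CaseRegistry,
    name: str,
    pdf_paths: List[str],
    ingest: Callable[[str, List[str]], None],
) -> threading.Thread:
    """Register the case, start ingestion in the background, and return at once."""
    with registry.lock:
        registry.cases[name] = Case(name=name, pdf_paths=list(pdf_paths))

    def process_case() -> None:
        # Stands in for: convert PDFs to a dataset, then ingest into the vector store.
        try:
            ingest(name, pdf_paths)
            registry.set_status(name, "done")
        except Exception:
            registry.set_status(name, "error")  # hypothetical failure status

    worker = threading.Thread(target=process_case, daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    registry = CaseRegistry()
    worker = create_case(registry, "contracts", ["a.pdf", "b.pdf"], lambda n, p: None)
    worker.join()  # a real client would poll the status instead of joining
    print(registry.status("contracts"))  # done
```

Keeping the status behind a lock lets the request handler and the background worker read and write it safely from different threads.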

## References
Search:

```mermaid
sequenceDiagram
participant User
participant no-ocr-ui
participant SearchClient
participant Qdrant
participant HF_Dataset
participant VLLM
User->>no-ocr-ui: Enter search query and select case
no-ocr-ui->>SearchClient: Search images by text
SearchClient->>Qdrant: Query collection with text embedding
Qdrant-->>SearchClient: Return search results
SearchClient-->>no-ocr-ui: Provide search results
no-ocr-ui->>HF_Dataset: Load dataset for collection
HF_Dataset-->>no-ocr-ui: Return dataset
no-ocr-ui->>VLLM: Process images with VLLM
VLLM-->>no-ocr-ui: Return VLLM output
no-ocr-ui-->>User: Display search results and VLLM output
```
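The search step ranks pages against a text query using ColPali-style embeddings, which score with late interaction ("MaxSim"): each query-token embedding is compared with every patch embedding of a page, the best match per query token is kept, and those maxima are summed. The pure-Python sketch below illustrates only that scoring rule, with toy 2-D vectors standing in for real embeddings. `maxsim_score` and `rank_pages` are illustrative names, and how the project's `SearchClient` configures Qdrant to perform this comparison is not shown here.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], page_patches: List[Vector]) -> float:
    """Late-interaction score: best-matching patch per query token, summed over tokens.

    Assumes every page has at least one patch embedding (max() of an empty
    sequence raises ValueError).
    """
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(
    query_tokens: List[Vector], pages: Dict[str, List[Vector]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Score every page and return the top_k (page_id, score) pairs, best first."""
    scored = [(page_id, maxsim_score(query_tokens, patches)) for page_id, patches in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]  # two toy query-token embeddings
    pages = {
        "page-1": [[0.9, 0.1], [0.2, 0.8]],  # one patch matches each query token well
        "page-2": [[0.5, 0.5], [0.4, 0.4]],  # patches match both tokens only weakly
    }
    print(rank_pages(query, pages))
```

Because each query token picks its own best patch, a page can score highly when different regions answer different parts of the query, which is why this style of scoring suits whole-page images.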

- [ColPali](https://arxiv.org/abs/2407.01449)
- [LanceDB](https://lancedb.com/)
- [ModalLab](https://modal.com/)
- [Dagster](https://dagster.io/)
- [Beyond Text: The Rise of Vision-Driven Document Retrieval for RAG](https://blog.vespa.ai/the-rise-of-vision-driven-document-retrieval-for-rag/)
- [PDF Retrieval with Vision Language Models](https://blog.vespa.ai/retrieval-with-vision-language-models-colpali/)
## Roadmap

- Better models for reasoning and retrieval, such as 72B-parameter models and QVQ.
- Agentic workflows - go beyond search toward completing entire pieces of work.
- Training models per case - turn your workflow into a data moat and train unique models.
- UI/UX improvement - simplify, simplify, simplify.


## Prerequisites
- Python 3.x
- Node.js 18.x
- Docker (optional, for containerized deployments)
- Supabase
  - Create an account at https://app.supabase.io/
  - Create a `.env` file in the `no-ocr-ui` directory
  - Add the following variables to the `.env` file:
    ```
    VITE_SUPABASE_URL=""
    VITE_SUPABASE_ANON_KEY=""
    VITE_REACT_APP_API_URI=""
    ```
- Modal
  - Create an account at https://modal.com/
  - Deploy models:
    ```bash
    pip install modal
    modal setup
    modal run no-ocr-llms/llm_serving_load_models.py --model-name Qwen/Qwen2-VL-7B-Instruct --model-revision 51c47430f97dd7c74aa1fa6825e68a813478097f
    modal run no-ocr-llms/llm_serving_load_models.py --model-name vidore/colqwen2-v1.0-merged --model-revision 364a4f5df97231e233e15cbbaf0b9dbe352ba92c
    modal deploy no-ocr-llms/llm_serving.py
    modal deploy no-ocr-llms/llm_serving_colpali.py
    ```
  - Create a `.env` file in the `no-ocr-api` directory
  - Fill in the environment variables the API needs.
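Before starting the UI, it can help to confirm that the `no-ocr-ui/.env` file actually defines non-empty values for the three `VITE_` variables listed above. This is a small stdlib-only sketch that assumes simple `KEY=value` lines; `parse_env` and `missing_keys` are hypothetical helpers, not part of the repository, and real `.env` loaders handle more syntax than this.

```python
from typing import Dict, Iterable, List

# The three UI variables listed in the Supabase prerequisites.
REQUIRED_UI_KEYS = ("VITE_SUPABASE_URL", "VITE_SUPABASE_ANON_KEY", "VITE_REACT_APP_API_URI")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments, stripping one layer of quotes."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str] = REQUIRED_UI_KEYS) -> List[str]:
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    # In practice, read the text from no-ocr-ui/.env; an inline sample keeps this runnable.
    sample = 'VITE_SUPABASE_URL=""\nVITE_SUPABASE_ANON_KEY=""\nVITE_REACT_APP_API_URI="http://localhost:8000"\n'
    print(missing_keys(parse_env(sample)))  # ['VITE_SUPABASE_URL', 'VITE_SUPABASE_ANON_KEY']
```

Treating an empty string as missing matters here because the template in the prerequisites ships every variable as `""`.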
## Dev Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/kyryl-opens-ml/no-ocr
   cd no-ocr
   ```

2. (API) Install dependencies:
   ```bash
   cd no-ocr-api
   pip install -r requirements.txt
   ```

3. (API) Run the server:
   ```bash
   cd no-ocr-api
   fastapi dev api.py
   ```

4. (UI) Install dependencies:
   ```bash
   cd no-ocr-ui
   npm install
   ```

5. (UI) Run the UI:
   ```bash
   cd no-ocr-ui
   npm run dev
   ```

6. (Qdrant) Run Qdrant:
   ```bash
   docker run -p 6333:6333 qdrant/qdrant:v1.12.5
   ```
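Because the API talks to Qdrant on port 6333, a quick reachability check can save a confusing first request when the services are started by hand. This stdlib sketch polls a TCP port until it accepts a connection; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that Qdrant has finished starting up.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, calling `wait_for_port("localhost", 6333)` before `fastapi dev api.py` makes the API start only after the Qdrant container is listening.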
43 changes: 43 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,43 @@
version: '3.8'

services:
  ui:
    build:
      context: ./no-ocr-ui
      dockerfile: Dockerfile
      args:
        VITE_SUPABASE_URL: "https://cdazhclrvpqparjhcihs.supabase.co"
        VITE_SUPABASE_ANON_KEY: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6ImNkYXpoY2xydnBxcGFyamhjaWhzIiwicm9sZSI6ImFub24iLCJpYXQiOjE3MzQxMzE3MjMsImV4cCI6MjA0OTcwNzcyM30.Hl1CGJVLG0awBGtZXpNNYZfZ8VWWG31diffcQqbZozk"
        VITE_REACT_APP_API_URI: "http://localhost:8000"
    env_file:
      - ./no-ocr-ui/.env
    ports:
      - "5173:5173"
    depends_on:
      - api

  api:
    build:
      context: ./no-ocr-api
      dockerfile: Dockerfile
    env_file:
      - ./no-ocr-api/.env
    volumes:
      - api-storage:/app/storage
    ports:
      - "8000:8000"
    depends_on:
      - qdrant
    environment:
      QDRANT_HOST: "qdrant"

  qdrant:
    image: qdrant/qdrant:v1.12.5
    volumes:
      - qdrant-storage:/qdrant/storage
    ports:
      - "6333:6333"

volumes:
  api-storage:
  qdrant-storage:
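With `depends_on`, Compose starts `qdrant` before `api` and `api` before `ui`. That ordering is a topological sort of the dependency graph, which can be sketched with Python's standard `graphlib` module (3.9+) using the edges from the file above; `DEPENDS_ON` and `start_order` are illustrative names. Note that the short `depends_on` form used here only orders container startup and does not wait for Qdrant to be ready to serve requests; the long form with `condition: service_healthy` plus a healthcheck is what makes Compose wait.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# depends_on edges copied from docker-compose.yml: service -> services it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "ui": {"api"},
    "api": {"qdrant"},
    "qdrant": set(),
}


def start_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return services so that every dependency comes before its dependents."""
    # TopologicalSorter expects node -> predecessors, which is exactly the depends_on shape.
    # A dependency cycle would raise graphlib.CycleError.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(start_order(DEPENDS_ON))  # ['qdrant', 'api', 'ui']
```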
Binary file added docs/architecture.png
Binary file added docs/create-case.png
Binary file added docs/flow.gif
Binary file added docs/search-case.png
Binary file removed docs/tldr.png