No OCR product #2

Merged: 46 commits, merged Jan 10, 2025
Commits
- 604a032 No OCR UI (truskovskiyk, Dec 20, 2024)
- f148928 remvoe https (truskovskiyk, Dec 20, 2024)
- c480d1f fix auth (truskovskiyk, Dec 20, 2024)
- b3c2d3f fix auth (truskovskiyk, Dec 20, 2024)
- 3b5ab76 never do this again (truskovskiyk, Dec 20, 2024)
- 0b1cf46 never do this again: OKAY (truskovskiyk, Dec 20, 2024)
- 1f5c7ef Add docker file (truskovskiyk, Dec 20, 2024)
- e40c062 api (truskovskiyk, Dec 20, 2024)
- 0818298 api (truskovskiyk, Dec 21, 2024)
- cc0b934 api in docker (truskovskiyk, Dec 21, 2024)
- d2ac309 Create (truskovskiyk, Dec 21, 2024)
- f95c1dc connect ui + api (truskovskiyk, Dec 21, 2024)
- 8009712 connect ui + api (truskovskiyk, Dec 22, 2024)
- 168e1b7 never do this again: OKAY (truskovskiyk, Dec 22, 2024)
- ad019fd add images (truskovskiyk, Dec 22, 2024)
- dcdd95c add vllm call (truskovskiyk, Dec 25, 2024)
- b7b9fa5 layout (truskovskiyk, Dec 25, 2024)
- 5925598 save state (truskovskiyk, Dec 25, 2024)
- 17f3fd9 layout (truskovskiyk, Dec 25, 2024)
- 28d0526 search (truskovskiyk, Dec 25, 2024)
- 70e9d2d layout (truskovskiyk, Dec 25, 2024)
- 1bd8977 search page (truskovskiyk, Dec 25, 2024)
- 1850c9f rename (truskovskiyk, Dec 25, 2024)
- 07044ee type error (truskovskiyk, Dec 25, 2024)
- 2fedb3a rename (truskovskiyk, Dec 25, 2024)
- 55aa19b upload (truskovskiyk, Dec 25, 2024)
- c8fb00e rename (truskovskiyk, Dec 25, 2024)
- c15c739 move LLMs (truskovskiyk, Jan 1, 2025)
- ca196e4 about new (truskovskiyk, Jan 1, 2025)
- 2e84c26 about new (truskovskiyk, Jan 1, 2025)
- 49f5023 auth (truskovskiyk, Jan 4, 2025)
- 69101c1 auth (truskovskiyk, Jan 4, 2025)
- 4130ced add measurment (truskovskiyk, Jan 4, 2025)
- bbfcdaa logs (truskovskiyk, Jan 4, 2025)
- ac8e7e9 multi tenancy (truskovskiyk, Jan 6, 2025)
- 8f7dcfa multi tenancy (truskovskiyk, Jan 7, 2025)
- 5858436 README (truskovskiyk, Jan 7, 2025)
- 79658f8 add drag and drop (truskovskiyk, Jan 8, 2025)
- 77e8238 add colors (truskovskiyk, Jan 8, 2025)
- 6a89c76 add redirect (truskovskiyk, Jan 8, 2025)
- 9770a6e fix redirect (truskovskiyk, Jan 8, 2025)
- 217c4bc fix redirect (truskovskiyk, Jan 8, 2025)
- 9d39b34 add gif (truskovskiyk, Jan 9, 2025)
- 7947dd9 add tempaltes (truskovskiyk, Jan 10, 2025)
- 04bc820 readme (truskovskiyk, Jan 10, 2025)
- 57f2dbb update readme (truskovskiyk, Jan 10, 2025)
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,31 @@
---
name: Bug Report
about: Create a report to help us improve
title: '[BUG] '
labels: bug
assignees: ''
---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Environment (please complete the following information):**
- OS: [e.g. iOS]
- Browser: [e.g. chrome, safari]
- Version: [e.g. 22]

**Additional context**
Add any other context about the problem here.
19 changes: 19 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,19 @@
---
name: Feature request
about: Suggest an idea for this project
title: '[FEATURE]'
labels: enhancement
assignees: ''
---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
74 changes: 74 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,74 @@
name: CI

on:
  push:
    branches:
      - main
      - no-ocr-dev
  pull_request:
    branches:
      - main

jobs:
  docker-build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push docker image UI
        uses: docker/build-push-action@v6
        with:
          context: no-ocr-ui
          push: true
          tags: ghcr.io/kyryl-opens-ml/no-ocr-ui:latest
          build-args: |
            VITE_SUPABASE_URL=${{ secrets.VITE_SUPABASE_URL }}
            VITE_SUPABASE_ANON_KEY=${{ secrets.VITE_SUPABASE_ANON_KEY }}
            VITE_REACT_APP_API_URI=${{ secrets.VITE_REACT_APP_API_URI }}
          cache-from: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-ui:buildcache
          cache-to: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-ui:buildcache,mode=max

      - name: Build and push docker image API
        uses: docker/build-push-action@v6
        with:
          context: no-ocr-api
          push: true
          tags: ghcr.io/kyryl-opens-ml/no-ocr-api:latest
          cache-from: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-api:buildcache
          cache-to: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-api:buildcache,mode=max

  deploy:
    runs-on: ubuntu-latest
    needs: [docker-build]
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install Railway
        run: rm -rf package-lock.json && npm i -g @railway/cli

      - name: Deploy UI
        run: railway redeploy --service no-ocr-ui --yes
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}

      - name: Deploy API
        run: railway redeploy --service no-ocr-api --yes
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}
5 changes: 4 additions & 1 deletion .gitignore
@@ -164,4 +164,7 @@ README.p.md
colpali/
data/
.DS_Store

no-ocr-api/storage
example/
no-ocr-api/vllm_cache/
RDEV.md
185 changes: 151 additions & 34 deletions README.md
@@ -1,55 +1,172 @@
-# Vision Retrieval
# No OCR

-## Presentation
A simple tool for exploring documents with AI, no fancy text extraction required. Just upload your files, then quickly search or ask questions about content across multiple collections.

-https://docs.google.com/presentation/d/1LY3KxUjuLAoCvKh9UyXtQupqiSTi_BFNUQ_Cdl6-o_g/edit#slide=id.g2f6ce2a35cc_0_103
## Release blog with details

The release blog post covers the project in more detail: [No-OCR Product](https://kyrylai.com/2025/01/10/no-ocr-product/)

-## TLDR
## Demo

-![alt text](./docs/tldr.png)
Here's a quick GIF demonstrating the basic flow of using No OCR:

-## Setup
![No OCR Flow](./docs/flow.gif)

-```
-pip install modal
-modal setup
-```
> **Table of Contents**
> 1. [Overview](#overview)
> 2. [Key Features](#key-features)
> 3. [Architecture](#architecture)
> 4. [Flow](#flow)
> 5. [Roadmap](#roadmap)
> 6. [Prerequisites](#prerequisites)
> 7. [Dev Installation](#dev-installation)

-## Run
## Overview

-```
-modal run vision_retrieval/infra.py
-```
The core purpose of "No OCR" is to simplify AI-based PDF processing:
- Process and store PDF pages without relying on OCR.
- Perform text and/or visual queries using modern embeddings.
- Use open-source models for question answering over document text, diagrams, and more.

-## Deploy
Key technologies:
- React-based front end (no-ocr-ui) for uploading, managing, and searching documents.
- Python-based API (no-ocr-api) that coordinates ingestion, indexing, and searching.
- Qdrant for efficient vector search and retrieval.
- ColPali (via ColQwen2) and Qwen2-VL handle inference: ColPali embeds page images for retrieval, and Qwen2-VL answers questions about the retrieved pages.

## Key Features

-```
-modal deploy vision_retrieval/infra.py
-```
- Create and manage PDF/document collections, also referred to as "cases".
- Automated ingestion to build Hugging Face-style datasets (HF_Dataset).
- Vector-based search over PDF pages (and relevant images) in Qdrant.
- Visual question-answering on images and diagrams via Qwen2-VL.
- Deployable via Docker for both the backend (Python) and UI (React).

-## Develop
## Architecture

-```
-modal shell vision_retrieval/infra.py
-```
Below is a high-level workflow overview:

-## Orchestrate
![Architecture](./docs/architecture.png)

## Flow

-```
-pip install dagster dagster-webserver -U
-dagster dev -f vision_retrieval/pipeline.py -p 3000 -h 0.0.0.0
-```
Create case:

```mermaid
sequenceDiagram
participant User
participant no-ocr-ui (CreateCase)
participant no-ocr-api
participant HF_Dataset
participant IngestClient
participant Qdrant

User->>no-ocr-ui (CreateCase): Upload PDFs & specify case name
no-ocr-ui (CreateCase)->>no-ocr-api: POST /create_case with PDFs
no-ocr-api->>no-ocr-api: Save PDFs to local storage
no-ocr-api->>no-ocr-api: Spawn background task (process_case)
no-ocr-api->>HF_Dataset: Convert PDFs to HF dataset
HF_Dataset-->>no-ocr-api: Return dataset
no-ocr-api->>IngestClient: Ingest dataset
IngestClient->>Qdrant: Create collection & upload points
Qdrant-->>IngestClient: Acknowledge ingestion
IngestClient-->>no-ocr-api: Done ingestion
no-ocr-api->>no-ocr-api: Mark case status as 'done'
no-ocr-api-->>no-ocr-ui (CreateCase): Return creation response
no-ocr-ui (CreateCase)-->>User: Display success message
```
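The create-case flow hands the slow work (PDF conversion and Qdrant ingestion) to a background task and tracks a per-case status. Below is a minimal stdlib sketch of that pattern. It assumes an in-memory store and uses a plain thread in place of whatever task runner no-ocr-api actually uses. `Case`, `CaseStore`, `create_case`, and `process_case` are hypothetical stand-ins, and the conversion/ingestion step is only a placeholder.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class Case:
    # Hypothetical case record: name, saved PDF paths, and processing status.
    name: str
    pdf_paths: list
    status: str = "processing"


@dataclass
class CaseStore:
    # In-memory stand-in for wherever the API persists case metadata.
    cases: dict = field(default_factory=dict)
    lock: threading.Lock = field(default_factory=threading.Lock)

    def set_status(self, name: str, status: str) -> None:
        with self.lock:
            self.cases[name].status = status


def process_case(store: CaseStore, name: str) -> None:
    # Placeholder for "convert PDFs to HF dataset" + "ingest into Qdrant".
    try:
        time.sleep(0.01)
        store.set_status(name, "done")
    except Exception:
        store.set_status(name, "error")


def create_case(store: CaseStore, name: str, pdf_paths: list) -> threading.Thread:
    # Register the case first so its status is visible immediately.
    with store.lock:
        store.cases[name] = Case(name=name, pdf_paths=list(pdf_paths))
    # Hand the slow work to a background worker and return without waiting.
    worker = threading.Thread(target=process_case, args=(store, name), daemon=True)
    worker.start()
    return worker


if __name__ == "__main__":
    store = CaseStore()
    worker = create_case(store, "demo", ["report.pdf"])
    worker.join()  # a real client would poll the status instead of joining
    print(store.cases["demo"].status)  # done
```

The diagram draws the creation response after the case is marked `done`. With a real background task, the response usually returns first (FastAPI's `BackgroundTasks`, for example, run after the response is sent), so a client would check the case status afterwards.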

-## References
Search:

```mermaid
sequenceDiagram
participant User
participant no-ocr-ui
participant SearchClient
participant Qdrant
participant HF_Dataset
participant VLLM

User->>no-ocr-ui: Enter search query and select case
no-ocr-ui->>SearchClient: Search images by text
SearchClient->>Qdrant: Query collection with text embedding
Qdrant-->>SearchClient: Return search results
SearchClient-->>no-ocr-ui: Provide search results
no-ocr-ui->>HF_Dataset: Load dataset for collection
HF_Dataset-->>no-ocr-ui: Return dataset
no-ocr-ui->>VLLM: Process images with VLLM
VLLM-->>no-ocr-ui: Return VLLM output
no-ocr-ui-->>User: Display search results and VLLM output
```
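In the search flow, ColPali-style models do the work behind "Query collection with text embedding." The ColQwen2 checkpoint deployed under Prerequisites is one such model. These models represent both the query and each page as a set of vectors and score them with late interaction ("MaxSim"): for each query-token vector, take its highest dot product against the page's patch vectors, then sum those maxima.

The sketch below shows only that scoring rule, in plain Python with toy 2-D vectors. `maxsim` and `rank_pages` are hypothetical names. Real embeddings come from the model, and Qdrant can compute MaxSim server-side for multivector collections; whether no-ocr-api is configured that way isn't visible in this diff.

```python
def dot(a: list, b: list) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: list, page_vecs: list) -> float:
    # Late interaction: each query token picks its best-matching page patch,
    # and the per-token maxima are summed into one page score.
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: list, pages: dict, top_k: int = 3) -> list:
    # Score every page, then return the top_k (page_id, score) pairs, best first.
    scored = [(page_id, maxsim(query_vecs, vecs)) for page_id, vecs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    pages = {
        "p1": [[1.0, 0.0], [0.5, 0.5]],
        "p2": [[0.0, 1.0], [0.0, 0.2]],
        "p3": [[1.0, 1.0]],
    }
    print(rank_pages(query, pages, top_k=2))  # [('p3', 2.0), ('p1', 1.5)]
```

Because each query token is matched independently, a page can score well by covering different parts of the query with different patches. This suits pages that mix text, tables, and figures.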

-- [ColPali](https://arxiv.org/abs/2407.01449)
-- [LanceDB](https://lancedb.com/)
-- [ModalLab](https://modal.com/)
-- [Dagster](https://dagster.io/)
-- [Beyond Text: The Rise of Vision-Driven Document Retrieval for RAG](https://blog.vespa.ai/the-rise-of-vision-driven-document-retrieval-for-rag/)
-- [PDF Retrieval with Vision Language Models](https://blog.vespa.ai/retrieval-with-vision-language-models-colpali/)
## Roadmap

- Better models for reasoning and retrieval, such as 72B models and QVQ.
- Agentic workflows: go beyond search toward completing whole pieces of work.
- Per-case model training: turn your workflow into a data moat and train unique models.
- UI/UX improvements: simplify, simplify, simplify.


## Prerequisites
- Python 3.x
- Node.js 18.x
- Docker (optional, for containerized deployments)
- Supabase
  - Create an account at https://app.supabase.io/
  - Create a `.env` file in the `no-ocr-ui` directory
  - Add the following variables to the `.env` file:
    ```
    VITE_SUPABASE_URL=""
    VITE_SUPABASE_ANON_KEY=""
    VITE_REACT_APP_API_URI=""
    ```
- Modal
  - Create an account at https://modal.com/
  - Deploy models:
    ```bash
    pip install modal
    modal setup

    modal run no-ocr-llms/llm_serving_load_models.py --model-name Qwen/Qwen2-VL-7B-Instruct --model-revision 51c47430f97dd7c74aa1fa6825e68a813478097f
    modal run no-ocr-llms/llm_serving_load_models.py --model-name vidore/colqwen2-v1.0-merged --model-revision 364a4f5df97231e233e15cbbaf0b9dbe352ba92c

    modal deploy no-ocr-llms/llm_serving.py
    modal deploy no-ocr-llms/llm_serving_colpali.py
    ```
- Create a `.env` file in the `no-ocr-api` directory
- Set the API's environment variables in that file.
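The UI's `.env` template ships with empty values, and an empty `VITE_*` variable is easy to miss until the UI fails at runtime. Below is a small stdlib checker that could be run before `npm run dev` or a Docker build. It is a hypothetical helper: `parse_env` and `missing_keys` are not part of the repository. The parser is deliberately simplified and handles only `KEY=value` lines with optional surrounding quotes, not the full dotenv syntax. The API's `.env` keys are not listed in this README, so the sketch checks only the three UI variables.

```python
REQUIRED_UI_KEYS = ("VITE_SUPABASE_URL", "VITE_SUPABASE_ANON_KEY", "VITE_REACT_APP_API_URI")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value / KEY="value" lines; skip blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(text: str, required: tuple = REQUIRED_UI_KEYS) -> list:
    # A key counts as missing if it is absent or set to an empty string.
    env = parse_env(text)
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    with open("no-ocr-ui/.env", encoding="utf-8") as fh:
        missing = missing_keys(fh.read())
    print("missing:", ", ".join(missing) if missing else "none")
```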

## Dev Installation

1. Clone the repository:
```bash
git clone https://github.com/kyryl-opens-ml/no-ocr
```

2. (API) Install dependencies:
```bash
cd no-ocr-api
pip install -r requirements.txt
```

2. (API) Run server:
```bash
cd no-ocr-api
fastapi dev api.py
```

4. (UI) Install dependencies:
```bash
cd no-ocr-ui
npm install
```
4. (UI) Run UI:
```bash
cd no-ocr-ui
npm run dev
```
5. (Qdrant) Run qdrant
```bash
docker run -p 6333:6333 qdrant/qdrant:v1.12.5
```
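Once the three processes are running, a quick TCP probe shows whether each one is listening. The ports come from this PR: `fastapi dev` serves on 8000 by default, which matches the API mapping in docker-compose. Qdrant uses 6333, per the `docker run` command. For the UI, the `VITE_` variable prefix and the 5173 mapping in docker-compose suggest a Vite dev server on its default port. `check_services` is a hypothetical helper, not part of the repository, and an open port only means *something* is listening there.

```python
import socket

# Assumed dev ports: API (fastapi dev), UI (Vite dev server), Qdrant REST.
DEV_SERVICES = {"api": 8000, "ui": 5173, "qdrant": 6333}


def check_services(host: str = "127.0.0.1", services: dict = DEV_SERVICES,
                   timeout: float = 0.5) -> dict:
    """Return {service_name: True if the port accepts a TCP connection}."""
    status = {}
    for name, port in services.items():
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising.
            status[name] = sock.connect_ex((host, port)) == 0
    return status


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:7} {'up' if up else 'DOWN'}")
```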
43 changes: 43 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,43 @@
version: '3.8'

services:
  ui:
    build:
      context: ./no-ocr-ui
      dockerfile: Dockerfile
      args:
        VITE_SUPABASE_URL: "https://cdazhclrvpqparjhcihs.supabase.co"
        VITE_SUPABASE_ANON_KEY: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6ImNkYXpoY2xydnBxcGFyamhjaWhzIiwicm9sZSI6ImFub24iLCJpYXQiOjE3MzQxMzE3MjMsImV4cCI6MjA0OTcwNzcyM30.Hl1CGJVLG0awBGtZXpNNYZfZ8VWWG31diffcQqbZozk"
        VITE_REACT_APP_API_URI: "http://localhost:8000"
    env_file:
      - ./no-ocr-ui/.env
    ports:
      - "5173:5173"
    depends_on:
      - api

  api:
    build:
      context: ./no-ocr-api
      dockerfile: Dockerfile
    env_file:
      - ./no-ocr-api/.env
    volumes:
      - api-storage:/app/storage
    ports:
      - "8000:8000"
    depends_on:
      - qdrant
    environment:
      QDRANT_HOST: "qdrant"

  qdrant:
    image: qdrant/qdrant:v1.12.5
    volumes:
      - qdrant-storage:/qdrant/storage
    ports:
      - "6333:6333"

volumes:
  api-storage:
  qdrant-storage:
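In the Compose setup, the API container reaches Qdrant through the service name `qdrant`, which Compose resolves on its internal network. That name is passed in as `QDRANT_HOST`. The dev installation instead runs Qdrant on `localhost:6333`. Below is a minimal sketch of how an API could serve both cases from one code path. It assumes, without confirmation from the diff, that no-ocr-api reads `QDRANT_HOST` and falls back to `localhost`. `qdrant_url` is a hypothetical helper.

```python
import os


def qdrant_url(env=None, default_host: str = "localhost", port: int = 6333) -> str:
    """Build the Qdrant REST URL from QDRANT_HOST, falling back to localhost."""
    env = os.environ if env is None else env
    # An unset or empty QDRANT_HOST means "running outside Compose".
    host = env.get("QDRANT_HOST") or default_host
    return f"http://{host}:{port}"


if __name__ == "__main__":
    print(qdrant_url({"QDRANT_HOST": "qdrant"}))  # http://qdrant:6333
    print(qdrant_url({}))                         # http://localhost:6333
```

Passing a mapping instead of always reading `os.environ` keeps the helper easy to test. The same host/port pair could then be handed to a client such as `qdrant_client.QdrantClient(host=..., port=...)`.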
Binary file added docs/architecture.png
Binary file added docs/create-case.png
Binary file added docs/flow.gif
Binary file added docs/search-case.png
Binary file removed docs/tldr.png