Commit 249cc8e (1 parent: d56ab2d)

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 77 changed files with 7,455 additions and 914 deletions.
@@ -0,0 +1,31 @@

```markdown
---
name: Bug Report
about: Create a report to help us improve
title: '[BUG] '
labels: bug
assignees: ''
---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Environment (please complete the following information):**
- OS: [e.g. iOS]
- Browser: [e.g. chrome, safari]
- Version: [e.g. 22]

**Additional context**
Add any other context about the problem here.
```
@@ -0,0 +1,19 @@

```markdown
---
name: Feature request
about: Suggest an idea for this project
title: '[FEATURE]'
labels: enhancement
assignees: ''
---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
```
@@ -0,0 +1,74 @@

```yaml
name: CI

on:
  push:
    branches:
      - main
      - no-ocr-dev
  pull_request:
    branches:
      - main

jobs:
  docker-build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:

      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}


      - name: Build and push docker image UI
        uses: docker/build-push-action@v6
        with:
          context: no-ocr-ui
          push: true
          tags: ghcr.io/kyryl-opens-ml/no-ocr-ui:latest
          build-args: |
            VITE_SUPABASE_URL=${{ secrets.VITE_SUPABASE_URL }}
            VITE_SUPABASE_ANON_KEY=${{ secrets.VITE_SUPABASE_ANON_KEY }}
            VITE_REACT_APP_API_URI=${{ secrets.VITE_REACT_APP_API_URI }}
          cache-from: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-ui:buildcache
          cache-to: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-ui:buildcache,mode=max

      - name: Build and push docker image API
        uses: docker/build-push-action@v6
        with:
          context: no-ocr-api
          push: true
          tags: ghcr.io/kyryl-opens-ml/no-ocr-api:latest
          cache-from: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-api:buildcache
          cache-to: type=registry,ref=ghcr.io/kyryl-opens-ml/no-ocr-api:buildcache,mode=max

  deploy:
    runs-on: ubuntu-latest
    needs: [docker-build]
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Install Railway
        run: rm -rf package-lock.json && npm i -g @railway/cli

      - name: Deploy UI
        run: railway redeploy --service no-ocr-ui --yes
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}

      - name: Deploy API
        run: railway redeploy --service no-ocr-api --yes
        env:
          RAILWAY_TOKEN: ${{ secrets.RAILWAY_TOKEN }}
```
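The `on:` block in this workflow triggers on pushes to `main` or `no-ocr-dev` and on pull requests whose base branch is `main`. Neither job has an `if:` condition, and `deploy` only waits for `docker-build` through `needs`. The sketch below is a simplified model of that trigger logic to make the consequence explicit; it assumes exact branch names only (no glob patterns, `paths` filters, or tag pushes), and the names `TRIGGERS`, `JOBS_IN_ORDER`, and `jobs_for_event` are hypothetical, not part of GitHub Actions.

```python
from typing import Dict, List, Set

# Simplified model of the workflow's branch filters (exact names only).
# For `push`, the filter is matched against the pushed branch;
# for `pull_request`, it is matched against the PR's base (target) branch.
TRIGGERS: Dict[str, Set[str]] = {
    "push": {"main", "no-ocr-dev"},
    "pull_request": {"main"},
}

# Neither job has an `if:` condition; `deploy` depends on `docker-build` via `needs`.
JOBS_IN_ORDER: List[str] = ["docker-build", "deploy"]


def jobs_for_event(event: str, branch: str) -> List[str]:
    """Return the jobs this workflow would run for an event, in dependency order.

    Hypothetical helper: `branch` is the pushed branch for `push` events and
    the base branch for `pull_request` events.
    """
    if branch in TRIGGERS.get(event, set()):
        return list(JOBS_IN_ORDER)
    return []


if __name__ == "__main__":
    print(jobs_for_event("push", "main"))          # ['docker-build', 'deploy']
    print(jobs_for_event("pull_request", "main"))  # ['docker-build', 'deploy']
    print(jobs_for_event("push", "feature-x"))     # []
```

Under this model, a pull request opened against `main` runs both jobs: it pushes the `:latest` images (`push: true`) and runs `railway redeploy` before the PR is merged. If that is not intended, adding `if: github.event_name == 'push'` to the `deploy` job is one common way to limit deployment to pushes.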
@@ -164,4 +164,7 @@ README.p.md

```text
colpali/
data/
.DS_Store

no-ocr-api/storage
example/
no-ocr-api/vllm_cache/
RDEV.md
```
@@ -1,55 +1,172 @@

````diff
-# Vision Retrieval
+# No OCR
 
-## Presentation
+A simple tool for exploring documents with AI, no fancy text extraction required. Just upload your files, then quickly search or ask questions about content across multiple collections.
 
-https://docs.google.com/presentation/d/1LY3KxUjuLAoCvKh9UyXtQupqiSTi_BFNUQ_Cdl6-o_g/edit#slide=id.g2f6ce2a35cc_0_103
+## Release blog with details
 
+Here is a blog with release details about this project: [No-OCR Product](https://kyrylai.com/2025/01/10/no-ocr-product/)
 
-## TLDR
+## Demo
 
-![alt text](./docs/tldr.png)
+Here's a quick GIF demonstrating the basic flow of using No OCR:
 
-## Setup
+![No OCR Flow](./docs/flow.gif)
 
-```
-pip install modal
-modal setup
-```
+> **Table of Contents**
+> 1. [Overview](#overview)
+> 2. [Key Features](#key-features)
+> 3. [Architecture](#architecture)
+> 4. [Flow](#flow)
+> 5. [Roadmap](#roadmap)
+> 6. [Prerequisites](#prerequisites)
+> 7. [Dev Installation](#dev-installation)
-## Run
+## Overview
 
-```
-modal run vision_retrieval/infra.py
-```
+The core purpose of "No OCR" is to simplify AI-based PDF processing:
+- Process and store PDF pages without relying on OCR.
+- Perform text and/or visual queries using modern embeddings.
+- Use open-source models for advanced question-answering on document-based diagrams, text, and more.
 
-## Deploy
+Key technologies:
+- React-based front end (no-ocr-ui) for uploading, managing, and searching documents.
+- Python-based API (no-ocr-api) that coordinates ingestion, indexing, and searching.
+- Qdrant for efficient vector search and retrieval.
+- ColPali & Qwen2-VL for inference tasks (both text- and vision-based).
 
+## Key Features
 
-```
-modal deploy vision_retrieval/infra.py
-```
+- Create and manage PDF/document collections, also referred to as "cases".
+- Automated ingestion to build Hugging Face-style datasets (HF_Dataset).
+- Vector-based search over PDF pages (and relevant images) in Qdrant.
+- Visual question-answering on images and diagrams via Qwen2-VL.
+- Deployable via Docker for both the backend (Python) and UI (React).
 
-## Develop
+## Architecture
 
-```
-modal shell vision_retrieval/infra.py
-```
+Below is a high-level workflow overview:
 
-## Orchestrate
+![Architecture](./docs/architecture.png)
 
+## Flow
 
-```
-pip install dagster dagster-webserver -U
-dagster dev -f vision_retrieval/pipeline.py -p 3000 -h 0.0.0.0
-```
+Create case:
 
+```mermaid
+sequenceDiagram
+    participant User
+    participant no-ocr-ui (CreateCase)
+    participant no-ocr-api
+    participant HF_Dataset
+    participant IngestClient
+    participant Qdrant
+    User->>no-ocr-ui (CreateCase): Upload PDFs & specify case name
+    no-ocr-ui (CreateCase)->>no-ocr-api: POST /create_case with PDFs
+    no-ocr-api->>no-ocr-api: Save PDFs to local storage
+    no-ocr-api->>no-ocr-api: Spawn background task (process_case)
+    no-ocr-api->>HF_Dataset: Convert PDFs to HF dataset
+    HF_Dataset-->>no-ocr-api: Return dataset
+    no-ocr-api->>IngestClient: Ingest dataset
+    IngestClient->>Qdrant: Create collection & upload points
+    Qdrant-->>IngestClient: Acknowledge ingestion
+    IngestClient-->>no-ocr-api: Done ingestion
+    no-ocr-api->>no-ocr-api: Mark case status as 'done'
+    no-ocr-api-->>no-ocr-ui (CreateCase): Return creation response
+    no-ocr-ui (CreateCase)-->>User: Display success message
+```
 
-## References
+Search:
 
+```mermaid
+sequenceDiagram
+    participant User
+    participant no-ocr-ui
+    participant SearchClient
+    participant Qdrant
+    participant HF_Dataset
+    participant VLLM
+    User->>no-ocr-ui: Enter search query and select case
+    no-ocr-ui->>SearchClient: Search images by text
+    SearchClient->>Qdrant: Query collection with text embedding
+    Qdrant-->>SearchClient: Return search results
+    SearchClient-->>no-ocr-ui: Provide search results
+    no-ocr-ui->>HF_Dataset: Load dataset for collection
+    HF_Dataset-->>no-ocr-ui: Return dataset
+    no-ocr-ui->>VLLM: Process images with VLLM
+    VLLM-->>no-ocr-ui: Return VLLM output
+    no-ocr-ui-->>User: Display search results and VLLM output
+```
 
-- [ColPali](https://arxiv.org/abs/2407.01449)
-- [LanceDB](https://lancedb.com/)
-- [ModalLab](https://modal.com/)
-- [Dagster](https://dagster.io/)
-- [Beyond Text: The Rise of Vision-Driven Document Retrieval for RAG](https://blog.vespa.ai/the-rise-of-vision-driven-document-retrieval-for-rag/)
-- [PDF Retrieval with Vision Language Models](https://blog.vespa.ai/retrieval-with-vision-language-models-colpali/)
+## Roadmap
+
+- Better models for reasoning and retrieval (72B and QVQ).
+- Agentic workflows - go beyond search toward a complete piece of work.
+- Training models per case - turn your workflow into a data moat and train unique models.
+- UI/UX improvement - simplify, simplify, simplify.
+
+
+## Prerequisites
+- Python 3.x
+- Node.js 18.x
+- Docker (optional for containerized deployments)
+- Supabase
+    - Create an account at https://app.supabase.io/
+    - Create a `.env` file in the `no-ocr-ui` directory
+    - Add the following variables to the `.env` file:
+      ```
+      VITE_SUPABASE_URL=""
+      VITE_SUPABASE_ANON_KEY=""
+      VITE_REACT_APP_API_URI=""
+      ```
+- Modal
+    - Create an account at https://modal.com/
+    - Deploy models:
+      ```bash
+      pip install modal
+      modal setup
+      modal run no-ocr-llms/llm_serving_load_models.py --model-name Qwen/Qwen2-VL-7B-Instruct --model-revision 51c47430f97dd7c74aa1fa6825e68a813478097f
+      modal run no-ocr-llms/llm_serving_load_models.py --model-name vidore/colqwen2-v1.0-merged --model-revision 364a4f5df97231e233e15cbbaf0b9dbe352ba92c
+      modal deploy no-ocr-llms/llm_serving.py
+      modal deploy no-ocr-llms/llm_serving_colpali.py
+      ```
+    - Create a `.env` file in the `no-ocr-api` directory
+    - Update the environment variables.
+## Dev Installation
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/kyryl-opens-ml/no-ocr
+   ```
+
+2. (API) Install dependencies:
+   ```bash
+   cd no-ocr-api
+   pip install -r requirements.txt
+   ```
+
+3. (API) Run server:
+   ```bash
+   cd no-ocr-api
+   fastapi dev api.py
+   ```
+
+4. (UI) Install dependencies:
+   ```bash
+   cd no-ocr-ui
+   npm install
+   ```
+5. (UI) Run UI:
+   ```bash
+   cd no-ocr-ui
+   npm run dev
+   ```
+6. (Qdrant) Run Qdrant:
+   ```bash
+   docker run -p 6333:6333 qdrant/qdrant:v1.12.5
+   ```
````
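The search flow diagram above queries Qdrant with a text embedding, and the prerequisites load `vidore/colqwen2-v1.0-merged`, a ColPali-family model. Models in this family emit one vector per query token and one per page patch, and the ColPali paper scores a page with late interaction (MaxSim): for each query vector take its best dot product over the page's vectors, then sum those maxima. The sketch below shows that scoring on toy vectors in plain Python. Whether `no-ocr-api` computes it itself or delegates it to Qdrant's multivector support is not stated in this commit and is left as an assumption; `maxsim_score` and `rank_pages` are hypothetical names.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction (MaxSim) score: for every query token vector, take its
    best match among the page's patch vectors, then sum those maxima."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: List[Vector],
    pages: Dict[str, List[Vector]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every page against the query and return the top_k (page_id, score) pairs, best first."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-d embeddings: a query with 2 token vectors, pages with 2 patch vectors each.
    query = [[1.0, 0.0], [0.0, 1.0]]
    pages = {
        "page-1": [[0.75, 0.25], [0.25, 0.5]],
        "page-2": [[0.5, 0.5], [0.25, 0.25]],
    }
    print(rank_pages(query, pages))  # [('page-1', 1.25), ('page-2', 1.0)]
```

The point of MaxSim over a single pooled vector is that each query token can match a different region of the page, which is what lets retrieval work on page images without an OCR text layer.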
@@ -0,0 +1,43 @@

```yaml
version: '3.8'

services:
  ui:
    build:
      context: ./no-ocr-ui
      dockerfile: Dockerfile
      args:
        VITE_SUPABASE_URL: "https://cdazhclrvpqparjhcihs.supabase.co"
        VITE_SUPABASE_ANON_KEY: "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6ImNkYXpoY2xydnBxcGFyamhjaWhzIiwicm9sZSI6ImFub24iLCJpYXQiOjE3MzQxMzE3MjMsImV4cCI6MjA0OTcwNzcyM30.Hl1CGJVLG0awBGtZXpNNYZfZ8VWWG31diffcQqbZozk"
        VITE_REACT_APP_API_URI: "http://localhost:8000"
    env_file:
      - ./no-ocr-ui/.env
    ports:
      - "5173:5173"
    depends_on:
      - api

  api:
    build:
      context: ./no-ocr-api
      dockerfile: Dockerfile
    env_file:
      - ./no-ocr-api/.env
    volumes:
      - api-storage:/app/storage
    ports:
      - "8000:8000"
    depends_on:
      - qdrant
    environment:
      QDRANT_HOST: "qdrant"

  qdrant:
    image: qdrant/qdrant:v1.12.5
    volumes:
      - qdrant-storage:/qdrant/storage
    ports:
      - "6333:6333"

volumes:
  api-storage:
  qdrant-storage:
```
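In this compose file the `api` service reaches Qdrant through its service name (`QDRANT_HOST: "qdrant"`), while the dev installation steps run Qdrant directly on the host with `docker run -p 6333:6333`. A minimal sketch of how a client could resolve either case, assuming the API reads `QDRANT_HOST` from the environment and falls back to `localhost` when it is unset; the fallback, the `QDRANT_PORT` variable, and the helper name `qdrant_url` are hypothetical, and only `QDRANT_HOST` and port 6333 come from the file above.

```python
import os
from typing import Mapping, Optional


def qdrant_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Build a Qdrant HTTP base URL from environment variables (hypothetical helper).

    Under docker-compose, QDRANT_HOST is "qdrant", the service name that Compose's
    internal DNS resolves to the qdrant container. Outside Compose it is assumed to
    be unset, so we fall back to localhost, where `docker run -p 6333:6333` publishes
    Qdrant's HTTP port. QDRANT_PORT is an assumed optional override.
    """
    env = os.environ if env is None else env
    host = env.get("QDRANT_HOST", "localhost")
    port = env.get("QDRANT_PORT", "6333")
    return f"http://{host}:{port}"


if __name__ == "__main__":
    print(qdrant_url({"QDRANT_HOST": "qdrant"}))  # http://qdrant:6333
    print(qdrant_url({}))                         # http://localhost:6333
```

Note that `depends_on` in this file only orders container startup. Compose does not wait for Qdrant to accept connections unless a healthcheck with `condition: service_healthy` is configured, so the API may need to retry its first connection.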