Typhoon OCR is a model for extracting structured markdown from images or PDFs. It supports document layout analysis and table extraction, returning results in markdown or HTML. This repo provides a simple Gradio website to demonstrate Typhoon OCR.
This fork focuses on Windows 10/11. For macOS/Linux setup, please refer to the official Typhoon OCR repository.
- Upload a PDF or image
- Multi-page selection for PDFs: all pages, a page range, or a single page
- Preview gallery for selected pages
- Progress bar during OCR
- Reconstructs content as Markdown (tables as Markdown in
default
, as HTML instructure
) - Two prompt modes:
default
,structure
- Languages: English, Thai
- Uses a remote OpenAI-compatible API (e.g., opentyphoon.ai) or a local endpoint (e.g., vLLM on WSL2)
- Windows 10/11 with Python 3.10+
pip install typhoon-ocr
# or to run the demo app:
pip install -r requirements.txt
Create a .env
file in the project root:
# OpenAI-compatible endpoint (remote provider or local vLLM on WSL2)
TYPHOON_BASE_URL=https://api.opentyphoon.ai/v1
# API key for that endpoint
TYPHOON_API_KEY=YOUR_KEY
# Model name exposed by that endpoint
TYPHOON_OCR_MODEL=typhoon-ocr
Running vLLM in WSL2: set
TYPHOON_BASE_URL=http://127.0.0.1:8101/v1
andTYPHOON_API_KEY=dummy
.
Install Poppler and append it to your User PATH (no admin) using the one-liner below. This provides the required pdfinfo
and pdftoppm
tools.
iwr -useb https://github.com/oschwartz10612/poppler-windows/releases/download/v25.07.0-0/Release-25.07.0-0.zip -OutFile $env:TEMP\poppler.zip; rm C:\poppler -Recurse -Force -ErrorAction SilentlyContinue; Expand-Archive $env:TEMP\poppler.zip C:\poppler -Force; $bin=(Get-ChildItem C:\poppler -Recurse -Filter pdfinfo.exe | Select-Object -First 1).DirectoryName; if(-not $bin){throw "pdfinfo.exe not found under C:\poppler"}; $u=[Environment]::GetEnvironmentVariable('Path','User'); if([string]::IsNullOrEmpty($u)){$u=''}; if($u -notlike "*$bin*"){[Environment]::SetEnvironmentVariable('Path', ($u.TrimEnd(';')+';'+$bin).Trim(';'), 'User')}; $env:Path+=';'+$bin; pdfinfo -v
Verify:
pdfinfo -v
pdftoppm -v
python app.py
Local inference with vLLM is not supported natively on Windows. Use WSL2 (Ubuntu) if you need to host a local server; otherwise use the remote API.
On Windows, run this inside WSL2 (Ubuntu). Native Windows is not supported.
vllm serve scb10x/typhoon-ocr-7b --served-model-name typhoon-ocr --dtype bfloat16 --port 8101
- Multi-page OCR for PDFs (all / range / single) with a gallery preview and progress bar
- Robust output parsing (falls back to raw text when JSON parsing fails)
- Uses
TYPHOON_BASE_URL
,TYPHOON_API_KEY
,TYPHOON_OCR_MODEL
env vars - Adds
pypdf
for page counting
- Add multi-page PDF workflow (all / range / single) with gallery preview and progress bar
- More robust output parsing: fall back to raw text if JSON parsing fails
- Documented and standardized env vars:
TYPHOON_BASE_URL
,TYPHOON_API_KEY
,TYPHOON_OCR_MODEL
- Breaking: replace
MODEL
withTYPHOON_OCR_MODEL
- Breaking: replace
- Add
pypdf
for page counting - UI/UX: tabs (File/Parameters), Thai-first copy, scrollable output, small CSS tweaks
- Docs: Windows-only focus, TH/EN badges, Poppler one-liner (without
setx
truncation), clarified vLLM via WSL2
- openai
- python-dotenv
- ftfy
- pypdf
- gradio
- pillow
- vllm (for hosting an inference server on WSL2; not native on Windows)
- If
Error processing document
occurs on Windows, ensure Poppler is installed and onPATH
(see Windows Setup above) and verify with:pdfinfo -v pdftoppm -v
This project is licensed under the Apache 2.0 License. See individual datasets and checkpoints for their respective licenses.