A document processing service built on Docling that converts documents (PDF, DOCX, PPTX, images, and more) into structured text, with enrichments such as table extraction, formula recognition, and image analysis.
- PDF Documents - Advanced processing with OCR, table extraction, and layout analysis
- Microsoft Office - Word (DOCX), PowerPoint (PPTX), Excel (XLSX)
- Images - PNG, JPEG with vision-language model processing
- Text/Markup - Markdown, HTML, AsciiDoc
- Structured Data - CSV, JSON
- Scientific - USPTO Patents (XML), JATS XML
- Audio - Audio file processing
- OCR - Text extraction from images and scanned documents
- Table Structure - Intelligent table detection and extraction
- Formula Recognition - Mathematical formula detection and conversion
- Image Analysis - Picture classification and description using vision models
- Code Detection - Code block identification and extraction
- Layout Analysis - Document structure understanding
- Markdown - Clean, structured markdown output
- HTML - Rich HTML with preserved formatting
- JSON - Structured data with metadata
- Plain Text - Simple text extraction
Deploy as a serverless worker on RunPod for automatic scaling and GPU acceleration.
# Build and deploy
docker build --platform linux/amd64 -t your-registry/doc-processor .
docker push your-registry/doc-processor
Run as a standalone web service with REST API.
# Install dependencies
pip install -r requirements.txt
# Download models (first time only)
python preloader.py
# Start the service
python app.py
# Build
docker build -t doc-processor .
# Run FastAPI service
docker run -p 8000:8000 -e SERVICE=fastapi doc-processor
# Run RunPod handler
docker run -e SERVICE=runpod doc-processor
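import runpod

# Placeholders: use your own RunPod API key and endpoint ID
runpod.api_key = "your-runpod-api-key"
endpoint = runpod.Endpoint("your-endpoint-id")

# Submit job
job = endpoint.run({
    "input": {
        "document_url": "https://example.com/document.pdf"
    }
})

# Get result (waits for the job to finish, up to the timeout in seconds)
result = job.output(timeout=300)
print(result['content'])  # Processed document content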
# Health check
curl http://localhost:8000/
# Process document
curl -X POST "http://localhost:8000/process" \
-H "Authorization: Bearer your-api-key" \
-H "Content-Type: application/json" \
-d '{
"document_url": "https://example.com/document.pdf"
}'
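import asyncio
import httpx

async def main():
    # Using Python client; httpx defaults to a 5-second timeout,
    # which long documents can exceed, so raise it explicitly
    async with httpx.AsyncClient(timeout=300) as client:
        response = await client.post(
            "http://localhost:8000/process",
            headers={"Authorization": "Bearer your-api-key"},
            json={"document_url": "https://example.com/document.pdf"},
        )
        response.raise_for_status()  # Fail loudly on 4xx/5xx responses
        result = response.json()
        print(result['result']['content'])

asyncio.run(main())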
| Variable | Default | Description |
|---|---|---|
| DEVICE_CAPABILITY | high | Processing capability level (low, medium, high) |
| API_KEY | admin | API authentication key for FastAPI service |
| SERVICE | runpod | Service mode (runpod or fastapi) |
| WORKERS | 1 | Number of FastAPI workers |
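The variables above could be read at startup roughly as follows. This is a minimal sketch, not the service's actual code: the `Settings` dataclass and the `load_settings` function are hypothetical names, and only the variable names, defaults, and allowed values come from the table.

```python
import os
from dataclasses import dataclass

ALLOWED_CAPABILITIES = ("low", "medium", "high")
ALLOWED_SERVICES = ("runpod", "fastapi")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    device_capability: str
    api_key: str
    service: str
    workers: int


def load_settings(env=None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env

    capability = env.get("DEVICE_CAPABILITY", "high").lower()
    if capability not in ALLOWED_CAPABILITIES:
        raise ValueError(f"DEVICE_CAPABILITY must be one of {ALLOWED_CAPABILITIES}")

    service = env.get("SERVICE", "runpod").lower()
    if service not in ALLOWED_SERVICES:
        raise ValueError(f"SERVICE must be one of {ALLOWED_SERVICES}")

    workers = int(env.get("WORKERS", "1"))
    if workers < 1:
        raise ValueError("WORKERS must be at least 1")

    return Settings(capability, env.get("API_KEY", "admin"), service, workers)
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests. Since the default `API_KEY` is `admin`, set it explicitly before exposing the FastAPI service.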
Low:
- Basic OCR and table extraction
- Formula recognition disabled
- Image classification disabled
- Picture description disabled
- Best for: Simple text extraction, resource-constrained environments

Medium:
- Code and formula enrichment
- Advanced OCR and table structure
- Image processing disabled
- Best for: Most document types without heavy image analysis

High:
- All enrichments enabled
- Vision-language model for image description
- Advanced table analysis with cell matching
- High-resolution image generation
- Best for: Complete document understanding, research papers
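One way to picture the three `DEVICE_CAPABILITY` levels is as a lookup table of enrichment flags. The sketch below is illustrative only: the `ENRICHMENTS_BY_CAPABILITY` table and `enrichments_for` helper are hypothetical, the flag names are borrowed from the `enrichments_applied` field of the response, and the `code_enrichment` value for the low level is an assumption because the list above does not state it.

```python
ENRICHMENTS_BY_CAPABILITY = {
    "low": {
        "ocr": True,
        "table_structure": True,
        "code_enrichment": False,  # assumption: not stated for this level
        "formula_enrichment": False,
        "picture_classification": False,
        "picture_description": False,
    },
    "medium": {
        "ocr": True,
        "table_structure": True,
        "code_enrichment": True,
        "formula_enrichment": True,
        "picture_classification": False,
        "picture_description": False,
    },
    "high": {
        "ocr": True,
        "table_structure": True,
        "code_enrichment": True,
        "formula_enrichment": True,
        "picture_classification": True,
        "picture_description": True,
    },
}


def enrichments_for(capability: str) -> dict:
    """Return a copy of the enrichment flags for a capability level."""
    try:
        return dict(ENRICHMENTS_BY_CAPABILITY[capability.lower()])
    except KeyError:
        raise ValueError(f"unknown capability level: {capability!r}") from None
```

Returning a copy keeps callers from mutating the shared table by accident.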
{
  "content": "# Document Title\n\nProcessed content in markdown...",
  "metadata": {
    "source": "https://example.com/document.pdf",
    "filename": "document.pdf",
    "page_count": 10,
    "export_format": "markdown",
    "device_capability": "high",
    "enrichments_applied": {
      "code_enrichment": true,
      "formula_enrichment": true,
      "picture_classification": true,
      "picture_description": true,
      "table_structure": true,
      "ocr": true
    },
    "enrichment_stats": {
      "code_blocks": 5,
      "formulas": 12,
      "images": 8,
      "tables": 3
    }
  },
  "status": "success"
}
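A client might condense this response into a one-line summary for logging. The helper below is a hedged sketch: `summarize_response` is a hypothetical name, it relies only on the fields shown in the example above, and treating any status other than `success` as an error is an assumption, since no other status values are documented here.

```python
def summarize_response(response: dict) -> str:
    """Build a short, human-readable summary of a processing response."""
    status = response.get("status")
    if status != "success":
        # Assumption: anything other than "success" indicates a failure.
        raise ValueError(f"processing failed: status={status!r}")

    meta = response.get("metadata", {})
    # Names of the enrichments that were actually switched on.
    enabled = sorted(
        name for name, on in meta.get("enrichments_applied", {}).items() if on
    )
    stats = meta.get("enrichment_stats", {})
    counts = ", ".join(f"{key}={value}" for key, value in sorted(stats.items()))

    return (
        f"{meta.get('filename', '?')}: {meta.get('page_count', '?')} pages, "
        f"{len(response.get('content', ''))} chars of {meta.get('export_format', '?')}; "
        f"enrichments: {', '.join(enabled) or 'none'}; counts: {counts or 'none'}"
    )
```

Using `.get` with fallbacks keeps the summary usable even if a field is missing from a particular response.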
# Test with sample input
python handler.py
# Custom test input
echo '{"input": {"document_url": "your-url-here"}}' > test_input.json
python handler.py
# Download all models (required for first run)
python preloader.py
# Models are stored in ./models/ directory
# Includes: layout detection, table extraction, OCR, vision models
Core dependencies:
- docling - Document processing framework
- fastapi - Web framework for API service
- runpod - Serverless platform integration
- httpx - HTTP client for document downloading
- uvicorn - ASGI server
- Memory: 8GB RAM
- Storage: 10GB for models
- Python: 3.12+
- GPU: NVIDIA GPU with CUDA support
- VRAM: 8GB+ for full capability mode
- CUDA: 12.6+ (included in Docker image)
# Process arXiv paper
result = process_document("http://arxiv.org/pdf/1706.03762")
# Extracts: formulas, tables, figures, code snippets, references
# Process financial reports, contracts, presentations
result = process_document("https://company.com/annual-report.pdf")
# Extracts: structured tables, charts, key metrics
# Convert between formats while preserving structure
# PDF → Markdown, DOCX → HTML, etc.
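Format conversion usually comes down to calling a different export method on the converted document. The sketch below assumes the document object exposes Docling-style `export_to_markdown`, `export_to_html`, `export_to_dict`, and `export_to_text` methods; the `export_document` helper and the format keys are hypothetical and not taken from this service's code.

```python
import json

# Maps an output format name to the export method assumed to exist
# on the converted document object.
EXPORT_METHODS = {
    "markdown": "export_to_markdown",
    "html": "export_to_html",
    "json": "export_to_dict",
    "text": "export_to_text",
}


def export_document(document, export_format: str = "markdown") -> str:
    """Serialize a converted document in the requested output format."""
    try:
        method_name = EXPORT_METHODS[export_format.lower()]
    except KeyError:
        raise ValueError(f"unsupported export format: {export_format!r}") from None

    exported = getattr(document, method_name)()
    # The dict export is serialized to a JSON string; other exports are already text.
    return json.dumps(exported) if isinstance(exported, dict) else exported
```

With a real Docling conversion, the `document` argument would typically be the `document` attribute of the result returned by `DocumentConverter().convert(...)`.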
- Fork the repository
- Create a feature branch
- Make your changes
- Test with different document types
- Submit a pull request
This project is part of the larger application ecosystem. See the main repository for license information.