OCR preprocessing app, walkthrough, and demo

Explore the commonly overlooked pre-processing steps that help make Optical Character Recognition (OCR) models work properly in practice.

This repository contains code, a walkthrough notebook (ocr_preprocessing_walkthrough.ipynb), and a Streamlit demo app for experimenting with common OCR pre-processing steps and seeing their effect on OCR quality.

All processing, from the various pre-processing steps to the OCR itself (here using the popular, classic Tesseract model), is performed locally.
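As an illustration of what a pre-processing step can look like, here is a minimal sketch of binarization with Otsu's method, a step often applied before OCR. This is an assumption for illustration only: the README does not list which steps the app implements, and the function names `otsu_threshold` and `binarize` are hypothetical, not taken from this repository. The sketch uses plain Python on a flat list of 8-bit grayscale values so it runs without extra dependencies.

```python
def otsu_threshold(pixels):
    """Return an Otsu threshold for a flat list of 8-bit grayscale values.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Histogram of intensities 0..255.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of intensities of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is at or below t
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor).
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels):
    """Map each pixel to 0 or 255 using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    # Three dark pixels (text) and three bright pixels (background).
    sample = [10, 12, 11, 200, 210, 205]
    print(otsu_threshold(sample))
    print(binarize(sample))
```

In practice an image library would supply the pixel data and usually its own thresholding routine; the point here is only to show the kind of transformation a pre-processing step performs before the text is handed to the OCR model.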

Installation instructions

To create a handy tool for your own memes, clone the repo and install the requirements file:

pip install -r requirements.txt

Starting the Streamlit app

Start the Streamlit app by pasting the following into your terminal:

python -m streamlit run ocr/app.py

OCR your own images

Note: you can drag and drop any image directly into the Streamlit app and experiment with how the pre-processing steps affect the final OCR output.
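If you want to put a number on how a pre-processing step changes the OCR output, one common option is the character error rate (CER): the edit distance between the OCR text and a known-correct transcription, divided by the length of the transcription. The sketch below is an assumption for illustration; the README does not say the app computes this, and the names `levenshtein` and `character_error_rate` are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """CER = edit distance / length of the reference transcription."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "hello world"
    # Hypothetical OCR outputs before and after a pre-processing step.
    print(character_error_rate(truth, "hel1o w0rld"))
    print(character_error_rate(truth, "hello world"))
```

A lower CER after enabling a step suggests the step helped for that image; comparing across several images gives a more reliable picture than a single example.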

Repository: neonwatty/ocr_preprocessing

Folders and files

Latest commit

History

Repository files navigation

OCR preprocessing app, walkthrough, and demo

Installation instructions

Starting the streamlit app

Ocr your own images

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages