pdfs-to-txt

A simple script that recognizes text in the page images of a .pdf file and writes it to a .txt file.

How to use

Clone this repository.
Install Poppler (a stable version for Windows).
Install Tesseract-OCR.
Set the following variables in the script:

POPPLER_PATH: path to the \bin folder inside the Poppler installation folder

pytesseract.pytesseract.tesseract_cmd: path to tesseract.exe inside the Tesseract-OCR installation folder

PROJECT_PATH: path to the project folder (or any other folder where the generated .txt files should be written)

PDFS_SOURCE: path to the folder that contains the .pdf files to be converted
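
As an illustration of the variables above, here is a minimal sketch of how they might be filled in on Windows. Every path shown is a placeholder, and TESSERACT_CMD, missing_paths, and apply_tesseract_cmd are hypothetical names introduced for this example only; the actual script may lay this out differently.

```python
from pathlib import Path

# Placeholder paths (assumptions): replace each one with your own location.
POPPLER_PATH = Path(r"C:\poppler\Library\bin")
TESSERACT_CMD = Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
PROJECT_PATH = Path(r"C:\projects\pdfs-to-txt")
PDFS_SOURCE = Path(r"C:\projects\pdfs-to-txt\pdfs")


def missing_paths(paths):
    """Return the names of configured paths that do not exist on disk."""
    return [name for name, path in paths.items() if not Path(path).exists()]


def apply_tesseract_cmd(cmd):
    """Point pytesseract at the Tesseract executable."""
    # Imported lazily so the path check above works before pytesseract is installed.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = str(cmd)


if __name__ == "__main__":
    problems = missing_paths({
        "POPPLER_PATH": POPPLER_PATH,
        "TESSERACT_CMD": TESSERACT_CMD,
        "PROJECT_PATH": PROJECT_PATH,
        "PDFS_SOURCE": PDFS_SOURCE,
    })
    if problems:
        print("Check these settings:", ", ".join(problems))
    else:
        apply_tesseract_cmd(TESSERACT_CMD)
```

Checking the paths up front is optional, but it tends to give a clearer message than the errors raised later when Poppler or Tesseract cannot be found.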

Install the dependencies listed in requirements.txt (for example, with pip install -r requirements.txt) and run the script.
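
For orientation, the conversion the script performs can be sketched roughly as follows. This is not the code from pdfs-to-txt.py: it assumes the PDF pages are rendered with pdf2image (a common Poppler wrapper, which the POPPLER_PATH setting suggests) and recognized with pytesseract, and the function names txt_path_for, pdf_to_text, and convert_folder are hypothetical.

```python
from pathlib import Path


def txt_path_for(pdf_path, output_dir):
    """Map a PDF path to a .txt path with the same base name in output_dir."""
    return Path(output_dir) / (Path(pdf_path).stem + ".txt")


def pdf_to_text(pdf_path, poppler_path=None):
    """Render each page to an image with Poppler, then OCR it with Tesseract."""
    # Imported here so the path helper above works without the OCR dependencies.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(str(pdf_path), poppler_path=poppler_path)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def convert_folder(pdfs_source, output_dir, poppler_path=None):
    """Write one .txt file per .pdf found in pdfs_source; return the written paths."""
    written = []
    for pdf_path in sorted(Path(pdfs_source).glob("*.pdf")):
        target = txt_path_for(pdf_path, output_dir)
        target.write_text(pdf_to_text(pdf_path, poppler_path), encoding="utf-8")
        written.append(target)
    return written
```

With the variables from the previous section, a call such as convert_folder(PDFS_SOURCE, PROJECT_PATH, poppler_path=str(POPPLER_PATH)) would produce one .txt file per input .pdf, provided pytesseract.pytesseract.tesseract_cmd has been set first.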

redd4ford/pdfs-to-txt
