a simple script that recognizes text from pictures in a .pdf file and forms a .txt file.

redd4ford/pdfs-to-txt

pdfs-to-txt

A simple script that recognizes text in the images of a .pdf file and writes it to a .txt file.

How to use

Set the following before running the script:

  • POPPLER_PATH: path to the \bin folder inside the Poppler installation folder
  • pytesseract.pytesseract.tesseract_cmd: path to tesseract.exe inside the Tesseract-OCR installation folder
  • PROJECT_PATH: path to the project folder (or any other folder where the generated .txt files should be written)
  • PDFS_SOURCE: path to the folder that contains the .pdf files to convert

Then install the packages listed in requirements.txt and run the script.
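
For orientation, here is a minimal sketch of how these four settings might fit together. It is not the repository's actual code. It assumes the script renders pages with pdf2image (which takes a Poppler path) and reads them with pytesseract. All path values are hypothetical placeholders, and the helper names txt_path_for, pdf_to_text and convert_all are invented for illustration.

```python
from pathlib import Path

# Hypothetical placeholder values; replace them with your own paths.
POPPLER_PATH = r"C:\path\to\poppler\bin"
TESSERACT_CMD = r"C:\path\to\Tesseract-OCR\tesseract.exe"
PROJECT_PATH = Path("output")   # where the generated .txt files go
PDFS_SOURCE = Path("pdfs")      # folder with the .pdf files to convert


def txt_path_for(pdf_path: Path, out_dir: Path) -> Path:
    """Map <anything>/name.pdf to <out_dir>/name.txt (illustrative helper)."""
    return out_dir / (pdf_path.stem + ".txt")


def pdf_to_text(pdf_path: Path) -> str:
    """Render each page to an image and OCR it (assumed approach)."""
    # Imported here so the path helper above works without the OCR stack.
    import pytesseract
    from pdf2image import convert_from_path

    # Point pytesseract at the Tesseract executable.
    pytesseract.pytesseract.tesseract_cmd = TESSERACT_CMD
    # pdf2image needs Poppler's bin folder to render the pages.
    pages = convert_from_path(str(pdf_path), poppler_path=POPPLER_PATH)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def convert_all(source: Path = PDFS_SOURCE, out_dir: Path = PROJECT_PATH) -> None:
    """Write one .txt file per .pdf found directly inside `source`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf_path in sorted(source.glob("*.pdf")):
        text = pdf_to_text(pdf_path)
        txt_path_for(pdf_path, out_dir).write_text(text, encoding="utf-8")
```

With real paths filled in, calling convert_all() would process every .pdf in PDFS_SOURCE. Keeping the OCR imports inside pdf_to_text is only a convenience of this sketch, so the path logic can be checked without Poppler or Tesseract installed.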
