Skip to content

neskie/python-tesseract

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Python Interface to Tesseract

PyTesseract creates a PythonType that uses TessBaseAPI to perform OCR on an
Image.  You can use any of the multitude of Python modules to open and convert
an image to a format that tesseract can read.

There is very little error checking, or getters and setters, but that can be
fixed in time.

= Installation =

python setup.py build
sudo python setup.py install

= Dependencies =

For this to build on Ubuntu you need

libiulib-dev
libtiff
tesseract-2.04

= Usage =

import tesseract

t = tesseract.Tesseract(lang="eng")

#Produced Text Output
t.filename="image1.tif"
t.mode = 0
t.get_text()

#Produced Box output
t.filename="image2.tif"
t.mode = 1
t.get_text()

Copyright 2009 Neskie Manuel.

Most of this code is mine, except for the part I lifted from
tesseractmain.cpp.

** Licensed under the Apache License, Version 2.0 (the "License");
** you may not use this file except in compliance with the License.
** You may obtain a copy of the License at
** http://www.apache.org/licenses/LICENSE-2.0
** Unless required by applicable law or agreed to in writing, software
** distributed under the License is distributed on an "AS IS" BASIS,
** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
** See the License for the specific language governing permissions and
** limitations under the License.
  

About

Python Tesseract

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published