GetTextDirection + MapWordConfidences crash python #324

Open
NewUserHa opened this issue Oct 3, 2023 · 2 comments

NewUserHa commented Oct 3, 2023

import tesserocr
from PIL import Image
image = Image.new('RGB', (100, 100), 255)
with tesserocr.PyTessBaseAPI(r"C:\Program Files\Tesseract-OCR\tessdata", 'chi_sim', 10) as api:
    print(api.SetImage(image))
    print(api.GetTextDirection())  # commenting this line out prevents the crash
    print(api.MapWordConfidences())

output:

None
(0, -0.0)
best_choice != nullptr:Error:Assert failed:in file C:\projects\tesserocr-windows-build\tesseract\src\ccmain\ltrresultiterator.cpp, line 51

additional:
Even if the image contains a character and GetUTF8Text() returns the recognized text, MapWordConfidences() still returns an empty list [] when DetectOS() is called before it.
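The additional case above comes without a snippet, so here is a minimal sketch of the call sequence it describes. The helper name `detect_os_then_confidences` is hypothetical, and the exact position of GetUTF8Text() relative to DetectOS() is an assumption; the report only says that DetectOS() runs before MapWordConfidences().

```python
def detect_os_then_confidences(api, image):
    """Replay the reported call order and return what each step produced.

    `api` is expected to be an open tesserocr.PyTessBaseAPI instance
    (assumption: any object exposing the same four methods also works).
    """
    api.SetImage(image)
    os_info = api.DetectOS()        # the call that precedes the empty result
    text = api.GetUTF8Text()        # reported to still return the recognized text
    confidences = list(api.MapWordConfidences())  # reported to come back as []
    return {"os_info": os_info, "text": text, "confidences": confidences}
```

To try it, open the API the same way as in the snippet above (`with tesserocr.PyTessBaseAPI(...) as api:`) and pass a PIL image that contains a character. Comparing the output against a run with the DetectOS() line removed should show whether that call is what empties the list.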

version:
tesseract 5.3.1
leptonica-1.83.1 (Jun 13 2023, 19:19:21) [MSC v.1935 LIB Release x64]
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.4) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libwebp 1.3.0 : libopenjp2 2.5.0
python 3.11.2

NewUserHa changed the title from "crash python" to "GetTextDirection + MapWordConfidences crash python" on Oct 3, 2023

sirfz (Owner) commented Oct 4, 2023

The error in your output comes from tesseract, not tesserocr. It's worth noting, though, that Recognize() should be called before AllWords(); otherwise it will always return an empty list (according to the method's docstring).

NewUserHa (Author) commented Oct 4, 2023

Adding api.Recognize() makes no difference; the result is still the same.

import tesserocr
from PIL import Image
image = Image.new('RGB', (100, 100), 255)
with tesserocr.PyTessBaseAPI(r"C:\Program Files\Tesseract-OCR\tessdata", 'chi_sim', 10) as api:
    print(api.SetImage(image))
    api.Recognize()
    print(api.GetTextDirection())  # commenting this line out prevents the crash
    print(api.MapWordConfidences())

The crash reason and error message are the same.
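Because the assert fires in native code and takes the whole interpreter down with it, it may help anyone reproducing this to run the snippet in a child process and capture its output. The following is only a sketch: `run_isolated` and `REPRO` are hypothetical names, and the tessdata path is the one from the report.

```python
import subprocess
import sys


def run_isolated(source, timeout=60):
    """Run Python source in a child interpreter.

    Returns (returncode, stdout, stderr) so a native crash in the child
    does not terminate the calling process.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout, proc.stderr


# The call sequence from the report, kept as a string so it runs in the child.
REPRO = r'''
import tesserocr
from PIL import Image
image = Image.new('RGB', (100, 100), 255)
with tesserocr.PyTessBaseAPI(r"C:\Program Files\Tesseract-OCR\tessdata", 'chi_sim', 10) as api:
    api.SetImage(image)
    api.Recognize()
    print(api.GetTextDirection())
    print(api.MapWordConfidences())
'''
```

Calling `run_isolated(REPRO)` on the affected machine should give a nonzero return code when the child crashes, with whatever the child wrote before dying available in the returned stdout and stderr strings.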
