tesserocr 2.7.0 is much noisier than previous versions #353

Open
asottile opened this issue Jul 8, 2024 · 0 comments

asottile commented Jul 8, 2024

discussed a bit in #348; here's the separate issue

some minimal code to reproduce the problem:

```python
import numpy
import tessdata
import tesserocr

# 20x20 all-black, 8-bit grayscale image (1 byte per pixel)
img = numpy.zeros((20, 20), dtype=numpy.uint8)

tessapi = tesserocr.PyTessBaseAPI(
    tessdata.data_path(),
    'eng',
    psm=tesserocr.PSM.SINGLE_LINE,
)

tessapi.SetImageBytes(
    img.tobytes(),
    width=img.shape[1],
    height=img.shape[0],
    bytes_per_pixel=1,
    bytes_per_line=img.shape[1],  # row stride = width * bytes_per_pixel
)
tessapi.GetUTF8Text()
```

prior to 2.7.0 I get no output, but with 2.7.0 I get some pretty annoying terminal spam by default:

```console
$ python3 t.py
Bottom=0, top=20, base=0, x=0

Total count=0
Min=0.00 Really=0
Lower quartile=0.00
Median=0.00, ile(0.5)=0.00
Upper quartile=0.00
Max=0.00 Really=0
Range=1
Mean= 0.00
SD= 0.00
```

I also sometimes get this on shutdown in a different program, but I haven't been able to make a minimal reproduction of that:

```
ObjectCache(0x7f12443ae1e0)::~ObjectCache(): WARNING! LEAK! object 0x5634c6dbf0e0 still has count 1 (id .../venv/share/tessdata/eng.traineddatalstm-punc-dawg)
ObjectCache(0x7f12443ae1e0)::~ObjectCache(): WARNING! LEAK! object 0x5634c6e35e40 still has count 1 (id .../venv/share/tessdata/eng.traineddatalstm-word-dawg)
ObjectCache(0x7f12443ae1e0)::~ObjectCache(): WARNING! LEAK! object 0x5634c6e579d0 still has count 1 (id .../venv/share/tessdata/eng.traineddatalstm-number-dawg)
```

I don't think either of these outputs should be shown by default; I prefer the 2.6.3 behaviour.

there is a workaround here, though ideally I wouldn't need to add it to every program that uses tesserocr just to silence some debug-level messages
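A version-independent way to get the quiet 2.6.3-style behaviour is to redirect the process-level stderr file descriptor around the noisy calls. This is a minimal sketch, assuming Tesseract writes these messages to file descriptor 2 (stderr), which matches what shows up in the terminal but is not confirmed by tesserocr; `redirect_fd` and `silence_stderr_at_exit` are hypothetical helper names, not part of tesserocr. The atexit helper additionally assumes the `ObjectCache` leak warnings are printed by C++ destructors that run after Python's atexit handlers.

```python
import atexit
import contextlib
import os
import sys


@contextlib.contextmanager
def redirect_fd(fd=2, target=os.devnull):
    """Temporarily point OS-level file descriptor `fd` at the file `target`.

    This swaps the descriptor itself (not sys.stderr), so text printed by
    C/C++ code inside the process is redirected as well.
    """
    # Flush Python-level buffers so earlier output isn't swallowed.
    sys.stdout.flush()
    sys.stderr.flush()
    saved = os.dup(fd)  # keep a copy of the original descriptor
    target_fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
    try:
        os.dup2(target_fd, fd)
        yield
    finally:
        # Flush anything Python buffered while redirected, then restore.
        sys.stdout.flush()
        sys.stderr.flush()
        os.dup2(saved, fd)
        os.close(target_fd)
        os.close(saved)


def silence_stderr_at_exit():
    """Hypothetical helper: leave fd 2 pointed at os.devnull during shutdown."""

    def _silence():
        sys.stderr.flush()
        devnull = os.open(os.devnull, os.O_WRONLY)
        os.dup2(devnull, 2)
        os.close(devnull)

    atexit.register(_silence)
```

Usage would look like `with redirect_fd(): text = tessapi.GetUTF8Text()`. Redirecting the descriptor rather than reassigning `sys.stderr` matters because the messages come from native code that never sees Python's `sys.stderr` object. The trade-off is that any genuine error text written to stderr inside the block is hidden too.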

I tried using set_leptonica_log_level to silence this, but as far as I can tell it's not possible to call this function?

```pycon
>>> import tesserocr
>>> tesserocr.set_leptonica_log_level(tesserocr.LeptLogLevel.ERROR)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Argument 'level' has incorrect type (expected tesserocr.tesserocr.LeptLogLevel, got int)
```

and even if that function worked, I would then have to require my users to upgrade to tesserocr>=2.7.0
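To avoid a hard `tesserocr>=2.7.0` requirement, a caller could feature-detect the function and treat the `TypeError` above as "not available". This is a hedged sketch: `try_quiet_leptonica` is a hypothetical helper, it takes the module as a parameter only so it can be exercised without tesserocr installed, and catching `TypeError` broadly could also mask unrelated type errors. It is also unclear whether the Leptonica log level affects these particular messages at all, since they look like Tesseract's own debug output.

```python
def try_quiet_leptonica(module):
    """Best-effort call to module.set_leptonica_log_level(LeptLogLevel.ERROR).

    Returns True if the call succeeded, False if the function or enum is
    missing (older tesserocr) or the call raises the TypeError seen in 2.7.0.
    """
    setter = getattr(module, "set_leptonica_log_level", None)
    levels = getattr(module, "LeptLogLevel", None)
    if setter is None or levels is None or not hasattr(levels, "ERROR"):
        return False  # function not present in this tesserocr version
    try:
        setter(levels.ERROR)
    except TypeError:
        return False  # the argument-type bug reported above
    return True
```

Usage would be `import tesserocr; try_quiet_leptonica(tesserocr)`, which simply returns `False` on 2.6.3.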

sirfz added the bug label Jul 8, 2024