Allow to show tesseract and leptonica messages (easily) #348

Closed
zdenop opened this issue Mar 28, 2024 · 11 comments

@zdenop
Contributor

zdenop commented Mar 28, 2024

By default, messages from Leptonica and Tesseract are silenced.

However, when things do not work, it is useful to turn them on.
Is there a way to set custom Leptonica message severity?

For Tesseract I can use api.SetVariable('debug_file', 'tessocr.log'), but it behaves strangely. For example:

import tesserocr
from PIL import Image

image = Image.open('Arial.png')

with tesserocr.PyTessBaseAPI(oem=tesserocr.OEM.TESSERACT_ONLY) as api:
    api.SetImage(image)
    api.SetVariable('debug_file', 'tesserocr.debug1.log')
    first_result = api.GetUTF8Text()
    print(first_result)

with tesserocr.PyTessBaseAPI(oem=tesserocr.OEM.LSTM_ONLY) as api:
    api.SetImage(image)
    api.SetVariable('debug_file', 'tesserocr.debug2.log')
    second_result = api.GetUTF8Text()
    print(second_result)

The code above creates only tesserocr.debug1.log, but writes the same message to it twice (Estimating resolution as 487).

@sirfz
Owner

sirfz commented Apr 3, 2024

I guess we can expose a utility function to call leptonica's setMsgSeverity.

@sirfz
Owner

sirfz commented Apr 20, 2024

Just pushed the changes that expose a new function set_leptonica_log_level which allows setting Leptonica's message severity level.
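For reference, Leptonica's setMsgSeverity takes one of the L_SEVERITY_* integer constants defined in its environ.h header, and a message is printed only when its level is at or above the configured threshold. The sketch below mirrors those constants in Python to show the ordering. The `LeptSeverity` enum and `is_printed` helper are hypothetical names used only for illustration, and whether set_leptonica_log_level accepts these raw integers or a dedicated enum is an assumption to check against the tesserocr documentation.

```python
from enum import IntEnum


class LeptSeverity(IntEnum):
    """Illustrative mirror of Leptonica's L_SEVERITY_* constants (environ.h)."""
    EXTERNAL = 0  # take the threshold from the LEPT_MSG_SEVERITY environment variable
    ALL = 1       # lowest threshold: print every message
    DEBUG = 2
    INFO = 3
    WARNING = 4
    ERROR = 5
    NONE = 6      # highest threshold: print nothing


def is_printed(message_level, threshold):
    """Return True if a message at message_level passes the runtime threshold.

    This ignores Leptonica's separate compile-time minimum severity.
    """
    return message_level >= threshold


# Hypothetical usage; the argument type is an assumption, not confirmed here:
# import tesserocr
# tesserocr.set_leptonica_log_level(...)
```

With the threshold at WARNING, for example, error messages pass the check and informational messages do not.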

I also ran some tests setting the debug_file variable and saw the same issue you described. Once the variable is set on the first API instance, API instances initialized afterwards do not change it. In addition, the file itself isn't populated until I exit my IPython session and the buffer is flushed (which I would expect to happen on api.End() instead).

Frankly, I tried to experiment with possible changes to see if it would affect this behavior but nothing worked. This looks like an issue with tesseract's API and not the way we're wrapping it. Any thoughts?
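A possible workaround that does not depend on debug_file at all is to redirect the process-level stderr file descriptor around the OCR call. This is a generic OS-level technique, not something tesserocr provides, and it assumes the library messages are written to stderr (file descriptor 2). The `redirect_fd` name below is hypothetical, and any buffering done inside the C library is outside this sketch's control.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def redirect_fd(target_path, fd=2):
    """Temporarily point file descriptor `fd` (stderr by default) at target_path."""
    sys.stderr.flush()             # flush Python-level buffered output first
    saved = os.dup(fd)             # keep a copy of the original descriptor
    target = os.open(target_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.dup2(target, fd)        # writes to fd now land in the file
        yield
    finally:
        os.dup2(saved, fd)         # restore the original descriptor
        os.close(target)
        os.close(saved)


# Hypothetical usage, one log file per API instance:
# with redirect_fd('tesserocr.debug2.log'):
#     text = api.GetUTF8Text()
```

Because the redirection happens at the descriptor level, it also applies to output written by native code, which reassigning sys.stderr in Python would not capture.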

sirfz closed this as completed Apr 27, 2024
@asottile

asottile commented Jun 1, 2024

can we change the default back to silenced? this was quite a surprise: I saw a lot of (noisy) output after upgrading tesserocr that I was not seeing before!

@sirfz
Owner

sirfz commented Jun 3, 2024

You can api.SetVariable('debug_file', '/dev/null') to suppress these messages
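A minimal sketch of that suggestion, wrapped in a small helper so it can be applied to each API instance. The `silence_tesseract` name is hypothetical, and using os.devnull instead of a hard-coded '/dev/null' is an assumption made for portability; whether Tesseract handles the Windows null device the same way has not been verified here.

```python
import os


def silence_tesseract(api):
    """Point Tesseract's debug_file at the platform's null device.

    `api` is expected to be a tesserocr.PyTessBaseAPI instance (or any
    object with a compatible SetVariable method). Returns SetVariable's result.
    """
    return api.SetVariable('debug_file', os.devnull)


# Hypothetical usage:
# import tesserocr
# with tesserocr.PyTessBaseAPI() as api:
#     silence_tesseract(api)
#     api.SetImage(image)
#     print(api.GetUTF8Text())
```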

@asottile

asottile commented Jun 3, 2024

I understand I can do that, but the console spew is a regression from previous versions and I would prefer that it not spew by default

@sirfz
Owner

sirfz commented Jun 3, 2024

while I agree with you that it's an inconvenience, I think the current behavior should've been the default from the start. tesserocr is just a wrapper around tesseract and should minimize any mutation to tesseract's default behavior as much as possible

@asottile

asottile commented Jun 3, 2024

the tesseract cli doesn't spew output either so I don't see your point

@sirfz
Owner

sirfz commented Jun 3, 2024

the tesseract cli is a tool that calls into tesseract's API, you can code the same functionality using tesserocr for example (and include suppressing messages).

@asottile

asottile commented Jun 3, 2024

I cannot imagine anyone that would want the behavior you've changed it to to be the default. I would expect the api to have a sensible default (like it did). now every user of your library needs to add two more lines to restore the old desirable behavior.

I think the defaults of the tesseract cli should be a strong indicator that this change is undesirable. nobody wants an API that sends a bunch of junk output by default

@sirfz
Owner

sirfz commented Jun 3, 2024

Users of the C++ API would have to do exactly that. However, I agree that in the typical Python dev's use-case it's probably preferable to not have all this output (reminiscent of all the annoying messages spewed by torch/tensorflow for example). I'm not opposed to going back to disabling it, you're the first person to complain tho so I'm not sure people are as bothered. Perhaps you can open a separate issue about it and see if more people are interested

@zdenop
Contributor Author

zdenop commented Jun 4, 2024

Just a note: my original intention was to have an option to turn on messages, so that if anything goes wrong, the user/developer can look at the messages without recompiling tesserocr.
