getConfidenceBySymbol crashes for some images #56
Comments
I was able to reproduce it after adding:
The "save_blob_choices" parameter appears to be related to the ChoiceIterator (I haven't looked beyond a cursory glance). The implemented solution for getConfidenceBySymbol, however, relies only on the ResultIterator. I understand that setting "save_blob_choices" to "T" shouldn't cause the ResultIterator, or anything else for that matter, to crash. As a workaround, if what you're looking for is strictly the confidence values of the recognized characters, have you tried running without this parameter? For what it's worth, a C++ implementation ran without error on this image using Tesseract 3.03. I haven't been able to test the same program against version 3.02.02 (which is what Tesseract-OCR-iOS tries to build); that would at least help in determining where the issue lies.
Thanks for your response, @ko. If I don't set the "save_blob_choices" parameter to True, getConfidenceBySymbol returns the same confidence for every character within a given word recognized by Tesseract. For example, here are the confidences returned by running the Template Framework Project (HEAD commit) on the included image_sample.jpg WITHOUT "save_blob_choices" set to True:
Now here are the confidences returned for image_sample.jpg WITH "save_blob_choices" set to True:
For my particular application, there isn't really a concept of "words", as I am just trying to OCR random character strings, so the word confidence values aren't useful to me. :( Could you try running the C++ program you mentioned under the version of Tesseract that this iOS library uses (3.02.02) to try to find a better error message than Xcode's BAD ACCESS? Thanks so much for your help!
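The word-level versus symbol-level behaviour described in this comment can be checked mechanically. The following is a minimal sketch, not code from Tesseract or Tesseract-OCR-iOS: the `SymbolConfidence` struct and the `looksLikeWordLevelConfidence` function are hypothetical names introduced for illustration. It assumes the per-character results have already been collected in reading order, so that symbols of the same word are adjacent.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one recognized character. This is NOT a type from
// Tesseract or Tesseract-OCR-iOS; it only models the values discussed above.
struct SymbolConfidence {
    int wordIndex;      // index of the word the symbol belongs to
    std::string text;   // UTF-8 text of the symbol
    float confidence;   // confidence reported for the symbol
};

// Returns true when at least one word has several symbols and every pair of
// adjacent symbols within the same word reports the same confidence, i.e. the
// values look like word-level confidences repeated for each character.
// Assumes `symbols` is in reading order (same-word symbols are adjacent).
bool looksLikeWordLevelConfidence(const std::vector<SymbolConfidence>& symbols,
                                  float epsilon = 1e-4f) {
    bool sawMultiSymbolWord = false;
    for (std::size_t i = 1; i < symbols.size(); ++i) {
        if (symbols[i].wordIndex != symbols[i - 1].wordIndex) {
            continue;  // word boundary: nothing to compare
        }
        sawMultiSymbolWord = true;
        if (std::fabs(symbols[i].confidence - symbols[i - 1].confidence) > epsilon) {
            return false;  // confidence varies inside a word
        }
    }
    return sawMultiSymbolWord;
}
```

A result of true would suggest that the reported values are effectively per-word, as observed without "save_blob_choices"; a result of false means at least one word carries distinct per-character values. Input made only of single-character words yields false, since there is nothing to compare.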
Ah, that is interesting; I didn't expect "save_blob_choices" to have that effect. I tested Tesseract 3.02.02 with Leptonica 1.69, both installed via Homebrew, and it appears to have worked without error. That points to something on the iOS side. I started taking things out and found that removing the blackAndWhite call appears to stop the crash with the sample image provided.
My results:
Yeah... it's yet another workaround. I haven't been able to dig into the "blackAndWhite" or "grayScale" functions defined in "UIImage+Filters.m" yet, though.
That's interesting. The good news is that I was able to resolve my issue by updating the Tesseract and Leptonica libs for this project to 3.03RC and 1.70, respectively. I've submitted this pull request, so hopefully it will be useful for others. Thanks again for your help, @ko!
Nice! Glad that moving the libraries up to 3.03RC/1.70 worked out. I do find it odd, though, that 3.02.02/1.69 would fail to handle the UIImage resulting from blackAndWhite (and grayScale); I suppose with over a year between 3.02.02 and 3.03RC, there are bound to be some bug and stability fixes.
Using getConfidenceBySymbol to print out the confidence values for each character recognized works for the image_sample.jpg file included in the Template Framework Project, but try swapping it with the following image and it crashes every time: (I know this isn't an ideal image for Tesseract, but it still shouldn't cause it to crash.)
I really need to be able to use the symbol confidence values, so I really appreciate any ideas this repo's maintainers have about how to fix this issue.
Here's the code I am using to print out the confidence values:
Here's the commit I added to my fork on top of HEAD to produce this crash, if you guys want to check it out for yourselves: https://github.com/kevincon/Tesseract-OCR-iOS/commit/4bcf1ac4d2c49872381aadb867e2ce6878edd8d6
Also, +@ko since he added getConfidenceBySymbol in a pull request and might have some ideas about why it crashes for some images.