Error opening data file ".traineddata" when two languages defined and first has a tessedit_load_sublangs param #4002
Comments
Hello team, if there is any possibility of including such a small fix in the next release, that would be highly appreciated, as this bug causes a critical issue (a crash) in our usage scenario. Thank you so much in advance!
@stweil, I hope you can fix this regression.
...great to see it assigned, hopefully it can be resolved soon! :)
I don't see a |
It looks like commit 9091055 tried to fix the loading of sublangs, but broke it completely instead. There was no longer a warning message, but the sublangs were simply not loaded. This regression should affect 5.0.0-rc2 and all following releases, so I wonder how 5.0.0 could produce the warning.
The regression (and this issue here) should be fixed by pull request #4141.
Yes, I am using tessdata_best. I didn't mention that clearly, but it can be seen from the output log. Thanks!
Great if so! Is there any way to confirm that? ;)
I'm afraid the only way to confirm that currently is to use your own build of Tesseract with the patched code.
@AndrewG10i Now that 5.3.3 has been released, you should be able to verify the fix.
Thank you so much, guys! I will test it as soon as I can and reply back here!
Tested this issue and it is resolved now! Sorry for the delay with my reply! Thank you!
Basic Information
tesseract 5.3.0
leptonica-1.83.0
libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.4) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libwebp 1.2.4 : libopenjp2 2.5.0
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Operating System
RHEL 8
Other Operating System
CentOS 8 Stream x86_64 with all updates.
uname -a
Linux dev1.local 4.18.0-448.el8.x86_64 #1 SMP Wed Jan 18 15:02:46 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Compiler
N/A
Virtualization / Containers
VMWare Workstation 16
CPU
Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
Current Behavior
When tesseract is set to use two languages (e.g. -l chi_tra+eng) and the traineddata of the first language contains the tessedit_load_sublangs param (with a value other than 'eng'), the error notification "Error opening data file /local/tessData/tessdata_best-4.1.0/.traineddata" is shown as follows:
The sample file is not related to the issue, so any sample can be used.
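The path in the message ends in a bare .traineddata with no language name in front of it. That is what one would get if an empty language name reached the file-loading step. Here is a minimal Python sketch of that effect. It is not Tesseract's code: the function name traineddata_path and the list contents are hypothetical, and treating an empty name as the cause is an assumption.

```python
import posixpath

def traineddata_path(datadir, lang):
    """Build the data file path for one language name (hypothetical helper)."""
    return posixpath.join(datadir, lang + ".traineddata")

# Data directory taken from the error message in this report.
datadir = "/local/tessData/tessdata_best-4.1.0"

# Assumption: the list of languages to load somehow gained an empty entry
# while the sublangs of the first language were being processed.
languages_to_load = ["chi_tra", "", "eng"]

for lang in languages_to_load:
    # The empty entry yields ".../.traineddata", matching the reported path.
    print(traineddata_path(datadir, lang))
```

If this assumption holds, the fix would belong wherever the language list is assembled, not in the file-opening code itself.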
Expected Behavior
The command should execute without a warning, and the languages should load properly. The case above works fine (without the "Error opening data file" message) when the language order is swapped:
Suggested Fix
Unfortunately I am not a C guy, but it seems something is wrong around the following lines of code:
https://github.com/tesseract-ocr/tesseract/blob/5.3.0/src/ccmain/tessedit.cpp#L295
Other Information
I have noticed that there have been quite a few issues related to language loading lately (just for ref):
This issue comes from another issue: Tesseract crash after glibc update on linux (when two languages selected)
Tested with versions 5.2 and 5.3, and the issue reproduces in both; also tested with version 4.1.1, where it works properly.
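For anyone who wants to repeat the check on another build, a small Python sketch along these lines could compare both language orders. The helper names (build_command, has_load_error, check_order) are hypothetical. The sketch assumes a tesseract binary on PATH and relies only on the -l and --tessdata-dir options plus the error text quoted in this report.

```python
import subprocess

# Error text quoted in this report.
ERROR_MARKER = "Error opening data file"

def build_command(image, langs, tessdata_dir):
    """Assemble a tesseract call that writes the OCR text to stdout."""
    return ["tesseract", image, "stdout",
            "-l", "+".join(langs),
            "--tessdata-dir", tessdata_dir]

def has_load_error(stderr_text):
    """Return True if the captured stderr contains the data file error."""
    return ERROR_MARKER in stderr_text

def check_order(image, langs, tessdata_dir):
    """Run tesseract once and report whether the error message appeared."""
    result = subprocess.run(build_command(image, langs, tessdata_dir),
                            capture_output=True, text=True)
    return has_load_error(result.stderr)

# Example usage (requires tesseract and the traineddata files):
# for order in (["chi_tra", "eng"], ["eng", "chi_tra"]):
#     print(order, check_order("sample.png", order,
#                              "/local/tessData/tessdata_best-4.1.0"))
```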