Segfault on using -psm 0 when using fast eng.traineddata #1167
Comments
eng is the default language even when you use --psm 0. The OSD module is based on the legacy engine, whereas the 'fast' and 'best' traineddata files were trained for LSTM only.
Try this:
|
The explicit Although it shouldn't crash even if the arguments are invalid.
Suggested by amitdo in tesseract-ocr/tesseract#1167
Here's a more complete stack trace from issue #1258, built from commit 000d027 with debugging enabled:
#0 tesseract::Classify::CharNormClassifier (this=0x7ffff7fd1010, blob=0x5d3f080, sample=...,
adapt_results=0x5cc7630) at adaptmatch.cpp:1349
#1 0x00007ffff77052a8 in tesseract::Classify::DoAdaptiveMatch (this=0x7ffff7fd1010, Blob=0x5d3f080,
Results=0x5cc7630) at adaptmatch.cpp:1581
#2 0x00007ffff76fff89 in tesseract::Classify::AdaptiveClassifier (this=0x7ffff7fd1010, Blob=0x5d3f080,
Choices=0x7fffffffc0d0) at adaptmatch.cpp:192
#3 0x00007ffff75feb32 in os_detect_blob (bbox=0x5cd6870, o=0x7fffffffc170, s=0x7fffffffc180,
osr=0x7fffffffcc50, tess=0x7ffff7fd1010) at osdetect.cpp:354
#4 0x00007ffff75fe756 in os_detect_blobs (allowed_scripts=0x0, blob_list=0x7fffffffca00,
osr=0x7fffffffcc50, tess=0x7ffff7fd1010) at osdetect.cpp:305
#5 0x00007ffff75fe490 in os_detect (port_blocks=0x7fffffffcb90, osr=0x7fffffffcc50, tess=0x7ffff7fd1010)
at osdetect.cpp:264
#6 0x00007ffff75fe0b1 in orientation_and_script_detection (filename=..., osr=0x7fffffffcc50,
tess=0x7ffff7fd1010) at osdetect.cpp:225
#7 0x00007ffff75c047e in tesseract::TessBaseAPI::DetectOS (this=0x607360 <main::api>, osr=0x7fffffffcc50)
at baseapi.cpp:2382
#8 0x00007ffff75be7db in tesseract::TessBaseAPI::DetectOrientationScript (this=0x607360 <main::api>,
orient_deg=0x7fffffffd420, orient_conf=0x7fffffffd424, script_name=0x7fffffffd438,
script_conf=0x7fffffffd428) at baseapi.cpp:1896
#9 0x00007ffff75be8fd in tesseract::TessBaseAPI::GetOsdText (this=0x607360 <main::api>, page_number=0)
at baseapi.cpp:1928
#10 0x00007ffff75cb8bc in tesseract::TessOsdRenderer::AddImageHandler (this=0x81c890,
api=0x607360 <main::api>) at renderer.cpp:268
#11 0x00007ffff75cafe5 in tesseract::TessResultRenderer::AddImage (this=0x81c890, api=0x607360 <main::api>)
at renderer.cpp:86
#12 0x00007ffff75bbd11 in tesseract::TessBaseAPI::ProcessPage (this=0x607360 <main::api>, pix=0x40d8140,
page_index=0, filename=0x7fffffffde53 "/tmp/com.github.ocrmypdf.ec7wbvyw/000001.ocr.png",
retry_config=0x0, timeout_millisec=0, renderer=0x81c890) at baseapi.cpp:1224
#13 0x00007ffff75bb973 in tesseract::TessBaseAPI::ProcessPagesInternal (this=0x607360 <main::api>,
filename=0x7fffffffde53 "/tmp/com.github.ocrmypdf.ec7wbvyw/000001.ocr.png", retry_config=0x0,
timeout_millisec=0, renderer=0x81c890) at baseapi.cpp:1156
#14 0x00007ffff75bb385 in tesseract::TessBaseAPI::ProcessPages (this=0x607360 <main::api>,
filename=0x7fffffffde53 "/tmp/com.github.ocrmypdf.ec7wbvyw/000001.ocr.png", retry_config=0x0,
timeout_millisec=0, renderer=0x81c890) at baseapi.cpp:1056
#15 0x0000000000403ae6 in main (argc=11, argv=0x7fffffffda28) at tesseractmain.cpp:529
@stweil Should the program use -l osd by default internally for 4.0.0 when --psm 0 is used?
Yes, IMO.
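To make the proposal concrete, here is a minimal sketch of the defaulting rule in Python. It is not Tesseract's actual code: `resolve_language` and `build_command` are hypothetical helper names, and the only assumptions about the CLI are the `--psm` and `-l` options mentioned in this thread plus `stdout` as the output base.

```python
from typing import List, Optional

# Page segmentation mode 0: orientation and script detection (OSD) only.
OSD_ONLY_PSM = 0


def resolve_language(psm: int, lang: Optional[str] = None) -> str:
    """Pick the traineddata name to load (hypothetical helper, not Tesseract code)."""
    if lang:
        # An explicit -l argument always wins.
        return lang
    if psm == OSD_ONLY_PSM:
        # OSD runs on the legacy engine, so default to osd instead of eng.
        return "osd"
    # Otherwise fall back to the usual default language.
    return "eng"


def build_command(image: str, psm: int, lang: Optional[str] = None) -> List[str]:
    """Build an argv list for the tesseract CLI that writes its result to stdout."""
    return [
        "tesseract", image, "stdout",
        "--psm", str(psm),
        "-l", resolve_language(psm, lang),
    ]
```

The list returned by `build_command("page.png", 0)` could be passed to `subprocess.run` as a wrapper-side workaround until the default changes inside Tesseract itself.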
I think this was fixed in commit 27ce472, so the issue can be closed.
Environment
Current Behavior:
When using eng.traineddata from tessdata_fast in -psm 0 mode, Tesseract crashes for all input files. Example:
Behaviour is the same using tessdata_best.
After replacing with tessdata/eng.traineddata, OSD works fine:
I discovered this in the Ubuntu 17.04 PPA (ppa:alex-p/tesseract-ocr) and replicated it on macOS with Tesseract built from source.
Stack trace