floating point exception #3995

Closed
C0D3D3V opened this issue Jan 17, 2023 · 8 comments

Comments

C0D3D3V commented Jan 17, 2023

Basic Information

Password: dfghdfghgh
000042_ocr.zip

tesseract -l deu -c textonly_pdf=1 000042_ocr.png 000042_ocr_tess pdf txt
Image too small to scale!! (2x48 vs min width of 3)
Line cannot be recognized!!
[1]    78794 floating point exception (core dumped)  tesseract -l deu -c textonly_pdf=1 000042_ocr.png 000042_ocr_tess pdf txt

Other Operating System

Arch Linux 6.1.6-arch1-1

Compiler

https://archlinux.org/packages/community/x86_64/tesseract/

CPU

Intel(R) Core(TM) i5-6200U

Member

stweil commented Jan 17, 2023

Could you please upload 000042_ocr.png here? It is created by ocrmypdf and should be somewhere in /tmp.

I cannot reproduce the issue. It works for me without any problem. Could you please add the output from tesseract --version?

Author

C0D3D3V commented Jan 17, 2023

tesseract --version
tesseract 5.3.0
 leptonica-1.82.0
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.4) : libpng 1.6.39 : libtiff 4.5.0 : zlib 1.2.13 : libwebp 1.3.0 : libopenjp2 2.5.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.6.2 zlib/1.2.13 liblzma/5.2.9 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.2
 Found libcurl/7.87.0 OpenSSL/3.0.7 zlib/1.2.13 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.4 libpsl/0.21.2 (+libidn2/2.3.4) libssh2/1.10.0 nghttp2/1.51.0

Author

C0D3D3V commented Jan 17, 2023

[image attachment]

I will upload the dump file later...
https://easyupload.io/0s0489

Author

C0D3D3V commented Jan 17, 2023

pwndbg> bt
#0  0x00007fc92dc0ee78 in tesseract::LanguageModel::ExtractFeaturesFromPath(tesseract::ViterbiStateEntry const&, float*) () from /usr/lib/libtesseract.so.5
#1  0x00007fc92dc11381 in tesseract::LanguageModel::ComputeAdjustedPathCost(tesseract::ViterbiStateEntry*) () from /usr/lib/libtesseract.so.5
#2  0x00007fc92dc16e0c in tesseract::LanguageModel::AddViterbiStateEntry(unsigned char, float, bool, int, int, tesseract::BLOB_CHOICE*, tesseract::LanguageModelState*, tesseract::ViterbiStateEntry*, tesseract::LMPainPoints*, tesseract::WERD_RES*, tesseract::BestChoiceBundle*, tesseract::BlamerBundle*) () from /usr/lib/libtesseract.so.5
#3  0x00007fc92dc17a36 in tesseract::LanguageModel::UpdateState(bool, int, int, tesseract::BLOB_CHOICE_LIST*, tesseract::LanguageModelState*, tesseract::LMPainPoints*, tesseract::WERD_RES*, tesseract::BestChoiceBundle*, tesseract::BlamerBundle*) () from /usr/lib/libtesseract.so.5
#4  0x00007fc92dc181b5 in tesseract::Wordrec::UpdateSegSearchNodes(float, int, std::vector<tesseract::SegSearchPending, std::allocator<tesseract::SegSearchPending> >*, tesseract::WERD_RES*, tesseract::LMPainPoints*, tesseract::BestChoiceBundle*, tesseract::BlamerBundle*) () from /usr/lib/libtesseract.so.5
#5  0x00007fc92dc186fc in tesseract::Wordrec::InitialSegSearch(tesseract::WERD_RES*, tesseract::LMPainPoints*, std::vector<tesseract::SegSearchPending, std::allocator<tesseract::SegSearchPending> >*, tesseract::BestChoiceBundle*, tesseract::BlamerBundle*) () from /usr/lib/libtesseract.so.5
#6  0x00007fc92dc1896a in tesseract::Wordrec::SegSearch(tesseract::WERD_RES*, tesseract::BestChoiceBundle*, tesseract::BlamerBundle*) () from /usr/lib/libtesseract.so.5
#7  0x00007fc92dc09326 in tesseract::Wordrec::chop_word_main(tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#8  0x00007fc92dc099f8 in tesseract::Wordrec::cc_recog(tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#9  0x00007fc92daf9c90 in tesseract::Tesseract::recog_word_recursive(tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#10 0x00007fc92dafad98 in tesseract::Tesseract::recog_word(tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#11 0x00007fc92dafb156 in tesseract::Tesseract::tess_segment_pass_n(int, tesseract::WERD_RES*) () from /usr/lib/libtesseract.so.5
#12 0x00007fc92daa8432 in tesseract::Tesseract::match_word_pass_n(int, tesseract::WERD_RES*, tesseract::ROW*, tesseract::BLOCK*) () from /usr/lib/libtesseract.so.5
#13 0x00007fc92dab038f in tesseract::Tesseract::classify_word_pass1(tesseract::WordData const&, tesseract::WERD_RES**, tesseract::PointerVector<tesseract::WERD_RES>*) () from /usr/lib/libtesseract.so.5
#14 0x00007fc92daa8e90 in tesseract::Tesseract::RetryWithLanguage(tesseract::WordData const&, void (tesseract::Tesseract::*)(tesseract::WordData const&, tesseract::WERD_RES**, tesseract::PointerVector<tesseract::WERD_RES>*), bool, tesseract::WERD_RES**, tesseract::PointerVector<tesseract::WERD_RES>*) () from /usr/lib/libtesseract.so.5
#15 0x00007fc92daa9941 in tesseract::Tesseract::classify_word_and_language(int, tesseract::PAGE_RES_IT*, tesseract::WordData*) () from /usr/lib/libtesseract.so.5
#16 0x00007fc92daa44ed in tesseract::Tesseract::RecogAllWordsPassN(int, tesseract::ETEXT_DESC*, tesseract::PAGE_RES_IT*, std::vector<tesseract::WordData, std::allocator<tesseract::WordData> >*) () from /usr/lib/libtesseract.so.5
#17 0x00007fc92daa5f61 in tesseract::Tesseract::recog_all_words(tesseract::PAGE_RES*, tesseract::ETEXT_DESC*, tesseract::TBOX const*, char const*, int) () from /usr/lib/libtesseract.so.5
#18 0x00007fc92da88e66 in tesseract::TessBaseAPI::Recognize(tesseract::ETEXT_DESC*) () from /usr/lib/libtesseract.so.5
#19 0x00007fc92da8c8fb in tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) () from /usr/lib/libtesseract.so.5
#20 0x00007fc92da8d944 in tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) () from /usr/lib/libtesseract.so.5
#21 0x00007fc92da8df43 in tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) () from /usr/lib/libtesseract.so.5
#22 0x00005612fca28154 in ?? ()
#23 0x00007fc92d03c290 in ?? () from /usr/lib/libc.so.6
#24 0x00007fc92d03c34a in __libc_start_main () from /usr/lib/libc.so.6
#25 0x00005612fca296b5 in ?? ()
pwndbg> info threads
  Id   Target Id                          Frame
* 1    Thread 0x7fc92c49ef80 (LWP 179286) 0x00007fc92dc0ee78 in tesseract::LanguageModel::ExtractFeaturesFromPath(tesseract::ViterbiStateEntry const&, float*) () from /usr/lib/libtesseract.so.5
  2    Thread 0x7fc927a1f6c0 (LWP 179295) do_spin (val=89808, addr=0x561301ad7fa4) at /usr/src/debug/gcc/libgomp/config/linux/wait.h:56
  3    Thread 0x7fc928a216c0 (LWP 179293) do_spin (val=89808, addr=0x561301ad7fa4) at /usr/src/debug/gcc/libgomp/config/linux/wait.h:56
  4    Thread 0x7fc9282206c0 (LWP 179294) do_spin (val=89808, addr=0x561301ad7fa4) at /usr/src/debug/gcc/libgomp/config/linux/wait.h:56

Author

C0D3D3V commented Jan 19, 2023

Is there anything I can do to help reproduce the crash?
It looks like this is related to the language model. Maybe my model is broken?

Member

stweil commented Jan 19, 2023

Why do you think that the model might be broken? Is it an official one? Where did you get it from? How did you install it? Maybe you can upload it somewhere.

Of course you can also try a local debug build of tesseract and get a stack trace with debug symbols.
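
As a rough sketch of collecting such a stack trace non-interactively, the two gdb commands used earlier in this thread (bt and info threads) can be run in batch mode against a core file. The function name backtrace_from_core and the file paths in the usage line are hypothetical, and the output only shows file and line information if the binary was built with debug symbols.

```shell
# Hypothetical helper: print a backtrace and the thread list from a core dump.
# $1 = path to the tesseract executable, $2 = path to the core file.
backtrace_from_core() {
  # -batch makes gdb exit after running the -ex commands.
  gdb -batch -ex bt -ex 'info threads' "$1" "$2"
}

# Example call (paths are assumptions, adjust to your system):
#   backtrace_from_core /usr/bin/tesseract ./core
```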

Member

stweil commented Jan 19, 2023

Update: I can reproduce the crash with tessdata/deu.traineddata. It does not occur if you add the option --oem 1, so it is related to the old legacy OCR engine.
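
Until a fixed release is installed, the workaround described above might be applied as in the following sketch. It reuses the reporter's original command line and only adds --oem 1, which selects the LSTM engine. The wrapper name ocr_lstm_only is hypothetical.

```shell
# Hypothetical wrapper around the command from the report, with --oem 1 added
# so that only the LSTM engine is used and the legacy engine is skipped.
# $1 = input image, $2 = output base name.
ocr_lstm_only() {
  tesseract --oem 1 -l deu -c textonly_pdf=1 "$1" "$2" pdf txt
}

# Example call with the file names from the report:
#   ocr_lstm_only 000042_ocr.png 000042_ocr_tess
```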

stweil added a commit to stweil/tesseract that referenced this issue Jan 19, 2023
Member

stweil commented Jan 19, 2023

The next release 5.3.1 will include a fix for this bug (see pull request #3996). Thank you for your report!

stweil added a commit to stweil/tesseract that referenced this issue Jan 19, 2023
amitdo added the legacy label Jan 20, 2023
stweil added a commit that referenced this issue Jan 20, 2023
amitdo closed this as completed Jan 22, 2023