[BUG] tesseract returns SIGFPE Signal #1062

C0D3D3V · 2023-01-17T14:40:45Z

Describe the bug
tesseract dies with a SIGFPE signal on page 42 of the test file, and OCRmyPDF aborts with a SubprocessOutputError.

   41 Rasterize with png16m, rotation 0
   41 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=41', '-dLastPage=41', '-r599.441022x599.441022', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io.29mqbkv2/origin.pdf']
   40 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
   40 Grafting
   40 Page rotation: (content, auto) -> page = (0, 0) -> 0
   41 Rotating output by 0
   41 resolution (599.4399999999999, 599.4399999999999)
   41 Running: ['tesseract', '-l', 'deu', '-c', 'textonly_pdf=1', '/tmp/ocrmypdf.io.29mqbkv2/000041_ocr.png', '/tmp/ocrmypdf.io.29mqbkv2/000041_ocr_tess', 'pdf', 'txt']
   42 Rasterize with png16m, rotation 0
   42 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=42', '-dLastPage=42', '-r599.441022x599.441022', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io.29mqbkv2/origin.pdf']
   41 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
   41 Grafting
   41 Page rotation: (content, auto) -> page = (0, 0) -> 0
   42 Rotating output by 0
   42 resolution (599.4399999999999, 599.4399999999999)
   42 Running: ['tesseract', '-l', 'deu', '-c', 'textonly_pdf=1', '/tmp/ocrmypdf.io.29mqbkv2/000042_ocr.png', '/tmp/ocrmypdf.io.29mqbkv2/000042_ocr_tess', 'pdf', 'txt']
   42 [tesseract] Image too small to scale!! (2x48 vs min width of 3)
   42 [tesseract] Line cannot be recognized!!
   43 Rasterize with png16m, rotation 0
   43 Running: ['gs', '-dQUIET', '-dSAFER', '-dBATCH', '-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=43', '-dLastPage=43', '-r599.441022x599.441022', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None', '-f', '/tmp/ocrmypdf.io.29mqbkv2/origin.pdf']
   43 Rotating output by 0
   43 resolution (599.4399999999999, 599.4399999999999)
   43 Running: ['tesseract', '-l', 'deu', '-c', 'textonly_pdf=1', '/tmp/ocrmypdf.io.29mqbkv2/000043_ocr.png', '/tmp/ocrmypdf.io.29mqbkv2/000043_ocr_tess', 'pdf', 'txt']
OCR:  49%|█████████████████████████████████████████████████████████████████████████████████                                                                                     | 41.0/84.0 [09:55<10:24, 14.52s/page]
ExitCodeException
Traceback (most recent call last):
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/_exec/tesseract.py", line 401, in generate_pdf
    p = run(args_tesseract, stdout=PIPE, stderr=STDOUT, timeout=timeout, check=True)
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/subprocess/__init__.py", line 57, in run
    proc = subprocess_run(args, env=env, check=check, **kwargs)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['tesseract', '-l', 'deu', '-c', 'textonly_pdf=1', '/tmp/ocrmypdf.io.29mqbkv2/000042_ocr.png', '/tmp/ocrmypdf.io.29mqbkv2/000042_ocr_tess', 'pdf', 'txt']' died with <Signals.SIGFPE: 8>.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/_sync.py", line 393, in run_pipeline
    optimize_messages = exec_concurrent(context, executor)
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/_sync.py", line 280, in exec_concurrent
    executor(
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/_concurrent.py", line 87, in __call__
    self._execute(
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line 141, in _execute
    result = future.result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/_sync.py", line 220, in exec_page_sync
    (ocr_out, text_out) = ocr_engine_textonly_pdf(ocr_image_out, page_context)
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/_pipeline.py", line 661, in ocr_engine_textonly_pdf
    ocr_engine.generate_pdf(
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/builtin_plugins/tesseract_ocr.py", line 189, in generate_pdf
    tesseract.generate_pdf(
  File "/home/daniel/.local/lib/python3.10/site-packages/ocrmypdf/_exec/tesseract.py", line 413, in generate_pdf
    raise SubprocessOutputError() from e
ocrmypdf.exceptions.SubprocessOutputError
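
For readers unfamiliar with the "died with <Signals.SIGFPE: 8>" wording in the first traceback: on POSIX systems, Python's subprocess module reports a child killed by a signal as a negative return code. The sketch below illustrates that convention. `describe_failure` is a hypothetical helper, not part of OCRmyPDF, and the child process is only a stand-in that sends SIGFPE to itself rather than a real tesseract call.

```python
import signal
import subprocess
import sys


def describe_failure(cmd):
    """Run cmd; return None on success, else a short description of the failure."""
    try:
        subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True
        )
    except subprocess.CalledProcessError as e:
        if e.returncode < 0:
            # POSIX: a negative return code -N means "terminated by signal N"
            return f"killed by {signal.Signals(-e.returncode).name}"
        return f"exit status {e.returncode}"
    return None


# Stand-in for the crashing command (assumption: POSIX platform).
# The child restores the default SIGFPE action, then signals itself.
child = [
    sys.executable,
    "-c",
    "import os, signal; "
    "signal.signal(signal.SIGFPE, signal.SIG_DFL); "
    "os.kill(os.getpid(), signal.SIGFPE)",
]
print(describe_failure(child))
```

A crash of this kind comes from the child process itself, not from the Python code that launched it, which fits the later guess that the problem belongs on the tesseract side.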

To Reproduce

ocrmypdf -v -l deu --jobs 1  'test.pdf' 'test.pdf'

I also tried without --jobs and with --force-ocr; the crash still occurs.

Example file

This only happens with this test file; 33 similar files were processed without problems.

The test file will be available for 30 days:
https://easyupload.io/as1sst

System

OS: Arch Linux 6.1.6-arch1-1
OCRmyPDF Version: 14.0.2
How did you install ocrmypdf? pip


C0D3D3V · 2023-01-17T15:03:47Z

I opened an issue on the tesseract repo too; I guess it's not really related to OCRmyPDF:
tesseract-ocr/tesseract#3995

C0D3D3V · 2023-01-17T15:20:33Z

An option to ignore tesseract errors would be nice, so that a page with an error is just skipped instead of crashing OCRmyPDF.

jbarlow83 · 2023-01-17T20:29:35Z

I'm reluctant to add such an option because it could mask more serious issues than a one-time failure. I think it's reasonable for the program to ask for user intervention in this case, and an exception is a good way of doing that.

One could write a plugin to suppress errors from the OCR engine if needed.
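
As a rough illustration of the "skip the failing page" behaviour discussed here, the sketch below runs one command per page and records failures instead of aborting. It deliberately does not use OCRmyPDF's plugin API: `run_pages` is a hypothetical helper, and the per-page commands are stand-ins, not real tesseract invocations.

```python
import subprocess
import sys


def run_pages(page_cmds):
    """Run one command per page; collect failures instead of raising."""
    done, skipped = [], []
    for page, cmd in page_cmds.items():
        try:
            subprocess.run(
                cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True
            )
        except subprocess.CalledProcessError as e:
            # Remember which page failed and how, then carry on
            skipped.append((page, e.returncode))
        else:
            done.append(page)
    return done, skipped


# Stand-in commands: pages 41 and 43 succeed, page 42 exits with status 3.
ok = [sys.executable, "-c", "pass"]
bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
demo = {41: ok, 42: bad, 43: ok}

done, skipped = run_pages(demo)
print(done, skipped)  # [41, 43] [(42, 3)]
```

The trade-off is the one raised above: swallowing every failure this way can hide a systematic problem, so at minimum the skipped pages should be reported to the user.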

C0D3D3V · 2023-01-17T22:12:33Z

Just a side note: gscan2pdf also issued the warnings

42 [tesseract] Image too small to scale!! (2x48 vs min width of 3)
42 [tesseract] Line cannot be recognized!!

for page 42, but it did not crash and created a complete PDF. It also uses tesseract. I tried to dig a little into the gscan2pdf code to find a difference in the way it executes tesseract, but gave up. (I guess it has a fallback to cuneiform/gocr, but I'm not totally sure.)

C0D3D3V changed the title ~~[BUG]~~ [BUG] tesseract returns SIGFPE Signal Jan 17, 2023

jbarlow83 closed this as not planned Jan 17, 2023
