cis-ocropy-segment crashes without error log #7
Comments
The missing STDERR was reported this week; it's a bug in core that I will try to fix ASAP. No idea about the exit code. But I generally discourage using ocrd_ocropy; there is a much better version in ocrd_cis.
I wasn't really aware that there is About the STDERR I am not sure, because running directly doesn't give any STDERR (and neither STDOUT). Or does |
@kba you mean OCR-D/core#592?
@beckstefan The main work on wrapping Ocropy for OCR-D and improving it was done in ocrd_cis, whereas ocrd_ocropy does not offer anything useful yet and is currently inactive. (I have no rights to transfer the issue to ocrd_cis, but also I am not sure it does belong there, as the problem seems to be in core's
All OCR-D wrappers (Python and bash based) use OCR-D/core. What @kba was saying was that the missing log messages are a problem specific to So, could you please run your workflow directly, by calling the individual processor CLIs instead? (So we at least know what led up to the exit -9?) For your workflow, that'll be:
ocrd-olena-binarize -I OCR-D-IMG -O OCR-D-BIN
ocrd-cis-ocropy-deskew -I OCR-D-BIN -O OCR-D-DESKEW
ocrd-anybaseocr-crop -I OCR-D-DESKEW -O OCR-D-CROP
ocrd-cis-ocropy-segment -I OCR-D-CROP -O OCR-D-PAGE-SEG -P level-of-operation page
ocrd-tesserocr-recognize -I OCR-D-PAGE-SEG -O OCR-D-OCR -P model Fraktur
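If typing the five calls by hand gets tedious, something like the following could run them one after another and stop at the first failure, so the failing step and its raw exit status are visible. This is only a sketch: `STEPS` and `run_steps` are names made up here, and it assumes the processors are on `PATH` and that it is started from the workspace directory.

```python
import shlex
import subprocess

# The processor calls from the workflow above, one per entry.
STEPS = [
    "ocrd-olena-binarize -I OCR-D-IMG -O OCR-D-BIN",
    "ocrd-cis-ocropy-deskew -I OCR-D-BIN -O OCR-D-DESKEW",
    "ocrd-anybaseocr-crop -I OCR-D-DESKEW -O OCR-D-CROP",
    "ocrd-cis-ocropy-segment -I OCR-D-CROP -O OCR-D-PAGE-SEG -P level-of-operation page",
    "ocrd-tesserocr-recognize -I OCR-D-PAGE-SEG -O OCR-D-OCR -P model Fraktur",
]


def run_steps(steps):
    """Run each command in order and stop at the first non-zero exit.

    Returns (failed_command, returncode), or (None, 0) if every step succeeded.
    """
    for step in steps:
        print(f"running: {step}", flush=True)
        # No shell involved: the command line is split into an argument list.
        result = subprocess.run(shlex.split(step))
        if result.returncode != 0:
            print(f"stopped, return code {result.returncode}: {step}", flush=True)
            return step, result.returncode
    return None, 0
```

Calling `run_steps(STEPS)` would then show which processor was the last one started. Note that on POSIX, Python's `subprocess` reports a child killed by signal N as return code `-N`, so a `-9` here would mean the step received SIGKILL.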
Thanks @beckstefan, we are getting there... |
No change in output. (As I figured out, based on the above workflow, I also tried |
I see. Well, it turns out I was wrong: the profile message only appears after the processor ran, unless it crashed. Exit 137 could mean your container ran out of memory for some reason, and We really need to get our hands on the
[loggers]
keys=root
[handlers]
keys=consoleHandler
[formatters]
keys=defaultFormatter
[logger_root]
level=DEBUG
handlers=consoleHandler
[handler_consoleHandler]
class=StreamHandler
formatter=defaultFormatter
args=(sys.stdout,)
[formatter_defaultFormatter]
format=%(levelname)s %(name)s - %(message)s
datefmt=%H:%M:%S
docker run ... --mount type=bind,source=ocrd_logging.conf,destination=/etc/ocrd_logging.conf ... |
Shrugs. I should have looked up
Generally speaking, our standard is 400 dpi, and newspapers especially tend to be big. Can you roughly estimate
I know that there are no strict answers, but a rough tendency would be nice, knowing that in particular cases the statement won't apply. And for completeness the output:
Indeed. OCR-D of course was developed mainly focussing on printed books. Existing processors don't downscale by themselves, and we have not yet allowed making downscaled annotations as a preprocessing step. (That's because we first need PAGE-XML to support representing scale.) I don't think it's necessary to change the functional model to support crop-based partial processing though. Machines will become more powerful, while newspapers don't grow.
Phew, that's a tough (but good) question. We've made some runtime performance statistics, but without varying/factoring DPI and without looking at memory consumption yet. Generally most rule-based processors will use algorithms of at least
300 DPI should always be good enough. Some processors (esp. for preprocessing and segmentation) may even perform suboptimally at higher resolutions (> 500 DPI), if they are badly written, with fixed parameters assuming a certain density.
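For a feeling of how resolution drives memory, the pixel count grows with the square of the DPI. A back-of-the-envelope sketch follows; the 15 x 22 inch page size is a made-up example, one byte per pixel assumes an 8-bit grayscale array, and real processors typically hold several such arrays (often as floats) at once, so actual peak usage will be a multiple of these numbers.

```python
def uncompressed_megabytes(width_in, height_in, dpi, bytes_per_pixel=1):
    """Size of one decoded image array in MB (10**6 bytes).

    width_in and height_in are the physical page dimensions in inches.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bytes_per_pixel / 1e6


# Hypothetical broadsheet newspaper page of 15 x 22 inches.
for dpi in (300, 400, 500):
    gray = uncompressed_megabytes(15, 22, dpi)
    # 8 bytes per pixel: the same page held as a float64 array.
    as_float = uncompressed_megabytes(15, 22, dpi, bytes_per_pixel=8)
    print(f"{dpi} DPI: {gray:.1f} MB as 8-bit gray, {as_float:.1f} MB as float64")
```

Going from 300 to 400 DPI multiplies every one of these arrays by (400/300)² ≈ 1.78, which is why downscaling big scans before processing can make the difference between running and being killed.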
Maybe you should open an issue on OCR-D/ocrd-website for documenting (rough estimates of) resource requirements. Can we close?
With up-to-date docker I get when running
(The workflow is an attempt to get the three columns recognized correctly in http://tudigit.ulb.tu-darmstadt.de/show/Gue-11660-24)
Continuing manually to get the error:
Resulted in nothing happening (no output to terminal, no folder OCR-D-PAGE-SEG), except for an exit code of 137.
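For reading such a status: shells and Docker report "terminated by signal N" as 128 + N, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel's out-of-memory killer sends, though not the only possible source. A small sketch for decoding it on a POSIX system; `describe_exit_status` is a helper name made up here.

```python
import signal


def describe_exit_status(status):
    """Turn an exit status into a short human-readable description."""
    if status == 0:
        return "success"
    if status > 128:
        # Shell and Docker convention: 128 + signal number.
        signum = status - 128
    elif status < 0:
        # Python's subprocess convention on POSIX: negative signal number.
        signum = -status
    else:
        return f"exit status {status}"
    try:
        return f"killed by {signal.Signals(signum).name}"
    except ValueError:
        # The number does not correspond to a signal on this platform.
        return f"exit status {status}"


print(describe_exit_status(137))  # killed by SIGKILL
```

The same helper covers the `-9` form, which is how Python's `subprocess` reports a child that received SIGKILL.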
The source images are relatively big (10 MB, JPEG), but I can provide them as well if needed.