Test on a paragraph #301

Tailor2019 · 2022-01-06T02:58:27Z

Hello!
@ChWick @andbue
Can I train Calamari on line images and then test it on a paragraph?
If so, how can I do this?
Thanks in advance!

andbue · 2022-01-06T08:54:34Z

As long as you're providing the coordinates of the lines with a PAGE XML file, you can run your model on the page or paragraph image.
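To illustrate what "coordinates of the lines" means in practice, here is a minimal sketch that reads the `TextLine` polygons from a PAGE XML document and reduces each one to a bounding box. It uses only the Python standard library and is not Calamari's own loader; the sample document, the function name `read_line_boxes`, and the line ids are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample; real files usually also carry a Metadata element.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="paragraph.png" imageWidth="800" imageHeight="200">
    <TextRegion id="r0">
      <Coords points="10,10 790,10 790,120 10,120"/>
      <TextLine id="l0"><Coords points="10,10 790,10 790,60 10,60"/></TextLine>
      <TextLine id="l1"><Coords points="12,72 780,70 780,120 12,118"/></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def read_line_boxes(page_xml_text):
    """Return {line id: (x_min, y_min, x_max, y_max)} for every TextLine."""
    root = ET.fromstring(page_xml_text)
    # The root tag looks like "{namespace}PcGts"; reuse whatever schema version the file declares.
    ns = root.tag[1:root.tag.index("}")]
    boxes = {}
    for line in root.iter(f"{{{ns}}}TextLine"):
        coords = line.find(f"{{{ns}}}Coords")
        # "points" is a space-separated list of "x,y" pairs describing the line polygon.
        points = [tuple(int(v) for v in pair.split(",")) for pair in coords.get("points").split()]
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        boxes[line.get("id")] = (min(xs), min(ys), max(xs), max(ys))
    return boxes


boxes = read_line_boxes(SAMPLE)
print(boxes)
```

Each box marks the part of the page or paragraph image that belongs to one text line, which is the unit a line-based OCR engine works on.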

Tailor2019 · 2022-01-06T09:40:39Z

Thanks a lot for your reply!
@andbue
But how can I prepare this PAGE XML file?
Can you give me an example of such an XML file? Is there a tool that can help prepare these coordinates?
Thanks in advance!

andbue · 2022-01-06T09:52:27Z

Have a look here https://github.com/PRImA-Research-Lab/PAGE-XML
Some tools and libraries:
http://www.primaresearch.org/tools/Aletheia/
https://github.com/OCR4all/LAREX
https://github.com/qurator-spk/eynollah
https://github.com/OCR4all/OCR4all_helper-scripts/blob/master/ocr4all_helper_scripts/helpers/pagelineseg_helper.py
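For a rough idea of what such a file contains, the following sketch writes a minimal PAGE XML document from a list of line rectangles using only the Python standard library. It is an illustration under stated assumptions, not the output of any of the tools above: the function names, the image name `paragraph.png`, the ids, and the rectangles are hypothetical, and the 2019-07-15 schema namespace is assumed. In a real workflow the line coordinates would come from a segmentation tool.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumed schema version; other PAGE versions use a different date in the namespace.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"


def rect_points(x0, y0, x1, y1):
    """Format a rectangle as a PAGE 'points' polygon (clockwise from top-left)."""
    return f"{x0},{y0} {x1},{y0} {x1},{y1} {x0},{y1}"


def build_page_xml(image_filename, width, height, line_boxes):
    """line_boxes: list of (x0, y0, x1, y1) rectangles, one per text line."""
    ET.register_namespace("", PAGE_NS)  # serialize with a default namespace

    def tag(name):
        return f"{{{PAGE_NS}}}{name}"

    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    root = ET.Element(tag("PcGts"))
    metadata = ET.SubElement(root, tag("Metadata"))
    ET.SubElement(metadata, tag("Creator")).text = "hand-written sketch"
    ET.SubElement(metadata, tag("Created")).text = now
    ET.SubElement(metadata, tag("LastChange")).text = now

    page = ET.SubElement(root, tag("Page"), imageFilename=image_filename,
                         imageWidth=str(width), imageHeight=str(height))
    # One region that encloses all lines.
    region = ET.SubElement(page, tag("TextRegion"), id="r0")
    region_box = (min(b[0] for b in line_boxes), min(b[1] for b in line_boxes),
                  max(b[2] for b in line_boxes), max(b[3] for b in line_boxes))
    ET.SubElement(region, tag("Coords"), points=rect_points(*region_box))
    # One TextLine per rectangle, each with its own Coords polygon.
    for i, box in enumerate(line_boxes):
        line = ET.SubElement(region, tag("TextLine"), id=f"r0_l{i}")
        ET.SubElement(line, tag("Coords"), points=rect_points(*box))
    return ET.tostring(root, encoding="unicode")


xml_text = build_page_xml("paragraph.png", 800, 200,
                          [(10, 10, 790, 60), (10, 70, 790, 120)])
print(xml_text)
```

Rectangles are used here only to keep the example short; PAGE XML allows arbitrary polygons, and segmentation tools typically produce tighter outlines.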

Tailor2019 · 2022-01-07T00:44:18Z

@andbue
If I have a paragraph image like this one:

[image]

How can I proceed to recognize it with Calamari?
Thanks in advance!

andbue · 2022-01-07T11:29:00Z

As I said before, the image alone won't do it. You need an XML file structured according to the PAGE XML schema that contains the line coordinates. How you produce it is not the business of Calamari-OCR. Maybe the simplest way for you would be to just use Aletheia; this has been proposed here and in PRImA-Research-Lab/PAGE-XML#30 (comment) as well.

Please don't try to save time by stealing other people's time, e.g. by @-ing all the developers you can get hold of at once in a GitHub issue and then just asking them to explain the most basic workings of their code to you. We're doing our best to provide you with documentation for Calamari; it is even included in a GUI at OCR4all. The internet is full of blog posts and tutorials explaining how the different steps of text recognition with line-based OCR engines (calamari, kraken, ocropus, tesseract...) work. Please do some reading first and then ask questions directly related to the project, showing that you've read the docs and have at least tried a Google search on the topic.

stweil mentioned this issue on Jan 6, 2022 in "using this tool" (PRImA-Research-Lab/PAGE-XML#30)

bertsky closed this as completed Oct 2, 2024
