Paragraphs instead of lines in gpageseg #228

xdsv · 2017-06-25T22:23:54Z

Is there a way to get paragraphs instead of lines from ocropus-gpageseg?
If that's not possible: from what I saw, only the .pseg file contains the paragraph information, encoded as different colors. Is there a utility to extract that information?

zuphilip · 2017-06-26T05:23:32Z

No, the page segmentation always outputs lines. However, during this process it tries to identify the columns (I am not sure whether paragraphs are also identified in this step), and this information is saved in the pseg files, as you already noted. See the wiki for more information about the file format: https://github.com/tmbdev/ocropy/wiki/OCRopus-File-Formats#physical-layout

You could also go a step further and create an hOCR file with ocropus-hocr, which should contain some paragraph splits as <p />.
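
As a rough illustration of reading that information back out of a pseg image, here is a minimal sketch in plain Python. It is not an ocropy utility. It assumes that the red channel holds the column number, that green and blue together hold the line number, and that the background is white (or black); please check these assumptions against the file format page in the wiki before relying on them. The names `group_boxes`, `column_of` and `line_of` are made up for this example.

```python
# ASSUMPTION: these colours mark background pixels in the pseg image.
BACKGROUND = {(255, 255, 255), (0, 0, 0)}


def column_of(rgb):
    # ASSUMPTION: the red channel stores the column number.
    return rgb[0]


def line_of(rgb):
    # ASSUMPTION: green (high byte) and blue (low byte) store the line number.
    return (rgb[1] << 8) | rgb[2]


def group_boxes(pixels, key):
    """Return {group: (x0, y0, x1, y1)} over all non-background pixels.

    pixels: list of rows, each row a list of (r, g, b) tuples.
    key:    function mapping an (r, g, b) tuple to a group id.
    Box coordinates are inclusive pixel indices.
    """
    boxes = {}
    for y, row in enumerate(pixels):
        for x, rgb in enumerate(row):
            if rgb in BACKGROUND:
                continue
            group = key(rgb)
            if group in boxes:
                x0, y0, x1, y1 = boxes[group]
                boxes[group] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
            else:
                boxes[group] = (x, y, x, y)
    return boxes


if __name__ == "__main__":
    W = (255, 255, 255)
    A, B, C = (1, 0, 1), (1, 0, 2), (2, 0, 3)
    demo = [
        [W, A, A, W, C],
        [W, B, B, W, C],
    ]
    print(group_boxes(demo, column_of))  # one box per assumed column
    print(group_boxes(demo, line_of))    # one box per assumed line
```

To run this on a real file, the pixel rows could be loaded with any image library that returns RGB tuples. Grouping by `column_of` gives one bounding box per column under the stated assumptions, which may be close to the "blocks" being asked about.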

xdsv · 2017-06-26T12:39:39Z

Thanks for the reply. What I'm actually looking for is a way to find blocks of text in an image. Can ocropy do this?

Sultan91 · 2017-12-22T15:05:12Z

I'm also interested in this question. In addition, it is not quite clear how the text line ordering works.

zuphilip added the ✨ enhancement label Dec 22, 2017
