Is there a way to get paragraphs instead of lines in ocropus-gpageseg?
If that's not possible, from what I saw only the .pseg file contains the paragraph information encoded as a different color. Is there a utility to extract that information?
No, the page segmentation always outputs lines. During this process, however, it tries to identify the columns (I am not sure whether paragraphs are also identified in this step), and this information is saved in the pseg files, as you already noted. See the wiki for more information about the file format.
You could also go a step further and create an hOCR file with ocropus-hocr, which should contain some paragraph splits marked as <p />.
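If you want to pull the grouping out of the pseg colors yourself, here is a minimal sketch. It assumes the encoding is red = column number, green = paragraph number, blue = line number, and that white or black pixels are background. Please check that against the wiki before relying on it; the function name is made up for illustration, and whether paragraphs are actually filled in depends on the segmenter.

```python
from collections import defaultdict

# ASSUMPTION: these colors mark background pixels in the pseg image.
BACKGROUND = {(255, 255, 255), (0, 0, 0)}


def group_lines_by_region(pixels):
    """Map (column, paragraph) -> sorted list of line numbers.

    `pixels` is any iterable of (r, g, b) tuples.
    ASSUMPTION: red = column, green = paragraph, blue = line number.
    """
    regions = defaultdict(set)
    for r, g, b in pixels:
        if (r, g, b) in BACKGROUND:
            continue  # skip pixels that belong to no text line
        regions[(r, g)].add(b)
    # Sort the keys and the line numbers so the output is deterministic.
    return {key: sorted(lines) for key, lines in sorted(regions.items())}
```

If Pillow is installed, something like `Image.open("0001.pseg.png").convert("RGB").getdata()` should give you the pixel tuples to pass in (the file name is only an example).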