Document Layout Analysis - Text edges extractor #50

Merged
EliotJones merged 6 commits into UglyToad:master from BobLd:master
Aug 8, 2019

BobLd (Collaborator) commented Aug 6, 2019

Text edges are where words have their BoundingBox's left, right or mid coordinates aligned on the same vertical line. They are useful for detecting tables, justified text, lists, etc.

Example of use:

        using (PdfDocument document = PdfDocument.Open(path))
        {
            for (var i = 0; i < document.NumberOfPages; i++)
            {
                var page = document.GetPage(i + 1);
                var words = page.GetWords();
                var edges = TextEdgesExtractor.GetEdges(words);

                foreach (var edge in edges["left"])
                {
                    // Do something with left edges
                }

                foreach (var edge in edges["mid"])
                {
                    // Do something with mid edges
                }

                foreach (var edge in edges["right"])
                {
                    // Do something with right edges
                }
            }
        }

Example of result:
[Image: Random 2 Columns Lists]
Blue: Left edge
Green: Mid edge (see dotted list)
Pink: Right edge
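
To make the definition above concrete, here is a minimal, self-contained sketch of the idea: group word boxes by their left, mid or right x coordinate and keep the coordinates shared by enough boxes. It is not the code from this pull request. `Box`, `EdgeKind`, `TextEdgeSketch`, `tolerance` and `minimumElements` are hypothetical names chosen for illustration, and the sketch ignores whether the aligned words are vertically adjacent, which a real extractor would probably also check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a word's bounding box; not the library's own type.
public record Box(double Left, double Right, double Bottom, double Top)
{
    public double Mid => (Left + Right) / 2.0;
}

// Hypothetical edge kinds, mirroring the "left", "mid" and "right" keys above.
public enum EdgeKind { Left, Mid, Right }

public static class TextEdgeSketch
{
    // For each edge kind, snaps the relevant x coordinate to a grid of size
    // `tolerance`, groups boxes by the snapped value, and returns the sorted
    // x positions shared by at least `minimumElements` boxes.
    public static Dictionary<EdgeKind, List<double>> FindEdges(
        IReadOnlyList<Box> boxes, double tolerance = 1.0, int minimumElements = 3)
    {
        var selectors = new Dictionary<EdgeKind, Func<Box, double>>
        {
            [EdgeKind.Left] = b => b.Left,
            [EdgeKind.Mid] = b => b.Mid,
            [EdgeKind.Right] = b => b.Right,
        };

        return selectors.ToDictionary(
            kv => kv.Key,
            kv => boxes
                .GroupBy(b => Math.Round(kv.Value(b) / tolerance) * tolerance)
                .Where(g => g.Count() >= minimumElements)
                .Select(g => g.Key)
                .OrderBy(x => x)
                .ToList());
    }
}
```

Given three boxes that all start at x = 10 but have different widths, `FindEdges` would report one left edge at 10 and no mid or right edges, which is the pattern a left-aligned list produces.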

BobLd and others added 2 commits August 6, 2019 15:24
Text edges are where words have their BoundingBox's left, right or mid coordinate aligned on the same vertical line.
Useful to detect tables, justified text, lists, etc.
EliotJones (Member) left a comment

This looks really good, thanks for taking the time to add it. I've left some general comments. The only one I'd hope to change is using an enum as the public interface for this, but if you disagree, tell me so and I'll approve :)
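
As a rough illustration of the enum suggestion, the sketch below keys the result by an enum instead of the strings "left", "mid" and "right". The `EdgeType` name, its members, and the sample values are assumptions made for this example only; they are not taken from the pull request.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum; the name and members are assumptions for illustration.
public enum EdgeType { Left, Mid, Right }

public static class EnumKeySketch
{
    // Placeholder data standing in for an extractor result:
    // made-up x positions per edge type.
    public static IReadOnlyDictionary<EdgeType, List<double>> SampleEdges() =>
        new Dictionary<EdgeType, List<double>>
        {
            [EdgeType.Left] = new List<double> { 72.0 },
            [EdgeType.Mid] = new List<double>(),
            [EdgeType.Right] = new List<double> { 523.5 },
        };

    // Counts all edges by iterating over every enum member.
    public static int CountAll(IReadOnlyDictionary<EdgeType, List<double>> edges)
    {
        var total = 0;
        foreach (EdgeType type in Enum.GetValues(typeof(EdgeType)))
        {
            total += edges[type].Count;
        }
        return total;
    }
}
```

With an enum key, a mistyped edge name such as `edges["rigth"]` becomes a compile-time error rather than a `KeyNotFoundException` at run time, and callers can discover the available edge types through the type itself.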

BobLd (Collaborator, Author) commented Aug 7, 2019

Should be good to go!

EliotJones merged commit fe270aa into UglyToad:master on Aug 8, 2019