Detect font style attributes #305

alexgg94 · 2020-10-16T10:33:58Z

Hi,
I'm using the image_to_data method to extract text from images and organize it into a pandas DataFrame.
I was wondering whether there is any way to make image_to_data also report each word's font style attributes (bold, italic, font size...).

Cheers

bozhodimitrov · 2020-10-16T16:29:26Z

Hi @alexgg94,
Can tesseract itself report this information through an additional option or configuration?
pytesseract is just a wrapper, so if tesseract doesn't support this itself, it will be much harder to implement additional logic to provide these attributes.

Another downside is that extracting additional information such as font attributes may require running tesseract multiple times, which could make image_to_data very slow.

alexgg94 · 2020-10-16T17:54:22Z

Hi,

Tesseract itself can provide this through the WordFontAttributes method defined in ResultIterator.
I agree that this could cost extra time, but it would be great if it could somehow be used alongside image_to_data.

Just wondering whether pytesseract can offer this feature right now, or whether it has been considered.

P.S. image_to_data makes my life so much easier, that's why I'm asking 😄

Cheers

bozhodimitrov · 2020-10-16T18:52:49Z

It seems that there is a problem with WordFontAttributes in the new engine: tesseract-ocr/tesseract#1074
In other words, this option might be available only with the old (original) Tesseract engine (config option --oem 0).
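
For reference, a minimal sketch of how the legacy engine could be selected from pytesseract, assuming traineddata files that still include the legacy model are installed (the LSTM-only tessdata_fast and tessdata_best sets do not). The helper names `legacy_engine_config` and `image_to_data_legacy` are made up for illustration. This only switches the engine: the table returned by image_to_data still has no font columns.

```python
def legacy_engine_config(psm=3, extra=""):
    """Build a Tesseract config string that selects the legacy engine (--oem 0)."""
    parts = ["--oem 0", f"--psm {psm}"]
    if extra:
        parts.append(extra)
    return " ".join(parts)


def image_to_data_legacy(image_path, lang="eng"):
    """Run image_to_data with the legacy engine and return a pandas DataFrame."""
    # Imported lazily so the config helper works without OCR installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_data(
        Image.open(image_path),
        lang=lang,
        config=legacy_engine_config(),
        output_type=pytesseract.Output.DATAFRAME,  # requires pandas
    )
```

Calling `image_to_data_legacy("page.png")` (a hypothetical file name) should then give the usual word-level table, produced by the old engine.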

You could also try the more advanced wrapper tesserocr, but I am not sure whether it supports this either.
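
As far as I can tell, tesserocr does expose a WordFontAttributes method on its result iterator, so a rough sketch of that route might look like the following. It assumes tesserocr is installed together with legacy-capable traineddata, and the function names `collect_word_fonts` and `word_fonts_from_image` are hypothetical. Whether the attributes come back populated depends on the engine and the language data, so treat this as a starting point and not a confirmed solution.

```python
def collect_word_fonts(words, level):
    """Pair each word's text with its font attributes.

    `words` is any iterable of objects exposing GetUTF8Text(level) and
    WordFontAttributes(), such as the items yielded by tesserocr.iterate_level.
    """
    rows = []
    for word in words:
        # Fall back to an empty dict when no font information is reported.
        attrs = word.WordFontAttributes() or {}
        rows.append({"text": word.GetUTF8Text(level), **attrs})
    return rows


def word_fonts_from_image(image_path, lang="eng"):
    """Recognize an image with the legacy engine and list per-word font info."""
    # Imported lazily so collect_word_fonts can be used without tesserocr.
    from tesserocr import OEM, RIL, PyTessBaseAPI, iterate_level

    with PyTessBaseAPI(lang=lang, oem=OEM.TESSERACT_ONLY) as api:
        api.SetImageFile(image_path)
        api.Recognize()
        # Collect eagerly, while the API object is still open.
        return collect_word_fonts(iterate_level(api.GetIterator(), RIL.WORD), RIL.WORD)
```

The resulting list of dicts can be passed to `pandas.DataFrame(...)` and lined up with the image_to_data rows by word order, at the cost of a second OCR pass.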

bozhodimitrov closed this as completed Oct 29, 2020
