Detect font style attributes #305
Comments
Hi @alexgg94, another downside could be that extracting additional information such as font attributes may require running Tesseract multiple times, which could make image_to_data very slow.
Hi, Tesseract itself can provide this through the WordFontAttributes method defined in ResultIterator. I was just wondering whether pytesseract can offer this feature right now, or whether it has been considered at all. P.S. image_to_data makes my life so much easier, that's why I'm asking 😄 Cheers
It seems that there is a problem with WordFontAttributes in the new engine: tesseract-ocr/tesseract#1074. You can also try the more advanced wrapper tesserocr, but I am not sure whether it supports it either.
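For anyone who wants to try the tesserocr route, here is a rough, untested sketch. It assumes that tesserocr's result iterator exposes WordFontAttributes and returns a dict with keys such as font_name, pointsize, bold and italic (or None when no font data is available), and that the legacy engine and its traineddata are installed, since the linked issue reports problems with the new engine. The helper names describe_font and word_fonts are made up for illustration.

```python
def describe_font(attrs):
    """Turn a WordFontAttributes-style dict into a short label.

    The dict keys used here are an assumption about tesserocr's output;
    None (or an empty dict) is treated as "no font data available".
    """
    if not attrs:
        return "unknown"
    parts = [attrs.get("font_name") or "unnamed font"]
    size = attrs.get("pointsize")
    if size:
        parts.append(f"{size}pt")
    # Keep only the style flags that are set to a truthy value.
    parts.extend(name for name in ("bold", "italic", "underlined") if attrs.get(name))
    return " ".join(parts)


def word_fonts(image_path, lang="eng"):
    """Return (word, font label) pairs for an image, using tesserocr if installed."""
    # Imported lazily so describe_font stays usable without tesserocr.
    from tesserocr import PyTessBaseAPI, RIL, OEM, iterate_level

    results = []
    # OEM.TESSERACT_ONLY selects the legacy engine (needs legacy traineddata).
    with PyTessBaseAPI(lang=lang, oem=OEM.TESSERACT_ONLY) as api:
        api.SetImageFile(image_path)
        api.Recognize()
        for item in iterate_level(api.GetIterator(), RIL.WORD):
            word = item.GetUTF8Text(RIL.WORD)
            results.append((word, describe_font(item.WordFontAttributes())))
    return results
```

Calling word_fonts("page.png") (a placeholder path) would then give one entry per word, with "unknown" wherever the engine reports no font attributes.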
Hi,
I'm using the image_to_data method to extract text from images and organize it into a pandas dataframe.
I was just wondering if there is any way to have image_to_data also return information about each word's font style attributes (bold, italic, font size...).
Cheers
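To make the gap concrete, here is a small sketch of what the current output contains. It assumes the standard Tesseract TSV layout that image_to_data returns as a string by default, and it parses a made-up sample rather than a real OCR run. There is no font column, so the only size hint is the bounding-box height, which is a rough proxy in pixels and not a point size. The helpers parse_words and line_heights are illustrative names, not part of pytesseract.

```python
import csv
import io

# Column names of Tesseract's TSV output; note there are no font fields.
TSV_COLUMNS = ["level", "page_num", "block_num", "par_num", "line_num",
               "word_num", "left", "top", "width", "height", "conf", "text"]

# Made-up sample in the same layout, standing in for a real OCR result.
SAMPLE_TSV = "\t".join(TSV_COLUMNS) + "\n" + "\n".join([
    "5\t1\t1\t1\t1\t1\t10\t12\t48\t14\t96\tHello",
    "5\t1\t1\t1\t1\t2\t64\t12\t52\t14\t95\tworld",
    "5\t1\t1\t1\t2\t1\t10\t40\t90\t28\t91\tTitle",
])


def parse_words(tsv_text):
    """Return one dict per recognized word (level 5 rows with non-empty text)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if row.get("level") != "5" or not text:
            continue
        words.append({
            "text": text,
            # A text line is identified by its block, paragraph and line numbers.
            "line": (int(row["block_num"]), int(row["par_num"]), int(row["line_num"])),
            "height": int(row["height"]),
        })
    return words


def line_heights(words):
    """Map each text line to its tallest word box, a rough size proxy in pixels."""
    heights = {}
    for word in words:
        heights[word["line"]] = max(heights.get(word["line"], 0), word["height"])
    return heights


if __name__ == "__main__":
    for line, height in sorted(line_heights(parse_words(SAMPLE_TSV)).items()):
        print(line, height)
```

Bold and italic cannot be recovered this way at all, which is why I'm asking about proper font attributes.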