Creating ALTO [enhancement] #419
Comments
Yes, I have, and the generated ALTO isn't valid. It can't be imported into the software I use, and the validator ocr-validate also says it's not valid.
Could you please create an issue for that project then and add more details about the problems you encountered?
A contribution of direct ALTO support would be welcome. I recommend implementing it in a separate file, api/altorenderer.cpp, rather than adding to api/baseapi.cpp.
There is now initial support for ALTO output in the latest Git master. I will nevertheless keep this issue open until issue altoxml/schema#54 is resolved and more testing with Tesseract + ALTO has been done.
@stweil ResultIteratorTest shows that Tesseract can identify superscripts, subscripts, small caps and drop caps. The ALTO schema seems to support these. Does Tesseract's ALTO output include such font styles?
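One way to check the question above against a given file is to scan the ALTO output for `String` elements that carry a `STYLE` attribute. The following is a minimal Python sketch, not Tesseract's own code. The sample document is a hypothetical hand-written fragment, not real Tesseract output, and the helper names `local_name` and `count_string_styles` are made up for illustration. It assumes that styles such as `superscript` or `smallcaps` appear as space-separated values in `STYLE`, as the ALTO schema allows.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical ALTO fragment for illustration; NOT actual Tesseract output.
SAMPLE_ALTO = """<?xml version="1.0" encoding="UTF-8"?>
<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page>
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="E" />
            <String CONTENT="=" />
            <String CONTENT="mc" />
            <String CONTENT="2" STYLE="superscript" />
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def count_string_styles(alto_xml):
    """Count each STYLE value found on String elements of an ALTO document."""
    root = ET.fromstring(alto_xml.encode("utf-8"))
    counts = Counter()
    for elem in root.iter():
        if local_name(elem.tag) != "String":
            continue
        # STYLE may hold several space-separated values, e.g. "bold italics".
        for style in elem.get("STYLE", "").split():
            counts[style] += 1
    return counts


if __name__ == "__main__":
    print(dict(count_string_styles(SAMPLE_ALTO)))
```

An empty result for a page that visibly contains superscripts or small caps would suggest that the renderer does not emit these styles, though it would not say why.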
This feature request was implemented a long time ago.
Are there any plans to add ALTO support to Tesseract? I was thinking about programming a module for it. I searched for a conversion tool but found nothing that works. I need it to work in a Linux terminal. I found only a conversion tool from hOCR to ALTO, but its output is wrong. It would also be better if Tesseract generated ALTO directly.