alto to text: too many spaces #129
Comments
You mean every |
I don't know exactly where the two spaces come from, but there should only be one, I'd say. |
ALTO-to-Text transformation is using @filak's XSLT (https://github.com/filak/hOCR-to-ALTO/blob/master/alto__text.xsl), this needs to be fixed upstream. Can you open an issue there as well pls? |
opened: filak/hOCR-to-ALTO#22 |
fix: #130 |
So is this issue fixed and can it be closed? |
Yes, it has been fixed. If you have an older installation of ocr-fileformat (from before Feb 2021), you'll need to re-clone hOCR-to-ALTO:
(we should really use git submodules to make tracking changes and updating easier) |
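To check whether a given installation still produces the doubled spaces discussed above, it can help to compare its output against a simple reference conversion. The following is only a minimal Python sketch, not the XSLT used by ocr-fileformat: it assumes word text lives in the `CONTENT` attribute of ALTO `String` elements grouped under `TextLine`, and it joins words with exactly one space while ignoring `SP` elements. The sample fragment and the helper names `local_name` and `alto_to_text` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical ALTO fragment for illustration only (not the excerpt from this issue).
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="too"/><SP/><String CONTENT="many"/><SP/><String CONTENT="spaces"/>
    </TextLine>
    <TextLine>
      <String CONTENT="second"/><SP/><String CONTENT="line"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so any ALTO schema version matches."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(xml_string):
    """Return one text line per TextLine, words separated by a single space."""
    root = ET.fromstring(xml_string)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Collect only String elements; SP elements are skipped so that
        # a separator is never emitted twice between two words.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(word for word in words if word))
    return "\n".join(lines)


if __name__ == "__main__":
    print(alto_to_text(ALTO_SAMPLE))
```

If the text produced by the XSLT contains two consecutive spaces where this sketch yields one, the installed stylesheet is probably still the older version.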
Example alto excerpt:
converts to text