tesstrain.sh doesn't support vertical languages #2989
Comments
Vertical languages seem to be supported indirectly, based on font names. Please see tesseract/src/training/language-specific.sh (lines 862 to 869 in d8d2f6f) and tesseract/src/training/tesstrain_utils.sh (lines 274 to 281 in d8d2f6f).
Try adding your font to the vertical fonts list as well as the language fonts list, then run the training again.
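As a rough illustration of the "both lists" idea, here is a minimal, self-contained sketch. The array names LANG_FONTS and VERTICAL_FONTS, the helper is_vertical_font, and the font names are all placeholders assumed for this example; check the two files referenced above for the names the scripts actually use.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the array and font names are assumptions,
# not copied from language-specific.sh or tesstrain_utils.sh.

# Fonts used for the language in general (assumed name).
LANG_FONTS=("My Vertical Font" "My Other Font")

# Fonts that should also be rendered vertically (assumed name).
# A vertical font appears here AND in LANG_FONTS.
VERTICAL_FONTS=("My Vertical Font")

# Return 0 if the given font is in VERTICAL_FONTS, 1 otherwise.
is_vertical_font() {
  local font="$1" candidate
  for candidate in "${VERTICAL_FONTS[@]}"; do
    [[ "${candidate}" == "${font}" ]] && return 0
  done
  return 1
}

# Report how each language font would be treated.
for font in "${LANG_FONTS[@]}"; do
  if is_vertical_font "${font}"; then
    echo "${font}: horizontal and vertical"
  else
    echo "${font}: horizontal only"
  fi
done
```

A font that is listed only in the vertical list would never be visited by the loop over the language fonts, which is one plausible reason both lists need the entry.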
@Shreeshrii So are the _vert languages made just by training on vertical fonts, or are there additional steps?
I have not trained any CJK languages or any other scripts requiring vertical fonts. I just pointed out what I found by searching for "vert" in the training scripts. I suggest you give it a try. If you have more vertical fonts, they need to be added to both lists.
You can try contacting @zodiac3539 for pointers; see https://github.com/zodiac3539/jpn_vert
I actually did try a couple of months back, but he doesn't want to part with his secrets :)
Having the same issue here. I tried adding the font to the vertical font list but all I get is:
Is it possible to train vertical languages? How was the jpn_vert.traineddata file in the tessdata_best repo made?
Please see the comment by Ray at #707 (comment). It's possible that the current code uses layout analysis for vertical text rather than a separate language.
Environment
Current Behavior:
When a _vert language is passed to tesstrain.sh via --lang, it throws an error, as per this line:
tesseract/src/training/language-specific.sh, line 1170 in d8d2f6f
Expected Behavior:
Throw a more specific error:
ERROR: Error: vertical languages aren't supported
or add a configuration that generates training data for vertical languages.
Suggested Fix:
Add a case for _vert languages in https://github.com/tesseract-ocr/tesseract/blob/d8d2f6f48a8ddaf0b668eb1abf18fd6d08470041/src/training/language-specific.sh
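A minimal sketch of what such a case could look like, assuming the language code is matched with a bash case statement. The function name check_lang_code is hypothetical, and the error text is the one proposed under Expected Behavior; none of this is copied from language-specific.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: check_lang_code is an illustrative name,
# not a function from the tesseract training scripts.

# Print a specific error for *_vert codes and return non-zero;
# otherwise accept the code.
check_lang_code() {
  local lang="$1"
  case "${lang}" in
    *_vert)
      # Message taken from the Expected Behavior section of this issue.
      echo "ERROR: Error: vertical languages aren't supported"
      return 1
      ;;
    "")
      echo "ERROR: Error: no language code given"
      return 1
      ;;
    *)
      echo "OK: ${lang}"
      ;;
  esac
}
```

For example, `check_lang_code jpn_vert` prints the specific error and returns 1, while `check_lang_code jpn` prints `OK: jpn`. The *_vert pattern has to come before the catch-all branch, because a case statement runs the first pattern that matches.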