Image too small to scale!! (2x48 vs min width of 3) #399

Open
mrobe opened this issue Sep 1, 2024 · 0 comments
mrobe commented Sep 1, 2024

I've been getting very poor results with jpn_vert, but when I try to fine-tune it to improve them I get a flood of errors (shown below).

I've checked all the existing issues about this error and tried all the suggestions, but I'm still stuck.

Trying this:

gmake training MODEL_NAME=jpn_vert_1 START_MODEL=jpn_vert FINETUNE_TYPE=Impact

I get endless errors like this:

Image data/jpn_vert_1-ground-truth/seg-017-0001.lstmf not trainable
Image too small to scale!! (2x48 vs min width of 3)
Line cannot be recognized!!
[...]

I don't understand this error, as none of my image files for training have these dimensions. They are all vertical strips of Japanese, generally around 120 x 3200 px, all PNG files at 600 dpi. Here is a ZIP (small) of my ground-truth folder.
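For anyone wanting to double-check the dimensions of the ground-truth images, here is a minimal sketch that reads the width and height straight from each PNG header using only the Python standard library. The helper names `png_size` and `list_sizes` are hypothetical, made up for this illustration; the folder name is taken from the error message above.

```python
import struct
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Bytes 8-11: chunk length, 12-15: chunk type, 16-23: width and height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def list_sizes(folder):
    """Map each *.png file name in `folder` to its (width, height)."""
    return {p.name: png_size(p) for p in sorted(Path(folder).glob("*.png"))}
```

Calling `list_sizes("data/jpn_vert_1-ground-truth")` would return a dictionary of file names and pixel sizes, which makes it easy to confirm that no source image is anywhere near 2 px wide.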

EDIT: Reading elsewhere on the site, it sounds like tesstrain scales every training image to a height of 48 px. Is that correct? If so, how should I train vertical Japanese? Should I rotate everything 90°? Cutting the vertical images up by hand would be a big pain. I know there are hOCR tools for this, but I have checked the hOCR output and the bounding boxes frequently cut through characters (which I assume is why the OCR is giving poor results), so as far as I can tell that is not a viable solution either.
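If the 48 px assumption is right, the arithmetic would at least be consistent with the error message. A rough sketch, assuming the image is scaled proportionally to a fixed height and the resulting width is rounded to the nearest pixel (both are assumptions on my part, not confirmed tesstrain or Tesseract behaviour; the function names are hypothetical):

```python
def scaled_width(width, height, target_height=48):
    """Width after scaling proportionally so the height equals target_height.

    Assumes rounding to the nearest pixel; the real implementation may differ.
    """
    return round(width * target_height / height)


def rotated_scaled_width(width, height, target_height=48):
    """Same calculation after a 90° rotation, which swaps width and height."""
    return scaled_width(height, width, target_height)


# A typical strip from my ground truth: about 120 x 3200 px.
print(scaled_width(120, 3200))          # 120 * 48 / 3200 = 1.8
print(rotated_scaled_width(120, 3200))  # 3200 * 48 / 120 = 1280.0
```

Under these assumptions a 120 x 3200 px strip would shrink to about 2 x 48 px, which matches the "2x48 vs min width of 3" in the error, while the same strip rotated by 90° would come out at roughly 1280 x 48 px. This says nothing about whether a model fine-tuned on rotated lines would recognize vertical text well; it only covers the size check.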

How can I proceed?

My system: macOS Monterey v12.7.6.

tesseract --version
tesseract 5.4.1
   leptonica-1.84.1
   libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.11 : libwebp 1.4.0 : libopenjp2 2.5.2