Decoding is slow when multiple languages are used #261

DorisGM · 2019-04-09T02:03:22Z

Summary:
Decoding is slow when multiple languages are used. Can I dynamically switch languages to decode images? I want to support multiple languages, but each image contains only one language: sometimes eng, sometimes ara. No single sentence mixes languages.

Steps to reproduce the issue:

1. I want to support multiple languages, but each image contains only one language: sometimes eng, sometimes ara. No single sentence mixes languages.
2. I initialized TessBaseAPI with eng + ara + msa to decode several images, each of which may be English or Arabic.
3. When I initialize with English only, the image decodes quickly. But when I initialize TessBaseAPI with eng + ara + msa, the same English sentence decodes very slowly.

Expected result:
I want decoding with TessBaseAPI initialized with eng + ara + msa to be as fast as with eng only. Alternatively, maybe I need to switch languages dynamically myself when decoding images in different languages. If I re-initialize with a different language dynamically, will that affect decoding performance, and should I call TessBaseAPI.clear before switching?

Actual result:
Decoding is slow when multiple languages are used.

Tess-two version:
9.0.0

Android version:
7.0.0

Phone/device model:
Android TV Amlogic 905X

Phone/device architecture (armeabi, armeabi-v7a, x86, mips, arm64-v8a, x86_64, mips64):
arm64-v8a

Link to training data used:
https://github.com/tesseract-ocr/tessdata/tree/3.04.00

Link to image used as input:

rmtheis · 2019-04-11T02:45:33Z

I don't have a good way to do it. As an interesting test, you could try running Firebase's language detection on the output of the English OCR and then run Arabic OCR if it isn't identified as English.

Note that msa is Malay and not Modern Standard Arabic.

Anyway, the slowness is a normal side effect and not really a bug in this project.
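
The detect-then-retry idea suggested above can be sketched roughly as follows. This is only an illustration, not tested against tess-two: `FallbackOcr`, `recognize`, and `mostlyAsciiLetters` are hypothetical names, the two recognizers are passed in as plain functions so the flow is independent of any OCR library, and the ASCII-letter ratio is a crude stand-in for a real language detector such as Firebase's. Arabic text read with the English model can still produce Latin letters, so the threshold would need tuning on real images.

```java
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical helper: try a fast primary recognizer, fall back to a second one. */
final class FallbackOcr {

    /**
     * Runs the primary recognizer and keeps its text if the check accepts it;
     * otherwise runs the fallback recognizer on the same image.
     */
    static <I> String recognize(I image,
                                Function<I, String> primary,
                                Predicate<String> acceptable,
                                Function<I, String> fallback) {
        String text = primary.apply(image);
        return acceptable.test(text) ? text : fallback.apply(image);
    }

    /**
     * Crude stand-in for language detection (an assumption, not a real detector):
     * true when at least `threshold` of the non-whitespace characters are ASCII letters.
     */
    static boolean mostlyAsciiLetters(String text, double threshold) {
        int total = 0;
        int letters = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            total++;
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                letters++;
            }
        }
        // Empty or whitespace-only output is never accepted.
        return total > 0 && letters >= threshold * total;
    }
}
```

With this shape, the English pass runs on every image and the Arabic pass runs only when the English output is rejected, so English images pay for one pass and Arabic images pay for two.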

DorisGM · 2019-04-11T03:29:34Z

Thanks for your reply. I now re-initialize with the matching language when running OCR on images in different languages. It looks good.
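
For anyone landing here later, a minimal sketch of that kind of switching, assuming the language of each image is known before OCR. `OcrEngine` and `SwitchingOcr` are hypothetical names introduced for illustration; on Android, an adapter around tess-two's `TessBaseAPI` (`init`, `setImage`, `getUTF8Text`, `end`) would sit behind the interface. The wrapper re-initializes only when the requested language differs from the one currently loaded, so a run of same-language images pays the init cost once.

```java
import java.util.Objects;

/** Hypothetical minimal view of an OCR engine; not part of tess-two. */
interface OcrEngine {
    boolean init(String dataPath, String language); // load one language's trained data
    String recognize(byte[] image);                  // run OCR on one encoded image
    void end();                                      // release the loaded engine
}

/** Keeps one language loaded at a time and switches only when needed. */
final class SwitchingOcr {
    private final OcrEngine engine;
    private final String dataPath;
    private String currentLanguage; // null while nothing is loaded

    SwitchingOcr(OcrEngine engine, String dataPath) {
        this.engine = Objects.requireNonNull(engine);
        this.dataPath = Objects.requireNonNull(dataPath);
    }

    String recognize(byte[] image, String language) {
        Objects.requireNonNull(language);
        if (!language.equals(currentLanguage)) {
            if (currentLanguage != null) {
                engine.end(); // drop the previously loaded language
                currentLanguage = null;
            }
            if (!engine.init(dataPath, language)) {
                throw new IllegalStateException("init failed for " + language);
            }
            currentLanguage = language;
        }
        return engine.recognize(image);
    }

    /** Releases the engine if a language is currently loaded. */
    void close() {
        if (currentLanguage != null) {
            engine.end();
            currentLanguage = null;
        }
    }
}
```

Calling `recognize(image, "eng")` twice and then `recognize(image, "ara")` triggers two initializations in total, not three. Whether re-initializing is cheap enough on a given device is something to measure.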

rmtheis closed this as completed Apr 11, 2019
