This repository has been archived by the owner on Mar 17, 2022. It is now read-only.

Decoding is slow when multiple languages are used #261

Closed
DorisGM opened this issue Apr 9, 2019 · 2 comments

Comments

DorisGM commented Apr 9, 2019

Summary:
Decoding is slow when multiple languages are used. Can I switch languages dynamically when decoding images? I want to support multiple languages, but each image contains only one language: sometimes eng, sometimes ara. No single sentence mixes languages.

Steps to reproduce the issue:

  1. I want to support multiple languages, but each image contains only one language: sometimes eng, sometimes ara. No single sentence mixes languages.
  2. I initialized TessBaseApi with eng + ara + msa to decode several images, each of which may be English or Arabic.
  3. When I initialize with English only, the image decodes quickly. When I initialize TessBaseApi with eng + ara + msa, the same English sentence decodes very slowly.

Expected result:
I want decoding after initializing TessBaseApi with eng + ara + msa to be as fast as after initializing with eng only. Alternatively, I may need to switch languages myself depending on the language of each image. If I re-initialize with a different language dynamically, will that affect decoding performance, and should I call TessBaseApi.clear before switching?

Actual result:
Decoding is slow when multiple languages are used

Tess-two version:
9.0.0

Android version:
7.0.0

Phone/device model:
Android TV Amlogic 905X

Phone/device architecture (armeabi, armeabi-v7a, x86, mips, arm64-v8a, x86_64, mips64):
arm64-v8a

Link to training data used:
https://github.com/tesseract-ocr/tessdata/tree/3.04.00

Link to image used as input:

ott_subtitle jpg

Owner

rmtheis commented Apr 11, 2019

I don't have a good way to do it. As an interesting test, you could try running Firebase's language detection on the output of the English OCR and then run Arabic OCR if it isn't identified as English.

Note that msa is Malay and not Modern Standard Arabic.

Anyway, the slowness is a normal side effect and not really a bug in this project.
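
The English-first fallback suggested above could be sketched roughly as follows. This is only a sketch under assumptions, not tess-two or Firebase code: `EnglishFirstOcr` and its three callbacks are hypothetical names. In a real app the two OCR callbacks would presumably wrap two separately initialized engines, and `looksEnglish` would wrap whatever language identifier is chosen.

```java
import java.util.function.Function;
import java.util.function.Predicate;

/**
 * Hypothetical two-pass OCR: run the (fast) English engine first and only
 * fall back to the Arabic engine when the English output does not look
 * like English. The image type I is left generic on purpose.
 */
final class EnglishFirstOcr<I> {
    private final Function<I, String> englishOcr;  // assumed: wraps an engine initialized with "eng"
    private final Function<I, String> arabicOcr;   // assumed: wraps an engine initialized with "ara"
    private final Predicate<String> looksEnglish;  // assumed: wraps a language identifier

    EnglishFirstOcr(Function<I, String> englishOcr,
                    Function<I, String> arabicOcr,
                    Predicate<String> looksEnglish) {
        this.englishOcr = englishOcr;
        this.arabicOcr = arabicOcr;
        this.looksEnglish = looksEnglish;
    }

    /** Returns the English result if it is plausible, otherwise the Arabic result. */
    String recognize(I image) {
        String english = englishOcr.apply(image);
        if (looksEnglish.test(english)) {
            return english;
        }
        // Second pass: only paid for images that are not English.
        return arabicOcr.apply(image);
    }
}
```

With this shape, English images cost one OCR pass and Arabic images cost two, so it probably pays off mainly when most input is English.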

rmtheis closed this as completed Apr 11, 2019

Author

DorisGM commented Apr 11, 2019

Thanks for your reply. I now re-initialize with the matching language whenever I OCR an image in a different language. It works well.
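
For readers who want to try the same approach, here is a minimal sketch of per-image language switching. `OcrEngine` and `LanguageSwitchingOcr` are hypothetical names introduced for illustration; in an app, `OcrEngine` would presumably be a thin adapter over the OCR library. The sketch re-initializes only when the requested language differs from the current one, so consecutive images in the same language do not pay the load cost again.

```java
import java.util.Objects;

/** Hypothetical minimal engine abstraction; not part of tess-two. */
interface OcrEngine {
    boolean init(String dataPath, String language); // load language data
    String recognize(byte[] image);                 // run OCR on one image
    void end();                                     // release loaded language data
}

/** Keeps one engine and re-initializes it only when the language changes. */
final class LanguageSwitchingOcr {
    private final OcrEngine engine;
    private final String dataPath;
    private String currentLanguage; // null while the engine is not initialized

    LanguageSwitchingOcr(OcrEngine engine, String dataPath) {
        this.engine = Objects.requireNonNull(engine);
        this.dataPath = Objects.requireNonNull(dataPath);
    }

    String recognize(byte[] image, String language) {
        Objects.requireNonNull(language);
        if (!language.equals(currentLanguage)) {
            if (currentLanguage != null) {
                // Assumption: fully release the old language before loading the next.
                engine.end();
                currentLanguage = null;
            }
            if (!engine.init(dataPath, language)) {
                throw new IllegalStateException("Could not initialize language: " + language);
            }
            currentLanguage = language;
        }
        return engine.recognize(image);
    }

    /** Releases the engine if it is currently initialized. */
    void close() {
        if (currentLanguage != null) {
            engine.end();
            currentLanguage = null;
        }
    }
}
```

Whether calling `clear` before switching is required is not answered in this thread. Ending the engine before re-initializing is an assumption of this sketch, and the cost of each switch would need to be measured on the target device.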
