This repository has been archived by the owner on Mar 17, 2022. It is now read-only.

Arabic trained-data produce 20% accuracy #250

Closed
ibrahimAlii opened this issue Sep 11, 2018 · 2 comments

@ibrahimAlii

Summary:

With the English trained data, recognition works very well. With the Arabic trained data, I have to copy all of the cube data files as well, and the output quality is still poor.

Steps to reproduce the issue:

  1. Input an image containing any Arabic digits or words.
  2. Call getUTF8Text().

Expected result:
I should get text that matches the input image.

Actual result:
I got a weird result.

Tess-two version:
8.0.0

Android version:
28

Phone/device model:
Pixel

Link to training data used:
https://github.com/tesseract-ocr/tessdata/blob/3.04.00/ara.traineddata

Link to image used as input:

http://3.bp.blogspot.com/-CZRdjlj2ybU/TkAbU6C4RWI/AAAAAAAAAAw/n4Hej0ct3rw/s1600/ind.jpg

@rmtheis
Owner

rmtheis commented Sep 13, 2018

Thanks for the bug report. It's not entirely clear to me what the problem is because you just said you get a "weird result." Maybe try different page segmentation modes and try using different portions of the input image.

Most likely your issue is not a bug and this is working as intended.

@rmtheis rmtheis closed this as completed Sep 13, 2018
@ibrahimAlii
Author

@rmtheis Please check the image below.

screenshot_1537628905

The result should match the picture, but I got some Arabic characters instead of digits. Some digits are also misread: I got nine instead of one, "ها" instead of three, and "هلا" instead of six.
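
A mismatch like this is easier to discuss when it is stated as a number rather than as a "weird result". Below is a minimal sketch of one way to do that, assuming the expected text for the image is known: it computes a character-level accuracy from the edit distance between the expected string and the OCR output. The class `OcrAccuracy` and its methods are hypothetical helpers, not part of tess-two, and the strings in `main` are illustrative values based on the substitutions described above, not actual output captured from the app.

```java
// Hypothetical helper for quantifying OCR quality; not part of tess-two.
final class OcrAccuracy {

    // Levenshtein edit distance, computed over Unicode code points so that
    // Arabic letters and digits each count as one character.
    static int levenshtein(String a, String b) {
        int[] ca = a.codePoints().toArray();
        int[] cb = b.codePoints().toArray();
        int[] prev = new int[cb.length + 1];
        int[] curr = new int[cb.length + 1];
        for (int j = 0; j <= cb.length; j++) {
            prev[j] = j; // turning an empty prefix into j characters takes j insertions
        }
        for (int i = 1; i <= ca.length; i++) {
            curr[0] = i; // turning i characters into an empty prefix takes i deletions
            for (int j = 1; j <= cb.length; j++) {
                int cost = (ca[i - 1] == cb[j - 1]) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[cb.length];
    }

    // Accuracy in [0, 1]: 1 minus the edit distance divided by the longer length.
    // Two empty strings are treated as a perfect match.
    static double charAccuracy(String expected, String actual) {
        int longest = Math.max(
                expected.codePointCount(0, expected.length()),
                actual.codePointCount(0, actual.length()));
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(expected, actual) / longest;
    }

    public static void main(String[] args) {
        // Illustrative values only (assumed): the digits one, three, six, and a
        // reading in which each digit was replaced as described in the comment.
        String expected = "136";
        String actual = "9" + "\u0647\u0627" + "\u0647\u0644\u0627";
        System.out.println("Character accuracy: " + charAccuracy(expected, actual));
    }
}
```

In an app, `actual` would be the string returned by getUTF8Text() for a test image. Reporting this value per image, together with the page segmentation mode used, makes it easier to tell a configuration problem from a limitation of the trained data.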
