recognizes more characters than present #1362

Open
abieler opened this issue Mar 5, 2018 · 5 comments

Comments

abieler commented Mar 5, 2018

  • Tesseract Version: 4.00 (with tessdata_best, lstm-only, lang=deu+eng)
  • Platform: arch linux 64 bit

Current Behavior:

With psm=7 (single text line), Tesseract recognizes "2427. 50" instead of "24. 50" for the thresholded image below, and "242. 50" for the original image.

With psm=1 or 3, it reports "empty page" for both images.

[Attached image: timg (thresholded)]

[Attached image: img (original)]

Shreeshrii (Collaborator)

@zdenop Label with: 4.0x, Accuracy

Shreeshrii (Collaborator) commented Apr 17, 2019

With the current code, the first image is recognized correctly with tessdata and tessdata_best at psm 6, 7, or 8:

tesseract 4.1.0-rc1-255-g332a1
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0

*****  1362-1.jpg OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** PSM 6 ****
42.50
**** PSM 7 ****
42.50
**** PSM 8 ****
42.50

*****  1362-1.jpg OEM 1 LANG eng TESSDATA tessdata
**** PSM 3 ****
Empty page!!
Empty page!!
**** PSM 6 ****
42.50
**** PSM 7 ****
42.50
**** PSM 8 ****
42.50

*****  1362-1.jpg OEM 1 LANG eng TESSDATA tessdata_fast
**** PSM 3 ****
Empty page!!
Empty page!!
**** PSM 6 ****
A>. 50
**** PSM 7 ****
A>. 50
**** PSM 8 ****
a>. 50

*****  1362-2.jpg OEM 1 LANG eng TESSDATA tessdata_best
**** PSM 3 ****
Empty page!!
Empty page!!
**** PSM 6 ****
A422. 50)
**** PSM 7 ****
A422. 50)
**** PSM 8 ****
42. 50)

*****  1362-2.jpg OEM 1 LANG eng TESSDATA tessdata
**** PSM 3 ****
Empty page!!
Empty page!!
**** PSM 6 ****
422. 50)
**** PSM 7 ****
422. 50)
**** PSM 8 ****
42. 550)

*****  1362-2.jpg OEM 1 LANG eng TESSDATA tessdata_fast
**** PSM 3 ****
Empty page!!
Empty page!!
**** PSM 6 ****
A>. 50
**** PSM 7 ****
A>. 50
**** PSM 8 ****
427.50
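The log above looks like the output of a loop over tessdata variants and page segmentation modes; the script that produced it is not shown in the thread. A minimal Python sketch that reproduces the same matrix with the `tesseract` CLI options `--tessdata-dir`, `--oem`, `--psm` and `-l` could look like this. The directory names in `TESSDATA_DIRS` and the helper names `build_command` and `run_matrix` are assumptions for illustration.

```python
import subprocess
from typing import List

# Hypothetical directory names: point these at local checkouts of the
# tessdata_best, tessdata and tessdata_fast model repositories.
TESSDATA_DIRS = ["tessdata_best", "tessdata", "tessdata_fast"]
# Page segmentation modes compared in the log above.
PSMS = [3, 6, 7, 8]


def build_command(image: str, tessdata_dir: str, psm: int,
                  lang: str = "eng", oem: int = 1) -> List[str]:
    """Return the tesseract CLI invocation for one image/model/PSM combination."""
    return [
        "tesseract", image, "stdout",
        "--tessdata-dir", tessdata_dir,
        "--oem", str(oem),   # 1 = LSTM engine only
        "--psm", str(psm),
        "-l", lang,
    ]


def run_matrix(image: str, lang: str = "eng") -> None:
    """Run every tessdata/PSM combination and print a report in the log's layout."""
    for tessdata_dir in TESSDATA_DIRS:
        print(f"*****  {image} OEM 1 LANG {lang} TESSDATA {tessdata_dir}")
        for psm in PSMS:
            print(f"**** PSM {psm} ****")
            result = subprocess.run(
                build_command(image, tessdata_dir, psm, lang),
                capture_output=True, text=True,
            )
            # Print both streams in case diagnostics such as "Empty page!!"
            # are written to stderr rather than stdout.
            print((result.stdout + result.stderr).strip())
        print()
```

Calling `run_matrix("1362-1.jpg")` requires `tesseract` on the `PATH` and the listed tessdata directories to exist; `build_command` alone has no side effects, which makes it easy to check the exact invocation.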

daviddphillips commented:

Here is an example of doubled characters when the English model is used; the output is correct when another language (French) is used. I've tried both the fast and best tessdata, as well as different versions of Tesseract. I've seen this behavior with many images, specifically when a word mixes letters and digits.

The file is called EVO_Double.png:
[Attached image: EVO_Double.png]

tesseract EVO_Double.png stdout -l fra
Violation # N/A-56890 | License Plate EVO11B — Amount Due 78.75 Payment Due By 05/08/2019

tesseract EVO_Double.png stdout -l eng
Violation # N/A-56890 License Plate EV0O11B Amount Due 78.75 Payment Due By 05/08/2019

stweil (Contributor) commented Nov 9, 2020

The latest Tesseract and release 4.1.1 recognize this image as: Violation # N/A-56890 License PlateEV011B Amount Due 78.75. Payment Due By 05/08/2019.

woodjohndavid commented:

I have just created pull request #4211, which I consider an improved solution for diplopia (one glyph being recognized as two characters).

I encourage everyone on this thread to try it out and test it with as broad a range of cases as possible.

Note that there are some new configuration values that, for now, can only be set in code:

bool kRemoveDiplopia - if true, enables diplopia removal functionality. If false, my changes have no effect
int kMaxDiplopiaGap - maximum number of timesteps apart to be considered diplopia, default 2

If the diplopia change proves valuable, these configuration items should be exposed as regular settings.
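To make the two configuration values concrete, here is a toy Python sketch of the behavior their descriptions suggest: when enabled, two consecutive characters emitted at most `max_gap` timesteps apart are treated as one glyph read twice, and only the more confident reading is kept. This is not the code from #4211, which is C++ inside Tesseract itself; the `DecodedChar` type, the keep-the-more-confident rule, and the sample timesteps and confidences are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecodedChar:
    """One character emitted by a hypothetical per-timestep decoder."""
    char: str
    timestep: int
    confidence: float


def remove_diplopia(chars: List[DecodedChar],
                    enabled: bool = True,
                    max_gap: int = 2) -> List[DecodedChar]:
    """Collapse consecutive characters emitted too close together in time.

    `enabled` and `max_gap` mirror the roles described for kRemoveDiplopia
    and kMaxDiplopiaGap; the collapsing rule itself is an assumption.
    """
    if not enabled:
        return list(chars)
    result: List[DecodedChar] = []
    for current in chars:
        if result and current.timestep - result[-1].timestep <= max_gap:
            # Diplopia candidate: keep whichever reading is more confident.
            if current.confidence > result[-1].confidence:
                result[-1] = current
        else:
            result.append(current)
    return result


# Made-up decoder output for the doubled "EV0O11B" reading above.
sample = [
    DecodedChar("E", 2, 0.95), DecodedChar("V", 6, 0.94),
    DecodedChar("0", 10, 0.55), DecodedChar("O", 11, 0.80),
    DecodedChar("1", 15, 0.90), DecodedChar("1", 19, 0.90),
    DecodedChar("B", 23, 0.93),
]
print("".join(c.char for c in remove_diplopia(sample)))  # EVO11B
```

A purely timestep-based rule like this one could also merge two genuinely distinct narrow characters that happen to be emitted close together, which is one reason the gap threshold would need tuning on real images.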
