Accuracy and speed tests #3402

Open
amitdo opened this issue Apr 21, 2021 · 2 comments

@amitdo
Collaborator

amitdo commented Apr 21, 2021

Related to discussion in: #707 (comment).

@stweil,

Can you contribute these images and their GT and release them under the Apache 2.0 license?

Then, can you make a new daily CI test (for Ubuntu at least) that will include these images?

UNLV images could also be used as part of an accuracy test with CI.

@amitdo amitdo added the RFC label Apr 21, 2021
@stweil
Contributor

stweil commented Apr 22, 2021

All images and the ground truth are already available online under the free license Public Domain Mark 1.0.

The script for the evaluation is also available online (do5.sh here, currently uses local filenames), and so is the list of GT files.

The URLs for a GT filename follow this pattern:

For example, the GT file 41989179X_0036.txt is at https://digi.bib.uni-mannheim.de/fileadmin/digi/41989179X/gt/41989179X_0036.txt, and the corresponding image is at https://digi.bib.uni-mannheim.de/fileadmin/digi/41989179X/max/41989179X_0036.jpg.
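
To make the pattern concrete, here is a minimal Python sketch that derives both URLs from a GT filename. It assumes, based only on the single example above, that the part of the filename before the underscore is the directory name; the function name `gt_and_image_urls` is made up for illustration and is not part of do5.sh.

```python
from pathlib import PurePosixPath

# Base path taken from the example URLs in this comment.
BASE_URL = "https://digi.bib.uni-mannheim.de/fileadmin/digi"


def gt_and_image_urls(gt_filename: str) -> tuple[str, str]:
    """Return (ground-truth URL, image URL) for a GT filename.

    Assumption: the text before the first underscore is the directory
    name, as in the example 41989179X_0036.txt.
    """
    stem = PurePosixPath(gt_filename).stem  # e.g. "41989179X_0036"
    doc_id, sep, _page = stem.partition("_")
    if not sep or not doc_id:
        raise ValueError(f"unexpected GT filename: {gt_filename!r}")
    gt_url = f"{BASE_URL}/{doc_id}/gt/{stem}.txt"
    image_url = f"{BASE_URL}/{doc_id}/max/{stem}.jpg"
    return gt_url, image_url


if __name__ == "__main__":
    for url in gt_and_image_urls("41989179X_0036.txt"):
        print(url)
```

Feeding each entry of the GT file list through such a function would give the download list for a CI job without relying on local filenames.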

@amitdo
Collaborator Author

amitdo commented Dec 15, 2021

According to this report, there was a huge regression in accuracy and speed in version 3.03 compared to 3.02.02.

We need to evaluate Tesseract weekly using the UNLV dataset to make sure there are no regressions in performance. The legacy engine and the CRNN engine should be tested separately. This is a very important TODO.
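
A rough sketch of what one accuracy check in such a run could look like, in Python. The thread does not say which metric to use; character error rate (edit distance divided by ground-truth length) is an assumption here. The sketch also assumes the two engines map to the `tesseract` CLI option `--oem 0` (legacy) and `--oem 1` (LSTM, taken here to be the engine called CRNN above). The helper names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Edit distance normalised by the ground-truth length."""
    if not ground_truth:
        return 0.0 if not ocr_output else 1.0
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


# Assumed mapping of engine names to --oem values (0 = legacy, 1 = LSTM).
ENGINE_MODES = {"legacy": "0", "lstm": "1"}


def tesseract_command(image_path: str, engine: str) -> list[str]:
    """Build a command line that prints the recognised text to stdout."""
    return ["tesseract", image_path, "stdout", "--oem", ENGINE_MODES[engine]]
```

Each engine would be run on every image (for example with `subprocess.run`, timing the call for the speed part), and the per-engine error rates compared against the previous week's numbers. The legacy engine only works with traineddata files that contain a legacy model.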
