Added the option for character-accumulated glyph confidences #1851
The parameter glyph_confidences changes from bool to int. A value of 1 outputs the hOCR file enriched with glyph confidences for every timestep, as before. A value of 2 outputs those timestep confidences accumulated over each recognized character. Signed-off-by: Noah Metzger <[email protected]>
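The description does not say how timesteps are "accumulated over the recognized characters". The following is a minimal sketch of one plausible reading of mode 2, not Tesseract's implementation. It assumes each timestep carries a list of (glyph, score) choices, and that the start timestep of every recognized character is known. Scores of the same glyph are averaged over the character's timesteps. The types `Choice` and `Timestep`, the functions `AccumulateOverCharacter` and `AccumulateAll`, the `char_starts` input, and the averaging rule are all hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one candidate glyph and its score at a single LSTM timestep.
using Choice = std::pair<std::string, float>;
// Hypothetical: all candidate choices emitted at one timestep.
using Timestep = std::vector<Choice>;

// Accumulates the choices of timesteps [begin, end), the span assumed to
// belong to one recognized character. Scores of the same glyph are summed,
// then divided by the span length (assumption: averaging, not max or sum).
std::map<std::string, float> AccumulateOverCharacter(
    const std::vector<Timestep>& timesteps, std::size_t begin, std::size_t end) {
  std::map<std::string, float> totals;
  if (begin >= end || end > timesteps.size()) return totals;  // invalid span
  for (std::size_t t = begin; t < end; ++t) {
    for (const Choice& choice : timesteps[t]) totals[choice.first] += choice.second;
  }
  const float span = static_cast<float>(end - begin);
  for (auto& entry : totals) entry.second /= span;
  return totals;
}

// Splits the timestep sequence at the (sorted) character start indices and
// accumulates each resulting span; the last character runs to the end.
std::vector<std::map<std::string, float>> AccumulateAll(
    const std::vector<Timestep>& timesteps,
    const std::vector<std::size_t>& char_starts) {
  std::vector<std::map<std::string, float>> per_character;
  for (std::size_t i = 0; i < char_starts.size(); ++i) {
    const std::size_t end =
        (i + 1 < char_starts.size()) ? char_starts[i + 1] : timesteps.size();
    per_character.push_back(AccumulateOverCharacter(timesteps, char_starts[i], end));
  }
  return per_character;
}
```

For a character spanning two timesteps that score `a` at 0.8 and 0.6, this sketch reports `a` at their mean, 0.7. Averaging keeps accumulated values in the same range as per-timestep scores. Summing or taking the maximum would be equally reasonable choices, and the PR text does not specify which rule is used.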
@@ -508,7 +508,7 @@ Tesseract::Tesseract()
   STRING_MEMBER(page_separator, "\f",
                 "Page separator (default is form feed control character)",
                 this->params()),
-  BOOL_MEMBER(glyph_confidences, false,
+  INT_MEMBER(glyph_confidences, 0,
              "Allows to include glyph confidences in the hOCR output",
Noah, could you please add help text here listing the valid values for glyph_confidences? Can you please do it today or tomorrow?
Considerations for (re)naming
In addition, a function to retrieve the full matrix of LSTM predictions could be very helpful for applications such as keyword spotting or post-correction. In contrast to …
+1
Thanks for the analysis. If you want to change the name, please send a PR ASAP.
OK then. I am not familiar with the release schedule, but I reckon it might take too long to get the additional …
I'm fine with it. Please send a PR or post a patch here for the renaming ASAP. There are one or two open topics left for 4.0.0; we would like to release it this week or next.
Would it be fine to replace …
Yes.
How about …
I am not sure if …
Well, I am certainly not an expert on the Tesseract API, but (as stated above) "choices" is the term used for this so far, not "alternatives". Also, it can currently refer to either "symbols" or timesteps. Lastly, "mode" appears in various places.
All right then.