This repository has been archived by the owner on Mar 17, 2022. It is now read-only.

Stop() does not work with GetUTF8Text() #185

Closed
0xbad1d3a5 opened this issue Dec 4, 2016 · 3 comments

@0xbad1d3a5
Contributor

0xbad1d3a5 commented Dec 4, 2016

I've been having trouble getting the stop() function to work when I call the GetUTF8Text() function. After digging around, the problem seems to be that the GetUTF8Text() function in tesseract/baseapi.cpp calls Recognize(NULL), so no monitor object is ever passed to the recognition step:

char* TessBaseAPI::GetUTF8Text() {
  if (tesseract_ == NULL ||
      (!recognition_done_ && Recognize(NULL) < 0))
    return NULL;

It seems that, at least in commit 5d2e03b, this was a feature: you changed tesseract's baseapi to take a monitor parameter, but that change was removed because of issue #116. As a result, nativeGetUTF8Text() now contains dead code (it creates an ETEXT_DESC monitor struct that is never used), and stop() no longer works with a GetUTF8Text() call.

I'm wondering what tess-two should actually do here. Should it mirror tesseract's API exactly and not provide a way to stop a GetUTF8Text() call? Or should it be flexible and allow users to stop that call as well? If users should be allowed to stop a GetUTF8Text() recognition, I believe an easy fix would be to call Recognize() with the monitor before calling GetUTF8Text():

  ETEXT_DESC monitor;
  monitor.progress_callback = progressJavaCallback;
  monitor.cancel = cancelFunc;
  monitor.cancel_this = nat;
  monitor.progress_this = nat;

  nat->api.Recognize(&monitor);
  char *text = nat->api.GetUTF8Text();
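
To illustrate why calling Recognize() with a monitor first would make the call cancellable, here is a minimal, self-contained sketch. MockApi, MockMonitor, and CancelAfterLimit are hypothetical stand-ins, not the Tesseract or tess-two API; the sketch also assumes that a cancelled pass is not restarted by a later GetUTF8Text() call, which may not match the real library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for ETEXT_DESC: only the cancel hook is modelled.
struct MockMonitor {
  bool (*cancel)(void* cancel_this, int words_done) = nullptr;
  void* cancel_this = nullptr;
};

// Hypothetical stand-in for TessBaseAPI; "recognition" just counts words.
class MockApi {
 public:
  explicit MockApi(int total_words) : total_words_(total_words) {}

  // Polls the monitor before each word. Returns 0 on success, -1 if cancelled.
  int Recognize(MockMonitor* monitor) {
    // Assumption of this sketch: once a pass has started, it is never restarted.
    recognition_done_ = true;
    for (; words_done_ < total_words_; ++words_done_) {
      if (monitor != nullptr && monitor->cancel != nullptr &&
          monitor->cancel(monitor->cancel_this, words_done_)) {
        return -1;
      }
    }
    return 0;
  }

  // Mirrors the shape of the snippet quoted above: if nothing has been
  // recognized yet, it runs Recognize() WITHOUT a monitor, so it cannot
  // be cancelled from here.
  std::string GetUTF8Text() {
    if (!recognition_done_ && Recognize(nullptr) < 0) return std::string();
    // One 'w' per recognized word, as a placeholder for real text.
    return std::string(static_cast<std::size_t>(words_done_), 'w');
  }

  int words_done() const { return words_done_; }

 private:
  int total_words_;
  int words_done_ = 0;
  bool recognition_done_ = false;
};

// Hypothetical cancel callback: cancel_this points to an int word limit.
bool CancelAfterLimit(void* cancel_this, int words_done) {
  const int limit = *static_cast<int*>(cancel_this);
  return words_done >= limit;
}
```

In this sketch, calling GetUTF8Text() on a fresh MockApi always processes every word, while calling Recognize(&monitor) first lets the callback stop the pass early, and the following GetUTF8Text() call only reads back what was already recognized.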

Alternatively, tess-two could expose a native wrapper for int Recognize(ETEXT_DESC* monitor); so that users may call nativeRecognize() before calling GetUTF8Text().

I'd be glad to send a pull request for either of these suggestions, and to fix the commented-out test case TessBaseAPITest.testStop(). Otherwise, you might want to remove the unused monitor and perhaps add a comment documenting that stop() doesn't work with GetUTF8Text().

@rmtheis
Owner

rmtheis commented Dec 5, 2016

Good catch. Thanks for looking into this, and for the thorough description.

We should remove the monitor from getUTF8Text() completely. That would match the Tesseract API. The ability to stop/monitor would still be available when using getHOCRText(), so removing the monitor wouldn't remove any functionality.

If you could send a pull request to remove the extraneous code, fix the test case to work with getHOCRText() if possible, and update the Javadoc with your suggestion, that would be outstanding. That may fix #97 too. Otherwise I'll have a look at it when I get a chance.

@0xbad1d3a5
Contributor Author

Cool, I'll look into submitting a pull request in the next couple of days then, thanks!

rmtheis added the bug label Dec 6, 2016
@rmtheis
Owner

rmtheis commented Dec 8, 2016

Fixed in #186.

rmtheis closed this as completed Dec 8, 2016