
An option to output scores and alternate chars. #25

Closed

Conversation

danvk (Contributor) commented Jan 14, 2015

When requested, this data is written into a .alts.json file alongside the line image.

For example:

$ ./ocropus-rpred --alternates -n -m models/en-default.pyrnn.gz book/0001/010004.bin.png
$ cat book/0001/010004.alts.json
[
  [
    [
      "a",
      0.78196890563642629
    ],
    [
      "s",
      0.33257442535877763
    ]
  ],
  ...
]

Fixes #16
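A downstream script could consume such a file like this (a minimal sketch; the function name and the score cutoff are illustrative, not part of this PR):

```python
import json

def load_alternates(path, min_score=0.1):
    """Read a .alts.json file and keep only alternates above a score cutoff.

    The file holds one list per character position; each entry is a
    (char, score) pair, best match first.
    """
    with open(path) as f:
        positions = json.load(f)
    return [[(c, s) for c, s in pos if s >= min_score] for pos in positions]
```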

tmbdev (Collaborator) commented Jan 29, 2015

Thanks for giving this a try. I don't think this works very well, though. Alternative characters frequently don't occur in the same place, so your code will fail to pick up important recognition alternatives. For example, the alternative to an "m" is "rn", but neither the "r" nor the "n" is going to be where the "m" is. At the very least, you'd have to output all candidates as hypotheses with potentially overlapping bounding boxes. The old recognizer had a .boxes format for that, and if one wanted to go that route, it would probably be best to stick with the same format.

A better solution is to output a recognition lattice. The best format for that is probably OpenFST binary or text format.

A good way to output a recognition lattice is to take the posterior probability at each output x coordinate and build a WFST out of that (strictly linear, with c transitions between state x and state x+1, where c is the number of classes). That can then be matched against a simple model of the form ε+([^ε]+ε+)*. The result can be thresholded and then output as a recognition lattice. Instead of the unconstrained model, you can also use a simple bigraph or trigraph model to pre-select reasonable interpretations of the input.
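The linear-lattice construction described above can be sketched in plain Python data structures (this is not the pyopenfst API; the arc format and pruning threshold are illustrative):

```python
import math

def linear_lattice(posteriors, threshold=1e-3):
    """Build a strictly linear lattice from per-column class posteriors.

    posteriors: one dict per output x coordinate, mapping class label to
    probability. Returns arcs (src, dst, label, cost), with cost the
    negative log probability; arcs below `threshold` are pruned.
    """
    arcs = []
    for x, column in enumerate(posteriors):
        for label, p in column.items():
            if p >= threshold:
                arcs.append((x, x + 1, label, -math.log(p)))
    return arcs
```

The resulting arc list is exactly the "c transitions between state x and state x+1" structure described above, ready to be loaded into a real WFST library for composition with a language model.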

That can be done pretty easily using pyopenfst. In fact, I used to have code like that for LSTM decoding, but I can't find it anymore. The resulting .fst files can also be used with the existing OCRopus language modeling.

danvk (Contributor, Author) commented Jan 29, 2015

I'll close this out then -- thanks for the details. It would be nice to get information on alternatives.

Would the scores output by this make sense if you only looked at the top match? In the example above, would it be fair to say that there's a 78% chance that there's an a?

One thought I had was to use these probabilities to gauge whether a line was upside-down. I'd run the same model on both the original line image and a flipped copy, and look for high probabilities of asymmetric letters like "A" and "v".
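That heuristic could be prototyped like this (a sketch; the score lists would come from the top match at each position of the two recognition runs, and the margin is an arbitrary illustrative value):

```python
def orientation_score(top_scores):
    """Mean top-match probability for a line; higher means the model is
    more confident, which suggests the line is right side up."""
    return sum(top_scores) / len(top_scores) if top_scores else 0.0

def looks_upside_down(scores_original, scores_flipped, margin=0.05):
    """Flag a line as upside-down when the flipped image scores clearly better."""
    return orientation_score(scores_flipped) > orientation_score(scores_original) + margin
```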

danvk closed this on Jan 29, 2015
tmbdev (Collaborator) commented Jan 29, 2015

We've tried this and it works very well: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5413722&tag=1

The best way is still to do what I suggested: generate the FST corresponding to each text line, match it against a language model, and look at the cost of the match.

It's not that hard to do, but if you aren't familiar with pyopenfst, it may be some overhead getting started.

These two notebooks explain it a little:

http://nbviewer.ipython.org/github/tmbdev/teaching-nlpa/blob/master/nlpa-openfst.ipynb

http://nbviewer.ipython.org/github/tmbdev/teaching-nlpa/blob/master/nlpa-openfst2.ipynb

A good way to enable experimentation might simply be to save the undecoded LSTM output as a PNG image. That way, people can write lots of post-processing scripts.
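Saving the undecoded output as an image could look roughly like this (a sketch using only the standard library to keep it self-contained; a real implementation would more likely hand a numpy array to an imaging library):

```python
import struct
import zlib

def posteriors_to_png(matrix, path):
    """Write a 2D list of probabilities (rows x columns, values in [0, 1])
    as an 8-bit grayscale PNG, using only the standard library."""
    height, width = len(matrix), len(matrix[0])

    def chunk(tag, data):
        # A PNG chunk: big-endian length, 4-byte type, payload, CRC-32.
        return (struct.pack(">I", len(data)) + tag + data
                + struct.pack(">I", zlib.crc32(tag + data) & 0xFFFFFFFF))

    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # default compression/filter, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter byte 0 (no filtering);
    # probabilities are quantized to 0..255.
    raw = b"".join(
        b"\x00" + bytes(min(255, max(0, int(v * 255))) for v in row)
        for row in matrix)
    with open(path, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n")
        f.write(chunk(b"IHDR", ihdr))
        f.write(chunk(b"IDAT", zlib.compress(raw)))
        f.write(chunk(b"IEND", b""))
```

With one row per class and one column per output x coordinate, the image is exactly the posterior matrix, and any post-processing script can read it back with an ordinary image loader.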

Linked issue: Metadata about detected characters: quality scores + alternatives