Respect alternative image (if present) #33
Indeed. Also applies to later processing steps, i.e. image manipulation steps operating on regions and lines. The only way to reference these additional image files is via If there is ambiguity (multiple alternative images available), maybe we should define rules to choose? We already have rules for |
@wrznr and I have given this some thought: There are preprocessing steps that must create new image data (because there is no other way to represent their result), like despeckling, dewarping and binarization. There are also steps that can, but could also just annotate the PAGE with enough information for later steps to apply them, e.g. deskewing (via But whatever the level, when descending to a lower level, all the annotated image preprocessing should be applied, because otherwise it would have to be repeated in all the constituent elements during the next step. Therefore, while generally it is for the processor to decide whether or not to create new image data, at the last step per level (typically binarization) it must be configured to do so. And every processor must be programmed to respect image data ( So PAGE+METS allows a very flexible generic workflow design. However, there is a subtlety in coordinate calculations involved here: Since And obviously, this would be difficult to do (and even more difficult to annotate) with non-linear transforms like dewarping. It is easier to live with that if dewarping is done on the line level (when only vertical coordinates will be off for words and glyphs) than on the page level. But for linear transforms this can be done easily:
I have implemented this for ocropy first. Functions in |
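To illustrate the linear case mentioned above (cropping followed by deskewing), here is a minimal Python sketch of converting coordinates between the original page image and a derived image. It is not the ocropy implementation referred to in this comment. The name `make_transform`, its parameters, the choice to rotate about the crop centre without enlarging the canvas, and the sign convention of the angle are all assumptions made for illustration.

```python
import math


def make_transform(offset_x, offset_y, angle_deg, width, height):
    """Build coordinate converters for a cropped and rotated image.

    Assumed (hypothetical) setup: the derived image is the page cropped to a
    box with top-left corner (offset_x, offset_y) and size width x height,
    then rotated by angle_deg about the crop centre, keeping the canvas size.
    The rotation sign convention is an assumption, not a PAGE rule.
    """
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    cx, cy = width / 2.0, height / 2.0

    def to_derived(x, y):
        # Shift into the crop, then rotate about the crop centre.
        dx, dy = x - offset_x - cx, y - offset_y - cy
        return (cos_a * dx - sin_a * dy + cx,
                sin_a * dx + cos_a * dy + cy)

    def to_page(x, y):
        # Inverse rotation about the crop centre, then undo the crop offset.
        dx, dy = x - cx, y - cy
        return (cos_a * dx + sin_a * dy + cx + offset_x,
                -sin_a * dx + cos_a * dy + cy + offset_y)

    return to_derived, to_page


if __name__ == "__main__":
    to_derived, to_page = make_transform(100, 200, 5.0, 400, 300)
    point_in_derived = to_derived(250, 320)
    # Converting back should reproduce the page coordinates (up to rounding).
    print(point_in_derived, to_page(*point_in_derived))
```

Because both steps are invertible, a processor working on the derived image could convert its results back to page coordinates before annotating them, provided the offset and angle are recorded somewhere it can read them.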
@kba @chreul What do you think? With permission from @wrznr I add this general workflow diagram for illustration of preprocessing options. |
principle
So, to rephrase the "subtlety": we have a principle at work here which states that coordinates within any
problems
This reproducibility principle is currently jeopardized (in concept) by two problems:
dewarping
Now, as for 1, we could try to define a parametric field equivalent (within reasonable accuracy) to any conceivable binary dewarping transform. For example, let's assume the Leptonica approach has sufficient generality. It defines the transform as a vertical and horizontal disparity field, which is basically a (quadratic) parametric function of points interpolated between equidistant intervals. This can be described as two vectors each. So all we need is an attribute in PAGE for this, and consumers willing to perform the compensatory calculations on all coordinates after and below dewarping. We could of course use Or could we perhaps use
rescaling
Regarding problem 2, we now face the problem that a difference between actual binary size of @chris1010010 what do you think? (BTW, introducing |
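As a rough illustration of the "compensatory calculations" for dewarping, the following Python sketch maps a point through a sampled disparity field. It is a simplification under stated assumptions and does not use Leptonica's API: the field is stored as plain nested lists sampled every `step` pixels, interpolation is bilinear rather than quadratic, and the disparity is assumed to give the offset from a point in the dewarped image to the corresponding point in the original. The names `interpolate` and `dewarped_to_original` are hypothetical.

```python
def interpolate(grid, step, x, y):
    """Bilinearly interpolate a disparity grid sampled every `step` pixels.

    `grid` is a list of rows (at least 2 x 2); grid[r][c] holds the
    disparity at pixel position (c * step, r * step).
    """
    gx, gy = x / step, y / step
    # Clamp to the last full cell so border points stay inside the grid.
    col = max(0, min(int(gx), len(grid[0]) - 2))
    row = max(0, min(int(gy), len(grid) - 2))
    fx, fy = gx - col, gy - row
    top = grid[row][col] * (1 - fx) + grid[row][col + 1] * fx
    bottom = grid[row + 1][col] * (1 - fx) + grid[row + 1][col + 1] * fx
    return top * (1 - fy) + bottom * fy


def dewarped_to_original(point, v_disparity, h_disparity, step):
    """Shift a dewarped-image point by the interpolated disparities.

    Assumption: disparity = original position minus dewarped position.
    """
    x, y = point
    return (x + interpolate(h_disparity, step, x, y),
            y + interpolate(v_disparity, step, x, y))


if __name__ == "__main__":
    vertical = [[0.0, 0.0], [10.0, 10.0]]    # grows from top to bottom
    horizontal = [[0.0, 0.0], [0.0, 0.0]]    # no horizontal distortion
    print(dewarped_to_original((50, 50), vertical, horizontal, step=100))
```

The inverse direction (original to dewarped) generally has no closed form for such a field and would need an iterative search or a second, separately stored field, which is part of why annotating non-linear transforms is harder than the linear case.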
BTW, @cneud how does ALTO deal with this? Is |
Oh, while we are at it: there are two more points which might need disambiguation: A. If a region (or page) has non-zero |
According to the OCR-D functional model, binarization can take place prior to block and line segmentation. Both processing steps should use the alternative image (if present).