Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

image_from_*: increase tolerance for size mismatch after rotation to 2px #371

Merged
merged 1 commit into from
Dec 18, 2019

Conversation

bertsky
Copy link
Collaborator

@bertsky bertsky commented Dec 9, 2019

Fixes #367

Please test if this is sufficient on real workflows!

@bertsky bertsky requested review from kba and wrznr December 9, 2019 14:49
@codecov-io
Copy link

codecov-io commented Dec 9, 2019

Codecov Report

Merging #371 into master will not change coverage.
The diff coverage is 0%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master    #371   +/-   ##
======================================
  Coverage    85.2%   85.2%           
======================================
  Files          30      30           
  Lines        1764    1764           
  Branches      341     341           
======================================
  Hits         1503    1503           
  Misses        210     210           
  Partials       51      51
Impacted Files Coverage Δ
ocrd/ocrd/workspace.py 52.98% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 25c4dd8...cbc6add. Read the comment docs.

Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Hard to assess in a real workflow since I lack a good example where this is noticeable.

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 9, 2019

Hard to assess in a real workflow since I lack a good example where this is noticeable.

On the old GT bags (which included text) or Matthias' new pre-release data_structure_text:

ocrd-make -j4 -f gt-binarize-page-olena-sauvola-deskew-page-ocropy-clip-deskew-region-tesseract-resegment-dewarp-ocr-ocropy-tesseract.mk

(after you installed workflow-configuration along with the processor modules and their dependencies)

@wrznr
Copy link
Contributor

wrznr commented Dec 10, 2019

Please test if this is sufficient on real workflows!

The setting works for all steps until dewarping (i.e. I do not see shifts which are larger than 2px.). But dewarping seems to introduce larger offsets which the recognition stage then complains about:

09:20:24.595 INFO processor.TesserocrRecognize - Using model 'GT4HistOCR+frk+Latin' in /usr/local/share/tessdata/ for recognition at the line level
09:20:24.642 INFO processor.TesserocrRecognize - INPUT FILE 0 / FILE_0003
09:20:24.676 INFO processor.TesserocrRecognize - Processing page 'FILE_0003'
09:20:24.735 ERROR ocrd.workspace - segment "Page1_Block2_line0000" image (binarized,dewarped; 109x72) has not been cropped properly (109x49)
09:20:24.813 ERROR ocrd.workspace - segment "Page1_Block3_line0000" image (binarized,dewarped; 47x45) has not been cropped properly (47x31)
09:20:24.876 ERROR ocrd.workspace - segment "Page1_Block5_line0000" image (binarized,deskewed,dewarped; 266x40) has not been cropped properly (266x37)
09:20:25.087 ERROR ocrd.workspace - segment "Page1_Block5_line0001" image (binarized,deskewed,dewarped; 1016x50) has not been cropped properly (1016x47)
09:20:25.260 ERROR ocrd.workspace - segment "Page1_Block5_line0002" image (binarized,deskewed,dewarped; 1105x48) has not been cropped properly (1105x39)
09:20:25.862 ERROR ocrd.workspace - segment "Page1_Block5_line0003" image (binarized,deskewed,dewarped; 1035x50) has not been cropped properly (1035x59)
09:20:26.408 ERROR ocrd.workspace - segment "Page1_Block5_line0004" image (binarized,deskewed,dewarped; 132x52) has not been cropped properly (132x32)
...

Copy link
Contributor

@wrznr wrznr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The increase of tolerance may not be large enough (cf. previous comment).

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 10, 2019

But dewarping seems to introduce larger offsets which the recognition stage then complains about:

That is expected. Remember, we currently don't have a solution for keeping up coordinate consistency under dewarping (because we cannot annotate that transformation in PAGE-XML), not even for the simplistic center normalizer approach (which does add vertical padding on the average). It's not that big a problem for line-level dewarping though, because all segmentation that follows is word/glyph level (after OCR).

The increase of tolerance may not be large enough (cf. previous comment).

This is a different kind of tolerance (unrelated to rotation with its increase in canvas size). I don't think we should be boosting up that value to the padding of the center normalizer. After all, this is an error (however small or unavoidable it may be)!

Copy link
Contributor

@wrznr wrznr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put it that way. However, I still think that it is not good to make the user suffer from our conceptual shortcomings.

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 10, 2019

Well, the user here will be an early adopter who is prepared for trouble (or where to get help). And this error won't appear until you add dewarping to the workflow. And who knows what other errors might come up with the module processors – some of which shouldn't be ignored...

The only chance I can see of removing this error is by annotating the coordinate transform from dewarping in PAGE-XML itself.

For page-level dewarping, this would only be possible with the dewarping schema's GridType (i.e. in a separate XML file).

And for line-level dewarping we need to annotate the vertical padding somehow – perhaps with @custom (at least provisionally). Or we just assume that the increased height of the derived image distributes evenly at top and bottom, and consider dewarped in AlternativeImage/@comments sufficient indication that the transform actually happened.

@wrznr
Copy link
Contributor

wrznr commented Dec 11, 2019

page-level dewarping

If we have a decent line-level dewarping at hands (and the one you made available in ocrd_cis works pretty good most of the times), I do not see the need for page-level dewarping. Especially since it destroys the “reconstructability” at the highest processing level.

line-level dewarping perhaps with @Custom

Sounds like a good plan. We could even put it into the corresponding AlternativeImage instance(s).

Apart from that you have two approvals. Pls. merge.

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 11, 2019

page-level dewarping

If we have a decent line-level dewarping at hands (and the one you made available in ocrd_cis works pretty good most of the times), I do not see the need for page-level dewarping. Especially since it destroys the “reconstructability” at the highest processing level.

If we do page-level dewarping before the "original image", i.e. on an earlier image file group, with the dewarped image output as OCR-D-IMG and pagecontent XML referencing it under @imageFilename, then we should be good: the exporter/viewer can then decide to either present the results w.r.t. OCR-D-IMG only, or invert the results via dewarping XML Grid and present them w.r.t. the true original image.

Even if we get (NN-based) line segmentation which is capable of finding good polygons even when images are warped heavily, line-level dewarping still cannot cope with horizontal compression. Plus I think we should think of this from the neutral perspective of the framework provider: enable everything, prescribe nothing.

line-level dewarping perhaps with @Custom

Sounds like a good plan. We could even put it into the corresponding AlternativeImage instance(s).

I'm afraid that's not true: AlternativeImageType does not have @custom or any such thing. We'd have to use TextLine/@custom. But...

Or we just assume that the increased height of the derived image distributes evenly at top and bottom, and consider dewarped in AlternativeImage/@comments sufficient indication that the transform actually happened.

... I like this option much better. It is more consistent with our previous decisions (not relying on @custom), and would not even require re-calculating annotations (but merely re-interprete the data).

Apart from that you have two approvals. Pls. merge.

I'm hesitant to do this myself on core. @kba please do it when it best fits your workflow!

@wrznr
Copy link
Contributor

wrznr commented Dec 11, 2019

I like this option much better

Why would we not assume that dewarping took place when it is referenced in the comments?

We'd have to use TextLine/@custom. But...

No. With so many AlternativeImages present on line level, that is not a good idea.

@kba Yeah, merge!

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 11, 2019

Why would we not assume that dewarping took place when it is referenced in the comments?

It's not about whether dewarping took place, but that the increase in image size can be attributed entirely to that. That's probably a dangerous assumption.

Another option would be to add another string to @comments, e.g. padded-12 when dewarping added 6px at top and bottom.

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 11, 2019

We'd have to use TextLine/@custom. But...

No. With so many AlternativeImages present on line level, that is not a good idea.

Are you saying that we should not use and rely on TextLine/@custom (e.g. vpadding=12) to give us the offset shift when we see dewarped in TextLine/AlternativeImage/@comments?

@wrznr
Copy link
Contributor

wrznr commented Dec 11, 2019

Pls. apologize that I messed up custom and comments attribute. But an additional comment was exactly what I thought you were proposing earlier. It could also be dewarped-12 (vs. dewarped-0).

@wrznr
Copy link
Contributor

wrznr commented Dec 11, 2019

I am saying if we store information on padding for a specific AlternativeImage it would have to be at the AlternativeImage (and not at the corresponding text line element)

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 11, 2019

I am saying if we store information on padding for a specific AlternativeImage it would have to be at the AlternativeImage (and not at the corresponding text line element)

Oh I see. But we already have the same ambiguity with deskewing: the page/region element has @orientation, but we can have multiple images with and without deskewed in its @comments. Some of these could even have been from an earlier deskewing step with a different @orientation value that got overriden later-on (without removing the old image reference). That's why we always take the last image consistent with all the attributes we want.

Thus, for dewarping, if we had e.g. TextLine/@custom="vpadding=12" and various derived images under TextLine/AlternativeImage, we'd take the last one consistent with the feature selector/filter, and then undo the vpadding iff it contains dewarped.

Pls. apologize that I messed up custom and comments attribute. But an additional comment was exactly what I thought you were proposing earlier. It could also be dewarped-12 (vs. dewarped-0).

No need to apologize! I was proposing 2 alternative solutions. I'd still stick to dewarped (since we already have it in the spec) though, just back it with an additional feature (and padded-N has the benefit that we could also use it independent of dewarping).

@wrznr
Copy link
Contributor

wrznr commented Dec 11, 2019

But we already have the same ambiguity with deskewing

But in contrast to deskewing, padding does not influence coordinates, right? I am not 100 % sure wrt. the current implementation, pls. correct me if I am wrong here: Deskewing has to be stored at the text line element to allow for ignoring AlternativeImages and still being able to extract line-level images from the original page image. I.e. what does it help to have the information that one of the AlternativeImages is padded at TextLine?

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 11, 2019

But we already have the same ambiguity with deskewing

But in contrast to deskewing, padding does not influence coordinates, right?

Wrong: padding does add an offset.

pls. correct me if I am wrong here: Deskewing has to be stored at the text line element to allow for ignoring AlternativeImages and still being able to extract line-level images from the original page image.

You have to annotate @orientation in order to be able to (re)produce deskewed derived images – so yes. But PAGE-XML has @orientation only at PageType and RegionType, so no.

But the true reason we need @orientation is that there is no other way to infer the coordinate transform which belongs to the deskewed images (regardless of who produces them).

I.e. what does it help to have the information that one of the AlternativeImages is padded at TextLine?

Yes, it's probably to far fetched to tie @custom/vpadding to AlternativeImage/@comment/dewarped in the same way we already tie @orientation to AlternativeImage/@comment/deskewed. Plus, if we want to allow filtering or selecting images with dewarped, then we want that to precisely match those images that are padded due to dewarping.

So you convinced me that @comments/dewarped-12 is the best representation.

Or perhaps we could even carry this further (to cover page-level dewarping as well): 'dewarped' ( '-' x_i ':' y_i '=' x_o ':' y_o)* e.g. for line-level dewarping dewarped-0:0=0:6-0:48=0:54-1:0=1:4-1:48=1:52-2:0=2:3-2:48=2:51-3:0=3:2-3:48=3:50 etc.

@wrznr
Copy link
Contributor

wrznr commented Dec 11, 2019

PAGE-XML has @orientation only at PageType and RegionType

OCR-D-wise, that is only consequent. The Funktionsmodell does not know deskewing at line level.

Wrong: padding does add an offset.

Granted. But may a dewarping processor change the coordinates of a line?

you convinced me that @comments/dewarped-12 is the best representation

Quaint, you just convinced me that padded-12 is the best solution since it generalizes to other possible processors (e.g. ocrd-image-pad).

But at least we agree that we store the information at AlternativeImage. That is an issue for the specs.

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 11, 2019

Wrong: padding does add an offset.

Granted. But may a dewarping processor change the coordinates of a line?

It needs to! At least the center normalizer principle needs extra vertical space to shift pixel columns along (otherwise it would have to shift foreground components out of the bbox). Page-level dewarping might need a margin, too.

Quaint, you just convinced me that padded-12 is the best solution since it generalizes to other possible processors (e.g. ocrd-image-pad).

It does. But then you get new problems: What if a processor does not want to receive dewarped images – do we suppress padded-* as well? Moreover, to be exact, dewarping does not just pad, it shifts each column by a different amount. So that's already different to what ocrd-image-pad might do. Plus: What's the use-case for some ocrd-image-pad – can't we just extend the polygon when we want more space (as ocrd-tesserocr-segment-region with padding>0 does)?

But at least we agree that we store the information at AlternativeImage. That is an issue for the specs.

It is. But so far there has been no progress on OCR-D/spec#116. I think we should continue discussion here until we reach a consensus and then propose that in the spec issue (maybe along with rescaled so we can perspectively get this fixed completely).

@wrznr
Copy link
Contributor

wrznr commented Dec 11, 2019

It needs to!

Just to make this crystal-clear: ocrd-cis-ocropy-dewarp changes the Coords element of the line it dewarps?

But then you get new problems

That is not what I want! We go with dewarped-amount_of_padding then.

I think we should continue discussion here until we reach a consensus

Agreed.

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 11, 2019

Just to make this crystal-clear: ocrd-cis-ocropy-dewarp changes the Coords element of the line it dewarps?

No! And that would not be correct either. The coordinates still reflect the original image, which we don't touch here. What changes are the pixel positions (relative coordinates) in the line image. And that means vertical shift due to padding most of the time, but not always (e.g. not when that x position gets shifted upwards maximally).

We go with dewarped-amount_of_padding then.

Splendid! The full blast (amount_of_padding as input-output grid) or the specialized version we started from (amount_of_padding as vertical padding at top and bottom on average)?

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 11, 2019

Or perhaps we could even carry this further (to cover page-level dewarping as well): 'dewarped' ( '-' x_i ':' y_i '=' x_o ':' y_o)* e.g. for line-level dewarping dewarped-0:0=0:6-0:48=0:54-1:0=1:4-1:48=1:52-2:0=2:3-2:48=2:51-3:0=3:2-3:48=3:50 etc.

If we do this, we'll also have to change the current implementation again, because passing affine transformations won't be enough anymore. (It will probably have to be a function or class object which can apply the transformation itself, with numpy matmul when it's just affine, or with bezier splines etc when it includes dewarping. This will probably always add an extra computational cost.)

But on the pro side we get rid of the problem entirely, and can keep up the functional model. (The above proposal to use the dewarping XSD prior to OCR-D-IMG being the only other way to solve this.)

@wrznr
Copy link
Contributor

wrznr commented Dec 11, 2019

The coordinates still reflect the original image

This was a huge misunderstanding and is even more reason to stay with AlternativeImage. We already reached agreement here. A side question: Does deskewing on region level touch the Coords or only write the orientation attribute and an AlternativeImage (I assume the latter)?

full blast or average

Personally, I would prefer the fully blasted version but I fear we will have difficulties to convince @kba and @cneud. Let us create a full example to allow for easier judgement.

@bertsky
Copy link
Collaborator Author

bertsky commented Dec 11, 2019

The coordinates still reflect the original image

This was a huge misunderstanding and is even more reason to stay with AlternativeImage.

Yes, it's hard to even talk about this in the right way. I was led astray because you were parallelling this to deskewing (which doesn't change the absolute coordinates either, but the relative coordinates as well).

A side question: Does deskewing on region level touch the Coords or only write the orientation attribute and an AlternativeImage (I assume the latter)?

The latter. The @orientation will merely influence at what angle a bounding box is constructed around the annotated polygon.

full blast or average

Personally, I would prefer the fully blasted version but I fear we will have difficulties to convince @kba and @cneud. Let us create a full example to allow for easier judgement.

Agreed.

Then IMO what we need now is a wrapper for libleptonica's page-level dewarping: it can provide us with a real-world example, and the first implementation (without paying attention to coordinates at all) could expose the monstrosity of the coordinate inconsistency problem it creates (although it does not prevent further processing per se, only relating results back to the original image). We could then add a serialisation of the scalar field in the above syntax, and a demo of how to apply this to the coordinate system. When approved, this demo code could later enter core's improved implementation.

@kba kba merged commit 5ae2d56 into OCR-D:master Dec 18, 2019
@bertsky bertsky deleted the increase-image-size-tolerance branch October 8, 2020 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

image API: increase tolerance for coordinate checks
4 participants