image_from_*: increase tolerance for size mismatch after rotation to 2px #371
Codecov Report
@@ Coverage Diff @@
## master #371 +/- ##
======================================
Coverage 85.2% 85.2%
======================================
Files 30 30
Lines 1764 1764
Branches 341 341
======================================
Hits 1503 1503
Misses 210 210
Partials 51 51
Continue to review full report at Codecov.
LGTM. Hard to assess in a real workflow since I lack a good example where this is noticeable.
On the old GT bags (which included text) or Matthias' new pre-release ocrd-make -j4 -f gt-binarize-page-olena-sauvola-deskew-page-ocropy-clip-deskew-region-tesseract-resegment-dewarp-ocr-ocropy-tesseract.mk (after you have installed workflow-configuration along with the processor modules and their dependencies)
The setting works for all steps until dewarping (i.e. I do not see any shifts larger than 2px). But dewarping seems to introduce larger offsets, which the recognition stage then complains about:
The increase of tolerance may not be large enough (cf. previous comment).
That is expected. Remember, we currently don't have a solution for maintaining coordinate consistency under dewarping (because we cannot annotate that transformation in PAGE-XML), not even for the simplistic center normalizer approach (which adds vertical padding on average). It's not that big a problem for line-level dewarping though, because all segmentation that follows is word/glyph level (after OCR).
This is a different kind of tolerance (unrelated to rotation with its increase in canvas size). I don't think we should be boosting that value up to the padding of the center normalizer. After all, this is an error (however small or unavoidable it may be)!
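To make the rotation-related tolerance from the PR title concrete, here is a minimal sketch of such a check. The function names `rotated_size` and `size_matches` are hypothetical and are not taken from core; the sketch only assumes that rotation expands the canvas to the bounding box of the rotated rectangle and that rounding may leave a mismatch of a pixel or two.

```python
import math


def rotated_size(width, height, angle_deg):
    """Bounding-box size of a width x height rectangle rotated by angle_deg.

    Assumes the canvas is expanded so that nothing is cropped.
    """
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    return width * c + height * s, width * s + height * c


def size_matches(expected, actual, tolerance=2):
    """True if actual (w, h) is within `tolerance` px of expected (w, h)."""
    return all(abs(e - a) <= tolerance for e, a in zip(expected, actual))
```

A check like this covers rounding differences after rotation only. The larger offsets introduced by dewarping padding would still (and, per the comment above, should still) be reported as an error.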
If you put it that way. However, I still think that it is not good to make the user suffer from our conceptual shortcomings.
Well, the user here will be an early adopter who is prepared for trouble (or knows where to get help). And this error won't appear until you add dewarping to the workflow. And who knows what other errors might come up with the module processors – some of which shouldn't be ignored... The only chance I can see of removing this error is by annotating the coordinate transform from dewarping in PAGE-XML itself. For page-level dewarping, this would only be possible with the dewarping schema's And for line-level dewarping we need to annotate the vertical padding somehow – perhaps with
If we have a decent line-level dewarping at hand (and the one you made available in
Sounds like a good plan. We could even put it into the corresponding Apart from that you have two approvals. Pls. merge.
If we do page-level dewarping before the "original image", i.e. on an earlier image file group, with the dewarped image output as Even if we get (NN-based) line segmentation which is capable of finding good polygons even when images are warped heavily, line-level dewarping still cannot cope with horizontal compression. Plus I think we should think of this from the neutral perspective of the framework provider: enable everything, prescribe nothing.
I'm afraid that's not true:
... I like this option much better. It is more consistent with our previous decisions (not relying on
I'm hesitant to do this myself on core. @kba please do it when it best fits your workflow!
Why would we not assume that dewarping took place when it is referenced in the comments?
No. With so many @kba Yeah, merge!
It's not about whether dewarping took place, but whether the increase in image size can be attributed entirely to that. That's probably a dangerous assumption. Another option would be to add another string to
Are you saying that we should not use and rely on
Apologies that I messed up
I am saying if we store information on padding for a specific
Oh I see. But we already have the same ambiguity with deskewing: the page/region element has Thus, for dewarping, if we had e.g.
No need to apologize! I was proposing 2 alternative solutions. I'd still stick to
But in contrast to deskewing, padding does not influence coordinates, right? I am not 100% sure wrt. the current implementation, pls. correct me if I am wrong here: Deskewing has to be stored at the text line element to allow for ignoring
Wrong: padding does add an offset.
You have to annotate But the true reason we need
Yes, it's probably too far-fetched to tie So you convinced me that Or perhaps we could even carry this further (to cover page-level dewarping as well):
OCR-D-wise, that is only consistent. The Funktionsmodell (functional model) does not know deskewing at line level.
Granted. But may a dewarping processor change the coordinates of a line?
Quaint, you just convinced me that But at least we agree that we store the information at
It needs to! At least the center normalizer principle needs extra vertical space to shift pixel columns along (otherwise it would have to shift foreground components out of the bbox). Page-level dewarping might need a margin, too.
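The need for extra vertical space can be illustrated with a small sketch. It assumes that line-level dewarping is modelled as one integer vertical shift per pixel column (negative meaning upwards); the function name `required_padding` is hypothetical and this is not the center normalizer's actual code.

```python
def required_padding(shifts):
    """Top and bottom padding (px) so that no shifted column leaves the canvas.

    `shifts` holds one vertical shift per pixel column (negative = upwards).
    """
    if not shifts:
        return 0, 0
    top = max(0, -min(shifts))    # room for columns that move upwards
    bottom = max(0, max(shifts))  # room for columns that move downwards
    return top, bottom
```

Without this padding, a column shifted by 3 px upwards would lose its top 3 rows, which is exactly the "shift foreground components out of the bbox" case.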
It does. But then you get new problems: What if a processor does not want to receive
It is. But so far there has been no progress on OCR-D/spec#116. I think we should continue discussion here until we reach a consensus and then propose that in the spec issue (maybe along with
Just to make this crystal-clear:
That is not what I want! We go with
Agreed.
No! And that would not be correct either. The coordinates still reflect the original image, which we don't touch here. What changes are the pixel positions (relative coordinates) in the line image. And that means vertical shift due to padding most of the time, but not always (e.g. not when that x position gets shifted upwards maximally).
Splendid! The full blast (
If we do this, we'll also have to change the current implementation again, because passing affine transformations won't be enough anymore. (It will probably have to be a function or class object which can apply the transformation itself, with numpy But on the pro side we get rid of the problem entirely, and can keep up the functional model. (The above proposal to use the dewarping XSD prior to OCR-D-IMG being the only other way to solve this.)
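A rough sketch of what "a function or class object which can apply the transformation itself" might look like follows. Everything here is an assumption for illustration: the class names `AffineTransform` and `ColumnShiftTransform`, the helper `compose`, and the per-column shift model are hypothetical and do not reflect core's API. The comment above suggests numpy; plain Python lists are used here only to keep the sketch free of dependencies.

```python
class AffineTransform:
    """Affine map (x, y) -> (a*x + b*y + tx, c*x + d*y + ty)."""

    def __init__(self, a=1.0, b=0.0, tx=0.0, c=0.0, d=1.0, ty=0.0):
        self.a, self.b, self.tx = a, b, tx
        self.c, self.d, self.ty = c, d, ty

    def __call__(self, x, y):
        return (self.a * x + self.b * y + self.tx,
                self.c * x + self.d * y + self.ty)


class ColumnShiftTransform:
    """Non-affine map: every pixel column has its own vertical shift.

    `padding` is the extra space added above the line image, so a column
    shifted upwards by exactly `padding` ends up with no net shift.
    """

    def __init__(self, shifts, padding=0):
        if not shifts:
            raise ValueError("need at least one column shift")
        self.shifts = list(shifts)
        self.padding = padding

    def __call__(self, x, y):
        # Clamp to the nearest known column (an assumption of this sketch).
        col = min(max(int(round(x)), 0), len(self.shifts) - 1)
        return (x, y + self.padding + self.shifts[col])


def compose(*transforms):
    """Chain callables so that the first transform is applied first."""
    def apply(x, y):
        for transform in transforms:
            x, y = transform(x, y)
        return x, y
    return apply
```

For example, `compose(AffineTransform(tx=-10, ty=-20), ColumnShiftTransform([0, 1, 2, 1, 0, -1, -2], padding=3))` first maps absolute page coordinates into the cropped line image and then into the dewarped, padded line image. Because both steps share the same callable interface, callers would not need to know whether a step is affine.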
This was a huge misunderstanding and is even more reason to stay with
Personally, I would prefer the fully blasted version, but I fear we will have difficulty convincing @kba and @cneud. Let us create a full example to allow for easier judgement.
Yes, it's hard to even talk about this in the right way. I was led astray because you were parallelling this to deskewing (which doesn't change the absolute coordinates either, but the relative coordinates as well).
The latter. The
Agreed. Then IMO what we need now is a wrapper for libleptonica's page-level dewarping: it can provide us with a real-world example, and the first implementation (without paying attention to coordinates at all) could expose the monstrosity of the coordinate inconsistency problem it creates (although it does not prevent further processing per se, only relating results back to the original image). We could then add a serialisation of the scalar field in the above syntax, and a demo of how to apply this to the coordinate system. When approved, this demo code could later enter core's improved implementation.
Fixes #367
Please test if this is sufficient on real workflows!