Fix image API (for rotation) #327
do not always fill with white; instead, determine the background color by median, and only use white for binary images; moreover, add a transparency channel if the input mode allows it
(image_from_polygon), keep the input image mode; moreover, add a transparency channel if the input image allows it
- image_from_page, image_from_segment, image_from_polygon: add parameter ``fill`` - possible values white/background/transparent, with ``transparent`` (behaviour introduced by this branch) as default
- image_from_page, image_from_segment, image_from_polygon: add parameter ``transparency``, independent of ``fill`` - an alpha channel with the mask will be added iff ``transparency``, colour in ``fill`` will be used regardless (for consumers which cannot handle alpha channels)
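For readers following along, here is a minimal sketch of how the `fill` and `transparency` parameters described above could interact. It is not the code from this branch: `choose_fill` is a hypothetical helper, pixel values are assumed to be 0..255, and only the modes `1`, `L` and `RGB` are covered.

```python
from statistics import median

def choose_fill(pixels, mode, fill="background", transparency=False):
    """Return the colour tuple used to fill areas exposed by cropping or rotation.

    pixels: list of ints (modes '1' and 'L') or of (r, g, b) tuples (mode 'RGB')
    mode:   PIL-style mode string; only '1', 'L' and 'RGB' are handled here
    """
    if fill not in ("white", "background"):
        raise ValueError("fill must be 'white' or 'background'")
    channels = 3 if mode == "RGB" else 1
    if fill == "white" or mode == "1":
        # Binary images always get white, whatever `fill` says.
        colour = (255,) * channels
    elif channels == 1:
        colour = (int(median(pixels)),)
    else:
        # Estimate the background as the per-channel median.
        colour = tuple(int(median(p[c] for p in pixels)) for c in range(channels))
    if transparency and mode != "1":
        # Alpha is added independently of the colour; 0 means fully transparent.
        colour += (0,)
    return colour
```

With this reading, a consumer that cannot handle alpha channels still sees the `fill` colour, because the colour is chosen first and the alpha value is only appended.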
…b.com/bertsky/core into rotate-with-background-and-transparency
- image_from_polygon: regardless of the ``transparency`` parameter, if the input already has an alpha channel, then shrink its mask from the polygon mask
- for converting to/from relative coordinates on each level, instead of passing on offsets and angles along with images (which would actually have to be stored and applied for all levels monotonically, but was only implemented for the previous level), propagate one single affine coordinate transformation (which can be composed via matrix multiplication and inverted via matrix inversion)
- encapsulate image rotation, coordinate rotation, coordinate translation, offset calculation for coordinate rotation, application of coordinate transformation
- use the same feature selectors/filters for coordinate transforms as for image operations
- crop to the bounding box of the rotated polygon, not of the original; likewise, calculate rotation center and rotation offset (for regions) based on such bbox (whether it was used for AlternativeImage or not)
- break API for callers that expected the bounding box of the original (this was incorrect; callers must likewise crop via relative bboxes)
- warn if actual image size after rotation or from AlternativeImage does not fit calculated image size, but keep going with the calculated offset
- page and region level orientation are not described additive-relative but supplantive-absolute; thus, rotation must be differential when both apply
- bbox_from_polygon: intermediate coordinates can be negative
- image_from_polygon: allow passing any fillcolor
- improve docstrings
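To illustrate the idea of propagating one affine coordinate transformation, here is a self-contained sketch in plain Python. All function names are hypothetical and chosen for this example only (they are not the helpers added by this PR). The assumed rotation convention is counter-clockwise for positive angles in image coordinates (y pointing down).

```python
import math

def identity():
    """3x3 identity matrix: coordinates are left unchanged."""
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def compose(later, earlier):
    """Matrix product: apply `earlier` first, then `later`."""
    return [[sum(later[i][k] * earlier[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]

def shift(transform, offset):
    """Append a translation by `offset` to `transform`."""
    dx, dy = offset
    return compose([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]], transform)

def rotate(transform, angle, center):
    """Append a rotation by `angle` degrees around `center` to `transform`."""
    cx, cy = center
    c, s = math.cos(math.radians(angle)), math.sin(math.radians(angle))
    rotation = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]
    transform = shift(transform, (-cx, -cy))
    transform = compose(rotation, transform)
    return shift(transform, (cx, cy))

def invert(t):
    """Closed-form inverse of an affine matrix (maps relative back to absolute)."""
    (a, b, c), (d, e, f) = t[0], t[1]
    det = a * e - b * d
    return [[e / det, -b / det, (b * f - c * e) / det],
            [-d / det, a / det, (c * d - a * f) / det],
            [0.0, 0.0, 1.0]]

def apply(transform, points):
    """Map a list of (x, y) points through `transform`."""
    (a, b, c), (d, e, f) = transform[0], transform[1]
    return [(a * x + b * y + c, d * x + e * y + f) for x, y in points]

def bbox(points):
    """Bounding box (left, top, right, bottom); values may be negative."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)

# Page -> region: crop to the polygon's bbox, rotate, then crop again
# to the bbox of the *rotated* polygon.
polygon = [(100, 50), (140, 50), (140, 60), (100, 60)]
page_to_region = shift(identity(), (-100, -50))
page_to_region = rotate(page_to_region, 90, (0, 0))
left, top, right, bottom = bbox(apply(page_to_region, polygon))  # top is negative here
page_to_region = shift(page_to_region, (-left, -top))
relative = apply(page_to_region, polygon)
absolute = apply(invert(page_to_region), relative)  # round trip back to page coordinates
```

Because every step is folded into the same matrix, deeper levels only need to keep composing onto `page_to_region`, and converting results back to page coordinates is a single inversion.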
AFAIU, looks great!
The float-to-integer conversion errors in CI have to be fixed before merging.
Codecov Report
@@ Coverage Diff @@
## master #327 +/- ##
==========================================
- Coverage 91.12% 85.61% -5.51%
==========================================
Files 30 30
Lines 1622 1731 +109
Branches 313 333 +20
==========================================
+ Hits 1478 1482 +4
- Misses 108 204 +96
- Partials 36 45 +9
Fixed. |
Oh, I just remembered I still wanted to fix
Should we still allow processors to use rotation for larger orientation angles (e.g. when not using core API to create the images)? In that case, maybe |
BTW, is this the correct way of making a new PR on top of an old one (#311)? |
Code looks great, just confused by the schema changes.
Also basic tests for the utility functions at least would be helpful as companions to the docstrings. |
Absolutely. All of this is in desperate need of extensive tests. So many things can go wrong here – believe me! But it's a lot of additional effort, and one way or the other this will add a dependency on assets.
So, could we please postpone adding automated tests one more time? (I believe it is urgent that we roll this out, because others will soon start using the image API, and we don't want to disappoint or confuse them with inaccuracies after rotation. Also, the later this arrives, the more existing code will probably break.) I still need your opinions on whether I should take care of the remaining problem of flipping/transposing here (or in a separate PR later).
@bertsky Do it here and if possible we should strictly forbid skew angles >= 90° and defer them to the rotation feature. |
PAGE-XML's Do I read you correctly as saying "yes, |
Even if we go for this interpretation (i.e.
Ever since we rely on image features in core, offering feature filtering and selection, such violations would always lead to inaccurate image data and/or coordinates. For example, image data producers forgetting
Some of these problems can be detected heuristically. But we cannot set image features ourselves: only the producer knows what operations it actually applied, and what image it came from. To be able to enforce the spec, and thus ensure consistency between image data and features/coordinates, we could encapsulate both in a dedicated class Opinions? |
I generally like the idea. Though, I think it would be sufficient to have a customized |
But |
This is the point I do not understand. |
(But as was said on another channel: please, let's discuss this in depth at our next meeting.)
Or even stronger: instead of
We thus don't need validation heuristics anymore. The only useful checks here are for the last 4 operations to retain exact image size. This would force the programmer to think in terms of image features, restrict herself to those operations in the spec, and inform core about all steps correctly. Self-baked image operations would still be possible, but they must be partitioned and categorized into the predefined steps, announcing each one via a new instantiation. For example, ocrd-cis-ocropy-clip would have to:
For another example, ocrd-cis-binarize would have to:
|
- image_from_page / image_from_segment: when applying @orientation to the image (or at least applying corresponding operations to the affine coordinate transform), always try to split the angle into orientation (multiples of 90°) and the remaining skew (both positive and negative, i.e. split symmetrically around -45°/45°); now apply skew via rotation, but orientation via transposition (i.e. a combination of reflection and 90° rotation operations, which may also swap width with height) – describing the latter via image features `rotated-90`, `rotated-180` and `rotated-270`, as required by the spec
- adjust_canvas_to_transposition, transpose_coordinates, transpose_image: encapsulate and document all possible PIL.Image transposition methods
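The angle split described in this commit message can be sketched as follows. This is an illustration under assumptions, not the code from the commit: `split_angle` and `features_for` are hypothetical names, the skew interval is taken as (-45°, 45°], and the sign convention of `@orientation` is left aside.

```python
import math

# Feature names as quoted in the commit message above.
ORIENTATION_FEATURES = {90: "rotated-90", 180: "rotated-180", 270: "rotated-270"}

def split_angle(angle):
    """Split `angle` (degrees) into (orientation, skew).

    orientation is one of 0, 90, 180, 270 (to be applied via transposition),
    skew lies in (-45, 45] (to be applied via rotation).
    """
    angle = angle % 360                            # normalise to [0, 360)
    orientation = math.ceil((angle - 45) / 90) * 90
    skew = angle - orientation
    return orientation % 360, skew

def features_for(angle):
    """List the image features that would describe applying `angle`."""
    orientation, skew = split_angle(angle)
    features = []
    if orientation:
        features.append(ORIENTATION_FEATURES[orientation])
    if skew:
        features.append("deskewed")
    return features
```

For example, `split_angle(93)` yields `(90, 3)`: a lossless 90° transposition plus a small 3° rotation, instead of one large interpolating rotation.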
The last commit addresses that. It seems to work well, although I do not have many actual examples of >90° skew, at least ones that are properly detected by ocrd-tesserocr-deskew (which happens to be the only processor that tries to detect orientation at the moment). (Practically, I have to reduce the confidence threshold to a very low setting.) Please re-review! |
- for fillcolor background estimation, use median of all channels (but make sure to set alpha value to zero/fully transparent)
- workaround for Pillow #1600 (creating black fill when processing images in LA mode)
- angle already applied must be sum of angle applied on parent level and on current level
- not all image features are monotonic
  - some do propagate through all hierarchy levels: binarized, grayscale_normalized, despeckled
  - some must be treated level-local (to make sense and allow for relative coordinate consistency): deskewed, rotated-90, rotated-180, rotated-270, dewarped
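A rough sketch of the two rules in this commit message, assuming features are handled as a list of strings and angles as plain numbers in degrees. The helper names `inherited_features` and `remaining_rotation` are made up for this example; the feature names are the ones listed above.

```python
# Features that stay valid for all child segments once applied to a parent.
PROPAGATING = {"binarized", "grayscale_normalized", "despeckled"}
# Features that only describe the level they were applied on.
LEVEL_LOCAL = {"deskewed", "rotated-90", "rotated-180", "rotated-270", "dewarped"}

def inherited_features(parent_features):
    """Keep only those parent features that carry over to a child level."""
    return [feature for feature in parent_features if feature in PROPAGATING]

def remaining_rotation(orientation, angle_applied_on_parent):
    """Orientation is absolute, so only the difference still has to be applied."""
    return orientation - angle_applied_on_parent

# A region inside a page image that was binarized and deskewed by 2 degrees:
parent_features = ["binarized", "deskewed"]
child_features = inherited_features(parent_features)
to_apply = remaining_rotation(5.0, 2.0)
angle_applied = 2.0 + to_apply  # sum of parent level and current level
```

Here the child starts out with `["binarized"]` only, and a region annotated with 5° on a page already rotated by 2° needs just the remaining 3°.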
The latter commit brings yet another breaking change: Some image features ( |
Maybe I should also make a PR in the spec for all the requirements related to image preprocessing that our PAGE-XML setup (keeping the original image, requiring exact coordinates, allowing derived images on any level) logically casts on any such implementation. Since there was so far no discussion on OCR-D/spec#116, I don't think it makes sense to detail all the reasons for the latest changes. The result can probably speak for itself. Opinions? |
At least some documentation in terms of a Howto (i.e. how do I do image preprocessing with OCR-D the right way) would be highly appreciated. It could help the module projects and the poor folks who have to repair their contributions. |
Practical howto in the form of unit tests?
That would be appreciated. Wouldn't have to be that elaborate, maybe a slightly less implementation-specific version of the docstrings? |
The API syntax has only been broken very little actually. Most changes are additions. It's only those early adopters that have used the Another side is the API's semantics, which is also about how things are received by the programmer. There are significant changes/clarifications in that area, which every MP that does preprocessing needs to be aware of:
Sorry, I started out to avoid another lengthy explanation. But how do you put that briefly? And how do we spread the news?
That's essential for other reasons. But if these tests want to be useful, they must become convoluted. For education, IMHO it's better to point to actual code in the flesh, like
I will attempt that. (But the docstrings are quite complementary: they detail how you can do stuff with images, whereas the spec must say what should be done.) |
OK, you have convinced me that this is needed and can provide benefits (for edge cases). But given the amount of extra complexity this will add to core, it is essential that:
- test coverage is high
- OCR-D processors that use this functionality are adapted accordingly ASAP, with tests
- full documentation of these new API methods (preferably with examples) is provided