Fix image API (for rotation) #327

bertsky · 2019-10-07T12:44:36Z

No description provided.

do not always fill with white; instead, determine the background color by median, and only use white for binary images; moreover, add a transparency channel if the input mode allows it

(image_from_polygon), keep the input image mode; moreover, add a transparency channel if the input image allows it

- image_from_page, image_from_segment, image_from_polygon: add parameter ``fill`` - possible values white/background/transparent, with ``transparent`` (behaviour introduced by this branch) as default

- image_from_page, image_from_segment, image_from_polygon: add parameter ``transparency``, independent of ``fill`` - an alpha channel with the mask will be added iff ``transparency``, colour in ``fill`` will be used regardless (for consumers which cannot handle alpha channels)

…b.com/bertsky/core into rotate-with-background-and-transparency

- image_from_polygon: regardless of the ``transparency`` parameter, if the input already has an alpha channel, then shrink its mask from the polygon mask

- for converting to/from relative coordinates on each level, instead of passing on offsets and angles along with images (which would actually have to be stored and applied for all levels monotonically, but was only implemented for the previous level), propagate one single affine coordinate transformation (which can be composed via matrix multiplication and inverted via matrix inversion) - encapsulate image rotation, coordinate rotation, coordinate translation, offset calculation for coordinate rotation, application of coordinate transformation - use the same feature selectors/filters for coordinate transforms as for image operations - crop to the bounding box of the rotated polygon, not of the original; likewise, calculate rotation center and rotation offset (for regions) based on such bbox (whether it was used for AlternativeImage or not) - break API for callers that expected the bounding box of the original (this was incorrect; callers must likewise crop via relative bboxes) - warn if actual image size after rotation or from AlternativeImage does not fit calculated image size, but keep going with the calculated offset - page and region level orientation are not described additive-relative but supplantive-absolute; thus, rotation must be differential when both apply - bbox_from_polygon: intermediate coordinates can be negative - image_from_polygon: allow passing any fillcolor - improve docstrings

wrznr

AFAIU, looks great!

ocrd_utils/ocrd_utils/__init__.py

wrznr · 2019-10-07T13:07:24Z

The float-integer conversion errors problems for CI have to be handled before merging.

codecov-io · 2019-10-07T21:32:17Z

Codecov Report

Merging #327 into master will decrease coverage by 5.5%.
The diff coverage is 25.26%.

@@            Coverage Diff             @@
##           master     #327      +/-   ##
==========================================
- Coverage   91.12%   85.61%   -5.51%     
==========================================
  Files          30       30              
  Lines        1622     1731     +109     
  Branches      313      333      +20     
==========================================
+ Hits         1478     1482       +4     
- Misses        108      204      +96     
- Partials       36       45       +9

Impacted Files	Coverage Δ
ocrd/ocrd/workspace.py	`52.98% <21.78%> (-12.55%)`	⬇️
ocrd_utils/ocrd_utils/__init__.py	`64.25% <29.41%> (-17.66%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e71eff2...5b05737. Read the comment docs.

bertsky · 2019-10-07T21:37:53Z

The float-integer conversion errors problems for CI have to be handled before merging.

Fixed.

bertsky · 2019-10-07T21:52:15Z

Oh, I just remembered I still wanted to fix rotated-90 / rotated-180 / rotated-270 – these are currently ignored. But now that we have image features, it should be like this:

@orientation first gets split into multiples of 90° and rest
multiples of 90° should be applied by flip/transposition (both on the image and in the coordinate system) instead of rotation – this is more precise than rotation, and avoids large increase in canvas size, and avoids the case that width or height decreases (depending on angle and aspect ratio)
the rest gets applied via rotation

Should we still allow processors to use rotation for larger orientation angles (e.g. when not using core API to create the images)? In that case, maybe rotated-90 / rotated-180 / rotated-270 should be strictly reserved for the flip/transposition operations, i.e. their semantics would be how @orientation was applied, not whether @orientation was applied (the latter still being deskewed)?

bertsky · 2019-10-07T21:54:54Z

BTW, is this the correct way of making a new PR on top of an old one (#311)?

wrznr · 2019-10-08T05:59:08Z

BTW, is this the correct way of making a new PR on top of an old one (#311)?

@kba?

kba

Code looks great, just confused by the schema changes.

ocrd_utils/ocrd_utils/__init__.py

ocrd_validators/ocrd_validators/ocrd_tool.schema.yml

kba · 2019-10-08T13:24:25Z

Also basic tests for the utility functions at least would be helpful as companions to the docstrings.

bertsky · 2019-10-08T16:30:27Z

Also basic tests for the utility functions at least would be helpful as companions to the docstrings.

Absolutely. All of this is in desparate need for extensive tests. So many things can go wrong here – believe me! But it's a lot of additional effort, and one way or the other this will add a dependency on assets.

bertsky · 2019-10-09T08:03:20Z

So, could we please postpone adding automated tests one more time? (I believe it is urgent we roll out, because others will soon start using the image API, and we don't want to disappoint or confuse them with inaccuracies after rotation. Also, the later this arrives, the more existing code will probably break.)

I still need your opinions whether I should take care of the remaining problem of flipping/transposing here (or in a separate PR later).

wrznr · 2019-10-09T08:07:52Z

@bertsky Do it here and if possible we should strictly forbid skew angles >= 90° and defer them to the rotation feature.

bertsky · 2019-10-09T08:21:53Z

if possible we should strictly forbid skew angles >= 90° and defer them to the rotation feature.

PAGE-XML's @orientation itself conflates both orientation and skew. So any processor is free to choose a method for applying it to the image, pure rotation or rotation and transposition/flip – at least in principle. My question addresses what the API should do, whether we should allow deviations from that, and what exact meaning the image features in @comments should therefore have.

Do I read you correctly as saying "yes, rotated-90 should mean iff transposition was used (together with rotation) to apply orientation", so processors are free to either use rotation only (and just set deskewed) or a combination of both (and set deskewed,rotated-90)?

bertsky · 2019-10-09T10:14:55Z

Even if we go for this interpretation (i.e. rotated-90 denotes transposition, rotated-180 denotes flip, deskewed denotes rotation by any angle remaining from @orientation), this still/again leaves the problem that producers might silently violate the spec and set image features (@comments) inconsistent with the actual operations they performed, including:

forgetting to set features new in their output
dropping features from the input they used (i.e. violating monotonicity)
ignoring features altogether (i.e. setting no features at all)

Ever since we rely on image features in core, offering feature filtering and selection, such violations would always lead to inaccurate image data and/or coordinates. For example, image data producers forgetting

cropped after cropping will lead to "double cropping"
deskewed after rotation will lead to double rotation
rotated-90 after transposition will lead to either double rotation (if not appearing together with deskewed) or wrong offset (if appearing together with deskewed)
binarized after binarization will lead to either re-binarization or ad-hoc binarization or exception (depending on the consumer).

Some of these problems can be detected heuristically. But we cannot set image features ourselves: only the producer knows what operations it actually applied, and what image it came from.

To be able to enforce the spec, and thus ensure consistency between image data and features/coordinates, we could encapsulate both in a dedicated class OcrdDerivedImage which image_from_page and image_from_segment return, but image_from_segment and save_image_file expect. We give only read access to the PIL.Image and metadata, but offer a method derive(new_image_data, new_features, new_transform) which instantiates a OcrdDerivedImage from the given image, but adds the old features and composes the old transform, and uses aforementioned heuristics to check whether the operation looks consistent – raising an exception in case of failure. No-one else may instantiate that class. Thus, only @comments consistent with the image may enter AlternativeImage. (Unless of course, add_AlternativeImage() gets used directly...)

Opinions?

wrznr · 2019-10-09T10:20:43Z

I generally like the idea. Though, I think it would be sufficient to have a customized add_AlternativeImage routine with parameters for image features and to forbid access to the comments attribute. However, I am a huge fan of object-oriented solutions, so 👍 for OcrdDerivedImage.

bertsky · 2019-10-09T10:26:25Z

Though, I think it would be sufficient to have a customized add_AlternativeImage routine with parameters for image features and to forbid access to the comments attribute.

But @comments are the image features, it's just another name for the same concept. And add_AlternativeImage is too late: We don't know where the image and its features came from, therefore cannot apply any heuristics.

wrznr · 2019-10-09T10:27:21Z

We don't know where the image and its features came from, therefore cannot apply any heuristics.

This is the point I do not understand.

wrznr · 2019-10-09T10:28:07Z

(But as being said on another channel: Pls. let's discuss this in depth at our next meeting.)

bertsky · 2019-10-09T13:29:35Z

Or even stronger: instead of OcrdDerivedImage.derive() as described above, we could offer the very operations corresponding to the image features in the spec:

.crop(bbox) – uses crop_image() on the image and shift_coordinates() on the transform, adds cropped feature
.transpose(angle) – uses transpose_image on the image and transpose_coordinates() on the transform, adds rotated-90 / rotated-180 / rotated-270 feature
.deskew(angle) – uses rotate_image() on the image and rotate_coordinates() on the transform, adds deskewed feature
.dewarp(image, offset) – replaces the given image, uses shift_coordinates() on the transform, adds dewarped feature; in the future, we could require a grid instead of the offset parameter to map coordinates exactly
.despeckle(image) – replaces the given image, adds despeckled feature
.binarize(image) – replaces the given image, adds binarized feature
.normalize(image) – replaces the given image, adds grayscale_normalized feature
.derive(image, feature=None) – replaces the given image, changes neither transform nor features, but adds feature if given (which must not be any of the classes in the spec)

We thus don't need validation heuristics anymore. The only useful checks here are for the last 4 operations to retain exact image size.

This would force the programmer to think in terms of image features, restrict herself to those operations in the spec, and inform core about all steps correctly. Self-baked image operations would still be possible, but they must be partitioned and categorized into the predefined steps, announcing each one via a new instantiation.

For example, ocrd-cis-ocropy-clip would have to:

use image_from_page() to get the page-level OcrdDerivedImage instance
pass that on to image_from_segment() to get the region-level OcrdDerivedImage,
retrieve the PIL.Image object from that, convert to Numpy, clip some colours, convert back, and then
use derive(new_image, feature='clipped') to get a new instance that can be stored with save_image_file and then annotated as AlternativeImage.

For another example, ocrd-cis-binarize would have to:

use image_from_page() to get the page-level OcrdDerivedImage instance
retrieve the PIL.Image object from that, convert to Numpy, do binarization, convert back, and then
use binarize(new_image) to get a new instance, then on that
(when requested by the caller) apply denoising, and use despeckle(new_image) to get a new instance, and on that
(when requested by the caller) apply deskewing, and use deskew(angle) to get a new instance, and on that
store with save_image_file() and annotate with add_AlternativeImage()

- image_from_page / image_from_segment: when applying @orientation to the image (or at least applying corresponding operations to the affine coordinate transform), always try to split the angle into orientation (multiples of 90°) and the remaining skew (both positive and negative, i.e. split symmetrically around -45°/45°); now apply skew via rotation, but orientation via transposition (i.e. a combination of reflection and 90° rotation operations, which may also swap width with height) – describing the latter via image features `rotated-90`, `rotated-180` and `rotated-270`, as required by the spec - adjust_canvas_to_transposition, transpose_coordinates, transpose_image: encapsulate and document all possible PIL.Image transposition methods

bertsky · 2019-10-11T13:49:32Z

Oh, I just remembered I still wanted to fix rotated-90 / rotated-180 / rotated-270 – these are currently ignored. But now that we have image features, it should be like this:
* `@orientation` first gets split into multiples of 90° and rest

* multiples of 90° should be applied by flip/transposition (both on the image and in the coordinate system) instead of rotation – this is more precise than rotation, and avoids large increase in canvas size, and avoids the case that width or height _decreases_ (depending on angle and aspect ratio)

* the rest gets applied via rotation
Should we still allow processors to use rotation for larger orientation angles (e.g. when not using core API to create the images)? In that case, maybe rotated-90 / rotated-180 / rotated-270 should be strictly reserved for the flip/transposition operations, i.e. their semantics would be how @orientation was applied, not whether @orientation was applied (the latter still being deskewed)?

The last commit addresses that. It seems to work well, although I do not have many actual examples of >90° skew, at least ones that are properly detected by ocrd-tesserocr-deskew (which happens to be the only processor that tries to detect orientation at the moment). (Practically, I have to reduce the confidence threshold to a very low setting.)

Please re-review!

bertsky · 2019-10-11T13:59:28Z

So for example, in euler_rechenkunst01_1738 on page phys_0004, region region_1475755598239_323 is obviously oriented left in an otherwise upright page:

Tesseract OSD (with confidence threshold down to 1.5) detects that correctly, and we can resegment and dewarp (and recognize!) the individual lines:

Ti1illionen (conf=0.4)
tauſend (conf=0.96)
Bilhonen. (conf=0.37)
tauſend (conf=0.96)
Millionen (conf=0.84)
tauſend. (conf=0.96)

- for fillcolor background estimation, use median of all channels (but make sure to set alpha value to zero/fully transparent) - workaround for Pillow #1600 (creating black fill when processing images in LA mode)

- angle already applied must be sum of angle applied on parent level and on current level - not all image features are monotonic - some do propagate through all hierarchy levels: binarized, grayscale_normalized, despeckled - some must be treated level-local (to make sense and allow for relative coordinate consistency): deskewed, rotated-90, rotated-180, rotated-270, dewarped

bertsky · 2019-10-15T15:39:27Z

The latter commit brings yet another breaking change: Some image features (AlternativeImage/@comments tags) must be annotated and interpreted only level-local, while others are monotonic. Processors must set features accordingly, or things will go wrong (silently). I don't see a simple way to detect that automatically, except for the proposed strongly encapsulating API.

bertsky · 2019-10-22T12:48:24Z

Maybe I should also make a PR in the spec for all the requirements related to image preprocessing that our PAGE-XML setup (keeping the original image, requiring exact coordinates, allowing derived images on any level) logically casts on any such implementation. Since there was so far no discussion on OCR-D/spec#116, I don't think it makes sense to detail all the reasons for the latest changes. The result can probably speak for itself. Opinions?

wrznr · 2019-10-22T12:52:06Z

At least some documentation in terms of a Howto (i.e. how do I do image preprocessing with OCR-D the right way) would be highly appreciated. It could help the module projects and the poor folks who have to repair their contributions.

kba · 2019-10-22T13:11:36Z

in terms of a Howto (i.e. how do I do image preprocessing with OCR-D the right way)

Practical howto in the form of unit tests?

Maybe I should also make a PR in the spec for all the requirements related to image preprocessing

That would be appreciated. Wouldn't have to be that elaborate, maybe a slightly less implementation-specific version of the docstrings?

bertsky · 2019-10-22T23:09:10Z

in terms of a Howto (i.e. how do I do image preprocessing with OCR-D the right way)
It could help the module projects and the poor folks who have to repair their contributions.

The API syntax has only been broken very little actually. Most changes are additions. It's only those early adopters that have used the x, y, w and h keys of the secondary objects returned by image_from_page() and image_from_segment() which have to be changed. But this was incorrect anyway (it lead to wrong segment coordinates and thus images). And AFAICT it affects only cisocrgroup/ocrd_cis#17 and OCR-D/ocrd_tesserocr#80. Both of them are already waiting to be able to merge when this is merged in core.

Another side is the API's semantics, which is also about how things are received by the programmer. There are significant changes/clarifications in that area, which every MP that does preprocessing needs to be aware of:

image_from_page() and image_from_segment() now both apply reflection/transposition for the large multiples of 90° instead of rotation (which enlarges the image and thus increases offset, as opposed to the former), so they respect and utilise the image features rotated-90, rotated-180 and rotated-270 that have been in the spec all along. Since setting image features (AlternativeImage/@comments) correctly is also obligatory now, processors must not (any longer) simply ignore them and handle everything via rotation (or else coordinates will be off again). (Not setting image features appropriately have more serious consequences under other circumstances though: causing "double rotation" or "double cropping" in the aftermath.)
Some image features only make sense when they propagate across hierarchy levels (e.g. binarized), others must be set for each level independently (e.g. deskewed). Otherwise the representation in PAGE would allow non-trivial ambiguity which no implementation could get right. For example, what does deskewed say of an AlternativeImage if both the page and the region have an @orientation? Does it mean both have been applied, or just one – which one? The only correct interpretation of the (underspecified) OCR-D and PAGE spec at that point is: deskewed describes the current level, and since @orientation refers to Page/@imageFilename, by passing this information down the hierarchy level, we can always infer what angle has already been applied where.
Nevertheless, dewarping and rescaling are still open problems that currently will lead to derived image coordinate inconsistency. So when they are used somewhere in a workflow, any segmentation at the same or lower levels must happen afterwards (and cannot be mapped back to or displayed at any image higher in the hierarchy) – otherwise the crops will be completely wrong. For MPs this also means they have to expect changes to the API and representation in the future – like using @custom for the parameters of the coordinate transform, or extending PAGE with @scale and something like DwGts/Grid.

Sorry, I started out to avoid another lengthy explanation. But how do you put that briefly? And how do we spread the news?

Practical howto in the form of unit tests?

That's essential for other reasons. But if these tests want to be useful, they must become convoluted. For education, IMHO it's better to point to actual code in the flesh, like ocrd-tesserocr-* or ocrd-cis-ocropy-*. People should really use those as templates.

Maybe I should also make a PR in the spec for all the requirements related to image preprocessing

That would be appreciated. Wouldn't have to be that elaborate, maybe a slightly less implementation-specific version of the docstrings?

I will attempt that. (But the docstrings are quite complementary: they detail how you can do stuff with images, whereas the spec must say what should be done.)

cneud

OK you have convinced me that this is needed and can provide benefits (for edge cases), but due to the amount of extra complexity this will add in core it is essential that there is

high coverage with tests
OCR-D processors that use this functionality should be adapted accordingly ASAP, with tests
full documentation on these new API methods (preferably also with examples) is provided

bertsky and others added 10 commits September 11, 2019 23:24

Workspace.image_from_*: when rotation is necessary,

fe925a0

do not always fill with white; instead, determine the background color by median, and only use white for binary images; moreover, add a transparency channel if the input mode allows it

Workspace.image_from_*: when cropping is necessary

a10c9f2

(image_from_polygon), keep the input image mode; moreover, add a transparency channel if the input image allows it

fix debug message

440863f

cropping/rotation: caller can opt out of transparency:

d4b46a3

- image_from_page, image_from_segment, image_from_polygon: add parameter ``fill`` - possible values white/background/transparent, with ``transparent`` (behaviour introduced by this branch) as default

Merge branch 'master' into rotate-with-background-and-transparency

7f62ec9

Merge branch 'rotate-with-background-and-transparency' of ssh://githu…

dfb7a57

…b.com/bertsky/core into rotate-with-background-and-transparency

cropping: if input already has alpha, then use it unconditionally

215105a

- image_from_polygon: regardless of the ``transparency`` parameter, if the input already has an alpha channel, then shrink its mask from the polygon mask

polygon_mask: outline must have color of fill

1d82c42

bertsky requested review from kba and wrznr October 7, 2019 12:44

wrznr approved these changes Oct 7, 2019

View reviewed changes

kba requested changes Oct 8, 2019

View reviewed changes

ocrd_utils/ocrd_utils/__init__.py Show resolved Hide resolved

ocrd_validators/ocrd_validators/ocrd_tool.schema.yml Outdated Show resolved Hide resolved

bertsky added 2 commits October 8, 2019 18:48

improve docstrings

c5198f8

image_from_page: integer instead of float polygon

359e153

bertsky force-pushed the fix-rotation branch from 27ef213 to 359e153 Compare October 8, 2019 16:49

bertsky added 2 commits October 15, 2019 17:22

fix fill/transparency:

a4ede35

- for fillcolor background estimation, use median of all channels (but make sure to set alpha value to zero/fully transparent) - workaround for Pillow #1600 (creating black fill when processing images in LA mode)

bertsky mentioned this pull request Oct 15, 2019

fix deskewing OCR-D/ocrd_tesserocr#80

Closed

kba approved these changes Oct 22, 2019

View reviewed changes

Merge branch 'master' into fix-rotation

5b05737

bertsky mentioned this pull request Oct 22, 2019

many more Ocropy improvements cisocrgroup/ocrd_cis#17

Merged

bertsky mentioned this pull request Oct 29, 2019

Missing v0.5.0 tag OCR-D/ocrd_tesserocr#85

Closed

cneud self-requested a review November 4, 2019 17:33

cneud approved these changes Nov 4, 2019

View reviewed changes

kba merged commit 5b05737 into OCR-D:master Nov 5, 2019

bertsky mentioned this pull request Nov 6, 2019

Using other than the last AlternativeImage OCR-D/ocrd_anybaseocr#19

Closed

bertsky deleted the fix-rotation branch October 8, 2020 06:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix image API (for rotation) #327

Fix image API (for rotation) #327

bertsky commented Oct 7, 2019

wrznr left a comment

wrznr commented Oct 7, 2019

codecov-io commented Oct 7, 2019 •

edited

Loading

bertsky commented Oct 7, 2019

bertsky commented Oct 7, 2019

bertsky commented Oct 7, 2019

wrznr commented Oct 8, 2019

kba left a comment

kba commented Oct 8, 2019

bertsky commented Oct 8, 2019

bertsky commented Oct 9, 2019

wrznr commented Oct 9, 2019

bertsky commented Oct 9, 2019 •

edited

Loading

bertsky commented Oct 9, 2019

wrznr commented Oct 9, 2019

bertsky commented Oct 9, 2019

wrznr commented Oct 9, 2019

wrznr commented Oct 9, 2019

bertsky commented Oct 9, 2019

bertsky commented Oct 11, 2019

bertsky commented Oct 11, 2019 •

edited

Loading

bertsky commented Oct 15, 2019

bertsky commented Oct 22, 2019

wrznr commented Oct 22, 2019

kba commented Oct 22, 2019

bertsky commented Oct 22, 2019

cneud left a comment

Fix image API (for rotation) #327

Fix image API (for rotation) #327

Conversation

bertsky commented Oct 7, 2019

wrznr left a comment

Choose a reason for hiding this comment

wrznr commented Oct 7, 2019

codecov-io commented Oct 7, 2019 • edited Loading

Codecov Report

bertsky commented Oct 7, 2019

bertsky commented Oct 7, 2019

bertsky commented Oct 7, 2019

wrznr commented Oct 8, 2019

kba left a comment

Choose a reason for hiding this comment

kba commented Oct 8, 2019

bertsky commented Oct 8, 2019

bertsky commented Oct 9, 2019

wrznr commented Oct 9, 2019

bertsky commented Oct 9, 2019 • edited Loading

bertsky commented Oct 9, 2019

wrznr commented Oct 9, 2019

bertsky commented Oct 9, 2019

wrznr commented Oct 9, 2019

wrznr commented Oct 9, 2019

bertsky commented Oct 9, 2019

bertsky commented Oct 11, 2019

bertsky commented Oct 11, 2019 • edited Loading

bertsky commented Oct 15, 2019

bertsky commented Oct 22, 2019

wrznr commented Oct 22, 2019

kba commented Oct 22, 2019

bertsky commented Oct 22, 2019

cneud left a comment

Choose a reason for hiding this comment

codecov-io commented Oct 7, 2019 •

edited

Loading

bertsky commented Oct 9, 2019 •

edited

Loading

bertsky commented Oct 11, 2019 •

edited

Loading