Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix image API (for rotation) #327

Merged
merged 16 commits into from
Nov 5, 2019
Merged

Fix image API (for rotation) #327

merged 16 commits into from
Nov 5, 2019

Conversation

bertsky
Copy link
Collaborator

@bertsky bertsky commented Oct 7, 2019

No description provided.

bertsky and others added 10 commits September 11, 2019 23:24
do not always fill with white; instead, determine the
background color by median, and only use white for
binary images; moreover, add a transparency channel
if the input mode allows it
(image_from_polygon), keep the input image mode;
moreover, add a transparency channel if the input
image allows it
- image_from_page, image_from_segment, image_from_polygon:
  add parameter ``fill``
- possible values white/background/transparent, with
  ``transparent`` (behaviour introduced by this branch)
  as default
- image_from_page, image_from_segment, image_from_polygon:
  add parameter ``transparency``, independent of ``fill``
- an alpha channel with the mask will be added iff ``transparency``,
  colour in ``fill`` will be used regardless (for consumers which
  cannot handle alpha channels)
…b.com/bertsky/core into rotate-with-background-and-transparency
- image_from_polygon: regardless of the ``transparency`` parameter,
  if the input already has an alpha channel, then shrink its mask
  from the polygon mask
- for converting to/from relative coordinates on each level,
  instead of passing on offsets and angles along with images
  (which would actually have to be stored and applied for all
  levels monotonically, but was only implemented for the previous
  level), propagate one single affine coordinate transformation
  (which can be composed via matrix multiplication and inverted
  via matrix inversion)
- encapsulate image rotation, coordinate rotation, coordinate translation,
  offset calculation for coordinate rotation, application of coordinate
  transformation
- use the same feature selectors/filters for coordinate transforms as for
  image operations
- crop to the bounding box of the rotated polygon, not of the original;
  likewise, calculate rotation center and rotation offset (for regions)
  based on such bbox (whether it was used for AlternativeImage or not)
- break API for callers that expected the bounding box of the original
  (this was incorrect; callers must likewise crop via relative bboxes)
- warn if actual image size after rotation or from AlternativeImage does
  not fit calculated image size, but keep going with the calculated offset
- page and region level orientation are not described additive-relative but
  supplantive-absolute; thus, rotation must be differential when both apply
- bbox_from_polygon: intermediate coordinates can be negative
- image_from_polygon: allow passing any fillcolor
- improve docstrings
@bertsky bertsky requested review from kba and wrznr October 7, 2019 12:44
Copy link
Contributor

@wrznr wrznr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU, looks great!

ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
ocrd_utils/ocrd_utils/__init__.py Outdated Show resolved Hide resolved
@wrznr
Copy link
Contributor

wrznr commented Oct 7, 2019

The float-integer conversion errors problems for CI have to be handled before merging.

@codecov-io
Copy link

codecov-io commented Oct 7, 2019

Codecov Report

Merging #327 into master will decrease coverage by 5.5%.
The diff coverage is 25.26%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #327      +/-   ##
==========================================
- Coverage   91.12%   85.61%   -5.51%     
==========================================
  Files          30       30              
  Lines        1622     1731     +109     
  Branches      313      333      +20     
==========================================
+ Hits         1478     1482       +4     
- Misses        108      204      +96     
- Partials       36       45       +9
Impacted Files Coverage Δ
ocrd/ocrd/workspace.py 52.98% <21.78%> (-12.55%) ⬇️
ocrd_utils/ocrd_utils/__init__.py 64.25% <29.41%> (-17.66%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e71eff2...5b05737. Read the comment docs.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 7, 2019

The float-integer conversion errors problems for CI have to be handled before merging.

Fixed.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 7, 2019

Oh, I just remembered I still wanted to fix rotated-90 / rotated-180 / rotated-270 – these are currently ignored. But now that we have image features, it should be like this:

  • @orientation first gets split into multiples of 90° and rest
  • multiples of 90° should be applied by flip/transposition (both on the image and in the coordinate system) instead of rotation – this is more precise than rotation, and avoids large increase in canvas size, and avoids the case that width or height decreases (depending on angle and aspect ratio)
  • the rest gets applied via rotation

Should we still allow processors to use rotation for larger orientation angles (e.g. when not using core API to create the images)? In that case, maybe rotated-90 / rotated-180 / rotated-270 should be strictly reserved for the flip/transposition operations, i.e. their semantics would be how @orientation was applied, not whether @orientation was applied (the latter still being deskewed)?

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 7, 2019

BTW, is this the correct way of making a new PR on top of an old one (#311)?

@wrznr
Copy link
Contributor

wrznr commented Oct 8, 2019

BTW, is this the correct way of making a new PR on top of an old one (#311)?

@kba?

Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks great, just confused by the schema changes.

ocrd_utils/ocrd_utils/__init__.py Show resolved Hide resolved
ocrd_validators/ocrd_validators/ocrd_tool.schema.yml Outdated Show resolved Hide resolved
@kba
Copy link
Member

kba commented Oct 8, 2019

Also basic tests for the utility functions at least would be helpful as companions to the docstrings.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 8, 2019

Also basic tests for the utility functions at least would be helpful as companions to the docstrings.

Absolutely. All of this is in desparate need for extensive tests. So many things can go wrong here – believe me! But it's a lot of additional effort, and one way or the other this will add a dependency on assets.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 9, 2019

So, could we please postpone adding automated tests one more time? (I believe it is urgent we roll out, because others will soon start using the image API, and we don't want to disappoint or confuse them with inaccuracies after rotation. Also, the later this arrives, the more existing code will probably break.)

I still need your opinions whether I should take care of the remaining problem of flipping/transposing here (or in a separate PR later).

@wrznr
Copy link
Contributor

wrznr commented Oct 9, 2019

@bertsky Do it here and if possible we should strictly forbid skew angles >= 90° and defer them to the rotation feature.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 9, 2019

if possible we should strictly forbid skew angles >= 90° and defer them to the rotation feature.

PAGE-XML's @orientation itself conflates both orientation and skew. So any processor is free to choose a method for applying it to the image, pure rotation or rotation and transposition/flip – at least in principle. My question addresses what the API should do, whether we should allow deviations from that, and what exact meaning the image features in @comments should therefore have.

Do I read you correctly as saying "yes, rotated-90 should mean iff transposition was used (together with rotation) to apply orientation", so processors are free to either use rotation only (and just set deskewed) or a combination of both (and set deskewed,rotated-90)?

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 9, 2019

Even if we go for this interpretation (i.e. rotated-90 denotes transposition, rotated-180 denotes flip, deskewed denotes rotation by any angle remaining from @orientation), this still/again leaves the problem that producers might silently violate the spec and set image features (@comments) inconsistent with the actual operations they performed, including:

  • forgetting to set features new in their output
  • dropping features from the input they used (i.e. violating monotonicity)
  • ignoring features altogether (i.e. setting no features at all)

Ever since we rely on image features in core, offering feature filtering and selection, such violations would always lead to inaccurate image data and/or coordinates. For example, image data producers forgetting

  • cropped after cropping will lead to "double cropping"
  • deskewed after rotation will lead to double rotation
  • rotated-90 after transposition will lead to either double rotation (if not appearing together with deskewed) or wrong offset (if appearing together with deskewed)
  • binarized after binarization will lead to either re-binarization or ad-hoc binarization or exception (depending on the consumer).

Some of these problems can be detected heuristically. But we cannot set image features ourselves: only the producer knows what operations it actually applied, and what image it came from.

To be able to enforce the spec, and thus ensure consistency between image data and features/coordinates, we could encapsulate both in a dedicated class OcrdDerivedImage which image_from_page and image_from_segment return, but image_from_segment and save_image_file expect. We give only read access to the PIL.Image and metadata, but offer a method derive(new_image_data, new_features, new_transform) which instantiates a OcrdDerivedImage from the given image, but adds the old features and composes the old transform, and uses aforementioned heuristics to check whether the operation looks consistent – raising an exception in case of failure. No-one else may instantiate that class. Thus, only @comments consistent with the image may enter AlternativeImage. (Unless of course, add_AlternativeImage() gets used directly...)

Opinions?

@wrznr
Copy link
Contributor

wrznr commented Oct 9, 2019

I generally like the idea. Though, I think it would be sufficient to have a customized add_AlternativeImage routine with parameters for image features and to forbid access to the comments attribute. However, I am a huge fan of object-oriented solutions, so 👍 for OcrdDerivedImage.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 9, 2019

Though, I think it would be sufficient to have a customized add_AlternativeImage routine with parameters for image features and to forbid access to the comments attribute.

But @comments are the image features, it's just another name for the same concept. And add_AlternativeImage is too late: We don't know where the image and its features came from, therefore cannot apply any heuristics.

@wrznr
Copy link
Contributor

wrznr commented Oct 9, 2019

We don't know where the image and its features came from, therefore cannot apply any heuristics.

This is the point I do not understand.

@wrznr
Copy link
Contributor

wrznr commented Oct 9, 2019

(But as being said on another channel: Pls. let's discuss this in depth at our next meeting.)

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 9, 2019

Or even stronger: instead of OcrdDerivedImage.derive() as described above, we could offer the very operations corresponding to the image features in the spec:

  • .crop(bbox) – uses crop_image() on the image and shift_coordinates() on the transform, adds cropped feature
  • .transpose(angle) – uses transpose_image on the image and transpose_coordinates() on the transform, adds rotated-90 / rotated-180 / rotated-270 feature
  • .deskew(angle) – uses rotate_image() on the image and rotate_coordinates() on the transform, adds deskewed feature
  • .dewarp(image, offset) – replaces the given image, uses shift_coordinates() on the transform, adds dewarped feature; in the future, we could require a grid instead of the offset parameter to map coordinates exactly
  • .despeckle(image) – replaces the given image, adds despeckled feature
  • .binarize(image) – replaces the given image, adds binarized feature
  • .normalize(image) – replaces the given image, adds grayscale_normalized feature
  • .derive(image, feature=None) – replaces the given image, changes neither transform nor features, but adds feature if given (which must not be any of the classes in the spec)

We thus don't need validation heuristics anymore. The only useful checks here are for the last 4 operations to retain exact image size.

This would force the programmer to think in terms of image features, restrict herself to those operations in the spec, and inform core about all steps correctly. Self-baked image operations would still be possible, but they must be partitioned and categorized into the predefined steps, announcing each one via a new instantiation.

For example, ocrd-cis-ocropy-clip would have to:

  1. use image_from_page() to get the page-level OcrdDerivedImage instance
  2. pass that on to image_from_segment() to get the region-level OcrdDerivedImage,
  3. retrieve the PIL.Image object from that, convert to Numpy, clip some colours, convert back, and then
  4. use derive(new_image, feature='clipped') to get a new instance that can be stored with save_image_file and then annotated as AlternativeImage.

For another example, ocrd-cis-binarize would have to:

  1. use image_from_page() to get the page-level OcrdDerivedImage instance
  2. retrieve the PIL.Image object from that, convert to Numpy, do binarization, convert back, and then
  3. use binarize(new_image) to get a new instance, then on that
  4. (when requested by the caller) apply denoising, and use despeckle(new_image) to get a new instance, and on that
  5. (when requested by the caller) apply deskewing, and use deskew(angle) to get a new instance, and on that
  6. store with save_image_file() and annotate with add_AlternativeImage()

- image_from_page / image_from_segment: when applying @orientation
  to the image (or at least applying corresponding operations to the
  affine coordinate transform), always try to split the angle into
  orientation (multiples of 90°) and the remaining skew (both positive
  and negative, i.e. split symmetrically around -45°/45°);
  now apply skew via rotation, but orientation via transposition
  (i.e. a combination of reflection and 90° rotation operations, which
  may also swap width with height) – describing the latter via image features
  `rotated-90`, `rotated-180` and `rotated-270`, as required by the spec
- adjust_canvas_to_transposition, transpose_coordinates, transpose_image:
  encapsulate and document all possible PIL.Image transposition methods
@bertsky
Copy link
Collaborator Author

bertsky commented Oct 11, 2019

Oh, I just remembered I still wanted to fix rotated-90 / rotated-180 / rotated-270 – these are currently ignored. But now that we have image features, it should be like this:

* `@orientation` first gets split into multiples of 90° and rest

* multiples of 90° should be applied by flip/transposition (both on the image and in the coordinate system) instead of rotation – this is more precise than rotation, and avoids large increase in canvas size, and avoids the case that width or height _decreases_ (depending on angle and aspect ratio)

* the rest gets applied via rotation

Should we still allow processors to use rotation for larger orientation angles (e.g. when not using core API to create the images)? In that case, maybe rotated-90 / rotated-180 / rotated-270 should be strictly reserved for the flip/transposition operations, i.e. their semantics would be how @orientation was applied, not whether @orientation was applied (the latter still being deskewed)?

The last commit addresses that. It seems to work well, although I do not have many actual examples of >90° skew, at least ones that are properly detected by ocrd-tesserocr-deskew (which happens to be the only processor that tries to detect orientation at the moment). (Practically, I have to reduce the confidence threshold to a very low setting.)

Please re-review!

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 11, 2019

So for example, in euler_rechenkunst01_1738 on page phys_0004, region region_1475755598239_323 is obviously oriented left in an otherwise upright page:
OCR-D-IMG-CLIP-deskewed_0004_region_1475755598239_323

Tesseract OSD (with confidence threshold down to 1.5) detects that correctly, and we can resegment and dewarp (and recognize!) the individual lines:

OCR-D-IMG-DEWARP-deskewed_0004_region_1475755598239_323_line_1475755661600_324 Ti1illionen (conf=0.4)
OCR-D-IMG-DEWARP-deskewed_0004_region_1475755598239_323_line_1475755666424_325 tauſend (conf=0.96)
OCR-D-IMG-DEWARP-deskewed_0004_region_1475755598239_323_line_1475755675224_326 Bilhonen. (conf=0.37)
OCR-D-IMG-DEWARP-deskewed_0004_region_1475755598239_323_line_1475755684104_327 tauſend (conf=0.96)
OCR-D-IMG-DEWARP-deskewed_0004_region_1475755598239_323_line_1475755689009_328 Millionen (conf=0.84)
OCR-D-IMG-DEWARP-deskewed_0004_region_1475755598239_323_line_1475755693816_329 tauſend. (conf=0.96)

- for fillcolor background estimation, use median of all channels
  (but make sure to set alpha value to zero/fully transparent)
- workaround for Pillow #1600 (creating black fill when processing
  images in LA mode)
- angle already applied must be sum of angle applied on parent level
  and on current level
- not all image features are monotonic
  - some do propagate through all hierarchy levels:
    binarized, grayscale_normalized, despeckled
  - some must be treated level-local (to make sense and allow for
    relative coordinate consistency):
    deskewed, rotated-90, rotated-180, rotated-270, dewarped
@bertsky
Copy link
Collaborator Author

bertsky commented Oct 15, 2019

The latter commit brings yet another breaking change: Some image features (AlternativeImage/@comments tags) must be annotated and interpreted only level-local, while others are monotonic. Processors must set features accordingly, or things will go wrong (silently). I don't see a simple way to detect that automatically, except for the proposed strongly encapsulating API.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 22, 2019

Maybe I should also make a PR in the spec for all the requirements related to image preprocessing that our PAGE-XML setup (keeping the original image, requiring exact coordinates, allowing derived images on any level) logically casts on any such implementation. Since there was so far no discussion on OCR-D/spec#116, I don't think it makes sense to detail all the reasons for the latest changes. The result can probably speak for itself. Opinions?

@wrznr
Copy link
Contributor

wrznr commented Oct 22, 2019

At least some documentation in terms of a Howto (i.e. how do I do image preprocessing with OCR-D the right way) would be highly appreciated. It could help the module projects and the poor folks who have to repair their contributions.

@kba
Copy link
Member

kba commented Oct 22, 2019

in terms of a Howto (i.e. how do I do image preprocessing with OCR-D the right way)

Practical howto in the form of unit tests?

Maybe I should also make a PR in the spec for all the requirements related to image preprocessing

That would be appreciated. Wouldn't have to be that elaborate, maybe a slightly less implementation-specific version of the docstrings?

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 22, 2019

in terms of a Howto (i.e. how do I do image preprocessing with OCR-D the right way)
It could help the module projects and the poor folks who have to repair their contributions.

The API syntax has only been broken very little actually. Most changes are additions. It's only those early adopters that have used the x, y, w and h keys of the secondary objects returned by image_from_page() and image_from_segment() which have to be changed. But this was incorrect anyway (it lead to wrong segment coordinates and thus images). And AFAICT it affects only cisocrgroup/ocrd_cis#17 and OCR-D/ocrd_tesserocr#80. Both of them are already waiting to be able to merge when this is merged in core.

Another side is the API's semantics, which is also about how things are received by the programmer. There are significant changes/clarifications in that area, which every MP that does preprocessing needs to be aware of:

  1. image_from_page() and image_from_segment() now both apply reflection/transposition for the large multiples of 90° instead of rotation (which enlarges the image and thus increases offset, as opposed to the former), so they respect and utilise the image features rotated-90, rotated-180 and rotated-270 that have been in the spec all along. Since setting image features (AlternativeImage/@comments) correctly is also obligatory now, processors must not (any longer) simply ignore them and handle everything via rotation (or else coordinates will be off again). (Not setting image features appropriately have more serious consequences under other circumstances though: causing "double rotation" or "double cropping" in the aftermath.)

  2. Some image features only make sense when they propagate across hierarchy levels (e.g. binarized), others must be set for each level independently (e.g. deskewed). Otherwise the representation in PAGE would allow non-trivial ambiguity which no implementation could get right. For example, what does deskewed say of an AlternativeImage if both the page and the region have an @orientation? Does it mean both have been applied, or just one – which one? The only correct interpretation of the (underspecified) OCR-D and PAGE spec at that point is: deskewed describes the current level, and since @orientation refers to Page/@imageFilename, by passing this information down the hierarchy level, we can always infer what angle has already been applied where.

  3. Nevertheless, dewarping and rescaling are still open problems that currently will lead to derived image coordinate inconsistency. So when they are used somewhere in a workflow, any segmentation at the same or lower levels must happen afterwards (and cannot be mapped back to or displayed at any image higher in the hierarchy) – otherwise the crops will be completely wrong. For MPs this also means they have to expect changes to the API and representation in the future – like using @custom for the parameters of the coordinate transform, or extending PAGE with @scale and something like DwGts/Grid.

Sorry, I started out to avoid another lengthy explanation. But how do you put that briefly? And how do we spread the news?

Practical howto in the form of unit tests?

That's essential for other reasons. But if these tests want to be useful, they must become convoluted. For education, IMHO it's better to point to actual code in the flesh, like ocrd-tesserocr-* or ocrd-cis-ocropy-*. People should really use those as templates.

Maybe I should also make a PR in the spec for all the requirements related to image preprocessing

That would be appreciated. Wouldn't have to be that elaborate, maybe a slightly less implementation-specific version of the docstrings?

I will attempt that. (But the docstrings are quite complementary: they detail how you can do stuff with images, whereas the spec must say what should be done.)

Copy link
Member

@cneud cneud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK you have convinced me that this is needed and can provide benefits (for edge cases), but due to the amount of extra complexity this will add in core it is essential that there is

  • high coverage with tests
  • OCR-D processors that use this functionality should be adapted accordingly ASAP, with tests
  • full documentation on these new API methods (preferably also with examples) is provided

@kba kba merged commit 5b05737 into OCR-D:master Nov 5, 2019
@bertsky bertsky deleted the fix-rotation branch October 8, 2020 06:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants