repair: inaccurate coordinates (tiny inconsistency/invalidity) #32

Closed
bertsky opened this issue Feb 21, 2020 · 2 comments
Labels: enhancement (New feature or request)

Comments

@bertsky
Collaborator

bertsky commented Feb 21, 2020

From discussion on OCR-D/core#418:

Additionally, IMO the coordinate checks should be made a little less strict (and thus more compatible with Aletheia) to avoid crying wolf.

Things I see frequently:

  1. very small (up to 1 pixel) violations of non-containment in parent element
    • Shapely does not have almost_within, but one could try containment within the dilated version:
      if not (child_poly.within(node_poly) or
              child_poly.within(node_poly.buffer(0.5))):
          ...  # flag a containment violation here
  2. tiny self-intersections between direct neighbour points, where the outline goes back and forth (probably caused by internal rounding)
    • These must be repaired on the spot; otherwise Shapely will not operate on these polygons. Possibly:
      if not node_poly.is_valid:
          simplified = node_poly.simplify(0.8)
          if simplified.is_valid:
              node_poly = simplified

But it may be more prudent to keep the validator strict and to outsource these repairs to a dedicated Aletheia postprocessor (e.g. ocrd-segment-repair with a new parameter correct-coords=true).

Originally posted by @bertsky in OCR-D/core#418 (comment)
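The two heuristics proposed above can be tried out on toy polygons. This is a minimal sketch, assuming Shapely is installed; the function names `is_almost_within` and `repair_tiny_self_intersection` are hypothetical, and the tolerances (0.5 px and 0.8 px) are simply the values suggested above, not tested defaults.

```python
from shapely.geometry import Polygon


def is_almost_within(child_poly, node_poly, tolerance=0.5):
    """Containment test that forgives violations of up to `tolerance` pixels.

    Shapely has no almost_within, so fall back to containment in the parent
    dilated by `tolerance` (only computed if the strict test fails).
    """
    return child_poly.within(node_poly) or child_poly.within(node_poly.buffer(tolerance))


def repair_tiny_self_intersection(node_poly, tolerance=0.8):
    """Return a simplified polygon if that removes the invalidity, else the input."""
    if not node_poly.is_valid:
        simplified = node_poly.simplify(tolerance)
        if simplified.is_valid:
            return simplified
    return node_poly


parent = Polygon([(0, 0), (100, 0), (100, 50), (0, 50)])
# Child protrudes 0.4 px beyond the parent's right edge.
child = Polygon([(10, 10), (100.4, 10), (100.4, 40), (10, 40)])
print(child.within(parent), is_almost_within(child, parent))  # False True

# Rectangle whose bottom edge makes a tiny loop that crosses itself at (40.5, 0).
looped = Polygon([(0, 0), (40, 0), (41, 0), (40.5, 0.5), (40.5, -0.5),
                  (100, 0), (100, 50), (0, 50)])
repaired = repair_tiny_self_intersection(looped)
print(looped.is_valid, repaired.is_valid)
```

The strict `within` test runs first so that the buffered parent is only computed when needed. Since `simplify` drops vertices, restricting it to polygons that are already invalid avoids altering outlines that were fine to begin with.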
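A dedicated repair step, as suggested above, could go further than a tolerant validator and actually rewrite the coordinates. The following is only a sketch of what such a step might do, not the code of ocrd-segment-repair: it assumes Shapely >= 1.8 (for `shapely.validation.make_valid`), and `repair_child` / `largest_polygon` are hypothetical helpers. Invalid outlines are made valid first, then clipped to their parent.

```python
from shapely.geometry import Polygon, box
from shapely.validation import make_valid  # requires Shapely >= 1.8


def largest_polygon(geom):
    """Pick the largest Polygon from a (possibly multi-part) geometry, or None."""
    if geom.geom_type == "Polygon":
        return geom
    if geom.geom_type in ("MultiPolygon", "GeometryCollection"):
        parts = [largest_polygon(part) for part in geom.geoms]
        parts = [part for part in parts if part is not None]
        if parts:
            return max(parts, key=lambda part: part.area)
    return None


def repair_child(child_poly, parent_poly):
    """Hypothetical repair step: make the child valid, then clip it to its parent.

    May return None or an empty polygon if nothing of the child survives.
    """
    if not child_poly.is_valid:
        child_poly = make_valid(child_poly)
    if not child_poly.within(parent_poly):
        child_poly = child_poly.intersection(parent_poly)
    # Both steps can yield multi-part results; keep only the largest polygon.
    return largest_polygon(child_poly)


parent = box(0, 0, 100, 50)
print(repair_child(box(10, 10, 105, 40), parent).bounds)  # (10.0, 10.0, 100.0, 40.0)
```

Keeping only the largest part is a lossy choice; a real postprocessor might rather merge the parts or report the region for manual review.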

@bertsky
Collaborator Author

bertsky commented Sep 4, 2020

Status update: since OCR-D/core@6bf98d0 we do have a slightly more tolerant validator, but this does not help much, because:

  • invalid shapes can still cause exceptions in follow-up processors
  • invalidities/inconsistencies which are larger than the 2px tolerance may still be explainable/repairable systematically
  • especially if it turns out that PAGE coordinate semantics are "pixel center" rather than "pixel below right", we need an automatic translation between the two paradigms (the former as the file interface, the latter as the runtime interface for all polygon/bbox and image processing libraries)
  • we need some tool for production use as long as we have not fixed the offending (producing) processors themselves
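If PAGE coordinates do turn out to denote pixel centers, the point-level translation is a half-pixel shift: under "pixel below right" (corner) semantics, pixel x spans [x, x+1), so its center sits at x + 0.5. A minimal sketch with Shapely, assuming a pure half-pixel offset is all that is needed (whether region outlines should additionally grow by half a pixel is left open); `page_to_runtime` and `runtime_to_page` are hypothetical names.

```python
from shapely.affinity import translate
from shapely.geometry import Polygon

# Assumption: under "pixel center" semantics an integer coordinate x names the
# center of pixel column x; under "pixel below right" (corner) semantics pixel x
# covers [x, x + 1), so that same center lies at x + 0.5.
HALF_PIXEL = 0.5


def page_to_runtime(poly):
    """Shift a polygon read from PAGE (pixel centers) into corner semantics."""
    return translate(poly, xoff=HALF_PIXEL, yoff=HALF_PIXEL)


def runtime_to_page(poly):
    """Shift back to pixel-center semantics and round to PAGE's integer points.

    Only the exterior ring is kept, since PAGE Coords hold a single outline.
    """
    shifted = translate(poly, xoff=-HALF_PIXEL, yoff=-HALF_PIXEL)
    return Polygon([(round(x), round(y)) for x, y in shifted.exterior.coords])


region = Polygon([(0, 0), (10, 0), (10, 5), (0, 5)])
runtime = page_to_runtime(region)
print(runtime.exterior.coords[0])  # (0.5, 0.5)
print(runtime_to_page(runtime).equals(region))  # True
```

The rounding on the way back is where new tiny inconsistencies can creep in, so it may be worth re-validating polygons after this step.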

@bertsky
Collaborator Author

bertsky commented Feb 17, 2022

This was solved long ago: ocrd-segment-repair now fixes (trivial) validation errors automatically.

bertsky closed this as completed Feb 17, 2022