Using other than the last AlternativeImage #19
Comments
@bertsky I bet you know a way out of this, right?
Where? I cannot see this on your current master or any other branch. It's important to see how exactly you were trying to do it.
This feature name violates the spec. And more importantly, there is no such thing as a block-segmented derived image in the OCR-D processing model. You must add your segment coordinates and classes to the PAGE-XML, not the cropped image. (Cropped images are merely allowed as an extra, consistent with the coordinates. But since you say you already filter binarized/normalized images on the input side, I would strongly recommend against that. At the least, the output images should be cropped from the fully-featured latest input images. Otherwise you are likely to break workflow and user expectations.)
Any consumer of PAGE-XML must assume that a derived image ( Some of those features are monotonic across levels, whereas others like
No, the 2 image API functions in core will always try to give you the last derived image which satisfies all required features (positive and negative). If some operations are missing which core can apply itself, then it does so.
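To make the selection rule concrete, here is a minimal, self-contained sketch of "last derived image which satisfies all positive and negative features". It is a simplified illustration, not core's actual implementation: the `DerivedImage` type, the `select_last_matching` function and the file names are hypothetical, and the step where core applies missing operations itself is left out.

```python
from typing import List, NamedTuple, Optional, Set


class DerivedImage(NamedTuple):
    """Hypothetical stand-in for one AlternativeImage entry."""
    filename: str
    comments: str  # comma-separated features, e.g. "binarized,cropped"


def _features(comments: str) -> Set[str]:
    """Split a comma-separated feature string into a set."""
    return {feature for feature in comments.split(",") if feature}


def select_last_matching(
    images: List[DerivedImage],
    feature_selector: str = "",
    feature_filter: str = "",
) -> Optional[DerivedImage]:
    """Return the last image that has every selected feature and none
    of the filtered ones, or None if no derived image qualifies."""
    required = _features(feature_selector)
    forbidden = _features(feature_filter)
    for image in reversed(images):
        present = _features(image.comments)
        if required <= present and not (forbidden & present):
            return image
    return None  # a caller would fall back to the original image here


# Hypothetical annotation order: each step adds one feature.
images = [
    DerivedImage("page_bin.png", "binarized"),
    DerivedImage("page_bin_deskewed.png", "binarized,deskewed"),
    DerivedImage("page_bin_deskewed_cropped.png", "binarized,deskewed,cropped"),
]

latest = select_last_matching(images)
uncropped = select_last_matching(images, feature_filter="cropped")
raw_only = select_last_matching(images, feature_filter="binarized")
```

With these inputs, filtering `cropped` skips the last entry and yields the deskewed image, while filtering `binarized` rejects every derived image, which corresponds to working on the raw image as described in the original post.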
We tried it locally in the way you described as well. Since it didn't work, we didn't push it to master.
Noted and will be fixed in the next commit.
While, as mentioned in #15, we didn't transform coordinates properly in all cases, that was not the problem in this case. What we did to create this problem is the following:
and the detected border:
Would it be possible to change the behavior of segment_from_image so that it "applies" cropping to these regions not just by shifting their coordinates, but also by cropping the TextRegions to the BorderRegion if they extend outside it? Maybe alongside a warning that TextRegions are being shrunk to match the BorderRegion.
We actually tried this, but the performance loss of our BlockSegmentation is too big if used after Binarization or Cropping to be a viable option. This means we also can't use it after Dewarping, since our Dewarping requires binarized images.
That's just as well. But you should push it to some feature branch or PR (for which you can request my review) publicly so that we can talk about actual things, not feats of my imagination.
You said so earlier, but not why. I explained how to use
That sounds like you were also using these images in the next-respective consumer (i.e. binarized image for deskewing, binarized+deskewed image for cropping, binarized+deskewed+cropped image for segmentation). But is that really the case?
I don't know how I should read this. PAGE coordinates are always absolute. (Please don't use
Okay, so at least for segmentation you were ignoring all the derived images produced so far. You are correct: no coordinate conversion is necessary in that case. (But segmentation cannot benefit from deskewing and cropping either. Maybe if you trained your net with augmented skewing, and with uncropped, maybe even crop-masked images, then that would not hurt. But did you?)
You didn't say at what level this happened and what filters/selectors you were using. I assume you mean you did some
Yes, this could explain the problem. The figures are not exactly the same – but you have a skew angle on the page as well, right? (So in detail what would happen is that But without the actual code or full example, this is all guesswork.
No, that would lead to downstream inconsistency again. Who can still tell that the derived image now annotated does not represent the bbox of the region's polygon, but of a modified polygon? Core might do the same trick each time when calculating the coordinate transform for that image, but other implementations might not be aware of it. What core does here is correct: it gives you the requested image, but everything outside Note that no element in PAGE is allowed to exceed its parent element. You must fix this in/after the producer of the coordinates (your segmentation). And make sure the region image also matches the fixed polygon!
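One way to enforce the "no element exceeds its parent" rule on the producer side is to clip each region polygon before writing it to PAGE. The sketch below is only an illustration under a simplifying assumption: the parent (e.g. the Border) is treated as an axis-aligned rectangle, and the polygon is clipped with the Sutherland-Hodgman algorithm. The function name `clip_to_rect` and all coordinates are hypothetical; a non-rectangular Border would need a general polygon intersection instead.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def clip_to_rect(polygon: List[Point], xmin: float, ymin: float,
                 xmax: float, ymax: float) -> List[Point]:
    """Clip a polygon to an axis-aligned rectangle (Sutherland-Hodgman).
    Returns an empty list if the polygon lies entirely outside."""

    def clip(points: List[Point], inside: Callable[[Point], bool],
             intersect: Callable[[Point, Point], Point]) -> List[Point]:
        result = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # index -1 wraps around to the last point
            if inside(cur):
                if not inside(prev):
                    result.append(intersect(prev, cur))
                result.append(cur)
            elif inside(prev):
                result.append(intersect(prev, cur))
        return result

    def cross_x(x: float) -> Callable[[Point, Point], Point]:
        # Only called for edges that cross the line, so q[0] != p[0].
        def intersect(p: Point, q: Point) -> Point:
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return intersect

    def cross_y(y: float) -> Callable[[Point, Point], Point]:
        # Only called for edges that cross the line, so q[1] != p[1].
        def intersect(p: Point, q: Point) -> Point:
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return intersect

    points = list(polygon)
    for inside, intersect in [
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ]:
        if not points:
            break
        points = clip(points, inside, intersect)
    return points


# Hypothetical region sticking out 10 px to the left of its parent.
region = [(-10, 20), (100, 20), (100, 80), (-10, 80)]
clipped = clip_to_rect(region, 0, 0, 200, 200)
```

The clipped points may be non-integer and would need rounding before being serialized. As noted above, any region image annotated alongside must then be produced from the clipped polygon, not the original one.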
If you don't want to do it in your segmentation processor, then you could use
What do you mean by performance, computation time or quality? If the former: Why do you attribute the time required for binarization and cropping to the segmentation processor? These are independent steps, they each have their own "quota". Where to invest resources for improved quality is, after all, up to the workflow designer to decide.
@bertsky I created a branch "region_cropping_problem". In that branch I also added a "pipeline.py" with a test page that should allow you to reproduce the error.
I was talking about the quality. P.S.: This branch doesn't use the latest version of our BlockSegmentation. I just use it to reliably reproduce the problem, so I can fix it.
@mjenckel Were you able to fix the initial problem?
Fixed.
In our BlockSegmentation code we use the raw input rather than the processed AlternativeImages. We achieved that by using the "feature_filter" and filtering out all other processing steps. This works quite nicely; however, after adding the resulting text_regions to the page file like this:
we can't use them in any of the following processes. We get the following error:
The problem seems to be that any subsequent process assumes the coordinates added by BlockSegmentation should be transformed according to any previous process (e.g. cropping), even though the comments do not mention any previous computation for this region.
@kba Is it also correct that, even though the mentioned AlternativeImages exist as files in the workspace, the processor prefers to calculate the regions from the image?
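The following sketch illustrates one way such a mismatch can arise. It is a simplified model, not OCR-D core's code: it assumes cropping is a pure translation (no deskewing), and the function names and all numbers are hypothetical. A region detected on the raw image keeps absolute page coordinates, while a consumer working on the cropped image converts them into that image's frame.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def to_cropped_frame(polygon: List[Point], crop_left: int,
                     crop_top: int) -> List[Point]:
    """Convert absolute page coordinates into the frame of a cropped
    image whose top-left corner sits at (crop_left, crop_top)."""
    return [(x - crop_left, y - crop_top) for x, y in polygon]


def fits_image(polygon: List[Point], width: int, height: int) -> bool:
    """True if every point lies within an image of the given size."""
    return all(0 <= x <= width and 0 <= y <= height for x, y in polygon)


# Hypothetical border found by cropping: left=50, top=40, 900x1360 px.
crop_left, crop_top, crop_width, crop_height = 50, 40, 900, 1360

# Hypothetical region detected on the raw (uncropped) image, so it may
# extend beyond the border that cropping detected.
region_absolute = [(30, 100), (400, 100), (400, 300), (30, 300)]

region_relative = to_cropped_frame(region_absolute, crop_left, crop_top)
consistent = fits_image(region_relative, crop_width, crop_height)
```

Here `region_relative` starts at x = -20, so the region cannot be cut out of the cropped image without losing part of it. The polygon itself is valid in absolute terms; the inconsistency only shows up once it is combined with a derived image that covers less of the page.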
For now we will change it so BlockSegmentation uses the latest AlternativeImage rather than the raw image as input.