support scale attribute for down/upsampled images #25
Comments
It should be noted that this feature request is not academic at all. It is based on concrete necessities which arose in the context of “real” OCR processing. I.e., we need this one!
Sorry, don't have the time to dive into this, but it seems to diverge quite a bit from what the idea of the PageContent format is.
That would be unadvisable for many reasons:
I couldn't disagree more 😉
I second adding attributes to
Since `AlternativeImage` has been introduced on every level of the structural hierarchy, these image files can be used to represent results from image preprocessing (normalization, denoising, binarization, non-text suppression, despeckling, deskewing, dewarping). Some of these operations can and some cannot be represented descriptively, but referencing derived images always helps avoid repeated computations.

However, there's a difficulty/penalty involved: All coordinates in the PAGE hierarchy refer to the original image (under `/PcGts/Page/@imageFilename`), whereas derived images (`AlternativeImage/@filename` under `Page` or `Region` or `TextLine` or `Word`) necessarily have a different, local/relative coordinate system. It is connected to the global/absolute coordinate system only implicitly.

So if you want to process via derived images, like crop segments further down the hierarchy (translating from their absolute coordinates to the images' relative coordinates) or add further segmentation (translating from new relative coordinates in the images to new absolute coordinates), then you must know the transformation between them.
This could merely be an offset (which could be unambiguously defined as the top left of the bounding box of the element's polygon), which happens after cropping (on the page level or any segmentation below that).
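To make the offset case concrete, here is a minimal sketch in Python. It assumes `Coords/@points` holds space-separated `x,y` pairs; the polygon and the function names are made up for illustration and are not part of any PAGE library.

```python
def bounding_box_offset(points):
    """Return the top-left corner of the bounding box of a points string."""
    coords = [tuple(int(v) for v in pair.split(",")) for pair in points.split()]
    return min(x for x, _ in coords), min(y for _, y in coords)


def to_relative(point, offset):
    """Translate an absolute (page) coordinate into the cropped image."""
    return point[0] - offset[0], point[1] - offset[1]


def to_absolute(point, offset):
    """Translate a coordinate in the cropped image back to the page."""
    return point[0] + offset[0], point[1] + offset[1]


# Hypothetical region polygon, as it might appear in Coords/@points.
region_points = "120,80 400,90 390,300 110,290"
offset = bounding_box_offset(region_points)

relative = to_relative((150, 100), offset)
absolute = to_absolute(relative, offset)
```

For this polygon the offset is `(110, 80)`, so the page coordinate `(150, 100)` maps to `(40, 20)` in the cropped image and back again.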
But there are certain operations which change coordinates non-trivially:
All those effects are cumulative, i.e. they will compose into a new coordinate transform at each step, and in the order of the operations applied to the image (and its predecessors). This is not always trivial, e.g. cropping before/after deskewing, deskewing on page and then again on region level. It's certainly not rocket science, but (believe me) there are many ways you can get this wrong when you have to implement it.
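The order dependence can be illustrated with a small sketch using 3x3 homogeneous matrices in plain Python. It is a simplification under stated assumptions: the rotation is about the origin (real deskewing would typically rotate about the image center), the sign convention for the angle is arbitrary, and all names and numbers are invented for the example.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def translation(dx, dy):
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def rotation(degrees):
    # Assumption: rotation about the origin, counter-clockwise for positive angles.
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def compose(*steps):
    """Compose transforms listed in the order they were applied to the image."""
    result = translation(0.0, 0.0)  # identity
    for step in steps:
        result = matmul(step, result)  # later steps multiply from the left
    return result


def apply(matrix, point):
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


crop = translation(-100.0, -50.0)  # cropping at (100, 50) shifts the origin
rot = rotation(90.0)               # stand-in for a deskewing step

crop_then_rotate = apply(compose(crop, rot), (100.0, 50.0))
rotate_then_crop = apply(compose(rot, crop), (100.0, 50.0))
```

The same point ends up (up to floating-point error) at `(0, 0)` when cropping comes first, but at `(-150, 50)` when rotation comes first, which is exactly the kind of discrepancy that is easy to introduce by accident.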
Now, for cropping and deskewing, we are in the fortunate situation that (provided the operations applied on the derived image have been carried out in the "correct" way and documented in its `@comments`) their respective coordinate transform can be reconstructed from the descriptive information (`Coords/@points` and `@orientation`).

But for dewarping and rescaling we don't even have any descriptive annotation yet.
For dewarping, maybe the dewarping schema with its `/DwGts/Grid/Row/@points` is sufficient (although it is unfortunate that this schema is external to the content schema).

But for rescaling, there's nothing at all.
You could ask:
1: I'd be happy to see PAGE adopt some representation of affine transformations (basically a 3x3 float array) under `AlternativeImage/@coordinate-system`. But I would still consider this only a redundant convenience feature.

2: Rescaling is useful under various scenarios:
Thus, I propose to at least introduce a descriptive annotation for derived images' scale factors:

- `AlternativeImage/@imageWidth` (as in `Page/@imageWidth`)
- `AlternativeImage/@imageHeight` (as in `Page/@imageHeight`)
- `AlternativeImage/@imageXResolution` (as in `Page/@imageXResolution`)
- `AlternativeImage/@imageYResolution` (as in `Page/@imageYResolution`)
- `AlternativeImage/@imageResolutionUnit` (as in `Page/@imageResolutionUnit`)
- `AlternativeImage/@imageXScale` (how much is `AlternativeImage/@imageXResolution` zoomed over `Page/@imageXResolution`?)
- `AlternativeImage/@imageYScale` (how much is `AlternativeImage/@imageYResolution` zoomed over `Page/@imageYResolution`?)

(Of course, the latter 2 are redundant, but pixel density might not be known exactly/reliably and thus omitted / set to zero. In that case, the scale can still describe precisely the factor between the unknown density of the original image and the unknown density of the derived image.)
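As a rough sketch of how a consumer might use these proposed attributes: the function below derives the scale along one axis from the two pixel densities when both are known, and falls back to the explicit scale otherwise. The attributes are only proposed here, and the function name and the precedence between density ratio and explicit scale are assumptions made for the example.

```python
def scale_factor(derived_resolution, original_resolution, explicit_scale=None):
    """Scale of the derived image over the original along one axis.

    A density of 0 or None is treated as unknown.
    """
    if derived_resolution and original_resolution:
        return derived_resolution / original_resolution
    if explicit_scale:
        return explicit_scale
    raise ValueError("neither pixel densities nor an explicit scale are known")


# Both densities known: a 300 DPI page downsampled to 150 DPI.
known = scale_factor(150, 300)

# Densities unknown (set to zero), but the scale itself was annotated.
fallback = scale_factor(0, 0, explicit_scale=0.5)

# An absolute x coordinate maps into the derived image by multiplication
# (ignoring any cropping offset).
x_in_derived = 400 * known
```

Both calls yield `0.5`, so an absolute x coordinate of 400 lands at 200 in the derived image.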