I would also be very happy to know what PRImA-Research-Lab's view on the index value here is. 😀 I would interpret the schema description in the same way as @bertsky and I, too, think that the implementation in PAGE Viewer is therefore wrong as shown in the example. (In the example, XML order = correct reading order but the index values are essentially random values. These essentially random values should be interpreted as the order if our interpretation of the schema is correct.)
AFAICS, the existing implementations for all versions of PAGE-XML ignore `(OrderedGroup|OrderedGroupIndexed)/@index` when parsing the XML. This is how it looks:
prima-core-libs/java/PrimaDla/src/org/primaresearch/dla/page/io/xml/sax/SaxPageHandler_2019_07_15.java
Lines 335 to 342 in 1f087a4
References for `ATTR_index` are nowhere to be found. The model class of the group in turn does nothing on its part to check incoming indices, it simply appends them:
prima-core-libs/java/PrimaDla/src/org/primaresearch/dla/page/layout/logical/Group.java
Lines 193 to 199 in 1f087a4
This means that applications like PageViewer or PageConverter will use the XML order instead of the actual order laid out by the schema semantics. This in turn creates a problem for applications like OCR-D: which is the correct representation, the one shown by PageViewer or my strict implementation?
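To make the two readings concrete, here is a minimal sketch in Python using only the standard library. It is not the PRImA or OCR-D code; the sample fragment, the region IDs and the helper `region_order` are made up for illustration, and it only looks at flat `RegionRefIndexed` children (nested groups are ignored).

```python
import xml.etree.ElementTree as ET

NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hypothetical fragment: XML order is r1, r2, r3, but @index says r3, r1, r2.
SAMPLE = """\
<ReadingOrder xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <OrderedGroup id="ro1">
    <RegionRefIndexed index="1" regionRef="r1"/>
    <RegionRefIndexed index="2" regionRef="r2"/>
    <RegionRefIndexed index="0" regionRef="r3"/>
  </OrderedGroup>
</ReadingOrder>
"""


def region_order(xml_text, use_index):
    """Return the regionRef values of the first OrderedGroup.

    use_index=False: keep the order in which the elements appear in the XML.
    use_index=True:  sort by the @index attribute (the strict reading).
    """
    root = ET.fromstring(xml_text)
    group = root.find("pc:OrderedGroup", NS)
    refs = group.findall("pc:RegionRefIndexed", NS)  # document order
    if use_index:
        refs = sorted(refs, key=lambda el: int(el.get("index")))
    return [el.get("regionRef") for el in refs]


xml_order = region_order(SAMPLE, use_index=False)
index_order = region_order(SAMPLE, use_index=True)
print("XML order:  ", xml_order)
print("index order:", index_order)
```

With this input the two readings disagree: a consumer that appends elements as they are parsed sees `r1, r2, r3`, while one that honours `@index` sees `r3, r1, r2`.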
Here's an example of the difference this can make:
Contrary to what one might suspect at first glance, here it is PageViewer that gets the order wrong, along with the producing tool eynollah (which follows PageViewer's model of just looking at the XML order), so the two errors compensate for each other.
If my interpretation is wrong, please get back to me soonish for confirmation. (I don't care about the fix so much as clarity on the correct meaning of the standard for implementation in software and adoption in derived specifications like OCR-D.)
If the better place is the PAGE-XML repo, please transfer.