Offline use of PAGE → ALTO conversion #32
Comments
I think that code is only used in the unit tests, so we will have to patch prima-page-converter
Personally, I'm not super happy about relying on prima-page-converter for this. It's hard to work with (though that may be largely due to my lack of Java expertise...), hard to build (PRImA-Research-Lab/prima-core-libs#10) and one of the dependencies is closed source (PRImA-Research-Lab/prima-page-converter#17). Did someone explore the option of doing this conversion by other means, i.e. reimplementing it in Python?
Yes, I am going to invest a bit of time into this, since we need that conversion in many places and it must be fast and robust. I was going to first consider XSLT and if that turned out to be too cumbersome or inefficient, do the conversion in Python.
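For illustration only, here is a rough sketch of what one small step of a pure-Python conversion might look like: turning a PAGE `TextLine` polygon into the bounding-box attributes ALTO uses. This is not taken from any existing converter. The function names `points_to_bbox` and `convert_text_line` are hypothetical, the namespace URIs cover just one PAGE and one ALTO version, and text content, reading order and styles are ignored entirely.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for one PAGE and one ALTO version; other versions differ.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


def points_to_bbox(points):
    """Turn a PAGE points string ("x,y x,y ...") into ALTO box attributes."""
    coords = [tuple(int(v) for v in pair.split(",")) for pair in points.split()]
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {
        "HPOS": str(min(xs)),
        "VPOS": str(min(ys)),
        "WIDTH": str(max(xs) - min(xs)),
        "HEIGHT": str(max(ys) - min(ys)),
    }


def convert_text_line(page_line):
    """Build an ALTO TextLine from a PAGE TextLine (hypothetical helper).

    Only the ID and the geometry are carried over in this sketch.
    """
    coords = page_line.find(f"{{{PAGE_NS}}}Coords")
    alto_line = ET.Element(f"{{{ALTO_NS}}}TextLine", points_to_bbox(coords.get("points")))
    alto_line.set("ID", page_line.get("id"))
    return alto_line
```

Nothing here needs schema files or network access, since the mapping works on the parsed tree alone.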
In addition to @kba's efforts at https://github.com/kba/page-to-alto, I also stumbled upon these XSLT files: https://gitlab.com/readcoop/transkribus/TranskribusCore/-/tree/master/src/main/resources/xslt
At least
@mikegerber, did the Pythonic page-to-alto backend solve this for you?
Yes!
Currently, converting from PAGE to ALTO (one of the primary use cases for me) requires a working network connection, and possibly a working HTTP proxy configuration, apparently to load the ALTO schema (#29). I also noticed that this conversion needs to load at least `xlink.xsd` from the network.

There is code in PrimaDla.jar to load from a schema folder (`searchForAdditionalSchemas`); we should probably explore this first, with the aim of pre-installing all schemas in such a folder.
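
To make the "pre-installed schema folder" idea concrete, here is a minimal sketch of the lookup logic, written in Python rather than Java. It does not reflect how `searchForAdditionalSchemas` in PrimaDla.jar actually works; the function name `resolve_schema` and the folder layout (one file per schema, named after the last path segment of its URL) are assumptions made for illustration.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_schema(location, schema_dir):
    """Map a schema URL to a pre-installed local copy (hypothetical helper).

    The file is looked up by the last path segment of the URL, and the
    function fails loudly instead of falling back to a network download.
    """
    filename = Path(urlparse(location).path).name
    local = Path(schema_dir) / filename
    if not local.is_file():
        raise FileNotFoundError(f"{filename} is not pre-installed in {schema_dir}")
    return local
```

With such a lookup in place, a missing schema shows up as an explicit error at conversion time rather than as a hanging or failing HTTP request behind a proxy.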