Offline use of PAGE → ALTO conversion #32

Closed
mikegerber opened this issue Mar 19, 2021 · 7 comments

Comments

@mikegerber

Currently, converting from PAGE to ALTO (one of the primary use cases for me) requires a working network connection, and possibly a working HTTP proxy configuration, apparently to load the ALTO schema (#29). I also noticed that this conversion needs to load at least xlink.xsd from the network.

There is code in PrimaDla.jar to load schemas from a schema folder (searchForAdditionalSchemas). We should probably explore this first, with the aim of pre-installing all schemas in such a folder.
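To illustrate the idea of a pre-installed schema folder, here is a minimal Python sketch. It is not how PrimaDla.jar or searchForAdditionalSchemas works; the function names `schema_locations` and `resolve_offline` are hypothetical, and it assumes the schemas are stored in the folder under the file name that ends their URL.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_locations(xml_text):
    """Return {namespace: schema URL} from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    # schemaLocation is a whitespace-separated list of namespace/URL pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


def resolve_offline(url, schema_dir):
    """Map a schema URL to a same-named file in schema_dir (assumed layout).

    Returns the local path, or None if the schema is not pre-installed.
    """
    candidate = Path(schema_dir) / Path(urlparse(url).path).name
    return candidate if candidate.is_file() else None
```

A validator would call `resolve_offline` for each URL and fail early when it returns None, instead of falling back to the network. Schemas imported from within other schemas (such as xlink.xsd) would need the same lookup, which this sketch does not cover.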

@mikegerber
Author

I think that code is only used in the unit tests, so we will have to patch prima-page-converter.

@mikegerber
Author

mikegerber commented Mar 23, 2021

Personally, I'm not super happy about relying on prima-page-converter for this. It's hard to work with (though that may be largely due to my lack of Java expertise...), hard to build (PRImA-Research-Lab/prima-core-libs#10) and one of the dependencies is closed source (PRImA-Research-Lab/prima-page-converter#17).

Did someone explore the option of doing this conversion by other means, i.e. reimplementing it in Python?

@kba
Member

kba commented Mar 23, 2021

> Did someone explore the option of doing this conversion by other means, i.e. reimplementing it in Python?

Yes, I am going to invest a bit of time into this, since we need that conversion in many places and it must be fast and robust. I was going to first consider XSLT and if that turned out to be too cumbersome or inefficient, do the conversion in Python.
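As a rough idea of what one step of a Python conversion could look like, the sketch below turns a PAGE `Coords` points string into the bounding-box attributes that an ALTO `TextBlock` carries. This is an illustration under assumptions, not the approach of any existing converter; `bbox_from_points` and `region_to_textblock` are hypothetical names, and namespaces, text content, and nested lines are left out.

```python
import xml.etree.ElementTree as ET


def bbox_from_points(points):
    """Convert a PAGE points string 'x1,y1 x2,y2 ...' to (hpos, vpos, width, height)."""
    xy = [tuple(int(v) for v in pair.split(",")) for pair in points.split()]
    xs, ys = zip(*xy)
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


def region_to_textblock(region_id, points):
    """Build a bare ALTO-style TextBlock element from a region ID and its polygon."""
    hpos, vpos, width, height = bbox_from_points(points)
    return ET.Element("TextBlock", {
        "ID": region_id,
        "HPOS": str(hpos),
        "VPOS": str(vpos),
        "WIDTH": str(width),
        "HEIGHT": str(height),
    })


block = region_to_textblock("r1", "10,20 110,20 110,70 10,70")
print(ET.tostring(block, encoding="unicode"))
```

The polygon is reduced to its enclosing rectangle here, which loses shape information; a fuller conversion would probably also want to keep the polygon itself.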

@mikegerber
Author

In addition to @kba's efforts at https://github.com/kba/page-to-alto, I also stumbled upon these XSLT files: https://gitlab.com/readcoop/transkribus/TranskribusCore/-/tree/master/src/main/resources/xslt

At least PageToAlto.xsl does not contain any mention of ReadingOrder, so I think it might be unsuitable.
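To show why ReadingOrder matters for a conversion, here is a minimal Python sketch that reads the region order from a PAGE document. It assumes a single flat `OrderedGroup` of `RegionRefIndexed` entries and ignores nested or unordered groups; `reading_order` is a hypothetical helper, and the `{*}` namespace wildcard needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def reading_order(page_xml):
    """Return region IDs sorted by their index in a flat OrderedGroup."""
    root = ET.fromstring(page_xml)
    # {*} matches any namespace, so the PAGE schema version does not matter.
    refs = root.findall(".//{*}ReadingOrder/{*}OrderedGroup/{*}RegionRefIndexed")
    refs.sort(key=lambda ref: int(ref.get("index")))
    return [ref.get("regionRef") for ref in refs]


SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page>
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="1" regionRef="r1"/>
        <RegionRefIndexed index="0" regionRef="r2"/>
      </OrderedGroup>
    </ReadingOrder>
  </Page>
</PcGts>
"""

print(reading_order(SAMPLE))
```

A converter that ignores this element would emit regions in document order, which here would put r1 before r2 even though the reading order says otherwise.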

@bertsky
Collaborator

bertsky commented Nov 30, 2021

@mikegerber, did the Pythonic page-to-alto backend solve this for you?

@mikegerber
Author

> @mikegerber, did the Pythonic page-to-alto backend solve this for you?

Yes!

kba closed this as completed Dec 3, 2021