generated namespace prefix is invalid #277

Closed
bertsky opened this issue Aug 12, 2019 · 10 comments

@bertsky
Collaborator

bertsky commented Aug 12, 2019

With 1.0.0b12, I now get documents from to_xml that begin like this:

<pc:PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">

Thus, all elements use the pc prefix, but this is not declared in the root as xmlns:pc=....

IMO, this regression is a combination of c0c1ea6 and the earlier af5098d – first to_xml was rewritten to not include the pc prefix anymore (but at that time it was also not used in the generateDS call), then the generateDS call changed (but to_xml was not changed back accordingly).
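To illustrate why the missing declaration matters, here is a minimal sketch using only Python's standard library, not the project's own code. It assumes a stripped-down root element (real PAGE documents carry more attributes and children), and the helper name `parses` is hypothetical. A namespace-aware parser rejects a `pc:` prefix that is never bound, and accepts the same root once `xmlns:pc` is declared:

```python
import xml.etree.ElementTree as ET

NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

# Root as reported above: "pc:" is used, but only the default namespace is declared.
broken = f'<pc:PcGts xmlns="{NS}"></pc:PcGts>'
# Same root with the prefix bound via xmlns:pc.
fixed = f'<pc:PcGts xmlns:pc="{NS}"></pc:PcGts>'


def parses(xml_text):
    """Return True if ElementTree's namespace-aware parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An unbound prefix is reported as a parse error.
        return False


print(parses(broken))  # False
print(parses(fixed))   # True
```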

@bertsky
Collaborator Author

bertsky commented Aug 12, 2019

Oh, I just found your fix 22a4624

@bertsky
Collaborator Author

bertsky commented Aug 12, 2019

(Could we separate this out of #268, and merge it right away?)

@n00blet

n00blet commented Aug 14, 2019

@kba hi, can you please fix this?

kba closed this as completed in ce198ed Aug 14, 2019
@kba
Member

kba commented Aug 14, 2019

Fixed in 1.0.0b15

@bertsky
Collaborator Author

bertsky commented Aug 14, 2019

Thanks!

@dstoekl

dstoekl commented Dec 17, 2020

what is the purpose of the pc prefix? e.g. pc:pcgts? It does not validate against the schema.

@cneud
Member

cneud commented Dec 17, 2020

> what is the purpose of the pc prefix? e.g. pc:pcgts? It does not validate against the schema.

pc stands for page content within PAGE-XML afaict. The PAGE schema also has various other parts, e.g. relating to evaluation of binarization etc.

@dstoekl

dstoekl commented Dec 17, 2020

thx. but why would you prefix every single namespace with it? I haven't seen this on other pagexmls and it does not validate against the schema given.

@kba
Member

kba commented Dec 17, 2020

> thx. but why would you prefix every single namespace with it? I haven't seen this on other pagexmls and it does not validate against the schema given.

Why would this not validate against the PAGE schema? Our PAGE-XML defines xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15", so this should be valid. If we produce invalid XML, please open a new issue and share the setup that led to the faulty PAGE-XML.
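As a rough illustration of this point, the following sketch (standard library only; the two-element documents and the helper `qualified_tags` are made up for the example) shows that a prefixed document and a default-namespace document resolve to the same qualified element names. It does not run schema validation; it only shows that the choice of prefix does not change which elements the document contains:

```python
import xml.etree.ElementTree as ET

NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"

# Hypothetical minimal documents: one uses the "pc" prefix, one the default namespace.
prefixed = ET.fromstring(f'<pc:PcGts xmlns:pc="{NS}"><pc:Page/></pc:PcGts>')
default = ET.fromstring(f'<PcGts xmlns="{NS}"><Page/></PcGts>')


def qualified_tags(root):
    """List the namespace-qualified tag of every element in document order."""
    return [element.tag for element in root.iter()]


# Both spellings yield "{namespace}LocalName" for each element.
print(qualified_tags(prefixed) == qualified_tags(default))  # True
print(prefixed.tag)
```

A schema validator works on these qualified names, so a correctly declared prefix should not by itself cause a validation failure.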

@dstoekl

dstoekl commented Dec 17, 2020

ok. never mind.
