-
Notifications
You must be signed in to change notification settings - Fork 12
SimpleIOHandler prints xsd datatype as an abreviated string, which is subtly off #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Why using xsd:float or xsd:string in all those cases is wrong? I think it is correct, and that older versions of Paxtools printed long “<http://www.w3.org/2001/XMLSchema#float” over and over, making BioPAX file larger… I suspect the problem is somewhere in those other tools that you use. Tools should understand different styles RDFXML syntax rather that expect a particular one… Also, as I know, e.g. Jena has bugs, e.g. uses some weird rules rather than standard to tell valud local uri from valid absolute uri, adds xml:base unnecessarily or inserts ‘#’ sign into absolute uri, etc…
|
I think it is a bug. The other tools are compliant in how they interpret the string, this also includes RDFlib and RDF4j. Why it does not show up as a bug for most people is because the the value "xsd:float" is also an IRI. Just not the IRI that one expects "http://www.w3.org/2001/XMLSchema#float". This is because rdf/xml uses xml infoset for iri expansion and that only works on elements and not on attributes (other than the xml:base setting which does not apply here). Now this only matters if someone actually uses RDF tools like SPARQL. Which not many people might, but we do. So currently the full iri of the datatype is just e.g. when writing a sparql query like this PREFIX xsd:<http://www.w3.org/2001/XMLSchema#float>
PREFIX biopax3:<http://www.biopax.org/release/biopax-level3.owl#>
SELECT *
WHERE {
?x biopax3:stoichiometricCoefficient "1.0"^^xsd:float .
} fails to return with the current biopax file. PREFIX biopax3:<http://www.biopax.org/release/biopax-level3.owl#>
SELECT *
WHERE {
?x biopax3:stoichiometricCoefficient "1.0"^^<xsd:float> .
} but that is because PREFIX xsd:<http://www.w3.org/2001/XMLSchema#float>
SELECT ?same
WHERE { BIND((<xdf:float> = xsd:float) AS ?same) } I saw the comments to the pull request about this being super long and ugly, Which I understand and feel. In that case I think we can use an XML entity. <?xml version="1.0"?>
<!DOCTYPE rdf [
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>
...
<bp:comment rdf:datatype="&xsd;string"
... And because we live in the world of RDF 1.1 we can actually delete all the xsd:string attributes. Of course you don't need to accept either option nor the pull request, I just hate opening issues in an important but underfunded project without trying to make life as easy as possible for the kind people maintaining it. For us it is trivial to "patch" the files before publishing. Because all the other work in new updates in Paxtools are totally worth that small hassle. I just want to avoid someone else wondering about why their query doesn't work when it all looks right. |
…ntities to make long IRI's for xsd datatype references without introducing a lot of bytes
Wanted to add that you can see the effect using comunica click the execute button. And see two datatypes. One for the "" as given and one extracted from the hcyc.owl test file. |
Oh, I get it - admit that I (re-)introduced this regression bug in 2023-07; it seems made it to Paxtools release v5.3.0 and v5.2.1 too (older v5.1.0 seems fine...) I like the first PR #55 more (small fix), with comments. |
@IgorRodchenkov thank you! While not that important you might also want to pick up the part of the second patch I proposed regarding skipping doctype and entity declarations. It will give you more flexibility in the future. |
Thanks for reporting, @JervenBolleman! I am likely actually going to make that change to skip printing rdf:datatype attributes for "string" type/range BioPAX OWL properties. Just trying to do some more research and confirm if it's safe (it definitely makes files smaller!) But I am stll not so sure... May I ask you which dataset were you working with? Were you just converting from existing BioPAX RDF/XML datafile to n-triples or were you building own BioPAX model with Paxtools (6.0.0-SNAPSHOT?), outputting to RDF/XML, then converting? |
…to v3 (due to weird jsonld conversion issues when using jena 4 or 5).
OMG... It seems that Paxtools IO should not write to the output OWL (RDF/XML) files any of those rdf:datatype attributes at all for the BioPAX model property values (e.g. bp:name, bp:db, bp:id etc. primitive props) because they are well defined in the BioPAX L3 specification (which we import into every such file and also use as namespace prefix, usually "bp"). |
@IgorRodchenkov the specification/owl import says what the range of things are. However, the SPARQL and RDF layers don't care about that. Even OWL with datatype reasoning might not do what you expect. I think it as likely that an OWL reasoner would say that the pax file is inconsistent with the definition as it would be to coeerce a values datatype to a different one. In any case before you implement that may I suggest testing it out first, with a wide variety of tools. |
We are updating the code that is used to generate the biopax level2 of Rhea and noticed the change in output. |
@JervenBolleman please do not use BioPAX Level2 use Level3 if you can (Paxtools can in fact autoconvert L2 to L3). PS: as to "testing it out first, with a wide variety of tools...", IMHO, unlikely we could ever satisfy all the tools that might also have bugs... As long as we have valid OWL model (written as RDF/XML), we should be good - others can transform/postfix/mixin the data for their needs. Unfortunately, we also had to write lots of code to semantically validate and "clean" existing BioPAX data that we integrate/merge into Pathway Commons model (problems include wrong use of biopax properties, bad/nonsense/misspelled identifier collection names (in xref.db), unification xrefs attached to wrong type of individuals/objects, etc...) |
Dear BioPax developers,
On line 669 of the SimpleIOHandler the datatype print is slightly off.
Using a tool like rapper or riot to convert the output of the rdf/xml to n-triples, we see the difference (example from rhea-db)
was printed but we expect
Quite a lot of tools auto-correct this difference down stream. But if one uses a tool that is spec rdf/xml compliant this gives a difference when doing SPARQL values comparison. Changing from a numeric comparison to a string comparison.
We will open a PR with this fix soon.
Regards,
Jerven
The text was updated successfully, but these errors were encountered: