@gouttegd gouttegd commented Jul 14, 2025

Resolves [#421, #457]

  • docs/ have been added/updated if necessary
  • make test has been run locally
  • [ ] tests have been added/updated (if applicable)
  • CHANGELOG.md has been updated.

This is the complete proposal for the specification of the SSSOM/RDF serialisation format, according to the current state of the discussions about it.

@gouttegd gouttegd self-assigned this Jul 14, 2025
@gouttegd gouttegd requested a review from matentzn July 14, 2025 22:23
As noticed by @nichtich:

> the use of `pav:authoredOn` only makes sense if `pav:createdOn` is
> used as well to differentiate two types of dates, in addition to the
> publication date. SSSOM only has one type of date so there is no reason
> not to use plain old `dcterms:created`.

closes #457

@matentzn matentzn left a comment

This is a great start. Let's go back and forth over this a bit; I have made my first round of comments, the biggest bomb being the suggestion to specify a bespoke serialisation of curie_map.

Use BCP14 keywords more consistently.

Add a "special consideration" section to explain the possibility of
injecting "direct" SPO triples.

@matentzn matentzn left a comment

Some cosmetic things. I will ask two reviewers to chime in so we can merge this ASAP!

This is huge work, thanks @gouttegd! Much appreciated!

@matentzn matentzn requested review from cthoyt and ehartley September 24, 2025 20:17

cthoyt commented Sep 24, 2025

I'm going to run Prettier on the Markdown before reviewing it, once Damien has had a chance to address your suggestions.

Clarify that a "string literal" is a `xsd:string` literal -- this has
the side-effect of clarifying that it cannot be a langString.

Also fix incorrect use of pav:authoredBy to represent the creator_id
slot.
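
The `xsd:string` versus langString distinction made in this commit can be illustrated with a minimal Python sketch. It assumes Turtle output with the `xsd:` prefix already declared; `turtle_string_literal` is a hypothetical helper written for this illustration, not a function of SSSOM-Py or any other library. In Turtle, a quoted string with no suffix is already an `xsd:string` literal, while a language tag such as `@en` makes it an `rdf:langString`. The sketch writes the datatype out explicitly so the intent cannot be misread.

```python
# Characters that must be escaped inside a double-quoted Turtle string.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def turtle_string_literal(value: str) -> str:
    """Render a Python string as an explicit xsd:string literal in Turtle.

    Hypothetical helper for illustration only. It never emits a language
    tag, so the result can never be an rdf:langString.
    """
    # Escape character by character so a backslash is never escaped twice.
    escaped = "".join(_ESCAPES.get(ch, ch) for ch in value)
    return f'"{escaped}"^^xsd:string'


if __name__ == "__main__":
    print(turtle_string_literal("FBbt to UBERON mappings"))
```

A parser would treat `"label"` and `"label"^^xsd:string` as the same literal, but `"label"@en` as a different one, which is why the wording in the specification matters.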
@gouttegd

@cthoyt As you wish, but please note that you can also view a “rendered” version directly on the branch: https://github.com/mapping-commons/sssom/blob/rdf-spec/src/docs/spec-formats-rdf.md

This command: `npx prettier --prose-wrap always --check --write src/docs/spec-formats-rdf.md`

@cthoyt cthoyt left a comment

Thanks all for being patient and @gouttegd for writing this important groundwork. I had a full read and left some minor comments.

One larger question is still about what's the role of this specification? Is the goal to make sure that we can implement RDF serialization outside of a LinkML context (which I consider very important; it's not reliable to defer to that implicitly)?

In many cases, I think we should be much more explicit. Let's consider, in a follow-up, developing a suite of example SSSOM/TSV and SSSOM/RDF input/output pairs against which a serializer can be validated.

gouttegd commented Sep 26, 2025

> One larger question is still about what's the role of this specification?

Err, to allow developers to write SSSOM/RDF serialisers and deserialisers? The same way the SSSOM/TSV specification is there to allow developers to write SSSOM/TSV serialisers and deserialisers.

What else do you think the specification of a format is for?

> Is the goal to make sure that we can implement RDF serialization outside of a LinkML context

Yes. Because LinkML is made by and for Python developers. The LinkML runtime (with its built-in RDF serialisers and deserialisers) is only available in Python – there is no support whatsoever for any other language. Programmers in other languages have to implement RDF serialisation “outside of a LinkML context”, because there is no such thing as “a LinkML context” for them.

And as if that were not enough, LinkML barely bothers to describe how its RDF serialiser and deserialiser work (how objects described in a LinkML schema are turned into an RDF graph, or read from an RDF graph). In practice, without the formal specification that we are trying to write here, the only way for someone to read/write SSSOM/RDF files while having the silly idea of not working in Python is to reverse-engineer SSSOM-Py – that’s what I had to do when I added RDF support in SSSOM-Java. I highlighted at the very beginning of the discussion about the RDF serialisation that this was not acceptable for something claiming to be a “standard”.

gouttegd commented Sep 26, 2025

Besides, the LinkML-generated serialization does not always do what we’d want.

For example, it will serialize a mapping record with an explicit record_id as:

[] a owl:Axiom ;
   owl:annotatedSource UBERON:0000001 ;
   owl:annotatedProperty semapv:crossSpeciesExactMatch ;
   owl:annotatedTarget FBbt:00000001 ;
   sssom:mapping_justification semapv:ManualMappingCuration ;
   sssom:record_id "https://example.org/mymapping1" .

whereas it was quickly agreed in the discussion about the RDF serialisation that, whenever a record_id is available, it should be used as the named resource that represents the entire record (this is one of the points that were the most important to @matentzn), as in:

<https://example.org/mymapping1> a owl:Axiom ;
   owl:annotatedSource UBERON:0000001 ;
   owl:annotatedProperty semapv:crossSpeciesExactMatch ;
   owl:annotatedTarget FBbt:00000001 ;
   sssom:mapping_justification semapv:ManualMappingCuration .
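
The rule described above (use the record_id as the subject when there is one, fall back to a blank node otherwise) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SSSOM-Py or SSSOM-Java implementation: `subject_term` and `mapping_to_turtle` are hypothetical names, the record is a plain dict keyed by SSSOM slot names, the record_id is assumed to be a full IRI, the other values are assumed to be CURIEs whose prefixes are declared elsewhere, and no escaping or validation is attempted.

```python
def subject_term(record: dict) -> str:
    """Return the Turtle term for the node that represents the whole record."""
    record_id = record.get("record_id")
    # Named resource when a record_id is available, blank node otherwise.
    return f"<{record_id}>" if record_id else "[]"


def mapping_to_turtle(record: dict) -> str:
    """Serialise one mapping record as an owl:Axiom (hypothetical helper)."""
    lines = [
        f"{subject_term(record)} a owl:Axiom ;",
        f"   owl:annotatedSource {record['subject_id']} ;",
        f"   owl:annotatedProperty {record['predicate_id']} ;",
        f"   owl:annotatedTarget {record['object_id']} ;",
        f"   sssom:mapping_justification {record['mapping_justification']} .",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    record = {
        "record_id": "https://example.org/mymapping1",
        "subject_id": "UBERON:0000001",
        "predicate_id": "semapv:crossSpeciesExactMatch",
        "object_id": "FBbt:00000001",
        "mapping_justification": "semapv:ManualMappingCuration",
    }
    print(mapping_to_turtle(record))
```

In this sketch the record_id is never written as a separate `sssom:record_id` literal, because the subject IRI already carries it.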

cthoyt previously approved these changes Oct 9, 2025