Subspacing Policy #133

Open
cmungall opened this issue Jul 30, 2021 · 0 comments
Labels
documentation Improvements or additions to documentation Policy

Comments


cmungall commented Jul 30, 2021

See also: #65

We're going to consider the names "subspacing" and "subnamespacing" the same for this policy. The general strategy handed down from identifiers.org is subnamespacing, e.g.

https://bioregistry.io/registry/kegg
https://bioregistry.io/registry/kegg.orthology
https://bioregistry.io/registry/kegg.FOO

I don't love this as it means there are two possible CURIEs for any KEGG entity, e.g.

https://bioregistry.io/kegg:K00001
https://bioregistry.io/kegg.orthology:K00001
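To make the ambiguity concrete, here is a minimal sketch of how a resolver could stay liberal in what it accepts while still emitting one canonical CURIE. The `SUBSPACE_PATTERNS` table and the `^K\d+$` pattern are assumptions made up for illustration; they are not taken from the registry.

```python
import re

# Hypothetical local-identifier patterns per subnamespace.
# These are illustrative assumptions, not registry data.
SUBSPACE_PATTERNS = {
    "kegg.orthology": re.compile(r"^K\d+$"),
}


def canonicalize(curie):
    """Rewrite parent:ID to subnamespace:ID when exactly one subnamespace matches.

    CURIEs that already use a subnamespace, or that match no pattern
    (or more than one), are returned unchanged.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    matches = [
        sub
        for sub, pattern in SUBSPACE_PATTERNS.items()
        if sub.startswith(prefix + ".") and pattern.match(local_id)
    ]
    if len(matches) == 1:
        return f"{matches[0]}:{local_id}"
    return curie
```

With this sketch, both `kegg:K00001` and `kegg.orthology:K00001` come out as `kegg.orthology:K00001`, while an identifier that matches no known pattern is left as written.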

As you know I am in favor of The One True Way to write a CURIE. But I appreciate it is handy to be liberal in what is resolved. I might suggest that the parent be annotated in such a way that explicitly indicates that this is a "parent" or whatever nomenclature we use.

E.g. "You are on the page for KEGG. This is a collection of namespaces. Consider using one of the namespaces below: kegg.orthology, ..."
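A notice like this could be derived from the prefix list itself. The following is a rough sketch, assuming subnamespaces are always written as `parent.child`; the `PREFIXES` sample and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample of registry prefixes, taken from examples in this issue.
PREFIXES = ["kegg", "kegg.orthology", "rgd", "rgd.qtl", "rgd.strain", "wormbase"]


def group_subnamespaces(prefixes):
    """Map each parent prefix to the sorted list of its period-delimited subnamespaces."""
    children = defaultdict(list)
    for prefix in prefixes:
        parent, sep, _ = prefix.partition(".")
        if sep:  # only prefixes containing a period count as subnamespaces
            children[parent].append(prefix)
    return {parent: sorted(subs) for parent, subs in children.items()}


def parent_notice(prefix, prefixes, name=None):
    """Return the notice text for a parent prefix, or None if it has no subnamespaces."""
    subs = group_subnamespaces(prefixes).get(prefix)
    if not subs:
        return None
    label = name or prefix
    return (
        f"You are on the page for {label}. This is a collection of namespaces. "
        f"Consider using one of the namespaces below: {', '.join(subs)}"
    )
```

Calling `parent_notice("kegg", PREFIXES, name="KEGG")` yields the message above with `kegg.orthology` listed, and `parent_notice("wormbase", PREFIXES)` returns `None` because no `wormbase.*` prefix is in the sample.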

However, I am not sure the inherited namespaces were done consistently. E.g. we have:

http://bioregistry.io/registry/wormpep
http://bioregistry.io/registry/wormbase

WormBase has many datatypes. AFAIK there is no reason to call out their proteins over, say, their genes or their alleles and give them their own entry. Some of these may be historic decisions based on misunderstandings about what kinds of objects different databases can resolve.

For RGD we have rgd, rgd.qtl, rgd.strain. Here the parent is more of a bucket than a parent: it is the prefix for everything that is not a QTL or a strain. (Again, RGD has many datatypes, and it's not clear why QTLs and strains get their own subnamespaces while everything else is rolled up into the parent.)

I suggest the following guidelines (v drafty)

  • most resources should consider having a single namespace and rolling up everything to that namespace. They should take care of resolving everything within that namespace
  • "bigger" entities such as NCBI, KEGG, etc could be seen as collections of distinct databases justifying subnamespacing
  • where subnamespacing does happen, we encourage the parent namespace to be a collector and the set of subnamespaces to be JEPD (jointly exhaustive and pairwise disjoint)
  • metadata about the uber-database can go in the parent entity, and the UI can be smart enough to replicate some of that in pages for the subnamespaces
  • care should be taken to assign the correct "parent". E.g., if a database is NCBI-specific, then ncbi.foo, but if it's an ID that is jointly shared across INSDC, then insdc.foo
  • the period is the default divider, but there may be legacy cases where this is omitted, e.g. ncbigene, ncbitaxon
  • don't squat by making your namespace a core biological category. If you have a gene database, then don't try and squat gene (unless this is the one true effort to unite all genes, oh that would be wonderful).
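
Two of the guidelines above (the period divider with legacy exceptions, and the JEPD property) can be sketched as small checks. This is only an illustration: the `LEGACY_SPLITS` table, the way the legacy prefixes are split into parent and child, and both function names are assumptions, not an existing API.

```python
from itertools import combinations

# Legacy prefixes that omit the period divider (examples from the list above).
# The parent/child split shown here is an assumption for illustration.
LEGACY_SPLITS = {
    "ncbigene": ("ncbi", "gene"),
    "ncbitaxon": ("ncbi", "taxon"),
}


def split_prefix(prefix):
    """Return (parent, subnamespace); subnamespace is None for a plain prefix."""
    if prefix in LEGACY_SPLITS:
        return LEGACY_SPLITS[prefix]
    parent, sep, child = prefix.partition(".")
    return (parent, child) if sep else (prefix, None)


def is_jepd(universe, parts):
    """True if the identifier sets in `parts` cover `universe` exactly and never overlap."""
    sets = [set(ids) for ids in parts.values()]
    exhaustive = set().union(*sets) == set(universe)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    return exhaustive and disjoint
```

Here `universe` stands for all identifiers the parent is expected to cover and `parts` maps each subnamespace to the identifiers it claims. A bucket-style parent like the RGD case would fail the exhaustiveness check unless the leftover identifiers were given a subnamespace of their own.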
@cthoyt cthoyt added the documentation Improvements or additions to documentation label Jul 30, 2021
@cthoyt cthoyt changed the title Document policy for subnamespacing Policy for subnamespacing Sep 21, 2021
@cthoyt cthoyt added the Policy label Sep 21, 2021
@cthoyt cthoyt changed the title Policy for subnamespacing Subnamespacing Policy Sep 21, 2021
@cthoyt cthoyt changed the title Subnamespacing Policy Subspacing Policy Sep 21, 2021