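We're going to consider the names "subspacing" and "subnamespacing" the same for this policy. The general strategy handed down from identifiers.org is subnamespacing, e.g.
https://bioregistry.io/registry/kegg
https://bioregistry.io/registry/kegg.orthology
https://bioregistry.io/registry/kegg.FOO
I don't love this, as it means there are two possible CURIEs for any KEGG entity, e.g.
https://bioregistry.io/kegg:K00001
https://bioregistry.io/kegg.orthology:K00001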
As you know I am in favor of The One True Way to write a CURIE. But I appreciate it is handy to be liberal in what is resolved. I might suggest that the parent be annotated in such a way that explicitly indicates that this is a "parent" or whatever nomenclature we use.
E.g. "You are on the page for KEGG. This is a collection of namespaces. Consider using one of the namespaces below: kegg.orthology, ..."
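One way to stay liberal in what is resolved while still steering people toward a single CURIE is to treat the parent as a collection and pick the subnamespace whose identifier pattern matches. The sketch below is only an illustration of that idea: the `SUBNAMESPACES` table, its regular expressions, and the `canonicalize` helper are hypothetical and are not taken from the Bioregistry or identifiers.org.

```python
import re

# Hypothetical table: parent prefix -> {subnamespace prefix: local ID pattern}.
# The patterns are illustrative assumptions, not copied from any registry.
SUBNAMESPACES = {
    "kegg": {
        "kegg.orthology": re.compile(r"^K\d+$"),
        "kegg.compound": re.compile(r"^C\d+$"),
    },
}


def canonicalize(curie: str) -> str:
    """Rewrite a parent-prefix CURIE to its subnamespace form.

    The CURIE is rewritten only when exactly one subnamespace pattern
    matches the local identifier; otherwise it is returned unchanged.
    """
    prefix, _, local_id = curie.partition(":")
    children = SUBNAMESPACES.get(prefix)
    if children is None:
        return curie  # not a known parent namespace
    matches = [
        child for child, pattern in children.items() if pattern.match(local_id)
    ]
    if len(matches) == 1:
        return f"{matches[0]}:{local_id}"
    return curie  # ambiguous or unmatched: leave it to the parent page


print(canonicalize("kegg:K00001"))
```

Under these assumed patterns, `kegg:K00001` would be rewritten to `kegg.orthology:K00001`, while a CURIE that already uses a subnamespace passes through untouched.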
However, I am not sure the inherited namespaces were done consistently. E.g. we have:
http://bioregistry.io/registry/wormpep
http://bioregistry.io/registry/wormbase
WormBase has many datatypes. AFAIK there is no reason to call out their proteins over, say, their genes or their alleles and make this its own entry. Some of these may be historic decisions based on misunderstandings about what kinds of objects different databases can resolve.
For RGD we have rgd, rgd.qtl, rgd.strain. Here the parent is more of a bucket than it is a parent. It is the prefix for all things that are not QTLs or strains (again, RGD has many, many datatypes, and it's not clear why QTLs and strains get their own subnamespace while everything else is rolled up into the parent).
I suggest the following guidelines (v drafty):
most resources should consider having a single namespace and rolling up everything to that namespace. They should take care of resolving everything within that namespace
"bigger" entities such as NCBI, KEGG, etc. could be seen as collections of distinct databases, justifying subnamespacing
where subnamespacing does happen, we encourage the parent namespace to be a collector and the set of subnamespaces to be JEPD (jointly exhaustive and pairwise disjoint)
metadata about the uber-database can go in the parent entity, and the UI can be smart enough to replicate some of that in pages for the subnamespaces
care should be taken to assign the correct "parent". E.g. if a database is NCBI-specific, then ncbi.foo, but if it's an ID that is jointly shared across INSDC, then insdc.foo
the period is the default divider, but there may be legacy cases where this is omitted, e.g. ncbigene, ncbitaxon
don't squat by making your namespace a core biological category. If you have a gene database, then don't try and squat gene (unless this is the one true effort to unite all genes, oh that would be wonderful).
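The guidelines above can be sketched as a few small checks. This is a rough illustration only, and every name in it is hypothetical (`LEGACY_PARENTS`, `RESERVED_CATEGORIES`, `split_prefix`, `is_squatting`, `is_jepd`). The reading of `ncbigene` and `ncbitaxon` as `ncbi` plus a child, and the list of reserved categories, are assumptions made for the example.

```python
import re

# Legacy prefixes that omit the period divider, mapped to an assumed
# (parent, child) reading.
LEGACY_PARENTS = {
    "ncbigene": ("ncbi", "gene"),
    "ncbitaxon": ("ncbi", "taxon"),
}

# Illustrative core biological categories that a single resource
# should not claim as its prefix.
RESERVED_CATEGORIES = {"gene", "protein", "disease"}


def split_prefix(prefix):
    """Return (parent, child); child is None when there is no subnamespace."""
    if prefix in LEGACY_PARENTS:
        return LEGACY_PARENTS[prefix]
    parent, sep, child = prefix.partition(".")
    return (parent, child) if sep else (prefix, None)


def is_squatting(prefix):
    """True when the prefix is itself a core biological category."""
    return prefix in RESERVED_CATEGORIES


def is_jepd(patterns, sample_ids):
    """Check a sample of identifiers against subnamespace patterns.

    `patterns` maps subnamespace prefix -> regex string. Each sample ID
    must fully match exactly one pattern: at least one (jointly
    exhaustive) and at most one (pairwise disjoint). This tests only
    the given sample; it does not prove the property in general.
    """
    return all(
        sum(bool(re.fullmatch(p, i)) for p in patterns.values()) == 1
        for i in sample_ids
    )
```

A sample-based check like `is_jepd` would flag the "bucket" situation described for RGD: identifiers that fall into none of the subnamespaces make the set non-exhaustive unless the leftovers get a subnamespace of their own.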
See also: #65