What are GC_IDs? #47

Open
cthoyt opened this issue Dec 8, 2020 · 5 comments

@cthoyt
Contributor

cthoyt commented Dec 8, 2020

Most terms have an xref to a namespace with a prefix GC_ID. Is anyone familiar with what that is or what it abbreviates?

cthoyt changed the title from "What are GC_IDs" to "What are GC_IDs?" on Dec 8, 2020
@jamesaoverton
Collaborator

I have a partial answer. The ncbitaxon.owl file is a direct translation of the taxdmp.zip file available here: https://ftp.ncbi.nih.gov/pub/taxonomy/. That directory contains a taxdmp_readme.txt that explains the various fields. "GC" is their abbreviation for "genetic code", and it points to a gencode.dmp file that we do not translate. Official NCBI Taxonomy pages include a "Genetic code" field with a link, e.g. https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606&lvl=3&lin=f&keep=1&srchmode=1&unlock. That's as much as I know.
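
To make the source of the GC_ID values concrete, here is a minimal sketch of reading the genetic code fields out of one nodes.dmp row. It assumes the 13-column layout and the `\t|\t` delimiter described in taxdmp_readme.txt; the field names, the `parse_node_line` helper, and the sample row are illustrative and were not copied from a real dump, so check them against the readme shipped with your download.

```python
# Column order as described for nodes.dmp in taxdmp_readme.txt (assumed here).
NODE_FIELDS = [
    "tax_id", "parent_tax_id", "rank", "embl_code", "division_id",
    "inherited_div_flag", "genetic_code_id", "inherited_gc_flag",
    "mitochondrial_genetic_code_id", "inherited_mgc_flag",
    "genbank_hidden_flag", "hidden_subtree_root_flag", "comments",
]


def parse_node_line(line: str) -> dict:
    """Split one nodes.dmp row into a field-name -> value mapping."""
    body = line.rstrip("\n")
    # Each row is terminated by "\t|"; drop it before splitting.
    if body.endswith("\t|"):
        body = body[:-2]
    # Fields are separated by "\t|\t". Extra trailing columns, if any,
    # are ignored because zip stops at the shorter sequence.
    values = body.split("\t|\t")
    return dict(zip(NODE_FIELDS, values))


# Illustrative row only (hypothetical values, not taken from a real dump).
sample = (
    "9606\t|\t9605\t|\tspecies\t|\tHS\t|\t5\t|\t1\t|\t1\t|\t1\t|"
    "\t2\t|\t1\t|\t1\t|\t0\t|\t\t|\n"
)
node = parse_node_line(sample)
print(node["tax_id"], node["genetic_code_id"], node["mitochondrial_genetic_code_id"])
```

The `genetic_code_id` and `mitochondrial_genetic_code_id` values are what surface as GC_ID xrefs; they are keys into gencode.dmp rather than identifiers for taxa.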

@cthoyt
Contributor Author

cthoyt commented Dec 8, 2020

Thanks @jamesaoverton, that's much appreciated. It's unbelievable how many nomenclatures the NCBI has generated...

@cmungall
Member

FWIW, UMLS doesn't translate this either: https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/NCBI/sourcerepresentation.html

I suggest

  1. Register something like NCBI.gc with identifiers.org / n2t.net
  2. Point this at URLs like https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG2 or https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?chapter=tgencodes#SG2
  3. Have an annotation/xref pointing to this
  4. (stretch) Have some kind of ontological rendering of gencode.dmp (btw, did the file move? I don't see it). E.g.
    • taxon has-part (nuclear genome and has-part some translation system GC_ID)
    • GC_ID a SP:codon, label "ATG", starts-with some (adenine and followed-by thymine and ends-with guanine), encodes chebi:methionine
    • this injects a bunch of blank nodes into the ontology with no real priority use case and would be for the sake of ontological completeness, so YMMV....
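
As a rough illustration of the data such a rendering would have to encode, the sketch below expands two genetic codes into codon-to-amino-acid mappings and lists the codons on which they differ. It assumes the compact NCBI layout of one amino-acid letter per codon with bases ordered T, C, A, G; the two strings are intended to be translation tables 1 (standard) and 2 (vertebrate mitochondrial) and should be verified against the NCBI genetic codes page before reuse. The `codon_table` helper is hypothetical.

```python
from itertools import product

BASES = "TCAG"  # assumed NCBI ordering of bases at each codon position

# One amino-acid letter per codon, "*" marks a stop codon.
AAS = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # vertebrate mito
}


def codon_table(gc_id: int) -> dict:
    """Map each of the 64 codons to its amino-acid letter for one genetic code."""
    # product() varies the last position fastest: TTT, TTC, TTA, TTG, TCT, ...
    codons = ("".join(triple) for triple in product(BASES, repeat=3))
    return dict(zip(codons, AAS[gc_id]))


standard = codon_table(1)
vert_mito = codon_table(2)

# Codons whose meaning changes between the two codes.
differing = sorted(c for c in standard if standard[c] != vert_mito[c])
print(standard["ATG"], differing)
```

Each (genetic code, codon, amino acid) triple here corresponds to one of the blank-node axioms mentioned above, which gives a sense of how many such axioms a full rendering would add per code.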

@cthoyt
Contributor Author

cthoyt commented Oct 8, 2021

FYI: This has been registered in the Bioregistry at http://bioregistry.io/registry/gc
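
With a prefix registered, a GC_ID xref can be rewritten as a CURIE and turned into a link. The following is a small sketch under stated assumptions: it takes the xrefs to have the form `GC_ID:<number>`, uses the generic `https://bioregistry.io/<prefix>:<id>` resolver form, and reuses the `#SG2` anchor pattern from the NCBI URLs suggested earlier in this thread. All three helper functions are hypothetical names.

```python
PREFIX = "gc"  # prefix from the Bioregistry record linked above


def to_curie(xref: str) -> str:
    """Rewrite an xref such as 'GC_ID:2' as a CURIE such as 'gc:2'."""
    source_prefix, _, local_id = xref.partition(":")
    if source_prefix != "GC_ID" or not local_id.isdigit():
        raise ValueError(f"not a GC_ID xref: {xref!r}")
    return f"{PREFIX}:{local_id}"


def bioregistry_link(curie: str) -> str:
    # Assumes the generic Bioregistry resolver URL form.
    return f"https://bioregistry.io/{curie}"


def ncbi_anchor_link(curie: str) -> str:
    # Assumes the "#SG<id>" anchors on the NCBI genetic codes page.
    local_id = curie.split(":", 1)[1]
    return f"https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG{local_id}"


curie = to_curie("GC_ID:2")
print(curie, bioregistry_link(curie), ncbi_anchor_link(curie))
```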

@cthoyt
Contributor Author

cthoyt commented Nov 28, 2024

@cmungall @jamesaoverton I have a working prototype of ontologizing GC in biopragmatics/pyobo#250, but I'm wondering whether an approach more like TAXRANK, where I hand-curate the list and then xref back to the NCBI vocabulary, would be better. I also went a bit off script in the PyOBO PR and added some non-isomorphic content: an external "has genetic code" type definition and a few mid-level classes to categorize nuclear, mitochondrial, and plastid genomes. Given that, maybe it makes sense to go the TAXRANK route and make this a manually curated hub, which could also be the place to add some of the ontologizations Chris mentioned above.
