Add prefix [seed.role] #1340

Closed
samseaver opened this issue Jan 6, 2025 · 12 comments · Fixed by #1341
Labels
New (used in combination with prefix, metaprefix, or collection for new entries), Prefix

Comments

@samseaver
Contributor

Prefix

seed.role

Name

The SEED

Homepage

https://theseed.org/

Source Code Repository

https://github.com/orgs/TheSEED

Description

With the growing number of available genomes, the need for an environment to support effective comparative analysis increases. The original SEED Project was started in 2003 by the Fellowship for Interpretation of Genomes (FIG) as a largely unfunded open source effort. Argonne National Laboratory and the University of Chicago joined the project, and now much of the activity occurs at those two institutions (as well as the University of Illinois at Urbana-Champaign, Hope College, San Diego State University, the Burnham Institute and a number of other institutions). The cooperative effort focuses on the development of the comparative genomics environment called the SEED and, more importantly, on the development of curated genomic data.

License

MIT

Publications

doi:10.1093/nar/gki866

Example Local Unique Identifier

Histidinol dehydrogenase (EC 1.1.1.23)

Regular Expression Pattern for Local Unique Identifier

No response

URI Format String

No response

Wikidata Property

No response

Contributor Name

Sam Seaver

Contributor GitHub

samseaver

Contributor ORCiD

0000-0002-7674-5194

Contributor Email

No response

Contact Name

No response

Contact ORCiD

No response

Contact GitHub

No response

Contact Email

No response

Additional Comments

No response

@samseaver samseaver added the New (used in combination with prefix, metaprefix, or collection for new entries) and Prefix labels Jan 6, 2025
@cthoyt
Member

cthoyt commented Jan 6, 2025

Hi @samseaver, there doesn't appear to be enough information in this new prefix request to make a follow-up.

It appears that the example you gave is an EC number. This already has a prefix eccode. More information at https://bioregistry.io/eccode.

It might be helpful to describe the semantic space itself instead of the resource that it comes from. What is a "role" in SEED? What kinds of things appear in this semantic space? Is there something other than enzymes, or is it just a duplicate of EC?

You might also want to take a look at existing SEED prefixes:

  1. https://bioregistry.io/registry/seed
  2. https://bioregistry.io/registry/seed.compound
  3. https://bioregistry.io/registry/seed.reaction

@samseaver
Contributor Author

samseaver commented Jan 7, 2025

Hi, so, this is basically an ontology controlled by the organization known as SEED. There's not much beyond their own publications and web pages describing how they curate it; the paper I linked pretty much describes it. A "subsystem" is a collection of molecular roles, and each molecular role describes the function of one or more proteins in microbes and plants. It is in fact an ontology much like many of the other biological ontologies registered in bioregistry.io, but its parent organization does the curation. It follows that the text of each role, though they don't use identifiers, is whatever they decide it should be, so in essence it can differ from what other ontologies use. The use of an EC number or in the case of transporters, a TC number, etc. links the descriptive text of the molecular role to another ontology of course, but doesn't necessarily mean that the role itself is exactly what you would see elsewhere.

Does this help?

@samseaver
Contributor Author

Hello, is there any update to this?

@bgyori
Contributor

bgyori commented Jan 14, 2025

Hi @samseaver, I think to properly address this, we would want to carefully review all the existing prefixes related to SEED and your contributions in that context, and come up with a comprehensive solution for its representation. Unless someone else volunteers to do this, I think @nalikapalayoor could take this on and dig into the details to understand what is going on and how best to organize all the metadata surrounding SEED.

@nalikapalayoor
Contributor

Hi, yes I am happy to look into this!

@bgyori
Contributor

bgyori commented Jan 17, 2025

Hello, @nalikapalayoor and I have looked into this issue in more detail, here is a summary of our findings and what we think the next steps should be:

We can make changes accordingly, let us know if there is anything else we missed!

@samseaver
Contributor Author

Hello, this is great! I appreciate that you two took the time to look into this, and it's very helpful. We're glad that we're on the same page with the system of dedicated prefixes, and I think that covers it.

Thank you!
Sam

@cthoyt
Member

cthoyt commented Jan 17, 2025

I'm not sure I agree that identifiers like Histidinol dehydrogenase (EC 1.1.1.23) which correspond to an existing semantic space (albeit, in a non-trivial way) are sufficient to warrant a new prefix, which would imply that this is a new semantic space.

The use of an EC number or in the case of transporters, a TC number, etc. links the descriptive text of the molecular role to another ontology of course, but doesn't necessarily mean that the role itself is exactly what you would see elsewhere.

I'm not quite sure I understand this, does it mean that there are other kinds of things that appear as a SEED role? Like from the Transporter Classification Database (which is what I guess you mean by TC)? Can you give some more examples of things that are SEED roles to cover all different varieties?


In situations where the semantic space is reused in a trivial way, e.g., how the record for the SABIO-RK (sabiork.ec) prefix directly reuses the eccode semantic space, we are able to annotate a provides relationship. This is a bit more straightforward, and any downstream tooling can automatically standardize a sabiork.ec CURIE to eccode. We don't typically curate things this way in the Bioregistry, but instead use the provides relationship to help consolidate historic curations, e.g., from Identifiers.org.
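
As a rough illustration of what that standardization could look like in downstream tooling, here is a minimal sketch. The `PROVIDES` table and the `standardize_curie` function are hypothetical names made up for this example and are not the Bioregistry API; only the sabiork.ec to eccode pair comes from the discussion above.

```python
# Hypothetical lookup table: prefix -> prefix whose semantic space it reuses.
# Only the sabiork.ec -> eccode pair is taken from the thread.
PROVIDES = {"sabiork.ec": "eccode"}


def standardize_curie(curie, provides=PROVIDES):
    """Rewrite a CURIE's prefix to the prefix it 'provides', if any."""
    prefix, sep, identifier = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (missing ':'): {curie!r}")
    # Prefixes without a provides annotation are returned unchanged.
    return f"{provides.get(prefix, prefix)}:{identifier}"


print(standardize_curie("sabiork.ec:1.1.1.23"))  # eccode:1.1.1.23
```

This only works because the local identifiers are identical in both spaces; a seed.role string would need an additional transform first.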

However, here, it's not clear if it's fair to say this is a new semantic space, or just an implementation detail of SEED in representing an existing semantic space. Since the job of Bioregistry is to represent semantic spaces, and not projects/tools/databases, this is where my concerns come from.

Therefore, I would like some time to investigate further whether there are any examples of other prefixes that were minted this way. I think it's probably the case that we haven't done this before (intentionally).

Further, I re-read the contribution guidelines and found that I had not included any guidance on what constitutes a semantic space itself. I think that such a section would necessarily need to draw from W3C guidelines/specifications on CURIEs, the paper "Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data", and maybe some other sources. I would like to write up some clear guidance to help with future submissions.


Given that I haven't given very positive feedback after you all put some thought and effort into this, I'll pose three alternatives for consideration:

  1. Curate a seed.role provider in the eccode prefix, so you can see that seed.role corresponds to EC codes. There's a bit of an issue here about the non-trivial transform required to create a URL.
  2. Modify the seed.role record to point to eccode, despite this being uncommon. If it's more important that you have a specific record for seed.role, then this might be an acceptable compromise, but this makes less sense if seed.role is the union of eccode, tcdb, and potentially more.
  3. Consider a more heavyweight solution where we can include sed-like string processing commands that help transform SEED-flavored EC codes into standard ones.
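
To make the third option more concrete, here is a minimal sketch of the kind of string processing that could pull a standard EC code out of a SEED role name. The regular expression and the `extract_ec_codes` helper are assumptions made for illustration (no pattern for SEED roles is given in this thread), and the second input string is a made-up placeholder rather than a real SEED role.

```python
import re

# Assumed shape: "(EC a.b.c.d)", where trailing levels may be "-" for
# partially specified codes. This is a guess, not a published SEED pattern.
EC_IN_ROLE = re.compile(r"\(EC\s+(\d+(?:\.(?:\d+|-)){3})\)")


def extract_ec_codes(role):
    """Return every EC code embedded in a SEED-style role name."""
    return EC_IN_ROLE.findall(role)


print(extract_ec_codes("Histidinol dehydrogenase (EC 1.1.1.23)"))  # ['1.1.1.23']
print(extract_ec_codes("Placeholder role with no EC annotation"))  # []
```

The empty result for the second string shows the limitation: a role with no embedded EC code cannot be mapped to eccode by this kind of transform.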

@bgyori
Contributor

bgyori commented Jan 17, 2025

Only some roles are EC-code related. There are many roles with no corresponding EC-code, e.g.,

So I still maintain that having a dedicated prefix for seed.role is the right conclusion.

@cthoyt
Member

cthoyt commented Jan 17, 2025

Okay, so I don't think any of my suggestions are actionable. Let's just add some example_extras to cover those additional example local unique identifiers, then finish the PR.

@samseaver
Contributor Author

I'm late responding as we had a holiday weekend, but thanks again for the feedback. We appreciate that it isn't a straightforward situation with semantic spaces.

@bgyori
Contributor

bgyori commented Jan 21, 2025

I just pushed example_extras with non-EC-code examples, which I think resolves everything we discussed so far.

bgyori added a commit that referenced this issue Jan 21, 2025
Closes #1340

---------

Co-authored-by: samseaver <[email protected]>
Co-authored-by: Ben Gyori <[email protected]>
4 participants