Principle #12 naming conventions - automated validation #1006

Open
beckyjackson opened this issue Aug 9, 2019 · 9 comments
Labels
attn: Editorial WG - Issues pertinent to editorial activities, such as ontology reviews and principles
attn: Technical WG - Issues pertinent to technical activities, such as maintenance of website, PURLs, and tools
automated validation of principles - Issues for the editorial WG pertinent to the automating the validation of the Principles.
principles - Issues related to Foundry principles

Comments

@beckyjackson
Contributor

beckyjackson commented Aug 9, 2019

FP 12 - Naming Conventions

Automated checks:

  1. All entities must have labels
  2. No entities may share a label
  3. No entity may have more than one label

Mechanism:
ROBOT report already includes checks 1 through 3. We can run report and only look at the results of these three checks. If any of the rules are violated, the check fails.
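
A minimal sketch of that mechanism, assuming the report has been saved as a tab-separated file with a `Rule Name` column and that checks 1 through 3 appear under the rule names `missing_label`, `duplicate_label`, and `multiple_labels`. The column name, the rule names, and the helper functions are assumptions for illustration and should be verified against the actual ROBOT report output.

```python
import csv
import io

# Assumed rule names for checks 1-3; verify against the real report output.
FP12_RULES = {"missing_label", "duplicate_label", "multiple_labels"}


def fp12_violations(report_tsv: str) -> list:
    """Return the report rows that belong to the three FP 12 checks."""
    reader = csv.DictReader(io.StringIO(report_tsv), delimiter="\t")
    return [row for row in reader if row["Rule Name"] in FP12_RULES]


def fp12_passes(report_tsv: str) -> bool:
    """The principle check fails if any of the three rules is violated."""
    return not fp12_violations(report_tsv)


# Hypothetical report content: one FP 12 violation and one unrelated row.
sample = (
    "Level\tRule Name\tSubject\tProperty\tValue\n"
    "ERROR\tduplicate_label\tEX:1\trdfs:label\tinfection\n"
    "WARN\tmissing_definition\tEX:2\tIAO:0000115\t\n"
)
print(fp12_passes(sample))  # False
```

Rows from other rules are ignored, so an ontology with unrelated warnings would still pass this particular check.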

We also may want to look at overlapping labels at some point (entities from separate ontologies that share a label) and determine if these need an 'OBO Foundry unique label', though I'm not sure if that needs to be addressed right now.

@beckyjackson beckyjackson added the attn: Editorial WG and attn: Technical WG labels Aug 9, 2019
@beckyjackson beckyjackson self-assigned this Aug 9, 2019
@nataled
Contributor

nataled commented Sep 24, 2019

From EWG discussion on this:

labels must be unique within ontology, lowercase, no underscores

@beckyjackson
Contributor Author

Checking that labels start with a lowercase character could be something we can add to ROBOT report. I wouldn't say it was an error, though, as there may be exceptions - either warning or even an info message? Underscore checking, as well, although I'm trying to think if there may be exceptions to this. @jamesaoverton - what do you think?
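
As a rough sketch of what such non-error checks might look like, the snippet below reports an uppercase first character as a warning and an underscore as an info message. The function name and the choice of `WARN` and `INFO` levels are assumptions for illustration, not existing ROBOT behaviour.

```python
def label_style_messages(label: str) -> list:
    """Return (level, message) pairs for style issues; never an error."""
    messages = []
    # Slicing keeps this safe for empty labels.
    if label[:1].isupper():
        messages.append(("WARN", f"label starts with an uppercase character: {label!r}"))
    if "_" in label:
        messages.append(("INFO", f"label contains an underscore: {label!r}"))
    return messages


for example in ["Mus musculus", "part_of", "assay"]:
    print(example, label_style_messages(example))
```

Because nothing is reported at error level, legitimate exceptions would show up in the output without failing the check.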

@jamesaoverton
Member

I agree about uniqueness. ROBOT already checks that.

There are lots of old terms that include underscores, especially relations. I'd like to switch them to spaces for consistency, but I worry that changing labels can break things, and I don't know how important that really is.

While lowercase is a good rule of thumb, I can think of so many valid exceptions that I don't see how we can make a worthwhile automated check. Just looking at OBI, we have plenty of terms with labels that include proper names (companies, trademarked devices, 'Bernoulli trial'), taxa ('Mus musculus'), others like "B cell" and "T cell", all of which seem legitimate to me. We also have cases where we use an acronym as part of a label when it's better known than the expanded version, which we do judiciously.

@nataled
Contributor

nataled commented Sep 27, 2019

Underscores for relations are indeed relatively accepted (and actually rather useful), but not for other terms. You're spot on with the lowercase issues, though lowercase is indeed the default casing that should be used (other than the usual exceptions for proper names and very common abbreviations such as 'DNA'). Oh, forgot that CamelCase is also not allowed.

The NCBITaxon exceptions are so ubiquitous that there is probably no need to run this check on it. Then again, no one maintains that ontology so none of the principles actually apply to it.

Perhaps the casing check could be as simple as "XYZ ontology has nn% terms that are uppercase." I would say being close to 100% for NCBITaxon is to be expected, but the number should be relatively small for other ontologies.
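
A minimal sketch of that summary, assuming "uppercase" means the label starts with an uppercase character. The function name, the `EX` ontology name, and the sample labels are hypothetical.

```python
def uppercase_percentage(labels: list) -> float:
    """Percentage of labels whose first character is uppercase."""
    if not labels:
        return 0.0
    upper = sum(1 for label in labels if label[:1].isupper())
    return 100.0 * upper / len(labels)


labels = ["assay", "B cell", "Mus musculus", "material entity"]
pct = uppercase_percentage(labels)
print(f"EX ontology has {pct:.0f}% terms that are uppercase.")
# EX ontology has 50% terms that are uppercase.
```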

One other thing--I hesitate to even mention it--is that we could maintain a list of accepted uppercase labels. I actually do this for PRO; that is, I have a file that lists things that are okay, like Holliday and Golgi, and allow those to 'pass'. I hesitate to mention it because of the maintenance and portability issues that would come with implementing such a mechanism. I suppose a separate file could be created that contains some minimal set, and users could add to it after download, and maybe even suggest additions. ROBOT could look for this file (if it exists) and read its contents.
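
One way such a mechanism could be sketched, assuming a plain-text file with one accepted word per line. The file format and the function names are assumptions for illustration; this is not how the PRO file or ROBOT actually works.

```python
from pathlib import Path


def load_allowlist(path: Path) -> set:
    """Read accepted capitalized words, one per line; empty set if no file."""
    if not path.exists():
        return set()
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def unexpected_uppercase_words(label: str, allowed: set) -> list:
    """Words in the label that start uppercase and are not on the allowlist."""
    return [w for w in label.split() if w[:1].isupper() and w not in allowed]


allowed = {"Holliday", "Golgi"}
print(unexpected_uppercase_words("Holliday junction", allowed))  # []
print(unexpected_uppercase_words("Golgi Apparatus", allowed))    # ['Apparatus']
```

Returning an empty set when the file is missing keeps the file optional, so the check would still run for users who have not created one.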

@cmungall cmungall added the principles label Nov 22, 2019
@cmungall cmungall changed the title from "Principle #12 automated validation" to "Principle #12 naming conventions - automated validation" Nov 22, 2019
@zhengj2007
Contributor

It seems ROBOT checks label uniqueness only for terms with the prefix assigned to a given ontology, not for imported terms. It would be good to check all terms in an ontology, so that ontology developers are warned when entities share a label.

The VEuPathDB ontology made a release on 2019-12-16. During the release process, we found that IDO_0000586 and OBI_1110021 shared the label 'infection' because the imported OBI terms were out of date. The issue was identified by manual review rather than by ROBOT checks.

@jamesaoverton
Member

@zhengj2007 This is a little tricky:

  1. The robot report query for duplicate labels does not filter by prefix -- it includes all labels in the loaded ontology. I suspect that you were running robot report on an editing version of your ontology, without the imports merged. If there's a bug with this, it would be better reported on the ROBOT tracker.

  2. However, the OBO Dashboard tests do filter by prefix, so that the dashboard does not report problems with imports. I think that's the correct behaviour.
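
The difference between the two behaviours can be sketched as follows. This is an illustration only, not the actual ROBOT query or Dashboard code: the function name, the `EX` prefix, and the `EX_0000001` entry are hypothetical, while the two 'infection' IDs are the ones mentioned earlier in this thread.

```python
from collections import defaultdict
from typing import Optional


def duplicate_labels(entities: dict, prefix: Optional[str] = None) -> dict:
    """Map each shared label to the sorted IDs that carry it.

    entities maps an ID such as 'OBI_1110021' to its label. With a prefix,
    only IDs in that namespace are considered, so imported terms are ignored.
    """
    by_label = defaultdict(list)
    for entity_id, label in entities.items():
        if prefix is None or entity_id.startswith(prefix + "_"):
            by_label[label].append(entity_id)
    return {label: sorted(ids) for label, ids in by_label.items() if len(ids) > 1}


entities = {
    "IDO_0000586": "infection",
    "OBI_1110021": "infection",
    "EX_0000001": "example term",  # hypothetical native term
}
print(duplicate_labels(entities))               # {'infection': ['IDO_0000586', 'OBI_1110021']}
print(duplicate_labels(entities, prefix="EX"))  # {}
```

With the prefix filter, a clash between two imported terms goes unreported, which matches the case described above.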

@zhengj2007
Contributor

@jamesaoverton Thanks for the explanation. I did not run robot report during the release; I downloaded the results from the OBO Dashboard tests, which is why the duplicate was not identified. However, it might be good to include this check in the OBO Dashboard tests as a warning message.

@wdduncan wdduncan added the automated validation of principles label Apr 28, 2020
@nlharris
Contributor

What's the status of this? Is this now covered by the dashboard checks?

@nataled
Contributor

nataled commented Jan 26, 2022

Status unsure, pending review by EWG.

@beckyjackson beckyjackson removed their assignment May 25, 2022
7 participants