Principle #12 naming conventions - automated validation #1006
Comments
From EWG discussion on this: labels must be unique within ontology, lowercase, no underscores |
Checking that labels start with a lowercase character could be something we can add to ROBOT report. I wouldn't say it was an error, though, as there may be exceptions - either warning or even an info message? Underscore checking, as well, although I'm trying to think if there may be exceptions to this. @jamesaoverton - what do you think? |
I agree about uniqueness. ROBOT already checks that. There are lots of old terms that include underscores, especially relations. I'd like to switch them to spaces for consistency, I just worry that changing labels can break things, and I don't know how important that really is. While lowercase is a good rule of thumb, I can think of so many valid exceptions that I don't see how we can make a worthwhile automated check. Just looking at OBI, we have plenty of terms with labels that include proper names (companies, trademarked devices, 'Bernoulli trial'), taxa ('Mus musculus'), others like "B cell" and "T cell", all of which seem legitimate to me. We also have cases where we use an acronym as part of a label when it's better known than the expanded version, which we do judiciously. |
Underscores for relations are indeed relatively accepted (and actually rather useful), but not for other terms. You're spot on with the lowercase issues, though lowercase is indeed the default casing that should be used (other than the usual exceptions for proper names and very common abbreviations such as 'DNA'). Oh, forgot that CamelCase is also not allowed. The NCBITaxon exceptions are so ubiquitous that there is probably no need to run this check on it. Then again, no one maintains that ontology so none of the principles actually apply to it. Perhaps the casing check could be as simple as "XYZ ontology has nn% terms that are uppercase." I would say being close to 100% for NCBITaxon is to be expected, but the number should be relatively small for other ontologies. One other thing--I hesitate to even mention it--is that we could maintain a list of accepted uppercase labels. I actually do this for PRO; that is, I have a file that lists things that are okay, like Holliday and Golgi, and allow those to 'pass'. I hesitate to mention it because of the maintenance and portability issues that would come with implementing such a mechanism. I suppose a separate file could be created that contains some minimal set, and users could add to it after download, and maybe even suggest additions. ROBOT could look for this file (if it exists) and read its contents. |
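The percentage-style casing check and the optional exceptions file suggested above could be sketched roughly as follows. This is only an illustration, not how ROBOT implements anything: the function names, the one-entry-per-line file format, and the rule of matching the allowlist against a label's first word are all assumptions made for this example.

```python
from pathlib import Path


def load_allowlist(path):
    """Read accepted capitalized words, one per line (hypothetical file format).

    Returns an empty set when the file does not exist, so the check still runs.
    """
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {ln.strip() for ln in lines if ln.strip() and not ln.startswith("#")}


def uppercase_report(labels, allowlist=frozenset()):
    """Return (percent flagged, flagged labels).

    A label is flagged when it starts with an uppercase character and its
    first word is not in the allowlist (an assumed matching rule).
    """
    flagged = []
    for label in labels:
        words = label.split()
        if not words:
            continue
        if label[0].isupper() and words[0] not in allowlist:
            flagged.append(label)
    percent = 100.0 * len(flagged) / len(labels) if labels else 0.0
    return percent, flagged


# Illustrative labels only; "Golgi", "Holliday" and "B cell" come from the thread.
labels = ["Golgi apparatus", "Holliday junction", "infection", "Assay", "B cell"]
allow = {"Golgi", "Holliday", "B"}
percent, flagged = uppercase_report(labels, allow)
print(f"{percent:.0f}% of labels flagged: {flagged}")
```

Reporting a percentage rather than failing on each label keeps the check informational, which fits the concern that there are many legitimate exceptions.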
It seems ROBOT checks the uniqueness of labels only among terms with the prefix assigned to a given ontology, not across all imported terms. It would be good to check all terms in an ontology and warn ontology developers when entities share a label. The VEuPathDB ontology made a release on 2019-12-16. During the release process, we found that IDO_0000586 and OBI_1110021 shared the label 'infection' because the imported OBI terms were out of date. The issue was identified by manual review rather than by ROBOT checks. |
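A check over all terms, imports included, might look something like the sketch below. It assumes the (entity ID, label) pairs have already been extracted from the merged ontology by some other means; the function name and the case-insensitive comparison are choices made for this example, not ROBOT behavior. The third entry in the sample data is a made-up placeholder.

```python
from collections import defaultdict


def shared_labels(entity_labels):
    """Group entity IDs by normalized label and keep labels used more than once.

    entity_labels: iterable of (entity_id, label) pairs covering every term
    in the ontology, including imported ones.
    """
    by_label = defaultdict(set)
    for entity_id, label in entity_labels:
        # Assumption: compare labels case-insensitively, ignoring outer whitespace.
        by_label[label.strip().lower()].add(entity_id)
    return {lbl: sorted(ids) for lbl, ids in by_label.items() if len(ids) > 1}


pairs = [
    ("IDO_0000586", "infection"),
    ("OBI_1110021", "infection"),
    ("EX_0000001", "example term"),  # hypothetical placeholder entry
]
duplicates = shared_labels(pairs)
for label, ids in duplicates.items():
    print(f"WARNING: label '{label}' is shared by {', '.join(ids)}")
```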
@zhengj2007 This is a little tricky:
|
@jamesaoverton Thanks for the explanation. I did not run the ROBOT report during the release; I downloaded the results from the OBO Dashboard tests, which is why the duplicate was not identified. However, it might be good to include this check in the OBO Dashboard tests and have it raise a warning. |
What's the status of this? Is this now covered by the dashboard checks? |
Status unsure, pending review by EWG. |
FP 12 - Naming Conventions
Automated checks:
Mechanism:
ROBOT report already includes checks 1 through 3. We can run `report` and only look at the results of these three checks. If any of the rules are violated, the check fails.
We also may want to look at overlapping labels at some point (entities from separate ontologies that share a label) and determine if these need an 'OBO Foundry unique label', though I'm not sure if that needs to be addressed right now.
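As a rough sketch of the mechanism described here, the snippet below filters a report table down to a chosen set of rules and fails if any of them has violations. It assumes the report was saved as tab-separated text with `Rule Name` and `Subject` columns; check these against the actual report output. The function name, the sample rows, and the rule names in the sample are illustrative only, and the three rules meant by "checks 1 through 3" would need to be supplied by the caller.

```python
import csv
import io


def violations_for(report_tsv_text, rules_of_interest):
    """Map each rule of interest to the subjects that violate it.

    Assumes a TSV with 'Rule Name' and 'Subject' columns (an assumption
    about the report layout).
    """
    reader = csv.DictReader(io.StringIO(report_tsv_text), delimiter="\t")
    hits = {}
    for row in reader:
        rule = row["Rule Name"]
        if rule in rules_of_interest:
            hits.setdefault(rule, []).append(row["Subject"])
    return hits


# Made-up report content for illustration.
sample = (
    "Level\tRule Name\tSubject\tProperty\tValue\n"
    "ERROR\tduplicate_label\tIDO:0000586\trdfs:label\tinfection\n"
    "ERROR\tduplicate_label\tOBI:1110021\trdfs:label\tinfection\n"
    "WARN\tunrelated_rule\tEX:0000001\trdfs:label\texample\n"
)

hits = violations_for(sample, {"duplicate_label"})
check_passes = not hits  # the check fails if any selected rule is violated
print("PASS" if check_passes else f"FAIL: {hits}")
```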