Principle #12 naming conventions - automated validation #1006

beckyjackson · 2019-08-09T15:05:00Z

FP 12 - Naming Conventions

Automated checks:

1. All entities must have labels
2. No entities may share a label
3. No entity may have more than one label

Mechanism:
ROBOT report already includes checks 1 through 3. We can run report and only look at the results of these three checks. If any of the rules are violated, the check fails.
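A minimal sketch of that mechanism, assuming the report was written as a TSV with the columns `Level`, `Rule Name`, `Subject`, `Property`, and `Value`, and that the three checks correspond to the rule names `missing_label`, `duplicate_label`, and `multiple_labels`. Both the column layout and the rule names are assumptions here and should be confirmed against the ROBOT version in use; the helper names are hypothetical.

```python
import csv
import io

# Assumed ROBOT report rule names for the three label checks.
LABEL_RULES = {"missing_label", "duplicate_label", "multiple_labels"}


def label_violations(report_tsv):
    """Return the report rows that belong to one of the three label rules."""
    reader = csv.DictReader(io.StringIO(report_tsv), delimiter="\t")
    return [row for row in reader if row.get("Rule Name") in LABEL_RULES]


def check_passes(report_tsv):
    """The principle check passes only if no label rule was violated."""
    return not label_violations(report_tsv)


# Hypothetical report content, for illustration only.
sample = (
    "Level\tRule Name\tSubject\tProperty\tValue\n"
    "ERROR\tduplicate_label\tEX:1\trdfs:label\tinfection\n"
    "WARN\tmissing_definition\tEX:2\tIAO:0000115\t\n"
)
print(check_passes(sample))
```

Rows from other rules (such as the definition warning in the sample) are ignored, so only the three label checks decide the outcome.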

We also may want to look at overlapping labels at some point (entities from separate ontologies that share a label) and determine if these need an 'OBO Foundry unique label', though I'm not sure if that needs to be addressed right now.


nataled · 2019-09-24T17:49:08Z

From EWG discussion on this:

labels must be unique within ontology, lowercase, no underscores

beckyjackson · 2019-09-27T15:14:51Z

Checking that labels start with a lowercase character could be something we can add to ROBOT report. I wouldn't say it was an error, though, as there may be exceptions - either warning or even an info message? Underscore checking, as well, although I'm trying to think if there may be exceptions to this. @jamesaoverton - what do you think?

jamesaoverton · 2019-09-27T16:14:31Z

I agree about uniqueness. ROBOT already checks that.

There are lots of old terms that include underscores, especially relations. I'd like to switch them to spaces for consistency, but I worry that changing labels can break things, and I don't know how important that really is.

While lowercase is a good rule of thumb, I can think of so many valid exceptions that I don't see how we can make a worthwhile automated check. Just looking at OBI, we have plenty of terms with labels that include proper names (companies, trademarked devices, 'Bernoulli trial'), taxa ('Mus musculus'), others like "B cell" and "T cell", all of which seem legitimate to me. We also have cases where we use an acronym as part of a label when it's better known than the expanded version, which we do judiciously.

nataled · 2019-09-27T16:42:37Z

Underscores for relations are indeed relatively accepted (and actually rather useful), but not for other terms. You're spot on with the lowercase issues, though lowercase is indeed the default casing that should be used (other than the usual exceptions for proper names and very common abbreviations such as 'DNA'). Oh, forgot that CamelCase is also not allowed.
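As a rough illustration of the underscore and CamelCase rules, the sketch below flags labels for review rather than rejecting them. The function names and the CamelCase heuristic (a single token with two or more lowercase letters followed by an uppercase and then a lowercase letter) are assumptions made for this example, not an agreed definition; relation labels would need to be excluded from the underscore check by the caller.

```python
import re

# Heuristic only: lowercase run, then an uppercase letter starting a new "word".
CAMEL_CASE = re.compile(r"[a-z]{2,}[A-Z][a-z]")


def has_underscore(label):
    """True if the label contains an underscore anywhere."""
    return "_" in label


def looks_camel_case(label):
    """True if a space-free label matches the CamelCase heuristic."""
    return " " not in label and bool(CAMEL_CASE.search(label))


for label in ["part_of", "CamelCase", "B cell", "mRNA", "planned process"]:
    print(label, has_underscore(label), looks_camel_case(label))
```

The heuristic deliberately lets through labels such as "mRNA" and "B cell"; anything it does flag would still want a human look.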

The NCBITaxon exceptions are so ubiquitous that there is probably no need to run this check on it. Then again, no one maintains that ontology so none of the principles actually apply to it.

Perhaps the casing check could be as simple as "XYZ ontology has nn% terms that are uppercase." I would say being close to 100% for NCBITaxon is to be expected, but the number should be relatively small for other ontologies.
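One way such a summary could be computed, sketched under the assumption that "uppercase" means the label's first character is an uppercase letter. The function name and the sample labels are illustrative only.

```python
def uppercase_percentage(labels):
    """Percentage of labels whose first character is an uppercase letter."""
    labels = list(labels)
    if not labels:
        return 0.0
    upper = sum(1 for label in labels if label[:1].isupper())
    return 100.0 * upper / len(labels)


# Illustrative labels taken from examples mentioned in this thread.
labels = ["Mus musculus", "B cell", "infection", "planned process"]
print(f"XYZ ontology has {uppercase_percentage(labels):.0f}% terms that are uppercase.")
```

Reporting a percentage instead of per-term errors keeps legitimate exceptions from failing the check outright.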

One other thing--I hesitate to even mention it--is that we could maintain a list of accepted uppercase labels. I actually do this for PRO; that is, I have a file that lists things that are okay, like Holliday and Golgi, and allow those to 'pass'. I hesitate to mention it because of the maintenance and portability issues that would come with implementing such a mechanism. I suppose a separate file could be created that contains some minimal set, and users could add to it after download, and maybe even suggest additions. ROBOT could look for this file (if it exists) and read its contents.
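The optional-file idea could look roughly like this. It is a sketch, not an existing ROBOT feature: the file format (one accepted capitalized word per line), the function names, and the rule that only the first word of a label is checked are all assumptions for illustration.

```python
from pathlib import Path


def load_allowlist(path):
    """Read accepted capitalized words, one per line; empty set if no file."""
    p = Path(path)
    if not p.is_file():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def label_passes(label, allowlist):
    """Pass if the first word is not capitalized, or is on the allowlist."""
    words = label.split()
    if not words:
        return False
    first = words[0]
    return not first[0].isupper() or first in allowlist


# In practice the set would come from load_allowlist("some-file.txt").
allowed = {"Holliday", "Golgi"}
print(label_passes("Golgi apparatus", allowed))
print(label_passes("Assay", allowed))
```

Because a missing file simply yields an empty allowlist, the check behaves the same as a plain lowercase check for users who never create one.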

zhengj2007 · 2020-01-31T21:04:05Z

It seems ROBOT checks label uniqueness only for terms with the prefix assigned to a given ontology, not for all imported terms. It would be good to check all terms in an ontology so that developers are warned when entities share a label.

The VEuPathDB ontology made a release on 2019-12-16. During the release process, we found that IDO_0000586 and OBI_1110021 shared the label 'infection' because the imported OBI terms were out of date. The issue was identified by manual review rather than by ROBOT checking.

jamesaoverton · 2020-01-31T22:04:06Z

@zhengj2007 This is a little tricky:

The robot report query for duplicate labels does not filter by prefix -- it includes all labels in the loaded ontology. I suspect that you were running robot report on an editing version of your ontology, without the imports merged. If there's a bug with this, it would be better on the ROBOT tracker.
However the OBO Dashboard tests do filter by prefix, so that the dashboard does not report problems with imports. I think that's the correct behaviour.

zhengj2007 · 2020-02-03T15:51:12Z

@jamesaoverton Thanks for the explanation. I did not run robot report during the release; I downloaded the results from the OBO Dashboard tests. That's why the duplicate was not identified. However, it might be good to include this check in the OBO Dashboard tests as a warning message.

nlharris · 2022-01-26T22:33:38Z

What's the status of this? Is this now covered by the dashboard checks?

nataled · 2022-01-26T22:47:03Z

Status unsure, pending review by EWG.

beckyjackson added the attn: Editorial WG and attn: Technical WG labels Aug 9, 2019

beckyjackson self-assigned this Aug 9, 2019

beckyjackson mentioned this issue Oct 22, 2019

First steps toward OBO Dashboard #1069

Merged

jamesaoverton mentioned this issue Nov 19, 2019

OBO Dashboard #1076

Closed

cmungall added the principles label Nov 22, 2019

cmungall changed the title ~~Principle #12 automated validation~~ Principle #12 naming conventions - automated validation Nov 22, 2019

wdduncan added the automated validation of principles label Apr 28, 2020

beckyjackson removed their assignment May 25, 2022
