Add FOODEX2 from AgroPortal (#573)
* Add FOODEX2 from AgroPortal

* Update utils.py

* Create curation-import-external.md

* Finish

* Update curation-import-external.md
cthoyt authored Sep 17, 2022
1 parent 45c8a21 commit 431e460
Showing 8 changed files with 84 additions and 6 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -320,3 +320,5 @@ scratch/
docs/img/*.png
docs/img/*.eps
docs/source/api/
docs/_site/
docs/.jekyll-cache/
47 changes: 47 additions & 0 deletions docs/curation-import-external.md
@@ -0,0 +1,47 @@
---
layout: page
title: Importing External Prefixes
permalink: /curation/import-external
---
While the Bioregistry automatically imports all prefixes from external
registries with similar scope and sufficient minimum metadata and quality
standards (e.g., Identifiers.org), it only partially aligns most external
registries (e.g., BioPortal). This tutorial shows how to import a prefix from
one of the partially aligned registries on an *as-needed* basis. More
specifically, it describes importing the _Food classification and description
system_ ([`FOODEX2`](http://agroportal.lirmm.fr/ontologies/FOODEX2))
ontology from AgroPortal. Pull
request [#573](https://github.com/biopragmatics/bioregistry/pull/573) contains
the relevant diff for the changes described in this tutorial.

1. Identify a prefix of interest in the curation sheet for AgroPortal (or
another partially aligned registry), e.g.,
[here](https://github.com/biopragmatics/bioregistry/blob/main/src/bioregistry/data/external/agroportal/curation.tsv).
2. Pick a prefix for the Bioregistry. This doesn't have to be the same as the
external one, but usually it is. If the external prefix is too short or too
vague, this is a good chance to improve on it. Further, keep in mind that
the Bioregistry requires lowercase prefixes, so the best choice here
is `foodex2`.
3. Open the `bioregistry.json` file and add the prefix as a key for a new
dictionary object.
4. The only thing you need inside the object is `"mappings"`, which itself is a
dictionary object where the key is the metaprefix for the external registry
(in this case `agroportal`) and the value is the external registry's prefix
(in this case `FOODEX2`):

```json
"foodex2": {
  "mappings": {
    "agroportal": "FOODEX2"
  }
}
```
5. Make sure the Bioregistry is installed in editable mode (e.g.,
with `pip install -e .` from the repository root).
6. Run the alignment script for the registry. In this case,
it's `python -m bioregistry.align.bioportal`.
7. Run unit tests with `tox -e py`. This reveals that the alignment doesn't pull
in enough metadata to meet the minimum requirements. In this case, OntoPortal
instances (e.g., AgroPortal and BioPortal) don't provide a well-defined
example local unique identifier (though note they provide an
unstandardized `example_uri` field that might be helpful).
8. Curate an example identifier. In this case, AgroPortal's `example_uri`
(`http://data.food.gov.uk/codes/foodtype/id/A0TMC`) gives enough information
to pull out the example identifier `A0TMC`.
9. Add any additional curations, such as a `uri_format`.
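
Steps 3, 4, and 8 can be sketched in plain Python. This is only an illustration under stated assumptions: `add_stub_entry` and `guess_example` are hypothetical helpers that are not part of the Bioregistry package, the `registry` dictionary stands in for the parsed contents of `bioregistry.json`, and taking the last path segment of the example URI is a heuristic that happens to work for FOODEX2 but needs a manual check for other resources.

```python
import json
from urllib.parse import urlparse


def add_stub_entry(registry: dict, prefix: str, metaprefix: str, external_prefix: str) -> dict:
    """Add a minimal record that only maps a new prefix to an external registry."""
    if prefix != prefix.lower():
        raise ValueError(f"Bioregistry prefixes must be lowercase: {prefix}")
    if prefix in registry:
        raise KeyError(f"prefix already exists: {prefix}")
    registry[prefix] = {"mappings": {metaprefix: external_prefix}}
    return registry[prefix]


def guess_example(example_uri: str) -> str:
    """Guess a local unique identifier as the last path segment of an example URI."""
    return urlparse(example_uri).path.rstrip("/").rsplit("/", 1)[-1]


# Stand-in for the parsed contents of bioregistry.json (assumption: empty here)
registry = {}
stub = add_stub_entry(registry, "foodex2", "agroportal", "FOODEX2")

# Heuristic curation of the example identifier from the example URI
stub["example"] = guess_example("http://data.food.gov.uk/codes/foodtype/id/A0TMC")

print(json.dumps(registry, indent=2, sort_keys=True))
```

In practice the stub is written by hand into `bioregistry.json` and the alignment script fills in the external metadata; the sketch only shows the shape of the data before and after curation.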
2 changes: 2 additions & 0 deletions docs/curation.md
@@ -26,6 +26,8 @@ The following registries have xrefs that need curating:
</tbody>
</table>

See a more detailed tutorial [here](/curation/import-external).

## Adding a Wikidata Database Corresponding to Each Resource

<a id="wikidata"></a>
10 changes: 6 additions & 4 deletions src/bioregistry/align/utils.py
@@ -52,15 +52,17 @@ class Aligner(ABC):

     normalize_invmap: ClassVar[bool] = False

-    def __init__(self):
+    def __init__(self, force_download: Optional[bool] = None):
         """Instantiate the aligner."""
         if self.key not in read_metaregistry():
             raise TypeError(f"invalid metaprefix for aligner: {self.key}")

         self.manager = Manager()

-        kwargs = self.getter_kwargs or {}
+        kwargs = dict(self.getter_kwargs or {})
         kwargs.setdefault("force_download", True)
+        if force_download is not None:
+            kwargs["force_download"] = force_download
         self.external_registry = self.__class__.getter(**kwargs)
         self.skip_external = self.get_skip()

@@ -158,9 +160,9 @@ def write_registry(self) -> None:
         self.manager.write_registry()

     @classmethod
-    def align(cls, dry: bool = False, show: bool = False):
+    def align(cls, dry: bool = False, show: bool = False, force_download: Optional[bool] = None):
         """Align and output the curation sheet."""
-        instance = cls()
+        instance = cls(force_download=force_download)
         if not dry:
             instance.write_registry()
         if show:
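
The keyword handling in this hunk follows a three-state pattern: `None` means "use the default", while an explicit `True` or `False` overrides it. The following self-contained sketch isolates that pattern. `fake_getter`, `SketchAligner`, and `CachedAligner` are hypothetical stand-ins introduced for illustration and are not part of the Bioregistry code base.

```python
from typing import Any, ClassVar, Dict, Mapping, Optional


def fake_getter(force_download: bool = False, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical getter that just reports the keyword arguments it received."""
    return {"force_download": force_download, **kwargs}


class SketchAligner:
    """Hypothetical stand-in for an aligner class."""

    getter_kwargs: ClassVar[Optional[Mapping[str, Any]]] = None

    def __init__(self, force_download: Optional[bool] = None) -> None:
        # Copy so the class-level mapping is never mutated by an instance
        kwargs = dict(self.getter_kwargs or {})
        # Default to refreshing the external data...
        kwargs.setdefault("force_download", True)
        # ...unless the caller explicitly passed True or False
        if force_download is not None:
            kwargs["force_download"] = force_download
        self.external_registry = fake_getter(**kwargs)


class CachedAligner(SketchAligner):
    """Hypothetical subclass whose class-level default disables downloading."""

    getter_kwargs = {"force_download": False}


default_flag = SketchAligner().external_registry["force_download"]
explicit_flag = SketchAligner(force_download=False).external_registry["force_download"]
subclass_flag = CachedAligner().external_registry["force_download"]
override_flag = CachedAligner(force_download=True).external_registry["force_download"]
```

The `dict(...)` copy matters because `setdefault` and item assignment would otherwise write into a mapping shared by every instance of the class.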
20 changes: 20 additions & 0 deletions src/bioregistry/data/bioregistry.json
@@ -26356,6 +26356,26 @@
"pattern": "^FOOD\\d+$",
"uri_format": "https://foodb.ca/foods/$1"
},
"foodex2": {
"agroportal": {
"contact": {
"email": "[email protected]",
"name": "European Food Safety Authority"
},
"description": "FoodEx2 is a comprehensive food classification and description system aimed at covering the need to describe food in data collections across different food safety domains.",
"example_uri": "http://data.food.gov.uk/codes/foodtype/id/A0TMC",
"homepage": "http://www.efsa.europa.eu/",
"name": "Food classification and description system",
"prefix": "FOODEX2",
"publication": "https://doi.org/10.2903/sp.efsa.2015.EN-804",
"version": "2"
},
"example": "A0TMC",
"mappings": {
"agroportal": "FOODEX2"
},
"uri_format": "http://data.food.gov.uk/codes/foodtype/id/$1"
},
"foodon": {
"aberowl": {
"description": "A broadly scoped ontology representing entities which bear a “food role”. It encompasses materials in natural ecosystems and agriculture that are consumed by humans and domesticated animals. This includes any generic (unbranded) raw or processed food material found in processing plants, markets, stores or food distribution points. FoodOn also imports nutritional component and dietary pattern terms from other OBO Foundry ontologies to support interoperability in diet and nutrition research",
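
One way to sanity-check the curated `example` and `uri_format` in the new `foodex2` record is to confirm that together they reproduce the `example_uri` imported from AgroPortal. The sketch below assumes `$1` is the only placeholder in the format string; `expand` is a hypothetical helper, not a Bioregistry function, and the dictionary copies only the three relevant fields from the record above.

```python
def expand(uri_format: str, identifier: str) -> str:
    """Fill the ``$1`` placeholder of a URI format string with a local identifier."""
    if "$1" not in uri_format:
        raise ValueError(f"missing $1 placeholder: {uri_format}")
    return uri_format.replace("$1", identifier)


# Relevant fields copied from the foodex2 record in this diff
foodex2 = {
    "agroportal": {"example_uri": "http://data.food.gov.uk/codes/foodtype/id/A0TMC"},
    "example": "A0TMC",
    "uri_format": "http://data.food.gov.uk/codes/foodtype/id/$1",
}

expanded = expand(foodex2["uri_format"], foodex2["example"])
is_consistent = expanded == foodex2["agroportal"]["example_uri"]
```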
1 change: 0 additions & 1 deletion src/bioregistry/data/external/agroportal/curation.tsv
@@ -35,7 +35,6 @@ DSW Darwin-SW https://github.com/darwin-sw/ Version 0.4 to 1.0 fixed reversed ds
DURUM_WHEAT Durum Wheat http://lovinra.inra.fr/ The durum wheat ontology (DURUM_WHEAT) is dedicated to the sustainability analysis of the durum wheat chain.Current data available on this ontology concern durum wheat quality criteria criteria used in different countries (Moisture content rate, chemical content, etc.).
E-PHY Catalogue des produits phytopharmaceutiques et de leurs usages, des matières fertilisantes et des supports de culture autorisés en France Le catalogue des produits phytopharmaceutiques de l’ANSES en une base de connaissances OWL dont le modèle ontologique est celui de la base de données E-Phy (https://ephy.anses.fr/) et les instances sont les produits qui y sont listés, alignées avec d’autres ressources sémantiques tel que le thésaurus French Crop Usage pour les cultures et l’ontologie CHEBI pour les familles chimiques. Cependant considérant que nos organismes ne sont pas des autorités pour ce catalogue, notre modèle de données est volontairement “bridé” pour être complètement rétro-compatible avec la base d’origine de façon à facilement mettre à jour notre ontologie à partir de nouveaux exports de l’ANSES.
FLAIR Wine descriptors ontology https://github.com/XavierDelpuech/flair Ce fichier au format RDF contient les éléments d'une modélisation de l'analyse sensorielle des vins. Cette ontologie a été construite lors du projet CASDAR VITISDATACROP (France, 2021-2023). Contacter Xavier DELPUECH ([email protected]) pour toute modification dans ce fichier., This file in RDF format contains the elements of a wine sensory tasting model. This ontology was built during the CASDAR VITISDATACROP project (France, 2021-2023). Contact Xavier DELPUECH ([email protected]) for any modification in this file.
FOODEX2 Food classification and description system http://www.efsa.europa.eu/ FoodEx2 is a comprehensive food classification and description system aimed at covering the need to describe food in data collections across different food safety domains.
FOODIE FOODIE core ontology http://foodie-cloud.github.io/model/FOODIE.html This release 4.6.3 added missing crop property; release 4.6.2 fixes model regarding DoseUnit that cannot be a codelist, and is defined as datatype, FOODIE ontology has been generated from FOODIE application schema (UML model), Revision 4.6.1, and translated into an ontology according to ISO/DIS 19150-2 using ShapeChange plus several pre and post processing changes. This is a revision of v4.3.2, manually updated to v4.6.1.
GACS Global Agricultural Concept Scheme http://agrisemantics.org/gacs/ The Global Agricultural Concept Scheme (GACS) is a hub for concepts related to agriculture, in multiple languages, for use in Linked Data. The idea for GACS emerged out of discussions at the World Congress of IAALD, the International Association of Agricultural Information Specialists, in July 2013. The Food and Agricultural Organization of the United Nations (FAO), CAB International (CABI), and the National Agricultural Library of the USA (NAL) agreed in October 2013 to explore the feasibility of developing a shared concept scheme by integrating their three thesauri: the AGROVOC Concept Scheme, the CAB Thesaurus (CABT), and NAL Thesaurus (NALT). In the GACS vision, the integration of these three thesauri is but the first step towards the realization of a hub that links to and from the concept schemes beyond the initial three, and in multiple language areas.
GAO Grapevine anatomy ontology Describes a list of organs classified by biological function, Décrit une liste d'organes classé par fonction biologique
1 change: 0 additions & 1 deletion src/bioregistry/data/external/bioportal/curation.tsv
@@ -1,5 +1,4 @@
prefix name homepage description
ABA-AMB Allen Brain Atlas (ABA) Adult Mouse Brain Ontology http://www.brain-map.org Allen Brain Atlas P56 Mouse Ontology
ABD Anthology of Biosurveillance Diseases http://brd.bsvgateway.org/disease Our disease ontology provides information on infectious diseases, disease synonyms, transmission pathways, disease agents, affected populations and disease properties. Diseases have been grouped into syndromic disease categories, such that programmers can look through relevant categories, as well as at specific diseases. Organisms, linked to both agents and populations, are structured hierarchically, to provide multiple levels of organism resolution. In addition, both disease transmission and relevant disease properties are available to search. Disease properties include tags like 'notifiable diseases' and 'economic importance' to flag particular disease characteristics that may be of interest, but are not captured elsewhere.
AC Mirzet
ACESO Adverse Childhood Experiences Ontology An ontology to help describe data about Adverse Childhood Experiences.
7 changes: 7 additions & 0 deletions src/bioregistry/schema/struct.py
@@ -699,6 +699,8 @@ def get_name(self) -> Optional[str]:
"go",
"ncbi",
"bioportal",
"agroportal",
"ecoportal",
"miriam",
"n2t",
"cellosaurus",
@@ -724,6 +726,8 @@ def get_description(self, use_markdown: bool = False) -> Optional[str]:
"fairsharing",
"aberowl",
"bioportal",
"agroportal",
"ecoportal",
"cropoct",
),
)
@@ -850,6 +854,9 @@ def get_homepage(self) -> Optional[str]:
"ncbi",
"cellosaurus",
"cropoct",
"bioportal",
"agroportal",
"ecoportal",
),
)
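
The three hunks above add `agroportal` and `ecoportal` (and, for homepages, `bioportal`) to the tuples of metaprefixes consulted by `get_name`, `get_description`, and `get_homepage`. The bodies of those methods are not shown in this diff, so the following is only a sketch of the presumed behavior: return a curated top-level value if there is one, otherwise the first value found in the listed external records. `first_available` is a hypothetical helper, and the two tuples are shortened to the entries visible in the `get_homepage` hunk.

```python
from typing import Any, Mapping, Optional, Sequence


def first_available(
    record: Mapping[str, Any], key: str, metaprefixes: Sequence[str]
) -> Optional[str]:
    """Return the curated value if present, else the first one found in the listed external records."""
    curated = record.get(key)
    if curated is not None:
        return curated
    for metaprefix in metaprefixes:
        value = (record.get(metaprefix) or {}).get(key)
        if value is not None:
            return value
    return None


# Trimmed copy of the foodex2 record: the homepage only exists in the AgroPortal data
foodex2_record = {
    "agroportal": {"homepage": "http://www.efsa.europa.eu/"},
    "example": "A0TMC",
    "mappings": {"agroportal": "FOODEX2"},
}

before = ("ncbi", "cellosaurus", "cropoct")
after = ("ncbi", "cellosaurus", "cropoct", "bioportal", "agroportal", "ecoportal")

homepage_before = first_available(foodex2_record, "homepage", before)
homepage_after = first_available(foodex2_record, "homepage", after)
```

Under these assumptions, a record such as `foodex2` has no homepage until `agroportal` appears in the tuple, which would explain why the tuples had to be extended for the imported metadata to satisfy the minimum requirements.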
