Add AGRO-AGROVOC mappings #83

Merged
merged 7 commits into from
May 23, 2022

Conversation

cthoyt
Member

@cthoyt cthoyt commented Dec 14, 2021

This PR adds predicted mappings between AGRO labels/synonyms and the English preferred terms from AGROVOC. The SPARQL query could be extended to use other alternative labels if desired.

@KrishnaTO can you do a sanity check - I'm surprised the number of suggestions is quite low

CC @KrishnaTO

The only thing I need to update is the ingestion pipeline for OWL since it doesn't currently accept AGRO's RDF/XML
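
As a rough illustration of the label/synonym matching described in this comment, here is a minimal sketch on toy data. It is not the code from this PR: the identifiers, labels, and the `normalize` and `predict_mappings` helpers are hypothetical, and exact matching after case folding is an assumption about how such predictions could be made.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return " ".join(text.lower().split())


def predict_mappings(source_terms, target_pref_labels):
    """Match any source label/synonym against the target preferred labels.

    source_terms: dict of source ID -> list of labels and synonyms
    target_pref_labels: dict of target ID -> single preferred label
    Returns a sorted list of (source ID, target ID, matched text) tuples.
    """
    # Index the target side by normalized preferred label
    index = defaultdict(set)
    for target_id, label in target_pref_labels.items():
        index[normalize(label)].add(target_id)

    matches = set()
    for source_id, names in source_terms.items():
        for name in names:
            key = normalize(name)
            for target_id in index.get(key, ()):
                matches.add((source_id, target_id, key))
    return sorted(matches)


# Hypothetical toy data, not real AGRO or AGROVOC identifiers
agro = {
    "AGRO:0000001": ["Tillage", "soil tillage"],
    "AGRO:0000002": ["crop rotation"],
    "AGRO:0000003": ["mulching process"],
}
agrovoc = {
    "c_0001": "tillage",
    "c_0002": "Crop  rotation",
    "c_0003": "mulches",
}

predictions = predict_mappings(agro, agrovoc)
```

In this toy case only two of the three source terms find a match, which hints at why the number of suggestions can stay low when the target side contributes only one label per concept.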
@KrishnaTO
Contributor

A quick check shows good matches, wherever labels match or def exists.

This is a great tool! Do you know if the AGROVOC synonyms are currently checked? Not to mention, our ontology also has lots of common labels as synonyms instead of the rdfs:labels.

@cthoyt
Member Author

cthoyt commented Dec 14, 2021

Maybe it wasn't clear from the description, but all of the labels/synonyms from AGRO are considered. Currently, only the preferred labels from AGROVOC are considered. That could be updated if you think it's important.
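
To make the "preferred labels only" versus "preferred plus alternative labels" distinction concrete, here is a hedged sketch of a query builder. The actual SPARQL query used in this PR is not shown in the thread, so the query shape, the `build_label_query` name, and the assumption that the AGROVOC endpoint exposes plain `skos:prefLabel` and `skos:altLabel` triples are all illustrative.

```python
def build_label_query(include_alt_labels: bool = False, language: str = "en") -> str:
    """Build a SPARQL query selecting concept labels in one language.

    Hypothetical helper: skos:prefLabel and skos:altLabel are standard SKOS
    properties, but the query in the PR may look different.
    """
    properties = ["skos:prefLabel"]
    if include_alt_labels:
        properties.append("skos:altLabel")
    # A SPARQL 1.1 property path such as (p1|p2) matches either property
    property_path = "|".join(properties)
    return f"""\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?concept ?label WHERE {{
    ?concept a skos:Concept ;
             ({property_path}) ?label .
    FILTER (lang(?label) = "{language}")
}}"""


pref_only_query = build_label_query()
with_synonyms_query = build_label_query(include_alt_labels=True)
```

Switching `include_alt_labels` on only widens the property path, so the rest of a matching pipeline would not need to change.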

@KrishnaTO
Contributor

Got it. Compared to the previous mappings, some terms have changed in one source or the other.

As well, AGROVOC seems to use "ENTRY TERMS" for synonyms (example), so there may be some mods there.

But so happy with this tool, thanks @cthoyt !

@KrishnaTO
Contributor

@cthoyt How did you bypass the agro parsing error?
I was attempting to run the "scripts/generate_agrovoc_mappings.py" file locally, and can't figure out how you parsed AGRO?

File "/.../.local/lib/python3.8/site-packages/pronto/parsers/rdfxml.py", line 182, in _compact_datatype
    raise ValueError(f"invalid datatype: {iri!r}")
ValueError: invalid datatype: 'http://www.w3.org/2000/01/rdf-schema#anyURI'

Note: I do have the dev versions of pyobo, pystow, and biomappings you updated. Did you mod pronto as well?
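
One observation on the traceback above: the rejected IRI pairs the RDF Schema namespace with the local name `anyURI`, whereas `anyURI` is a datatype defined in the XML Schema namespace. The sketch below only illustrates that namespace mismatch. It is not pronto's implementation, and `split_iri` and `is_xsd_datatype` are hypothetical helpers.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def split_iri(iri: str):
    """Split an IRI into (namespace, local name) at the last '#'."""
    namespace, _, local_name = iri.rpartition("#")
    return namespace + "#", local_name


def is_xsd_datatype(iri: str) -> bool:
    """Return True if the IRI lives in the XML Schema namespace."""
    namespace, local_name = split_iri(iri)
    return namespace == XSD and bool(local_name)


reported = "http://www.w3.org/2000/01/rdf-schema#anyURI"  # IRI from the traceback
expected = XSD + "anyURI"  # where anyURI is actually defined
```

Here `is_xsd_datatype(reported)` is false while `is_xsd_datatype(expected)` is true, which is consistent with a strict RDF/XML reader refusing the value.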

@cthoyt
Member Author

cthoyt commented Dec 16, 2021

@KrishnaTO I just went through PyStow and PyOBO to make sure there's a PyPI release that's sufficient to make this work, so you can just get the new releases with pip uninstall pystow pyobo bioregistry && pip install --upgrade pystow bioregistry pyobo. Let me know if that fixes it. Otherwise, the error you showed is because it's using an RDF/XML reader for AGRO, which might be because it's not finding the recently added OBO link.
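
After reinstalling, a quick way to confirm which releases actually ended up in the environment is to query the installed package metadata. This is a small sketch using only the standard library (Python 3.8+); the `installed_versions` helper is hypothetical, and the thread does not state which minimum versions are required.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(["pystow", "bioregistry", "pyobo"])
for name, installed in versions.items():
    print(f"{name}: {installed or 'not installed'}")
```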

@KrishnaTO
Contributor

It found the right path for AgrO after the mentioned reinstallations.

@KrishnaTO
Contributor

@cthoyt Regarding the sanity check, AgrO has 1190 terms within its namespace (AGRO_), so 162 matches using labels and synonyms (not sure if it searches definitions?) isn't bad.
A lot of terms may not have the correct synonyms in either database, so there has to be additional manual curation work to add those.

The mentioned predictions were curated here with 142/162 positive exact matches. Let me know if you want to merge and create a PR to this branch.
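
For reference, the figures in this comment work out as follows. The counts come from the comment itself; the variable names and the "coverage" wording are just illustrative.

```python
agro_terms = 1190     # terms in the AGRO_ namespace
predicted = 162       # predicted mappings
curated_exact = 142   # predictions curated as positive exact matches

coverage = predicted / agro_terms          # share of AGRO terms with a prediction
exact_share = curated_exact / predicted    # share of predictions confirmed as exact

summary = f"coverage {coverage:.1%}, curated exact {exact_share:.1%}"
print(summary)
```

So roughly one in seven AgrO terms received a prediction, and close to nine in ten of those predictions were confirmed as exact matches during curation.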

@cthoyt cthoyt marked this pull request as ready for review May 23, 2022 08:12
@cthoyt cthoyt merged commit b62c63c into master May 23, 2022
@cthoyt cthoyt deleted the add-agrovoc-mappings branch May 23, 2022 08:13
@cthoyt
Member Author

cthoyt commented May 23, 2022

@KrishnaTO I'm sorry I missed your comment - it's great you triaged these! I will definitely merge them in a separate PR :)
