WFO Plant List - correcting links and IDs #133

Open
rogerhyam opened this issue Dec 5, 2024 · 2 comments

@rogerhyam

rogerhyam commented Dec 5, 2024

I'm not sure how the ingest is done, but it would be good if we could have some changes made to the WFO data in the public instance of GNA.

Currently if I do a search like this:

https://verifier.globalnames.org/?all_matches=on&capitalize=on&ds=196&format=json&names=Rhododendron+ponticum
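The same query can be assembled programmatically. This Python sketch only rebuilds the URL from the parameters in the link above; it sends no request and assumes nothing about the shape of the JSON reply. `verifier_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

VERIFIER_UI = "https://verifier.globalnames.org/"

def verifier_url(name: str, data_source: int = 196) -> str:
    """Rebuild the GNverifier query link using the parameters from the example."""
    params = {
        "all_matches": "on",  # as in the example link
        "capitalize": "on",   # as in the example link
        "ds": data_source,    # 196 is the data-source ID used for WFO in the example
        "format": "json",     # JSON output instead of the HTML page
        "names": name,
    }
    # urlencode uses quote_plus, so the space becomes '+', matching the link
    return VERIFIER_UI + "?" + urlencode(params)

print(verifier_url("Rhododendron ponticum"))
```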

The results have a matchedNameID that is a UUID. We do have our own 10-digit name identifiers, and it would be better if they were used. They are described on this page:

https://list.worldfloraonline.org/

The outlink in the returned data is also broken:

"outlink": "http://www.worldfloraonline.org/taxon/wfo-0000400178-2023-12",

We describe how the stable URIs for names and taxa work on this page:

https://list.worldfloraonline.org/sw_index.php

For example, that one should be https://list.worldfloraonline.org/wfo-0000400178

These URIs handle semantic web content negotiation correctly.
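Content negotiation means the same URI is requested with a different `Accept` header depending on whether a person or a semantic-web client is asking. A minimal sketch that builds (but does not send) such requests; `text/turtle` is an assumed RDF media type, so check sw_index.php for the formats the WFO Plant List actually serves. `wfo_request` is a hypothetical helper name.

```python
from urllib.request import Request

def wfo_request(uri: str, accept: str = "text/turtle") -> Request:
    """Build, but do not send, a GET request for a WFO URI with an Accept header.

    "text/turtle" is an assumed RDF format; a browser typically asks for text/html.
    """
    return Request(uri, headers={"Accept": accept})

uri = "https://list.worldfloraonline.org/wfo-0000400178"
rdf_req = wfo_request(uri)                # machine-readable representation
html_req = wfo_request(uri, "text/html")  # human-readable page
```

Passing either request to `urllib.request.urlopen` would send it; the URI stays the same and only the header changes.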

I would recommend always using the IDs and links to names (ten digits without the name on the end), as they will always redirect to the current placement of that name, and only using the classification-specific 16-digit versions if the taxon concept used is particularly important.
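The recommendation above can be expressed as a small link-building rule. This Python sketch assumes the classification suffix is always `-YYYY-MM` (inferred from the example outlink: 10 + 6 = 16 digits) and that the 16-digit form resolves under the same `list.worldfloraonline.org` base; `wfo_link` is a hypothetical helper. It also accepts the legacy outlink shown earlier.

```python
import re

BASE = "https://list.worldfloraonline.org/"

# "wfo-" + ten-digit name ID, optionally followed by a "-YYYY-MM"
# classification suffix (format inferred from the example outlink).
WFO_ID = re.compile(r"wfo-(\d{10})(?:-(\d{4}-\d{2}))?(?!\d)")

def wfo_link(text: str, concept_matters: bool = False) -> str:
    """Return a WFO Plant List URI for a WFO ID or a legacy outlink.

    Defaults to the ten-digit name URI, which redirects to the name's current
    placement; keeps the 16-digit classification-specific form only on request.
    """
    m = WFO_ID.search(text)
    if m is None:
        raise ValueError(f"no WFO identifier in {text!r}")
    name_id, classification = m.groups()
    if concept_matters and classification:
        return f"{BASE}wfo-{name_id}-{classification}"
    return f"{BASE}wfo-{name_id}"

print(wfo_link("http://www.worldfloraonline.org/taxon/wfo-0000400178-2023-12"))
```

Defaulting to the name URI means a new data release cannot silently break the link, which is the point of the recommendation.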

This is a little confusing because GNA matches names but also takes into account whether they are accepted names or not, so loading new classifications will change that. But linking to names is best.

We will have a new data release on the December solstice, so January might be a good time to make any changes.

Happy to help.

@dimus
Member

dimus commented Jan 23, 2025

Thanks for letting us know, @rogerhyam. Moving this to gnverifier (it was in the gnparser issues).

At its core, GNverifier treats scientific names as plain name-strings, and each one has a UUID v5 generated from the name-string itself, with 'globalnames.org' used to generate the seed. This gives a unified ID for all Data Sources that does not need a lookup. The WFO TaxonID is returned in the RecordID field.
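A minimal Python sketch of the scheme described above. It assumes the 'globalnames.org' seed becomes a UUID v5 namespace derived from the standard DNS namespace; that derivation is an assumption, so compare one result against a verifier response before relying on these values matching matchedNameID. `name_string_id` is a hypothetical helper name.

```python
import uuid

# Assumption: the seed is a v5 namespace built from the DNS namespace
# and the string "globalnames.org".
GN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "globalnames.org")

def name_string_id(name_string: str) -> uuid.UUID:
    """Deterministic ID for a name-string: same input, same UUID, no lookup."""
    return uuid.uuid5(GN_NAMESPACE, name_string)

print(name_string_id("Rhododendron ponticum"))
```

Because the ID is a pure function of the string, any Data Source holding the same name-string gets the same ID, which is also why it cannot carry a source-specific identifier such as the WFO one.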

Thanks for spotting the change in the outlink. I will change it when the WFO data is updated to the December release (should happen pretty soon).

It is a good point about having a connection to the name ID. It would require some thought on how to do it consistently for all Data Sources. To my understanding, names can be considered on lexical, nomenclatural and taxonomic levels. I imagine your names are provided on the nomenclatural level?

@dimus dimus transferred this issue from gnames/gnparser Jan 23, 2025
@dimus
Member

dimus commented Jan 23, 2025

@rogerhyam I made an SQLite exchange file for WFO December 2024. It is going to be the source of GNverifier's WFO data. It would be great if you could find time to look at it and let me know if you spot any problems.

The file is an SQLite database that closely follows the CoLDP format (enums are tables, and for user-friendliness their IDs are capitalized, underscored strings).

http://opendata.globalnames.org/sfga-v0.3/196-wfo-2024-12-21.sqlite.zip

Probably the simplest way to look at it is with https://sqlitebrowser.org/, using something like:

wget http://opendata.globalnames.org/sfga-v0.3/196-wfo-2024-12-21.sqlite.zip && \
unzip 196-wfo-2024-12-21.sqlite.zip && \
sqlitebrowser 196-wfo-2024-12-21.sqlite
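For scripted access instead of DB Browser, a rough Python equivalent of the commands above, using only the standard library. It assumes the archive holds a single `.sqlite` file, as the `unzip` step implies; `download`, `extract_and_open` and `list_tables` are hypothetical helper names, and reading `sqlite_master` avoids guessing the CoLDP-style table names.

```python
import sqlite3
import urllib.request
import zipfile
from pathlib import Path

URL = "http://opendata.globalnames.org/sfga-v0.3/196-wfo-2024-12-21.sqlite.zip"

def download(url: str, dest: Path) -> Path:
    """Fetch the archive unless it is already on disk (the wget step)."""
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest

def extract_and_open(zip_path: Path, workdir: Path) -> sqlite3.Connection:
    """Unzip the archive and open the first .sqlite file it contains, read-only."""
    with zipfile.ZipFile(zip_path) as zf:
        members = [n for n in zf.namelist() if n.endswith(".sqlite")]
        if not members:
            raise FileNotFoundError(f"no .sqlite file in {zip_path}")
        db_path = Path(zf.extract(members[0], workdir))
    # mode=ro: exploring the file cannot modify it
    return sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)

def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Names of all tables, e.g. the data tables and the enum tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Usage would be along the lines of `conn = extract_and_open(download(URL, Path("196-wfo-2024-12-21.sqlite.zip")), Path("."))` followed by `print(list_tables(conn))`.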
