Add WoRMS names to Arctos #1936

Closed
Jegelewicz opened this issue Feb 20, 2019 · 31 comments
Labels
Function-Taxonomy/Identification Priority-High (Needed for work) High because this is causing a delay in important collection work..

Comments

@Jegelewicz
Member

Jegelewicz commented Feb 20, 2019

@ArctosDB/taxonomy requests the following to help fill in taxonomy gaps in Arctos taxonomy

  1. Add any names in WoRMS, but not in Arctos along with their WoRMS classifications and aphia IDs to Arctos

  2. For any names in Arctos with NO associated classification that have a classification in WoRMS, add the WoRMS classification to Arctos along with the aphia ID

  3. For any names not used by any collection in Arctos that have a classification in Arctos, convert the Arctos classification to WoRMS and add aphia ID
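As a rough illustration of how the three requests above partition names, here is a minimal Python sketch. The `Name` record and all of its fields are hypothetical stand-ins for the conditions in the list, not Arctos column names, and it assumes request 3 only applies when WoRMS actually has a classification for the name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Name:
    # Hypothetical fields; they only model the conditions in the list above.
    scientific_name: str
    in_arctos: bool
    arctos_classified: bool
    used_by_collection: bool
    worms_classified: bool


def request_for(name: Name) -> Optional[int]:
    """Return which request (1, 2 or 3) covers a name, or None if none does."""
    if not name.worms_classified:
        return None  # nothing to bring across from WoRMS
    if not name.in_arctos:
        return 1  # in WoRMS but not Arctos: add name, classification, AphiaID
    if not name.arctos_classified:
        return 2  # in Arctos with no classification: add the WoRMS one
    if not name.used_by_collection:
        return 3  # classified but unused: convert the classification to WoRMS
    return None  # used and already classified: left alone
```

Names that are both used and already classified fall through to `None`, which matches the intent that existing, in-use classifications are not touched.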

@Jegelewicz Jegelewicz added Priority-High (Needed for work) High because this is causing a delay in important collection work.. Function-Taxonomy/Identification labels Feb 20, 2019
@dustymc
Contributor

dustymc commented Feb 21, 2019

What problem are we trying to solve?

(1) seems like it's trying to recreate the problem we solved by moving DMNS:Inv to WoRMS.

(2) seems destined for confusion. WoRMS classifications are periodically updated by AphiaID; Arctos is not, so if we do that it's likely to lead to a disassociation between the name and the ID (which doesn't do anything there).

(3) requires magic or I'm missing something really important - where would I get an AphiaID?

@Jegelewicz
Member Author

Jegelewicz commented Feb 21, 2019

Attempting to fill in missing names and classifications in Arctos (both Anna and I have had to clone stuff in from WoRMS via Arctos), so if we go ahead and populate Arctos with missing stuff, that would save us a bit of work.

@Jegelewicz
Member Author

(3) requires magic or I'm missing something really important - where would I get an AphiaID?

Just asking that the aphia ID be brought into Arctos along with the classification.

@campmlc

campmlc commented Feb 21, 2019

This would only be for names that are currently either not in use or that are in use but lack a classification. This would allow Arctos users access to the updated taxonomy for these names as needed without having to create them. This would certainly help MSB:Para and would be a valuable addition as we bring in new collections, including paleo and MSB mollusc collections. Aphia IDs would only be associated with taxa/classifications that were brought in de novo from WoRMS - not with any existing names that currently have classifications in Arctos. And the AphiaIDs could then allow those classifications/names to be updated regularly from WoRMS.

We discussed this yesterday with the Taxonomy Committee and all agreed this would be a big improvement. WoRMS is constantly adding and improving their taxonomy with committees of experts in each relevant taxonomic group. Very few Arctos collections are willing/able to use WoRMS as an exclusive taxonomy source, as Denver Inverts is doing. However, we could all benefit from having updated WoRMS taxon names and classifications available in Arctos taxonomy.

In the case of birds, while WoRMS does have an Aves classification, no Arctos taxa that already have a classification would be affected. If MVZ or others find that the WoRMS update has pulled in taxa that are not in Arctos, they are easily identified by the AphiaID and either modified or deleted. We would also want the classification metadata to reflect the WoRMS source.

@dustymc
Contributor

dustymc commented Feb 21, 2019

not in use

Those will be included in updates via the hierarchical editor.

are in use but lack a classification

There are 1713 of these. I don't think it's a problem to push the WoRMS classification across, after we resolve #1939.

-- used names
create table temp_u_n as select distinct taxon_name_id from identification_taxonomy;
-- names for which there's a local classification
create table temp_hlc as select distinct taxon_name_id from taxon_term where source in ('Arctos','Arctos Plants');
-- drop the stuff that's used and for which we have a local classification
delete from temp_u_n where taxon_name_id in (select taxon_name_id from temp_hlc);
-- worms classifications
create table temp_hlwc as select distinct taxon_name_id from taxon_term where source = 'WoRMS (via Arctos)';
-- these are used, don't have an Arctos classification, do have WoRMS
create table temp_u_w_nl as select taxon_name_id from temp_u_n where taxon_name_id in (select taxon_name_id from temp_hlwc);
-- count the identifications that use those names, by collection
select guid_prefix || ' @ ' || count(*)
from collection, cataloged_item, identification, identification_taxonomy, temp_u_w_nl
where collection.collection_id = cataloged_item.collection_id
  and cataloged_item.collection_object_id = identification.collection_object_id
  and identification.identification_id = identification_taxonomy.identification_id
  and identification_taxonomy.taxon_name_id = temp_u_w_nl.taxon_name_id
group by guid_prefix
order by guid_prefix;
GUID_PREFIX||'@'||COUNT(*)
------------------------------------------------------------------------------------------------------------------------
CHAS:Bird @ 17
CHAS:Ento @ 54
CHAS:Fish @ 1
CHAS:Inv @ 3835
DMNS:Inv @ 536
HWML:Para @ 871
MSB:Fish @ 5
MSB:Para @ 21
MVZ:Bird @ 2
MVZ:Egg @ 1
OWU:Fish @ 3
UAM:Alg @ 38
UAM:ES @ 4
UAM:Fish @ 2
UAM:Inv @ 33
UAMb:Herb @ 19
UCM:Bird @ 1
UCM:Fish @ 9
UCM:Herp @ 2
UMNH:Herp @ 3
UMZM:Bird @ 8
UNM:ES @ 10
UNR:Fish @ 25
USNPC:Para @ 1
UTEP:Herb @ 14
UTEP:Herp @ 4
UTEP:Inv @ 158
UTEP:Teach @ 3
UTEPObs:Herp @ 1
UWYMV:Fish @ 1

30 rows selected.
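
The selection logic in the query above (names that are used, lack an 'Arctos' or 'Arctos Plants' classification, and do have a 'WoRMS (via Arctos)' one) can be tried on toy data. This is only a sketch using Python's sqlite3 in place of Oracle; the two tables keep just the columns the query touches, the rows are invented, and the function name is hypothetical.

```python
import sqlite3


def used_names_with_only_worms(conn: sqlite3.Connection) -> list:
    """Used names that lack a local classification but have a WoRMS one."""
    rows = conn.execute(
        """
        select distinct taxon_name_id
        from identification_taxonomy
        where taxon_name_id not in (
                select taxon_name_id from taxon_term
                where source in ('Arctos', 'Arctos Plants'))
          and taxon_name_id in (
                select taxon_name_id from taxon_term
                where source = 'WoRMS (via Arctos)')
        order by taxon_name_id
        """
    ).fetchall()
    return [row[0] for row in rows]


# Toy data, invented for illustration: names 1, 2 and 3 are used in identifications.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table identification_taxonomy (taxon_name_id integer);
    create table taxon_term (taxon_name_id integer, source text);
    insert into identification_taxonomy values (1), (2), (3), (3);
    insert into taxon_term values
        (1, 'Arctos'),
        (1, 'WoRMS (via Arctos)'),
        (2, 'WoRMS (via Arctos)'),
        (4, 'WoRMS (via Arctos)');
    """
)
print(used_names_with_only_worms(conn))  # [2]
```

Name 1 is excluded because it already has a local classification, name 3 because WoRMS has nothing for it, and name 4 because no identification uses it.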

AphiaIDs

Let's discuss that in #1939

no Arctos taxa that already have a classification would be affected

I think this is only part of the picture. AFAIK MVZ isn't limiting their efforts to things they've already used, they're also getting the things they might use.

@Jegelewicz
Member Author

If MVZ is the sticking point, we need someone from MVZ to contribute to this conversation.

@ccicero @mkoo @atrox10

@dustymc
Contributor

dustymc commented Feb 21, 2019

I think anyone who's ever tried to make classification data consistent is the sticking point. I'm just using MVZ as an example because I know they've put a lot of work into that; they're not necessarily the only ones to have done so.

@Jegelewicz
Member Author

I'm pretty sure that Phyllis, me, and MVZ have been the users of the hierarchical tool. Phyllis doesn't care about Arctos taxonomy anymore, so I think it would be MVZ who would have a problem right now. I can't speak for the future, but if leaving the aphia ID out of the hierarchical tool export will help, I am completely OK with that because it means someone cares enough about a group of taxa to make them all have consistent and complete classifications.

@anna-chinn

CHAS:Inv @ 3835

Woof! If nothing else, I'm in favour of pushing all of WoRMS' molluscan classification to the Arctos source!

@sharpphyl

Need to get a written OK from everyone using anything in Mollusca in WoRMS before moving forward, if Dusty can separate out the Mollusca in Arctos. In the past, I had to check with you, Teresa (aka Elizabeth Walsh?), Andres Lopez and Dustin Perriguey. Dusty can probably confirm if there's anyone new or that I missed in the past.

@Jegelewicz
Member Author

From Taxonomy Committee today:

@dustymc

For any names in Arctos with NO associated classification that have a classification in WoRMS, add the WoRMS classification to Arctos.

@dustymc dustymc added this to the Next Task milestone Mar 20, 2019
@dustymc
Contributor

dustymc commented Apr 2, 2019

For any names in Arctos with NO associated classification that have a classification in WoRMS, add the WoRMS classification to Arctos.

This would be a slightly messier mess, I think - many of the WoRMS classifications are plants-n-such. Should we push them to Arctos Plants instead? Does anyone using the source want them? Scripts are slowly running now - I'll have numbers in a few days, we can discuss then.

Scripts are in taxonomyCrusade.sql

/remind me to check on the scripts in 2 days

@reminders reminders bot added the reminder label Apr 2, 2019
@reminders

reminders bot commented Apr 2, 2019

@dustymc set a reminder for Apr 4th 2019

@sharpphyl

Yes, many of the WoRMS names are Plantae and should go to Arctos Plants and not Arctos. I think that was our intent when we made the request, but committee members using both Arctos Plants and Arctos may feel differently.

@Jegelewicz
Member Author

Yes please, if we can put the Plantae in Arctos Plants, that would be most useful.

@dustymc
Contributor

dustymc commented Apr 3, 2019

Here are the data. I'll need these (summary below) mapped to another Source. I don't really want to spread the outright garbage, not sure what to do about that....

UAM@ARCTOS> select worms_kingdom, count(*) from temp_has_worms_no_arctos group by worms_kingdom;

WORMS_KINGDOM          COUNT(*)
--------------------   --------
Biota incertae sedis         26
Plantae                   41079
Protozoa                   1149
Chromista                133607
Viruses                     558
Archaea                     232
Animalia                 433001
Bacteria                   3350
Monera                        1
NOT_RECORDED               1529
Fungi                      2701

11 rows selected.

Elapsed: 00:00:00.14

temp_has_worms_no_arctos.csv.zip
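
To put the kingdom counts above in proportion, here is a small arithmetic sketch. The numbers are copied from the query output; the variable names are just for illustration.

```python
# Counts copied from the worms_kingdom summary above.
counts = {
    "Biota incertae sedis": 26,
    "Plantae": 41079,
    "Protozoa": 1149,
    "Chromista": 133607,
    "Viruses": 558,
    "Archaea": 232,
    "Animalia": 433001,
    "Bacteria": 3350,
    "Monera": 1,
    "NOT_RECORDED": 1529,
    "Fungi": 2701,
}

total = sum(counts.values())
plants_and_animals = counts["Plantae"] + counts["Animalia"]
everything_else = total - plants_and_animals

print(total, plants_and_animals, everything_else)  # 617233 474080 143153
print(f"{plants_and_animals / total:.1%}")  # 76.8%
```

Plantae and Animalia together account for roughly three quarters of the names; most of the remainder is Chromista.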

@reminders reminders bot removed the reminder label Apr 4, 2019
@reminders

reminders bot commented Apr 4, 2019

👋 @dustymc, check on the scripts

@dustymc
Contributor

dustymc commented Apr 17, 2019

The results of the script are above; I just beat the reminderbot to it.

Should I copy Plantae to Arctos Plants and Animalia to Arctos and worry about the rest if someone ever comes up with a reason to?
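
The routing proposed above amounts to a two-entry lookup with everything else left unmapped. A minimal sketch, assuming the source names 'Arctos' and 'Arctos Plants' as used earlier in this thread; the function names and sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Optional

# Mapping proposed in the comment above; other kingdoms are deliberately unmapped.
KINGDOM_TO_SOURCE = {"Plantae": "Arctos Plants", "Animalia": "Arctos"}


def target_source(worms_kingdom: str) -> Optional[str]:
    """Return the Arctos source for a WoRMS kingdom, or None to leave it alone."""
    return KINGDOM_TO_SOURCE.get(worms_kingdom)


def split_by_target(rows):
    """Group (name, kingdom) pairs by target source; unmapped rows go under None."""
    grouped = defaultdict(list)
    for name, kingdom in rows:
        grouped[target_source(kingdom)].append(name)
    return dict(grouped)
```

For example, `split_by_target([("Cestoda", "Animalia"), ("Ulva", "Plantae"), ("Fucus", "Chromista")])` groups the first name under 'Arctos', the second under 'Arctos Plants', and the third under `None`, where it waits until someone has a reason to map it.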

@campmlc

campmlc commented Apr 17, 2019 via email

@dustymc
Contributor

dustymc commented Apr 18, 2019

Scripts are running to push Animalia to Arctos.

/remind me to check back in 5 days

@reminders reminders bot added the reminder label Apr 18, 2019
@reminders

reminders bot commented Apr 18, 2019

@dustymc set a reminder for Apr 23rd 2019

@reminders reminders bot removed the reminder label Apr 23, 2019
@reminders

reminders bot commented Apr 23, 2019

👋 @dustymc, check back

@dustymc
Contributor

dustymc commented Apr 23, 2019

Animals are done; plants are running.

/remind me 24 hours

@reminders

reminders bot commented Apr 23, 2019

@dustymc we had trouble parsing your reminder. Try:

/remind me [what] [when]

@dustymc
Contributor

dustymc commented Apr 23, 2019

/remind me to check back tomorrow

@reminders reminders bot added the reminder label Apr 23, 2019
@reminders

reminders bot commented Apr 23, 2019

@dustymc set a reminder for Apr 24th 2019

@reminders reminders bot removed the reminder label Apr 24, 2019
@reminders

reminders bot commented Apr 24, 2019

👋 @dustymc, check back

@dustymc
Contributor

dustymc commented Apr 24, 2019

Plants are done; closing.

@dustymc
Contributor

dustymc commented Aug 21, 2019

reopening; need to run this periodically to catch new names
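
A periodic run mostly needs to know which WoRMS names have appeared since the last pass. One way to sketch that check is a set difference; the function below is hypothetical and assumes both sides can be exported as plain lists of name strings.

```python
def names_to_add(worms_names, arctos_names):
    """Return WoRMS names not yet in Arctos, sorted for stable output."""
    known = {name.strip() for name in arctos_names}
    return sorted({name.strip() for name in worms_names} - known)


print(names_to_add(["Cestoda", "Mollusca ", "Aves"], ["Aves"]))
# ['Cestoda', 'Mollusca']
```

Only surrounding whitespace is normalized here; any smarter matching (authorship strings, synonyms) is out of scope for this sketch.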

@dustymc dustymc reopened this Aug 21, 2019
@campmlc

campmlc commented Nov 12, 2019

Adding to this issue - how do we get WoRMS to create reciprocal links back to Arctos? They do this for GenBank, USNM, Yale Peabody, etc. See "Cestoda" and go to Links at the bottom of the WoRMS page:

To Biological Information System for Marine Life (BISMaL)
To European Nucleotide Archive (ENA)
To Genbank
To Yale Peabody Museum of Natural History (YPM IZ 092606)
To ITIS

@dustymc
Contributor

dustymc commented Nov 12, 2019

Ask them?

And new issue please - this one needs closed again.

5 participants