taxa without classification #1761

Closed
acdoll opened this issue Oct 24, 2018 · 41 comments
Labels
Function-Taxonomy/Identification Priority-Normal (Not urgent) Normal because this needs to get done but not immediately.

Comments

@acdoll

acdoll commented Oct 24, 2018

I'd like to make a request, on behalf of the Taxonomy Committee, for a few summary lists that we can work on to improve the overall classification structure. Could I please get a list (or the SQL to generate such a list) of all the taxa that are not tied to a classification tree? I will take this list and divvy it up among the Taxonomy Committee and we'll try to make the appropriate classifications. If possible, we would also address any taxa that may have holes in their classification (e.g. they have phylum, class and order, but no family). I imagine this might be a tougher ask, but if you can get us a list we'll try to fix it. Finally, we'd also like to identify any taxa with multiple classifications (e.g. one's a fish and one's a plant). At a minimum, we'll try to add author names to these IDs to aid in their differentiation. Thanks.

@dustymc
Contributor

dustymc commented Oct 24, 2018

Are you interested in local classifications (http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXONOMY_SOURCE) or any classification?

@dustymc
Contributor

dustymc commented Oct 25, 2018

I did this to get names which have no local classifications:


create table temp_has_lcl_classification as select distinct taxon_name_id from taxon_term where source in ('Arctos','Arctos Plants');

create table temp_no_has_no_lcl_clsfcn as select * from taxon_name where taxon_name_id not in (select taxon_name_id from temp_has_lcl_classification);

temp_no_has_no_lcl_clsfcn.csv.zip
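
For anyone who wants to try the "no local classification" logic on a small extract first, here is a minimal sketch in Python with an in-memory SQLite database. It is not the Oracle code that produced the attachment; it uses `NOT EXISTS` in place of `NOT IN`, and the toy rows and the source name 'Some External Source' are invented for illustration. Table and column names follow the statements above.

```python
import sqlite3

LOCAL_SOURCES = ("Arctos", "Arctos Plants")

def names_without_local_classification(conn, sources=LOCAL_SOURCES):
    """Return (taxon_name_id, scientific_name) for names with no term from `sources`."""
    placeholders = ",".join("?" for _ in sources)
    sql = f"""
        SELECT n.taxon_name_id, n.scientific_name
        FROM taxon_name n
        WHERE NOT EXISTS (
            SELECT 1 FROM taxon_term t
            WHERE t.taxon_name_id = n.taxon_name_id
              AND t.source IN ({placeholders})
        )
        ORDER BY n.scientific_name
    """
    return conn.execute(sql, tuple(sources)).fetchall()

# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxon_name (taxon_name_id INTEGER PRIMARY KEY, scientific_name TEXT);
    CREATE TABLE taxon_term (taxon_name_id INTEGER, classification_id TEXT,
                             source TEXT, term_type TEXT, term TEXT);
    INSERT INTO taxon_name VALUES (1, 'Draba hyperborea'), (2, 'Goniobasis'), (3, 'Thesium');
    INSERT INTO taxon_term VALUES
        (1, 'c1', 'Arctos Plants', 'family', 'Brassicaceae'),
        (2, 'c2', 'Some External Source', 'family', 'Pleuroceridae');
""")
unclassified = names_without_local_classification(conn)
```

A name that only has terms from a non-local source still counts as unclassified here, which matches the intent of the original query.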

Multiple classifications within a local source is:


-- classifications (per source and name) that carry more than one term
create table temp_mple_class_in_src as select source, taxon_name_id, classification_id from taxon_term where source in ('Arctos','Arctos Plants') group by source, taxon_name_id, classification_id having count(*) > 1;

-- names that have more than one such classification
create table temp_mple_class_in_src2 as select * from taxon_name where taxon_name_id in (select taxon_name_id from temp_mple_class_in_src group by taxon_name_id having count(*) > 1);

temp_mple_class_in_src2.csv.zip
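
One thing to be aware of: the second statement above counts classifications per name across both sources, so a name with one 'Arctos' and one 'Arctos Plants' classification also lands on the list. A possible variant that counts strictly within each source is sketched below in Python with SQLite. The toy rows and the placeholder names (Aus, Bus, Cus) are invented, and this is an illustration, not the query that produced the attachment.

```python
import sqlite3

def names_with_multiple_classifications(conn, sources):
    """Return (source, taxon_name_id, n) where a name has n > 1 classifications in one source."""
    placeholders = ",".join("?" for _ in sources)
    sql = f"""
        SELECT source, taxon_name_id, COUNT(DISTINCT classification_id) AS n
        FROM taxon_term
        WHERE source IN ({placeholders})
        GROUP BY source, taxon_name_id
        HAVING COUNT(DISTINCT classification_id) > 1
        ORDER BY source, taxon_name_id
    """
    return conn.execute(sql, tuple(sources)).fetchall()

# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxon_term (taxon_name_id INTEGER, classification_id TEXT,
                             source TEXT, term_type TEXT, term TEXT);
    INSERT INTO taxon_term VALUES
        (1, 'c1', 'Arctos', 'genus', 'Aus'),
        (1, 'c1', 'Arctos', 'family', 'Aidae'),
        (1, 'c2', 'Arctos', 'genus', 'Aus'),
        (2, 'c3', 'Arctos', 'genus', 'Bus'),
        (2, 'c4', 'Arctos Plants', 'genus', 'Bus'),
        (3, 'c5', 'Arctos', 'genus', 'Cus'),
        (3, 'c5', 'Arctos', 'family', 'Cidae');
""")
duplicates = names_with_multiple_classifications(conn, ("Arctos", "Arctos Plants"))
```

Name 2 has one classification in each source and is not reported; only name 1, with two classifications inside 'Arctos', is.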

The hierarchical tool is the best place to find and fix gaps, assuming this is heading towards hierarchical data - but maybe some of this will get the party started anyway.

create table temp_has_lcl_has_kingdom as select distinct taxon_name_id from taxon_term where source in ('Arctos','Arctos Plants') and term_type='kingdom';
create table temp_has_lcl_no_kingdom as select * from taxon_name where taxon_name_id not in (select taxon_name_id from temp_has_lcl_has_kingdom);

temp_has_lcl_no_kingdom.csv.zip

  create table temp_has_lcl_has_phylum as select distinct taxon_name_id from taxon_term where source in ('Arctos','Arctos Plants') and term_type='phylum';
create table temp_has_lcl_no_phylum as select * from taxon_name where taxon_name_id not in (select taxon_name_id from temp_has_lcl_has_phylum);

temp_has_lcl_no_phylum.csv.zip




  create table temp_has_lcl_has_order as select distinct taxon_name_id from taxon_term where source in ('Arctos','Arctos Plants') and term_type='order';
create table temp_has_lcl_no_order as select * from taxon_name where taxon_name_id not in (select taxon_name_id from temp_has_lcl_has_order);

temp_has_lcl_no_order.csv.zip

  create table temp_has_lcl_has_class as select distinct taxon_name_id from taxon_term where source in ('Arctos','Arctos Plants') and term_type='class';
create table temp_has_lcl_no_class as select * from taxon_name where taxon_name_id not in (select taxon_name_id from temp_has_lcl_has_class);

temp_has_lcl_no_class.csv.zip

  create table temp_has_lcl_has_family as select distinct taxon_name_id from taxon_term where source in ('Arctos','Arctos Plants') and term_type='family';
create table temp_has_lcl_no_family as select * from taxon_name where taxon_name_id not in (select taxon_name_id from temp_has_lcl_has_family);

temp_has_lcl_no_family.csv.zip

create table temp_has_lcl_pco_nf as select * from taxon_name where 
  taxon_name_id in (select taxon_name_id from temp_has_lcl_has_phylum) and
  taxon_name_id in (select taxon_name_id from temp_has_lcl_has_class) and
  taxon_name_id in (select taxon_name_id from temp_has_lcl_has_order) and
  taxon_name_id not in (select taxon_name_id from temp_has_lcl_has_family)
  ;

temp_has_lcl_pco_nf.csv.zip
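
The phylum/class/order-but-no-family query is one instance of a more general "hole" check. A small Python sketch of that generalization follows. The rank list and its order are an assumption chosen for illustration, not the Arctos code table, and a rank only counts as a hole when something above and something below it is present.

```python
# Assumed rank order, highest first; illustrative only.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]

def missing_ranks(present):
    """Return ranks absent between the highest and lowest rank found in `present`."""
    idx = [i for i, rank in enumerate(RANKS) if rank in present]
    if not idx:
        return []
    return [rank for rank in RANKS[idx[0]:idx[-1] + 1] if rank not in present]

# Example: term types found in one classification.
holes = missing_ranks({"phylum", "class", "order", "genus"})
```

Feeding it the set of `term_type` values for each classification would give a per-classification list of gaps to hand out for review.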

@sharpphyl

Dusty, I started with the "has no family" list (temp_has_lcl_no_family.csv.zip) and many of them have no family because they're a higher level classification. See Acavoidea or Animalia.

Is there a way to delete the higher level superfamilies, orders, etc from this list?

@Jegelewicz
Member

Could we also get a list of all classifications with nothing in the author_text field?

@Jegelewicz
Member

Is there a way to delete the higher level superfamilies, orders, etc from this list?

This is where having taxon-rank as an actual field vs. a magical behind-the-scenes made up one would be helpful to everyone....

@dustymc
Contributor

dustymc commented Oct 25, 2018

delete the higher level

Probably, but I'll let you do that - I added a "term_rank" column.


alter table temp_has_lcl_no_family add term_rank varchar2(255);
-- names containing a space are multi-word names; flag them instead of looking up a rank
update temp_has_lcl_no_family set term_rank='hasspace' where scientific_name like '% %';
begin
  -- handles up to 1999 unranked names per run
  for r in (select * from temp_has_lcl_no_family where term_rank is null and rownum<2000) loop
    begin
      -- take the rank from the local classification term that matches the name
      update temp_has_lcl_no_family
        set term_rank=(
          select term_type from taxon_term
          where POSITION_IN_CLASSIFICATION is not null
            and source in ('Arctos','Arctos Plants')
            and term=r.scientific_name
        )
        where scientific_name=r.scientific_name;
    exception when others then
      -- e.g. the lookup returned more than one row
      update temp_has_lcl_no_family set term_rank='lookupfail' where scientific_name=r.scientific_name;
    end;
  end loop;
end;
/

temp_has_lcl_no_family(1).csv.zip
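
With the `term_rank` column in place, the list can be triaged outside the database. The sketch below is one hypothetical way to do that in Python. The set of ranks treated as "family or above" is an assumption (in particular, 'superfamily' as a `term_type` value is a guess), and the rank given for Acavoidea is assumed from its name. In the code above, 'hasspace' marks names that contain a space and 'lookupfail' marks names whose rank lookup raised an error.

```python
# Assumed rank values at or above family; adjust to the real code table.
FAMILY_OR_ABOVE = {"kingdom", "phylum", "class", "order", "superfamily", "family"}

def triage(term_rank):
    """Classify a row of the no-family list by its term_rank value."""
    if term_rank in FAMILY_OR_ABOVE:
        return "skip"          # a higher-level name cannot be placed in a family
    if term_rank is None or term_rank == "lookupfail":
        return "review"        # rank unknown; needs a human look
    return "needs_family"      # includes 'hasspace' (multi-word names)

# Names from the thread; the ranks attached to them here are assumptions.
rows = [("Animalia", "kingdom"), ("Acavoidea", "superfamily"),
        ("Draba hyperborea", "hasspace"), ("Thesium", "lookupfail")]
result = {name: triage(rank) for name, rank in rows}
```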

nothing in the author_text

  create table temp_has_lcl_has_at as select distinct taxon_name_id from taxon_term where source in ('Arctos','Arctos Plants') and term_type='author_text';
create table temp_has_lcl_no_at as select * from taxon_name where taxon_name_id not in (select taxon_name_id from temp_has_lcl_has_at);

temp_has_lcl_no_at.csv.zip

This is where having taxon-rank as an actual field vs. a magical behind-the-scenes made up one would be helpful to everyone....

If you can see a use for storing that with classifications, add a taxon_term to the code table and go for it!

@dustymc
Contributor

dustymc commented Oct 25, 2018

Also: please talk to me before investing any work into these things. There are limits to what I can safely update out of context, and I don't want you to invest a bunch of work into something that ultimately can't do whatever you're trying to do.

@Jegelewicz
Member

First priority is to get classifications added for names without them, then to fill in blanks in existing classifications. The rest will come later...

@acdoll
Author

acdoll commented Oct 26, 2018

This is great, thanks Dusty. It really gives me a better sense of the scope of the problem. I agree that those lacking classifications altogether are where we should start. @dustymc, what do you mean by "limits to what I can safely update out of context"? Maybe a better question is, what can we do that is most useful for cleaning up the taxonomy issues? What requires human eyeballs and brains that just can't be resolved by the hierarchical tool or some other equally impressive script?

@dustymc
Contributor

dustymc commented Oct 26, 2018

The update issue mostly involves undiscovered homonyms, and we've mostly discovered them by making huge messes. You have Some clam without a family, we update from the genus-string because consistency is critical, but eventually figure out that Some butterfly, Some ant, Some lichen and Some thingelse are now pretending to be clams.

I'm not sure what exactly I can safely update. Anything at the level of genus-or-so involving strings is dangerous. Moving families around is PROBABLY less dangerous, but I wouldn't bet on it. (And even that only works when there's something more than strings involved - Diptera turns out to be a genus of plants as well as an order of insects, a fact I discovered in the usual way.)

The hierarchical tool isn't particularly magical by itself, but finding Diptera in two places in the source data will probably produce some sort of outlier (the details depend on order of encounter, which is unpredictable), and human eyeballs are really good at detecting those.
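
A rough way to surface candidates like Diptera before they cause trouble is to look for term strings that occur in more than one (rank, kingdom) context. The Python/SQLite sketch below assumes each classification carries a 'kingdom' term, which the rest of this thread shows is often not the case, so treat it as an illustration only. The toy rows are invented.

```python
import sqlite3
from collections import defaultdict

def suspected_homonyms(conn):
    """Map each term to its (term_type, kingdom) contexts when it has more than one."""
    kingdom_of = dict(conn.execute(
        "SELECT classification_id, term FROM taxon_term WHERE term_type = 'kingdom'"))
    contexts = defaultdict(set)
    rows = conn.execute(
        "SELECT classification_id, term, term_type FROM taxon_term "
        "WHERE term_type <> 'kingdom'")
    for classification_id, term, term_type in rows:
        # 'unknown' stands in for classifications that have no kingdom term
        contexts[term].add((term_type, kingdom_of.get(classification_id, "unknown")))
    return {term: sorted(ctx) for term, ctx in contexts.items() if len(ctx) > 1}

# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taxon_term (classification_id TEXT, term_type TEXT, term TEXT);
    INSERT INTO taxon_term VALUES
        ('c1', 'kingdom', 'Animalia'), ('c1', 'order', 'Diptera'), ('c1', 'family', 'Culicidae'),
        ('c2', 'kingdom', 'Plantae'),  ('c2', 'genus', 'Diptera'),
        ('c3', 'kingdom', 'Animalia'), ('c3', 'order', 'Diptera'), ('c3', 'family', 'Muscidae');
""")
suspects = suspected_homonyms(conn)
```

This only flags strings for a person to look at; it cannot tell a true homonym from a data-entry mistake.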

For many of these, we'll also have data from GlobalNames and perhaps we can draw from that. GN tries to assert SOMETHING - feed it Some obscure-fossil-clam and you'll get classifications for Some even if they have nothing to do with clams - so that's sort of dangerous to use in asserting authorities, but perhaps it can be useful anyway.

I did this...

begin
  -- sample the first few no-family names
  for r in (select * from temp_has_lcl_no_family where rownum<10) loop
    dbms_output.put_line(r.scientific_name);
    -- count the family terms recorded for this name, whatever the source
    for pf in (select term, count(*) c from taxon_term where taxon_name_id=r.taxon_name_id and term_type='family' group by term) loop
      dbms_output.put_line('    Possible family: ' || pf.term || ' @ ' || pf.c);
    end loop;
  end loop;
end;
/

to get family data from GN for the first few no-family records.


Draba hyperborea
    Possible family: Brassicaceae @ 6
Athysanus
    Possible family: Cruciferae @ 1
    Possible family: Cicadellidae @ 6
    Possible family: Brassicaceae @ 6
Thesium
    Possible family: Santalaceae @ 8
    Possible family: Staphylinidae @ 5
    Possible family: Thesieae @ 1
    Possible family: Thesiaceae @ 2
Polyplacophora
    Possible family: Polyplacophora @ 1
Amaurocichla bocagii
    Possible family: Motacillidae @ 3
    Possible family: Sylviidae @ 5
Murex recurvirostris rubidus
    Possible family: Muricidae @ 5
Aphyosemion gulare gulare
    Possible family: Aplocheilidae @ 1
    Possible family: Nothobranchiidae @ 1
Aphyosemion gulare coeruleum
    Possible family: Aplocheilidae @ 1
    Possible family: Nothobranchiidae @ 1
Aphyosemion calliurum calliurum
    Possible family: Aplocheilidae @ 1
    Possible family: Nothobranchiidae @ 3

So Draba hyperborea is probably a plant, Athysanus is a hemihomonym, Amaurocichla bocagii looks like a bird but who knows which family is "current" for the rest of the genus (many are mixed because #1698), Murex recurvirostris rubidus looks a whole lot like a snail but GN has very limited data on lots of things and it's a less well-known term than say, Draba, so I wouldn't put TOO much faith into the apparent consensus, etc.
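
If it helps with prioritizing, the counts in the output above can be reduced to "clear majority" versus "mixed" with a few lines of Python. This is a sketch only: the 0.8 threshold is an arbitrary assumption, and as noted above even a unanimous result is a candidate for review, not an authority.

```python
def family_consensus(counts, min_share=0.8):
    """Return the family holding at least `min_share` of the votes, else None."""
    total = sum(counts.values())
    if total == 0:
        return None
    family, votes = max(counts.items(), key=lambda kv: kv[1])
    return family if votes / total >= min_share else None

# Counts copied from the output above.
gn_counts = {
    "Draba hyperborea": {"Brassicaceae": 6},
    "Athysanus": {"Cruciferae": 1, "Cicadellidae": 6, "Brassicaceae": 6},
    "Aphyosemion calliurum calliurum": {"Aplocheilidae": 1, "Nothobranchiidae": 3},
}
consensus = {name: family_consensus(c) for name, c in gn_counts.items()}
```

Names that come back as `None` (Athysanus here) are the ones most likely to be hemihomonyms or to have unsettled placement.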

In any case, anything you can see from eg, http://arctos.database.museum/name/Draba%20hyperborea is fairly accessible, and I'm happy to write scripts to assemble that in any way ya'll find useful. Eg Draba hyperborea has a bunch of relationships - perhaps those can be used to de-prioritize.

Sorry for the rambling wall of text and the lack of specifics. I hope there's something useful in there!

@dustymc dustymc added this to the Active Development milestone Oct 30, 2018
@Jegelewicz
Member

@acdoll Are you going to split up the list of classifications that need work?

@Jegelewicz
Member

@dustymc I have a bunch of author names for Troglodytes that I took from ITIS. Can you magic these in, or are we going to have to edit them one by one?

Troglodytes Authors.zip

@Jegelewicz
Member

Jegelewicz commented Nov 20, 2018

For classifications without a kingdom, I am working on Aaptolasma by playing with the hierarchical editor.

@dustymc can we not add a new root to a tree? For all of these which need a kingdom, that would be helpful.

@sharpphyl

I just finished adding classifications to Goniobasis (132 taxa) and to Calliostoma (51 taxa).
The Goniobasis only had a genus and species in Arctos. It looks like someone else also played with that genus in the Hierarchy Tool (I see an upload with that name) but can't figure out who was working on it.

I found an issue which I think is the same as Teresa described above. I was never able to change the classification in the tool. Because they aren't attached to a higher classification, they weren't in my "clams" upload although they are mollusca. If you load Goniobasis (which I did), you can't edit Goniobasis to move it to a new family because it's the root. So I edited the genus Goniobasis with a complete classification and reloaded that genus and children but got the same error message. So I tried to load the family Pleuroceridae but, of course, only got the Goniobasis that had been already reclassified to that family in Arctos and not the ones without a classification above genus. Sorry I didn't take any screen clips to attach but I'm sure this will happen again with the list of "no higher classification" terms we're all working on. Ultimately, I created the csv from scratch. Any hints for how to handle this problem?

@Jegelewicz
Member

I found a huge ball of synonyms and valid/invalid taxa when working with the Aaptolasma. I haven't yet created the relationships between the no longer "accepted" Aaptolasma and the "accepted" Hexelasma that I added by using the taxon name bulkload tool, then the classification bulkload tool. None of this was easy or intuitive and I am not sure that I know exactly which relationships to create because the relationship values aren't very clear. It doesn't appear that anyone is using any of these taxa right now, so it is a bit of an effort in futility, but I feel safe editing them for the same reason. I am really just trying to make what is in Arctos match the information in WoRMS, which seems like a huge waste of time. I really hope we can work out a way to integrate WoRMS into Arctos because we are spending way too much mental energy re-creating the wheel.

On that note, we could use a bulkload tool for adding taxon relationships.

@sharpphyl

What does hasspace mean as a taxon rank? And can you confirm which CSV you recommend that we use? Teresa, I was working on a different one from what you posted on DropBox.

@Jegelewicz
Member

What does hasspace mean as a taxon rank?

I have no idea and which of the above files are you looking at? I only put the "no_kingdom" file in the Dropbox.

@sharpphyl

Well, I think I'm working on this one - not the very first one.

screen shot 2018-11-26 at 2 47 27 pm

screen shot 2018-11-26 at 2 48 18 pm

Dusty, can you confirm which list we should use?

BTW, I found the list of taxa with two classifications very helpful and I'm cleaning all those up if they're mollusca.

screen shot 2018-11-26 at 2 57 44 pm

@Jegelewicz
Member

It doesn't matter which one we use, but it is very likely that anything missing a family is also missing a kingdom. If we start with the missing kingdom file and work our way down I think it will help eliminate any overlap.

@Jegelewicz
Member

It would be best if we all just work from the missing kingdom file in the Dropbox.

Here is the link: https://www.dropbox.com/s/tgtyqrnauah3za7/Taxa%20with%20no%20kingdom.xlsx?dl=0

@sharpphyl

sharpphyl commented Nov 27, 2018 via email

@sharpphyl

Dusty, I started to work through the taxa with multiple classifications. It's a helpful list as for several months, I didn't realize I had to delete the first classification if I added a better one. But a lot of them have only one Arctos classification and one Arctos Plant classification. Is that a problem for you? If not, can you write code to eliminate those? Thx.

@dustymc
Contributor

dustymc commented Nov 27, 2018

If you can stuff it in it's not a problem for me, and hemihomonyms in different classifications should not be a problem for anyone.

What's the scope of this - can I just ignore Arctos Plants (for now)?

@sharpphyl

As far as I'm concerned, you can ignore Arctos Plants for purposes of this report (multiple classifications), but I think the rest of the committee needs to weigh in on the scope of the taxonomy issues they want/need to tackle.

@dustymc
Contributor

dustymc commented Nov 27, 2018


-- same as before, restricted to the 'Arctos' source (Arctos Plants ignored)
create table temp_mple_class_in_src_np as select source, taxon_name_id, classification_id from taxon_term where source in ('Arctos') group by source, taxon_name_id, classification_id having count(*) > 1;

create table temp_mple_class_in_src_np2 as select * from taxon_name where taxon_name_id in (select taxon_name_id from temp_mple_class_in_src_np group by taxon_name_id having count(*) > 1);

temp_mple_class_in_src_np2.csv.zip

@sharpphyl

Thanks. That took out 620 records. Only 2251 to go!

@sharpphyl

Above, Dusty distributed a list of taxa that have multiple classifications.

  • Multiple classifications within a local source is:

  • create table temp_mple_class_in_src as select source, taxon_name_id, classification_id from taxon_term where source in ('Arctos','Arctos Plants') having count(*) > 1 group by source, taxon_name_id, classification_id;

  • create table temp_mple_class_in_src2 as select * from taxon_name where taxon_name_id in (select taxon_name_id from temp_mple_class_in_src having count(*)>1 group by taxon_name_id);

  • temp_mple_class_in_src2.csv.zip

I took a look at it and have corrected all the mollusca taxa and others I'm familiar with, but there are over 2,000 remaining. I have added a descriptor (mammalia, insecta, etc.) to make them easier to find. Can those of you with more knowledge in other taxa try to clean these up so only true homonyms remain? I only found three taxa that are actually homonyms. Most that I corrected just had two classifications because someone (alas, me) forgot to delete the inaccurate one.

I've uploaded it to our Dropbox as "Taxa with Multiple Classifications 30 Nov 2018.csv."

Dusty, most of those remaining are Insecta and many of them have two classifications, both from CoL 2011 checklist on the same day a few hours apart and the only difference is a more robust classification on one of them. Perhaps there's a way to do a mass correction rather than manually.

screen shot 2018-11-30 at 3 25 51 pm

@Jegelewicz
Member

@dustymc ,

I would like to refine the list of classifications without a Kingdom if possible.

  1. I'd like to know which of these names are not currently in use by any collection.
  2. I'd also like to know which collections are using the ones that are in use.

This list is really long and we won't get it cleaned up in any reasonable time. The two bits of information above would help us to focus on names currently in use and to distribute the effort by asking collections to work on the classifications that pertain to them.

Any help would be greatly appreciated!

@dustymc
Contributor

dustymc commented Dec 10, 2018

alter table temp_has_lcl_no_kingdom add used_by varchar2(255);

update temp_has_lcl_no_kingdom set used_by='nobody' where TAXON_NAME_ID not in (select taxon_name_id from identification_taxonomy);
declare
	ub varchar2(255);
begin
	for r in (select * from temp_has_lcl_no_kingdom where used_by is null) loop
		select LISTAGG(guid_prefix, '|') WITHIN GROUP (ORDER BY guid_prefix) into ub from (
			select distinct
				guid_prefix
			from
				collection,
				cataloged_item,
				identification,
				identification_taxonomy
			where
				collection.collection_id=cataloged_item.collection_id and
				cataloged_item.collection_object_id=identification.collection_object_id and
				identification.identification_id=identification_taxonomy.identification_id and
				identification_taxonomy.taxon_name_id =r.taxon_name_id
		);
		update temp_has_lcl_no_kingdom set used_by=ub where taxon_name_id=r.taxon_name_id;
	end loop;
end;
/

temp_has_lcl_no_kingdom(1).csv.zip
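
The same "who uses this name" lookup can be done in one pass over the joins instead of one query per name. The Python/SQLite sketch below follows the table and column names in the PL/SQL above. The toy rows are invented, 'XYZ:Test' is a made-up guid_prefix, and this is an illustration, not a replacement for the script that produced the attachment.

```python
import sqlite3

def collections_using(conn, taxon_name_ids):
    """Map each taxon_name_id to a '|'-joined list of guid_prefix values, or 'nobody'."""
    used = {tid: set() for tid in taxon_name_ids}
    sql = """
        SELECT DISTINCT it.taxon_name_id, c.guid_prefix
        FROM identification_taxonomy it
        JOIN identification i ON i.identification_id = it.identification_id
        JOIN cataloged_item ci ON ci.collection_object_id = i.collection_object_id
        JOIN collection c ON c.collection_id = ci.collection_id
    """
    for tid, prefix in conn.execute(sql):
        if tid in used:
            used[tid].add(prefix)
    return {tid: "|".join(sorted(prefixes)) or "nobody" for tid, prefixes in used.items()}

# Toy data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (collection_id INTEGER, guid_prefix TEXT);
    CREATE TABLE cataloged_item (collection_object_id INTEGER, collection_id INTEGER);
    CREATE TABLE identification (identification_id INTEGER, collection_object_id INTEGER);
    CREATE TABLE identification_taxonomy (identification_id INTEGER, taxon_name_id INTEGER);
    INSERT INTO collection VALUES (1, 'DMNS:Inv'), (2, 'XYZ:Test');
    INSERT INTO cataloged_item VALUES (10, 1), (11, 2), (12, 1);
    INSERT INTO identification VALUES (100, 10), (101, 11), (102, 12);
    INSERT INTO identification_taxonomy VALUES (100, 1), (101, 1), (102, 1);
""")
usage = collections_using(conn, [1, 2])
```

Joining the prefixes in application code also sidesteps the fixed width of a `used_by` column when a name is used by many collections.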

@Jegelewicz
Member

Now that WoRMS is in Arctos, how does that affect the list?

@dustymc
Contributor

dustymc commented Jan 16, 2019

the list

Which one? It's changing daily now (as stuff in WoRMS changes). I can rerun specific things if you want. (I can rerun everything too, but it's probably changing faster than you can deal with it.)

Here's Kingdom.


create table temp_has_lcl_has_kingdom_2 as select distinct taxon_name_id from taxon_term where source in (select source from cttaxonomy_source) and term_type='kingdom';

create table temp_has_lcl_no_kingdom_2 as select * from taxon_name where taxon_name_id not in (select taxon_name_id from temp_has_lcl_has_kingdom_2);

temp_has_lcl_no_kingdom_2.csv.zip

I think this matters less if we have data from GN, so I did...



-- stuff with GN data
create table temp_idign as select distinct taxon_name_id from taxon_term where source not in (select source from cttaxonomy_source);

-- stuff with local data
create table temp_idilcl as select distinct taxon_name_id from taxon_term where source in (select source from cttaxonomy_source);

-- stuff with ONLY local data
create table temp_id_o_lcl as select distinct taxon_name_id from temp_idilcl where taxon_name_id not in (select taxon_name_id from temp_idign);

-- stuff with only local data, and no kingdom
create table temp_lcl_only_no_k as select taxon_name_id,scientific_name from taxon_name where taxon_name_id in (
select taxon_name_id from temp_id_o_lcl
) and taxon_name_id not in (select taxon_name_id from temp_has_lcl_has_kingdom_2);

temp_lcl_only_no_k.csv.zip

That's very small (4334 records) and low-quality (NOBODY knows kingdom), but I suspect a fair number of them aren't in GN because they aren't real taxa so it might be a ridiculous amount of work to sort out anyway. I looked at one random record (http://arctos.database.museum/name/Microsorium%20tuanense) and https://archive.org/stream/mobot31753002047071/mobot31753002047071_djvu.txt is the only google result - probably a week (or a lifetime!) of work to figure out what's REALLY going on there. (My first baseless guess involves OCR mistakes....)

@Jegelewicz
Member

@dustymc Would it be possible to add the kingdom "Animalia" to any classification missing a kingdom, but with the phylum listed as any of the following?

Acanthocephala
Acoelomorpha
Annelida
Arthropoda
Brachiopoda
Bryozoa
Chaetognatha
Chordata
Cnidaria
Ctenophora
Cycliophora
Echinodermata
Entoprocta
Gastrotricha
Gnathostomulida
Hemichordata
Kinorhyncha
Loricifera
Micrognathozoa
Mollusca
Nematoda
Nematomorpha
Nemertea
Onychophora
Orthonectida
Phoronida
Placozoa
Platyhelminthes
Porifera
Priapulida
Rhombozoa
Rotifera
Sipuncula
Tardigrada
Xenoturbellida

And add Kingdom "Plantae" for any classification missing a kingdom, but with the phylum listed as any of the following?

Marchantiophyta
Anthocerotophyta
Bryophyta
Filicophyta
Sphenophyta
Cycadophyta
Ginkgophyta
Pinophyta
Gnetophyta
Magnoliophyta

I don't know if this will clean up much, but it will (hopefully) cut down on the number of classifications that we need to follow up on.

Thanks!
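
For reference, the rule requested above can be written down as a small lookup. The Python sketch below is only an illustration of that rule using the two phylum lists as given. It leaves any existing kingdom untouched and returns nothing for phyla that are on neither list. How the actual update was run is not shown in this thread.

```python
ANIMALIA_PHYLA = set("""
Acanthocephala Acoelomorpha Annelida Arthropoda Brachiopoda Bryozoa Chaetognatha
Chordata Cnidaria Ctenophora Cycliophora Echinodermata Entoprocta Gastrotricha
Gnathostomulida Hemichordata Kinorhyncha Loricifera Micrognathozoa Mollusca
Nematoda Nematomorpha Nemertea Onychophora Orthonectida Phoronida Placozoa
Platyhelminthes Porifera Priapulida Rhombozoa Rotifera Sipuncula Tardigrada
Xenoturbellida
""".split())

PLANTAE_PHYLA = set("""
Marchantiophyta Anthocerotophyta Bryophyta Filicophyta Sphenophyta Cycadophyta
Ginkgophyta Pinophyta Gnetophyta Magnoliophyta
""".split())

def infer_kingdom(terms):
    """Given {term_type: term} for one classification, return a kingdom or None."""
    if terms.get("kingdom"):
        return terms["kingdom"]      # never overwrite an existing kingdom
    phylum = terms.get("phylum")
    if phylum in ANIMALIA_PHYLA:
        return "Animalia"
    if phylum in PLANTAE_PHYLA:
        return "Plantae"
    return None                      # unknown phylum: leave for human review
```

Classifications with no phylum at all are unaffected, so they stay on the kingdomless list.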

@dustymc
Contributor

dustymc commented Feb 15, 2019

I updated around 24K records. Here's a new kingdomless list.

temp_has_lcl_no_kingdom_2(1).csv.zip

@Jegelewicz
Member

HAHAHAHA! 137K classifications with no kingdom. I'm about to just throw in the towel....

@Jegelewicz Jegelewicz added the Priority-Normal (Not urgent) Normal because this needs to get done but not immediately. label Mar 15, 2019
@sharpphyl

A suggestion. I took the last list and ran 999 of them against the WoRMS Match Taxa tool. Many were invalid but more than half came up with a valid classification. If, for example, I then go to one on the list, Romanocidaris dietli, I find that there is no Arctos classification

Screen Shot 2019-03-16 at 8 09 40 AM

but there is a WoRMS (via Arctos) classification and the aphia ID of 1034447. I had to refresh to get the entire classification, but it's there.

Screen Shot 2019-03-16 at 8 10 40 AM

This relates to the second item in #1936

  1. For any names in Arctos with NO associated classification that have a classification in WoRMS, add the WoRMS classification to Arctos along with the aphia ID

If my sample is in any way representative, we could eliminate many of the taxa without kingdom and link them to automatic update - unless the collection managers using the taxa don't want this.

Dusty, can you write a program to clone the WoRMS (via Arctos) classification into Arctos for all taxa without an Arctos classification? Then we can see what's left that needs personal attention.

WoRMS match of 999 arbitrary taxa without kingdom.xlsx

@dustymc
Contributor

dustymc commented Mar 18, 2019

Let's have that discussion in #1936

@sharpphyl

Teresa, I don't have your email to reference, but not long ago, you referred us to the "Taxa in use missing kingdom" spreadsheet in DropBox. I've been working on the 202 used by DMNS:Inv today and so far found only one without classification and kingdom - Megalobulimus capillaceus - which now has a WoRMS aphiaID (WoRMS added it March 23, 2019).

Several of them aren't in Arctos or WoRMS (via Arctos) so I may have already deleted them. (Acmaea notata, Acteon punctocoelata, Amoria volva.) Others show a kingdom - Anadara nodifera. Some don't have an Arctos classification but they do have an Arctos Relationship classification with a kingdom - Anomia adamas. Lastly, some have two classifications in Arctos Relationships, none in Arctos but one in WoRMS (via Arctos) and all classifications have Animalia as the kingdom - Barbatia lima. In a few cases, World Register of Marine Species shows Biota (but not as a kingdom).

It's really helpful to have the list be sorted by which collection is using it, but so far, I'm hitting dead ends on all but one. Can you and Dusty reconfirm exactly what this list is?

I searched our DMNS:Inv collection for Kingdom = null, got 15 records and cleaned up taxa on those. I did the same for Family = null, and only got 15 records which we cannot identify beyond Order so they will never have a family. What am I missing?

@Jegelewicz
Member

@sharpphyl I was cleaning stuff in alpha order so I may have gotten to stuff with "A" and I think the list may have been generated before the DMNS:Inv move to Arctos via WoRMS. Because you are using Arctos via WoRMS, I would expect the names used in DMNS:Inv to have a classification so I don't think you are missing anything and you probably don't need to do anything right now. It is all of us in Arctos and Arctos Plants that are missing things. (Just because there is a classification in Arctos via WoRMS, if there isn't one in Arctos and that is my source, it will show up without a classification....)

I don't think we should be spending too much time on this right now. First priority is to figure out where taxonomy is going. Also, I don't even know what "Arctos Relationships" is although I see it now occasionally as I add or clean up taxa, so I cannot answer anything about that.

@sharpphyl

Good question, Dusty, and it gets to the heart of the primary purpose of our taxonomic tables.

Acteon punctocoelata. This is an invalid taxon that is not in WoRMS. The "correct" invalid taxon is Acteon punctocaelatus (aphiaID 1333309) which is invalid and now accepted as Rictaxis punctocaelatus. Both are in WoRMS (via Arctos). That doesn't mean that Acteon punctocoelata doesn't show up on the Internet a lot, so purely for search purposes, it may make sense to leave it in our tables as an invalid taxonomic name. But it isn't truly a synonym (or WoRMS doesn't think so) for the valid taxon. Your resource does list this as a historic synonym for Acteon punctocoelatus (Acteon punctocoelata Arnold, R. (1903)) so WoRMS may just be hitting the high spots with their synonyms.

Amoria volva is similar. Per WoRMS, the "correct" invalid name is Voluta volva (aphiaID 385519) which is now accepted as Amoria maculata. WoRMS doesn't list Amoria volva as ever being in the taxonomic queue. Your reference lists it as Voluta (Amoria) volva.

So the question is whether the primary function of our taxon name tables (with the new "fuzzy" feature) is to cast as wide a net as possible during search or whether they are to be the most accurate tables for Arctos members' use when adding specimens or updating identifications. Arctos is perfectly happy to accept Acteon punctocoelata as a taxon name during bulk loading or when our volunteers are entering individual records, thereby propagating the error further on the internet and into GBIF etc. If Arctos said "whoa! that's not valid" whenever an invalid taxon was entered, I'd be more accepting of leaving "garbage taxa" in the tables.

For our collection, we don't try to enter every historic name as the identification. Instead we include it in the remarks as the legacy ID. That way, we don't need to add non-WoRMS taxon names except for (we think valid) terrestrial species or groups that WoRMS doesn't yet address. But, yes, it does make it harder to find species by using an invalid or misspelled name.

One approach is for me to add both names back in to Arctos but not into WoRMS (via Arctos). If it is inadvertently used in our collection, there will be no classification for that record and it's easy for me to find the inaccurate record. I'd like to keep such terms out of WoRMS (via Arctos), at least until another collection selects it as their source and wants to add historic (but possibly misspelled, "never been exactly that taxon name," non-WoRMS) taxa.

Input from others on the Taxonomic Committee would be helpful. Thanks.

@dustymc
Contributor

dustymc commented Mar 25, 2019

My interests definitely lean way over towards getting users where they need to be.

The "new fuzzy feature" is just new locally - GN has been doing something similar for a while, they're just not very good at it. (Neither am I, I'm sure, but maybe they get some stuff I miss and vice versa.) It's a way to find existing stuff, that's it - it introduces no new names and changes nothing other than making some new links.

"whoa! that's not valid" for CLASSIFICATIONS is why most of http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_TERM exists. "whoa! that's not valid" for NAMES is at the top of http://handbook.arctosdb.org/documentation/taxonomy.html. I see no evidence that those aren't 'valid' (=published or used in "scientific literature") names, and as always I don't (much) care what you do with them in classifications.

Among other things, names bring in GlobalNames data which may get users where they want to be even if "we" don't have any useful data. "Local" classifications can help too, but there's not much in our data that's not also in something else IF WE HAVE THE NAME.

WoRMS may just be hitting the high spots

I think that's true of everything, starting with the Codes, and it actively makes it difficult to find anything that uses "alternative" names.

try to enter every historic name as the identification

There are 249684 identifications which use one or the other name in my recent "fuzzy list." I don't think ya'll are going to sort through 60K names, pick out the real synonyms, and apply them to a quarter-million specimens. Without that or those relationships, which can't exist if someone's deleted a name, some (most??) users just don't find what they're looking for.


4 participants