taxa without classification #1761
Comments
Are you interested in local classifications (http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXONOMY_SOURCE) or any classification? |
I did this to get names which have no local classifications:
temp_no_has_no_lcl_clsfcn.csv.zip
Multiple classifications within a local source is:
temp_mple_class_in_src2.csv.zip
The hierarchical tool is the best place to find and fix gaps, assuming this is heading towards hierarchical data - but maybe some of this will get the party started anyway.
temp_has_lcl_no_kingdom.csv.zip
temp_has_lcl_no_phylum.csv.zip
temp_has_lcl_no_family.csv.zip
|
Dusty, I started with the "has no family" list (temp_has_lcl_no_family.csv.zip) and many of them have no family because they're a higher level classification. See Acavoidea or Animalia. Is there a way to delete the higher level superfamilies, orders, etc from this list? |
Could we also get a list of all classifications with nothing in the author_text field? |
This is where having taxon-rank as an actual field vs. a magical behind-the-scenes made up one would be helpful to everyone.... |
Probably, but I'll let you do that - I added a "term_rank" column.
temp_has_lcl_no_family(1).csv.zip
If you can see a use for storing that with classifications, add a taxon_term to the code table and go for it! |
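The filtering asked about above can be done on the downloaded CSV once it has the "term_rank" column. The following is only a minimal sketch: the `scientific_name` column name, the sample rows, and the set of ranks are assumptions for illustration, not the actual layout of temp_has_lcl_no_family(1).csv.zip.

```python
import csv
import io

# Ranks above family: a name at one of these ranks is expected to have no
# family. This set is illustrative and not exhaustive.
RANKS_ABOVE_FAMILY = {"kingdom", "phylum", "class", "order", "superfamily"}


def drop_higher_taxa(rows):
    """Keep only rows whose term_rank is not one of the higher ranks."""
    return [
        row for row in rows
        if row.get("term_rank", "").strip().lower() not in RANKS_ABOVE_FAMILY
    ]


# Hypothetical sample in the assumed shape of the "no family" report.
SAMPLE = """scientific_name,term_rank
Acavoidea,superfamily
Animalia,kingdom
Draba hyperborea,species
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
needs_family = drop_higher_taxa(rows)
print([row["scientific_name"] for row in needs_family])
```

Rows with a blank or unrecognized rank are kept on purpose, so they still get a human look.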
Also: please talk to me before investing any work into these things. There are limits to what I can safely update out of context, and I don't want you to invest a bunch of work into something that ultimately can't do whatever you're trying to do. |
First priority is to get classifications added for names without them, then to fill in blanks in existing classifications. The rest will come later... |
This is great, thanks Dusty. It really gives me a better sense of the scope of the problem. I agree that those lacking classifications altogether is where we should start. @dustymc, what do you mean by "limits to what I can safely update out of context"? Maybe a better question is, what can we do that is most useful for cleaning up the taxonomy issues? What requires human eyeballs and brains that just can't be resolved by the hierarchical tool or some other equally impressive script? |
The update issue mostly involves undiscovered homonyms, and we've mostly discovered them by making huge messes. You have Some clam without a family, we update from the genus-string because consistency is critical, but eventually figure out that Some butterfly, Some ant, Some lichen and Some thingelse are now pretending to be clams. I'm not sure what exactly I can safely update. Anything at the level of genus-or-so involving strings is dangerous. Moving families around is PROBABLY less dangerous, but I wouldn't bet on it. (And even that only works when there's something more than strings involved - Diptera turns out to be a genus of plants as well as an order of insects, a fact I discovered in the usual way.) The hierarchical tool isn't particularly magical by itself, but finding Diptera in two places in the source data will probably produce some sort of outlier (the details depend on order of encounter, which is unpredictable), and human eyeballs are really good at detecting those. For many of these, we'll also have data from GlobalNames and perhaps we can draw from that. GN tries to assert SOMETHING - feed it Some obscure-fossil-clam and you'll get classifications for Some even if they have nothing to do with clams - so that's sort of dangerous to use in asserting authorities, but perhaps it can be useful anyway. I did this...
to get family data from GN for the first few no-family records.
So Draba hyperborea is probably a plant, Athysanus is a hemihomonym, Amaurocichla bocagii looks like a bird but who knows which family is "current" for the rest of the genus (many are mixed because #1698), Murex recurvirostris rubidus looks a whole lot like a snail but GN has very limited data on lots of things and it's a less well-known term than say, Draba, so I wouldn't put TOO much faith into the apparent consensus, etc. In any case, anything you can see from eg, http://arctos.database.museum/name/Draba%20hyperborea is fairly accessible, and I'm happy to write scripts to assemble that in any way y'all find useful. Eg Draba hyperborea has a bunch of relationships - perhaps those can be used to de-prioritize. Sorry for the rambling wall of text and the lack of specifics. I hope there's something useful in there! |
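The Diptera case described above (the same string used as a plant genus and as an insect order) is the kind of outlier a simple grouping can surface before any string-based update runs. This is a rough sketch only: the `(term, rank, kingdom)` tuple layout and the sample rows are assumptions, and it flags candidates for human review rather than deciding anything.

```python
from collections import defaultdict


def find_suspect_homonyms(records):
    """Return terms that appear under more than one (rank, kingdom) pair."""
    seen = defaultdict(set)
    for term, rank, kingdom in records:
        seen[term].add((rank, kingdom))
    return {
        term: sorted(pairs)
        for term, pairs in seen.items()
        if len(pairs) > 1
    }


# Hypothetical rows: (term, rank, kingdom)
records = [
    ("Diptera", "order", "Animalia"),
    ("Diptera", "genus", "Plantae"),
    ("Draba", "genus", "Plantae"),
]

suspects = find_suspect_homonyms(records)
print(suspects)
```

Anything this returns is a reason to stop and look, not a list of things that are safe to update.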
@acdoll Are you going to split up the list of classifications that need work? |
@dustymc I have a bunch of author names for Troglodytes that I took from ITIS. Can you magic these in, or are we going to have to edit them one by one? |
For classifications without a kingdom, I am working on Aaptolasma by playing with the hierarchical editor. @dustymc can we not add a new root to a tree? For all of these which need a kingdom, that would be helpful. |
I just finished adding classifications to Goniobasis (132 taxa) and to Calliostoma (51 taxa). I found an issue which I think is the same as Teresa described above. I was never able to change the classification in the tool. Because they aren't attached to a higher classification, they weren't in my "clams" upload although they are mollusca. If you load Goniobasis (which I did), you can't edit Goniobasis to move it to a new family because it's the root. So I edited the genus Goniobasis with a complete classification and reloaded that genus and children but got the same error message. So I tried to load the family Pleuroceridae but, of course, only got the Goniobasis that had been already reclassified to that family in Arctos and not the ones without a classification above genus. Sorry I didn't take any screen clips to attach but I'm sure this will happen again with the list of "no higher classification" terms we're all working on. Ultimately, I created the csv from scratch. Any hints for how to handle this problem? |
I found a huge ball of synonyms and valid/invalid taxa when working with the Aaptolasma. I haven't yet created the relationships between the no longer "accepted" Aaptolasma and the "accepted" Hexelasma that I added by using the taxon name bulkload tool, then the classification bulkload tool. None of this was easy or intuitive and I am not sure that I know exactly which relationships to create because the relationship values aren't very clear. It doesn't appear that anyone is using any of these taxa right now, so it is a bit of an exercise in futility, but I feel safe editing them for the same reason. I am really just trying to make what is in Arctos match the information in WoRMS, which seems like a huge waste of time. I really hope we can work out a way to integrate WoRMS into Arctos because we are spending way too much mental energy reinventing the wheel. On that note, we could use a bulkload tool for adding taxon relationships. |
What does hasspace mean as a taxon rank? And can you confirm the CSV that you recommend that we use? Teresa, I was working on a different one from what you posted on DropBox. |
I have no idea. Which of the above files are you looking at? I only put the "no_kingdom" file in the Dropbox. |
It doesn't matter which one we use, but it is very likely that anything missing a family is also missing a kingdom. If we start with the missing kingdom file and work our way down I think it will help eliminate any overlap. |
It would be best if we all just work from the missing kingdom file in the Dropbox. Here is the link: https://www.dropbox.com/s/tgtyqrnauah3za7/Taxa%20with%20no%20kingdom.xlsx?dl=0 |
Sounds fine. Will do.
|
Dusty, I started to work through the taxa with multiple classifications. It's a helpful list because, for several months, I didn't realize I had to delete the first classification if I added a better one. But a lot of them have only one Arctos classification and one Arctos Plants classification. Is that a problem for you? If not, can you write code to eliminate those? Thx. |
If you can stuff it in it's not a problem for me, and hemihomonyms in different classifications should not be a problem for anyone. What's the scope of this - can I just ignore Arctos Plants (for now)? |
As far as I'm concerned, you can ignore Arctos Plants for purposes of this report (multiple classifications), but I think the rest of the committee needs to weigh in on the scope of the taxonomy issues they want/need to tackle. |
|
Thanks. That took out 620 records. Only 2251 to go! |
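One way to read the exclusion discussed above is "drop any name whose only classifications are exactly one in Arctos and one in Arctos Plants". The sketch below shows that reading on a flat list of `(name, source)` pairs. The pair layout and the placeholder names are assumptions, and this is not necessarily how the 620 records were actually removed.

```python
from collections import Counter, defaultdict

# The combination agreed to be ignorable for this report.
IGNORABLE = Counter({"Arctos": 1, "Arctos Plants": 1})


def names_still_needing_review(rows):
    """Return names whose per-source classification counts are not IGNORABLE."""
    by_name = defaultdict(Counter)
    for name, source in rows:
        by_name[name][source] += 1
    return sorted(name for name, counts in by_name.items() if counts != IGNORABLE)


# Hypothetical rows: (name, classification source)
rows = [
    ("Taxon one", "Arctos"),
    ("Taxon one", "Arctos Plants"),
    ("Taxon two", "Arctos"),
    ("Taxon two", "Arctos"),
]

remaining = names_still_needing_review(rows)
print(remaining)
```

Here "Taxon two" stays on the list because it has two classifications within the same source, which is the case that usually means a stale classification was never deleted.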
Above, Dusty distributed a list of taxa that have multiple classifications.
I took a look at it and have corrected all the Mollusca taxa and others I'm familiar with, but there are over 2,000 remaining. I have added a descriptor (mammalia, insecta, etc.) to make them easier to find. Can those of you with more knowledge in other taxa try to clean these up so only true homonyms remain? I only found three taxa that are actually homonyms. Most that I corrected just had two classifications because someone (alas, me) forgot to delete the inaccurate one. I've uploaded it to our Dropbox as "Taxa with Multiple Classifications 30 Nov 2018.csv." Dusty, most of those remaining are Insecta and many of them have two classifications, both from the CoL 2011 checklist on the same day a few hours apart, and the only difference is a more robust classification on one of them. Perhaps there's a way to do a mass correction rather than fixing them manually. |
@dustymc, I would like to refine the list of classifications without a Kingdom if possible.
This list is really long and we won't get it cleaned up in any reasonable time. The two bits of information above would help us to focus on names currently in use and to distribute the effort by asking collections to work on the classifications that pertain to them. Any help would be greatly appreciated! |
|
Now that WoRMS is in Arctos, how does that affect the list? |
Which one? It's changing daily now (as stuff in WoRMS changes). I can rerun specific things if you want. (I can rerun everything too, but it's probably changing faster than you can deal with it.) Here's Kingdom.
temp_has_lcl_no_kingdom_2.csv.zip
I think this matters less if we have data from GN, so I did...
That's very small (4334 records) and low-quality (NOBODY knows kingdom), but I suspect a fair number of them aren't in GN because they aren't real taxa so it might be a ridiculous amount of work to sort out anyway. I looked at one random record (http://arctos.database.museum/name/Microsorium%20tuanense) and https://archive.org/stream/mobot31753002047071/mobot31753002047071_djvu.txt is the only google result - probably a week (or a lifetime!) of work to figure out what's REALLY going on there. (My first baseless guess involves OCR mistakes....) |
@dustymc Would it be possible to add the kingdom "Animalia" to any classification missing a kingdom, but with the phylum listed as any of the following?
Acanthocephala
And add the kingdom "Plantae" for any classification missing a kingdom, but with the phylum listed as any of the following?
Marchantiophyta
I don't know if this will clean up much, but it will (hopefully) cut down on the number of classifications that we need to follow up on. Thanks! |
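The rule requested above amounts to a phylum-to-kingdom lookup applied only where kingdom is blank. A minimal sketch, assuming each classification is a flat dict with `kingdom` and `phylum` keys (a hypothetical shape, not the Arctos schema), and using only the two phyla that appear in the request as it survives here:

```python
# Lookup limited to the phyla named in the request; extend as needed.
KINGDOM_BY_PHYLUM = {
    "Acanthocephala": "Animalia",
    "Marchantiophyta": "Plantae",
}


def fill_missing_kingdom(classification):
    """Return a copy with kingdom filled in when it is blank and the phylum is known."""
    updated = dict(classification)
    if not updated.get("kingdom"):
        kingdom = KINGDOM_BY_PHYLUM.get(updated.get("phylum"))
        if kingdom:
            updated["kingdom"] = kingdom
    return updated


print(fill_missing_kingdom({"kingdom": "", "phylum": "Acanthocephala"}))
print(fill_missing_kingdom({"kingdom": "", "phylum": "Marchantiophyta"}))
print(fill_missing_kingdom({"kingdom": "", "phylum": "Unknownphylum"}))
```

An existing kingdom is never overwritten, and a phylum that is not in the lookup leaves the record untouched, which keeps the homonym risk raised earlier in the thread confined to the phylum strings someone has explicitly vetted.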
I updated around 24K records. Here's a new kingdomless list. |
HAHAHAHA! 137K classifications with no kingdom. I'm about to just throw in the towel.... |
A suggestion. I took the last list and ran 999 of them against the WoRMS Match Taxa tool. Many were invalid but more than half came up with a valid classification. If, for example, I then go to one on the list, Romanocidaris dietli, I find that there is no Arctos classification but there is a WoRMS (via Arctos) classification and the aphia ID of 1034447. I had to refresh to get the entire classification, but it's there. This relates to the second item in #1936.
If my sample is in any way representative, we could eliminate many of the taxa without kingdom and link them to automatic update - unless the collection managers using the taxa don't want this. Dusty, can you write a program to clone the WoRMS (via Arctos) classification into Arctos for all taxa without an Arctos classification? Then we can see what's left that needs personal attention. |
Let's have that discussion in #1936 |
Teresa, I don't have your email to reference, but not long ago, you referred us to the "Taxa in use missing kingdom" spreadsheet in DropBox. I've been working on the 202 used by DMNS:Inv today and so far found only one without classification and kingdom - Megalobulimus capillaceus - which now has a WoRMS aphiaID (WoRMS added it March 23, 2019). Several of them aren't in Arctos or WoRMS (via Arctos) so I may have already deleted them. (Acmaea notata, Acteon punctocoelata, Amoria volva.) Others show a kingdom - Anadara nodifera. Some don't have an Arctos classification but they do have an Arctos Relationship classification with a kingdom - Anomia adamas. Lastly, some have two classifications in Arctos Relationships, none in Arctos but one in WoRMS (via Arctos) and all classifications have Animalia as the kingdom - Barbatia lima. In a few cases, World Register of Marine Species shows Biota (but not as a kingdom). It's really helpful to have the list be sorted by which collection is using it, but so far, I'm hitting dead ends on all but one. Can you and Dusty reconfirm exactly what this list is? I searched our DMNS:Inv collection for Kingdom = null, got 15 records and cleaned up taxa on those. I did the same for Family = null, and only got 15 records which we cannot identify beyond Order so they will never have a family. What am I missing? |
@sharpphyl I was cleaning stuff in alpha order so I may have gotten to stuff with "A" and I think the list may have been generated before the DMNS:Inv move to Arctos via WoRMS. Because you are using Arctos via WoRMS, I would expect the names used in DMNS:Inv to have a classification so I don't think you are missing anything and you probably don't need to do anything right now. It is all of us in Arctos and Arctos Plants that are missing things. (Just because there is a classification in Arctos via WoRMS, if there isn't one in Arctos and that is my source, it will show up without a classification....) I don't think we should be spending too much time on this right now. First priority is to figure out where taxonomy is going. Also, I don't even know what "Arctos Relationships" is although I see it now occasionally as I add or clean up taxa, so I cannot answer anything about that. |
Yea might be correct
Can you explain why you deleted those? They look useful to me.
#983, https://github.com/ArctosDB/arctos/issues/1880 |
Good question, Dusty, and it gets to the heart of the primary purpose of our taxonomic tables. Acteon punctocoelata. This is an invalid taxon that is not in WoRMS. The "correct" invalid taxon is Acteon punctocaelatus (aphiaID 1333309) which is invalid and now accepted as Rictaxis punctocaelatus. Both are in WoRMS (via Arctos). That doesn't mean that Acteon punctocoelata doesn't show up on the Internet a lot, so purely for search purposes, it may make sense to leave it in our tables as an invalid taxonomic name. But it isn't truly a synonym (or WoRMS doesn't think so) for the valid taxon. Your resource does list this as a historic synonym for Acteon punctocoelatus (Acteon punctocoelata Arnold, R. (1903)) so WoRMS may just be hitting the high spots with their synonyms. Amoria volva is similar. Per WoRMS, the "correct" invalid name is Voluta volva (aphiaID 385519) which is now accepted as Amoria maculata. WoRMS doesn't list Amoria volva as ever being in the taxonomic queue. Your reference lists it as Voluta (Amoria) volva. So the question is whether the primary function of our taxon name tables (with the new "fuzzy" feature) is to cast as wide a net as possible during search or whether they are to be the most accurate tables for Arctos members' use when adding specimens or updating identifications. Arctos is perfectly happy to accept Acteon punctocoelata as a taxon name during bulk loading or when our volunteers are entering individual records, thereby propagating the error further on the internet and into GBIF etc. If Arctos said "whoa! that's not valid" whenever an invalid taxon was entered, I'd be more accepting of leaving "garbage taxa" in the tables. For our collection, we don't try to enter every historic name as the identification. Instead we include it in the remarks as the legacy ID. That way, we don't need to add non-WoRMS taxon names except for (we think valid) terrestrial species or groups that WoRMS doesn't yet address.
But, yes, it does make it harder to find species by using an invalid or misspelled name. One approach is for me to add both names back in to Arctos but not into WoRMS (via Arctos). If it is inadvertently used in our collection, there will be no classification for that record and it's easy for me to find the inaccurate record. I'd like to keep such terms out of WoRMS (via Arctos), at least until another collection selects it as their source and wants to add historic (but possibly misspelled, "never been exactly that taxon name," non-WoRMS) taxa. Input from others on the Taxonomic Committee would be helpful. Thanks. |
My interests definitely lean way over towards getting users where they need to be. The "new fuzzy feature" is just new locally - GN has been doing something similar for a while, they're just not very good at it. (Neither am I, I'm sure, but maybe they get some stuff I miss and vice versa.) It's a way to find existing stuff, that's it - it introduces no new names and changes nothing other than making some new links. "whoa! that's not valid" for CLASSIFICATIONS is why most of http://arctos.database.museum/info/ctDocumentation.cfm?table=CTTAXON_TERM exists. "whoa! that's not valid" for NAMES is at the top of http://handbook.arctosdb.org/documentation/taxonomy.html. I see no evidence that those aren't 'valid' (=published or used in "scientific literature") names, and as always I don't (much) care what you do with them in classifications. Among other things, names bring in GlobalNames data which may get users where they want to be even if "we" don't have any useful data. "Local" classifications can help too, but there's not much in our data that's not also in something else IF WE HAVE THE NAME.
I think that's true of everything, starting with the Codes, and it actively makes it difficult to find anything that uses "alternative" names.
There are 249684 identifications which use one or the other name in my recent "fuzzy list." I don't think y'all are going to sort through 60K names, pick out the real synonyms, and apply them to a quarter-million specimens. Without that or those relationships, which can't exist if someone's deleted a name, some (most??) users just don't find what they're looking for. |
I'd like to make a request, on behalf of the Taxonomy Committee, for a few summary lists that we can work on to improve the overall classification structure. Could I please get a list (or the SQL to generate such a list) of all the taxa that are not tied to a classification tree? I will take this list and divvy it up among the Taxonomy Committee and we'll try to make the appropriate classifications. If possible, we would also address any taxa that may have holes in their classification (e.g. they have phylum, class and order, but no family). I imagine this might be a tougher ask, but if you can get us a list we'll try to fix it. Finally, we'd also like to identify any taxa with multiple classifications (e.g. one's a fish and one's a plant). At a minimum, we'll try to add author names to these IDs to aid in their differentiation. Thanks.
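For the "holes in their classification" part of this request, one possible check is to list every blank rank that sits above the lowest populated rank in a record. The sketch below assumes a flat dict per classification and a fixed six-rank list; both are simplifications for illustration and do not reflect how Arctos stores classification terms.

```python
# Simplified rank order, highest first (an assumption for this sketch).
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"]


def find_gaps(classification):
    """List blank ranks that sit above the lowest populated rank."""
    filled = [i for i, rank in enumerate(RANKS) if classification.get(rank)]
    if not filled:
        # Nothing populated: this is "no classification", not a gap.
        return []
    lowest = filled[-1]
    return [rank for rank in RANKS[:lowest] if not classification.get(rank)]


example = {"phylum": "Mollusca", "class": "Gastropoda", "genus": "Goniobasis"}
gaps = find_gaps(example)
print(gaps)
```

A record that stops at order reports no family gap under this rule, which matches the point made in the thread that higher taxa legitimately lack the lower ranks.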