report needed: taxa used in IDs by a collection which do not have a preferred classification #1894
It should work there without the trailing semicolon, and you'll need to put in your actual GUID_Prefix.
Thanks. It worked.
@dustymc is the above possible?
The error is from the semicolon and the next line. Here's the code by itself
It finds nothing as of now. Yes, the purpose of this issue is to define and prioritize a report.
Just turn that code into a tool in Low Quality Data?
This report could use a modification or an additional report. It currently reports 779 taxa without a preferred classification as low quality data. Actually, these are the 779 taxa that are not in WoRMS (via Arctos), so they fall back to our second source, Arctos. Most of them are fossils or terrestrial taxa that WoRMS does not include. This is helpful for knowing how many taxa use Arctos classifications. For the low quality data report, can we modify this to look for classifications from all listed sources before reporting that there is no preferred classification?
From #3541: Suggest we change this to "missing from FLAT," which
Sorry, but what does "missing from FLAT" mean? Will the report show a list of taxa used in IDs that have no (preferred) classification?
A classification might include a remark and nothing else; that would not be detected in a "taxon has classification" test. Etc., etc., etc. - "something there" is what I could potentially detect, that's it, and we know that plenty of classifications do in fact have problems. "Something in flat" is essentially "has a classification, and it does something useful."
Finds nothing - all of your IDs found something to use for phylum, yay you!
For those, the scripts have gone through your preferred sources and couldn't find a family value. If any of those are families or below, there's something wrong with all of your preferred sources. If they're not (I think not?) then your preferred classifications supply something useful for all of your records.
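A minimal sketch of the "missing from FLAT" check, run against an in-memory SQLite stand-in so it is self-contained. The `flat` table and its columns here are a hypothetical, cut-down version of FLAT, not the real Arctos definition. Grouping by `scientific_name` returns the names themselves rather than just a count, and the GUID prefix is passed as a bound parameter so it doesn't have to be hand-edited into the query text.

```python
import sqlite3

# Hypothetical, cut-down stand-in for FLAT: one row per cataloged item,
# with rank columns precomputed from the collection's preferred sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flat (
    guid TEXT,
    guid_prefix TEXT,
    scientific_name TEXT,
    phylum TEXT,
    family TEXT
);
INSERT INTO flat VALUES
    ('ALMNH:EH:1', 'ALMNH:EH', 'Aus bus', 'Mollusca', 'Ausidae'),
    ('ALMNH:EH:2', 'ALMNH:EH', 'Cus dus', NULL, NULL),
    ('ALMNH:EH:3', 'ALMNH:EH', 'Cus dus', NULL, NULL),
    ('DMNS:Inv:1', 'DMNS:Inv', 'Eus fus', NULL, NULL);
""")

def names_missing_phylum(guid_prefix):
    """Distinct names (with record counts) whose FLAT rows have no phylum."""
    return conn.execute(
        """
        SELECT scientific_name, COUNT(*) AS records
        FROM flat
        WHERE guid_prefix = ? AND phylum IS NULL
        GROUP BY scientific_name
        ORDER BY scientific_name
        """,
        (guid_prefix,),
    ).fetchall()

print(names_missing_phylum("ALMNH:EH"))  # [('Cus dus', 2)]
```

Swapping `phylum` for `family` gives the family version of the same check.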
That helps. I always just search the collections periodically for records where the Kingdom (or phylum, etc.) is NULL, since SQL isn't my first language. It sounds like that's the best way for me to continue to look for problem records. Yes, the ones with a NULL family are known to be that way; we can't ID them to any lower level at this time. Also, this appears in my dashboard as bare_names. Will your "missing from flat" approach remove those from my problem list?

`DMNS:Inv | bare_names | #1894 | 780 taxa are used by DMNS:Inv and do not have a preferred classification. | {ts '2021-04-07 00:00:00'}`
That's the same data as what I'm thinking of, just from a more taxa-centric viewpoint. Yes, the Dashboard-->Bare Names thing needs to be rewritten; it's still working under the assumption that a collection will prefer exactly one Source. Some of those are probably a legacy of that, and some may be names used in nonaccepted IDs. The classifications attached to those don't really DO anything, so I don't think that's a problem; if it is, maybe it's at least a rare (or strange) enough problem that it can be dealt with via SQL on a case-by-case basis.
@dustymc I think we can close this, assuming that you have a rewrite of the Dashboard "Bare Names" (and the same for the cheat sheet) on your To Do list. I'll leave closing it up to you.
@dustymc the link for taxa without a classification in the something random thingee goes to this issue, but I think it should now go here - https://github.com/ArctosDB/documentation-wiki/blob/gh-pages/_sql_cheats/taxa_without_classification.markdown
@dustymc the SQL to find the taxa with "bare names" doesn't work as I planned. Let's look at ALMNH:EH. When I use the code at the top of this issue, I get:
I figured that was because there is no longer a single "preferred" classification. So I modified the SQL a bit, and I thought it was working based on what I did last night, but when I run the revised SQL (which is here), I get no names. Can you help make the SQL better?
I can rewrite the query (I think!), but it would be very expensive and I don't think it can be made to run in the UI. I suggest checking computed terms in flat - #1894 (comment)
Well, how does Arctos know there are two names with no classification? And if it knows that, why can't it tell me what they are? If we can't make it easy for people to fix "low quality data," we might as well not tell them there is any.
Slowly. FLAT exists so the expensive computations don't have to happen at runtime. "Has a classification" and "has a useful classification" are wildly different things. Flat gets at "useful" (e.g., has expected ranks), along with performing reasonably well. I don't understand the complaint - this is a more informative approach, yay everybody?! #1894 (comment)
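To illustrate the difference between "has a classification" and "has a useful classification," here is a sketch using a hypothetical one-row-per-term `taxon_term` table in SQLite (the table, its columns, and the rank list are assumptions, not the Arctos schema). A name whose only term is a remark passes the first test but fails the second.

```python
import sqlite3

# Hypothetical, simplified schema: one row per term in a classification.
# term_type is either a rank (e.g. 'phylum', 'family') or a non-rank
# field such as 'remark'. Names are assumptions, not Arctos's real DDL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE taxon_term (
    scientific_name TEXT,
    source TEXT,
    term_type TEXT,
    term TEXT
);
INSERT INTO taxon_term VALUES
    ('Aus bus', 'WoRMS', 'phylum', 'Mollusca'),
    ('Aus bus', 'WoRMS', 'family', 'Ausidae'),
    ('Cus dus', 'Arctos', 'remark', 'needs review');
""")

# Ranks treated as "useful" for this check (an assumed list).
RANKS = ("kingdom", "phylum", "class", "order", "family")

def classification_status(name):
    """Return (has_any_terms, has_rank_terms) for one name."""
    has_any = conn.execute(
        "SELECT COUNT(*) FROM taxon_term WHERE scientific_name = ?", (name,)
    ).fetchone()[0] > 0
    placeholders = ",".join("?" * len(RANKS))
    has_rank = conn.execute(
        f"SELECT COUNT(*) FROM taxon_term WHERE scientific_name = ? "
        f"AND term_type IN ({placeholders})",
        (name, *RANKS),
    ).fetchone()[0] > 0
    return has_any, has_rank

print(classification_status("Aus bus"))  # (True, True)
print(classification_status("Cus dus"))  # (True, False)
```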
But the query above only looks for "phylum"? Nobody has time to check all the potential levels of every possible classification? There are plenty of classifications that are missing "phylum"... |
Keep a list as it goes? Is that asking too much?
This approach just checks things that are used; I don't think there can be anything cleaner.
Some of them should be, some should not. Anyone running this probably recognizes the names that should have some term-at-rank but do not. That information isn't available in any other approach. I still obviously don't understand the problem.
Huh?
Then why mention family?
Family is just one term/rank that some collections/disciplines seem to care about. I'd just replace family with other flat-terms and run multiple queries, but I'm not sure I understand what you're asking for here so that may not be useful. What precisely are the goals? This will get at (or very near) "no classification data" if that's the question.
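A sketch of running the same check once per flat term, again against a hypothetical SQLite stand-in for FLAT (table and column names are assumptions). Column names can't be bound as SQL parameters, so the sketch interpolates only names from a fixed allow-list; which terms belong on that list would depend on what a given collection cares about.

```python
import sqlite3

# Hypothetical stand-in for FLAT with a few rank columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flat (
    guid_prefix TEXT,
    scientific_name TEXT,
    phylum TEXT,
    class TEXT,
    family TEXT
);
INSERT INTO flat VALUES
    ('ALMNH:EH', 'Aus bus', 'Mollusca', 'Gastropoda', 'Ausidae'),
    ('ALMNH:EH', 'Cus dus', 'Mollusca', 'Gastropoda', NULL),
    ('ALMNH:EH', 'Eus fus', NULL, NULL, NULL);
""")

# Column names cannot be bound as SQL parameters, so only names from this
# fixed allow-list are ever interpolated into the query text.
FLAT_TERMS = ("phylum", "class", "family")

def names_missing_terms(guid_prefix, terms=FLAT_TERMS):
    """Map each flat term to the sorted names that lack it in this collection."""
    report = {}
    for term in terms:
        if term not in FLAT_TERMS:
            raise ValueError(f"unsupported term: {term}")
        rows = conn.execute(
            f'SELECT DISTINCT scientific_name FROM flat '
            f'WHERE guid_prefix = ? AND "{term}" IS NULL '
            f'ORDER BY scientific_name',
            (guid_prefix,),
        ).fetchall()
        report[term] = [name for (name,) in rows]
    return report

print(names_missing_terms("ALMNH:EH"))
# {'phylum': ['Eus fus'], 'class': ['Eus fus'], 'family': ['Cus dus', 'Eus fus']}
```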
That is way too much work to find two names with missing classifications.
The problem is that I am being told there are two names used as identifications in my collection (ALMNH:EH) but I have NO WAY to figure out what they are....
That works for this, and I think it would be good for most cases. I'll play with it and see if anything needs to be added. Thanks!
Aha! That's where we should be starting - that report is almost certainly not working correctly (#1894 (comment)) - so 2 questions
Just what it says: "bare names" are names used as identifications that have no classification in any of the collection's preferred sources.
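A sketch of that definition in SQL, wrapped in Python's `sqlite3` so it runs on its own. The tables `collection_source`, `id_taxon`, and `taxon_term` are hypothetical simplifications (the real Arctos schema differs). The key part is the `NOT EXISTS` over *all* of a collection's preferred sources, so a name classified only in the collection's second source (e.g., Arctos for fossils that WoRMS does not include) is not flagged.

```python
import sqlite3

# Hypothetical tables (names are assumptions, not Arctos's real schema):
#   collection_source: the classification sources a collection prefers,
#                      possibly several (e.g. WoRMS, then Arctos).
#   id_taxon:          names used in identifications, per collection.
#   taxon_term:        classification terms, per name and source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection_source (guid_prefix TEXT, source TEXT);
CREATE TABLE id_taxon (guid_prefix TEXT, scientific_name TEXT);
CREATE TABLE taxon_term (scientific_name TEXT, source TEXT, term_type TEXT, term TEXT);

INSERT INTO collection_source VALUES ('DMNS:Inv', 'WoRMS'), ('DMNS:Inv', 'Arctos');
INSERT INTO id_taxon VALUES
    ('DMNS:Inv', 'Aus bus'),   -- classified in WoRMS
    ('DMNS:Inv', 'Cus dus'),   -- classified only in Arctos
    ('DMNS:Inv', 'Eus fus');   -- only in a source the collection does not use
INSERT INTO taxon_term VALUES
    ('Aus bus', 'WoRMS', 'family', 'Ausidae'),
    ('Cus dus', 'Arctos', 'family', 'Cusidae'),
    ('Eus fus', 'Other', 'family', 'Eusidae');
""")

def bare_names(guid_prefix):
    """Names used in this collection's IDs with no terms in ANY preferred source."""
    return [row[0] for row in conn.execute(
        """
        SELECT DISTINCT i.scientific_name
        FROM id_taxon i
        WHERE i.guid_prefix = ?
          AND NOT EXISTS (
              SELECT 1
              FROM taxon_term t
              JOIN collection_source c ON c.source = t.source
              WHERE t.scientific_name = i.scientific_name
                AND c.guid_prefix = i.guid_prefix
          )
        ORDER BY i.scientific_name
        """,
        (guid_prefix,),
    )]

print(bare_names("DMNS:Inv"))  # ['Eus fus']
```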
for #1419
sql: