Force refresh of WoRMS (via Arctos) #3512
Comments
It should be understood that #3311 and talking directly to WoRMS' API are not incompatible. We can do both. #3311 could have two impacts on this.
The below should be set to refresh, that should happen in the next 800 minutes or so if nothing else pops up.
|
I'll check it later today. Thanks for the SQL. @Jegelewicz can we add this to our cheat sheet? @dustymc I'm not sure I understand what "translating" is so I leave that to your magic. |
I have code that tries to make WoRMS data align with https://arctos.database.museum/info/ctDocumentation.cfm?table=cttaxon_term. There are lots of gaps in various places in that process, and it's all hard-coded - when we add or change something the API call needs rebuilt, which often gets skipped. #3311 would (somehow) only require "our" terms for "locally-managed" taxonomy - it would stop you from typing "speeceez," but it would still accept that from "remote" sources, however we pull their data in. #3498 is a first-pass attempt at making those less-predictable data available from the catalog record. |
Added to cheat sheet as
|
@Jegelewicz Thanks for adding to the cheat sheet. It might be better understood as "refresh" as that's the (current) term. |
Changed! |
For clarity, that's just a select. Writing to cf_worms_refreshed calls for a refresh. These seem to have caught up. |
Sooooo - mere mortals can't really DO anything with it? |
So I just tried the SQL with Dicharax and it gave me a nice list of 73 records. I'll give them the same 800 minutes. But are you saying that I still have to manually refresh them or will this nudge them? |
Yup, no mortals. That's still just a select - nudging requires more access. That could probably be made more accessible, but it's just a symptom of some other problem so not sure how I feel about that.... Check back in 140 minutes. |
I think we're done here? |
No. We may be done with this specific issue since there doesn't seem to be a way to refresh more than one taxon name at a time. The bigger issue remains - WoRMS (via Arctos) is not what is in WoRMS (marinespecies.org) and what is in WoRMS (via Global Names) is not what is in WoRMS. Evidently we don't have the processing power to actually have the WoRMS database available within a reasonable time frame (1 week? 1 month?) in WoRMS (via Arctos) and using Global Names doesn't catch us up. I have the time to manage the taxonomy and it's not nearly as much work as before we added WoRMS (via Arctos), but we shouldn't promote Arctos as having the WoRMS taxonomy "built-in." Right now, for the genus Chamalycaeus, we have 47 names in the family Cyclophoridae (former), 59 in Alycaeidae (correct) and a total of 106. WoRMS has 225 because they include those with a subgenus and we don't get any of them. Global Names hasn't caught up with the change in Dicharax from Cyclophoridae to Alycaeidae. The last change was made in November 2020. If there's not much we can do about any of this, then, yes, we can close this issue. |
Thanks, reopening. I think much of this needs dedicated Issues - it's scattered around, we hit on symptoms here and there, but I don't think we really have a place to get at the core of the issue.
It's more than CPU - it needs time, probably ongoing.
A central question is if we can spend whatever we need on or through GN (which Arctos can easily talk to, and which would address related Issues in the future), or if we're just going to have to figure out how to maintain the connection ourselves, or ??? I don't think it's so much if but how we can best do this.
Set to refresh. (That could probably be an app, but again I'd really like to get at the core of the problem instead of just treating symptoms.)
There's an old issue that could be revived somewhere, from here this looks like a "research grade" problem - doing crazy "traditional" things to names and providing "research grade" data do not seem compatible to me. That doesn't mean we can't do something, but I'm not (yet, I hope) sure what that might be. "Get GlobalNames to figure it out" would be pretty cool.... |
@dustymc would it be possible to build a "little" WoRMS (via Arctos) widget so that people with manage_taxonomy could request a refresh in bulk, maybe at family or genus level? So Phyllis could just select: Please refresh all classifications in taxonomy source = WoRMS (via Arctos) with taxon_term = family and taxon_term_value = Taxon |
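A minimal sketch of the selection step described in the request above, assuming classification rows can be exported as simple records. The field names (`scientific_name`, `source`, `taxon_term`, `taxon_term_value`), the function name, and the sample rows are all hypothetical and are not Arctos's actual schema or code; only the source label, term and value concepts come from this thread.

```python
WORMS_SOURCE = "WoRMS (via Arctos)"

def names_to_refresh(rows, term, value, source=WORMS_SOURCE):
    """Return the sorted, de-duplicated names classified as term=value in `source`."""
    return sorted({
        row["scientific_name"]
        for row in rows
        if row["source"] == source
        and row["taxon_term"] == term
        and row["taxon_term_value"] == value
    })

# Invented sample rows, for illustration only (not real Arctos data).
sample_rows = [
    {"scientific_name": "Name A", "source": WORMS_SOURCE,
     "taxon_term": "family", "taxon_term_value": "Cyclophoridae"},
    {"scientific_name": "Name B", "source": WORMS_SOURCE,
     "taxon_term": "family", "taxon_term_value": "Alycaeidae"},
    {"scientific_name": "Name C", "source": "Arctos",
     "taxon_term": "family", "taxon_term_value": "Cyclophoridae"},
    {"scientific_name": "Name D", "source": WORMS_SOURCE,
     "taxon_term": "family", "taxon_term_value": "Cyclophoridae"},
]

# Names still filed under the old family in the WoRMS (via Arctos) source.
print(names_to_refresh(sample_rows, "family", "Cyclophoridae"))
```

Note that this kind of selection only finds names already carrying the given term and value in that source; names filed under some other value would need a different query.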
Yep that's "(That could probably be an app, but again I'd really like to get at the core of the problem instead of just treating symptoms.)" I think "core of the problem" is this: Best: Can we get WoRMS and GlobalNames to play nice? I know this is probably a daunting task, depending on what exactly "play nice" means to be useful, but it would make WoRMS data available to everyone (but again, maybe we are "everyone" as far as GN is concerned), make WoRMS work like everything else in Arctos, bla bla bla - good stuff. Not-so-best: If we're going with the "sustainable service" approach, I'd like to turn off the automation, or reduce the webservice to refreshing names from a list (which could be built via that new app). I suspect this is the most realistic, does that all sound right if so? I'm assuming the "get more resources" option is right out, but I'd really like to be wrong about that so if it's not please let me know! |
It isn't that they can't "play nice", it is that GN does not have the resources to do what we want. So
Might be a good choice. I will let @sharpphyl weigh in. |
Our collection would be well served if we only refreshed the phylum Mollusca - 148,813 names in WoRMS out of 576,574 names. We can manually update the few taxa we have in other phyla. Or update one class of the 8 in Mollusca each night on a rolling basis. That may not work for other collections using WoRMS (via Arctos) as their primary source. You know which collections would be impacted. I'll close #2808 if we can have the auto refresh function in days or weeks instead of many months. |
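The "one class each night on a rolling basis" idea above could be sketched roughly as follows. This is only an illustration of the rotation, not anything Arctos does; the function name and the placeholder class labels are assumptions, and the real list of classes would have to be supplied by whoever runs it.

```python
from datetime import date

def class_for_night(classes, night):
    """Pick which class to refresh on a given night by cycling through the list."""
    if not classes:
        raise ValueError("classes must not be empty")
    # The proleptic ordinal increases by one each day, so the index cycles.
    return classes[night.toordinal() % len(classes)]

# Placeholder labels standing in for the 8 classes mentioned above.
placeholder_classes = [f"class_{i}" for i in range(1, 9)]

tonight = date(2021, 3, 1)
print(class_for_night(placeholder_classes, tonight))
```

With 8 classes, each class comes around again every 8 nights, so any one name would be at most 8 days stale if each nightly batch finishes.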
That's probably less sustainable than what we're doing now!
I don't want to set up any unrealistic expectations. I'm proposing a form which accepts two inputs
That will flag all records
to refresh. If you try with something too big, it'll time out. I don't know what "too big" means, and whatever it is now it won't be the same later. (I suspect it'll generally be happy with a few tens of thousands.) The refresh won't work within any particular timeframe or in any particular order, and whatever that turns out to be initially, it'll probably change. (Realistically, this is probably at least tens of thousands per day but I won't know until I can get it unwrapped from the other stuff.) You will almost certainly be able to feed that faster than it can process, in which case it might just refresh the same small group of names over and over. (But probably not, just trying to get all the limitations spelled out.) I don't think any of that will be in any way limiting, I just don't want this to seem like maybe it's going to be something it's not actually going to be.
The core of this is just UI to #3512 (comment) - send me data. (And I think the dev will be pretty quick, but I've also got a long list of things that I have to prioritize when they become available, so I don't want to promise anything I can't deliver on.) |
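For a sense of scale, here is a back-of-the-envelope calculation using the name counts quoted earlier in this thread (148,813 Mollusca names out of 576,574 in WoRMS). The throughput of 20,000 names per day is purely an assumed figure picked from the "tens of thousands per day" guess above, not a measured rate, and the function name is hypothetical.

```python
import math

def days_to_refresh(name_count, names_per_day):
    """Whole days needed to refresh name_count names at a steady daily rate."""
    if names_per_day <= 0:
        raise ValueError("names_per_day must be positive")
    return math.ceil(name_count / names_per_day)

MOLLUSCA_NAMES = 148_813   # figure quoted in this thread
ALL_WORMS_NAMES = 576_574  # figure quoted in this thread
ASSUMED_RATE = 20_000      # hypothetical names per day, not measured

share = MOLLUSCA_NAMES / ALL_WORMS_NAMES
print(f"Mollusca share of WoRMS names: {share:.1%}")
print("Days for Mollusca:", days_to_refresh(MOLLUSCA_NAMES, ASSUMED_RATE))
print("Days for everything:", days_to_refresh(ALL_WORMS_NAMES, ASSUMED_RATE))
```

Under that assumed rate, Mollusca alone is roughly a week of processing and the full set roughly a month; the real numbers depend on a throughput nobody has measured yet.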
Sounds positive but leaves some questions. Here are some candidates for data to test. There is a new Family (rank) Chauvetiidae (term, aphiaID 1522424). I'm hoping refresh will update all the species within that family. It's tiny - only one genus Chauvetia and 54 species in this family. A bigger refresh would be the superfamily level (Buccinoidea) which contains Chauvetiidae and four other new families and lots of moves of genera from one family to another. The test should answer my question as to whether it can only refresh what's already in WoRMS (via Arctos) with an aphia ID or can it also add the new families and any other taxon names in that superfamily. For example, Retimohniidae (aphiaID 1522369) is another new family that I haven't yet added to WoRMS (via Arctos). I refreshed the genus Fusipagoda so it shows this as the family, but that family doesn't yet exist in WoRMS (via Arctos) so the higher classification is incomplete. Basically, how much manual maintenance will still need to be done or can the system find and update these terms within the term/rank we refresh? Let me know if you need more data to test. |
What I've proposed is by-record, entirely within Arctos.
That's the entirety of the names where "family=Chauvetiidae according to WoRMS (via Arctos)." If some other names should be included, then we need a different query to find them, or something we can use to find them would need bulkloaded, or SOME source of additional information would need created and/or identified.
And someone should add "Lachesis" to the Official Nonexistent List Of Homonyms... |
These are all the Chauvetia (with aphiaID) already in Arctos that need to be refreshed - 57 names less the four that I already refreshed. WoRMS shows 54 direct children of Chauvetia plus the subspecies, so I think we have them all already in Arctos. A few genera have been refreshed - maybe with the nightly upload. The remainder are probably still in Buccinidae. Do you want to try your refresh code with Buccinidae? It sounds like we would have to use the "old" family rather than the new family in the code since you're working "entirely within Arctos." Chauvetia | 137702 Chauvetia decorata | 138883 - OK The Lachesis in WoRMS is aphiaid: 512176, author: Risso, 1826 and it's invalid now per WoRMS: "invalid: junior homonym of Lachesis Daudin, 1803 [Reptilia]; Donovania and Syntagma are replacement names." I added the author to the Arctos metadata for clarity. If you limit a taxon search to WoRMS (via Arctos) source, then you don't get the reptile ones and same if you search for Current Reptile Classification source. Is there a better way to alert users of the (now invalid) homonym status? |
family = 'Buccinidae' is set to run, but it's still mixed in with the existing code so who knows what will happen or when. I really need a decision here.
Meh, call it "for amusement purposes." The system is broken, there are thousands of these things, but somehow I'm still surprised when I find a strange critter wearing a familiar name....
And I probably just un-did that by calling for a refresh, which starts by deleting everything. "Automatically maintained" needs clarified. (Or maybe you meant the "Arctos" classification? That's potentially useful, but it's a BIG step away for many users, given that tens of thousands of "alternate classifications" refer to wildly different things rather than alternate views.) Relationships are taxon-level and survive anything we'll ever automate - that's the best/only way to add clarity.
If anyone's lost here they don't need to be told they're lost, they need a path out (and relationships do that). This is a great example - Google thinks Donovania is an STD-causing bacteria and Syntagma is a linguistic unit. If Arctos doesn't help, that might be a hard dead end for the casual user. |
Yes, I added the author to the Arctos (valid) classification since it's being used by a reptile collection. The WoRMS (via Arctos) is purely historical and invalid. Is there some way to leave it in for information (e.g., the remark "invalid: junior homonym of Lachesis Daudin, 1803 [Reptilia]; Donovania and Syntagma are replacement names") but mark it as "unusable." Or is it better to delete the WoRMS (via Arctos) classification totally and add that comment to the Arctos classification? This doesn't solve the homonym issue if multiple collections want to use the same term with different classifications but in this case the homonymy has been resolved in favor of the reptiles. If someone wants to use the invalid marine invertebrate version, they could use a string. Is there something the Taxonomy Committee can do about it or is it an Arctos structural issue? |
I don't think it's either, it's a "taxonomy is broken, it's hard not to notice at scale" issue.
"Some committee says pretend those publications don't exist" won't ever seem entirely synonymous with "resolved" to me...
I suspect whoever holds the type material wouldn't agree with that, but either way works fine in Arctos. |
Are you saying @sharpphyl is demanding? :-) |
Who me? Yep. @dustymc THANK YOU!!! I made my first refresh request. Teinostoma has moved from Tornidae to Teinostomidae. Does it take several hours or days or minutes to do the refresh? Any size limit you recommend? I'll check back in the morning. Thanks again. |
WooHoo! It worked. Thanks. I think I'm the only one in the demanding category, so I'll close this issue. "Nevertheless, she demanded" t-shirts will be available soon. |
I'll buy one! |
Yes, maybe! Lots of variability in there and the server is being very polite and does not need a t-shirt at the moment, but I suspect it's still faster than you can click - at least over a period of weeks. (Scripts don't sleep much....)
Nope - if it doesn't eat your browser it should be fine, if it does just break it into chunks (or ping me). Bigger batches will take longer, but still shouldn't plug the toobs. |
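If a request does turn out to be too big, "break it into chunks" could look something like the sketch below. The chunk size of 25 is an arbitrary assumption (no actual limit is stated anywhere in this thread), and the list of names is a stand-in for whatever list is being submitted.

```python
def chunks(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]

# Stand-in for a list of 73 names, like the Dicharax result mentioned earlier.
names = [f"name_{i}" for i in range(73)]

for batch in chunks(names, 25):  # 25 is an arbitrary, assumed batch size
    print(len(batch), "names in this batch")
```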
Most of the time we need just small chunks so I doubt I'll challenge my browser, but I'll let you know if I set my computer on fire. And, yes, it's definitely much faster than I can click. |
@dustymc
Are you able to force a refresh of a genus or family in WoRMS (via Arctos) at my request? The genus Alycaeus
is in the family Alycaeidae. But an unknown portion of the WoRMS (via Arctos) classifications of this genus (and Dicharax, etc.) are still in Cyclophoridae. The only way I've been able to get them all in Alycaeidae is to manually refresh the ones I'm using which leaves the WoRMS (via Arctos) inconsistent. I didn't know it until I printed labels and they showed up in two different families.
Here's an example.
I don't think the hierarchical tool works for externally managed sources. Can I force a refresh of a family or genus with some type of taxon bulkload without having to manually enter all the aphiaIDs and names?
This relates to #3311 whether to use WoRMS (via Arctos) - slow to update - or WoRMS (via GlobalNames) - only updated every 60 days and cannot be updated directly from WoRMS with aphiaID. But in the meantime, it would be helpful to not have to manually refresh (and add) WoRMS names and classifications.