Feature Request - attributes and search results options #6111
Comments
From the EH standpoint, the attributes that I use all the time and have posted on our website for search hints are:
Also useful or necessary from a legal/reporting standpoint are:
Sorry that I can't provide any "outside the box" suggestion, unless it's possible to correlate the collection designation with the attributes available to search. Seems silly that I should be able to search for things that are not available to my collection as per the attribute_type code table (e.g., why offer me axillary girth or gonad when I can't even add those attributes to my records!). |
I would think the download option would satisfy those needs better than anything we might do in a table, but please let me know if that's not correct, particularly if it's incorrect in some way which might suggest a general solution.
That's definitely in 'there's an API...' territory - profiles and such can be used to shape the initial form, but some user who really wants to search for motorcycles with gonad data will find a way to do so in the shared UI.
For the sake of clarity, this request only involves result columns. Searching the structured data performs well, the two views mentioned above provide full access to the data, this just concerns columns in the results table. |
Sorry, I misunderstood! That said, from my own standpoint, I (and my users) rarely download our search results - we view and manipulate results in the web view, not a CSV (mostly because there are no images in those files!). I do searches, customize my results (e.g. with value, culture of origin, culture of use, copyright status, NAGPRA category, etc.), sort the results, and print (on paper or to a PDF). (If I had some forms to export to, I might use the results table a bit differently. I wonder if I can schedule a consultation with someone who is an excellent form-creator?) |
I can get whatever you want in a table without worrying about any of the usual concerns of public forms - and then we can find someone to help with the glitter and shine. |
I'm not sure I understand exactly what you are suggesting. Eliminating the ability to search on different types of attributes? Eliminating the ability to add these attributes to results? Eliminating the saving of these attributes in FLAT? |
No, only eliminating the expensive/dynamic form of them. The full-data version is already cached, the download-table version is not related to this, this issue is a request to add. Definitely no.
No, the opposite: saving them in FLAT allows them to be added to results with near-zero back-end cost (but takes a bit of up-front planning, which I'm hoping to do here). For users, I think this would just mean that it would take a few days (at most, e.g. if it's used by millions of records - but those should all be caught here) to add some new attribute to results options. Once that's done, it would make them more accessible - I'd not need to restrict rowcount to protect Arctos. |
What does this look like in the results field?
Whatever we decide here, defaulting to whatever it is now. |
Are there mock-ups or specific suggestions? I know @AJLinn had some ideas on things that might help her. Did I get that right? |
Just turn some attributes on. It will be exactly that, unless I'm told something else.
"Don't" I suppose would be my suggestion at the moment - I keep hearing (and believing!) how metadata (method, determiner, etc.) is critically important, this is (probably) going to be bare values. If we all agree on that then this is easy.... |
For dynamic view: Column(s) for standard mammal measurements format; merging some values with details in methods (hind foot with claw for example, ear to notch etc) for RANGES Delete from the Results Profile the following once we have the above: |
column headers can use the standard conversion (Not Examined For --> not_examined_for), values can use standard concatattributevalue function (semicolon-separated value-only concatenation) @campmlc please confirm |
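To make the proposal above concrete, here is a minimal Python sketch. It is not Arctos code: `to_column_header` and `concat_attribute_values` are hypothetical names, the second only imitates the semicolon-separated, value-only behavior described for concatattributevalue, and the exact separator spacing and the sample values are assumptions.

```python
def to_column_header(attribute_type: str) -> str:
    """Lower-case the attribute type and join its words with underscores."""
    return "_".join(attribute_type.strip().lower().split())


def concat_attribute_values(values: list[str]) -> str:
    """Join bare values with '; ' (assumed separator), skipping empty ones."""
    return "; ".join(v.strip() for v in values if v and v.strip())


# Illustrative input only; real values would come from the attribute table.
header = to_column_header("Not Examined For")
cell = concat_attribute_values(["virus: Orthohantavirus", "", "ectoparasite"])
print(header, "=", cell)
```

Note that a value-only concatenation like this drops method, determiner, and the other metadata, which is the trade-off raised earlier in the thread.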
The suite of attributes for mammals which I think could be reduced to simple values (+units) for display in the search results include: sex, age class, total length, tail length, hind foot length, ear length, weight. The expansion of the reproductive data into many columns is also going to be an issue unfortunately because people are going to be working in those a lot as well during RANGES. |
In terms of removal of SNV data: Finally, please add a total records returned at the top of the search results like we used to have. This should give the full number of records a search should return, regardless of how many are going to be displayed or are possible based on the number of attributes included. |
There is never a question of access, only instant access - which can potentially be addressed right here. I think the biggest current downside to caching is that adding a bunch of stuff is going to increase our refresh lag time, and that's been 'emotional' a few times lately. Longer-term we may hit structural limitations, but I don't think we're there now. Attributes are expensive because they require a function call, but that's particularly expensive because some attributes are encumbered. #3536 (comment) (some new category of always-encumbered things) is my only idea for mitigating that. Anyway - access:
I'm relatively sure that's in "nothing a $100K/month AWS account won't fix" territory (and if I had the power to do that then all the rest of this would be moot). |
email request @jldunnum here's your data Results/tools/request data |
Thanks Dusty!
This is a good case study for record/attribute cost. In this query I can only see and work with the first 500 records of 7,970 records returned. We need to brainstorm and find options to be able to work with all the records in a query like this.
On Friday, May 19, 2023 at 4:13 PM, dustymc wrote:
email request @jldunnum here's your data
https://arctos.database.museum/search.cfm?attribute_type_1=examined%20for&attribute_value_1=virus%3A%20Orthohantavirus&country=Panama&sp=attrsandexam
Results/tools/request data
temp_catrecdata_FDBFCFBCF.csv.zip <https://github.com/ArctosDB/arctos/files/11520827/temp_catrecdata_FDBFCFBCF.csv.zip>
Again, immediately and in one particular way - you can see them and work with them, just not as columns in an immediately-available table. This issue is (among other things) an offer to cache stuff in that immediately-available table, so let me know what you want to see. (There's a proposal begging confirmation above.) Anything beyond that probably can't be shoved into the results table; the remainder of this issue is an attempt to discover what is possible and will work for y'all. |
Could we discuss this at the Arctos meeting Thursday? Not being able to query for null attributes, or download in an easily digestible format, will really hamper basic work and the RANGES workflow. If we can't get a simple table to download, perhaps there is some Excel macro that could turn the single-column attribute table into multiple columns? Surely this will be a thing that most Arctos users will need. |
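On the macro idea: a short script is one way to do the same reshaping outside Excel. The sketch below uses only the Python standard library and assumes the normalized attribute download has columns named `guid`, `attribute_type`, `attribute_value`, and `attribute_units`; those column names, the `pivot_attributes` helper, and the sample rows are all hypothetical, so adjust them to whatever the real file contains.

```python
import csv
import io
from collections import defaultdict


def pivot_attributes(rows):
    """Turn one-attribute-per-row records into one row per catalog record."""
    wide = defaultdict(lambda: defaultdict(list))
    for row in rows:
        value = row["attribute_value"]
        if row.get("attribute_units"):
            value = f"{value} {row['attribute_units']}"
        wide[row["guid"]][row["attribute_type"]].append(value)
    # One column per attribute type seen anywhere in the input.
    columns = sorted({t for attrs in wide.values() for t in attrs})
    table = []
    for guid, attrs in wide.items():
        record = {"guid": guid}
        for col in columns:
            # Repeated attributes on one record are joined with '; '.
            record[col] = "; ".join(attrs.get(col, []))
        table.append(record)
    return columns, table


# Made-up sample standing in for the downloaded CSV.
sample = io.StringIO(
    "guid,attribute_type,attribute_value,attribute_units\n"
    "EX:Mamm:1,sex,female,\n"
    "EX:Mamm:1,weight,21,g\n"
    "EX:Mamm:2,sex,male,\n"
)
columns, table = pivot_attributes(csv.DictReader(sample))
for record in table:
    print(record)
```

Records with no value for an attribute get an empty cell, so this also gives a simple way to spot null attributes after download.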
Sorta related because I would think it might affect priority: There are a bunch of attribute-related timeouts in the logs, I think I'm going to have to make dynamic attributes more expensive. |
Can we get a summary of where we are and what is needed on this thread? |
#6518 would be a really great time to do this - can we act on this, like NOW? @jldunnum @mkoo help! I don't think there's a useful summary, there are many important details above, but here's a try: Users would like to add some attribute_type-labeled columns consisting of attribute_value=attribute_units[ ; attribute_value=attribute_units] concatenations (like eg sex is currently) to the cache, there's not a great candidate list and what there is contains things that have been requested to change. Several days of CPU are required for such additions, that's been causing some trouble at TACC in addition to the usual of things not refreshing or loading during that time, it would be *very* good to do whatever needs done all at once, Friday is a rare opportunity to handle this during a scheduled rebuild. |
This is possibly/probably on the list of to-do's for the interface web dev. I am trying to get working on traits/relationships for Arctos, so use cases and what output will satisfy the majority of users is great! Let's put it on the agenda so I can have a clean list moving forward!
That would still likely involve the API, which is where the (maybe not aggressive enough) throttle is. |
Everyone to comment here what is needed - also the newly requested reproductive attributes, so wait until then. |
For host/parasite searches: |
Should we create a Google Sheet with the currently cached attributes and then a list of those that are requested for addition? |
See first comment |
Can we put that in the shared drive? Maybe into this document - https://docs.google.com/spreadsheets/d/1HsacHm8HNCLiCrjFp-hKctu2q64k9ekLQfi7hsACmPw/edit?usp=sharing |
put it where you want, just maybe edit my comment if the link has to change |
I can only view it and can't see the comments. |
OK - I copied that stuff to the file in the shared drive and changed the link in the first comment. It will be easier to find there. |
Verbatim host ID, Verbatim host sex, life stage and location in host are already in the list (Google Sheet) |
done |
EDITEDIT: Discussed again https://docs.google.com/document/d/1IilZDCgOV0JbYV87QRmosk3pmXRAK-M1B2iUnLhrc3Q/edit#heading=h.q3pk226foene, will be finalized by 23rd of October @ewommack to coordinate (I think!).
@Jegelewicz the spreadsheet allows anonymous write access so I don't know who requested these, but I question some things
Can someone justify those (and whatever I missed)? Again no technical barriers to including them, just don't seem very in line with what was most recently discussed, I wonder if those were even "us"??
EDIT: Important stuff first, here's what's needed to make this actionable:
END important stuff
Is your feature request related to a problem? Please describe.
Individual attributes are expensive and carry limited information. Find a better balance, somehow. First step is understanding what users need, so please comment!
Describe what you're trying to accomplish
Happy users and resilient Arctos.
Describe the solution you'd like
[Edit from linked issues - the original clearly wasn't a sufficient description of the situation, trying again]
(moved to top)
[END EDIT]
Describe alternatives you've considered
Note that attributedetail carries full attributes including metadata - it is not necessary (nor possible) to keep hundreds (and growing) of attributes as columns. Attributes considered for "columns" should be simple things that don't (much) need metadata, are (generally) in 1:1 relationships with catalog records, are expected by a large audience, etc. Sex (used by millions of records, expected by many vert-type users) for example is a good candidate. luster (used by one record, see below) is probably not.
Note also that a Download Attributes option exists, which provides full attribute information in a normalized (one attribute per row) tabular view.
If we do need something new in FLAT, it could be complex - eg some 'standard_mammal_attributes' column might provide the information needed in a better format, would eliminate ~5 individual columns, and would be about as CPU-expensive as the 5 columns.
Outside-the-box ideas are most welcome.
Additional context
Here are attributes by usage. We really need some sort of 'minimum usage' threshold or something, and several of the little-used things seem to be outright duplicates of other types ( year class===age class?? ). This should probably be a dedicated cleanup issue.
EDIT: spreadsheet: https://docs.google.com/spreadsheets/d/1HsacHm8HNCLiCrjFp-hKctu2q64k9ekLQfi7hsACmPw/edit?usp=sharing
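A 'minimum usage' threshold could be as simple as counting records per attribute type and keeping those at or above a cutoff. The sketch below is only an illustration: `attributes_above_threshold` is a hypothetical helper, the counts are toy numbers rather than real Arctos usage, and the cutoff itself is something that would still need to be agreed on.

```python
from collections import Counter


def attributes_above_threshold(attribute_types, min_records):
    """Return attribute types used by at least min_records records."""
    usage = Counter(attribute_types)
    return sorted(t for t, n in usage.items() if n >= min_records)


# Toy data: one list entry per record that carries the attribute.
observed = ["sex"] * 5 + ["age class"] * 3 + ["luster"]
keep = attributes_above_threshold(observed, min_records=2)
print(keep)
```

A count alone would not catch apparent duplicates such as year class and age class; those would still need manual review.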
Priority
Relatively high, there's some stability factor in this, but it's not (yet??) a panic situation. Needs reevaluated after the identifier-mess smoke clears.