Feature Request - attributes and search results options #6111

Closed
dustymc opened this issue Apr 8, 2023 · 46 comments
Labels
  • Blocked: Issue cannot be addressed until another Issue (which should be linked) is addressed.
  • Enhancement: I think this would make Arctos even awesomer!
  • Let's talk!: GH comments are exhausted; to be discussed in realtime

Comments

@dustymc
Contributor

dustymc commented Apr 8, 2023

EDITEDIT: Discussed again https://docs.google.com/document/d/1IilZDCgOV0JbYV87QRmosk3pmXRAK-M1B2iUnLhrc3Q/edit#heading=h.q3pk226foene, will be finalized by 23rd of October @ewommack to coordinate (I think!).

@Jegelewicz the spreadsheet allows anonymous write access so I don't know who requested these, but I question some things

  • reproductive data and unformatted measurements are free-text attributes which can be very limiting
  • crown-rump length is used by 0.0004% of records
  • forearm length is used more than I thought but still not very common

Can someone justify those (and whatever I missed)? Again no technical barriers to including them, just don't seem very in line with what was most recently discussed, I wonder if those were even "us"??


EDIT: Important stuff first, here's what's needed to make this actionable:

  • A precise description of what could be cached as columns, including sql-compliant headers/column names, for inclusion in search results, and/or
  • A precise description of what could be dynamically included, including sql-compliant headers/column names, for inclusion in search results, with the understanding that, with current resources, this will be extremely limited in live results, and will have significant limitations (hard to define, but something less than "all attributes for a large collection") in asynchronous data requests, and/or
  • Any other technically plausible proposal

END important stuff


Is your feature request related to a problem? Please describe.

Individual attributes are expensive and carry limited information. Find a better balance, somehow. First step is understanding what users need, so please comment!

Describe what you're trying to accomplish

Happy users and resilient Arctos.

Describe the solution you'd like

[Edit from linked issues - the original clearly wasn't a sufficient description of the situation, trying again]

(moved to top)

[END EDIT]

Describe alternatives you've considered

Note that attributedetail carries full attributes including metadata - it is not necessary (nor possible) to keep hundreds (and growing) of attributes as columns. Attributes considered for "columns" should be simple things that don't (much) need metadata, are (generally) in 1:1 relationships with catalog records, are expected by a large audience, etc. Sex (used by millions of records, expected by many vert-type users), for example, is a good candidate. luster (used by one record, see below) is probably not.

Note also that a Download Attributes option exists, which provides full attribute information in a normalized (one attribute per row) tabular view.

If we do need something new in FLAT, it could be complex - eg some 'standard_mammal_attributes' column might provide the information needed in a better format, would eliminate ~5 individual columns, and would be about as CPU-expensive as the 5 columns.

Outside-the-box ideas are most welcome.

Additional context

Here are attributes by usage. We really need some sort of 'minimum usage' threshold or something, and several of the little-used things seem to be outright duplicates of other types ( year class===age class?? ). This should probably be a dedicated cleanup issue.

EDIT: spreadsheet: https://docs.google.com/spreadsheets/d/1HsacHm8HNCLiCrjFp-hKctu2q64k9ekLQfi7hsACmPw/edit?usp=sharing
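
One way to picture a 'minimum usage' threshold is the rough sketch below. It is only an illustration: the function name and the cutoff of 1000 are hypothetical, and the counts are the ones quoted elsewhere in this thread (luster is described above as used by one record).

```python
def split_by_usage(usage: dict[str, int], minimum: int) -> tuple[list[str], list[str]]:
    """Split attribute types into (candidates, needs_review) by record count."""
    # Candidates meet the threshold, listed from most used to least used.
    keep = sorted((a for a, n in usage.items() if n >= minimum),
                  key=lambda a: -usage[a])
    # Everything below the threshold is flagged for cleanup review.
    review = sorted(a for a, n in usage.items() if n < minimum)
    return keep, review


# Counts as quoted in this thread; the threshold itself is an assumption.
usage = {
    "materials": 606032,
    "description": 415479,
    "NAGPRA category": 726,
    "luster": 1,
}
candidates, needs_review = split_by_usage(usage, minimum=1000)
```

A single numeric cutoff is probably too blunt on its own (a rarely used attribute can still be legally required), so this would only be a first pass before a human review.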

Priority

Relatively high, there's some stability factor in this, but it's not (yet??) a panic situation. Needs reevaluated after the identifier-mess smoke clears.

@dustymc added the Enhancement label (I think this would make Arctos even awesomer!) on Apr 8, 2023
@dustymc added this to the Needs Discussion milestone on Apr 8, 2023
@AJLinn

AJLinn commented Apr 8, 2023

From the EH standpoint, the attributes that I use all the time and have posted on our website for search hints are:

  • description | 415479
  • culture of origin | 204630

Also useful or necessary from a legal/reporting standpoint are:

  • value | 23798
  • culture of use | 15545
  • NAGPRA category | 726
  • copyright status | 4467
  • materials | 606032

Sorry that I can't provide any "outside the box" suggestion, unless it's possible to correlate the collection designation with the attributes available to search. Seems silly that I should be able to search for things that are not available to my collection as per the attribute_type code table (e.g., why offer me axillary girth or gonad when I can't even add those attributes to my records!).

@dustymc
Contributor Author

dustymc commented Apr 8, 2023

Also useful or necessary from a legal/reporting standpoint are:

I would think the download option would satisfy those needs better than anything we might do in a table, but please let me know if that's not correct, particularly if it's incorrect in some way which might suggest a general solution.

correlate the collection designation with the attributes available to search

That's definitely in 'there's an API...' territory - profiles and such can be used to shape the initial form, but some user who really wants to search for motorcycles with gonad data will find a way to do so in the shared UI.

search

For the sake of clarity, this request only involves result columns. Searching the structured data performs well, the two views mentioned above provide full access to the data, this just concerns columns in the results table.

@AJLinn

AJLinn commented Apr 8, 2023

For the sake of clarity, this request only involves result columns.

Sorry, I misunderstood!

That said, from my own standpoint, I (and my users) rarely download our search results - we view and manipulate results in the web view, not a csv (mostly because there are no images in those files!). I do searches, customize my results (e.g. with value, culture of origin, culture of use, copyright status, NAGPRA category, etc.), sort the results, and print (on paper or to a PDF).

(If I had some forms to export to, I might use the results table a bit differently. I wonder if I can schedule a consultation with someone who is an excellent form-creator?)

@dustymc
Contributor Author

dustymc commented Apr 9, 2023

schedule a consultation

Screenshot 2023-04-09 at 7 22 32 AM

Screenshot 2023-04-09 at 7 22 20 AM

https://github.com/ArctosDB/arctos/issues/new?assignees=lkvoong&labels=function-Reports&template=report-template-request.md&title=Arctos+Report+Template+Request

I can get whatever you want in a table without worrying about any of the usual concerns of public forms - and then we can find someone to help with the glitter and shine.

@ewommack

I'm not sure I understand exactly what you are suggesting. Eliminating the ability to search on different types of attributes? Eliminating the ability to add these attributes to results? Eliminating the saving of these attributes in FLAT?

@dustymc
Contributor Author

dustymc commented Apr 10, 2023

Eliminating the ability to search on different types of attributes?

No.

Eliminating the ability to add these attributes to results?

No, only eliminating the expensive/dynamic form of them. The full-data version is already cached, the download-table version is not related to this, this issue is a request to add. Definitely no.

Eliminating the saving of these attributes in FLAT?

No, the opposite: saving them in FLAT allows them to be added to results with near-zero back-end cost (but takes a bit of up-front planning, which I'm hoping to do here).

For users, I think this would just mean that it would take a few days (at most, eg if it's used by millions of records - but those should all be caught here) to add some new attribute to results options.

Once that's done, it would make them more accessible - I'd not need to restrict rowcount to protect Arctos.

@ewommack

What does this look like in the results field?

No, only eliminating the expensive/dynamic form of them

@dustymc
Contributor Author

dustymc commented Apr 10, 2023

What does this look like in the results field?

Whatever we decide here, defaulting to whatever it is now.

@ewommack

Whatever we decide here, defaulting to whatever it is now.

Are there mock-ups or specific suggestions? I know @AJLinn had some ideas on things that might help her. Did I get that right?

@dustymc
Contributor Author

dustymc commented Apr 10, 2023

mock-ups

Just turn some attributes on. It will be exactly that, unless I'm told something else.

Screenshot 2023-04-10 at 4 57 20 PM

specific suggestions

"Don't" I suppose would be my suggestion at the moment - I keep hearing (and believing!) how metadata (method, determiner, etc.) is critically important, this is (probably) going to be bare values. If we all agree on that then this is easy....

@dustymc changed the title from "Feature Request - remove individual attributes from search options" to "Feature Request - remove individual attributes from search results options" on Apr 21, 2023
@dustymc dustymc mentioned this issue May 17, 2023
4 tasks
@campmlc

campmlc commented May 18, 2023

For dynamic view:
For TPT and PICANTE
Column "Not Examined For" with values from Examined_Detected Code Table
Column: "Examined For" with values in the Examined_Detected Code Table
Column "Detected" ditto
Column "Not Detected" ditto

Column(s) for standard mammal measurements format; merging some values with details in methods (hind foot with claw for example, ear to notch etc) for RANGES
Column(s) for reproductive traits for RANGES

Delete from the Results Profile the following once we have the above:
"Endoparasite examination"
"Endoparasite detected"
"Ectoparasite examination"
"Ectoparasite detected"
"Examined for Parasites"
"Parasites Detected"

@dustymc
Contributor Author

dustymc commented May 18, 2023

Column "Not Examined For" with values from Examined_Detected Code Table
Column: "Examined For" with values in the Examined_Detected Code Table
Column "Detected" ditto
Column "Not Detected" ditto

column headers can use the standard conversion (Not Examined For --> not_examined_for), values can use standard concatattributevalue function (semicolon-separated value-only concatenation)
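
As a rough illustration of the two conventions just described, the sketch below mimics them in Python. It is not the Arctos implementation: `to_column_name` and `concat_values` are hypothetical stand-ins (the real concatattributevalue is a database function), and the "; " separator with a trailing space is an assumption.

```python
import re


def to_column_name(label: str) -> str:
    """Convert a display label to a lowercase, underscore-separated name."""
    # Replace each run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^0-9a-z]+", "_", label.strip().lower())
    return name.strip("_")


def concat_values(values: list[str], sep: str = "; ") -> str:
    """Join bare attribute values (no units, no metadata), skipping empties."""
    return sep.join(v for v in values if v)


header = to_column_name("Not Examined For")
cell = concat_values(["value one", "", "value two"])
```

Note what is lost in the cell: method, determiner, and date never appear, which is the trade-off raised earlier in this thread about bare values.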

@campmlc please confirm

@dustymc changed the title from "Feature Request - remove individual attributes from search results options" to "Feature Request - attributes and search results options" on May 19, 2023
@jldunnum

The suite of attributes for mammals which I think could be reduced to simple values (+units) for display in the search results include: sex, age class, total length, tail length, hind foot length, ear length, weight. The expansion of the reproductive data into many columns is also going to be an issue unfortunately because people are going to be working in those a lot as well during RANGES.
After that it would be necessary to be able to include one or two of the high cost attributes such as examined for or detected and still have access to a large number of records.

@jldunnum

In terms of removal of SNV data:
This attribute can be removed from all Panama specimens because I added the new values already.
All other instances of SNV results will need to be replaced with examined for and detected or not detected virus:Orthohantavirus based on the yes or no.
The downside here is that the SNV results attribute provided a positive or negative result in one attribute that could be displayed in results. Now we are switching and the results (detected or not detected virus:Orthohantavirus) are not options for display in results plus each attribute costs more so we get less record returns. That said, can they be added?

Finally, please add a total records returned at the top of the search results like we used to have. This should give the full number of records a search should return, regardless of how many are going to be displayed or are possible based on the number of attributes included.
Thanks much!!

@dustymc
Contributor Author

dustymc commented May 19, 2023

still have access

There is never a question of access, only instant access - which can potentially be addressed right here.

I think the biggest current downside to caching is that adding a bunch of stuff is going to increase our refresh lag time, and that's been 'emotional' a few times lately. Longer-term we may hit structural limitations, but I don't think we're there now.

Attributes are expensive because they require a function call, but that's particularly expensive because some attributes are encumbered. #3536 (comment) (some new category of always-encumbered things) is my only idea for mitigating that.

Anyway - access:

  • results column Attribute Detail contains all attribute data (as JSON)
  • Results/tools/attributes will provide attribute data (and see Feature Request - Attribute Download Enhancement #6223, I understand the need for a 'viewer' mode a bit more now).
  • Results/tools/request data will provide some stuff in flat-format, asynchronously
  • If this is some one-ish-off thing I can write SQL
  • Reports can do all sorts of things (but I got the idea that nobody believes that yesterday, I'm seriously not going to try to convince anyone that impossible things are possible....)
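
For the first option in the list above, here is a minimal sketch of collapsing attribute JSON into one short string per attribute type. The JSON layout is an assumption (an array of objects keyed by attribute_type, attribute_value, and attribute_units); the actual structure of the Attribute Detail column may differ, and `summarize_attributes` is a hypothetical name.

```python
import json


def summarize_attributes(attribute_detail: str) -> dict[str, str]:
    """Collapse attribute JSON into one 'value units' string per type."""
    records = json.loads(attribute_detail)  # assumed: a list of objects
    grouped: dict[str, list[str]] = {}
    for rec in records:
        value = str(rec["attribute_value"])
        units = rec.get("attribute_units")  # units are optional
        text = f"{value} {units}" if units else value
        grouped.setdefault(rec["attribute_type"], []).append(text)
    # Multiple values of the same type are joined with semicolons.
    return {k: "; ".join(v) for k, v in grouped.items()}


# Made-up sample data in the assumed layout.
sample = json.dumps([
    {"attribute_type": "sex", "attribute_value": "female"},
    {"attribute_type": "weight", "attribute_value": "21", "attribute_units": "g"},
    {"attribute_type": "weight", "attribute_value": "23", "attribute_units": "g"},
])
summary = summarize_attributes(sample)
```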

Finally, please add a total records returned at the top of the search results like we used to have. This should give the full number of records a search should return, regardless of how many are going to be displayed or are possible based on the number of attributes included.

I'm relatively sure that's in "nothing a $100K/month AWS account won't fix" territory (and if I had the power to do that then all the rest of this would be moot).

@dustymc
Contributor Author

dustymc commented May 19, 2023

@jldunnum

jldunnum commented May 22, 2023 via email

@dustymc
Contributor Author

dustymc commented May 22, 2023

I can only see and work with

Again, immediately and in one particular way - you can see them and work with them, just not as columns in an immediately-available table.

This issue is (among other things) an offer to cache stuff in that immediately-available table, so let me know what you want to see. (There's a proposal begging confirmation above.)

Anything beyond that probably can't be shoved into the results table; the remainder of this issue is an attempt to discover what is possible and will work for y'all.

@cjconroy

Could we discuss this at the Arctos meeting Thursday? Not being able to query for null attributes, or download in an easily digestible format, will really hamper basic work and the Ranges workflow. If we can't get a simple table to download, perhaps there is some Excel macro that could turn the single column attribute table into multiple columns? Surely this will be a thing that most Arctos users will need.
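
As an alternative to an Excel macro, the reshaping described here (one attribute per row into one column per attribute) can be sketched in a few lines of Python. The column names guid, attribute_type, and attribute_value are assumptions about the download layout, the sample rows are made up, and `pivot_attributes` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def pivot_attributes(rows, key="guid"):
    """Turn one-attribute-per-row records into one row per catalog record."""
    wide = defaultdict(lambda: defaultdict(list))
    columns = []  # attribute types, in order of first appearance
    for row in rows:
        attr = row["attribute_type"]
        if attr not in columns:
            columns.append(attr)
        wide[row[key]][attr].append(row["attribute_value"])
    result = []
    for record_id, attrs in wide.items():
        out = {key: record_id}
        for col in columns:
            # Missing attributes become empty cells, so gaps are easy to spot.
            out[col] = "; ".join(attrs.get(col, []))
        result.append(out)
    return result


# Made-up sample in the assumed layout of the attribute download.
sample = io.StringIO(
    "guid,attribute_type,attribute_value\n"
    "EX:Mamm:1,sex,female\n"
    "EX:Mamm:1,weight,21\n"
    "EX:Mamm:2,sex,male\n"
)
table = pivot_attributes(csv.DictReader(sample))
```

The resulting list of dictionaries can be written back out with `csv.DictWriter`. Records with no attributes at all never appear in a one-attribute-per-row file, so they would still need to be added from the original search results.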

@dustymc
Contributor Author

dustymc commented Jul 6, 2023

Sorta related because I would think it might affect priority: There are a bunch of attribute-related timeouts in the logs, I think I'm going to have to make dynamic attributes more expensive.

@campmlc

campmlc commented Jul 6, 2023

Can we get a summary of where we are and what is needed on this thread?
I think if we can work this out it would be a hugely useful alternative to JSON for an initial screen of records, e.g. "Attribute summary" vs "Attribute Detail".

@dustymc
Contributor Author

dustymc commented Jul 19, 2023

#6518 would be a really great time to do this - can we act on this, like NOW? @jldunnum @mkoo help!

I don't think there's a useful summary; there are many important details above, but here's a try. Users would like to add to the cache some attribute_type-labeled columns consisting of attribute_value=attribute_units[ ; attribute_value=attribute_units] concatenations (as eg sex currently is). There's not a great candidate list, and what there is contains things that have been requested to change. Such additions require several days of CPU, which has been causing some trouble at TACC in addition to the usual problem of things not refreshing or loading during that time. It would be very good to do whatever needs to be done all at once; Friday is a rare opportunity to handle this during a scheduled rebuild.

@mkoo
Member

mkoo commented Jul 19, 2023 via email

@dustymc
Contributor Author

dustymc commented Jul 19, 2023

interface web dev

That would still likely involve the API, which is where the (maybe not aggressive enough) throttle is.

@Jegelewicz
Member

Everyone to comment here what is needed - also the newly requested reproductive attributes, so wait until then.

@campmlc

campmlc commented Aug 14, 2023

For host/parasite searches:
Examined / Not Examined and Detected/ Not detected attributes
Individual count
Verbatim host ID
Verbatim host sex
Location in host
Life stage

@mkoo added the Let's talk! and Blocked labels and removed the Blocked label on Aug 24, 2023
@Jegelewicz
Member

Should we create a Google Sheet with the currently cached attributes and then a list of those that are requested for addition?

@dustymc
Contributor Author

dustymc commented Aug 24, 2023

@Jegelewicz
Member

Can we put that in the shared drive? Maybe into this document - https://docs.google.com/spreadsheets/d/1HsacHm8HNCLiCrjFp-hKctu2q64k9ekLQfi7hsACmPw/edit?usp=sharing

@dustymc
Contributor Author

dustymc commented Aug 24, 2023

put it where you want, just maybe edit my comment if the link has to change

@campmlc

campmlc commented Aug 24, 2023

I can only view and can't see the comments.

@Jegelewicz
Member

OK - I copied that stuff to the file in the shared drive and changed the link in the first comment. It will be easier to find there.

@Jegelewicz
Member

Jegelewicz commented Oct 5, 2023

Verbatim host ID, Verbatim host sex, life stage and location in host are already in the list (Google Sheet)

@dustymc
Contributor Author

dustymc commented Oct 27, 2023

done


8 participants