
Help needed in Arctos -- Specimens without locality #8235

Closed
WaigePilson opened this issue Oct 27, 2024 · 6 comments

@WaigePilson

I noticed that 3087 specimens in my collection (UWBM:PB) show no locality or collecting event in the catalog record search results.

  • In some cases (e.g., https://arctos.database.museum/guid/UWBM:PB:122800), this is because locality access is restricted to uwbm_pb users. However, even when logged in with uwbm_pb access, the locality does not show up in the catalog record search results, so you can't download specimen, collecting event, and locality data together. I don't think this is a bug, but it is annoying. Is it possible to add a feature to show the locality data when a user with access is logged in? Or @dustymc, are you able to just pull a report of the localities associated with the specimens in the attached .csv?
    uwbmpb_specimens without specloc.csv

  • In other cases (e.g., https://arctos.database.museum/guid/UWBM:PB:78163) the specimen truly has no locality or collecting event associated with it. I can fix this relatively easily, and will do so. But I thought it wasn't possible to add a catalog record without a collecting event and locality; is this a bug?

@dustymc
Contributor

dustymc commented Oct 28, 2024

show the locality data when a user with access is logged in?

Possible: Sure, nothing seems impossible to code here. A cost we can pay? Much less sure...

Right now FLAT uses getPrioritySpecimenEvent() (see #862, #1483), which excludes all data having attribute_type='locality access'. Changing that would mean two functions (a public and a private), maybe some sort of VPD adjustment, much more CPU to process filtered_flat, more "private" data to keep track of, more places for bugs to spawn, more attack surface, etc. I think it's probably a significant cost, so I'm going to pass it back to The Community (with a reference to https://github.com/ArctosDB/internal/issues/345) for guidance.
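To make the public/private split concrete, here is a toy Python model. Everything in it is hypothetical: the `SpecimenEvent` shape, treating a `locality access` attribute value as a role name, and "lowest priority number wins" are assumptions for illustration, not how Arctos's getPrioritySpecimenEvent() actually works. What it shows is that the private variant needs the viewer's roles as an input, so its answer can differ per user, while the public variant gives one answer for everybody.

```python
from dataclasses import dataclass, field


@dataclass
class SpecimenEvent:
    event_id: int
    priority: int  # hypothetical: lower number = preferred event
    locality_name: str
    attributes: dict = field(default_factory=dict)


def is_restricted(ev):
    # Hypothetical: any 'locality access' attribute marks the event as private.
    return "locality access" in ev.attributes


def priority_event_public(events):
    """Best event that anyone may see, or None."""
    visible = [e for e in events if not is_restricted(e)]
    return min(visible, key=lambda e: e.priority, default=None)


def priority_event_private(events, user_roles):
    """Best event this particular viewer may see, or None."""
    visible = [
        e for e in events
        # Restricted events are visible only if the viewer holds the named role.
        if not is_restricted(e) or e.attributes["locality access"] in user_roles
    ]
    return min(visible, key=lambda e: e.priority, default=None)
```

That per-viewer input is one plausible reason a private variant would be harder to precompute into a single shared, flattened table.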

report

Not starting from a CSV and not without specifics, but probably, I can write a report for about anything - or I suspect what you want is available from...

[Screenshot, 2024-10-28]

wasn't possible to add a catalog record without a collecting event and locality

https://docs.google.com/spreadsheets/d/1VbNC3k17WAHMum_qD5UYoXxUUWwXXh5gZSM5vfGvRzU/edit?gid=529334279#gid=529334279&range=89:89 is one piece of documentation which at least suggests that this is possible; I'm sure other documentation could use improvement. FWIW, this is one of the few places where I'm not sure that supporting NULL is better than the old "force 'em to say SOMETHING!" approach. Event type can probably be inferred from the material; geography (continents are MUCH more useful than nothing!) is likely better known at the time of cataloging than it will be in the future; and one can (assuming Dr. Hawking doesn't pop in with a conflicting opinion) be relatively sure the specimen wasn't collected tomorrow, a fact that loses precision with every passing "NO DATA" day.

dustymc added this to the Community Forum milestone Oct 28, 2024
@WaigePilson
Author

Not starting from a CSV and not without specifics, but probably, I can write a report for about anything - or I suspect what you want is available from...

This allows me to pull the locality data for specimens which I've pulled up in my search BUT there are two issues:

  1. I cannot figure out how to JUST search for catalog records WITHOUT locality/event: I have tried entering a list of these 3086 catalog numbers, but that is too long for Arctos to accept as a search parameter. I cannot figure out how to search for records with a null value in a field. Maybe I'm just missing something?
  2. Even if I could pull just the 3086 catalog records I am interested in, and pull up the related events or localities, this would then give me a list of all localities associated with those 3086 specimens, but not tell me which of the 3086 specimens is associated with which locality. My goal is to have a list of the 3086 specimens, and the events or localities associated with them, so that I can figure out which specimens have localities (but the access is just set to uwbm_pb) and which did not properly get events/localities associated.

Is there another way? Just let me know what you would need from me and I'll get it!
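Once a two-column export exists (one row per GUID/locality pair), both problems above reduce to a grouping and a set difference that can be done offline. A minimal Python sketch, assuming header columns named `guid` and `locality_name` (assumed names; a real export may label them differently):

```python
import csv
from collections import defaultdict


def read_rows(fileobj):
    """Read an export whose header has 'guid' and 'locality_name' columns (assumed names)."""
    return list(csv.DictReader(fileobj))


def localities_by_guid(rows):
    """Group one-row-per-pair data into {guid: sorted list of distinct locality names}."""
    grouped = defaultdict(set)
    for row in rows:
        name = (row.get("locality_name") or "").strip()
        if name:  # a blank locality cell counts as "no locality"
            grouped[row["guid"]].add(name)
    return {guid: sorted(names) for guid, names in grouped.items()}


def partition(wanted_guids, rows):
    """Split the specimens of interest into (has at least one locality, has none)."""
    found = localities_by_guid(rows)
    with_locality = {g: found[g] for g in wanted_guids if g in found}
    without_locality = [g for g in wanted_guids if g not in found]
    return with_locality, without_locality
```

For example, `with open("export.csv", newline="") as f: rows = read_rows(f)` (hypothetical filename), then `partition(my_guids, rows)` returns the specimens that have localities (with their names) and the list of those that have none.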

https://docs.google.com/spreadsheets/d/1VbNC3k17WAHMum_qD5UYoXxUUWwXXh5gZSM5vfGvRzU/edit?gid=529334279#gid=529334279&range=89:89 is one piece of documentation which at least suggests that this is possible

I think this explains it. I must have made a mistake in one of my bulk import spreadsheets while moving legacy data over and forgotten to include the record event type. And now I'm paying a huge price for it...

@jessicatir

https://docs.google.com/spreadsheets/d/1VbNC3k17WAHMum_qD5UYoXxUUWwXXh5gZSM5vfGvRzU/edit?gid=529334279#gid=529334279&range=89:89 is one piece of documentation which at least suggests that this is possible

I think this explains it. I must have made a mistake in one of my bulk import spreadsheets while moving legacy data over and forgotten to include the record event type. And now I'm paying a huge price for it...

I had the same experience last week - left this blank and ended up with specimens that didn't have event info. Is it possible to add an error message/warning if record_event_type is blank but record event info is provided?

@dustymc
Contributor

dustymc commented Oct 28, 2024

search for catalog records WITHOUT locality/event:

That's actionable, ArctosDB/dev#88

specimens is associated with which locality ... what you would need from me

Let's start with SQL and see where we end up. I have three problems, all pretty much related to the scale at which I work and hopefully all irrelevant for you:

  1. "specimens" can involve an infinite number of columns.
  2. events can involve an infinite number of columns.
  3. localities can involve an infinite number of columns.

That is, a record can have 347 record-events (along with everything else it can have and everything those dependencies can have, etc.), and each of those can have 97432 event attributes and 349738 locality attributes, and I think you're asking for a single row in a spreadsheet. If you can somehow squish that triple infinity of potential complexity down into a finite number of columns in a spreadsheet-like structure (or the start of one; this can evolve and, unlike the Arctos UI, doesn't have to anticipate all use cases), I can pull some data.
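One common way to squish unbounded one-to-many data into a finite spreadsheet is to fix the column list up front and collapse repeats into a single delimited cell. A minimal Python sketch of that approach, assuming a hypothetical input of (guid, event) pairs where each event is a dict with a `locality_name` and an `attributes` dict; which attribute types become columns is the caller's choice:

```python
from collections import defaultdict


def flatten(records, attribute_types, sep="; "):
    """Collapse (guid, event) pairs into one row per guid with a fixed set of columns.

    Each event is assumed (hypothetically) to be a dict with a 'locality_name'
    string and an 'attributes' dict of attribute_type -> value.
    """
    grouped = defaultdict(list)
    for guid, event in records:
        grouped[guid].append(event)

    rows = []
    for guid, events in grouped.items():
        row = {
            "guid": guid,
            "event_count": len(events),
            # Distinct locality names, sorted so the output is stable
            "locality_names": sep.join(sorted({e["locality_name"] for e in events})),
        }
        # One column per requested attribute type, no matter how many events
        # exist; multiple values are joined into a single cell.
        for attr in attribute_types:
            values = sorted({e["attributes"][attr] for e in events if attr in e["attributes"]})
            row[attr] = sep.join(values)
        rows.append(row)
    return rows
```

The trade-off is that a cell like `L1; L2` no longer says which value belongs to which event; keeping that link would need one row per event rather than one row per record.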

error message/warning

With the current UI and the current check: nope, it's pass or fail.

Going forward, there are a few options:

  1. Change the rules (e.g., see the last comment in Help needed in Arctos -- Specimens without locality #8235 (comment)). I'm not sure this is a good idea (but also can't really enforce "at least one" - but also-also I could for entry).
  2. https://github.com/orgs/ArctosDB/discussions/8052 - there's been inconclusive discussion about having more than one entry form; possibly it/they could have baked-in rules that differ from the global ones.
  3. data entry collection-specific requirements #4384 - there have been also-inconclusive discussions about adding 'when collection then....' types of rules, this would fit within that (but possibly isn't appropriate for your entire collection).
  4. Dynamically Highlight Required Data Entry Fields #7560 is a request for UI rules (which I think would not have picked up on what seems to have happened - considering the scope within which we focus our always-limited resources seems important)
  5. Data entry has been an API for a very long time (before ColdFusion!); anyone can write their own application with their own rules.
  6. Postgres is pretty awesome; I actually could add variable states (e.g., error vs. warning) to the check, but I have no idea what that would look like in a bulk environment. It would open up a significant UI problem which I'd need help solving.

All of those still individually look (somewhat) approachable, but I think bits and pieces (e.g., lots of forms, all with increased complexity, all maintained by me) would quickly overwhelm available resources (e.g., I'll run off screaming - but I think @mkoo has some unfocused resources at the moment...).
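As a rough illustration of a check with more than one severity (option 6) applied to the `record_event_type` case raised above, here is a Python sketch. The column names other than `record_event_type` (`cat_num`, `verbatim_locality`, `verbatim_date`, `locality_name`) and both rules are hypothetical; this does not reflect the actual Arctos bulkloader checks.

```python
from dataclasses import dataclass
from typing import Literal

Severity = Literal["error", "warning"]

# Hypothetical columns that carry record-event data in a bulk-load row.
EVENT_FIELDS = ("verbatim_locality", "verbatim_date", "locality_name")


@dataclass
class Finding:
    row_index: int
    severity: Severity
    message: str


def check_row(index, row):
    """Return findings for one bulk-load row (a dict of column -> string)."""
    def filled(key):
        return bool((row.get(key) or "").strip())

    findings = []
    if not filled("cat_num"):
        # Hard failure: the row cannot be loaded at all.
        findings.append(Finding(index, "error", "cat_num is required"))
    if not filled("record_event_type") and any(filled(k) for k in EVENT_FIELDS):
        # Soft failure: the row can load, but its event data may be dropped.
        findings.append(Finding(index, "warning",
                                "event data supplied but record_event_type is blank"))
    return findings


def triage(rows):
    """Return (loadable row indices, rejected row indices, all findings)."""
    findings = [f for i, row in enumerate(rows) for f in check_row(i, row)]
    rejected = sorted({f.row_index for f in findings if f.severity == "error"})
    loadable = [i for i in range(len(rows)) if i not in rejected]
    return loadable, rejected, findings
```

Errors block a row; warnings let it load but get reported. How to present hundreds of warnings from one bulk load is the open UI question from option 6, and this sketch sidesteps it by simply returning the list.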

@WaigePilson
Author

Let's start with SQL and see where we end up.

Specifically, I need some way to connect:
Catalog number (or GUID) and all locality_name(s) associated with it

So just two data fields; I don't need any of the other specimen, locality, or event data. I can pull any other data I need to add these collecting events which didn't properly upload. I just need to figure out which specimens did successfully get collecting events added and which did not.

@dustymc
Contributor

dustymc commented Oct 28, 2024

two data fields

Here you go:

temp_uwbmpbloc.csv.zip


4 participants