Higher Geog fix work (formerly Many islands lack country) #7660

Closed
cjconroy opened this issue Apr 11, 2024 · 16 comments

Comments

@cjconroy

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Describe the bug
If I query Arctos for country = NULL, in MVZ I get 8763 records in our voucher collection. Many of these are unused catalog numbers or truly pelagic records. But if you map them, you see that thousands are plotted, many of them on nearshore islands off Mexico and Alaska.
I recall this was a GitHub issue at some point, but I have not found it. What is the status of fixing this? Anyone who queries for country = Mexico will not get these island records.

To Reproduce
Steps to reproduce the behavior:
query for country = NULL

Expected behavior
Islands that are in a country should be findable by that country.

Screenshots
attached

Priority
Kinda high since users are potentially not seeing all of our records.

[Screenshots: world map, country = NULL; Baja, country = NULL; Baja, country = Mexico]
@cjconroy cjconroy added the Bug Arctos is not performing as it should. label Apr 11, 2024
@amgunderson
Contributor

amgunderson commented Apr 11, 2024

The Aleutian Islands are a total mess with this issue. Country=NULL includes many thousands of specimens georeferenced over water, some very near land, like this one: https://arctos.database.museum/guid/UAM:Mamm:113877. Some specimens are collected in international waters, where country=NULL is valid, but those are a tiny minority of the specimens Arctos classifies as country=NULL.

@mkoo
Member

mkoo commented Apr 13, 2024

I agree we have to fix this, probably island by island and group by group. I think I'll start with a little GIS work to identify each country's EEZ versus international waters, and of course consult with collections. I think the new loc_attribute for waterbody will still allow association with specific seas and oceans, but many collections will want the primary HG to be the country. This is on the Geog committee project now.

@DerekSikes

country + state are 100% expected for all the records I manage. I have many searches that are limited by asserted state = Alaska and would be most unhappy if some were missed because of this.

@dustymc
Contributor

dustymc commented Apr 15, 2024

this

If "this" involves geography and you think something should be different, #7666.

@mkoo
Member

mkoo commented Jun 5, 2024

We're starting work on this at MVZ, specifically the locs with "North Pacific Ocean, Bering Sea" as HG. The Aleutians are firmly part of US:AK, so we're starting there by changing HG to US:AK. We're still fixing and working out how best to do this in bulk, so I'll keep you posted, Dusty, if we need help.

This is tied to the new loc attribute of "waterbody" ArctosDB/code-table-work#83

@amgunderson
Contributor

amgunderson commented Jun 5, 2024

USGS has boundary shapefiles, https://www.sciencebase.gov/catalog/item/59d5b565e4b05fe04cc53a91, that look fully inclusive of all islands and surrounding waters in AK at least. Can all localities falling within the Alaska boundary be given Country=USA and State=AK ?
I am not interested in a long process of overhauling geography. Can we not just fix the localities that are not found when searching by state and country?
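
The boundary-based assignment suggested above could be sketched as a point-in-polygon test. This is a minimal illustration in Python, not the workflow actually used: the `TOY_BOUNDARIES` box, the `assign_higher_geog` helper, and the "United States:Alaska" label are hypothetical stand-ins. A real run would read the USGS shapefile with a GIS library and would need to handle the Aleutians crossing the 180° meridian, which this sketch does not.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude) in decimal degrees
Ring = List[Point]            # polygon vertices in order; last edge closes the ring


def point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray-casting test: True if pt falls inside the polygon ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_higher_geog(pt: Point, boundaries: Dict[str, Ring]) -> Optional[str]:
    """Return the first boundary label containing pt, or None (e.g. open ocean)."""
    for label, ring in boundaries.items():
        if point_in_ring(pt, ring):
            return label
    return None


# Hypothetical, grossly simplified rectangle; NOT the real Alaska boundary.
TOY_BOUNDARIES: Dict[str, Ring] = {
    "United States:Alaska": [
        (-170.0, 51.0), (-130.0, 51.0), (-130.0, 72.0), (-170.0, 72.0),
    ],
}

if __name__ == "__main__":
    print(assign_higher_geog((-166.5, 53.9), TOY_BOUNDARIES))
    print(assign_higher_geog((-150.0, 40.0), TOY_BOUNDARIES))
```

Localities that return `None` would stay as they are, which keeps the change limited to records that fall inside a known boundary.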

@mkoo
Member

mkoo commented Jun 5, 2024

Thanks Aren-- looks like I have your ok to do this for locs in AK waters, so I'm going to go ahead. The shapefile will be helpful!

So far these are mostly MVZ and UAM records, and agreed, this is just a clean-up task.

@mkoo mkoo removed this from the Needs Discussion milestone Jun 6, 2024
@mkoo mkoo changed the title from "Many islands lack country" to "Higher Geog fix work (formerly Many islands lack country)" Jun 10, 2024
@mkoo
Member

mkoo commented Jun 10, 2024

@dustymc I'm starting a spreadsheet for a bulk update where we can change the HG and the spec_loc. What are the minimum fields needed for that?

  • COLLECTING_EVENT_ID
  • HIGHER_GEOG
  • new_HIGHER_GEOG
  • SPEC_LOCALITY
  • new_SPEC_LOCALITY

Do you need/want localityID? anything else? less or more? Our current working spreadsheet has a lot more since we are verifying with verbatim and locality attributes before any updates. For most we are returning to the verbatim locality but cleaning as needed.

@dustymc
Contributor

dustymc commented Jun 10, 2024

COLLECTING_EVENT_ID

Assume those will be gone/merged before you're done typing (because they probably will be).

localityID

Ditto.

else

Maybe better to do this in smaller batches? Spreadsheets like this seem to always find a way to clash with themselves, but I'm up for whatever.

I think just

HIGHER_GEOG
new_HIGHER_GEOG
SPEC_LOCALITY
new_SPEC_LOCALITY

is sufficient, but see above, I'm always surprised....

Also first line in that spreadsheet

Bristol Bay, no specific locality

that'll break any geolocate-like-thing, and feeding those is (most of) why specloc exists.

Also

no specific locality
specific locality unknown

if we're cleaning anyway.....
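
A pre-flight check on a batch CSV with those four columns might look like the following. It is a sketch under assumptions, not an Arctos tool: `check_batch` and its messages are hypothetical, a "clash" is assumed to mean the same HIGHER_GEOG + SPEC_LOCALITY pair pointing at more than one new value, and rows that mix a place name with a placeholder phrase are only flagged for review.

```python
import csv
import io
from collections import defaultdict
from typing import List

REQUIRED = ["HIGHER_GEOG", "new_HIGHER_GEOG", "SPEC_LOCALITY", "new_SPEC_LOCALITY"]
# Placeholder phrases mentioned in the thread.
PLACEHOLDERS = ("no specific locality", "specific locality unknown")


def check_batch(csv_text: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems: List[str] = []
    targets = defaultdict(set)  # (old HG, old spec loc) -> set of (new HG, new spec loc)
    # Header is line 1, so data starts at line 2 (approximate if fields contain newlines).
    for line_no, row in enumerate(reader, start=2):
        key = (row["HIGHER_GEOG"], row["SPEC_LOCALITY"])
        targets[key].add((row["new_HIGHER_GEOG"], row["new_SPEC_LOCALITY"]))

        new_sl = (row["new_SPEC_LOCALITY"] or "").strip().lower()
        # Flag e.g. "Bristol Bay, no specific locality": place name plus placeholder.
        if any(p in new_sl and new_sl != p for p in PLACEHOLDERS):
            problems.append(f"line {line_no}: placeholder text mixed into new_SPEC_LOCALITY")

    for key, new_values in targets.items():
        if len(new_values) > 1:
            problems.append(f"clash: {key} maps to {len(new_values)} different targets")
    return problems
```

Running this on smaller batches, as suggested above, keeps the list of problems short enough to review by hand.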

@mkoo
Member

mkoo commented Jun 10, 2024

ok that's perfect.
Also, that is what I meant by spec_loc cleanup. We haven't really started yet!
But now that I know what the end product will be, we'll start working on it. I'll send an email with the CSV for you soon.
Thanks!

@mkoo
Member

mkoo commented Jun 15, 2024

@dustymc For Monday: first CSV for HG batch updating (109 rows to load). Let us know if anything needs tweaking format-wise. Kat applied some Python to do the overhaul, but I am still checking every one. I see some dup localities but am ignoring them for now and will fix them in another pass (probably with the usual Arctos tools). Thx!
HG batch1_toload.csv

@mkoo mkoo self-assigned this Jun 15, 2024
@dustymc
Contributor

dustymc commented Jun 18, 2024

@mkoo updates from CSV in #7660 (comment) complete.

@mkoo
Member

mkoo commented Jun 26, 2024

OK here's batch #2 and #3 @dustymc
HG cleanup batch3.csv
HG cleanup- batch2.csv

This is the rest for the Bering Sea. We tried to keep to the original spec_loc as much as possible and make the localities consistent so we can merge dups more easily later if desired. Several localities with complicated info were manually edited to make sure all the components were captured (the original forms in attributes and verbatim were checked and left as is for tracking).

ThX!

@dustymc
Contributor

dustymc commented Jun 27, 2024

HG cleanup- batch2.csv

I was not able to find a locality ID for these, I removed them from the update:

temp_geo_updt_no_locid.csv

These failed with checkfreetext(new_spec_locality) is false and were also removed:

temp_geo_updt_badnewSL.csv

UPDATE 1834

successfully updated

HG cleanup batch3.csv

I was not able to find a locality ID for these, I removed them from the update:

temp_geo_updt_no_locid(1).csv

These failed with checkfreetext(new_spec_locality) is false and were also removed:

temp_geo_updt_badnewSL(1).csv

UPDATE 1194

successfully updated
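
The rules inside `checkfreetext` are not described in this thread. The sketch below is a hypothetical local stand-in, not the Arctos function: it assumes the usual free-text problems (leading or trailing whitespace, doubled spaces, control characters) are what make a value fail, and `looks_like_clean_free_text` is a made-up name. Something like it could screen new_SPEC_LOCALITY values before a batch is submitted.

```python
import unicodedata


def looks_like_clean_free_text(value: str) -> bool:
    """Hypothetical pre-check; the real checkfreetext rules may differ."""
    # Assumed rule 1: no leading or trailing whitespace.
    if value != value.strip():
        return False
    # Assumed rule 2: no runs of two or more spaces.
    if "  " in value:
        return False
    # Assumed rule 3: no control characters (Unicode category "Cc",
    # which includes tab, newline, and carriage return).
    return not any(unicodedata.category(ch) == "Cc" for ch in value)


if __name__ == "__main__":
    for candidate in ["Unalaska Island, Dutch Harbor", " Unalaska", "Dutch  Harbor"]:
        print(repr(candidate), looks_like_clean_free_text(candidate))
```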

@dustymc dustymc added this to the Clean / Migrate milestone Jun 27, 2024
@mkoo
Member

mkoo commented Jul 1, 2024

Thanks Dusty, I will go over the rest manually-- a lot are just the same 'no specific locality' business so probably best to review all the geog details anyway.
Very close to being done with the Bering Sea, and thanks for fixing 3000+!

@dustymc dustymc removed the Bug Arctos is not performing as it should. label Jul 22, 2024
@mkoo
Member

mkoo commented Oct 22, 2024

I'm closing this issue since the original batch work is done and whatever failed was manually fixed.
MVZ has another batch of Bering Sea localities to convert to United States:Alaska and will work on that batch next with a new issue for Dusty. (ref: https://github.com/museum-of-vertebrate-zoology/MVZ-Data/issues/986)

@mkoo mkoo closed this as completed Oct 22, 2024