
PullSpp.fn() takes too long #30

Open
kellijohnson-NOAA opened this issue Sep 26, 2020 · 10 comments
Labels
enhancement warehouse Pertains to getting, documenting, or fixing data in the warehouse.

Comments

@kellijohnson-NOAA
Contributor

PullSpp.fn() extracts data for all species from a single year of the WCGBTS, which is very slow. The function supplies a lookup data frame that maps common names to scientific names for species caught in the surveys performed or used by the NWFSC; see its development in pull request #28. Following an email request, the survey team is working to make the original lookup table in the warehouse accessible to all users, i.e., downloadable from a web link. This issue is a reminder that the URL built in URLtext on line 19 will need to be changed for efficiency.

@kellijohnson-NOAA
Contributor Author

@Curt-Whitmire-NOAA, do you know if there is a way to access a list of species names, with their common and scientific names, using SQL or a URL? Currently, I download all of the data and find the unique values, which is very costly in time. I am looking for a simple way to filter during the download rather than after it, or a URL for an existing table. Thanks.

@Curt-Whitmire-NOAA

@kellijohnson-NOAA, I can certainly provide a table that could be uploaded to GitHub. Do you want only unique fish names, or invertebrates as well? This would be a short-term fix and would not update dynamically. For a dynamic solution, I can work with our developer to add the taxonomy table to the Data Warehouse (DW) front end. I will likely have some follow-up questions to make this table as useful as possible.

@kellijohnson-NOAA
Contributor Author

I think that the current function will work until we can get a dynamic solution going forward.

In short, I am looking for a way to link common names to scientific names, and vice versa, that accounts for historical names and species complexes.

@Curt-Whitmire-NOAA

@kellijohnson-NOAA, I will email you a CSV with the current list of "fish" in the Data Warehouse taxonomy dimensions table. Please review it and let me know if it suits your needs for now. We can then discuss a better dynamic solution.

kellijohnson-NOAA pushed a commit that referenced this issue Apr 8, 2021
Thanks to @Curt-Whitmire-NOAA for providing the sql call.
This is much much faster than unique() around a full data pull.
Includes code to make GetSpp.fn more robust when there are
multiple names returned.
kellijohnson-NOAA added the warehouse label Dec 1, 2022
@kellijohnson-NOAA
Contributor Author

@Curt-Whitmire-NOAA, any progress on making the taxonomy dimensions table available as a pull, rather than the saved CSV we have now, which is NEVER updated?

@Curt-Whitmire-NOAA

@kellijohnson-NOAA, I found that the taxonomy dimension table is already exposed via the API. Some fields still need to be fully populated (e.g., ITIS Serial #, WoRMS AphiaID), but as far as I can tell the table includes the full list of taxonomic names for all of our FRAM programs.

@Curt-Whitmire-NOAA

@kellijohnson-NOAA, potential filters (arguments) to consider including in PullSpp.fn() are:

  1. species_category
  2. species_subcategory
  3. grp_reg_depth_category

If there's anything else you need for this enhancement, please let me know.

@kellijohnson-NOAA
Contributor Author

Thank you, @Curt-Whitmire-NOAA! The link you gave me worked flawlessly using:

species <- get_json("https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/warehouse.taxonomy_dim/selection.json?filters=species_category=fish")
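One way to act on the filters suggested above is to have PullSpp.fn() assemble the warehouse URL from its arguments rather than hard-coding it. The R sketch below is illustrative only: `build_taxonomy_url()` is a hypothetical helper, not an existing nwfscSurvey function. It reuses the endpoint from the working call and assumes that multiple filters are joined with commas inside `filters=`, which should be confirmed against the Data Warehouse API.

```r
# Hypothetical helper (not part of nwfscSurvey): build a Data Warehouse
# taxonomy_dim URL from named filters such as species_category = "fish".
build_taxonomy_url <- function(...) {
  base <- paste0(
    "https://www.webapps.nwfsc.noaa.gov/data/api/v1/source/",
    "warehouse.taxonomy_dim/selection.json"
  )
  filters <- c(...)
  # With no filters, return the endpoint for the full table.
  if (length(filters) == 0) {
    return(base)
  }
  # Every filter must be named, e.g. species_subcategory = "...".
  stopifnot(!is.null(names(filters)), all(nzchar(names(filters))))
  # Percent-encode each value (spaces become %20); apply URLencode()
  # element-wise so each filter value is encoded on its own.
  values <- vapply(filters, utils::URLencode, character(1), reserved = TRUE)
  pairs <- paste0(names(filters), "=", values)
  # ASSUMPTION: multiple filters are comma-separated within filters=.
  paste0(base, "?filters=", paste(pairs, collapse = ","))
}
```

For example, `get_json(build_taxonomy_url(species_category = "fish"))` should request the same URL as the call above; further arguments such as `species_subcategory` or `grp_reg_depth_category` would be appended under the comma-joining assumption.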

@Curt-Whitmire-NOAA

@kellijohnson-NOAA glad it meets your needs! Now we should both stop working for the night ;^)

Projects
None yet
Development

No branches or pull requests

2 participants