tidy-data-script removing species from the dataset #1

Open
gavinfay opened this issue Oct 26, 2022 · 3 comments
@gavinfay
Contributor

@AngeliaMiller @CataRoman It looks like tidy-data-script.R hardwires a selection of ~10 species, so the 'complete data' is not complete. Is this intentional?

@gavinfay
Contributor Author

gavinfay commented Oct 26, 2022

General suggestion: include some notes at the top of each R script summarizing what the code in that file does, i.e. its objective and the part it plays in the analysis workflow, what it takes as input, what it outputs, etc.
For example, apart from aggregating spiny dogfish and skates, a lot of the work in tidy-data-script.R seems to be redone or reorganized in complete-datasets.R.
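
As an illustration of the kind of header notes suggested here, a script could open with a comment block along these lines. This is only a sketch: the field names are an assumption, and the angle-bracket placeholders stand in for details the thread does not give.

```r
# ------------------------------------------------------------------
# Script:   tidy-data-script.R
# Purpose:  <one or two sentences on what this script does and the
#           part it plays in the analysis workflow>
# Inputs:   <files or objects this script reads>
# Outputs:  <files or objects this script writes>
# Notes:    <case-specific choices, e.g. species aggregation>
# ------------------------------------------------------------------
```

Keeping the same fields in every script makes it easier to spot work that is repeated across files.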

@AngeliaMiller
Contributor

You are correct that part of tidy-data-script.R hardwires a selection of 10 species, which makes the 'complete dataset' incomplete. The script was originally created to produce the summary statistics for the workshop. I will make a copy of this script, adjust it to tidy the full dataset (~1960s and ~1000 species), and include some notes at the top of each script. My intention was for tidy-data-script.R to tidy the full dataset, not just the 10 species.

Some of the work in tidy-data-script.R and complete-datasets.R may be duplicated because some information was lost when we used complete(). I will take another look and move anything from complete-datasets.R that could or should be done in tidy-data-script.R.
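
One way information can go missing with complete() is that rows added for missing combinations get NA in every column not named in the call. The sketch below assumes that is the loss referred to here, and uses a made-up toy data frame (the column names `tow`, `species`, `depth`, and `count` are hypothetical). It shows tidyr's `nesting()` and the `fill` argument keeping tow-level values intact.

```r
library(tidyr)

# Toy data (hypothetical): tow 2 caught no skate, so that row is absent.
catch <- data.frame(
  tow     = c(1, 1, 2),
  species = c("cod", "skate", "cod"),
  depth   = c(50, 50, 80),
  count   = c(3, 1, 5)
)

# Naive completion: the new tow 2 / skate row has NA for depth and count.
naive <- complete(catch, tow, species)

# nesting() keeps tow and depth together, so depth carries over to the
# new row; fill sets the missing count to an explicit zero.
kept <- complete(catch, nesting(tow, depth), species,
                 fill = list(count = 0))
```

If the tow-level columns are nested this way when the data are completed, the later script may not need to rebuild them.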

AngeliaMiller self-assigned this Oct 26, 2022
@gavinfay
Contributor Author

Thanks. See the file in the data-cleaning branch referenced in issue #3 that moves towards this. I think we want to have a 'base' raw dataset that everyone can then use, with some operations being case-specific (e.g. the spatial configuration of subsets of the data).
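
A minimal sketch of that split, assuming a shared base data frame and a small helper that each analysis calls for its own subset. The helper `subset_case()`, the column names, and the values are all hypothetical and are not taken from the repository.

```r
# Hypothetical base data: tidied once, never trimmed in place.
base_data <- data.frame(
  species = c("cod", "skate", "haddock", "cod"),
  area    = c("north", "north", "south", "south"),
  count   = c(3, 1, 4, 5)
)

# Case-specific step: filter only on the arguments that are supplied,
# so calling it with no arguments returns the full base data.
subset_case <- function(data, species = NULL, areas = NULL) {
  if (!is.null(species)) data <- data[data$species %in% species, , drop = FALSE]
  if (!is.null(areas))   data <- data[data$area %in% areas, , drop = FALSE]
  data
}

# Each analysis states its own selection instead of hardwiring it upstream.
workshop_data <- subset_case(base_data, species = c("cod", "skate"))
north_data    <- subset_case(base_data, areas = "north")
```

Because the species list is passed in as an argument, a selection like the one used for the workshop lives with the workshop analysis and does not change the base data.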
