Recipes #10

Open
vincentarelbundock opened this issue Mar 5, 2014 · 5 comments

Comments

@vincentarelbundock

In terms of repository structure, I think it would be beneficial to split each data source into separate files. The idea would be to create a standardized "recipe" format that would include all info about the dataset (e.g. where to download it, BibTeX cite, name of cleaning script, date updated), and then a cleaning script that does all the magic we need.

I use something like that locally, where I have a YAML file that specifies all the info and an accompanying Python script that I use for cleaning.

This makes user contributions very easy. They just cut and paste another "recipe" and include an R script that does the cleaning. The only thing psData has to do is provide a proper API to parse the recipe, download the data, and activate the cleaning script.

Think of something like Homebrew for Mac and its library of "formulas":

https://github.com/Homebrew/homebrew/tree/master/Library/Formula
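
To make the recipe idea above concrete, here is a minimal sketch in Python of what a recipe and its parser might look like. Everything in it is an assumption for illustration: the field names (`name`, `url`, `bibtex`, `cleaning_script`, `updated`), the `Recipe` class, the `parse_recipe` helper, and the placeholder URL are not part of psData. The parser only handles flat `key: value` lines, as a stand-in for a real YAML library.

```python
from dataclasses import dataclass


@dataclass
class Recipe:
    """Metadata for one dataset; every field name here is hypothetical."""
    name: str
    url: str              # where to download the raw data
    bibtex: str           # citation key or entry
    cleaning_script: str  # file name of the cleaning script
    updated: str          # date the recipe was last updated


def parse_recipe(text: str) -> Recipe:
    """Parse flat 'key: value' lines (a tiny subset of YAML) into a Recipe."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key.strip()] = value.strip()
    return Recipe(**fields)


EXAMPLE = """
# placeholder values, not a real dataset location
name: example_dataset
url: https://example.org/example_dataset.csv
bibtex: example_placeholder_key
cleaning_script: example_dataset.R
updated: 2014-03-05
"""

recipe = parse_recipe(EXAMPLE)
```

A contributor would then only need to copy an existing recipe, change the values, and supply the cleaning script it names.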

@vincentarelbundock
Author

Copied over from #8

You would have 2 files:

- database_political_institutions.yaml (download url, bibtex cite, etc.)
- database_political_institutions.R (cleaning script with all transformations)

And a standardized function:

- get_data(): Parse .yaml file, download data if not already cached, and run R script.
- If flagged for caching, then copy yaml, raw data, processed data and R script to specified path.
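
The control flow of such a function could be sketched as follows, in Python rather than R so it can run on its own. This is only an illustration under stated assumptions: the signature, the `download` and `clean` callables, the `export_dir` and `extra_files` parameters, and the file naming are all hypothetical, and the recipe is assumed to be already parsed into a dict. The cleaning step is injected as a callable, standing in for running the R script.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence


def get_data(
    recipe: dict,
    download: Callable[[str, Path], None],
    clean: Callable[[Path, Path], None],
    cache_dir: Path,
    export_dir: Optional[Path] = None,
    extra_files: Sequence[Path] = (),
) -> Path:
    """Return the path to the processed data described by `recipe`."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    raw = cache_dir / (recipe["name"] + ".raw")
    processed = cache_dir / (recipe["name"] + ".csv")

    if not raw.exists():        # download only if not already cached
        download(recipe["url"], raw)
    clean(raw, processed)       # stands in for running the cleaning script

    if export_dir is not None:  # the "flagged for caching" case
        export_dir.mkdir(parents=True, exist_ok=True)
        # extra_files would hold the recipe file and the cleaning script
        for path in (raw, processed, *extra_files):
            shutil.copy(path, export_dir / path.name)
    return processed


# Demo with fake download and cleaning steps; no network access involved.
calls = []


def fake_download(url: str, dest: Path) -> None:
    calls.append(url)
    dest.write_text("Country,Year\nfrance,2000\n")


def fake_clean(raw: Path, processed: Path) -> None:
    processed.write_text(raw.read_text().lower())


demo_recipe = {"name": "demo", "url": "https://example.org/demo.csv"}
tmp = Path(tempfile.mkdtemp())
first = get_data(demo_recipe, fake_download, fake_clean, tmp / "cache")
second = get_data(demo_recipe, fake_download, fake_clean, tmp / "cache",
                  export_dir=tmp / "export")
```

In the demo the second call reuses the cached raw file, so the download step runs only once. Passing the downloader and cleaner in as arguments keeps the sketch independent of any particular HTTP client or R runner.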

@vincentarelbundock
Author

If we get something like that, I would almost certainly contribute recipes.

@christophergandrud
Contributor

Just had a pie in the sky thought: It would be interesting if we could create a really simple website where someone who hosted a data set could fill out a form with specific metadata and information on how to download the data set.

On submission of the web form the recipe would be generated and a pull request would be initiated.

This would make it really easy to contribute new recipes.
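
The recipe-generation half of that idea could look roughly like the sketch below. It assumes the form arrives as a dict of strings; the required field names and the `render_recipe` helper are hypothetical, and opening the pull request is left out entirely.

```python
REQUIRED_FIELDS = ("name", "url", "bibtex", "cleaning_script")


def render_recipe(form: dict) -> str:
    """Turn submitted form fields into flat 'key: value' recipe text."""
    missing = [f for f in REQUIRED_FIELDS if not form.get(f, "").strip()]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    lines = []
    for key in REQUIRED_FIELDS:
        # collapse newlines and repeated spaces so each field stays on one line
        value = " ".join(form[key].split())
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"
```

Rejecting incomplete submissions before any recipe text is produced would keep malformed recipes from ever reaching a pull request.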

@antagomir
Member

If someone has the chance to work on the implementation, we could probably arrange server space with rOpenGov.

@christophergandrud
Contributor

Sounds good. This can be something to work on in #12.
