Recipes #10

vincentarelbundock · 2014-03-05T13:33:39Z

In terms of repository structure, I think it would be beneficial to split each data source into separate files. The idea would be to create a standardized "recipe" format that would include all info about the dataset (e.g. where to download it, BibTeX cite, name of the cleaning script, date updated), and then a cleaning script that does all the magic we need.

I use something like that locally, where I have a YAML file that specifies all the info and an accompanying Python script that I use for cleaning.

This makes user contributions very easy. They just cut and paste another "recipe" and include an R script that does the cleaning. The only thing psData has to do is provide a proper API to parse the recipe, download the data, and activate the cleaning script.
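To make the recipe idea concrete, here is a minimal sketch of what parsing and validating one could look like. Everything in it is an assumption for illustration: the field names, the `Recipe` class, the `parse_recipe` helper and the example.org URL are hypothetical, and the parser only handles flat `key: value` lines (a small subset of YAML) so that it runs with the standard library alone. A real implementation would likely use a proper YAML parser.

```python
from dataclasses import dataclass

# Hypothetical field names; the thread only says a recipe should record where
# to download the data, a BibTeX cite, the cleaning script, and the update date.
REQUIRED_FIELDS = ("name", "url", "citation", "cleaning_script", "updated")


@dataclass(frozen=True)
class Recipe:
    name: str
    url: str
    citation: str
    cleaning_script: str
    updated: str


def parse_recipe(text: str) -> Recipe:
    """Parse flat `key: value` lines into a Recipe, rejecting incomplete ones."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        fields[key.strip()] = value.strip()
    missing = [f for f in REQUIRED_FIELDS if f not in fields]
    if missing:
        raise ValueError(f"recipe is missing fields: {missing}")
    return Recipe(**{f: fields[f] for f in REQUIRED_FIELDS})


# Placeholder values only; the URL and citation key are not real.
EXAMPLE = """\
name: database_political_institutions
url: https://example.org/dpi.csv
citation: placeholder_bibtex_key
cleaning_script: database_political_institutions.R
updated: 2014-03-05
"""

recipe = parse_recipe(EXAMPLE)
```

Validating required fields up front means a contributed recipe that is copied from another one but left incomplete fails at parse time rather than halfway through a download.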

Think of something like Homebrew for Mac and its library of "formulas":

https://github.com/Homebrew/homebrew/tree/master/Library/Formula

vincentarelbundock · 2014-03-05T13:39:40Z

Copied over from #8

You would have 2 files:

database_political_institutions.yaml (download url, bibtex cite, etc.)
database_political_institutions.R (cleaning script with all transformations)

And a standardized function:

get_data(): Parse .yaml file, download data if not already cached, and run R script.
If flagged for caching, then copy yaml, raw data, processed data and R script to specified path.
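The flow described for get_data() could be sketched roughly as follows. This is an illustration under assumptions, not the package's API: the signature, the `raw.csv` / `processed.csv` file names, and the `extra_files` parameter are all hypothetical, and the download and cleaning steps are passed in as plain callables so the sketch stays self-contained (in practice the cleaning callable might hand off to the R script).

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable


def get_data(
    name: str,
    url: str,
    download: Callable[[str, Path], None],
    clean: Callable[[Path, Path], None],
    cache_dir,
    extra_files: Iterable = (),
) -> Path:
    """Return the path to the cleaned data, doing only the work not yet cached."""
    target = Path(cache_dir) / name
    target.mkdir(parents=True, exist_ok=True)

    raw = target / "raw.csv"              # hypothetical file names
    processed = target / "processed.csv"

    # Download only if the raw file is not already in the cache.
    if not raw.exists():
        download(url, raw)

    # Run the cleaning step only if its output is missing.
    if not processed.exists():
        clean(raw, processed)

    # Keep the recipe and cleaning script next to the data they produced.
    for f in extra_files:
        shutil.copy2(f, target / Path(f).name)

    return processed
```

Storing the recipe and script alongside the raw and processed files keeps a cached copy reproducible: everything needed to regenerate the cleaned data sits in one directory.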

vincentarelbundock · 2014-03-05T13:40:28Z

If we get something like that, I would almost certainly contribute recipes.

christophergandrud · 2014-04-17T15:28:36Z

Just had a pie-in-the-sky thought: it would be interesting if we could create a really simple website where someone who hosts a data set could fill out a form with specific metadata and information on how to download the data set.

On submission of the web form the recipe would be generated and a pull request would be initiated.

This would make it really easy to contribute new recipes.

antagomir · 2014-04-17T17:57:36Z

If someone has the chance to work on the implementation, we could probably arrange server space with rOpenGov.

christophergandrud · 2014-04-18T05:51:14Z

Sounds good. This can be something to work on in #12.

vincentarelbundock mentioned this issue Mar 5, 2014

Cache data locally after download #8

Open

briatte mentioned this issue Jun 11, 2014

Data recipe for PITF Worldwide Atrocities Dataset? #16

Open
