Commit 8627931

Add a basic contributing guide
Copied from https://github.com/fatiando/data with a note to make some changes to adding new datasets.
1 parent 0f2e83b commit 8627931

1 file changed

+98
-0
lines changed

CONTRIBUTING.md

+98
@@ -0,0 +1,98 @@
1+
# Contributing Guidelines
2+
3+
:tada: **First off, thank you for considering contributing to our project!** :tada:
4+
5+
This is a community-driven project, so it's people like you that make it useful and
6+
successful.
7+
8+
If you get stuck at any point you can create an issue on GitHub (look for the *Issues*
9+
tab in the repository) or contact us at one of the other channels mentioned below.
10+
11+
12+
## General Guidelines
13+
14+
For general information about contributing to open-source and the Fatiando a Terra
15+
projects, please refer to our [standard Contributing Guide][contrib].
16+
17+
This document also contains guidelines specific to this repository below.
18+
19+
20+
## Ground Rules
21+
22+
The goal is to maintain a diverse community that's pleasant for everyone.
23+
**Please be considerate and respectful of others**.
24+
Everyone must abide by our [Code of Conduct][coc] and we encourage all to read
25+
it carefully.
26+
27+
28+
## Requirements for datasets
29+
30+
The following are the requirements that datasets need to meet in order to be
31+
considered for this project.
32+
33+
> **Definitions:**
34+
>
35+
> * *Source dataset*: the original data as distributed by the data owners/creators.
36+
> * *Output dataset*: the modified/repackaged version that we distribute.
37+
> * *FAIR data*: data that follows the [FAIR principles][fair].
38+
39+
Source datasets must:
40+
41+
1. Be FAIR data: either in the public domain or distributed under an open license that does not
42+
place restrictions on reuse beyond attribution or using the same license.
43+
For example, CC-BY and CC-BY-SA are acceptable but not CC-BY-NC.
44+
1. Represent a common real-world application.
45+
1. Contain interesting features that **lead to teachable moments** for tutorials.
46+
For example, anomalies that are easily associated with geology or large gaps in
47+
bathymetry that lead to interesting interpolation issues.
48+
49+
Output datasets should:
50+
51+
1. Contain standard and descriptive variable names. For example, "longitude"
52+
instead of "LON", "gravity_disturbance_mgal" instead of "FAA", "easting_m"
53+
instead of "x".
54+
1. Include associated metadata (datum, license, source, etc.) if supported
55+
by the format. For example, netCDF metadata following CF conventions
56+
through `.attrs` attributes in xarray.
57+
1. Specify units through appropriate metadata (CF conventions in netCDF or
58+
column names in CSV, like `gravity_disturbance_mgal`). Exceptions are
59+
longitude and latitude coordinates which are always in decimal degrees.
60+
1. Strive to be under 10 MB in size, if possible. This keeps downloads fast,
61+
particularly when building documentation and testing on CI. Use compression
62+
when appropriate and only if it doesn't add difficult-to-install dependencies.
63+
Larger files may be considered but should not be used in code that runs on
64+
CI to avoid long build times and overloading the data servers.
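
As a rough illustration of the naming and units guidelines above, the sketch below renames terse source columns to descriptive ones using only the Python standard library. The `RENAME` mapping, the `rename_columns` helper, the `LAT` source name, and the sample values are all hypothetical and not part of this repository.

```python
import csv
import io

# Hypothetical mapping from terse source names to descriptive output names.
# Units go in the column name, except for longitude and latitude, which are
# always in decimal degrees.
RENAME = {
    "LON": "longitude",
    "LAT": "latitude",
    "FAA": "gravity_disturbance_mgal",
}


def rename_columns(source_csv):
    """Return CSV text with the header renamed according to RENAME."""
    reader = csv.reader(io.StringIO(source_csv))
    header = next(reader)
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    # Columns without an entry in RENAME keep their original name.
    writer.writerow([RENAME.get(name, name) for name in header])
    writer.writerows(reader)
    return output.getvalue()


# Made-up sample values, for illustration only.
source = "LON,LAT,FAA\n-40.5,-20.25,12.3\n"
renamed = rename_columns(source)
print(renamed)
```

For netCDF output the same idea would apply to variable names, with units and other metadata stored as attributes rather than in the names.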
65+
66+
67+
## Adding a new dataset
68+
69+
1. **Propose a new dataset:** First, open an Issue in [][issue] with information about the
70+
proposed dataset for discussion.
71+
72+
THE FOLLOWING NEEDS TO BE UPDATED.
73+
74+
Follow these guidelines to prepare the dataset:
75+
76+
* See our [standard Contributing Guide][contrib] for instructions on creating
77+
pull requests and setting up your environment.
78+
* Create a folder following the naming convention `location_datatype` (all lower
79+
case and separated by `_`).
80+
* Inside that folder, create a Jupyter notebook called `prepare.ipynb` with the
81+
code for downloading (using [Pooch](https://github.com/fatiando/pooch)),
82+
formatting (cleaning, slicing, datum conversion, etc.), and exporting the
83+
new dataset. Follow the conventions in the other notebooks.
84+
* If any new dependencies are required to prepare the dataset, add them to the
85+
`environment.yml` file.
86+
* The output dataset should follow the same naming convention as the folder:
87+
`location_datatype.extension`.
88+
* The notebook should create a `preview.jpg` image with a plot of the output
89+
dataset for easy inspection.
90+
* If the original data can't be automatically downloaded in the notebook and it
91+
is under 50 MB, you may include it in the repository. Feel free to use
92+
compression to reduce the size of the file(s).
93+
* Include the information about the new dataset in the `README.md` file.
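
The folder and file naming convention above could be checked with a short helper like the one below. This is only a sketch: `is_valid_dataset_name`, `output_filename`, and the example names are hypothetical, and the pattern assumes names made of lower-case letters and digits in at least two parts joined by `_`.

```python
import re

# Assumed reading of `location_datatype`: two or more lower-case
# alphanumeric parts separated by single underscores.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)+")


def is_valid_dataset_name(name):
    """Check a folder name against the assumed `location_datatype` pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


def output_filename(folder, extension):
    """Build the output file name `location_datatype.extension`."""
    if not is_valid_dataset_name(folder):
        raise ValueError(f"invalid dataset folder name: {folder!r}")
    # Accept the extension with or without a leading dot.
    return f"{folder}.{extension.lstrip('.')}"


print(is_valid_dataset_name("alps_gravity"))
print(is_valid_dataset_name("Alps-Gravity"))
print(output_filename("alps_gravity", "csv"))
```

If the project settles on a stricter or looser convention, only `NAME_PATTERN` would need to change.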
94+
95+
96+
[contrib]: https://github.com/fatiando/community/blob/main/CONTRIBUTING.md
97+
[coc]: https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md
98+
[fair]: https://www.go-fair.org/fair-principles/
