|
| 1 | +# Contributing Guidelines |
| 2 | + |
| 3 | +:tada: **First off, thank you for considering contributing to our project!** :tada: |
| 4 | + |
| 5 | +This is a community-driven project, so it's people like you that make it useful and |
| 6 | +successful. |
| 7 | + |
| 8 | +If you get stuck at any point you can create an issue on GitHub (look for the *Issues* |
| 9 | +tab in the repository) or contact us at one of the other channels mentioned below. |
| 10 | + |
| 11 | + |
| 12 | +## General Guidelines |
| 13 | + |
| 14 | +For general information about contributing to open-source and the Fatiando a Terra |
| 15 | +projects, please refer to our [standard Contributing Guide][contrib]. |
| 16 | + |
| 17 | +Guidelines specific to this repository are given below. |
| 18 | + |
| 19 | + |
| 20 | +## Ground Rules |
| 21 | + |
| 22 | +The goal is to maintain a diverse community that's pleasant for everyone. |
| 23 | +**Please be considerate and respectful of others**. |
| 24 | +Everyone must abide by our [Code of Conduct][coc] and we encourage all to read |
| 25 | +it carefully. |
| 26 | + |
| 27 | + |
| 28 | +## Requirements for datasets |
| 29 | + |
| 30 | +The following are the requirements that datasets need to meet in order to be |
| 31 | +considered for this project. |
| 32 | + |
| 33 | +> **Definitions:** |
| 34 | +> |
| 35 | +> * *Source dataset*: the original data as distributed by the data owners/creators. |
| 36 | +> * *Output dataset*: the modified/repackaged version that we distribute. |
| 37 | +> * *FAIR data*: data that follows the [FAIR principles][fair]. |
| 38 | +
|
| 39 | +Source datasets must: |
| 40 | + |
| 41 | +1. Be FAIR data: either in the public domain or distributed under an open license that does not |
| 42 | + place restrictions on reuse beyond attribution or using the same license. |
| 43 | + For example, CC-BY and CC-BY-SA are acceptable but not CC-BY-NC. |
| 44 | +1. Represent a common real-world application. |
| 45 | +1. Contain interesting features that **lead to teachable moments** for tutorials. |
| 46 | + For example, anomalies that are easily associated with geology or large gaps in |
| 47 | + bathymetry that lead to interesting interpolation issues. |
| 48 | + |
| 49 | +Output datasets should: |
| 50 | + |
| 51 | +1. Contain standard and descriptive variable names. For example, "longitude" |
| 52 | + instead of "LON", "gravity_disturbance_mgal" instead of "FAA", "easting_m" |
| 53 | + instead of "x". |
| 54 | +1. Include associated metadata (datum, license, source, etc.) if supported |
| 55 | + by the format. For example, netCDF metadata following CF conventions |
| 56 | + through `.attrs` attributes in xarray. |
| 57 | +1. Specify units through appropriate metadata (CF conventions in netCDF or |
| 58 | + column names in CSV, like `gravity_disturbance_mgal`). Exceptions are |
| 59 | + longitude and latitude coordinates, which are always in decimal degrees. |
| 60 | +1. Strive to be under 10 MB in size, if possible. This keeps downloads fast, |
| 61 | + particularly when building documentation and testing on CI. Use compression |
| 62 | + when appropriate and only if it doesn't add difficult-to-install dependencies. |
| 63 | + Larger files may be considered but should not be used in code that runs on |
| 64 | + CI to avoid long build times and overloading the data servers. |
| 65 | + |
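As a rough illustration of the naming and size guidelines for output datasets, the sketch below renames terse CSV column names to descriptive ones and checks a file size against the limit. It is only a sketch under stated assumptions: the function names `rename_header` and `within_size_limit` are hypothetical, the rename table reuses the examples from the list above, and reading the limit as 10 decimal megabytes is an assumption.

```python
import csv
import io

# Example renames taken from the guideline text above.
RENAMES = {
    "LON": "longitude",
    "FAA": "gravity_disturbance_mgal",
    "x": "easting_m",
}

# Assumption: the size guideline means 10 decimal megabytes.
SIZE_LIMIT_BYTES = 10 * 1000**2


def rename_header(csv_text, renames):
    """Return the CSV text with the column names in the first row replaced."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Names that are not in the rename table are kept unchanged.
    rows[0] = [renames.get(name, name) for name in rows[0]]
    output = io.StringIO()
    csv.writer(output, lineterminator="\n").writerows(rows)
    return output.getvalue()


def within_size_limit(size_bytes, limit=SIZE_LIMIT_BYTES):
    """Check that a file size in bytes is under the size guideline."""
    return size_bytes < limit


# Made-up sample data, only to show the call.
raw = "LON,FAA\n-45.5,12.3\n"
print(rename_header(raw, RENAMES))
```

Putting the unit in the column name (the `_mgal` suffix) is what carries the unit information in formats like CSV that have no separate metadata.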
| 66 | + |
| 67 | +## Adding a new dataset |
| 68 | + |
| 69 | +1. **Propose a new dataset:** First, open an Issue in this repository with information about the |
| 70 | + proposed dataset for discussion. |
| 71 | + |
| 72 | +THE FOLLOWING NEEDS TO BE UPDATED. |
| 73 | + |
| 74 | +Follow these guidelines to prepare the dataset: |
| 75 | + |
| 76 | +* See our [standard Contributing Guide][contrib] for instructions on creating |
| 77 | + pull requests and setting up your environment. |
| 78 | +* Create a folder following the naming convention `location_datatype` (all lower |
| 79 | + case and separated by `_`). |
| 80 | +* Inside that folder, create a Jupyter notebook called `prepare.ipynb` with the |
| 81 | + code for downloading (using [Pooch](https://github.com/fatiando/pooch)), |
| 82 | + formatting (cleaning, slicing, datum conversion, etc.), and exporting the |
| 83 | + new dataset. Follow the conventions in the other notebooks. |
| 84 | +* If any new dependencies are required to prepare the dataset, add them to the |
| 85 | + `environment.yml` file. |
| 86 | +* The output dataset should follow the same naming convention as the folder: |
| 87 | + `location_datatype.extension`. |
| 88 | +* The notebook should create a `preview.jpg` image with a plot of the output |
| 89 | + dataset for easy inspection. |
| 90 | +* If the original data can't be automatically downloaded in the notebook and it |
| 91 | + is under 50 MB, you may include it in the repository. Feel free to use |
| 92 | + compression to reduce the size of the file(s). |
| 93 | +* Include the information about the new dataset in the `README.md` file. |
| 94 | + |
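The naming convention in the list above can be checked mechanically. The following is a minimal sketch, not part of the project's tooling: the helper names and the example folder name `alps_gravity` are hypothetical, and the regular expression is one possible reading of "all lower case and separated by `_`" (it also accepts digits and requires at least two parts).

```python
import re
from pathlib import PurePosixPath

# Assumption: names consist of lower-case letters or digits in two or more
# parts joined by single underscores, e.g. "location_datatype".
NAME_PATTERN = re.compile(r"[a-z0-9]+(_[a-z0-9]+)+")


def follows_convention(folder):
    """Check that a folder name looks like `location_datatype`."""
    return NAME_PATTERN.fullmatch(folder) is not None


def output_matches_folder(folder, output_file):
    """Check that the output file is named `<folder>.<extension>`."""
    file_name = PurePosixPath(output_file).name
    # Split at the first dot so compound extensions like ".csv.xz" also work.
    return file_name.split(".", 1)[0] == folder


# Made-up names, only to show the calls.
print(follows_convention("alps_gravity"))
print(output_matches_folder("alps_gravity", "alps_gravity/alps_gravity.csv.xz"))
```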
| 95 | + |
| 96 | +[contrib]: https://github.com/fatiando/community/blob/main/CONTRIBUTING.md |
| 97 | +[coc]: https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md |
| 98 | +[fair]: https://www.go-fair.org/fair-principles/ |