A template repository for publishing an ecoacoustics or bioacoustics recognizer
This template is an attempt to set up a standard layout for publishing recognizers.
You should fork (make a copy of) this repository. When you fork it, you get your own copy, owned by you, that you can change.
You can also start a new repository using this template by clicking the button that says Use this template and then selecting Create a new repository.
After forking this repository you can make your copy private. See setting repository visibility.
That's up to you. If you make your repository private, only you have access.
When it is time to publish your recognizer, you'll need to choose an appropriate license. This repository uses the Apache 2.0 license by default, but there are a number of other suitable licenses available. See choosing a license.
Other good choices are:
- the Creative Commons Attribution-ShareAlike 4.0 International license
  - This is a better choice for your data than for your code. Different licenses are good for different things. The choosing a license link can help you choose.
- the Academic Free License 3.0
The citation file is a new standard recognized by GitHub and other tools as the place to put citation information. You should modify the citation file in your forked repository so that the information is correct.
See help about CITATION files.
You can clone your repository. See cloning a repository.
We recommend using GitHub Desktop.
Yeah, but you can delete it! Open the README.md file now and delete this sentence.
It is! This template is a work in progress. You can help by adding more documentation or by suggesting improvements.
Head on over to the discussions tab and ask us a question!
.
├── LICENSE.md - the license for this recognizer
├── README.md - the first page people see when they visit the repository
├── CITATION.cff - citation information for this repository
├── src - [optional] if you want to publish code with your recognizer,
│ put it in this folder
├── artifacts - [optional] if you have a trained model or other artifacts
│ produced while developing your recognizer, put them in this folder
├── data - contains or describes your data set
│ ├── training
│ │ ├── xxx - the name of the species or target you are training on
│ │ ├── yyy - [optional] further folders containing training samples
│ │ └── zzz
│ ├── test
│ │ ├── xxx - the name of the species or target you are evaluating your recognizer against
│ │ ├── yyy - [optional] further folders containing testing samples
│ │ └── zzz
│ └── README.md - information on the included datasets or on how to obtain them
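If your copy is missing any of the optional folders, the layout above can be recreated with a few lines of Python. This is only a minimal sketch: the `scaffold` function and its `targets` parameter are hypothetical helpers, not part of this template, and `xxx` is the placeholder name from the layout.

```python
from pathlib import Path


def scaffold(root, targets=("xxx",)):
    """Create the optional folders from the layout above under `root`.

    `targets` holds one folder name per species or target; "xxx" is the
    placeholder used in the layout, so replace it with your own names.
    Existing folders and files are left untouched.
    """
    root = Path(root)
    folders = [root / "src", root / "artifacts"]
    for split in ("training", "test"):
        for target in targets:
            folders.append(root / "data" / split / target)
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)

    # Add a stub data/README.md only if one does not exist yet.
    readme = root / "data" / "README.md"
    if not readme.exists():
        readme.write_text(
            "# Data\n\nDescribe the included datasets or how to obtain them.\n"
        )
    return folders
```

Calling `scaffold(".", targets=("my-species",))` from the root of your repository would add `data/training/my-species` and `data/test/my-species` alongside the other folders.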
Storing data in a repository is not always the right choice. See the Tips for audio data section below.
In each folder where it is relevant you should include:
- Small sets of audio samples
- A README.md containing
  - provenance of any data included
  - instructions on how to obtain more data
- Any scripts needed to download data from remote repositories
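A download script of the kind mentioned in the list above could look roughly like this. It is a sketch using only the Python standard library; the function names are hypothetical, and you would need to supply the real URLs for wherever your data is hosted.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def destination_for(url, dest_dir):
    """Return the local path a URL would be saved to (its last path segment)."""
    name = Path(urlparse(url).path).name
    if not name:
        raise ValueError(f"cannot derive a file name from {url!r}")
    return Path(dest_dir) / name


def download_all(urls, dest_dir):
    """Download each URL into dest_dir, skipping files that already exist."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = destination_for(url, dest_dir)
        if not target.exists():
            # Stream the response to disk so large audio files are not
            # held in memory all at once.
            with urllib.request.urlopen(url) as response, open(target, "wb") as out:
                shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

For example, `download_all(["https://example.org/audio/sample_01.wav"], "data/test/xxx")` would save `sample_01.wav` into `data/test/xxx`; the URL here is a placeholder, not a real data source.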
You don't have to store your audio data in your repository. If you don't, you need to ensure that your data is accessible and stored in an appropriate place.
You can store small datasets in this repository.
Don't add audio files directly to Git. Instead, use Git-LFS, which makes it look like you are adding files directly to Git. GitHub offers 1GB of bandwidth per month per user for free. We suggest keeping no more than 100MiB of data directly in the repository.
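To see whether a repository stays under the suggested 100MiB, you can total the size of its audio files. The sketch below assumes a hypothetical set of audio file extensions and hypothetical helper names; adjust both to match your own data.

```python
from pathlib import Path

# Assumption: these extensions cover the audio formats in your repository.
AUDIO_SUFFIXES = {".wav", ".flac", ".mp3", ".ogg"}
LIMIT_BYTES = 100 * 1024 * 1024  # 100 MiB, the limit suggested above


def audio_bytes(root):
    """Sum the size in bytes of every audio file found under `root`."""
    return sum(
        path.stat().st_size
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )


def within_limit(root, limit=LIMIT_BYTES):
    """Return True when the audio under `root` fits within `limit` bytes."""
    return audio_bytes(root) <= limit
```

Running `within_limit("data")` from the root of your repository returns `False` once the audio in the `data` folder exceeds the limit, which is a reasonable signal to move the data to one of the larger storage options.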
For larger datasets you can use:
- An ecoacoustics repository
  - like Ecosounds, the A2O, or ...others...
- A bioacoustics repository
  - like Xeno Canto
- Cloud storage options like DropBox, OneDrive, etc.
- Commercial services like Amazon S3, Google Cloud Storage, etc.
Egret, included in this template, can download samples from the internet for you.