Suggested citation: XXX.
The dataset is structured according to the ChildProject package standards detailed here.
Number of participants, general info on them.
Number of recordings, specificities.
What sort of annotations are present.
We strongly recommend using the converted/ versions of the annotations, to avoid issues with timestamps and category assignment.
Keep the relevant entries below and add new ones if necessary:
- alice: automated counts of phonemes, syllables, and words done by ALICE; all recordings have been analyzed with this system
- its: automated analyses using the LENA software; only a small proportion of recordings (those gathered with a LENA device) have this annotation
- vcm: automated analyses that categorize the key child's vocalizations as canonical, non-canonical, crying, or other (which includes both "junk", i.e. not the child at all, and laughing, which occurred infrequently), done using VCM; all recordings have been analyzed with this system
- vtc: automated analyses that distinguish key child, other children, male adult, female adult, using VTC; all recordings have been analyzed with this system
Explain what each set of manual annotations is and how it was produced. Example from tsimane2017:
- eaf_2021: A small number of recordings were selected for coding by two speech-and-language students unfamiliar with the language and the families recorded. They annotated randomly sampled 15-second sections, skipping any that contained no speech by male adults or other children, and performed segmentation only, following this coding manual, derived from the ACLEW Annotation Scheme.
To gain access to the data, please email XXXX? or Homebank or ... ?
You will first need to install the ChildProject package as well as DataLad. Instructions to install these packages can be found here.
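Once both packages are installed, a quick check that the command-line tools are reachable can save time later. The following is a minimal sketch, not part of the official instructions: the helper name check_tool is hypothetical, git-annex is listed because DataLad relies on it, and child-project is assumed to be the command installed by the ChildProject package.

```shell
# Return 0 if the named command is available on PATH, 1 otherwise.
# (check_tool is a hypothetical helper name used for this sketch.)
check_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report which of the expected tools can be found.
for tool in datalad git-annex child-project; do
    if check_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

Any tool reported as missing should be installed before going further.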
The following SSH key setup only needs to be done once:
- Copy your SSH public key to your clipboard (usually located in ~/.ssh/id_rsa.pub). If you don't have one, please create one following these instructions.
- In your browser, go to GIN > Your parameters > SSH keys.
- Click on the blue "Add a key" button, then paste the content of your public key in the Content field, and submit.
Your key should now appear in your list of SSH keys - you can add as many as necessary.
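To locate the public key mentioned in the first step, something like the following may help. It is only a sketch: the helper name find_pubkey is hypothetical, and it assumes the key uses one of two common default file names (id_ed25519.pub or id_rsa.pub); keys stored under other names will not be found.

```shell
# Print the path of the first public key found in a directory
# (default: ~/.ssh). Returns 1 if none of the expected files exist.
# find_pubkey is a hypothetical helper name used for this sketch.
find_pubkey() {
    dir="${1:-$HOME/.ssh}"
    for key in "$dir/id_ed25519.pub" "$dir/id_rsa.pub"; do
        if [ -f "$key" ]; then
            printf '%s\n' "$key"
            return 0
        fi
    done
    return 1
}

if key=$(find_pubkey); then
    echo "Public key found: $key"
    cat "$key"    # this is the text to paste into the Content field
else
    echo "No public key found; one can be created with: ssh-keygen -t ed25519"
fi
```

Only the .pub file should ever be shared; the private key of the same name (without .pub) stays on your machine.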
The next step is to clone the dataset:
datalad install git@gin.g-node.org:/LAAC-LSCP/XXX.git
cd XXX
You can get data from a dataset using the datalad get command, e.g.:
datalad get recordings/converted/ # download converted recordings
datalad get annotations/*/converted/ # get converted annotations
Or:
datalad get . # get everything
You can download many files in parallel using the -J (or --jobs) option:
datalad get . -J 4 # get everything, with 4 parallel transfers
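If only some annotation sets are needed, the converted/ folders can be fetched one set at a time instead of downloading everything. The sketch below assumes that each set lives under annotations/<set>/converted/ (consistent with the wildcard command above) and that the folder names match the set labels listed earlier (vtc, vcm, alice); converted_paths is a hypothetical helper name.

```shell
# Print the converted/ path of each annotation set given as an argument.
# (converted_paths is a hypothetical helper name used for this sketch.)
converted_paths() {
    for set_name in "$@"; do
        printf 'annotations/%s/converted/\n' "$set_name"
    done
}

# Fetch only when DataLad is installed and we are at the dataset root;
# otherwise just list the paths that would be requested.
if command -v datalad >/dev/null 2>&1 && [ -d .datalad ]; then
    for path in $(converted_paths vtc vcm alice); do
        datalad get -J 4 "$path"
    done
else
    converted_paths vtc vcm alice
fi
```

Adjust the list of set names to the annotations your analysis actually uses.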
For more help with using DataLad, please refer to our cheatsheet or DataLad's own cheatsheet. If this is not enough, check DataLad's documentation and Handbook.
If you are notified of changes to the data, please retrieve them by issuing the following commands:
datalad update --merge
datalad get .
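The two update commands can be wrapped in a small function that first checks it is pointed at a DataLad dataset. This is a sketch under one assumption: the dataset root contains a .datalad directory, which is how the function recognizes it. The name update_dataset is hypothetical.

```shell
# Update a dataset and re-fetch its content.
# Refuses to run if the given directory does not look like a DataLad
# dataset (no .datalad directory at its root).
# update_dataset is a hypothetical helper name used for this sketch.
update_dataset() {
    dataset="${1:-.}"
    if [ ! -d "$dataset/.datalad" ]; then
        echo "Not a DataLad dataset: $dataset" >&2
        return 1
    fi
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$dataset" || exit 1
        datalad update --merge && datalad get .
    )
}
```

It would be called with the path to the cloned dataset, for example update_dataset XXX.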
It is important that you delete the data once your project is complete.
This can be done with datalad remove:
datalad remove -r path/to/your/dataset
Maintainers should install the dataset from LAAC-LSCP and run the setup procedure as follows:
datalad install git@gin.g-node.org:/LAAC-LSCP/tsimane2017.git
cd tsimane2017
datalad run-procedure setup --public --confidential
Changes should be pushed to origin, which will trigger a push to the other siblings:
datalad push
History of the dataset. Add an entry to explain what was done at each point in time. Example (from tsimane2017):
- 2022-04 Alex Cristia processed files from Camila Scaff, resulting in data from XX children. These data came from copies of the back-up drives Lacie, hand-written notes, and re-listening to the audio files to determine date and child ID. Some of the processing was done by hand; the rest was done via a script called gen-recordings.R, which used to live in a Dropbox folder shared between Alex and Camila.