LAAC-LSCP/el1000-template

Dataset Name

Suggested citation: XXX.

Data description

Data documentation

The dataset is structured according to the ChildProject package standards detailed here.

Participants & recordings

Number of participants, general info on them.

Number of recordings, specificities.

Available annotations

What sorts of annotations are present.

We strongly recommend the use of the converted/ versions of the annotations, to avoid issues of time stamping and category assignment.

Automated annotations

Keep the relevant entries and add new ones if necessary.

  • alice: automated counts of phonemes, syllables, and words done by ALICE; all recordings have been analyzed with this system
  • its: automated analyses using the LENA software; only a small proportion of recordings (those gathered with a LENA device) have this annotation
  • vcm: automated analyses aimed at categorizing the key child's vocalizations into: canonical, non-canonical, crying, and other (which includes both "junk" = not the child at all; and laughing, which occurred infrequently), done using VCM; all recordings have been analyzed with this system
  • vtc: automated analyses that distinguish key child, other children, male adult, female adult, using VTC; all recordings have been analyzed with this system

Manual annotations

Explain what each set of manual annotations is and how it was produced. Example from tsimane2017:

  • eaf_2021: A small number of recordings were selected for coding by two speech-and-language students unfamiliar with the language and the families recorded, who annotated randomly sampled 15-second sections, skipping any that failed to contain speech by male adults or other children, following this coding manual, derived from the ACLEW Annotation Scheme, to do only segmentation.

Getting access to the data

To gain access to the data, please email XXXX? or Homebank or ... ?

Re-using the dataset

Requirements

You will first need to install the ChildProject package as well as DataLad. Instructions to install these packages can be found here.
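
Before going further, it can help to confirm that the required command-line tools are on your PATH. The following is a minimal sketch, not part of the official installation instructions: `check_requirements` is a hypothetical helper, and the tool names passed to it (`datalad`, `git-annex`, which DataLad depends on, and `child-project`, the ChildProject command-line entry point) are assumptions about a typical setup.

```shell
# Hypothetical helper: report which of the given tools are missing from PATH.
# Returns 0 when every tool is found, 1 otherwise.
check_requirements() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed tool names for a typical ChildProject + DataLad setup.
check_requirements datalad git-annex child-project \
  || echo "Install the missing tools before continuing."
```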

Configuring your SSH key on GIN

This step only needs to be done once.

  1. Copy your SSH public key to your clipboard (usually located in ~/.ssh/id_rsa.pub). If you don't have one, please create one following these instructions.
  2. In your browser, go to GIN > Your parameters > SSH keys.
  3. Click on the blue "Add a key" button, then paste the content of your public key in the Content field, and submit.

Your key should now appear in your list of SSH keys - you can add as many as necessary.
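
If you are unsure whether you already have a public key, the sketch below lists the `*.pub` files in an SSH directory. `list_public_keys` is a hypothetical helper written for this example; the default location `~/.ssh` and the suggested `ssh-keygen -t ed25519` command are assumptions about a standard OpenSSH setup, so adapt them to your system.

```shell
# Hypothetical helper: print the public keys found in an SSH directory.
# Defaults to ~/.ssh; returns 1 when no public key is found.
list_public_keys() {
  ssh_dir="${1:-$HOME/.ssh}"
  found=1
  for key in "$ssh_dir"/*.pub; do
    # The glob stays unexpanded when nothing matches, so check for a real file.
    if [ -f "$key" ]; then
      echo "$key"
      found=0
    fi
  done
  return "$found"
}

list_public_keys \
  || echo "No public key found; you could create one with: ssh-keygen -t ed25519"
```

The content of any file it prints is what goes into the Content field on GIN.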

Installing the dataset

The next step is to clone the dataset:

datalad install git@gin.g-node.org:/LAAC-LSCP/XXX.git
cd XXX

Getting data

You can get data from a dataset using the datalad get command, e.g.:

datalad get recordings/converted/ # download converted recordings
datalad get annotations/*/converted/ # get converted annotations

Or:

datalad get . # get everything

You can download many files in parallel using the -J (or --jobs) parameter:

datalad get . -J 4 # get everything, with 4 parallel transfers
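
If you only need some annotation sets, fetching just those can save time and disk space. The sketch below builds the corresponding `datalad get` call from a list of set names, using the `annotations/<set>/converted/` layout shown above. `get_converted_annotations` and the `DATALAD` override variable are hypothetical names introduced for this example; setting `DATALAD=echo` previews the command instead of running it.

```shell
# Hypothetical helper: fetch only the converted annotations of the named sets.
# Set DATALAD=echo to preview the command without downloading anything.
get_converted_annotations() {
  if [ "$#" -eq 0 ]; then
    echo "usage: get_converted_annotations SET [SET...]" >&2
    return 2
  fi
  paths=""
  for set_name in "$@"; do
    paths="$paths annotations/$set_name/converted/"
  done
  # Word splitting on $paths is intentional: set names contain no spaces.
  ${DATALAD:-datalad} get $paths
}

# Preview only: prints the arguments that would be passed to datalad.
DATALAD=echo get_converted_annotations vtc alice
```

Run from the root of the dataset, `get_converted_annotations vtc alice` would then download those two sets.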

For more help with using DataLad, please refer to our cheatsheet or DataLad's own cheatsheet. If this is not enough, check DataLad's documentation and Handbook.

Fetching updates

If you are notified of changes to the data, please retrieve them by issuing the following commands:

datalad update --merge
datalad get .

Removing the data

It is important that you delete the data once your project is complete. This can be done with datalad remove:

datalad remove -r path/to/your/dataset

Maintainers

Maintainers should install the dataset from LAAC-LSCP and run the setup procedure as follows:

datalad install git@gin.g-node.org:/LAAC-LSCP/tsimane2017.git
cd tsimane2017
datalad run-procedure setup --public --confidential

Changes should be pushed to origin, which in turn triggers a push to the others:

datalad push

History

History of the dataset. Add an entry to explain what was done at each point in time. Example (from tsimane2017):

  • 2022-04 Alex Cristia processed files from Camila Scaff, resulting in data from XX children. These data came from copies of the Lacie back-up drives, hand-written notes, and re-listening to the audio files to determine date and child ID. Some of the processing was done by hand; the rest was done via a script called gen-recordings.R, which used to live in a Dropbox folder shared between Alex and Camila.
