A simple R script to download and tidy trial-level lexical decision data from the OSF page for the English Lexicon Project (ELP).
The trial-level data is available on the OSF page https://osf.io/eu5ca/, but the data format is a pain to work with, and there are a few cases of false starts of sessions, data errors, or inconsistent formatting. There is also a dead link in the wiki to a script for reading the data into R.
This script, `trial-level-ldt.R`, downloads the data from the OSF page and produces a single dataframe with one trial per row. Demographics and additional information about each subject are also stored in this trial-level dataframe. The script makes a hacky attempt to standardise the date of birth information from the original data (which was entered manually by participants) with the `read_elp_date()` function, and recodes the universities from numeric representations into their full names.
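
To give a feel for what standardising manually entered dates involves, here is a minimal base R sketch. It is not the code of `read_elp_date()`: the helper name `parse_dob_sketch()` and the three date layouts it recognises are assumptions made for illustration, and the real data may well contain other layouts.

```r
# Hypothetical helper (not part of trial-level-ldt.R): standardise
# free-text dates by matching each string against a known layout first,
# then parsing it with the corresponding format.
parse_dob_sketch <- function(x) {
  x <- trimws(x)
  out <- rep(as.Date(NA), length(x))
  # Each entry pairs a regex for the whole string with a strptime format.
  # The layouts listed here are assumptions, not an inventory of the ELP data.
  layouts <- list(
    c("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$", "%m/%d/%Y"),
    c("^[0-9]{1,2}-[0-9]{1,2}-[0-9]{4}$", "%m-%d-%Y"),
    c("^[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$", "%Y-%m-%d")
  )
  for (layout in layouts) {
    # Only fill entries that are still unparsed and match this layout
    hit <- is.na(out) & grepl(layout[1], x)
    out[hit] <- as.Date(x[hit], format = layout[2])
  }
  out  # anything unrecognised stays NA rather than being guessed
}

dob <- parse_dob_sketch(c("3/14/1985", "03-14-1985", "1985-03-14", "March 85"))
```

Checking the layout with a regex before parsing matters because `as.Date()` ignores trailing characters and accepts short years for `%Y`, so trying formats blindly can silently produce the wrong year.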
By default the dataframe will be written to `elp.csv` (452 MB).
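
Because the output file is large, it can help to load only the columns you need. The sketch below uses base R and assumes `elp.csv` is a standard comma-separated file with a header row; `read_elp_sketch()` is a hypothetical helper, not something the script provides.

```r
# Hypothetical helper: read only selected columns from a large CSV.
# Assumes a comma-separated file with a header row.
read_elp_sketch <- function(path = "elp.csv", cols = NULL) {
  if (is.null(cols)) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  # Read a single row just to learn the column names
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  # "NULL" tells read.csv to skip a column; NA lets it guess the type
  classes <- ifelse(header %in% cols, NA, "NULL")
  read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
}
```

For example, `read_elp_sketch("elp.csv", cols = c("Subject_ID", "Item", "Accuracy", "LDT_RT"))` would return a four-column dataframe.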
The following columns are created. I tried to keep original column names where possible.
column | explanation |
---|---|
Univ | The number assigned to the university. |
Univ_Name | The name of the university. |
Date | Date of data collection. |
Time | Time of data collection. |
Orig_Subject | Original subject IDs (`Subject`). Some are reused across different testing locations. |
Subject_ID | Fixed subject IDs (`paste(Univ, Orig_Subject)`) with 1 ID per individual participant. |
DOB | Date of birth (standardised format). |
Education | Years of education. |
Trial_Order | The number of this trial for this participant. |
Item_Serial_Number | An item ID number. |
Lexicality | 0 (nonword) or 1 (word). |
Lexicality_label | "nonword" or "word". |
Accuracy | Accuracy of response. Mostly 0 (incorrect) and 1 (correct). |
LDT_RT | Response time in milliseconds. |
Item | The text displayed to the participant. |
Session_nr | The number of the session (assuming sets of csv values signify separate sessions). |
Gender | Recorded participant gender. |
Task | The task completed (all LDT, but may be useful if joining to naming data). |
Date_Demog | The date associated with the participant's demographics data. |
Time_Demog | The time associated with the participant's demographics data. |
MEQ | Score from the Morningness-Eveningness Questionnaire. |
Shipley_numCorrect | Score from the Shipley Institute of Living Scale. |
Shipley_rawScore | Score from the Shipley Institute of Living Scale. |
Shipley_vocabAge | Score from the Shipley Institute of Living Scale. |
Shipley_shipTime | Score from the Shipley Institute of Living Scale. |
Shipley_readTime | Score from the Shipley Institute of Living Scale. |
presHealth | A Likert Rating (1-7) of the participant's present health(?) |
pastHealth | A Likert Rating (1-7) of the participant's past health(?) |
vision | A Likert Rating (1-7) of the participant's vision(?) |
hearing | A Likert Rating (1-7) of the participant's hearing(?) |
firstLang | The participant's first language. |
file | The name of the file associated with the data in the OSF. |
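
As a rough illustration of how these columns fit together, the sketch below computes per-participant accuracy and mean response time for correct word trials. The rows in `elp_toy` are invented for illustration and simply stand in for the real dataframe; only the column names come from the table above. Since `Accuracy` is only mostly 0 and 1, the sketch drops any other codes before summarising.

```r
# Toy rows standing in for the real dataframe (values are invented)
elp_toy <- data.frame(
  Subject_ID = c("1 1", "1 1", "1 1", "2 1", "2 1", "2 1"),
  Lexicality = c(1, 0, 1, 1, 1, 0),
  Accuracy   = c(1, 1, 0, 1, 1, 9),
  LDT_RT     = c(612, 745, 530, 580, 640, 900),
  stringsAsFactors = FALSE
)

# Keep only trials where Accuracy is a plain 0/1 code
valid <- elp_toy[elp_toy$Accuracy %in% c(0, 1), ]

# Proportion correct per participant
acc_by_subject <- aggregate(Accuracy ~ Subject_ID, data = valid, FUN = mean)

# Mean response time (ms) for correct responses to real words
correct_words <- valid[valid$Accuracy == 1 & valid$Lexicality == 1, ]
rt_by_subject <- aggregate(LDT_RT ~ Subject_ID, data = correct_words, FUN = mean)
```

Grouping on `Subject_ID` rather than `Orig_Subject` matters here, because the original IDs are reused across testing locations.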