ML models for exploring infant vocabulary development using WordBank data.

RodDalBen/edx_wordbank

HarvardX PH125.9x - Data Science: Capstone

In this project, I explore WordBank, an open database of children's vocabulary development and growth. The database contains data from more than 75,000 children across more than 25 languages. I use machine learning algorithms (regression trees, random forests, and linear regression) to investigate potential relationships between demographic and linguistic variables (the predictors) and vocabulary growth, measured as productive vocabulary (the outcome measure) on the MacArthur-Bates Communicative Development Inventories (CDI). All analyses are exploratory; no hypotheses or predictions are made. First, I curate the WordBank dataset. I then run descriptive analyses and visualizations, and finally apply the machine learning algorithms.

This repository contains:

  • PDF report (knitted from the Rmd file)
  • Rmd (R Markdown) script
  • R script
  • Reference list (BibTeX)

For more information or questions, please e-mail me at [email protected]