This repository contains the 2018 public-facing ACQDIV database that was used in our paper on frequent frames:
Moran, Steven, Damián E. Blasi, Robert Schikowski, Aylin C. Küntay, Barbara Pfeiler, Shanley Allen and Sabine Stoll. 2018. A universal cue for grammatical categories in the input to children: frequent frames. Cognition, 175, 131–140. DOI:
The latest public-facing ACQDIV database is now archived on Zenodo:
Below is the overview of the 2018 (deprecated) version.
This repository hosts the public-facing ACQDIV database. Currently, it includes longitudinal child language acquisition corpora in the ACQDIV database format for:
- Indonesian (Gil & Tadmor, 2007)
- Japanese MiiPro (Miyata & Nisisawa 2009, Nisisawa & Miyata 2009, Miyata & Nisisawa 2010, Nisisawa & Miyata 2010, Miyata 2012)
- Japanese Miyata (Miyata 2004a,b,c, 2012)
- Sesotho (Demuth 1992, 2015)
If you use our database or additional annotations in your research, please cite it as:
Moran, Steven, Robert Schikowski, Danica Pajović, Cazim Hysi and Sabine Stoll. 2016. The ACQDIV Database: Min(d)ing the Ambient Language. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 4423–4429. May 23-28, Portorož, Slovenia. Online:
These corpora are available in various original input formats (e.g. CHAT, CHAT-XML) via the Child Language Data Exchange System (CHILDES) component of the TalkBank system and are made openly available under the Creative Commons BY-NC-SA 3.0 license.
We have converted the original data formats into easily accessible tables and we have enriched the annotation data, in particular at the morpheme level. We also provide a linguistically-informed subset of grammatical classes for cross-linguistic research. For detailed information about the corpora and the data structures that we use, see the ACQDIV corpus manual.
The ground rules for using corpora from TalkBank and CHILDES are stipulated here, and any individual corpora used in research should be cited accordingly:
The ACQDIV database also contains privately owned longitudinal child language acquisition corpora from languages that were selected from five clusters calculated via maximum diversity sampling (Stoll & Bickel, 2013) to achieve a typologically maximally diverse language sample:
- Chintang (Stoll et al. 2015)
- Cree (Brittain 2015)
- Inuktitut (Allen Unpublished)
- Russian (Stoll & Meyer 2008)
- Turkish (Küntay et al. Unpublished)
- Yucatec (Pfeiler Unpublished)
Access to these corpora is restricted by the project's Terms of Agreement. Contact Prof. Sabine Stoll for more information.
The research leading to these results has received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 615988 (PI Sabine Stoll).
Allen, Shanley. Unpublished. Allen Inuktitut Child Language Corpus.
Brittain, Julie. 2015. Corpus of the Chisasibi Child Language Acquisition Study (CCLAS).
Demuth, Katherine. 2015. Demuth Sesotho Corpus.
Demuth, Katherine. 1992. Acquisition of Sesotho. In Dan Slobin (ed.), The Cross-Linguistic Study of Language Acquisition, vol. 3, 557-638. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Gil, David & Uri Tadmor. 2007. The MPI-EVA Jakarta Child Language Database. A joint project of the Department of Linguistics, Max Planck Institute for Evolutionary Anthropology and the Center for Language and Culture Studies, Atma Jaya Catholic University.
Küntay, Aylin Copty, Dilara Koçbaş & Süleyman Sabri Taşçı. Unpublished. Koç University Longitudinal Language Development Database on language acquisition of 8 children from 8 to 36 months of age.
Miyata, Susanne. 2004a. Aki Corpus. Pittsburgh, PA: TalkBank. 1-59642-055-3.
Miyata, Susanne. 2004b. Ryo Corpus. Pittsburgh, PA: TalkBank. 1-59642-056-1.
Miyata, Susanne. 2004c. Tai Corpus. Pittsburgh, PA: TalkBank. 1-59642-057-X.
Miyata, Susanne. 2012. Japanese CHILDES: The 2012 CHILDES manual for Japanese.
Miyata, Susanne & Hiro Yuki Nisisawa. 2009. MiiPro – Asato Corpus. Pittsburgh, PA: TalkBank.
Miyata, Susanne & Hiro Yuki Nisisawa. 2010. MiiPro – Tomito Corpus. Pittsburgh, PA: TalkBank.
Nisisawa, Hiro Yuki & Susanne Miyata. 2009. MiiPro – Nanami Corpus. Pittsburgh, PA: TalkBank.
Nisisawa, Hiro Yuki & Susanne Miyata. 2010. MiiPro – ArikaM Corpus. Pittsburgh, PA: TalkBank.
Pfeiler, Barbara. Unpublished. Pfeiler Yucatec Child Language Corpus.
Stoll, Sabine & Balthasar Bickel. 2013. Capturing diversity in language acquisition research. In Language Typology and Historical Contingency: In Honor of Johanna Nichols, 195–216. Amsterdam: John Benjamins.
Stoll, Sabine & Roland Meyer. 2008. Audio-visual longitudinal corpus on the acquisition of Russian by 5 children.
Stoll, Sabine, Elena Lieven, Goma Banjade, Toya Nath Bhatta, Martin Gaenszle, Netra P. Paudyal, Manoj Rai, Novel Kishor Rai, Ichchha P. Rai, Taras Zakharko, Robert Schikowski & Balthasar Bickel. 2015. Audiovisual corpus on the acquisition of Chintang by six children.