ArCADE: An Arabic Corpus of Auditory Dictation Errors

Introduction

The Arabic Corpus of Auditory Dictation Errors (ArCADE) is a corpus of Arabic words as transcribed by 62 native English speakers learning Arabic. It is designed to help researchers investigate non-native spelling errors in Arabic, particularly those due to listening difficulties. Unlike error corpora collected from non-native Arabic writing samples, it is designed to elicit spelling errors that arise from perceptual errors. A principal purpose in creating the corpus was to support the development and evaluation of tools that detect and correct listening errors, so that learners can look up in a dictionary the words they encounter in spoken language (cf. Rytting et al., 2010).

The ArCADE corpus was created through an elicitation experiment similar in structure to an American-style spelling test. The principal difference (other than the language) is that the participants were expected to be unfamiliar with the words, and thus forced to rely on what they heard in the moment rather than on their lexical knowledge. Participants listened to 261 words presented over headphones and wrote their responses to the audio stimuli on a response sheet that contained numbered boxes. They were asked to use Arabic orthography with full diacritics: the short vowels (fatha, damma, and kasra) as well as shadda and sukun.

While the stimulus words were specifically chosen to facilitate the study of non-glide consonants (for which the mapping between orthography and phonology is relatively straightforward), we hope that the corpus will prove useful for studies beyond its original design.
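As a rough illustration of the kind of analysis the corpus is meant to support, the sketch below compares a target word with a learner's transcription. It is not part of the corpus or its tooling: the function names `strip_diacritics` and `edit_distance` are hypothetical, the example word pair is invented rather than drawn from the data, and nothing here assumes a particular corpus file format. It handles only the five diacritics named above and uses a plain Levenshtein distance as one possible measure of spelling error.

```python
# Illustrative sketch only; names and example words are assumptions,
# not part of the ArCADE distribution.

# The five diacritics named in this README, as Unicode code points.
DIACRITICS = {
    "\u064E",  # fatha
    "\u064F",  # damma
    "\u0650",  # kasra
    "\u0651",  # shadda
    "\u0652",  # sukun
}


def strip_diacritics(word):
    """Return the word with the five diacritics above removed."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Invented example: target "qalb" with fatha and sukun, response "klb".
target = "\u0642\u064E\u0644\u0652\u0628"
response = "\u0643\u0644\u0628"

# Compare the consonant skeletons only.
distance = edit_distance(strip_diacritics(target), strip_diacritics(response))
print(distance)  # 1
```

Stripping diacritics first isolates consonant confusions (here, qaf heard as kaf); comparing the fully diacritized strings instead would also count vowel and gemination errors.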

Copyright

This corpus is copyright 2014 University of Maryland.

License

Please see the LICENSE file for how to license this corpus. If you have any questions, contact the University of Maryland (UMD) Office of Technology Commercialization (OTC):

Office of Technology Commercialization
2130 Mitchell Building
University of Maryland
College Park, MD 20742
Phone: 301-405-3947 | Fax: 301-314-9502
Email: [email protected]
http://www.otc.umd.edu/

How to cite

When referencing this corpus, please cite the following paper:

  • https://www.academia.edu/37578270/ArCADE_An_Arabic_Corpus_of_Auditory_Dictation_Errors
  • MLA: Rytting, C. Anton, Paul Rodrigues, Tim Buckwalter, Valerie Novak, Aric Bills, Noah H. Silbert, and Mohini Madgavkar. “ArCADE: An Arabic Corpus of Auditory Dictation Errors.” Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications. 2014.
  • APA: Rytting, C.A., Rodrigues, P., Buckwalter, T., Novak, V., Bills, A., Silbert, N.H., & Madgavkar, M. (2014, June). ArCADE: An Arabic Corpus of Auditory Dictation Errors. In Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications.
  • BibTeX:
      @inproceedings{rytting_etal:2014bea,
        title={ArCADE: An Arabic Corpus of Auditory Dictation Errors},
        author={C. Anton Rytting and Paul Rodrigues and Tim Buckwalter and Valerie Novak and Aric Bills and Noah H. Silbert and Mohini Madgavkar},
        booktitle={Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications},
        year={2014}
      }