ArCADE

ArCADE: An Arabic Corpus of Auditory Dictation Errors

Introduction

The Arabic Corpus of Auditory Dictation Errors (ArCADE) is a corpus of Arabic words as transcribed by 62 native English speakers learning Arabic. The corpus is designed to help researchers investigate non-native spelling errors in Arabic, particularly those due to listening difficulties. Unlike error corpora collected from non-native Arabic writing samples, it is designed to elicit spelling errors arising from perceptual errors. A principal purpose for creating the corpus was to support the development and evaluation of tools that detect and correct listening errors, so that learners can look up in a dictionary the words they encounter in spoken language (cf. Rytting et al., 2010).

The ArCADE corpus was created through an elicitation experiment, similar in structure to an American-style spelling test. The principal difference (other than the language) is that in this case the participants are expected to be unfamiliar with the words, and are thus forced to rely on what they hear in the moment rather than on their lexical knowledge. Participants listened to 261 words presented over headphones and wrote their responses to the audio stimuli on a response sheet that contained numbered boxes. They were asked to use Arabic orthography with full diacritics: the short vowels (fatha, damma, kasra) as well as shadda and sukun.

While the stimulus words were specifically chosen to facilitate the study of non-glide consonants (for which the mapping between orthography and phonology is relatively straightforward), we hope that the corpus will prove useful for studies beyond its original design.
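Because responses were written with full diacritics, anyone comparing transcriptions may want to look at the consonantal skeleton and the diacritics separately. The following is a minimal Python sketch of that idea, not part of the corpus tooling. It assumes the transcriptions are available as Unicode strings, and the function names and the example word are hypothetical (the example word is not taken from the corpus).

```python
# Unicode code points for the five marks participants were asked to write.
DIACRITICS = {
    "\u064E": "fatha",
    "\u064F": "damma",
    "\u0650": "kasra",
    "\u0651": "shadda",
    "\u0652": "sukun",
}


def strip_diacritics(word: str) -> str:
    """Return the word with the five marks above removed (hypothetical helper)."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def list_diacritics(word: str) -> list[str]:
    """Return the names of the marks in the word, in order of appearance."""
    return [DIACRITICS[ch] for ch in word if ch in DIACRITICS]


# Illustrative word only: kaf + fatha, teh + fatha, beh + fatha.
example = "\u0643\u064E\u062A\u064E\u0628\u064E"

skeleton = strip_diacritics(example)   # consonants only
marks = list_diacritics(example)       # names of the marks that were removed

print(skeleton)
print(marks)
```

Separating the two layers makes it possible to count consonant errors independently of vowel-marking errors. Other Arabic marks (for example the tanwin signs) are deliberately left out here because the instructions to participants name only these five.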

Copyright

License

Please see the LICENSE file for how to license this corpus. If you have any questions, contact the University of Maryland (UMD) Office of Technology Commercialization (OTC):

Office of Technology Commercialization
2130 Mitchell Building
University of Maryland
College Park, MD 20742
Phone: 301-405-3947 | Fax: 301-314-9502
Email: [email protected]
http://www.otc.umd.edu/

How to cite

When referencing this corpus, please cite the following paper:

https://www.academia.edu/37578270/ArCADE_An_Arabic_Corpus_of_Auditory_Dictation_Errors
MLA: Rytting, C. Anton, Paul Rodrigues, Tim Buckwalter, Valerie Novak, Aric Bills, Noah H. Silbert, and Mohini Madgavkar. “ArCADE: An Arabic Corpus of Auditory Dictation Errors.” Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications. 2014.
APA: Rytting, C.A., Rodrigues, P., Buckwalter, T., Novak, V., Bills, A., Silbert, N.H., & Madgavkar, M. (2014, June). ArCADE: An Arabic Corpus of Auditory Dictation Errors. In Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications.
BibTeX:
@inproceedings{rytting_etal:2014bea,
  title={ArCADE: An Arabic Corpus of Auditory Dictation Errors},
  author={C. Anton Rytting and Paul Rodrigues and Tim Buckwalter and Valerie Novak and Aric Bills and Noah H. Silbert and Mohini Madgavkar},
  booktitle={Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications},
  year={2014}
}

