Wanjiku Nganga ; Ikechukwu Achebe - Spoken word corpus and dictionary definition for an African language

jdmdh:6703 - Journal of Data Mining & Digital Humanities, 2 December 2020, Special issue on collecting, preserving, and disseminating endangered cultural heritage for new understandings through multilingual approaches - https://doi.org/10.46298/jdmdh.6703
Spoken word corpus and dictionary definition for an African language

Authors: Wanjiku Nganga 1; Ikechukwu Achebe 2

  • 1 School of Computing & Informatics, University of Nairobi, Kenya
  • 2 Nnamdi Azikiwe University

The preservation of languages is critical to maintaining and strengthening the cultures and identities of communities, and this is especially true for under-resourced languages with a predominantly oral culture. Most African languages have a relatively short literary past, so dictionary making cannot rely on textual corpora, as has been the standard practice in lexicography. This paper emphasizes the significance of the spoken word and the oral tradition as repositories of vocabulary, and argues that the value of spoken word corpora for lexicography greatly outweighs that of printed texts. We describe a methodology for creating a digital dialectal dictionary for the Igbo language from such a spoken word corpus. We also highlight the language technology tools and resources that have been created to support the transcription of thousands of hours of Igbo speech and the subsequent compilation of these transcriptions into an XML-encoded textual corpus of Igbo dialects. The methodology described in this paper can serve as a blueprint for other under-resourced languages that have predominantly oral cultures.


Volume: Special issue on collecting, preserving, and disseminating endangered cultural heritage for new understandings through multilingual approaches
Section: Digital humanities in languages
Published on: 2 December 2020
Accepted on: 2 December 2020
Submitted on: 7 August 2020
Keywords: audio corpus, dictionary definition, digital resources, Igbo, oral tradition, under-resourced languages, [INFO]Computer Science [cs], [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], [SHS.MUSEO]Humanities and Social Sciences/Cultural heritage and museology, [SHS.LANGUE]Humanities and Social Sciences/Linguistics
