Special issue on collecting, preserving, and disseminating endangered cultural heritage for new understandings through multilingual approaches

The existing reservoir of translations of public-domain literary texts, once located and digitized, offers a wealth of new linguistic resources to support and save endangered languages and to help us map the global circulation and reception of texts.

1. ekdosis: Using LuaLaTeX for Producing TEI xml Compliant Critical Editions and Highlighting Parallel Writings

Robert Alessi.

ekdosis is a LuaLaTeX package, written by R. Alessi, designed for multilingual critical editions. It can be used to typeset texts and different layers of critical notes in any direction accepted by LuaTeX. Texts can be arranged in running paragraphs or on facing pages, in any number of columns, which in turn can be synchronized or not. Database-driven encoding under LaTeX allows extraction of texts entered segment by segment according to various criteria: main edited text, variant readings, translations, or annotated borrowings between texts. In addition to printed texts, ekdosis can convert .tex source files so as to produce TEI XML-compliant critical editions. It will be published under the terms of the GNU General Public License (GPL) version 3.

Section: Visualisation of intertextuality and text reuse

2. Individual vs. Collaborative Methods of Crowdsourced Transcription

Samantha Blickhan ; Coleman Krawczyk ; Daniel Hanson ; Amy Boyer ; Andrea Simenstad ; Victoria van Hyning.

While online crowdsourced text transcription projects have proliferated in the last decade, there is a need within the broader field to understand differences in project outcomes as they relate to task design, as well as to experiment with different models of online crowdsourced transcription that have not yet been explored. The experiment discussed in this paper involves the evaluation of newly-built tools on the Zooniverse.org crowdsourcing platform, attempting to answer the research question: "Does the current Zooniverse methodology of multiple independent transcribers and aggregation of results render higher-quality outcomes than allowing volunteers to see previous transcriptions and/or markings by other users? How does each methodology impact the quality and depth of analysis and participation?" To answer these questions, the Zooniverse team ran an A/B experiment on the project Anti-Slavery Manuscripts at the Boston Public Library. This paper will share results of this study, and also describe the process of designing the experiment and the metrics used to evaluate each transcription method. These include the comparison of aggregate transcription results with ground truth data; evaluation of annotation methods; the time it took for volunteers to complete transcribing each dataset; and the level of engagement with other project elements such as posting on the message board or reading supporting documentation. Particular focus will be given to the (at times) […]
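One of the metrics named in this abstract, the comparison of aggregate transcription results with ground truth data, can be illustrated with a minimal sketch. The abstract does not say which aggregation rule or distance measure the Zooniverse team used, so everything here is an assumption made for illustration: the function names are hypothetical, the aggregation is a naive exact-match majority vote, and the comparison is a character error rate based on Levenshtein edit distance.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalised by the length of the ground-truth line."""
    if not reference:
        raise ValueError("reference (ground truth) must not be empty")
    return levenshtein(hypothesis, reference) / len(reference)


def majority_transcription(transcriptions: list[str]) -> str:
    """Naive aggregation (assumption): the most frequent exact string wins."""
    if not transcriptions:
        raise ValueError("at least one transcription is required")
    return Counter(transcriptions).most_common(1)[0][0]


# Hypothetical example: three volunteers transcribe the same line independently.
volunteer_lines = ["Dear friend", "Dear freind", "Dear friend"]
ground_truth = "Dear friend"
aggregate = majority_transcription(volunteer_lines)
error_rate = character_error_rate(aggregate, ground_truth)
```

Under these assumptions, the same error rate could be computed for lines produced by the independent method and by the collaborative method, giving one comparable number per method for each line of ground truth.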

3. A Collaborative Ecosystem for Digital Coptic Studies

Caroline T. Schroeder ; Amir Zeldes.

Scholarship on under-resourced languages brings with it a variety of challenges that make access to the full spectrum of source materials and their evaluation difficult. For Coptic in particular, large-scale analyses and any kind of quantitative work become difficult due to the fragmentation of manuscripts, the highly fusional nature of an incorporational morphology, and the complications of dealing with influences from Hellenistic-era Greek, among other concerns. Many of these challenges, however, can be addressed using Digital Humanities tools and standards. In this paper, we outline some of the latest developments in Coptic Scriptorium, a DH project dedicated to bringing Coptic resources online in uniform, machine-readable, and openly available formats. Collaborative web-based tools create online 'virtual departments' in which scholars dispersed sparsely across the globe can collaborate, and natural language processing tools counterbalance the scarcity of trained editors by enabling machine processing of Coptic text to produce searchable, annotated corpora.

4. Spoken word corpus and dictionary definition for an African language

Wanjiku Nganga ; Ikechukwu Achebe.

The preservation of languages is critical to maintaining and strengthening the cultures and identities of communities, and this is especially true for under-resourced languages with a predominantly oral culture. Most African languages have a relatively short literary past, and as such the task of dictionary making cannot rely on textual corpora, as has been the standard practice in lexicography. This paper emphasizes the significance of the spoken word and the oral tradition as repositories of vocabulary, and argues that the value of spoken word corpora for lexicography greatly outweighs that of printed texts. We describe a methodology for creating a digital dialectal dictionary for the Igbo language from such a spoken word corpus. We also highlight the language technology tools and resources that have been created to support the transcription of thousands of hours of Igbo speech and the subsequent compilation of these transcriptions into an XML-encoded textual corpus of Igbo dialects. The methodology described in this paper can serve as a blueprint for other under-resourced languages that have predominantly oral cultures.

Section: Digital humanities in languages

5. Linguistic Fingerprints on Translation's Lens

J.D. Porter ; Yulia Ilchuk ; Quinn Dombrowski.

What happens to the language fingerprints of a work when it is translated into another language? While translation studies has often prioritized concepts of equivalence (of form and function), and of textual function, digital humanities methodologies can provide a new analytical lens onto ways that stylistic traces of a text's source language can persist in a translated text. This paper presents initial findings of a project undertaken by the Stanford Literary Lab, which has identified distinctive grammatical features in short stories that have been translated into English. While the phenomenon of "translationese" has been well established particularly in corpus translation studies, we argue that digital humanities methods can be valuable for identifying specific traits for a vision of a world atlas of literary style.

Section: Project