Transcribing Medieval Manuscripts for Machine Learning

Estelle Guéville; David Joseph Wrisley

doi:10.46298/jdmdh.9805

jdmdh:9805 - Journal of Data Mining & Digital Humanities, 2 July 2024, On the Way to the Future of Digital Manuscript Studies - https://doi.org/10.46298/jdmdh.9805

This article focuses on the transcription of medieval manuscripts. Whereas problems of transcription have long interested medievalists, few workable options besides normalisation were available in the era of printed editions. The automation of this process, known as handwritten text recognition (HTR), has made new kinds of digital text creation possible, but has also foregrounded the necessity of theorising transcription in our scholarly practices. We reflect here on different notions of transcription against the backdrop of changing text technologies. Moreover, drawing on our own research on medieval Latin Bibles, we present general guidelines for customising transcription schemes, arguing that they must be designed with specific research questions and scholarly end use in mind. Since we are particularly interested in the scribal contribution to the production of codices, our transcription guidelines aim to capture abbreviations and orthographic variation between different textual witnesses for downstream machine learning tasks. In the final section of the article, we discuss a few examples of how the HTR-created transcriptions allow us to address new questions at scale in medieval manuscripts, such as textual variance across witnesses, the prediction of a change in scribal hands within a single manuscript, and the profiling of individual and regional scribal characteristics.

Source: arXiv:2207.07726

Volume: On the Way to the Future of Digital Manuscript Studies

Published: 2 July 2024

Accepted: 1 May 2023

Submitted: 19 July 2022

Keywords: Computer Science - Digital Libraries

Licence: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Datasets

Reference

Wrisley, D. J., & Guéville, E. (2022). List of Paris Bibles in the World (Version 1.0) [Dataset]. Zenodo. 10.5281/ZENODO.7274506 ¹

Wrisley, D. J., & Guéville, E. (2022). List of Paris Bibles in the World (Version 1.0) [Dataset]. Zenodo. 10.5281/ZENODO.7274507 ¹

Is related to

Pinche, A., Camps, J.-B., & Clérice, T. (2019). Stylometry for Noisy Medieval Data: Evaluating Paul Meyer’s Hagiographic Hypothesis (Versions 2.0) [Dataset]. DataverseNL. 10.34894/F9KSXJ ¹

¹ Source: ScholeXplorer