La reconnaissance de l'écriture pour les manuscrits documentaires du Moyen Âge (Handwriting recognition for medieval documentary manuscripts)

Sergio Torres Aguilar; Vincent Jolivet

doi:10.46298/jdmdh.10484

jdmdh:10484 - Journal of Data Mining & Digital Humanities, 22 December 2023, Documents historiques et reconnaissance automatique de texte - https://doi.org/10.46298/jdmdh.10484

Authors: Sergio Torres Aguilar (1, 2, 3); Vincent Jolivet (4, 5)

1 Université du Luxembourg
2 Centre Jean Mabillon
3 Université du Luxembourg = University of Luxembourg = Universität Luxemburg [uni.lu]
4 Centre Jean Mabillon [ENC]
5 École nationale des chartes [ENC]

Handwritten Text Recognition (HTR) techniques aim to accurately recognize sequences of characters in images of manuscripts by training artificial intelligence models to capture the features of historical writing. Efficient HTR models can transform digitized manuscript collections into indexed and quotable corpora, providing valuable research insight for various historical inquiries. However, several challenges must be addressed, including the scarcity of relevant training corpora, the substantial variability introduced by different scribal hands and writing scripts, and the complexity of page layouts. This paper presents two models and one cross-model approach for the automatic transcription of Latin and French medieval documentary manuscripts, particularly charters and registers, written between the 12th and 15th centuries and classified into two major writing scripts: Textualis (from the late 11th to the 13th century) and Cursiva (from the 13th to the 15th century). The architecture of the models is based on a Convolutional Recurrent Neural Network (CRNN) coupled with a Connectionist Temporal Classification (CTC) loss. The models were trained and evaluated on 120k lines of text and almost 1M tokens, drawn from three available ground-truth corpora: the e-NDP corpus, the Alcar-HOME database, and the Himanis project. This paper describes the training architecture and corpora used, and discusses the main training challenges, results, and potential applications of HTR techniques to medieval documentary manuscripts.
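The abstract names a CTC loss but does not describe how the network's outputs become text. A common way to read out a CTC-trained model is best-path (greedy) decoding: take the most likely class for each time frame, merge consecutive repeats, then drop the blank symbol. The sketch below illustrates that general technique in plain Python and is not the authors' implementation; the function name `ctc_greedy_decode`, the toy alphabet, the choice of index 0 for the blank, and the frame scores are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # 1. Pick the highest-scoring class index for each time frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining indices to characters.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Hypothetical toy alphabet; position 0 stands for the blank.
ALPHABET = ["", "d", "e", "i"]

# Hypothetical per-frame scores over the four classes above.
frames = [
    [0.10, 0.70, 0.10, 0.10],  # "d"
    [0.10, 0.60, 0.20, 0.10],  # "d" again (repeat, merged in step 2)
    [0.80, 0.10, 0.05, 0.05],  # blank (dropped in step 3)
    [0.10, 0.10, 0.70, 0.10],  # "e"
    [0.10, 0.10, 0.10, 0.70],  # "i"
]

decoded = ctc_greedy_decode(frames, ALPHABET)
print(decoded)
```

The blank is what lets a model emit genuine double letters: the frame labels "e, blank, e" decode to "ee", whereas "e, e" alone collapses to a single "e".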

Source: HAL:hal-03892163v3

Volume: Documents historiques et reconnaissance automatique de texte

Published: 22 December 2023

Accepted: 16 October 2023

Submitted: 14 December 2022

Keywords: [SHS.HIST] Humanities and Social Sciences/History, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], HTR for historical documents, HTR for medieval Latin manuscripts, digital diplomatics, medieval digital studies, HTR for medieval French manuscripts, medieval charters

License: HAL authorisation v1

Publications

Other

10.5281/zenodo.7547438

1 HAL

Datasets

Reference

Stutzmann, D., Torres Aguilar, S., & Chaffenet, P. (2021). HOME-Alcar: Aligned and Annotated Cartularies [Dataset]. Zenodo. 10.5281/ZENODO.5600883 ²

Stutzmann, D., Torres Aguilar, S., & Chaffenet, P. (2021). HOME-Alcar: Aligned and Annotated Cartularies [Dataset]. Zenodo. 10.5281/ZENODO.5600884 ²

Torres Aguilar, S., & Jolivet, V. (2023). Dataset and evaluation for HTR models for Latin and French Medieval Documentary Manuscripts (Version 0.1) [Dataset]. Zenodo. 10.5281/ZENODO.7401832 ²

Torres Aguilar, S., & Jolivet, V. (2023). Dataset and evaluation for HTR models for Latin and French Medieval Documentary Manuscripts (Version 0.1) [Dataset]. Zenodo. 10.5281/ZENODO.7401833 ²

Torres Aguilar, S., & Jolivet, V. (2023). HTR model for Latin and French Medieval Documentary Manuscripts (12th-15th) (Version 1) [Dataset]. Zenodo. 10.5281/ZENODO.7547438 ²

2 ScholeXplorer
